Note: This post first appeared on the H2O.ai blog
Notes and learnings from the GrandMaster Panel at H2OWorld
Personally, I’m a firm believer in and fan of Kaggle, and I definitely look at it as the home of Data Science. Kaggle Grandmasters are the heroes of Kaggle, or at least mine. I’ve been on a quest to document and understand their journey into the field, and to find out whether they’re still human or have passed into an alternate reality (still not sure about that one).
The H2O World event recently hosted the biggest Kaggle Grandmaster panel. This post shares my takeaways from the panel discussion, along with a few notes from my previous interviews. Fun fact: I’ve interviewed all of the grandmasters on the panel who wear spectacles 😎
Note to the reader: most of the notes are comments from the Grandmasters, with added context for readability.
The questions were asked by Arno Candel; they are represented here as headings.
The panel consisted of 10 of H2O’s 13 Grandmasters.
So, you thought cool ML Engineers work with models?
I get to talk to Kaggle GMs every day at work 😛.
Sorry, I had to say that (I still pinch myself every day). Back to the GMs.
An “Avengers Assemble” moment from the video, where every GM introduced themselves and their strengths:
It really speaks to the passion of the best Kagglers: the majority of the panel agreed that they spend several hours, even half of their days, on Kaggle.
Kaggle is really a great learning platform if you are willing to put in the hours. GM Mark Landry calls it “Homework”: you take an assignment home, work on it for a few months, and then realize you couldn’t do as well as other competitors. This leads you to improve your skills and close your gaps.
“The learnings are unlike any classroom or book, you won’t find the knowledge anywhere that you could by competing on Kaggle” — GM Babakhin
More often than not, you team up and end up working remotely with people you would never have met otherwise. GM Pavel says teaming up remotely and pushing a team to its best was one of his favorite takeaways.
After competing in a multitude of competitions and spending an insane number of hours on the problems, you find common patterns and start to think in a more structured fashion when approaching them. Critical thinking, breaking problems into steps, and getting creative with every step are takeaways for GMs Shivam and SRK.
If you’d like to know more about Shivam and SRK’s journeys, check out their complete interviews here.
“You also learn one of the most important real-world skills: making models that generalize well,” to quote GM Olivier.
Kaggle competitions are like a game of PUBG where everyone starts from scratch but the seasoned Kagglers know where to find the loot. To me, it feels like a race where noobs (myself and the like) are running barefoot and the Kaggle GMs and Masters just whoosh past us in their supercars of knowledge.
Babakhin says:
Competitions are more like a marathon than a sprint, so you should be prepared to run a lot of ideas, many of which will fail, and you should be prepared for that.
“Data Science is all about the data and modeling, you really need to understand how to validate your data and the rest follows after that”, according to GM Dmitry.
If you’re chasing the win, you’d want to squeeze out every last digit to get to the top of the leaderboard. This requires building a lot of models and having the right ensembling strategy in place, added GM Shivam.
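To make the ensembling point concrete, here is a minimal sketch of the simplest possible strategy: averaging the predictions of several models and checking the result on a validation set. This is not the method any of the GMs described; the function names (`rmse`, `blend`) and the toy numbers are assumptions made purely for illustration, and the data is deliberately contrived so that the two models make opposite errors.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def blend(predictions, weights=None):
    """Weighted average of several models' predictions (equal weights by default)."""
    n_models = len(predictions)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]

# Hypothetical validation targets and two models' predictions (toy data).
y_valid = [3.0, 5.0, 7.0, 9.0]
model_a = [2.5, 5.5, 6.5, 9.5]  # errs in one direction per row
model_b = [3.5, 4.5, 7.5, 8.5]  # errs in the opposite direction

blended = blend([model_a, model_b])
print("model A RMSE:", rmse(y_valid, model_a))
print("model B RMSE:", rmse(y_valid, model_b))
print("blend RMSE:  ", rmse(y_valid, blended))
```

In this toy case the errors cancel perfectly, which real models will not do. The general idea still holds: blending tends to help most when the individual models are good but make different mistakes, and the blend weights should be judged on validation data rather than on the public leaderboard.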
Dmitry and Mark agreed that deep learning can help you with modeling, but not with everything: automatically creating features, validating ideas, and, specific to Kaggle, thinking critically about whether a feature will carry over to the private leaderboard are things deep learning may not be able to do.
The answers differed here depending on the particular Kaggler’s style or, as GM Olivier pointed out, on how far away the end of the competition is, which affects whether he takes a relaxed or a more serious approach. Each avenger has their own fighting style though, right?
Kim would spend a lot of the time initially on feature engineering and focus on modeling towards the end of the competition.
Rohan would focus on just one competition at a time and run multiple experiments in parallel.
Rohan Rao now uses Driverless AI for a lot of his feature engineering 😁
At one point in time, a smart person would have a great library as a wealth of knowledge. In 2019, a smart programmer has a rich library of code.
For Dmitry, it depends on whether the competition is similar to one he has competed in earlier. According to him, most grandmasters have ready-made scripts that they can leverage.
As GM Olivier mentioned, everyone has at some point made a submission on Kaggle that they regret.
Note to the reader: When you compete on Kaggle, your final rankings are evaluated on a private leaderboard (which is the true rank). To get your rank, you are required to select your final submission at the end.
Olivier shared his takeaway from such an experience, which made him think about generalizing better rather than just focusing on the public leaderboard.
Mark shared a battle story from a competition where a public kernel that looked promising could have cost his team a lot of positions.
Rohan advises looking at outliers, based on a competition where removing just ONE outlier would have landed him 1st position.
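As a rough illustration of what "looking at outliers" can mean in practice, the sketch below flags values that fall far outside the interquartile range. This is a common rule of thumb, not the approach Rohan described; the helper name `iqr_outliers`, the `k=1.5` threshold, and the sample values are all assumptions made for this example.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical target column with a single extreme value at the end.
target = [10, 12, 11, 13, 12, 11, 10, 500]

suspect_rows = iqr_outliers(target)
print("rows worth a closer look:", suspect_rows)
```

A flagged row is only a prompt to investigate. Whether to drop it, cap it, or keep it is a judgment call that should be checked against your validation score.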
GM Babakhin also had a very interesting battle story where he just missed the submission deadline by 10 seconds. I can only imagine the adrenaline rush. 😃
This also speaks to dedication, or the 10,000-hour rule, broadly speaking. As Dmitry says, generally any skill requires a lot of time, dedication and focus. 📓
Pavel suggests focusing time on writing quality code. 📝
Most Underrated skills that Grandmasters have?
Even personally, my recent favorite quote was by Rohan:
“Kaggle is my favorite second Full-Time Job but it comes at a Sacrifice”
On the panel, he added that time management is crucial. A lot of things happen on and off Kaggle.
If it wasn’t obvious, I’m kidding, but do check out the channel or podcast. You can expect interviews with all of the GMs on the panel soon 😉
For Pavel, his favorite course is fast.ai, which is one of the rare courses that always stays at the cutting-edge of Tech.
Shivam and SRK use Twitter where they follow top researchers and practitioners from the field, along with a few blogs.
If you’d like to find my favorite practitioners on Twitter, you can subscribe to my list.
Mark likes to work on a problem with a person from start to finish. Many GMs are more strategic about teaming up towards the end, bringing more models, etc.
For Branden, the reason was a drive to work on true data science products, which led him to switch industries.
For Shivam, the vision and products of the company are among the best in the industry.
The unanimous agreement was that the company has THE STRONGEST DATA SCIENCE TEAM. 😎 Even Kaggle doesn’t allow you to team up with 13 Grandmasters in any competition (as per the rules).
So what were my new impressions and takeaways after the interview?
Here are a few qualities of the panel that inspired me and even Kaggle:
Finally, the spirit of Kaggle-ing. There is a lot to be learned and gained by competing, engaging in forums and teaming up with people halfway across the world. At the end of the day, Kaggle is the home of Data Science and it has to be one of the greatest learning platforms out there.
If you’d like to check out interviews with top practitioners, researchers and Kagglers about their journeys, check out the Chai Time Data Science Podcast, available as both audio and video.
You can find me on Twitter @bhutanisanyam1
Subscribe to my Newsletter for a weekly curated list of Deep Learning and Computer Vision Reads