sankalp's blog

[Thread Summary] On learning ML in 2024

Version 2: 28th June (edit: Update to recommendation section)

Version 1: 14th June

ML Twitter

(People use ML and deep learning interchangeably even though they are not the same. For the sake of consistency, I use the word ML.)

I like seeing students learning ML and posting about it on Twitter. They post their progress updates, hype each other up and follow each other. It's good, I think: you can learn from your n + 1 counterparts instead of only following the top brass.

I have seen several people gain a lot of followers over a short span of time by consistently posting updates, and sometimes projects that are in the discourse (like replicating something in C, since Andrej Karpathy has been doing that). I have seen low-level implementations get a lot of attention (survivorship bias alert).

I still find it cute and funny to see people learn ML from scratch - like someone posts "I learnt about hyperparameter tuning today" and it cracks me up. Probably because there is such a long learning curve to make it to transformers. These folks also remind me of my days learning the foundations (I don't know how much I even remember, but the intuition is still there).

I tweeted about this and it blew up. In good faith, I will summarise some insightful points and resources from the thread in this post. Don't take this post as a prescription; take it as a set of useful data points.

it's funny to see people learning ml from scratch via fast dot ai. it's such a long learning curve to transformers. linear regression, logistic regression, grad descent, neural net, mlps, rnn, lstm, rnn with attention, causal self-attention, decoder-decoder arch, finally gpt

— sankalp (@dejavucoder) June 10, 2024


A bunch of my mutuals agreed that it's better to learn from both sides. Build something with the help of LLMs (literally API calls and open-source models accessed via libraries). Learn to prompt. Study to fill the gaps. And learn bottom-up as well, building from the foundations.

Remember that you can learn a lot of things on the go. If you get stuck, ask for help or revisit after some time and explore other concepts meanwhile.
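To make the top-down half concrete, here is a minimal sketch of what "building with API calls" looks like, using the openai Python client (the model name is a placeholder; swap in whatever you have access to):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever model you have access to
    messages=[
        {"role": "system", "content": "You are a concise ML tutor."},
        {"role": "user", "content": "Explain hyperparameter tuning in two sentences."},
    ],
)
print(response.choices[0].message.content)
```

The open-source route looks almost identical: libraries like Hugging Face transformers give you a similar few-line loop against a local model.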


I think, learning fast dot ai from the side or during free times to cover up your intermediate / any vague points will be much helpful rather than learning from zero.

people can directly start from genai, if that's their fav field, but not to forget to keep learning the insider…

— Manish Sharma 📊 - e/acc 🤖 (@lucifer_x007) June 10, 2024



main_horse advice

main is one of the best ML people around and generally gives good advice too.


Learn whatever you find interesting and just do stuff. As you learn and build projects, you will find more direction too.

Resources mentioned for reference

Some of the resources I mention here are not from the thread.

So apparently xjdr, who is at a top research lab, has been recommending Andrej Karpathy's from-scratch course (Neural Networks: Zero to Hero) to people. It does make sense, as you can straight up start learning from transformers, though it will be a bit shallow. (Read up on design intuitions like skip connections from CNNs and sequence-to-sequence architectures; a small sketch follows below.)

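On skip connections specifically: the whole idea is y = x + f(x), so gradients have a shortcut around f. A minimal PyTorch sketch (my own toy example, not from the thread):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes y = x + f(x); the skip path lets gradients flow around f."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)  # the skip (residual) connection

x = torch.randn(4, 64)
print(ResidualBlock(64)(x).shape)  # torch.Size([4, 64])
```

The same trick shows up in ResNets and in every transformer block.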

For pre-transformer stuff

fast.ai is still one of the top resources.

I learnt deep learning basics (note: not machine learning broadly) via Andrew Ng's deeplearning.ai courses on Coursera.

The Stanford YouTube channel has several lecture series, like CS231N (CNNs) and Chris Manning's NLP course (CS224N).

There are blogs by Chris Olah and Jay Alammar. You literally just need to search on YouTube and there are tonnes of resources.

Frameworks of interest: PyTorch, MLX, JAX
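If you want a feel for PyTorch before committing to a course, here is the first rung of that long learning curve - linear regression trained by gradient descent - as a toy example of mine:

```python
import torch

# toy data: y = 3x + 2 plus noise
x = torch.randn(100, 1)
y = 3 * x + 2 + 0.1 * torch.randn(100, 1)

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for step in range(200):
    loss = ((x * w + b - y) ** 2).mean()  # mean squared error
    loss.backward()                       # autograd computes dloss/dw, dloss/db
    with torch.no_grad():
        w -= 0.1 * w.grad                 # gradient descent step
        b -= 0.1 * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(w.item(), b.item())  # should be close to 3.0 and 2.0
```

Everything further up the curve (MLPs, RNNs, attention) is this same loop with a bigger model.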

Beyond Transformers

I recommend Stanford CS 25 (v3 is complete and v4 is airing)

Another great resource that rounds up a lot of the extensions of transformers and hacking around with LLMs is https://genai-handbook.github.io/ by @willccbb

Update 24th June

Once you are up to date with the basics of LLMs (pre-training, transformers, fine-tuning, GPT), you can move on to more architectures like Llama 3 (rotary embeddings) and Mistral, and extensions of self-attention like grouped-query attention (a sketch below). Read about mixture of experts. Follow people on AI Twitter - like this list. Some people, like Teortaxes, mostly follow other AI researchers and hackers; people who don't post much hang out in his replies. Another recommendation: TDM
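Grouped-query attention is easier to grasp in code than in prose: several query heads share each key/value head, which shrinks the KV cache. A minimal sketch of the idea (my own toy PyTorch version; no masking or rotary embeddings):

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """q: (batch, n_q_heads, seq, d); k and v: (batch, n_kv_heads, seq, d),
    where n_q_heads is a multiple of n_kv_heads."""
    group = q.shape[1] // k.shape[1]
    # repeat each kv head so every group of query heads attends to the same k/v
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, 8, 16, 64)  # 8 query heads
k = torch.randn(1, 2, 16, 64)  # only 2 key/value heads
v = torch.randn(1, 2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```

With 2 KV heads instead of 8, the KV cache is 4x smaller, which is the whole point for inference.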