sankalp's blog

Learning LLM optimization notes (WIP)

Inference

To be read

General thoughts

vLLM docs, Colossal-AI, Medusa, SGLang

You will find basically all the concepts in the docs of these projects. I think understanding the internals is important, but maybe, just maybe, I will be good enough if I know how to apply them.

Speculative decoding

I am referring to this video for an overview of KV cache, continuous batching, and speculative decoding (using a draft model and a target model, Medusa, and the n-gram method).

Another good read, of which I understood maybe 50% on my first pass, is PyTorch's hitchhikers-guide-speculative-decoding.
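To make the draft model / target model idea concrete for myself, here is a minimal sketch of the greedy variant in plain Python. It is not taken from the video or the PyTorch post: the "models" are toy functions I made up (`target_next`, `draft_next`), and a real system would verify all drafted tokens in one batched forward pass of the target model and use a probabilistic accept/reject rule when sampling. The point is only the control flow: the cheap draft proposes `k` tokens, the target keeps the longest prefix it agrees with, then adds one token of its own.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def target_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the big model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the small model: agrees with the target except after a 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def speculative_step(prefix: List[Token], draft: NextTokenFn,
                     target: NextTokenFn, k: int) -> List[Token]:
    # 1. The draft model proposes k tokens autoregressively (cheap).
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))

    # 2. The target model checks each position. Here that is a Python loop;
    #    on real hardware it would be a single batched forward pass.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: keep the target's token, stop
            return accepted
        accepted.append(tok)

    # 3. Everything matched: the same target pass also yields one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prompt: List[Token], draft: NextTokenFn, target: NextTokenFn,
             k: int, max_new_tokens: int) -> Tuple[List[Token], int]:
    out = list(prompt)
    target_passes = 0  # one (conceptual) target forward pass per step
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, draft, target, k))
        target_passes += 1
    return out[: len(prompt) + max_new_tokens], target_passes


def greedy(prompt: List[Token], model: NextTokenFn, n: int) -> List[Token]:
    """Plain one-token-at-a-time decoding, for comparison."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out


if __name__ == "__main__":
    tokens, passes = generate([0], draft_next, target_next, k=3, max_new_tokens=12)
    print(tokens, passes)
```

In this greedy setting the output is identical to decoding with the target model alone; the saving is that the target runs fewer times than the number of tokens produced, as long as the draft guesses right often enough.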

What is a kernel


Kernels are too low-level for me right now. Maybe I can look into them in the long term. However, PyTorch- and Triton-level code is OK.
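For my own reference: a kernel is the function the GPU runs many times in parallel, with each instance handling its own slice of the data. Below is a rough sketch of that idea in plain Python, loosely shaped like a Triton-style vector add. It is an emulation, not real Triton or CUDA code, and the names (`add_kernel`, `launch`, `BLOCK_SIZE`) are my own placeholders.

```python
from typing import List

BLOCK_SIZE = 4  # elements handled by one program instance (arbitrary choice)


def add_kernel(pid: int, x: List[float], y: List[float],
               out: List[float], n: int) -> None:
    """One program instance: adds one BLOCK_SIZE-wide slice of x and y."""
    start = pid * BLOCK_SIZE
    for offset in range(start, start + BLOCK_SIZE):
        if offset < n:  # "mask": the last block may run past the end of the data
            out[offset] = x[offset] + y[offset]


def launch(x: List[float], y: List[float]) -> List[float]:
    """Runs the kernel over a 1D grid of program instances."""
    n = len(x)
    out = [0.0] * n
    grid = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceil(n / BLOCK_SIZE)
    for pid in range(grid):  # on a GPU these instances run in parallel
        add_kernel(pid, x, y, out, n)
    return out


if __name__ == "__main__":
    print(launch([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

The sequential `for pid in range(grid)` loop is the part the hardware replaces: each `pid` is independent of the others, which is what lets a GPU run them all at once.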

If I think about it, it's time for me to go into layers deeper than the API. I mean, product-level stuff is fine, but a moat at a deeper layer is liberating and might give me more ideas.

#AI #ml #notes #technical