11868 LLM Sys: GPU Programming & Acceleration
Notes and summaries for CMU 11-868 LLM Systems: GPU Programming & Acceleration.
11711 Advanced NLP: Fundamentals
Notes and summaries for CMU 11-711 Advanced NLP.
15618 Assignment 1 Report
Assignment 1 report for CMU 15-618, covering pthreads speedup analysis and SIMD vectorization.
15645 Database Systems: Relational Model and SQL
Notes and summaries for CMU 15-645 Database Systems.
15618 Parallel Programming Lectures 1-4 Notes
Notes and summaries for CMU 15-618 Parallel Programming.
Deconstructing Agentic Coding with First Principles: From Theory to Practice
This article, written by a ByteDance expert, is titled "Deconstructing Agentic Coding with First Principles: From Theory to Practice." I learned a lot from reading it.
CS336-Lec4 Mixture of Experts
This article summarizes Lecture 4 of the CS336 course, focusing on the principles, implementation, and applications of Mixture of Experts (MoE) models within the Transformer architecture, including recent advances in expert selection mechanisms, routing strategies, and training techniques.
CS336-Lec3 Architectures & Hyperparameters
This article summarizes Lecture 3 of the CS336 course, focusing on the evolution of Transformer architectures and their hyperparameter choices, including recent developments in normalization methods, activation functions, position encoding, and more.
CS336-Lec2 PyTorch & Resource Accounting
This article focuses on the "compute black box" behind model training. Starting from the low-level details of floating-point formats, it works through the formulas for counting FLOPs, analyzes the characteristics of modern hardware, and closes with a comprehensive optimization guide that spans mathematical principles through PyTorch implementation.
CS336-Lec1 Tokenization
Lec1 introduces the basic concepts of tokenization and several common tokenizers (Character, Byte, Word, and BPE), analyzing the advantages, disadvantages, and suitable use cases of each.