May 25, 2025

The Art of LLM Inference: Fast, Fit, and Free

What 20+ papers and open-source projects taught me about cracking LLM inference