During the process of writing AI Engineering, I went through so many papers, case studies, blog posts, repos, tools, etc. This repo contains ~100 resources that really helped me understand various aspects of building with foundation models. https://github.com/chiphuyen/aie-book/blob/main/resources.md
… Here are the
@chipro
-
Essential Resources for Building with Foundation Models
-

Building Generative AI Application Platforms: Common Components
Building a platform for generative AI applications https://huyenchip.com/2024/07/25/genai-platform.html
… After studying how companies deploy generative AI applications, I noticed many similarities in their platforms. This post outlines these common components, what they do, and implementation
-
GenAI Hallucinations: Why Models Fail Without Information
3. Hallucinations make GenAI applications unusable
Models hallucinate because they are probabilistic. However, a model is much more likely to hallucinate when it doesn’t have access to the right information. Multiple studies have shown that hallucinations can be
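A minimal sketch of giving the model the right information: retrieve relevant passages and place them in the prompt, along with an instruction to abstain when the context is insufficient. The word-overlap retriever and the names `retrieve` and `build_grounded_prompt` are hypothetical toys; a real system would use a proper search index or embeddings.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercased word set: a crude stand-in for real text processing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Hypothetical toy retriever: rank docs by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    # Drop documents that share no words with the query at all.
    return [d for d in ranked[:k] if q & tokenize(d)]

def build_grounded_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Put retrieved passages in the prompt and ask the model to abstain without them."""
    passages = retrieve(query, docs, k)
    context = "\n".join(f"- {p}" for p in passages) or "- (no relevant documents found)"
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "The refund window is 30 days.",
    "Our office is in Berlin.",
    "Shipping takes 5 business days.",
]
print(build_grounded_prompt("How many days is the refund window?", docs, k=1))
```

The abstain instruction matters as much as retrieval: when nothing relevant is found, the prompt says so explicitly instead of leaving the model to guess.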
-
Foundation Models Won’t Completely Replace Classical Machine Learning
2. Foundation models will completely replace classical ML
In my observation, most GenAI applications in production have classical ML components. Outside of leveraging information retrieval, around 30-50% of applications have a classification component, such as:
– Intent
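To illustrate the kind of classical component meant here, below is a hypothetical intent classifier: multinomial naive Bayes with add-one smoothing, trained on a handful of made-up labeled queries. The labels, training examples, and the `NaiveBayesIntent` class are assumptions for illustration; in an application, the predicted intent could decide which prompt, tool, or non-LLM flow handles a request.

```python
import math
import re
from collections import Counter, defaultdict

def words(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesIntent:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples: list[tuple[str, str]]) -> "NaiveBayesIntent":
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for text, label in examples:
            self.word_counts[label].update(words(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", float("-inf")
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in words(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labeled queries; a real system would train on logged traffic.
TRAIN = [
    ("I want to cancel my subscription", "cancel"),
    ("please cancel my order", "cancel"),
    ("how do I reset my password", "account"),
    ("I forgot my password", "account"),
    ("what is your refund policy", "billing"),
    ("I was charged twice", "billing"),
]

clf = NaiveBayesIntent().fit(TRAIN)
print(clf.predict("I was charged twice on my card"))
```

A model like this is cheap, fast, and easy to evaluate, which is part of why classical classifiers tend to stay alongside foundation models rather than being replaced by them.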
-
Common Misconceptions About Generative AI Technologies Explained
In many conversations, I noticed several common misperceptions about generative AI.
1. Technologies behind GenAI are new
While many applications made possible by GenAI are new, the technologies surrounding it are not.
– Retrieval, the backbone of RAG, is also the backbone of
-
Evaluating AI Systems: The Overlooked Evaluation Pipeline Problem
A big issue I see with AI systems is that people aren't spending enough time evaluating their evaluation pipelines.
1. Most teams use more than one metric (generally 3-7) to evaluate their applications, which is a good practice. However, very few are measuring the
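One possible check on a multi-metric pipeline, offered as an assumption rather than the post's own method, is to compute pairwise correlations between metric scores over the same examples: two metrics that move almost in lockstep may be redundant. The metric names, scores, and the 0.9 threshold below are made up for illustration.

```python
import math
from itertools import combinations

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def redundant_pairs(scores: dict[str, list[float]], threshold: float = 0.9):
    """Flag metric pairs whose per-example scores are highly correlated."""
    flagged = []
    for a, b in combinations(sorted(scores), 2):
        r = pearson(scores[a], scores[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical per-example scores from three metrics on the same 5 responses.
scores = {
    "relevance": [0.9, 0.4, 0.7, 0.2, 0.8],
    "faithfulness": [0.85, 0.45, 0.65, 0.25, 0.8],
    "conciseness": [0.3, 0.9, 0.5, 0.6, 0.4],
}
print(redundant_pairs(scores))
```

A flagged pair is a prompt to investigate, not proof of redundancy: on a larger or different sample the two metrics may diverge.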
-
LLM Product Development: Beyond Initial Success Plateau
4. Initial success with LLMs can be misleading
It took them 1 month to achieve 80% of the experience they wanted, and an additional 4 months to surpass 95%. The initial success made them underestimate how challenging it is to improve the product, especially dealing with
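Putting the reported numbers side by side shows how sharply the rate of progress dropped, under the simplifying assumption that "percent of the desired experience" is a linear scale:

$$
\underbrace{\frac{80 - 0}{1}}_{\text{month 1}} = 80\ \text{points/month}
\qquad\text{vs.}\qquad
\underbrace{\frac{95 - 80}{4}}_{\text{months 2 to 5}} \approx 3.75\ \text{points/month}
$$

That is, the last stretch moved more than 20 times slower than the first month.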
-
Automatic Evaluation Challenges in AI Response Assessment
3. Automatic evaluation is hard
One core challenge of evaluation is coming up with a guideline on what a good response is. For example, for skill fit assessment, the response “You’re not a good fit” is correct, but not helpful. Originally, evaluation was ad hoc. Everyone could
-
Trading Throughput for Latency: LLM Performance Optimization
2. Sacrificing throughput for latency
Originally, they focused on TTFT (Time To First Token), but realized that TBT (Time Between Tokens) hurt them more, especially with Chain-of-Thought queries where users don’t see the intermediate outputs. They found that TTFT and TBT
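A minimal sketch of how the two metrics can be computed from per-token arrival timestamps, plus a rough model of why TBT matters more when the model first generates reasoning tokens the user never sees. The function names and the constant-TBT simplification are assumptions for illustration.

```python
from statistics import mean

def latency_metrics(request_start_ms: float, token_times_ms: list[float]) -> dict[str, float]:
    """TTFT, mean TBT, and total latency from per-token arrival timestamps (milliseconds)."""
    if not token_times_ms:
        raise ValueError("no tokens received")
    ttft = token_times_ms[0] - request_start_ms
    # Gaps between consecutive tokens; their average is the mean TBT.
    gaps = [b - a for a, b in zip(token_times_ms, token_times_ms[1:])]
    return {
        "ttft": ttft,
        "mean_tbt": mean(gaps) if gaps else 0.0,
        "total": token_times_ms[-1] - request_start_ms,
    }

def wait_for_first_visible_token(ttft_ms: float, tbt_ms: float, hidden_tokens: int) -> float:
    """Rough model assuming constant TBT: the user waits through every hidden reasoning token."""
    return ttft_ms + hidden_tokens * tbt_ms

print(latency_metrics(0, [500, 550, 600, 650]))
print(wait_for_first_visible_token(500, 50, 200))
```

In this toy setting, with 200 hidden reasoning tokens at 50 ms each, the user waits 10.5 s before seeing anything, of which only 0.5 s is TTFT. Shaving TTFT barely helps; shaving TBT helps a lot.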

LinkedIn’s LLM deployment insights: YAML outputs and token optimization
Really enjoyed LinkedIn's report on what worked and what didn't when deploying LLM applications. 4 takeaways.
1. Structured outputs
They chose YAML over JSON as the output format because YAML uses fewer tokens. Initially, only 90% of the outputs were correctly formatted YAML. They
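A rough illustration of where the savings come from: the same hypothetical skill-fit result serialized as JSON and as YAML. The payload fields and the minimal `to_yaml` helper are made up for illustration (it handles only nested dicts and lists of plain scalars and does no quoting or escaping), and character count is only a crude proxy for token count, which depends on the tokenizer.

```python
import json

def to_yaml(obj, indent: int = 0) -> str:
    """Minimal YAML emitter for nested dicts and lists of plain scalars (no quoting/escaping)."""
    pad = "  " * indent
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if isinstance(value, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.append(to_yaml(value, indent + 1))
            else:
                lines.append(f"{pad}{key}: {value}")
    elif isinstance(obj, list):
        for item in obj:
            lines.append(f"{pad}- {item}")
    return "\n".join(lines)

# Hypothetical skill-fit result an LLM might be asked to emit.
result = {"skill_match": "partial", "score": 7, "missing_skills": ["kubernetes", "go"]}

as_json = json.dumps(result)
as_yaml = to_yaml(result)
print(as_json)
print(as_yaml)
print(len(as_json), len(as_yaml))
```

YAML drops the braces, brackets, and most quotes that JSON requires. The trade-off is that its whitespace-sensitive syntax is easier for a model to get subtly wrong, which is consistent with only 90% of outputs initially being well-formed.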