Chapter 3: The Two Stages of Building an LLM
Reading Time: 35 minutes
Welcome to Chapter 3! In the previous chapters, we learned what LLMs are and why they're revolutionary. Now it's time to understand how they're actually built.
The big reveal: Building an LLM is not one giant step; it's a two-stage process:
- Pre-training (Stage 1)
- Fine-tuning (Stage 2)
By the end of this chapter, you'll understand:
- What pre-training is and why it's called "pre"
- How LLMs learn from billions of words
- What fine-tuning is and when it's needed
- Real examples from top companies
- The complete lifecycle from data to deployment
Let's dive in!
Table of Contents
- Quick Recap: Where We Are
- The Two-Stage Building Process
- Stage 1: Pre-training Explained
- Stage 2: Fine-tuning Explained
- Pre-training vs Fine-tuning: Key Differences
- Real-World Examples
- The Complete LLM Lifecycle
- Labeled vs Unlabeled Data
- Types of Fine-tuning
- Cost Analysis: The Money Behind LLMs
- Chapter Summary
Quick Recap: Where We Are
In Chapter 1, we introduced the LLM series and our goals.
In Chapter 2, we learned:
- What LLMs are (neural networks for text)
- Why they're called "Large" (billions of parameters)
- The secret sauce (Transformer architecture)
- Difference between AI, ML, DL, and LLM
In Chapter 3 (today), we'll learn:
- How these massive models are actually built
- The step-by-step process from raw data to ChatGPT
The Two-Stage Building Process
The Two Acts of Building an LLM
Think of building an LLM like training for the Olympics:
Stage 1: General Training (Pre-training)
- Like an athlete doing years of general fitness training
- Builds overall strength, endurance, speed
- Not focused on one specific sport yet
Stage 2: Specialized Training (Fine-tuning)
- Like an athlete specializing in javelin throw or swimming
- Focuses on specific skills for a particular event
- Refines what was learned in general training
For LLMs:
Stage 1: Pre-training
├── Train on EVERYTHING (entire internet)
└── Result: General-purpose AI
Stage 2: Fine-tuning
├── Train on SPECIFIC data (your company's data)
└── Result: Specialized AI for your needs
Why Two Stages? Why Not Just One?
Great question!
Analogy: Education System
General Education (Pre-training):
- Kindergarten to 12th grade
- Learn everything: Math, Science, Languages, History
- Become a well-rounded person
Specialization (Fine-tuning):
- College/University - Choose Engineering
- Medical school - Become a doctor
- Law school - Become a lawyer
Same logic for LLMs:
You need general knowledge first (pre-training), then specialize (fine-tuning) for your specific use case.
Stage 1: Pre-training Explained
What is Pre-training?
Simple Definition:
Training the LLM on a massive and diverse dataset so it learns general language understanding.
What "massive" means:
GPT-3 was trained on 300 billion words!
Let's put that in perspective:
Average book: 80,000 words
300 billion words = 3,750,000 books
If you read one book per day:
It would take 10,274 YEARS to read all that!
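A quick sanity check of those numbers in Python, assuming the 80,000-word average book used above:

```python
# Sanity-check the book math above.
total_words = 300_000_000_000      # 300 billion training words
words_per_book = 80_000            # assumed average book length

books = total_words // words_per_book
print(f"{books:,} books")          # 3,750,000 books

years = books / 365                # reading one book per day
print(f"{years:,.0f} years")       # ~10,274 years
```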
Where Does This Training Data Come From?
GPT-3's Training Data Sources:
| Source | Words in Dataset | Sampling Weight | What It Contains |
|---|---|---|---|
| Common Crawl | 410 billion | 60% | Web pages crawled from across the internet |
| WebText2 | 20 billion | 22% | Pages linked from upvoted Reddit posts |
| Books1 & Books2 | 67 billion | 16% | Published books |
| Wikipedia | 3 billion | 3% | Wikipedia articles |
Total: ~500 billion words collected (~300 billion used for training)
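The percentages above are sampling weights: during training, batches are drawn from the sources in roughly these proportions, which is partly why only ~300 of the ~500 billion collected words are actually seen. A toy sketch of weighted sampling (the weights are the rounded figures from the table, so they sum to 101% and get normalized automatically):

```python
import random

# Rounded sampling weights from the table above (they sum to 1.01;
# random.choices normalizes them for us).
mixture = {
    "Common Crawl": 0.60,
    "WebText2":     0.22,
    "Books":        0.16,
    "Wikipedia":    0.03,
}

random.seed(42)
draws = random.choices(list(mixture), weights=list(mixture.values()), k=100_000)
share = draws.count("Common Crawl") / len(draws)
print(f"Common Crawl share of batches: {share:.2f}")  # close to 0.60
```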
Let's Explore These Sources
1. Common Crawl
What is it?
- An open repository of web data
- Crawls and stores content from billions of websites
- Anyone can access it for free
Example content:
- News articles
- Blog posts
- Product reviews
- Social media discussions
- Scientific papers
- Forums and Q&A sites
Try it: Visit commoncrawl.org
2. WebText2
What is it?
- High-quality text from pages linked on Reddit
- Only pages from upvoted posts (a rough quality filter)
- Includes sites frequently shared there, such as Stack Overflow (programming Q&A)
Why Reddit?
- Reddit's upvote system acts as a quality filter
- Diverse topics (technology, cooking, science, history)
- Human-written, conversational language
3. Books
Why include books?
- Proper grammar and structure
- Long-form storytelling
- Diverse vocabulary
- Different writing styles (fiction, non-fiction, technical)
Example books included:
- Classic literature
- Technical manuals
- Science textbooks
- Fiction novels
4. Wikipedia
Why Wikipedia?
- Factual, well-structured information
- Covers millions of topics
- Multiple languages
- Regularly updated
The Training Task: Next Word Prediction
Here's the fascinating part:
LLMs learn by playing a simple game: "Guess the next word"
Example:
Given: "The lion is in the ___"
LLM predicts: "forest"
Given: "I went to the ___"
LLM predicts: "store" (or "park", "school", "mall")
Given: "The capital of France is ___"
LLM predicts: "Paris"
That's it!
Just train on predicting the next word, billions of times, with billions of examples.
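To make the objective concrete, here is a toy next-word predictor built from simple bigram counts. A real LLM learns the same objective with a neural network over billions of words; this sketch only illustrates the idea:

```python
from collections import Counter, defaultdict

# Count which word follows which in a tiny "corpus".
corpus = "the cat sat on the mat and the cat ate the fish".split()

following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def predict_next(word: str) -> str:
    """Most frequent word observed right after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" (seen twice after "the")
```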
The Surprising Discovery
What researchers found:
When you train an LLM ONLY for "next word prediction" on massive data, something magical happens:
It learns to do MANY other tasks automatically!
Tasks LLMs can do (without specific training):
✅ Translation
Input: Translate "Hello" to Spanish
Output: Hola
✅ Summarization
Input: Summarize this 10-page article
Output: [Concise 3-sentence summary]
✅ Question Answering
Input: What is the capital of Japan?
Output: Tokyo
✅ Multiple Choice Questions
Input: What is 2+2? A) 3 B) 4 C) 5
Output: B) 4
✅ Sentiment Analysis
Input: "This movie was terrible!"
Output: Negative sentiment
✅ Code Generation
Input: Write Python code to reverse a string
Output: [Working Python code]
All of this WITHOUT being specifically trained for these tasks!
The Pre-training Result: Foundation Model
After pre-training, you get:
Foundation Model (also called Base Model or Pre-trained Model)
Characteristics:
- ✅ General-purpose
- ✅ Can do many tasks
- ✅ Understands language deeply
- ❌ But... not specialized for anything specific
Example: GPT-4 is a foundation model
When you use ChatGPT without any customization, you're using a foundation model (with some basic fine-tuning).
Key Takeaway
Pre-training is like giving an LLM a complete education:
- Read everything on the internet
- Learn general language patterns
- Become a jack-of-all-trades
But it's not specialized yet. That's where fine-tuning comes in!
Stage 2: Fine-tuning Explained
What is Fine-tuning?
Simple Definition:
Taking a pre-trained model and refining it on a specific, narrow dataset for a particular task or domain.
Analogy: Doctor Specialization
Medical School (Pre-training)
├── Learn general medicine
└── Graduate: General doctor
Specialization (Fine-tuning)
├── Cardiologist (heart specialist)
├── Neurologist (brain specialist)
└── Pediatrician (child specialist)
Same for LLMs:
Pre-trained GPT-4 (Foundation Model)
└── Knows everything generally
Fine-tuned for Banking (JP Morgan's AI)
└── Specializes in financial analysis
Fine-tuned for Legal (Harvey AI)
└── Specializes in legal cases
Fine-tuned for Telecom (SK Telecom AI)
└── Specializes in customer support (Korean)
Why Not Just Use Pre-trained Models?
Great question! Let's see with examples:
Scenario 1: You Run an Airline Company
You want: AI chatbot for customer support
Question to AI:
"What's the price for the Lufthansa flight leaving at 6 PM to Munich?"
If you use pre-trained GPT-4 (without fine-tuning):
Response: "I don't have access to real-time flight prices.
Please check the Lufthansa website or contact
their customer service at..."
❌ Not helpful! It's generic.
If you fine-tune on YOUR airline's data:
Response: "The Lufthansa flight LH456 departing at 6 PM
to Munich costs €235 (Economy) or €890 (Business Class).
Would you like me to check availability?"
✅ Perfect! Specific to your company.
Scenario 2: You're an Educational Platform
You want: AI to generate high-quality exam questions
If you use pre-trained GPT-4:
- Questions are okay but generic
- May not match your curriculum
- Quality varies
If you fine-tune on YOUR past exam papers:
- Questions match your style exactly
- Difficulty levels are consistent
- Covers your specific syllabus
When Do You Need Fine-tuning?
You DON'T need fine-tuning if:
- ❌ You're a student using ChatGPT for homework
- ❌ You're using AI for general tasks (writing emails, summaries)
- ❌ You're exploring AI capabilities
- ❌ Generic responses are good enough
You NEED fine-tuning if:
- ✅ You're a company with proprietary data
- ✅ Your domain is highly specialized (legal, medical, finance)
- ✅ You need consistent, high-quality responses
- ✅ Your data is not publicly available
- ✅ You're building a production application
- ✅ Generic AI responses are not good enough
Pre-training vs Fine-tuning: Key Differences
| Aspect | Pre-training | Fine-tuning |
|---|---|---|
| Data | 300 billion+ words | 10,000 - 10 million examples |
| Data Source | Entire internet | Your specific dataset |
| Data Type | Unlabeled (raw text) | Labeled (with answers/tags) |
| Goal | Learn general language | Specialize for specific task |
| Cost | $4.6 million (GPT-3) | $1,000 - $100,000 |
| Time | Weeks to months | Hours to days |
| Result | Foundation model | Specialized model |
| Who Does It | OpenAI, Google, Meta | Companies, developers, you! |
| Examples | GPT-4, Claude, Gemini | Harvey (legal), Your chatbot |
Real-World Examples
Letβs see how top companies use fine-tuning:
1️⃣ SK Telecom (South Korea)
Company: Major telecommunications provider
Problem:
- Needed AI customer support chatbot
- Must understand Korean telecom terminology
- Generic GPT-4 doesn't understand telecom jargon
Solution: Fine-tuned GPT-4 on:
- Past customer service conversations (Korean)
- Telecom-specific terminology
- Company policies and procedures
Results:
- ✅ 35% improvement in conversation summarization
- ✅ 33% improvement in understanding customer intent
- ✅ Handles Korean telecom queries reliably
Source: OpenAI case studies
2️⃣ Harvey AI (Legal Industry)
Company: AI assistant for lawyers and attorneys
Website: harvey.ai
Problem:
- Lawyers need AI that understands legal case history
- Generic GPT-4 lacks extensive legal knowledge
- Legal terminology and precedents are crucial
Solution: Fine-tuned LLM on:
- Millions of legal case documents
- Court rulings and precedents
- Legal contracts and agreements
- Jurisdiction-specific laws
What Harvey can do:
- ✅ Research legal cases in seconds
- ✅ Draft legal documents
- ✅ Analyze contracts
- ✅ Provide case law references
Used by:
- Top law firms globally
- Corporate legal teams
- Legal professionals
Why it's better than ChatGPT:
- Trained specifically on legal data
- Understands legal jargon
- Provides case law citations
- Domain expertise in law
3️⃣ JP Morgan Chase (Banking)
Company: Major investment bank
Product: Internal AI-powered LLM suite
Announcement: "JP Morgan unveils AI-powered LLM suite - may replace research analysts"
Why build their own LLM?
Problem with using GPT-4 directly:
- ❌ Not trained on JP Morgan's proprietary data
- ❌ Lacks internal banking insights
- ❌ Doesn't understand company-specific terminology
- ❌ Can't access confidential financial models
Solution: Fine-tuned LLM on:
- Internal research reports
- Financial analysis documents
- Market data and trends
- Company-specific methodologies
Use cases:
- ✅ Generate financial research reports
- ✅ Analyze market trends
- ✅ Summarize earnings calls
- ✅ Draft investment recommendations
Benefits:
- Speeds up analyst work
- Maintains confidentiality
- Uses proprietary insights
- Consistent with company standards
Pattern in All Examples
Notice the pattern:
Generic LLM (GPT-4)
↓
+ Company's specific data
↓
Fine-tuning
↓
Specialized LLM for that industry
Key Insight:
Every major company building serious AI applications does fine-tuning. They rarely use the foundation model as-is.
The Complete LLM Lifecycle
From Raw Data to Production
Let me show you the complete journey:
===========================================================
 STAGE 1: PRE-TRAINING
===========================================================
Step 1: Collect Massive Data
├── Common Crawl (410B words)
├── WebText2 (20B words)
├── Books (67B words)
└── Wikipedia (3B words)
Total: ~500 billion words
Step 2: Train the Model
├── Task: Predict next word
├── Hardware: 1000s of GPUs
├── Time: Several weeks
├── Cost: ~$4.6 million (GPT-3)
└── Parameters: 175 billion
Step 3: Result
└── Foundation Model (Pre-trained LLM)
    ├── Can do many tasks
    ├── General-purpose
    └── Not specialized
===========================================================
 STAGE 2: FINE-TUNING
===========================================================
Step 4: Collect Specific Data
├── Your company data (10K-1M examples)
├── Domain-specific (legal, medical, finance)
├── Labeled data (with answers)
└── High-quality examples
Step 5: Fine-tune the Model
├── Start from pre-trained model
├── Train on your specific data
├── Time: Hours to days
└── Cost: $1,000 - $100,000
Step 6: Result
└── Specialized Model
    ├── Tailored to your use case
    ├── Domain expert
    └── Ready for production
===========================================================
 STAGE 3: DEPLOYMENT
===========================================================
Step 7: Build Application
├── Chatbot
├── Document analyzer
├── Customer support
└── Code assistant
Step 8: Deploy to Users
└── Companies, employees, customers
Visual Schematic
Here's a simplified view:
RAW DATA               FOUNDATION MODEL         SPECIALIZED APPLICATIONS
(Unlabeled)            (Pre-trained)            (Fine-tuned)

Internet text ---+                          +--> Personal assistant
Books -----------+                          +--> Language translator
Research papers -+--> Foundation Model -----+--> Code assistant
Wikipedia -------+    (e.g., GPT-4)         +--> Classification bot
Labeled vs Unlabeled Data
What's the Difference?
This is super important to understand!
Unlabeled Data
Definition: Raw text without any extra information or tags
Examples:
Example 1: News Article
"Scientists discover new planet orbiting distant star.
The planet, named Kepler-452b, is located 1,400 light-years
from Earth and may have conditions suitable for life..."
No labels needed! Just the text.
Example 2: Book
"It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness..."
No labels needed! Just the story.
Used for: Pre-training
Labeled Data
Definition: Text with associated labels, tags, or answers
Examples:
Example 1: Email Classification
Text: "Congratulations! You've won $1 million! Click here..."
Label: SPAM
Text: "Meeting rescheduled to 3 PM tomorrow"
Label: NOT SPAM
Example 2: Sentiment Analysis
Text: "This movie was amazing! Best film ever!"
Label: POSITIVE
Text: "Waste of time and money. Terrible acting."
Label: NEGATIVE
Example 3: Question-Answer Pairs
Question: "What is the capital of France?"
Answer: "Paris"
Question: "Who wrote Romeo and Juliet?"
Answer: "William Shakespeare"
Example 4: Legal Case Data
Case Description: "Contract dispute over intellectual property rights..."
Relevant Precedent: "Smith v. Jones (1995) - similar case ruling..."
Expected Outcome: "Likely favorable to defendant based on precedent"
Used for: Fine-tuning
Comparison Table
| Feature | Unlabeled Data | Labeled Data |
|---|---|---|
| Structure | Just text | Text + tags/answers |
| Example | "The cat sat on the mat" | Text: "Movie was great!" Label: Positive |
| Cost | Cheap (scrape internet) | Expensive (humans label it) |
| Availability | Abundant (billions of words) | Limited (thousands of examples) |
| Used in | Pre-training | Fine-tuning |
| Purpose | Learn general language | Learn specific task |
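In code, the two data types look quite different. A minimal sketch (the field names are illustrative choices, not a standard):

```python
# Unlabeled data (pre-training): raw text, nothing else.
unlabeled = [
    "Scientists discover new planet orbiting distant star.",
    "It was the best of times, it was the worst of times.",
]

# Labeled data (fine-tuning): every example carries an answer/tag.
labeled = [
    {"text": "This movie was amazing! Best film ever!", "label": "POSITIVE"},
    {"text": "Waste of time and money. Terrible acting.", "label": "NEGATIVE"},
]

assert all(isinstance(item, str) for item in unlabeled)   # just strings
assert all({"text", "label"} <= set(d) for d in labeled)  # text + label pairs
```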
Why This Matters
Pre-training:
- Needs MASSIVE amounts of data
- Unlabeled data is easy to get (entire internet!)
- Learns general patterns
Fine-tuning:
- Needs SMALLER amounts of data
- But data must be labeled (expensive!)
- Learns specific tasks
This is why:
- Pre-training costs millions (data + compute)
- Fine-tuning costs thousands (mostly labeling)
Types of Fine-tuning
Not all fine-tuning is the same! There are two main types:
1️⃣ Instruction Fine-tuning
What it is: Teaching the LLM to follow specific instructions
Format: Instruction → Response pairs
Examples:
Example 1: Translation
Instruction: "Translate this English text to French: Hello, how are you?"
Response: "Bonjour, comment allez-vous?"
Example 2: Summarization
Instruction: "Summarize this article in 3 sentences: [article text]"
Response: "[3-sentence summary]"
Example 3: Customer Support
Instruction: "Customer says: 'My flight is cancelled, what should I do?'"
Response: "I apologize for the inconvenience. Here are your options:
1. Rebook on the next available flight
2. Request a full refund
3. Hotel accommodation for tonight
Which would you prefer?"
Use cases:
- ✅ Chatbots
- ✅ Virtual assistants
- ✅ Translation services
- ✅ Summarization tools
- ✅ Educational tutors
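Instruction-response pairs are commonly stored as JSON Lines: one training example per line. The chat-style schema below mirrors what several fine-tuning APIs accept, but treat it as an assumption and check your provider's docs for the exact format:

```python
import json

examples = [
    {"messages": [
        {"role": "user", "content": "Translate this English text to French: Hello, how are you?"},
        {"role": "assistant", "content": "Bonjour, comment allez-vous?"},
    ]},
    {"messages": [
        {"role": "user", "content": "Customer says: 'My flight is cancelled, what should I do?'"},
        {"role": "assistant", "content": "I apologize for the inconvenience. You can rebook, request a refund, or ask about hotel accommodation."},
    ]},
]

# Write one JSON object per line (the "JSONL" format).
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```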
2️⃣ Classification Fine-tuning
What it is: Teaching the LLM to categorize or classify text
Format: Text → Label/Category
Examples:
Example 1: Spam Detection
Input: "Congratulations! You won a free iPhone! Click now!"
Output: SPAM
Input: "Meeting agenda for tomorrow attached"
Output: NOT SPAM
Example 2: Sentiment Analysis
Input: "This product is absolutely terrible. Don't buy it."
Output: NEGATIVE
Input: "Amazing quality! Highly recommend to everyone."
Output: POSITIVE
Example 3: Topic Classification
Input: "Scientists discover new cancer treatment breakthrough..."
Output: SCIENCE
Input: "Stock market hits record high as tech companies surge..."
Output: BUSINESS
Example 4: Intent Detection
Input: "What time does the store close?"
Output: HOURS_INQUIRY
Input: "I want to return this product"
Output: RETURN_REQUEST
Use cases:
- ✅ Email filtering (spam/not spam)
- ✅ Sentiment analysis (positive/negative/neutral)
- ✅ Content moderation (appropriate/inappropriate)
- ✅ Topic categorization (sports/politics/tech)
- ✅ Intent recognition (in chatbots)
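A fine-tuned classifier learns this Text → Label mapping from thousands of labeled examples. The toy version below fakes the mapping with keyword rules, purely to show the input/output shape (the rules are our own, not a real model):

```python
def classify_intent(text: str) -> str:
    """Toy stand-in for a fine-tuned Text -> Label classifier."""
    t = text.lower()
    if "return" in t or "refund" in t:
        return "RETURN_REQUEST"
    if "time" in t or "close" in t or "open" in t:
        return "HOURS_INQUIRY"
    return "OTHER"

print(classify_intent("What time does the store close?"))  # HOURS_INQUIRY
print(classify_intent("I want to return this product"))    # RETURN_REQUEST
```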
Comparison
| Aspect | Instruction Fine-tuning | Classification Fine-tuning |
|---|---|---|
| Output | Free-form text (answers, translations) | Fixed categories/labels |
| Complexity | More complex | Simpler |
| Examples | Q&A, translation, summarization | Spam detection, sentiment |
| Flexibility | Very flexible responses | Predefined categories only |
| Training Data | Instruction-response pairs | Text-label pairs |
Cost Analysis: The Money Behind LLMs
Let's Talk Numbers
Building LLMs is EXPENSIVE. Let's break down the costs:
Pre-training Costs
GPT-3 Pre-training:
| Resource | Quantity | Cost |
|---|---|---|
| GPUs | ~10,000 NVIDIA V100 | $3 million |
| Electricity | Several megawatts | $500,000 |
| Cloud Infrastructure | AWS/Azure | $1 million |
| Data Collection | 500B words | $100,000 |
| Engineers | 50+ AI researchers | Priceless |
| Total (excluding salaries) | - | ~$4.6 million |
Training Duration: 30+ days continuously
Why So Expensive?
1. GPUs Are Expensive
Single NVIDIA A100 GPU: $10,000
For GPT-3: 10,000 GPUs needed
Cost: $100 million in hardware
(but rented from cloud, so cheaper)
2. Electricity Costs
10,000 GPUs running 24/7 for a month
= Electricity for a small town!
= $500,000+ in power bills
3. Expertise Needed
AI researchers: $300,000+/year salary
50 researchers Γ 1 year
= $15 million+ in salaries
Fine-tuning Costs (Much Cheaper!)
Typical Fine-tuning Project:
| Resource | Quantity | Cost |
|---|---|---|
| Compute | Few GPUs for days | $1,000 - $10,000 |
| Data Labeling | 10,000 examples | $5,000 - $50,000 |
| API Costs | Using OpenAI API | $100 - $1,000 |
| Total | - | $6,000 - $60,000 |
Training Duration: Few hours to few days
Cost Comparison
Pre-training: $4,600,000
Fine-tuning:  $   10,000
----------
Difference: 460x cheaper!
This is why:
- Only big companies (OpenAI, Google, Meta) do pre-training
- Everyone else uses their pre-trained models and fine-tunes
- You can fine-tune an existing model (via OpenAI's API or an open-source model) for your own use case!
Who Can Afford Pre-training?
Companies that have done pre-training:
- ✅ OpenAI (GPT series) - Backed by Microsoft
- ✅ Google (Gemini, PaLM) - Tech giant
- ✅ Meta (Llama series) - Tech giant
- ✅ Anthropic (Claude) - $7B funding
- ✅ Mistral AI (Mistral) - $400M funding
Total companies globally: ~10-15
Everyone else: Uses fine-tuning on existing models
Good News for You!
You DON'T need to pre-train!
You can:
- Use OpenAI's API
- Fine-tune an existing model on your data
- Build amazing applications
- Total cost: $100 - $10,000
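As a sketch of what that workflow can look like with the OpenAI Python SDK: validate your JSONL file locally, then upload it and launch a fine-tuning job. The model name below is an assumption (the set of fine-tunable models changes over time), so check OpenAI's fine-tuning docs before running:

```python
import json

def validate_chat_example(line: str) -> bool:
    """Check one JSONL line has a 'messages' list of {role, content} dicts."""
    try:
        ex = json.loads(line)
    except json.JSONDecodeError:
        return False
    msgs = ex.get("messages")
    return (isinstance(msgs, list) and len(msgs) >= 2
            and all(isinstance(m, dict) and {"role", "content"} <= set(m) for m in msgs))

def launch_finetune(jsonl_path: str):
    """Upload the training file and start a fine-tuning job.
    Requires the `openai` package and an OPENAI_API_KEY in the environment."""
    from openai import OpenAI
    client = OpenAI()
    with open(jsonl_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="fine-tune")
    return client.fine_tuning.jobs.create(
        training_file=uploaded.id,
        model="gpt-4o-mini-2024-07-18",  # assumed fine-tunable model; verify in the docs
    )

good = '{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]}'
print(validate_chat_example(good))        # True
print(validate_chat_example("not json"))  # False
```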
In this series:
- We'll learn both pre-training AND fine-tuning
- But in practice, you'll mostly fine-tune existing models
- Understanding pre-training helps you understand how it all works!
Chapter Summary
What We Learned Today
Let's recap the major concepts:
1. The Two Stages
Building an LLM = Pre-training + Fine-tuning
Stage 1 (Pre-training):
├── Train on 300 billion+ words
├── Learn general language
└── Result: Foundation Model
Stage 2 (Fine-tuning):
├── Train on 10,000-1M specific examples
├── Specialize for a task
└── Result: Your Custom AI
2. Pre-training
✅ Massive dataset (entire internet)
✅ Unlabeled data (raw text)
✅ Task: Predict next word
✅ Duration: Weeks to months
✅ Cost: Millions of dollars
✅ Result: Foundation model (GPT-4, Claude)
✅ Done by: OpenAI, Google, Meta
3. Fine-tuning
✅ Smaller dataset (your company data)
✅ Labeled data (with answers/tags)
✅ Task: Specific application
✅ Duration: Hours to days
✅ Cost: Thousands of dollars
✅ Result: Specialized model (Harvey AI, JP Morgan AI)
✅ Done by: Companies, developers, YOU!
4. Key Differences
| Pre-training | Fine-tuning |
|---|---|
| 300B+ words | 10K-1M examples |
| Unlabeled | Labeled |
| General-purpose | Specific task |
| $4.6M | $10K |
| Weeks | Days |
| Foundation model | Custom model |
5. Real-World Examples
SK Telecom: Fine-tuned for Korean telecom support
└── 35% better conversation summarization
Harvey AI: Fine-tuned for legal case research
└── Trusted by top law firms
JP Morgan: Fine-tuned for financial analysis
└── May replace research analysts
6. Data Types
Unlabeled (Pre-training):
"The cat sat on the mat."
[Just text, no labels]
Labeled (Fine-tuning):
Text: "This movie was great!"
Label: Positive
[Text + Label]
7. Types of Fine-tuning
1. Instruction Fine-tuning:
- Format: Instruction → Response
- Use: Chatbots, translation, Q&A
2. Classification Fine-tuning:
- Format: Text → Category
- Use: Spam detection, sentiment analysis
The Big Picture
Remember this flow:
1. OpenAI/Google pre-trains → Creates GPT-4
2. You fine-tune GPT-4 → Your custom AI
3. You deploy → Real-world application
4. Profit!
Before Next Chapter
Make sure you understand:
- [ ] What is pre-training?
- [ ] What is fine-tuning?
- [ ] Why are there two stages?
- [ ] Difference between labeled and unlabeled data
- [ ] When do you need fine-tuning?
- [ ] At least 2 real-world examples
- [ ] Cost difference (millions vs thousands)
If anything is unclear, read this chapter again!
What's Next?
In Chapter 4, we'll start diving into the technical details:
- Introduction to Transformer architecture
- Brief look at the "Attention Is All You Need" paper
- Understanding the building blocks
- Preparing for actual coding!
Get ready to go deeper!
Take Action Now!
What to do next:
- Comment Below - Which stage interested you more: pre-training or fine-tuning?
- Check Your Understanding - Can you explain both stages to a friend?
- Bookmark - Save for reference
- Think About Use Cases - What would YOU fine-tune an LLM for?
- Stay Tuned - Chapter 4 coming soon!
Quick Reference
Key Terms Learned:
| Term | Meaning |
|---|---|
| Pre-training | Training on massive unlabeled data (Stage 1) |
| Fine-tuning | Refining on specific labeled data (Stage 2) |
| Foundation Model | Pre-trained LLM (base model) |
| Unlabeled Data | Raw text without tags |
| Labeled Data | Text with answers/categories |
| Instruction Fine-tuning | Teaching specific tasks (Q&A, translation) |
| Classification Fine-tuning | Teaching categorization (spam detection) |
Important Numbers:
- GPT-3 training data: 300 billion words
- Pre-training cost: $4.6 million
- Fine-tuning cost: $1,000 - $100,000
- Pre-training duration: Weeks to months
- Fine-tuning duration: Hours to days
Real Companies Using Fine-tuning:
- ✅ SK Telecom (Telecom support)
- ✅ Harvey AI (Legal research)
- ✅ JP Morgan Chase (Financial analysis)
- ✅ And thousands more!
Thank You!
You've completed Chapter 3!
You now understand the complete lifecycle of building an LLM - from raw internet data to production-ready specialized AI. This knowledge is crucial for everything that follows!
Remember:
- Pre-training = General education (expensive, done by big companies)
- Fine-tuning = Specialization (affordable, YOU can do this!)
In the next chapter, we'll start exploring the "secret sauce" - the Transformer architecture that makes all of this possible!
See you in Chapter 4!
Questions? Drop them in the comments below! We respond to every single one.