🤖 How Do You Train a Model in Hugging Face?
You already know:
- Input = numbers (tokens)
- Answer = number (label, like 0 or 1)
So now… how do we tell Hugging Face the correct answer and help the model learn?
We use a Trainer, and here’s what happens behind the scenes:
🧾 1. Prepare Sentences and Answers
Let’s say you have this sentence:
text = "I love chocolate!"
label = 1 # 1 = happy
Before the model can learn, we must turn the sentence into numbers using a tokenizer:
inputs = tokenizer(text, truncation=True, padding="max_length")
inputs["label"] = label
⚠️ In Hugging Face — and in most machine learning — the word “label” means the correct answer we want the model to learn.
Now Hugging Face knows:
- 🤖 The sentence as tokens (`input_ids`)
- ✅ The correct answer (`label`)
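Here is a runnable version of that tiny step, so you can see the numbers for yourself (the exact token IDs depend on the tokenizer):
from transformers import AutoTokenizer
# Load the same BERT tokenizer used in the full example below
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "I love chocolate!"
label = 1  # 1 = happy
# Turn the sentence into numbers and attach the correct answer
inputs = tokenizer(text, truncation=True, padding="max_length")
inputs["label"] = label
print(inputs["input_ids"][:8])  # the first few token IDs (numbers!)
print(inputs["label"])          # the correct answer: 1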
🧠 2. Use Trainer to Let the Model Learn
Trainer works like Robo’s classroom. It gives the model:
- Sentences as tokens
- The correct answer (label)
- A way to check if the model is right or wrong
Trainer does the following:
- Model makes a prediction
- Trainer checks if it’s correct
- If it’s wrong, the model gets a little smarter
⚠️ You don’t have to manually say “wrong!” — Hugging Face does it automatically using something called loss (just a math score of “how wrong the model is”).
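Under the hood this is ordinary PyTorch training. Here is a rough, simplified sketch of a single learning step done by hand, just to show the idea (Trainer normally handles all of this for you):
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
# One example: the sentence as tokens, plus the correct answer
batch = tokenizer("I love chocolate!", return_tensors="pt")
labels = torch.tensor([1])  # 1 = happy
# 1. The model makes a prediction (and computes the loss, because we passed labels)
outputs = model(**batch, labels=labels)
print(outputs.loss)  # "how wrong" the model is, as a single number
# 2. The model gets a little smarter
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()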
✅ Real Example
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
# Load and tokenize dataset
dataset = load_dataset("imdb")
# Turn every sentence in the dataset into something the model can read.
def preprocess(example):
    return tokenizer(example["text"], truncation=True, padding="max_length")
tokenized_dataset = dataset.map(preprocess, batched=True)
# Training arguments
training_args = TrainingArguments(output_dir="./results", num_train_epochs=1)
# Trainer setup (small subsets of IMDB keep this demo quick)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"].select(range(2000)),
    eval_dataset=tokenized_dataset["test"].select(range(500)),
)
# Start training
trainer.train()
And that’s it! Hugging Face:
- Feeds the model examples
- Checks if it got the answer right
- Adjusts the model if it was wrong
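Once training is done, you can ask the model about a brand-new sentence. Here is a small sketch, continuing from the code above:
import torch
# Try the freshly trained model on a new sentence
text = "That movie was fantastic!"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# No learning here -- just a prediction
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()
print(prediction)  # 1 = positive, 0 = negative for IMDB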
🧠 So, How Do You Tell Hugging Face the Right Answer?
# A Python dictionary containing an input sentence and its correct label (answer)
example = {
    "text": "That was awesome!",
    "label": 1  # 1 is the correct answer the model should learn (e.g., positive sentiment)
}
The Trainer takes care of the rest — comparing predictions with labels and teaching the model.
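For example, if you collected a handful of these dictionaries, you could turn them into a training dataset with Dataset.from_list and the same preprocess() function from the example above. This is just a sketch:
from datasets import Dataset
# A few labeled examples, each one a {"text": ..., "label": ...} dictionary
examples = [
    {"text": "That was awesome!", "label": 1},
    {"text": "I did not enjoy this at all.", "label": 0},
]
my_dataset = Dataset.from_list(examples)
my_dataset = my_dataset.map(preprocess, batched=True)  # tokenize, just like the IMDB example
# my_dataset can now be passed to Trainer as train_dataset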