
Hugging Face

Model

A model is a trained neural network (such as BERT or GPT) that performs a task: classification, translation, generation, etc.
Example: "bert-base-uncased" is a pre-trained BERT model.

Tokenizer

A tokenizer converts raw text into tokens and then into token IDs (numbers) that models can understand.
Example: "Hello"[101, 7592, 102] for BERT

Dataset

A dataset is a collection of labeled or unlabeled examples used to train or evaluate models.
Example: "imdb" for sentiment analysis, "squad" for Q&A.

Pipeline

A pipeline is a high-level interface that wraps everything (tokenizer + model) for quick inference.
Example: pipeline("sentiment-analysis")("I love this!")
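A runnable version of that example (on first use, the pipeline downloads a default English sentiment model):

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("I love this!"))
# [{'label': 'POSITIVE', 'score': 0.99...}] (exact score varies)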

Space

A Space is a public or private app hosted on Hugging Face (like a mini web app), often built using Gradio or Streamlit.
Example: A web demo where you paste text and get a sentiment prediction.
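A minimal sketch of such a demo as a Gradio app; the app.py below is illustrative (a Gradio Space runs a file named app.py from the repo automatically):

import gradio as gr
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

def predict(text):
    result = classifier(text)[0]  # e.g. {'label': 'POSITIVE', 'score': 0.99}
    return f"{result['label']} ({result['score']:.2f})"

gr.Interface(fn=predict, inputs="text", outputs="text").launch()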

Token (API Key)

A token is your personal access key to use Hugging Face Hub programmatically (e.g., to upload models or access private ones).
Get it from https://huggingface.co/settings/tokens
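A minimal sketch of authenticating from Python with the huggingface_hub library (the token string is a placeholder; paste your own):

from huggingface_hub import login, whoami

login(token="hf_xxx")    # placeholder; use your real token
print(whoami()["name"])  # confirms which account is authenticated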