Common ML Terminologies


(Features, Labels, Training, Testing, Models — all the buzzwords decoded)

🎯 1. Features #

Features are the input variables or the measurable properties that you give to the machine to learn from.

Think of them as columns in a dataset — like “Age,” “Salary,” “Education Level,” etc.

🧠 Real-Life Analogy: #

Imagine you’re a recruiter trying to guess if a candidate will be a good hire.
You look at:

  • Age ✅
  • Education ✅
  • Years of Experience ✅

These are all features you’re using to make that prediction.

🧪 Example: #

| Age | Salary | Owns a Car? | Will Buy a House? |
|-----|--------|-------------|-------------------|
| 25  | 40k    | Yes         | Yes               |
| 30  | 60k    | No          | Yes               |
| 22  | 25k    | No          | No                |

👉 Features: Age, Salary, Owns a Car
👉 (The last column is the label, which we’ll explain next 👇)
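If you prefer seeing this in code, here's a minimal sketch (assuming Python with pandas) of the same toy table, split into feature columns and a label column:

```python
import pandas as pd

# The tiny dataset from the table above
data = pd.DataFrame({
    "Age": [25, 30, 22],
    "Salary": [40_000, 60_000, 25_000],
    "Owns a Car?": ["Yes", "No", "No"],
    "Will Buy a House?": ["Yes", "Yes", "No"],
})

# Features: the inputs the model learns from
X = data[["Age", "Salary", "Owns a Car?"]]

# Label: the output we want to predict (explained in the next section)
y = data["Will Buy a House?"]

print(X)
print(y)
```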


🎯 2. Label #

A label is the answer or output you’re trying to predict.

In supervised learning, every input (feature) comes with a corresponding label (output) — it’s the thing we want the model to learn to predict.

🧠 Real-Life Analogy: #

Back to our recruiter — after interviewing candidates, you eventually find out whether they performed well or not. That final decision becomes the label you’ll use for future hiring predictions.

🧪 Example: #

In a spam filter:

  • Email content = Features
  • “Spam” or “Not Spam” = Label

In house pricing:

  • Features: Location, Area, Bedrooms
  • Label: Price of the house

🎯 3. Training #

Training is the process of teaching the machine using historical data.

You feed the algorithm features + labels, and the model learns how to map inputs to outputs.

🧠 Real-Life Analogy: #

It’s like giving a student a bunch of math problems with solutions so they can learn how to solve new ones.

🧪 Example: #

If you give the model 1000 rows of customer data, where each row includes features (age, salary, etc.) and a label (bought or not), the model learns the pattern.
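Here's a rough sketch of what training looks like in code, assuming Python with scikit-learn and a made-up, tiny customer table (not a definitive recipe):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical historical customer data: features + label
customers = pd.DataFrame({
    "age":    [25, 30, 22, 41, 35],
    "salary": [40_000, 60_000, 25_000, 90_000, 52_000],
    "bought": [1, 1, 0, 1, 0],   # label: 1 = bought, 0 = did not buy
})

X = customers[["age", "salary"]]   # features
y = customers["bought"]            # label

# "Training" = the model learning the mapping from features to label
model = DecisionTreeClassifier()
model.fit(X, y)
```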


🎯 4. Testing #

Testing is when you evaluate how well your model has learned — by giving it new data it has never seen before (with known labels) and checking how accurately it can predict.

🧠 Real-Life Analogy: #

After a student has studied, you give them a test to see how well they’ve learned — same idea!

⚖️ Why It Matters: #

You don’t want a model that just memorizes — you want one that generalizes well to new data.

🧪 Typical Split: #

  • 80% Training Data
  • 20% Testing Data

Holding out test data helps you detect overfitting (when a model is too tailored to the training data and fails on new data).
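One common way to do this split in code is scikit-learn's train_test_split. A minimal sketch, using its built-in iris dataset as a stand-in for your own features and labels:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # any features + labels will do

# 80% of the rows for training, 20% held back for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier().fit(X_train, y_train)  # learn from training rows only
print("Test accuracy:", model.score(X_test, y_test))    # evaluate on unseen rows
```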


🎯 5. Model #

A model is the final system that you get after training. It’s the mathematical or logical structure that takes features as input and gives predictions as output.

🧠 Real-Life Analogy: #

A model is like a recipe you’ve mastered — once you’ve trained (learned) it, you can use it anytime to cook (predict).

📦 What’s Inside a Model? #

It could be:

  • A linear equation (like in regression)
  • A tree of decisions (like in decision trees)
  • A multi-layer neural net (in deep learning)

Once trained, you can deploy this model to start making real-time predictions.
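As a hedged sketch of that hand-off, assuming scikit-learn for the model and joblib for saving it (the file name and tiny training set are made up for illustration):

```python
import joblib
from sklearn.tree import DecisionTreeClassifier

# A trained model (trained on made-up data just for illustration)
model = DecisionTreeClassifier().fit(
    [[25, 40_000], [30, 60_000], [22, 25_000]],  # features: age, salary
    [1, 1, 0],                                   # label: bought or not
)

# Persist it so an app or API can load it later
joblib.dump(model, "customer_model.joblib")

# ...later, in the deployed service:
model = joblib.load("customer_model.joblib")
print(model.predict([[28, 45_000]]))   # prediction for a new, unseen customer
```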


🧩 Bonus Terms (Quick Definitions) #

| Term | Meaning |
|------|---------|
| Dataset | A collection of data, usually structured into rows (records) and columns (features) |
| Instance | A single row or example from the dataset |
| Target | Another word for “label” (especially in regression/classification) |
| Algorithm | The method or logic used to build the model (e.g., Decision Tree, SVM) |
| Overfitting | When the model performs well on training data but badly on new data |
| Underfitting | When the model is too simple to capture patterns and performs poorly everywhere |
| Evaluation Metrics | Measures like accuracy, precision, and recall used to test model performance |
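For the evaluation metrics in the table, here's a quick sketch of how they're typically computed with scikit-learn, using made-up true and predicted labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]   # actual labels
y_pred = [1, 0, 0, 1, 0, 1]   # what the model predicted

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
```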

🧠 Putting It All Together: Simple Example #

Let’s say you’re building a model to predict house prices.

  • Features: Size of the house, number of bedrooms, location
  • Label: House price
  • Training: Feed the algorithm with 1000 old sales records
  • Model: A trained system that can now predict prices of new houses
  • Testing: You check how well it performs on 200 new unseen examples
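Put into code, the whole flow might look something like this minimal sketch, assuming scikit-learn and a handful of made-up past sales in place of the 1000 records (location is simplified to a numeric "distance to city centre" so a plain linear regression can use it):

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Features: size (sq ft), bedrooms, distance to city centre (km) -- made-up sales records
X = [[1200, 3, 5], [800, 2, 12], [1500, 4, 3], [950, 2, 8], [2000, 5, 2], [700, 1, 15]]
# Label: sale price
y = [300_000, 180_000, 400_000, 220_000, 550_000, 150_000]

# Training/testing split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Model: the trained system that maps features to a price
model = LinearRegression().fit(X_train, y_train)

# Testing: check predictions on houses the model has never seen
print("Predicted prices:", model.predict(X_test))
print("R^2 on test data:", model.score(X_test, y_test))
```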

🔁 Visual Summary #

Features → [Training Data] → Model → Predict Labels on [Test Data]

Or in emoji form 😄

🏠 + 📏 + 🌍 ➡️ 🧠 ➡️ 💰


✅ Final Thoughts #

These five terms — Features, Labels, Training, Testing, and Model — form the backbone of your understanding of ML. If you’re clear on these, you’ll understand how every ML project is structured.
