Covers chapter 1 of Géron’s book (Hands-On Machine Learning)
Machine learning: Definition and applications
Machine learning is the art of programming computers so that they can learn from data. A more precise definition (Tom Mitchell, 1997): “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
Some jargon:
- Training set: the examples a system uses to learn
- Training instance (or sample): an individual training example
- Model: the part of a machine learning system that learns and makes predictions (ex. a neural network or a random forest)
- Data mining: digging into large amounts of data to discover hidden patterns
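The jargon above can be made concrete with a minimal scikit-learn sketch (the numbers are invented for illustration):

```python
# Minimal sketch of the jargon using scikit-learn (assumed installed).
from sklearn.linear_model import LinearRegression

# Training set: the examples the system learns from.
X_train = [[1.0], [2.0], [3.0], [4.0]]  # each inner list is one training instance (sample)
y_train = [2.1, 3.9, 6.2, 8.1]          # the desired outputs

# Model: the part that learns and makes predictions.
model = LinearRegression()
model.fit(X_train, y_train)             # learning from the training set

print(model.predict([[5.0]]))           # prediction for a new, unseen instance
```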
Why should I use machine learning?
- If my code has a long list of rules (say, regexes) or requires excessive fine-tuning, it’s often better, in both simplicity and performance, to train a model for that problem.
- Finding underlying patterns within my data and insights for complex problems. I am doing a version of this in magpie.
- A machine learning system can be automated: it can be retrained on new data as it arrives.
- Some complex problems (see the applications below) for which no traditional approach has a satisfactory solution.
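The first point (rules vs. a trained model) can be sketched as follows. This is a hypothetical spam-filtering example with invented data, assuming scikit-learn is installed:

```python
# Contrast: a hand-written rule list vs. a model trained on labeled examples.
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win money now", "meeting at noon", "free money offer", "lunch tomorrow"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham (made-up toy data)

# Rule-based: a growing list of regexes that needs constant fine-tuning.
def rule_filter(text):
    return 1 if re.search(r"\b(win|free|money)\b", text) else 0

# Learned: the model picks up the word patterns from the labeled examples itself.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(emails, labels)

print(clf.predict(["free offer win big"]))  # likely flagged as spam
```

When spammers change their wording, the model can simply be retrained on fresh examples, while the regex list has to be edited by hand.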
Some examples of applications include:
- Detecting tumors in brain scans (image segmentation using CNNs or transformers)
- Classification of news articles (natural language processing; text classification using RNNs, CNNs, or transformers)
- Forecasting based on performance metrics (regression, value prediction)
- Voice commands (speech recognition, again using RNNs, CNNs or transformers)
- Segmenting data (clustering)
- Representing a high-dimensional dataset as an insightful diagram (dimensionality reduction and data visualisation)
- Building an intelligent bot (reinforcement learning)
Types of machine learning systems
We use the following criteria to classify ML systems:
- How they’re supervised during training (supervised, unsupervised, semi-supervised, self-supervised, etc.)
- Whether or not they can learn incrementally, on the fly, from incoming data (online vs. batch learning)
- Whether they work by comparing new data points to known data points, or instead by detecting patterns in the training data and building a predictive model, much like scientists do (instance-based vs. model-based learning)
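The instance-based vs. model-based distinction can be sketched with two regressors on invented toy data (scikit-learn assumed installed):

```python
# Instance-based: answers by comparing to stored examples.
# Model-based: fits a predictive model and generalizes from it.
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 2.0, 3.0, 4.0]  # perfectly linear toy data

knn = KNeighborsRegressor(n_neighbors=1).fit(X, y)  # memorizes the training points
lin = LinearRegression().fit(X, y)                  # learns the parameters of a line

# The instance-based model answers with its nearest stored example (x=4 -> 4.0);
# the model-based one extrapolates along the learned line (-> 10.0).
print(knn.predict([[10.0]]))
print(lin.predict([[10.0]]))
```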
Training supervision
ML systems can be classified by the amount and type of supervision they receive during training.
Supervised learning: Here, the training data fed to the model includes the desired solutions, called labels. Typical supervised tasks include classification of new data (the training data also has a class assigned to each instance) and target value prediction, ex. predicting the price of a car given a set of features like brand, mileage, and color; this is called regression. Some regression models can also be used for classification and vice versa (logistic regression, for example, is commonly used for classification).

NOTE
Target and label are generally synonyms in supervised learning, but target is more common in regression tasks and label in classification tasks. Features are also called predictors or attributes. This can refer to one sample (ex. “this video’s view-count feature is 100,000”) or to all samples (ex. “the tags feature is not correlated with the view count”).
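The two supervised tasks can be sketched on invented toy data (scikit-learn assumed installed; the car-price and exam numbers are made up):

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a target value, ex. a car's price from its mileage.
mileage = [[10_000], [50_000], [90_000], [130_000]]
price = [20_000, 16_000, 12_000, 8_000]
reg = LinearRegression().fit(mileage, price)
print(reg.predict([[70_000]]))  # -> 14_000 (the toy data is exactly linear)

# Classification: predict a label. Logistic regression is a "regression"
# model that is commonly used for classification.
hours_studied = [[1], [2], [8], [9]]
passed = [0, 0, 1, 1]
clf = LogisticRegression().fit(hours_studied, passed)
print(clf.predict([[7]]))
```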
Unsupervised learning: The training data is unlabeled. Typical tasks:
- Clustering: run a clustering algorithm on a dataset to try to detect groups of similar data. Hierarchical clustering may also help to find sub-groups within those groups.
- Data visualisation: feed the algorithm lots of complex unlabeled data and it will spit out a 2D/3D representation that can be plotted. This is good for identifying patterns within your data too.
- Dimensionality reduction: try to simplify the data without losing too much information, ex. by merging several correlated features into one (a car’s mileage is correlated with its age). This is called feature extraction.
- Anomaly detection: the system is shown normal instances during training, and when it sees a new not-normal instance it flags it as an anomaly.
- Novelty detection: similar, but requires a clean training set devoid of the instances you would like to detect. Then when something new is seen, it’s flagged as a novelty.
- Association rule learning: the goal is to dig into large amounts of data and find interesting relationships between the attributes.
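Two of these tasks, clustering and dimensionality reduction, can be sketched on invented unlabeled data (scikit-learn assumed installed):

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Two obvious blobs of points -- no labels are given to the algorithms.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
     [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]]

# Clustering: detect groups of similar data.
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(km.labels_)  # the two blobs end up in two different clusters

# Dimensionality reduction: merge correlated features (here 2D -> 1D).
pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # almost all variance kept in one component
```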
