Summary and Outlook

We started this chapter with a discussion of model complexity, then moved on to generalization: learning a model that is able to perform well on new, previously unseen data. This led us to the concepts of underfitting, which describes a model that cannot capture the variation present in the training data, and overfitting, which describes a model that focuses too much on the training data and does not generalize well to new data.

We then discussed a wide array of machine learning models for classification and regression, what their advantages and disadvantages are, and how to control model complexity for each of them. We saw that for many of the algorithms, setting the right parameters is important for good performance. Some of the algorithms are also sensitive to how we represent the input data, and in particular to how the features are scaled. Therefore, blindly applying an algorithm to a dataset without understanding the assumptions the model makes and the meanings of the parameter settings will rarely lead to an accurate model.

This chapter contains a lot of information about the algorithms, and it is not necessary for you to remember all of these details for the following chapters. However, some knowledge of the models described here, and which to use in a specific situation, is important for successfully applying machine learning in practice. Here is a quick summary of when to use each model:

Nearest Neighbors

  • For small datasets, good as a baseline, easy to explain
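
As a minimal sketch (assuming scikit-learn and its built-in breast cancer dataset; n_neighbors=3 is just a placeholder you would normally tune), a nearest neighbors baseline could look like this:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # split into a training and a test set
    X_train, X_test, y_train, y_test = train_test_split(
        *load_breast_cancer(return_X_y=True), random_state=0)

    # three neighbors is an arbitrary starting point; tune n_neighbors
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)
    print("Test set accuracy: {:.2f}".format(knn.score(X_test, y_test)))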

Linear Models

  • Go-to as a first algorithm to try, good for very large datasets, good for very high-dimensional data.
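
For example, a logistic regression baseline might look like the following sketch (scikit-learn assumed; C=1.0 and max_iter=5000 are placeholder values, not tuned settings):

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        *load_breast_cancer(return_X_y=True), random_state=0)

    # C controls regularization: smaller C means stronger regularization
    logreg = LogisticRegression(C=1.0, max_iter=5000)
    logreg.fit(X_train, y_train)
    print("Test set accuracy: {:.2f}".format(logreg.score(X_test, y_test)))

For regression tasks, Ridge and Lasso play the analogous role, with alpha controlling the amount of regularization.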

Naive Bayes

  • Only for classification. Even faster than linear models, good for very large datasets and high-dimensional data. Often less accurate than linear models.
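
As an illustration of the high-dimensional, sparse setting where naive Bayes is a natural fit, here is a sketch using MultinomialNB on a tiny made-up text corpus (the documents and labels are invented purely for demonstration):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # toy corpus, invented for illustration; real inputs are large
    # sparse count matrices
    docs = ["great movie loved it", "terrible movie hated it",
            "loved the acting", "hated the plot"]
    labels = [1, 0, 1, 0]

    vect = CountVectorizer()
    X = vect.fit_transform(docs)   # bag-of-words counts (sparse matrix)

    nb = MultinomialNB().fit(X, labels)
    print(nb.predict(vect.transform(["loved the movie"])))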

Decision Trees

  • Very fast, don’t need scaling of the data, can be visualized and easily explained.
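
A minimal sketch of fitting and inspecting a pre-pruned tree with scikit-learn (max_depth=3 is an arbitrary choice; export_text requires scikit-learn 0.21 or later):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    cancer = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(
        cancer.data, cancer.target, random_state=0)

    # limiting max_depth pre-prunes the tree and keeps it easy to read
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(X_train, y_train)
    print("Test set accuracy: {:.2f}".format(tree.score(X_test, y_test)))

    # print the learned decision rules as plain text
    print(export_text(tree, feature_names=list(cancer.feature_names)))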

Random Forests

  • Nearly always perform better than a single decision tree, very robust and powerful. Don’t need scaling of the data. Not good for very high-dimensional sparse data.
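
A minimal sketch, again assuming scikit-learn; 100 trees is a common starting point, and adding more trees mainly costs training time:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        *load_breast_cancer(return_X_y=True), random_state=0)

    # no feature scaling needed; n_estimators=100 is a common default
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X_train, y_train)
    print("Test set accuracy: {:.2f}".format(forest.score(X_test, y_test)))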

Gradient Boosted Decision Trees

  • Often slightly more accurate than random forests. Slower to train but faster to predict than random forests, and smaller in memory. Need more parameter tuning than random forests.
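
A sketch with scikit-learn's GradientBoostingClassifier; the values of learning_rate, n_estimators, and max_depth below are placeholders that interact and usually need tuning:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        *load_breast_cancer(return_X_y=True), random_state=0)

    # shallow trees plus a moderate learning rate is a typical start
    gbrt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                      max_depth=3, random_state=0)
    gbrt.fit(X_train, y_train)
    print("Test set accuracy: {:.2f}".format(gbrt.score(X_test, y_test)))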

Support Vector Machines

  • Powerful for medium-sized datasets of features with similar meaning. Require scaling of data, sensitive to parameters.
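
Because SVMs need scaled inputs, it is natural to wrap the scaler and the model in a pipeline; the C and gamma values below are placeholders, not recommendations:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X_train, X_test, y_train, y_test = train_test_split(
        *load_breast_cancer(return_X_y=True), random_state=0)

    # the pipeline scales the features before the kernel SVM sees them
    svm = make_pipeline(StandardScaler(), SVC(C=10, gamma=0.1))
    svm.fit(X_train, y_train)
    print("Test set accuracy: {:.2f}".format(svm.score(X_test, y_test)))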

Neural Networks

  • Can build very complex models, particularly for large datasets. Sensitive to scaling of the data and to the choice of parameters. Large models need a long time to train.
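
A minimal sketch using scikit-learn's MLPClassifier, again scaling the data first; the hidden layer size and max_iter are arbitrary starting values:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X_train, X_test, y_train, y_test = train_test_split(
        *load_breast_cancer(return_X_y=True), random_state=0)

    # one hidden layer of 100 units; scaling matters a lot for MLPs
    mlp = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000,
                      random_state=0))
    mlp.fit(X_train, y_train)
    print("Test set accuracy: {:.2f}".format(mlp.score(X_test, y_test)))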

When working with a new dataset, it is in general a good idea to start with a simple model, such as a linear model or a naive Bayes or nearest neighbors classifier, and see how far you can get. After understanding more about the data, you can consider moving to an algorithm that can build more complex models, such as random forests, gradient boosted decision trees, SVMs, or neural networks.
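
To make that workflow concrete, here is a minimal sketch that compares a simple linear baseline against a random forest using cross-validation (scikit-learn and its breast cancer dataset assumed; the models and parameter values are illustrative only):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)

    # start with a simple, interpretable baseline ...
    baseline = LogisticRegression(max_iter=5000)
    print("logistic regression: {:.2f}".format(
        cross_val_score(baseline, X, y, cv=5).mean()))

    # ... then check whether a more complex model actually helps
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    print("random forest:       {:.2f}".format(
        cross_val_score(forest, X, y, cv=5).mean()))

If the more complex model does not clearly beat the baseline, the extra tuning effort and loss of interpretability are often not worth it.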