Course name: MATE5424-3001 Foundations of Machine Learning
Short description
This course focuses on the mathematical foundations of basic machine learning concepts and algorithms. Using mathematical language, we aim to make precise widely used machine learning concepts that seem intuitively obvious but turn out to be surprisingly difficult to apply optimally in practice. The goal is to gain insight into several basic machine learning tasks: to understand what they do, what they are best at, and what their limitations are.
The course is an excellent introduction to machine learning for mathematics students. It is also highly suitable for computer science students as a companion to the machine learning engineering courses, providing the mathematical background for the algorithms and programming methods those courses introduce.
Course textbooks
- Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for Machine Learning. Cambridge University Press, 2020.
- Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, Second Edition, 2018.
- Avrim Blum, John Hopcroft, and Ravindran Kannan. Foundations of Data Science. Cambridge University Press, 2020.
Mathematical background
The mathematical background needed in the course is well presented in the first part (“Mathematical Foundations”) of the Deisenroth et al. textbook. It is also well presented in Appendices A–E of the Mohri et al. textbook. We assume the students will consult these books whenever needed. Only essential parts of the mathematical background will be introduced in the lectures.
Syllabus
- Supervised learning
  - Linear regression
  - Generalised linear regression (incl. logistic regression)
  - Classification with support-vector machines
- Unsupervised learning
  - Density estimation (incl. clustering)
  - Dimensionality reduction
    - Principal component analysis
Lectures
- Introduction
- 8. Models and data
- 9. Linear regression (code sketch after this list)
  - 9.1 Problem formulation
  - 9.2 Parameter estimation
  - 9.3 Bayesian linear regression
  - 9.4 A brief geometric view on linear regression
  - 9.5 Generalised linear models (e.g., logistic regression and neural networks)
- 10. Dimensionality reduction with principal component analysis (code sketch after this list)
  - 10.1 Problem setting
  - 10.2 PCA as a problem of maximising the data variance
  - 10.3 PCA as a problem of minimising the average reconstruction error
  - 10.5 PCA in high dimensions
  - 10.6 Key steps of PCA in practice
  - 10.7 Probabilistic PCA
  - 10.8 Connections to other topics
- 11. Density estimation with Gaussian mixture models (code sketch after this list)
  - 11.0 Introduction
  - 11.1 Model formulation
  - 11.2 Parameter learning with maximum likelihood
  - 11.3 The EM algorithm
  - 11.4 The latent-variable perspective on GMM
- 12. Classification with support vector machines (code sketch after this list)
  - 12.1 Separating hyperplanes
  - 12.2 Hard-margin SVM: part 1, part 2
  - 12.2 Soft-margin SVM: geometric view, loss function view
  - 12.3 Dual SVM: via Lagrange multipliers, via convex hulls
  - 12.4 Kernels
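To give a taste of lecture 9, here is a minimal NumPy sketch of parameter estimation for linear regression (cf. lecture 9.2): under Gaussian noise, the maximum-likelihood estimate coincides with least squares. The synthetic data, dimensions, and variable names are illustrative assumptions, not part of the course materials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X @ theta_true + Gaussian noise (illustrative only).
N, D = 100, 3
X = rng.normal(size=(N, D))
theta_true = np.array([2.0, -1.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=N)

# Maximum-likelihood estimate = least squares: theta = (X^T X)^{-1} X^T y.
# lstsq solves this via a numerically stable decomposition.
theta_ml, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_ml)  # close to theta_true
```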
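For lecture 10, a sketch of the key steps of PCA (cf. lecture 10.6): centre the data, eigendecompose the covariance matrix, and project onto the top eigenvectors. Again the data and names are illustrative only; the projection maximises the data variance, and the reconstruction minimises the average reconstruction error (lectures 10.2 and 10.3).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic D-dimensional data; M is the target dimension (illustrative).
N, D, M = 200, 5, 2
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))

# Centre the data, then take the eigenvectors of the covariance matrix
# with the largest eigenvalues as the principal directions.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / N
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
B = eigvecs[:, ::-1][:, :M]             # top-M principal directions

Z = Xc @ B                         # low-dimensional codes (max variance)
X_rec = Z @ B.T + X.mean(axis=0)   # reconstruction from the codes
print(np.mean((X - X_rec) ** 2))   # average reconstruction error
```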
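For lecture 11, a compact sketch of the EM algorithm (cf. lecture 11.3) for a one-dimensional Gaussian mixture. All constants are illustrative, and a real implementation would add a convergence check and guards against degenerate components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data from two Gaussians (illustrative only).
x = np.concatenate([rng.normal(-2.0, 0.5, 150), rng.normal(1.5, 1.0, 250)])

# EM for a K-component 1-D Gaussian mixture.
K = 2
pi = np.full(K, 1.0 / K)                   # mixture weights
mu = rng.choice(x, size=K, replace=False)  # initial means
var = np.full(K, x.var())                  # initial variances

for _ in range(100):
    # E-step: responsibilities r[n, k] = p(component k | x_n).
    log_p = (-0.5 * np.log(2 * np.pi * var)
             - 0.5 * (x[:, None] - mu) ** 2 / var + np.log(pi))
    r = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)

    # M-step: re-estimate weights, means, variances from responsibilities.
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(pi, mu, np.sqrt(var))  # approximately recovers the generating mixture
```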
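For lecture 12, a sketch of the soft-margin SVM in its loss-function view (cf. lecture 12.2): minimising the regularised hinge loss by subgradient descent. This is one simple way to train the primal SVM, not the course's prescribed method; the data and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Roughly separable synthetic data with labels in {-1, +1} (illustrative).
N = 200
X = np.vstack([rng.normal(-1.5, 1.0, (N // 2, 2)),
               rng.normal(+1.5, 1.0, (N // 2, 2))])
y = np.concatenate([-np.ones(N // 2), np.ones(N // 2)])

# Soft-margin SVM, loss-function view: minimise
#   (1/2) ||w||^2 + C * sum_n max(0, 1 - y_n (w @ x_n + b))
# by subgradient descent on w and b.
w, b, C, lr = np.zeros(2), 0.0, 1.0, 0.01
for _ in range(500):
    margins = y * (X @ w + b)
    active = margins < 1  # points violating the margin contribute to the loss
    grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
    grad_b = -C * y[active].sum()
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean(np.sign(X @ w + b) == y)
print(w, b, acc)  # separating hyperplane parameters and training accuracy
```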
Course feedback
The Department of Mathematics and Statistics periodically collects feedback on its courses. This feedback is important to both the department and the lecturer. The questionnaire takes only a few minutes to answer. The questions (in Finnish) can be found at https://webropol.com/s/mattilpalauteIV2020