Maths foundations for statistics and machine learning

While (and after) being a COVID hermit, I took, and partly re-took, a journey starting from basic maths, trying to use the momentum to get better at the mathematics of statistics and machine learning. It was a winding path but, with hindsight, the list below shows the steps I would have taken in an ideal world (in order, for the first two sub-lists), with the resources I found most helpful along the way (or, in some cases - Devlin, Kline, Jones & Jones - ones I didn't use myself originally but found when looking for a reference to fill out a point on the list). The first sub-list could be good preparation for the statistics modules in e.g. an undergraduate psychology course.

The list

  1. General prerequisites
    1. Back-to-basics to make sure you're not missing anything that'll trip you up later: the first four parts of the MathTrackX series on edX: Polynomials, Functions and Graphs; Special Functions; Differential Calculus; Integral Calculus. This is about having the "school maths" and numeracy that everything else will assume is known, much as it assumes literacy. If you find you need to go back further, to arithmetic or wherever you need to start, then that's where to start instead!
    2. Any introduction to very basic linear algebra and matrices; e.g., Savov's No Bullshit Guide to Linear Algebra, chapters 2 and 3. This is mostly to learn the simple but critical language of how data are typically represented and organized - in columns and rows (there's a small data-matrix example after the list) - and later steps will also need maths that builds on these basic concepts.
    3. An introduction to the language of proofs and sets: Section 1.16 of Savov.
    4. Introduction to Probability (STAT110x) on edX: a really well-designed and accessible online introduction to foundational probability maths. While statistics isn't the same as probability, statistics is fundamentally about probabilities, so having this basis will make your life infinitely easier in stats modules.
  2. Fundamental probability and statistics
    1. Just for awareness at this point, since complex numbers are briefly alluded to here and there and can otherwise cause confusion: Section 1.14 of Savov (depending on how secure you are in the "school maths" from the first sub-list, it could be worth running through all of Chapter 1 to close any gaps).
    2. An introduction to logic, proofs, set theory, and Boolean algebra (including truth tables); e.g., chapters 1 - 3 of Devlin's Sets, Functions, and Logic: An Introduction to Abstract Mathematics. This is the more formal mathematical language used by the sources in the steps below; statistics modules and textbooks might also implicitly assume at least some familiarity with the concepts.
    3. Introduction to Probability by Blitzstein & Hwang, chapters 1 - 4. This is the book the STAT110x online course is based on, and the next step after completing the course is to really work through the book, including the (standard) exercises - you can trust the authors that they're doable; more difficult exercises are clearly marked. It's a time investment, but as someone who wanted to start properly understanding scientific methods I found it well worth it. Subsequent topics in this list will assume a good grasp of the probability concepts covered in the book (although which specific ones will vary per topic).
    4. A quick digression specifically for the arithmetic series, since that'll be assumed to be known below; this intro video on Khan Academy covers it (the closed-form sum is also written out after the list).
    5. Revisit and consolidate basic calculus, since the next block of probability will lean heavily on it: Kline's Calculus: An Intuitive and Physical Approach, up to chapter 12, especially chapters 1 and 2 and sections 3.1 - 3.2, 5.1 - 5.2 and 5.4, 6.3 - 6.7, 7.1 - 7.2 and 7.7, 8.1 - 8.2, 9.1 - 9.4, 12.1 - 12.4. In this context, the other sections are more useful for general awareness than because you'll need the machinery for finding formulas for integrals yourself, so I think they can be skimmed in order to get back to Blitzstein & Hwang. (See also an interesting free alternative on OpenStax.)
    6. Introduction to Probability by Blitzstein & Hwang, chapter 5 onwards. I'd suggest studying up to at least conditional expectations, since those come up a lot in basic statistical methods (see the conditional expectation example after the list).
    7. Further linear algebra, needed for subsequent techniques like regression and Principal Component Analysis; e.g., chapters 4 through 6.6 of Savov's No Bullshit Guide to Linear Algebra. I'd also recommend starting on the great Strang lectures here, up to projections and least squares (the least-squares example after the list shows how these connect to regression); but be aware that the best edition of the associated book might not be the latest (6th) one: the online course materials refer to the 4th and 5th editions, most consistently the 4th, which is what the assignment numbers refer to, although the assignments are reproduced in the solution PDFs.
    8. Regression by Bingham & Fry fully covers linear regression, including the contents of the usual mathematical black box where psychology statistics teaching ends.
    9. Finish the Strang lectures.
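
Worked examples referenced in the list

To make the "columns and rows" point in the basic linear algebra item concrete, here's the convention most statistics and machine learning texts use (the notation is mine, not Savov's): a data set with n observations of p variables is stored as an n x p matrix, one row per observation and one column per variable.

```latex
% A data matrix X with n observations (rows) and p variables (columns);
% x_{ij} is the value of variable j for observation i.
X =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix},
\qquad X \in \mathbb{R}^{n \times p}.
```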
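For the arithmetic series digression, the result that's assumed later is just the closed form for the sum of an arithmetic progression (first term a, common difference d, n terms):

```latex
% Sum of an arithmetic progression; the second form is the
% "average of first and last term, times the number of terms" view.
\sum_{k=0}^{n-1} (a + kd) = \frac{n}{2}\bigl(2a + (n-1)d\bigr)
= n \cdot \frac{(\text{first term}) + (\text{last term})}{2},
\qquad \text{e.g.}\quad 1 + 2 + \cdots + n = \frac{n(n+1)}{2}.
```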
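A big part of why conditional expectations keep reappearing in basic statistical methods is the law of total expectation (Adam's law, in Blitzstein & Hwang's naming) and the fact that regression can be read as estimating a conditional expectation; roughly:

```latex
% Conditional expectation of a discrete random variable X given Y = y:
E[X \mid Y = y] = \sum_{x} x \, P(X = x \mid Y = y).

% Law of total expectation ("Adam's law"):
E[X] = E\bigl[\,E[X \mid Y]\,\bigr].

% Simple linear regression models the conditional expectation of the outcome:
E[Y \mid X = x] = \beta_0 + \beta_1 x.
```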
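Finally, the projections and least squares material in the Strang lectures is exactly what opens the black box in the regression step: ordinary least squares projects the response vector onto the column space of the design matrix, which gives the normal equations and the usual closed-form estimator. A sketch in standard notation (not necessarily Bingham & Fry's):

```latex
% Linear model: y is n x 1, X is the n x p design matrix, beta is p x 1.
y = X\beta + \varepsilon

% Least squares minimizes \lVert y - X\beta \rVert^2; setting the gradient
% to zero gives the normal equations and, when X^T X is invertible,
X^{\top} X \hat{\beta} = X^{\top} y
\quad\Longrightarrow\quad
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y.

% The fitted values are the orthogonal projection of y onto the column space of X:
\hat{y} = X\hat{\beta} = X (X^{\top} X)^{-1} X^{\top} y.
```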