Maths foundations for statistics and machine learning

While (and after) being a COVID hermit, I took (or re-took) a journey starting from basic maths, trying to use the momentum to get better at "formal" (i.e., real) mathematics for statistics and machine learning. It was a winding path but, with hindsight, the list below shows the steps I would have taken in an ideal, efficient world, in order (the first six steps anyway), with the resources I found most helpful along the way (or, in the case of Devlin, one I didn't use myself originally but found when looking for a reference to fill out the list). For an interested psychology student without a solid maths background, I think even the first four steps will make a big difference in being prepared for stats modules.

I've tried to indicate the minimal subset of chapters needed per book, per step, but if additional concepts do pop up they should be findable elsewhere in the same books.

General point with maths books, especially earlier on when you're building skills: You have to do the exercises. Just reading the text does pretty much nothing - it's all about active learning. A lot of maths books don't provide answers, and while this can be shocking, dealing with that is how you really learn. Some books have specially marked difficult exercises; I tend to skip those personally, since the normal ones take enough time and I'll usually have some personal project in which to use the knowledge on something more complex. However, this "git gud" aspect doesn't excuse everything: Some maths books have a questionable attitude about making things arbitrarily difficult or unclear - I'd phrase this as "withholding scaffolding". Some, and these could well be the same books, are sloppy about presenting readers with unfair (not the same as challenging) exercises, where you need some factoid or foreknowledge a reader might not have. I'm avoiding recommending any of those, at least without caveats - I think they're didactically awful, especially for independent study, and suspect they play a significant role in widespread negative attitudes to maths.

Preparation for undergraduate statistics modules

  1. Back-to-basics to make sure you're not missing anything that'll trip you up later: The first four parts of the MathTrackX series on edX: Polynomials, Functions and Graphs; Special Functions; Differential Calculus; Integral Calculus. This is about having the "school maths" and numeracy that everything else will assume is known, similarly to literacy. If you find you need to go back further, to arithmetic or wherever your gaps begin, then that's where to start instead!
  2. A first introduction to logic, proofs, set theory, and Boolean algebra (including truth tables); e.g., chapters 1 - 3 of Devlin's Sets, Functions, and Logic: An Introduction to Abstract Mathematics. This is kind of "the language of mathematics" that is used all the time in later steps.
  3. Any introduction to very basic linear algebra and matrices. E.g., Savov's No Bullshit Guide to Linear Algebra, chapters 2 and 3. This is mostly to learn the simple but critical language of how data are typically represented and organized - in columns and rows - and later on there will also be necessary maths building on these basic concepts.
  4. Introduction to Probability (STAT110x) on edX: Really well designed and accessible online introduction to foundational probability maths. While statistics isn't the same as probability, statistics is fundamentally about probabilities, so having this basis will make your life infinitely easier in stats modules.
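
To give a flavour of the truth tables mentioned in step 2, here is a minimal Python sketch using only the standard library. It is purely illustrative and not taken from Devlin's book; the helper name implies and the choice of De Morgan's law as the thing to check are assumptions made for this example.

```python
from itertools import product


def implies(p, q):
    """Material implication: 'p implies q' is false only when p is true and q is false."""
    return (not p) or q


# Build one row per combination of truth values for p and q.
rows = []
for p, q in product([True, False], repeat=2):
    rows.append((p, q, p and q, p or q, implies(p, q)))

# Print the table with one column per expression.
header = ("p", "q", "p and q", "p or q", "p implies q")
print(" | ".join(f"{name:11}" for name in header))
for row in rows:
    print(" | ".join(f"{str(value):11}" for value in row))

# A truth table doubles as a brute-force proof: check De Morgan's law,
# not (p and q) == (not p) or (not q), on every row.
de_morgan_holds = all(
    (not (p and q)) == ((not p) or (not q))
    for p, q in product([True, False], repeat=2)
)
print("De Morgan's law holds on every row:", de_morgan_holds)
```

Checking every row like this only works because there are finitely many combinations of truth values; the proof techniques in step 2 are what take over once that is no longer the case.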

Preparation for statistics for social scientists

  1. Introduction to Probability by Blitzstein & Hwang. This is the book the STAT110x online course is based on, and the next step after completing the course is to really work through the book, including the (standard) exercises - you can trust the authors that these are doable. It's a time investment but I found it very worth it, as someone who wanted to start properly understanding scientific methods. Subsequent topics in this list will assume a good grasp of the probability concepts covered here (although which specific ones will vary per topic). I'd suggest studying up to at least conditional expectations, since those will come up a lot in basic statistical methods.
  2. Further linear algebra, needed for subsequent techniques like regression or Principal Component Analysis. E.g., chapters 4 through 6.6 of Savov's No Bullshit Guide to Linear Algebra. A full, maximally rigorous treatment would be good to add here but I haven't yet found one I'd wholeheartedly recommend for that.
  3. Regression by Bingham & Fry. This fully covers linear regression, including the contents of the usual mathematical black box where Psychology statistics teaching ends.
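
As a rough illustration of what sits inside that black box, and of why the linear algebra in step 2 is needed for it, here is a minimal Python sketch of simple linear regression solved via the normal equations. It uses only the standard library, simulated data, and made-up names (sx, sxx, slope, and so on); it is a sketch under those assumptions, not the presentation used by Bingham & Fry, and in practice you would use a numerical library rather than inverting a matrix by hand.

```python
import random

random.seed(1)

# Simulate data from y = 1 + 2x + noise (the "true" values are chosen arbitrarily).
n = 200
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5) for xi in x]

# The design matrix X has one row (1, x_i) per observation.
# Least squares solves the normal equations (X'X) b = X'y, where
#   X'X = [[n,      sum(x)  ],      X'y = [sum(y),
#          [sum(x), sum(x^2)]]             sum(x*y)]
sx = sum(x)
sxx = sum(xi * xi for xi in x)
sy = sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# Invert the 2x2 matrix X'X explicitly and multiply by X'y.
det = n * sxx - sx * sx
intercept = (sxx * sy - sx * sxy) / det
slope = (n * sxy - sx * sy) / det

# Cross-check against the textbook formula: slope = cov(x, y) / var(x).
mean_x = sx / n
mean_y = sy / n
slope_check = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept_check = mean_y - slope_check * mean_x

print("normal equations:", intercept, slope)
print("textbook formula:", intercept_check, slope_check)
```

The two routes agree up to floating-point error, and the estimates should land near the simulated values of 1 and 2. The matrix version is the one that generalizes to several predictors.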

Further reading

  1. Principal Component Analysis (PCA): e.g., chapter 15 in Shalizi's Advanced Data Analysis from an Elementary Point of View. PCA is used a lot inside other methods and serves as an exemplar of dimension reduction in general.
  2. The Elements of Statistical Learning by Hastie, Tibshirani and Friedman. This isn't particularly accessibly written (although that's all relative; let's say it won't handhold and will assume foreknowledge or a willingness to look up further information) but it's the Bible of machine learning - I recognized a lot of the book in other texts on machine learning after reading it. I just read this through for the concepts and used relevant parts for reference when working in detail on something.
  3. Sutton & Barto's Reinforcement Learning. The Bible of reinforcement learning, working up to, e.g., actor-critic models.
  4. Probability and Measure by Billingsley. Another one I just read through rather than doing the exercises, but even that's worth it to see what the fuss is about and to not be intimidated by talk about "sigma algebras".
  5. Mendelson's Introduction to Topology. Maybe beyond what's strictly needed for data analysis purposes, so more for general interest, but I thought this was an amazingly well-written book, especially the didactic thought that clearly went into the exercises (the last chapter is maybe a little spottier). It covers ideas about the "skin" connecting elements of sets to turn them into spaces with a concept of nearness; lots of vaguely familiar concepts get fully explained. (Be aware that, based on reviews I read, the specific terminology is dated.)