Maths foundations for statistics and machine learning

While (and after) being a COVID hermit, I took (or re-took) a journey starting from basic maths, trying to use the momentum to get better at the mathematics of statistics and machine learning. It was a winding path but, with hindsight, the list below shows the steps I would have taken in an ideal world (in order, for the first two sub-lists), with the resources I found most helpful along the way. In a few cases (Devlin, Kline, Jones & Jones) these are resources I didn't use myself originally but found when looking for a reference to fill out a point on the list.

The first sub-list could be good preparation for statistics modules in e.g. an undergraduate psychology course. The idea of the second sub-list is to interweave general maths with the probability maths that's probably the motivating material for the intended reader; this sub-list is what I think should be covered at the latest by the end of a social science PhD, so you have a good enough foundation to understand typical statistics and read up on further topics and methods relatively efficiently.

Note: While I've excluded almost all books that are such didactic garbage they're pretty much sabotaging their readers (I've made allowances for what happens in "starred" sections, and there's one noted exception because the good bits are really good), only a few are really good about helping readers select necessary subsets of exercises. So there's some judgement needed; don't let ego get in the way of being efficient with time, e.g., by looking up information or solutions online at some points (it can be fun and maybe even important to heroically struggle and transcend, but not if it turns out you're just being blocked by some "cute" exercise you need some unmentioned, non-obvious factoid for).

The list

General prerequisites
1. Back-to-basics to make sure you're not missing anything that'll trip you up later: The first four parts of the MathTrackX series on edX: Polynomials, Functions and Graphs; Special Functions; Differential Calculus; Integral Calculus. These are about having the "school maths" and numeracy that everything else will assume is known, similarly to literacy. If you find you need to go back further, to arithmetic or wherever you need to start, then that's where to start instead!
2. Any introduction to very basic linear algebra and matrices. E.g., Savov's No Bullshit Guide to Linear Algebra, chapters 2 and 3. This is mostly to learn the simple but critical language of how data are typically represented and organized - in columns and rows - and later on there will also be necessary maths building on these basic concepts.
3. An introduction to the language of proofs and sets: Section 1.16 of Savov.
4. Introduction to Probability (STAT110x) on edX: Really well designed and accessible online introduction to foundational probability maths. While statistics isn't the same as probability, statistics is fundamentally about probabilities, so having this basis will make your life infinitely easier in stats modules.

Fundamental probability and statistics
1. Just for awareness at this point, to avoid confusion since they're briefly alluded to here and there: Section 1.14 of Savov covering complex numbers (depending on how secure you are in the "school maths" from the first sub-list, it could be worth running through all of Chapter 1 to avoid gaps).
2. An introduction to logic, proofs, set theory, and Boolean algebra (including truth tables); e.g., chapters 1 - 3 of Devlin's Sets, Functions, and Logic: An Introduction to Abstract Mathematics. This is the more formal mathematical language used in sources in further steps below; statistics modules and textbooks might also implicitly assume at least some familiarity with the concepts.
3. Introduction to Probability by Blitzstein & Hwang, chapters 1 - 4. This is the book the STAT110x online course is based on, and the next step after completing the course is to really work through the book, including the (standard) exercises. You can trust the authors that they're doable; more difficult exercises are clearly marked. It's a time investment but I found it very worth it, as someone who wanted to start properly understanding scientific methods. Subsequent topics in this list will assume a good grasp of probability concepts covered in the book (although which specific ones will vary per topic).
4. Quick digression specifically for the arithmetic series, since that'll be assumed to be known below. This intro video on Khan Academy covers it.
5. Revisit and consolidate calculus, since the next bits on probability will heavily involve that. I haven't found anything I've fully vetted and been satisfied with, but Kline's Calculus: An Intuitive and Physical Approach gives a great start with chapters 1 through 9 (the non-starred subsections), explaining a lot of points in a helpful conversational/lecture style. You need 1 through 12 though for the next part of Blitzstein & Hwang. (My main issue with Kline concerns the chapters on trigonometry, 10 and 11, which are a bit of a slog that doesn't quite feel optimal or necessary to me, with all the secants and cosecants etc. in addition to just sines and cosines; but the chapters aren't skippable since they introduce general points.)
6. Introduction to Probability by Blitzstein & Hwang, chapter 5.
7. Further calculus up to the basic idea of multiple integrals. Finishing Kline's Calculus: An Intuitive and Physical Approach is currently still the best single option I'm aware of (again, just the standard, non-starred sections). It's a fair bit of material but I think you can get away with reading through more than doing many exercises for current purposes.
8. Introduction to Probability by Blitzstein & Hwang, chapter 6 onwards. I'd suggest studying up to at least conditional expectations, since those will come up a lot in basic statistical methods.
9. Further linear algebra, needed for subsequent techniques like regression or Principal Component Analysis. E.g., chapters 4 through 6.6 of Savov's No Bullshit Guide to Linear Algebra. I'd also recommend starting on the great Strang lectures here, up to projections and least squares; but be aware that the best edition of the associated book might not be the latest (6th) one (the online course materials refer to the 4th and 5th, most consistently the 4th which is what the assignment numbers refer to, although the assignments are repeated in the solution PDFs anyway).
10. Regression by Bingham & Fry. This fully covers linear regression, including the contents of the usual mathematical black box where Psychology statistics teaching ends.
11. Finish the Strang lectures.
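
To make the link between "projections and least squares" and linear regression in the steps above a bit more concrete, here is a minimal sketch in plain Python. The data and the function name fit_line are made up for illustration and do not come from any of the books; the point is only that the least-squares residuals end up orthogonal to the columns of the design matrix (a column of ones and the x column), which is the projection picture of regression.

```python
# Simple linear regression y ~ intercept + slope * x, fitted by least squares.
# The data and the name fit_line are hypothetical, for illustration only.

def fit_line(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# Projection view: the residual vector is orthogonal to both columns of the
# design matrix, so both dot products are zero up to rounding error.
dot_with_ones = sum(residuals)
dot_with_x = sum(r * x for r, x in zip(residuals, xs))

print(intercept, slope)
print(dot_with_ones, dot_with_x)
```

With more than one predictor the same idea is usually written with matrices (the normal equations), which is where the linear algebra from the earlier steps comes in.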

Further statistics and machine learning
- Programming for Computations - Python: A Gentle Introduction to Numerical Simulations with Python 3.6. Very nice introduction to computational methods using Python, including numerical methods for integration and differential equations.
- Principal Component Analysis (PCA): e.g., chapter 15 in Shalizi's Advanced Data Analysis from an Elementary Point of View. PCA is used a lot inside other methods and serves as an exemplar of dimension reduction in general.
- The Elements of Statistical Learning by Hastie, Tibshirani and Friedman. This isn't particularly accessibly written (although that's all relative! but let's say it won't handhold and will assume foreknowledge or willingness to look up further information) but it's the Bible of machine learning - I recognized a lot of the book in other texts on machine learning after reading it. I just read this through for the concepts and used relevant parts for reference when working in detail on something.
- Sutton & Barto's Reinforcement Learning. The Bible of reinforcement learning, working up to, e.g., actor-critic models.
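
As a small illustration of what the PCA item above means by dimension reduction, here is a minimal sketch for two-dimensional data in plain Python. It is not taken from Shalizi or any other book in the list; the function name pca_2d and the data are hypothetical, and it relies on the closed-form eigenvalues of a symmetric 2x2 matrix, so it does not generalise to more dimensions (where a linear algebra library would normally be used).

```python
import math


def pca_2d(points):
    """Return [(variance, direction), ...] for 2-D points, largest variance first."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centred data.
    a = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    if b == 0:
        # Uncorrelated coordinates: the principal axes are the original axes.
        axes = [(a, (1.0, 0.0)), (c, (0.0, 1.0))]
        return sorted(axes, key=lambda item: item[0], reverse=True)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    mid = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    components = []
    for lam in (mid + spread, mid - spread):
        # (b, lam - a) is an eigenvector for eigenvalue lam when b != 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        components.append((lam, (vx / norm, vy / norm)))
    return components


# Made-up points lying exactly on the line y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
(var1, dir1), (var2, dir2) = pca_2d(points)
print(var1, dir1)
print(var2, dir2)
```

Because these points lie on a single line, the first component points along that line and carries all of the variance, while the second carries essentially none; dropping it reduces the data from two dimensions to one without losing information.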

Continued maths
- Probability and Measure by Billingsley. One I just skim-read versus doing exercises, but even that's worth it to see what the fuss is about and to not be intimidated by talk about "sigma algebras".
- Mendelson's Introduction to Topology. Maybe beyond what's strictly needed for basic data analysis purposes, so more for general interest and a foundation for getting into more advanced maths, but I thought this was an amazingly well-written book, especially the didactic thought that clearly went into exercises (the very last chapter maybe a little spottier). It covers ideas about the "skin" connecting elements of sets to turn them into spaces with a concept of nearness; lots of vaguely familiar concepts get fully explained. (For awareness: based on reviews I read, the specific terminology is dated.)
- An introduction to the very basics of number theory: Jones and Jones' Elementary Number Theory, chapters 1 through 3. This is included here mainly for completeness and since it could pop up in exercises in other books as expected knowledge.
- Spivak's Calculus. This is one you might want to read at some point but only when you're ready for it (I found it had some didactic landmines in it) - basically when you've already done "normal" calculus courses but now want everything to be really solidly mathematically grounded, and won't get sabotaged by questionable sections.
- Edwards' Advanced Calculus: A Differential Forms Approach. At the time of writing I've just started this, but it's yet another pre-21st century gem so far in actually properly explaining stuff (especially demystifying the symbols around differential forms and integrals, so the maths you apparently need for flows and forcefields in arbitrary shapes), so I wanted to mention it.