Maths foundations for statistics and machine learning
While (and after) being a COVID hermit, I took (or re-took) a journey starting from basic maths, trying to use the momentum to get better at the mathematics of statistics and machine learning. It was a winding path but, with hindsight, the list below shows the steps I would have taken in an ideal world (in order, for the first two sub-lists), with the resources I found most helpful along the way (or, in some cases - Devlin, Kline, Jones & Jones - ones I didn't use myself originally but found when looking for a reference to fill out a point on the list). The first sub-list could be good preparation for the statistics modules in, e.g., an undergraduate psychology course. For a few of the steps I've also written out a small worked sketch after the list, to make the key idea concrete.

The list
- General prerequisites
- Back to basics, to make sure you're not missing anything that'll trip you up later: the first four parts of the MathTrackX series on edX: Polynomials, Functions and Graphs; Special Functions; Differential Calculus; Integral Calculus. These are about having the "school maths" and numeracy that everything else will assume is known, much as literacy is assumed. If you find you need to go back further, to arithmetic or wherever you need to start, then that's where to start instead!
- Any introduction to very basic linear algebra and matrices. E.g., Savov's No Bullshit Guide to Linear Algebra, chapters 2 and 3. This is mostly to learn the simple but critical language of how data are typically represented and organized - in columns and rows (see the first sketch after the list) - and later steps will also build necessary maths on these basic concepts.
- An introduction to the language of proofs and sets: Section 1.16 of Savov.
- Introduction to Probability (STAT110x) on edX: Really well designed and accessible online introduction to foundational probability maths. While statistics isn't the same as probability, statistics is fundamentally about probabilities, so having this basis will make your life infinitely easier in stats modules.
- Fundamental probability and statistics
- Just for awareness at this point, to avoid confusion since they're briefly alluded to here and there: Section 1.14 of Savov covering complex numbers (depending on how secure you are in the "school maths" from the first sub-list, it could be worth running through all of Chapter 1 to avoid gaps).
- An introduction to logic, proofs, set theory, and Boolean algebra (including truth tables - there's a small example after the list); e.g., chapters 1 - 3 of Devlin's Sets, Functions, and Logic: An Introduction to Abstract Mathematics. This is the more formal mathematical language used in the sources in further steps below; statistics modules and textbooks might also implicitly assume at least some familiarity with the concepts.
- Introduction to Probability by Blitzstein & Hwang, chapters 1 - 4. This is the book the STAT110x online course is based on, and the next step after completing the course is to really work through the book, including the (standard) exercises - you can trust the authors that they're doable, and the more difficult exercises are clearly marked. It's a time investment, but I found it very worth it as someone who wanted to start properly understanding scientific methods. Subsequent topics in this list will assume a good grasp of the probability concepts covered in the book (although which specific ones will vary per topic).
- A quick digression specifically for the arithmetic series, since that'll be assumed to be known below. This Intro video on Khan Academy covers it; the key formula is also written out after the list.
- Revisit and consolidate basic calculus, since the next stretch of probability will lean on it heavily (see the short density example after the list): Kline's Calculus: An Intuitive and Physical Approach, sections 1, 2, 3.1 - 3.2, 5.1 - 5.2 and 5.4, 6.3 - 6.7, 7.1 - 7.2 and 7.7, 8.1 - 8.2, 9.1 - 9.4, 12.1 - 12.4, 20.1 - 20.6. In this context the other sections are more useful for general awareness than for actually needing the machinery for finding formulas for integrals yourself, so I think they can be skimmed or returned to if necessary (and sections noted as non-essential for continuity skipped), in order to get back to Blitzstein & Hwang.
- Introduction to Probability by Blitzstein & Hwang, chapter 5 onwards. I'd suggest studying up to at least conditional expectation, since that will come up a lot in basic statistical methods (the key identity is written out after the list).
- Further linear algebra, needed for subsequent techniques like regression or Principal Component Analysis. E.g., chapters 4 through 6.6 of Savov's No Bullshit Guide to Linear Algebra. I'd also recommend starting on the great Strang lectures here, up to projections and least squares; but be aware that the best edition of the associated book might not be the latest (6th) one. The online course materials refer to the 4th and 5th editions - most consistently the 4th, which is what the assignment numbers refer to, although the assignments are repeated in the solution PDFs anyway.
- Regression by Bingham & Fry fully covers linear regression, including the contents of the usual mathematical black box where Psychology statistics teaching ends (a minimal least-squares sketch appears after the list).
- Finish the Strang lectures.
- Further statistics and machine learning
- Principal Component Analysis (PCA): e.g., chapter 15 in Shalizi's Advanced Data Analysis from an Elementary Point of View. PCA is used a lot inside other methods and serves as an exemplar of dimension reduction in general (there's a small computational sketch after the list).
- The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman. It isn't particularly accessibly written (although that's all relative - let's say it won't hold your hand, and it assumes foreknowledge or a willingness to look things up), but it's the Bible of machine learning: I recognized a lot of the book in other texts on machine learning after reading it. I just read it through for the concepts and used the relevant parts for reference when working in detail on something.
- Sutton & Barto's Reinforcement Learning. The Bible of reinforcement learning, working up to, e.g., actor-critic models.
- Continued maths
- Probability and Measure by Billingsley. Another one I just read rather than doing the exercises, but even that is worth it to see what the fuss is about and to stop being intimidated by talk of "sigma algebras" (the definition, which is less scary than it sounds, is written out after the list).
- Mendelson's Introduction to Topology. Maybe beyond what's strictly needed for data analysis purposes, so more for general interest, but I thought this was an amazingly well-written book, especially the didactic thought that clearly went into the exercises (the last chapter is maybe a little spottier). It covers ideas about the "skin" connecting elements of sets to turn them into spaces with a concept of nearness; lots of vaguely familiar concepts get fully explained. (For awareness: based on reviews I've read, some of the specific terminology is dated.)
- An introduction to the very basics of number theory: Jones and Jones' Elementary Number Theory, chapters 1 through 3. This is included here mainly for completeness and since it could pop up in exercises in other books as expected knowledge.
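A few worked sketches, as promised in the list. First, to make the rows-and-columns point from the basic linear algebra item concrete: a minimal sketch of a tiny data set stored as a matrix, with each row an observation and each column a variable. The numbers are made up purely for illustration, and NumPy is just one convenient way to write it down.

```python
import numpy as np

# Made-up mini data set, purely for illustration:
# each row is one observation (a person), each column is one variable
# (age in years, height in cm, test score).
X = np.array([
    [23, 170, 51.0],
    [31, 182, 47.5],
    [27, 165, 55.2],
])

n_obs, n_vars = X.shape        # 3 observations (rows), 3 variables (columns)
col_means = X.mean(axis=0)     # the mean of each variable, taken down the rows
print(n_obs, n_vars)
print(col_means)
```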
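Next, a taste of the truth-table material from the logic item. The table for implication is the one that most often feels odd at first: p ⇒ q is false only when p is true and q is false.

```latex
\[
\begin{array}{cc|c}
p & q & p \Rightarrow q \\
\hline
T & T & T \\
T & F & F \\
F & T & T \\
F & F & T
\end{array}
\]
```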
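The arithmetic series fact from the Khan Academy digression: for a sequence with first term a_1 and common difference d, the sum of the first n terms is

```latex
\[
S_n = \sum_{k=1}^{n} \bigl(a_1 + (k-1)d\bigr)
    = \frac{n}{2}\bigl(2a_1 + (n-1)d\bigr)
    = \frac{n}{2}(a_1 + a_n),
\qquad \text{e.g. } 1 + 2 + \dots + n = \frac{n(n+1)}{2}.
\]
```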
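Why the calculus refresher matters for the probability chapters: continuous distributions are handled by integrating densities. One standard example (the exponential distribution, which Blitzstein & Hwang cover) with rate λ > 0:

```latex
\[
f(x) = \lambda e^{-\lambda x} \quad (x \ge 0), \qquad
P(X > t) = \int_{t}^{\infty} \lambda e^{-\lambda x}\, dx = e^{-\lambda t}, \qquad
E[X] = \int_{0}^{\infty} x\, \lambda e^{-\lambda x}\, dx = \frac{1}{\lambda}.
\]
```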
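The conditional expectation identity I'd single out from the later Blitzstein & Hwang chapters is the law of total expectation (Adam's law, in their terminology): an overall mean is a weighted average of conditional means.

```latex
\[
E[Y] = E\bigl[E[Y \mid X]\bigr] = \sum_{x} E[Y \mid X = x]\, P(X = x)
\quad \text{(the sum becomes an integral for continuous } X\text{)}.
\]
```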
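For the projections/least-squares and regression items, a minimal sketch of ordinary least squares as a projection: the coefficients solve the normal equations X^T X b = X^T y, and the fitted values are the projection of y onto the column space of X. The data here are made up, and in practice you'd reach for np.linalg.lstsq or a statistics library rather than the normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: one predictor x, plus noise around a "true" line.
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Design matrix: a column of ones (intercept) and the predictor.
X = np.column_stack([np.ones(n), x])

# Normal equations X^T X b = X^T y; solve rather than invert, for stability.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values = projection of y onto the column space of X.
y_hat = X @ b
residuals = y - y_hat

print("intercept and slope:", b)
# The residuals are (numerically) orthogonal to the columns of X:
print("X^T residuals:", X.T @ residuals)
```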
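For the PCA item, a minimal sketch of the underlying computation (centre the data, eigendecompose the covariance matrix, project onto the leading eigenvectors), again with made-up data; in practice you'd typically use an SVD or scikit-learn's PCA.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: 200 observations of 3 variables, two of them correlated.
Z = rng.normal(size=(200, 2))
X = np.column_stack([Z[:, 0],
                     0.8 * Z[:, 0] + 0.2 * Z[:, 1],
                     rng.normal(size=200)])

Xc = X - X.mean(axis=0)                  # centre each variable
cov = (Xc.T @ Xc) / (Xc.shape[0] - 1)    # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric matrices, ascending order
order = np.argsort(eigvals)[::-1]        # re-order by variance explained, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs[:, :2]             # project onto the first two principal components
explained = eigvals / eigvals.sum()

print("proportion of variance explained:", explained)
print("scores shape:", scores.shape)
```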
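Finally, since the measure-theory item mentions it: a sigma algebra F on a set Ω is nothing more than a collection of subsets of Ω (the events you're allowed to assign probabilities to) that contains Ω and is closed under complements and countable unions. In symbols:

```latex
\[
\Omega \in \mathcal{F}; \qquad
A \in \mathcal{F} \implies A^{c} \in \mathcal{F}; \qquad
A_1, A_2, \ldots \in \mathcal{F} \implies \bigcup_{i=1}^{\infty} A_i \in \mathcal{F}.
\]
```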