The matrix calculus you need for deep learning (2018)
224 points by cpp_frog 1 year ago | 40 comments
- dang 1 year ago: Related:
The matrix calculus you need for deep learning (2018) - https://news.ycombinator.com/item?id=26676729 - April 2021 (40 comments)
Matrix calculus for deep learning part 2 - https://news.ycombinator.com/item?id=23358761 - May 2020 (6 comments)
Matrix Calculus for Deep Learning - https://news.ycombinator.com/item?id=21661545 - Nov 2019 (47 comments)
The Matrix Calculus You Need for Deep Learning - https://news.ycombinator.com/item?id=17422770 - June 2018 (77 comments)
Matrix Calculus for Deep Learning - https://news.ycombinator.com/item?id=16267178 - Jan 2018 (81 comments)
- quanto 1 year ago: The article/webpage is a nice walk-through for the uninitiated. Half the challenge of doing matrix calculus is remembering the dimension of the object you are dealing with (scalar, vector, matrix, higher-dimensional tensor).
Ultimately, the point of using matrix calculus (or matrices in general) is not just concision of notation but also understanding that matrices are operators acting on members of some spaces, i.e. vectors. It is this higher level abstraction that makes matrices powerful.
For people who are familiar with the concepts but need a concise refresher, the Wikipedia page serves well: https://en.wikipedia.org/wiki/Matrix_calculus
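To make that dimension bookkeeping concrete, here is a minimal NumPy sketch (an illustration, not from the article): for f(x) = Wx with W of shape (m, n), the Jacobian df/dx has shape (m, n) and is W itself, and checking shapes like this catches most chain-rule mistakes.

    import numpy as np

    m, n = 3, 5
    rng = np.random.default_rng(42)
    W = rng.standard_normal((m, n))
    x = rng.standard_normal(n)

    f = lambda v: W @ v  # f: R^n -> R^m, so the Jacobian must be m x n

    # Build the Jacobian numerically, one column per input coordinate.
    eps = 1e-6
    J = np.empty((m, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x)) / eps

    assert J.shape == (m, n)             # a scalar-valued f would give shape (n,) instead
    assert np.allclose(J, W, atol=1e-4)  # for a linear map, the Jacobian is W itself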
- PartiallyTyped 1 year ago: Adding to this: these operators are also "polymorphic"; for matrix multiplication the only operations you need are (non-commutative) multiplication and addition, so you can use elements of any non-commutative ring, i.e. a set of elements with those two operations :D
Matrices themselves form non-commutative rings too; and based on this, you can think of a 4N x 4N matrix as a 4x4 matrix whose elements are NxN matrices [1] :D
[1] https://youtu.be/FX4C-JpTFgY?list=PL49CF3715CB9EF31D&t=1107
You already know whose lecture it is :D
I love math... I should have become a mathematician...
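A quick numerical check of the "matrix whose elements are matrices" view (a minimal NumPy sketch; it uses a 2x2 block structure rather than 4x4 for brevity): multiplying two 2Nx2N matrices block-wise, treating the NxN blocks as ring elements, agrees with ordinary matrix multiplication.

    import numpy as np

    N = 3
    rng = np.random.default_rng(1)
    A = rng.standard_normal((2 * N, 2 * N))
    B = rng.standard_normal((2 * N, 2 * N))

    def blocks(M):
        # View a 2Nx2N matrix as a 2x2 "matrix" whose entries are NxN matrices.
        return [[M[i * N:(i + 1) * N, j * N:(j + 1) * N] for j in range(2)]
                for i in range(2)]

    Ab, Bb = blocks(A), blocks(B)

    # Multiply the 2x2 block matrices using only the ring operations (+, @).
    C = np.block([[sum(Ab[i][k] @ Bb[k][j] for k in range(2)) for j in range(2)]
                  for i in range(2)])

    assert np.allclose(C, A @ B)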
- tikhonj 1 year ago: You can even generalize linear algebra algorithms to closed semirings and have some really cool algorithms pop out, like finding shortest paths in graphs. There's a great paper called "Fun with Semirings" that goes into more detail; unfortunately it looks like the PDF isn't easily available online any more, but I found some slides [1] that seem to cover the same ideas well enough.
[1]: https://pdfs.semanticscholar.org/2e43/477e26a54b2d1a046c2140...
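For a flavor of how shortest paths pop out, here is a minimal sketch of the standard min-plus (tropical) semiring trick, not taken from the paper or the slides: replace (+, x) with (min, +), and repeatedly "squaring" the edge-weight matrix converges to the all-pairs shortest-path distances.

    import numpy as np

    INF = np.inf
    # Edge weights of a small directed graph; INF = no edge, 0 on the diagonal.
    D = np.array([[0,   5,   INF, 10 ],
                  [INF, 0,   3,   INF],
                  [INF, INF, 0,   1  ],
                  [INF, INF, INF, 0  ]])

    def min_plus(X, Y):
        # "Matrix product" over the (min, +) semiring.
        n = X.shape[0]
        return np.array([[np.min(X[i, :] + Y[:, j]) for j in range(n)]
                         for i in range(n)])

    A = D
    for _ in range(2):  # ceil(log2(4)) squarings reach all simple paths
        A = min_plus(A, A)

    assert A[0, 2] == 8  # 0 -> 1 -> 2
    assert A[0, 3] == 9  # 0 -> 1 -> 2 -> 3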
- PartiallyTyped 1 year ago: Okay, I went over the slides, and good lord, this would have made my life easier not too long ago.
- PartiallyTyped 1 year ago: This deserves its own HN post, imho.
- mrfox321 1 year ago: Re [1]: it's fairly concrete to simply say that matrix multiplication can be performed block-wise.
- PartiallyTyped 1 year ago: I don't disagree, but that is just an example of matrix multiplication. The gist is not that you can do block multiplication, but that you can define matrices over any non-commutative ring, which includes other matrices, i.e. blocks.
- SnooSux 1 year ago: This is the resource I wish I had in 2018. Every grad school course had a Linear Algebra review lecture but never got into the Matrix Calculus I actually needed.
- ayhanfuat 1 year ago: That was my struggle, too. Imperial College London has a small online course which covers similar topics (https://www.coursera.org/learn/multivariate-calculus-machine...). It helped a lot.
- unpaddedantacid 1 year ago: I just finished my first year of an AI bachelor's. We covered Linear Algebra with basic matrix calculations and theorems, so much calculus that the notes take up 3GB of space, physics, psychology, very outdated logic classes, and a Python basics course that left many of the students wondering how to import a library.
- dpflan 1 year ago: True, this was a designated resource during my studies (2020/2022), but they were post-2018.
- cs702 1 year ago: Please change the link to the original source:
https://arxiv.org/abs/1802.01528
---
EDIT: It turns out explained.ai is the personal website of one of the authors, so there's no need to change the link. See comment below.
- parrt 1 year ago: :) Yeah, I use my own internal markdown to generate really nice HTML (with fast LaTeX-derived images for equations) and then full-on LaTeX. (The tool is https://github.com/parrt/bookish.)
I prefer reading on the web unless I'm offline. The LaTeX is super handy for printing a nice document.
- cs702 1 year ago: Even though it's shockingly common, I never cease to be surprised and delighted when authors who are on HN take the time to reply to comments about their work.
Thank you for doing this with Jeremy and sharing it with the world!
- parrt 1 year ago: Sure thing! Very enjoyable to have people use our work.
- liorben-david 1 year ago: Explained.ai seems to be Terence Parr's personal site.
- cs702 1 year ago: Thank you for pointing it out. I edited my comment.
- trolan 1 year ago: I finished Vector Calculus last year and have no experience in machine learning, but this seems exceptionally thorough. It would have made my life easier to have a practical explanation rather than a purely mathematical one, but woe is the life of the engineering student, I guess.
- parrt 1 year ago: Glad to be of assistance! Yeah, it really annoyed me that this critical information was not collected in any one particular spot.
- rdedev 1 year ago: I had followed this when I was learning DL through Andrew Ng's course. In one of the lessons, he had the formula for calculating the loss as well as its derivatives.
I tried deriving those formulas from scratch using what I learned from OP's post, but it felt like something was missing. I think it boils down to me not knowing how to aggregate the element-wise derivatives into matrix form. It was the Matrix Cookbook and certain notes from Stanford's CS231n that helped me grok it fully.
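For anyone stuck on the same step, the aggregation is easiest to see for a single dense layer: with z = Wx, the element-wise facts dL/dW[i][j] = dL/dz[i] * x[j] collect into the one-line matrix formula dL/dW = (dL/dz) x^T. A minimal NumPy sketch (an illustration, not from the course or the post):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 4))
    x = rng.standard_normal(4)

    z = W @ x                       # dense layer, no bias
    dL_dz = rng.standard_normal(3)  # stand-in for the upstream gradient

    # Element-wise: dL/dW[i][j] = dL/dz[i] * x[j]
    dL_dW_loop = np.empty_like(W)
    for i in range(3):
        for j in range(4):
            dL_dW_loop[i, j] = dL_dz[i] * x[j]

    # Aggregated matrix form: a single outer product.
    dL_dW = np.outer(dL_dz, x)

    assert np.allclose(dL_dW, dL_dW_loop)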
- bluerooibos 1 year ago: Oh nice, I did most of this in school, and during my non-CS engineering degree. Thanks for sharing!
Always wanted to dip my toes into ML, but I've never been convinced of its usefulness to the average solo developer, in terms of things you can build with this new knowledge. Likely I don't know enough about it to make that call, though.
- williamcotton 1 year ago: Here's an ML project I've been working on as a solo dev:
https://github.com/williamcotton/chordviz
Labeling software in React, a CNN in PyTorch, and prediction in a SwiftUI app. 12,000 and counting hand-labeled images of my hand on a guitar fretboard!
- godelski 1 year ago: There are two common beliefs: that you don't need math for ML, and that you need a lot of math for ML. So let me clarify:
You don't need math to make a model perform well, but you do need math to know why your model is wrong.
- nsajko 1 year ago: Another matrix math reference: https://github.com/r-barnes/MatrixForensics
- _the_inflator 1 year ago: I just took a quick look at it. A good summary.
It seems these topics are covered in the first one or two semesters of a math degree. Of course, university goes a bit deeper.
- jayro 1 year ago: We just released a comprehensive online course on Multivariable Calculus (https://mathacademy.com/courses/multivariable-calculus), and we also have a course on Mathematics for Machine Learning (https://mathacademy.com/courses/mathematics-for-machine-lear...) that covers just the matrix calculus you need, in addition to just the linear algebra and statistics you need, etc. I'm a founder and would be happy to answer any questions you might have.
- thewataccount 1 year ago: I understand you don't have a free trial, but is there any chance you have a demo somewhere of what it actually looks like? Like a tiny sample lesson or something along those lines? It looks interesting, but I'm just uncertain as to what it actually "feels" like in practice versus, say, Brilliant, etc.
I only see pictures; I'm curious about the extent of the interaction in the linear algebra/matrix calc specifically.
- jayro 1 year ago: That's a good point! We definitely need to add some more information to the website. In the meantime, if you send an email to support@mathacademy.com, I'd be happy to give you a demo over Zoom and answer any questions you might have.
- barrenko 1 year ago: Whom do you think Mathematics for Machine Learning benefits? In my opinion, the plethora of courses and articles available in that regard is useful mostly to people who recently went through college-level Linear Algebra.
I'd like more resources geared toward people who are done with Khan Academy and want something as well made for more advanced topics.
- jayro 1 year ago: The Mathematics for Machine Learning course doesn't assume knowledge of Linear Algebra; it covers the basics of Linear Algebra you'll need, along with the basics of Multivariable Calculus, Statistics, Probability, etc. It does, however, assume knowledge of high-school math and Single Variable Calculus. If you've been out of school for a while, our adaptive diagnostic exam will identify your knowledge gaps and create a custom course for you that includes the necessary remediation.
If you're REALLY rusty (maybe you've been out of school for 5+ years), or maybe you just never learned the material that well in the first place, then you might want to start with one of our Mathematical Foundations courses, which will scaffold you up to the level where you can handle the content in Mathematics for Machine Learning. More info can be found here: https://mathacademy.com/courses
The Mathematics for Machine Learning course would be ideal for anyone who majored in a STEM subject like CS (or at least has a solid mathematical foundation) and is interested in doing work in machine learning.
- barrenko 1 year ago: Appreciate the reply; hopefully I'll be subscribing to your service at the beginning of next year (after I'm done with Khan Academy math).
- thatsadude 1 year ago: vec(ABC) = kron(C.T, A) vec(C) is all you need for matrix calculus!
- esafak 1 year ago: Can anyone provide an intuitive explanation?
- fjkdlsjflkds 1 year ago: I guess OP meant "vec(ABC) = kron(C.T, A) vec(B)", and my attempt at explaining it would be:
If you transform B by multiplying it by A on the left and by C on the right, and then vectorize the result (stack its columns), you get the same vector as first vectorizing B and then multiplying by the block matrix obtained as the Kronecker product of C transposed and A.
The significance is that it reduces matrix calculus to vector calculus (i.e., it shows that you can convert any matrix calculus operation/formula/statement into a vector calculus one).
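As for why it holds: column j of ABC is A B c_j = sum_k C[k, j] (A b_k), and stacking those columns over j reproduces exactly the block rows of kron(C.T, A) applied to the stacked columns of B. A minimal NumPy check of the corrected identity (vec stacks columns, i.e. column-major order):

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((2, 3))
    B = rng.standard_normal((3, 4))
    C = rng.standard_normal((4, 5))

    def vec(M):
        # Stack the columns of M into one long vector (column-major order).
        return M.flatten(order="F")

    # vec(A B C) == kron(C.T, A) @ vec(B)
    assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))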
- hayasaki 1 year ago: They have an error in their formula, but the vectorized form (stacking the columns of the matrix to form a vector) of the triple matrix product (A times B times C) can indeed be rewritten in a form involving Kronecker products and another vectorized matrix.
I wouldn't say it's everything you need, but it is a useful trick.
- esafak 1 year ago: That is just reading out the equation in English. My question is, why is it so?
- scrubs 1 year ago: Darn good post!