The matrix calculus you need for deep learning (2018)

224 points by cpp_frog 1 year ago | 40 comments
  • dang 1 year ago
    Related:

    The matrix calculus you need for deep learning (2018) - https://news.ycombinator.com/item?id=26676729 - April 2021 (40 comments)

    Matrix calculus for deep learning part 2 - https://news.ycombinator.com/item?id=23358761 - May 2020 (6 comments)

    Matrix Calculus for Deep Learning - https://news.ycombinator.com/item?id=21661545 - Nov 2019 (47 comments)

    The Matrix Calculus You Need for Deep Learning - https://news.ycombinator.com/item?id=17422770 - June 2018 (77 comments)

    Matrix Calculus for Deep Learning - https://news.ycombinator.com/item?id=16267178 - Jan 2018 (81 comments)

    • quanto 1 year ago
      The article/webpage is a nice walk-through for the uninitiated. Half the challenge of doing matrix calculus is remembering the dimension of the object you are dealing with (scalar, vector, matrix, higher-dim tensor).

      Ultimately, the point of using matrix calculus (or matrices in general) is not just concision of notation but also understanding that matrices are operators acting on members of some spaces, i.e. vectors. It is this higher level abstraction that makes matrices powerful.

      For people who are familiar with the concepts but need a concise refresher, the Wikipedia page serves well:

      https://en.wikipedia.org/wiki/Matrix_calculus
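
      For example, a throwaway finite-difference Jacobian (a sketch of mine, not from the article) makes that shape bookkeeping concrete: for f : R^n -> R^m, the derivative is an m x n matrix.

        import numpy as np

        # Finite-difference Jacobian of f : R^n -> R^m; the result is m x n.
        def jacobian(f, x, eps=1e-6):
            y = f(x)
            J = np.zeros((y.size, x.size))
            for j in range(x.size):
                xp = x.copy()
                xp[j] += eps
                J[:, j] = (f(xp) - y) / eps
            return J

        f = lambda x: np.array([x[0] * x[1], np.sin(x[2])])  # R^3 -> R^2
        print(jacobian(f, np.array([1.0, 2.0, 3.0])).shape)  # (2, 3)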

      • PartiallyTyped 1 year ago
        Adding to this, these operators are also "polymorphic"; for matrix multiplication the only operations you need are (non-commutative) multiplication and addition; thus you can use elements of any non-commutative ring, i.e. a set of elements with those two operations :D

        Matrices themselves form non-commutative rings too; and based on this, you can think of a 4N x 4N matrix as a 4x4 matrix whose elements are NxN matrices [1] :D

        [1] https://youtu.be/FX4C-JpTFgY?list=PL49CF3715CB9EF31D&t=1107

        You already know whose lecture it is :D

        I love math.. I should have become a mathematician ...
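
        A minimal NumPy sketch of that view (2x2 blocks rather than 4x4, for brevity): multiplying the blocks as ring elements, using only + and @, reproduces ordinary matrix multiplication.

          import numpy as np

          rng = np.random.default_rng(0)
          N = 3
          X = rng.standard_normal((2 * N, 2 * N))
          Y = rng.standard_normal((2 * N, 2 * N))

          # View a 2N x 2N matrix as a 2x2 "matrix" whose entries are N x N blocks.
          def blocks(M):
              return [[M[i*N:(i+1)*N, j*N:(j+1)*N] for j in range(2)] for i in range(2)]

          Xb, Yb = blocks(X), blocks(Y)

          # Same formula as scalar matrix multiplication, but the ring elements
          # are N x N matrices (order matters: Xb @ Yb, not Yb @ Xb).
          Zb = [[sum(Xb[i][k] @ Yb[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

          print(np.allclose(np.block(Zb), X @ Y))  # True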

        • tikhonj 1 year ago
          You can even generalize linear algebra algorithms to closed semirings and have some really cool algorithms pop out, like finding the shortest path in graphs. There's a great paper called "Fun with Semirings" that goes into more detail; unfortunately it looks like the PDF isn't easily available online any more, but I found some slides[1] that seem to cover the same ideas well enough.

          [1]: https://pdfs.semanticscholar.org/2e43/477e26a54b2d1a046c2140...
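
          As an illustrative sketch (mine, not from the paper): replace (+, *) with (min, +) in matrix multiplication, and repeated squaring of the weighted adjacency matrix computes all-pairs shortest paths.

            import numpy as np

            # "Matrix multiplication" over the (min, +) tropical semiring:
            # addition becomes min, multiplication becomes +.
            def min_plus(A, B):
                n = A.shape[0]
                return np.array([[np.min(A[i, :] + B[:, j]) for j in range(n)]
                                 for i in range(n)])

            INF = np.inf
            D = np.array([[0.0, 1.0, INF],
                          [INF, 0.0, 2.0],
                          [5.0, INF, 0.0]])  # edge weights; INF = no edge

            P = D
            for _ in range(2):  # one squaring suffices for n = 3; extras are harmless
                P = min_plus(P, P)
            print(P)  # P[i, j] = length of the shortest path from i to j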

          • PartiallyTyped 1 year ago
            Okay I went over the slides and good lord this would have made my life easier not too long ago.
            • PartiallyTyped 1 year ago
              This deserves its own HN post imho.
            • mrfox321 1 year ago
              Re [1]: it's fairly concrete to simply say that matrix multiplication can be performed block-wise.
              • PartiallyTyped 1 year ago
                I don't disagree, but that is just an example of matrix multiplication. The gist is not that you can do block multiplication, but that you can define matrices over any non-commutative ring, which includes other matrices, i.e. blocks.
          • SnooSux 1 year ago
            This is the resource I wish I had in 2018. Every grad school course had a Linear Algebra review lecture but never got into the Matrix Calculus I actually needed.
            • ayhanfuat 1 year ago
              That was my struggle, too. Imperial College London has a small online course which covers similar topics (https://www.coursera.org/learn/multivariate-calculus-machine...). It helped a lot.
              • unpaddedantacid 1 year ago
                I just finished my first year of an AI bachelor's. We saw Linear Algebra with basic matrix calculations and theorems, so much calculus that the notes take up 3 GB of space, physics, psychology, some very outdated logic classes, and an intro to Python that left many of the students wondering how to import a library.
                • dpflan 1 year ago
                  True, this was a designated resource during my studies (2020-2022), but those were post-2018.
                • cs702 1 year ago
                  Please change the link to the original source:

                  https://arxiv.org/abs/1802.01528

                  ---

                  EDIT: It turns out explained.ai is the personal website of one of the authors, so there's no need to change the link. See comment below.

                  • parrt 1 year ago
                    :) Yeah, I use my own internal markdown to generate really nice HTML (with fast LaTeX-derived images for equations) and then full-on LaTeX. (The tool is https://github.com/parrt/bookish)

                    I prefer reading on the web unless I'm offline. The LaTeX is super handy for printing a nice document.

                    • cs702 1 year ago
                      Even though it's shockingly common, I never cease to be surprised and delighted when authors who are on HN take the time to reply to comments about their work.

                      Thank you for doing this with Jeremy and sharing it with the world!

                      • parrt 1 year ago
                        Sure thing! Very enjoyable to have people use our work.
                    • liorben-david 1 year ago
                      Explained.ai seems to be Terence Parr's personal site
                      • cs702 1 year ago
                        Thank you for pointing it out. I edited my comment.
                    • trolan 1 year ago
                      I finished Vector Calculus last year and have no experience in machine learning, but this seems exceptionally thorough. Having a practical explanation rather than a purely mathematical one would have made my life easier, but woe is the life of the engineering student, I guess.
                      • parrt 1 year ago
                        Glad to be of assistance! Yeah, it really annoyed me that this critical information was not listed in any one particular spot.
                      • rdedev 1 year ago
                        I had followed this when I was learning DL through Andrew Ng's course. In one of the lessons, he had the formula for calculating the loss as well as its derivatives.

                        I tried deriving these formulas from scratch using what I learned from OP's post, but it felt like there was something missing. I think it boils down to me not knowing how to aggregate those element-wise derivatives into matrix form. In the end, it was the Matrix Cookbook and some notes from Stanford's CS231n that helped me grok it fully.
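
                        For what it's worth, here's the kind of aggregation step I mean, as a sketch with a toy quadratic loss (not Ng's exact formulas): for y = Wx, each element-wise derivative is dL/dW[i,j] = dL/dy[i] * x[j], and collecting them all is exactly an outer product.

                          import numpy as np

                          rng = np.random.default_rng(0)
                          W = rng.standard_normal((3, 4))
                          x = rng.standard_normal(4)

                          # Toy loss L = 0.5 * ||W @ x||^2, so dL/dy = y for y = W @ x.
                          y = W @ x
                          dL_dW = np.outer(y, x)  # dL/dW[i, j] = dL/dy[i] * x[j]

                          # Sanity check against finite differences.
                          eps = 1e-6
                          num = np.zeros_like(W)
                          for i in range(W.shape[0]):
                              for j in range(W.shape[1]):
                                  Wp = W.copy()
                                  Wp[i, j] += eps
                                  num[i, j] = (0.5 * np.sum((Wp @ x)**2) - 0.5 * np.sum(y**2)) / eps
                          print(np.allclose(dL_dW, num, atol=1e-4))  # True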

                        • bluerooibos 1 year ago
                          Oh nice, I did most of this in school, and during my non-CS engineering degree. Thanks for sharing!

                          Always wanted to dip my toes into ML, but I've never been convinced of its usefulness to the average solo developer, in terms of things you can build with this new knowledge. Likely I don't know enough about it to make that call, though.

                          • williamcotton 1 year ago
                            Here’s an ML project I’ve been working on as a solo dev:

                            https://github.com/williamcotton/chordviz

                            Labeling software in React, a CNN in PyTorch, and prediction in a SwiftUI app. 12,000 (and counting) hand-labeled images of my hand on a guitar fretboard!

                          • godelski 1 year ago
                            There are two common beliefs: that you don't need math for ML, and that you need a lot of math for ML. So let me clarify:

                            You don't need math to make a model perform well, but you do need math to know why your model is wrong.

                            • nsajko 1 year ago
                              Another matrix math reference: https://github.com/r-barnes/MatrixForensics
                              • _the_inflator 1 year ago
                                I just had a quick look at it. A good summary.

                                It seems that these topics are covered in the first one or two semesters of a Math degree. Of course, university is a bit more advanced.

                                • jayro 1 year ago
                                  We just released a comprehensive online course on Multivariable Calculus (https://mathacademy.com/courses/multivariable-calculus), and we also have a course on Mathematics for Machine Learning (https://mathacademy.com/courses/mathematics-for-machine-lear...) that covers just the matrix calculus you need, along with just the linear algebra and statistics you need, etc. I'm a founder and would be happy to answer any questions you might have.
                                  • thewataccount 1 year ago
                                    I understand you don't have a free trial, but is there any chance you have a demo somewhere of what it actually looks like? Like a tiny sample lesson or something along those lines? It looks interesting, but I'm just uncertain as to what it actually "feels" like in practice vs., let's say, Brilliant, etc.

                                    I only see pictures; I'm curious about the extent of the interaction in the linear algebra/matrix calculus material specifically.

                                    • jayro 1 year ago
                                      That's a good point! We definitely need to add some more information to the website. In the meantime, if you send an email to support@mathacademy.com, I'd be happy to give you a demo over Zoom and answer any questions you might have.
                                    • barrenko 1 year ago
                                      Whom do you think Mathematics for Machine Learning benefits? In my personal opinion, the plethora of courses and articles available in that regard is useful mostly to people who recently went through college-level Linear Algebra.

                                      I'd like more resources geared toward people who are done with Khan Academy and want something as well made for more advanced topics.

                                      • jayro 1 year ago
                                        The Mathematics for Machine Learning course doesn't assume knowledge of Linear Algebra, but covers the basics of Linear Algebra you'll need along with the basics of Multivariable Calculus, Statistics, Probability, etc. It does, however, assume knowledge of high-school math and Single Variable Calculus. If you've been out of school for a while, our adaptive diagnostic exam will identify your knowledge gaps and create a custom course for you that includes the necessary remediation.

                                        If you're REALLY rusty (maybe you've been out of school for 5+ years), or maybe you just never learned the material that well in the first place, then you might want to start with one of our Mathematical Foundations courses, which will scaffold you up to the level where you can handle the content in Mathematics for Machine Learning. More info can be found here: https://mathacademy.com/courses

                                        The Mathematics for Machine Learning course would be ideal for anyone who majored in a STEM subject like CS (or at least has a solid mathematical foundation) and is interested in doing work in machine learning.

                                        • barrenko 1 year ago
                                          Appreciate the reply; hopefully I'll be subscribing to your service at the beginning of next year (after I'm done with Khan Academy math).
                                    • thatsadude 1 year ago
                                      vec(ABC)=kron(C.T,A)vec(C) is all you need for matrix calculus!
                                      • esafak 1 year ago
                                        Can anyone provide an intuitive explanation?
                                        • fjkdlsjflkds 1 year ago
                                          I guess op meant "vec(ABC) = kron(C.T, A) vec(B)", and my attempt at explaining it would be:

                                          Each column of ABC is a linear combination of A applied to the columns of B, with coefficients taken from the corresponding column of C. Stacking the columns of ABC into one long vector is therefore the same as first vectorizing B and then transforming it by the block matrix obtained as the Kronecker product of C transposed and A.

                                          The significance is that it performs a reduction of matrix calculus to vector calculus (i.e., it shows that you can convert any matrix calculus operation/formula/statement into a vector calculus operation/formula/statement).
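
                                          A quick numerical check of the corrected identity in NumPy (note that vec() stacks columns, i.e. Fortran order):

                                            import numpy as np

                                            rng = np.random.default_rng(0)
                                            A = rng.standard_normal((2, 3))
                                            B = rng.standard_normal((3, 4))
                                            C = rng.standard_normal((4, 5))

                                            # vec() stacks columns: Fortran ("F") order in NumPy.
                                            vec = lambda M: M.reshape(-1, order="F")

                                            # vec(ABC) == kron(C.T, A) @ vec(B)
                                            print(np.allclose(vec(A @ B @ C),
                                                              np.kron(C.T, A) @ vec(B)))  # True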

                                          • hayasaki 1 year ago
                                            They have an error in their formula, but the vectorized form (stacking columns of the matrix to form a vector) of the triple matrix multiplication (A times B times C) can be changed to a form involving Kronecker products against another vectorized matrix.

                                            I wouldn't say that is everything, but it is a useful trick.

                                            • esafak 1 year ago
                                              That is just reading out the equation in English. My question is, why is it so?
                                        • scrubs 1 year ago
                                          Darn good post!