An Empirical Study of Mamba-Based Language Models

43 points by panabee 1 year ago | 3 comments
  • jiggawatts 1 year ago
    What’s the largest Mamba model that has been trained so far?

    Seems like it scales better than transformers, but this would only be really obvious at parameter counts far in excess of the experiments in this paper.

    • Lerc 1 year ago
      I feel like this is the classic problem: do we send a spaceship on a hundred-year journey now, or spend a decade building one that could make the trip in fifty years?

      The rate of improvement seems quick enough right now that if you started training a huge model today, you might regret spending all that money on an architecture that is obsolete by the time you finish.

      That said, if you keep waiting you never get around to it.

      It would be nice to see a large-parameter Mamba-family data point, though.

      • edflsafoiewq 1 year ago
        The problem with LLMs is that there's no apparent analytical theory and no way to extrapolate from small-scale results. The only way to find out how fast the spaceship is, is to build it, make the whole journey, and then see how long it took.