AITemplate, a revolutionary new inference engine by Meta AI

73 points by azurezyq 2 years ago | 35 comments
  • haolu7 2 years ago
    AITemplate-PyTorch Stable Diffusion is the fastest Stable Diffusion inference solution, pushing image generation below one second on an A100 for the first time (batch 1: 0.7s / 25 steps, 1.3s / 50 steps; batch 3: 1.6s / 25 steps, 0.55s per image; batch 16: 7.9s / 25 steps, 0.49s per image). It is 2.57x faster than Keras' XLA-based GPU compilation solution.

    More benchmark numbers and repro at: https://github.com/facebookincubator/AITemplate/tree/main/ex...

    • Llamamoe 2 years ago
      Wow. Considering that with the better samplers you can reduce steps to 10-15, this is getting close to near-instant results.

      One or two more optimizations and we're gonna have live-update results.

      • tveita 2 years ago
        This lists "OOM" for PyTorch on a RTX 3080-10GB, but I believe people have optimized the PyTorch SD model to run on even 6GiB GPUs.

        Would AITemplate be able to run with those constraints?

      • PresentHarmony 2 years ago
        Or, to count another way: how many pictures will it be able to generate in one second, with these parameters? It could be 1.05, 1.1, or say 1.5 or even 2 pictures. Thank you very much for your post! I will be very grateful for the answer!
        • PresentHarmony 2 years ago
          Can you please elaborate: how many milliseconds does it take to generate 1 image with these wonderful improvements? I will be very grateful for your answer! Thank you very much!
          • PresentHarmony 2 years ago
            Do I get it right that it takes 0.55 or 0.49 seconds to generate an image, depending on the batch size?

            Thank you so much for your post! I would be very grateful for the response!

            • ipiszy 2 years ago
              Yes, this is correct. "batch 16: 7.9s / 25 steps, per image 0.49s" means it generates 16 images for each prompt within 7.9s, so it's 0.49s per image.
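
              For anyone double-checking, the per-image number is just the batch time divided by the batch size:

                  # 7.9s wall time for a batch of 16 images at 25 steps
                  print(7.9 / 16)  # 0.49375 -> ~0.49s per image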
              • PresentHarmony 2 years ago
                One more question, if you don't mind. 1 image is generated in 0.7 seconds (25 steps), and the same single image with 50 steps will be generated in 1.3 seconds. So it's much cheaper to generate more images for the same prompt. Am I right or am I missing something? Thanks in advance for your answer.

                P.S. Though it should be 1.4 seconds: 0.7*2 = 1.4, if you think twice the steps means twice the time.

                • PresentHarmony 2 years ago
                  Thank you indeed, my friend!
            • ghoomketu 2 years ago
              For all the hate that Facebook gets, their one redeeming quality is these open source projects they have been releasing all along.

              Maybe this is to attract better engineers, but all in all it has been a net positive for software development. So credit where it is due.

              • version_five 2 years ago
                Yes, it's hard to know whether the overall contribution of these advertising companies (FB and Google mainly) is a net positive, but their contribution to ML research is unmatched and has created an insane amount of value (I'd speculate rivaling their market caps, but someone can probably prove me wrong) in the business and research that uses the tools they've built.
                • ETH_start 2 years ago
                  The net impact of these companies is massively positive. Facebook, with its trust-engendering social graph, enables huge numbers of businesses and social groups to exist that otherwise couldn't, while Google has enabled so much information discovery that we just take for granted now.

                  Of course I would argue there's a better way to provide these kinds of services that concentrates power less, and that's decentralization with cryptoeconomic incentives to maintain consensus, but for their generation, they did well.

              • azurezyq 2 years ago
                • yinghai83 2 years ago
                  Very impressive results!
                • ipiszy 2 years ago
                  tl;dr:

                  Meta is open sourcing AITemplate, an inference engine for both Nvidia and AMD GPUs. Code: https://github.com/facebookincubator/AITemplate.

                  AITemplate delivers much better perf (1.9x ~ 12.8x) compared to PyTorch eager on SOTA models, including BERT, ResNet, ViT, and Stable Diffusion.

                  AITemplate also delivers high perf numbers on AMD GPUs (MI-250). With AITemplate, the MI-250 achieves 80% ~ 96% of A100 perf on various ResNet / BERT / ViT models.

                  AITemplate uses sophisticated fusion techniques to optimize perf, including vertical, horizontal, and memory fusions.
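
                  To give a rough feel for what vertical fusion means, here is a conceptual Python sketch (not AITemplate code; fused_bias_gelu is a made-up stand-in for a generated fused kernel):

                      import torch
                      import torch.nn.functional as F

                      x = torch.randn(1024, 512, device="cuda", dtype=torch.float16)
                      bias = torch.randn(512, device="cuda", dtype=torch.float16)

                      # Unfused: two kernel launches; the intermediate tensor
                      # makes a round trip through GPU memory between them.
                      tmp = x + bias     # kernel 1
                      out = F.gelu(tmp)  # kernel 2

                      # Vertically fused: one generated kernel computes
                      # gelu(x + bias) in registers, skipping the write-back.
                      # out = fused_bias_gelu(x, bias)  # hypothetical stand-in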

                  btw, I'm one of the authors of AITemplate, happy to answer any questions.

                  • Narew 2 years ago
                    How does AITemplate's performance compare to state-of-the-art inference engines like TVM or ONNX Runtime? Does AITemplate optimize/quantize the network?

                    Edit: link for TVM https://tvm.apache.org/

                  • throwaway81523 2 years ago
                    Thanks, that is very helpful. Do you have to train the model differently for use with AITemplate? Could it be helpful for Leela Chess Zero (LC0)? I think LC0 has a generic PyTorch backend that is several times slower than its Nvidia-specific CUDA backend. I'm not very clueful about this stuff though.
                    • haolu7 2 years ago
                      No, you don't need to train the model differently to use it with AITemplate. Here is an intro example of doing inference with AITemplate on a very simple PyTorch model: https://facebookincubator.github.io/AITemplate/tutorial/how_.... For more advanced examples, check out https://github.com/facebookincubator/AITemplate/tree/main/ex...
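
                      From memory, the flow in that tutorial looks roughly like this (a sketch, not copy-paste code):

                          from aitemplate.compiler import compile_model
                          from aitemplate.frontend import nn, Tensor
                          from aitemplate.testing import detect_target

                          # Re-declare the model with AITemplate's PyTorch-like frontend.
                          class AITSimpleModel(nn.Module):
                              def __init__(self, hidden):
                                  super().__init__()
                                  self.dense = nn.Linear(hidden, hidden)

                              def forward(self, x):
                                  return self.dense(x)

                          ait_model = AITSimpleModel(512)
                          ait_model.name_parameter_tensor()  # stable names for weight mapping

                          # Build the symbolic graph, then codegen and compile a module.
                          X = Tensor(shape=[1, 512], dtype="float16", name="X", is_input=True)
                          Y = ait_model(X)
                          Y._attrs["is_output"] = True
                          Y._attrs["name"] = "Y"
                          module = compile_model(Y, detect_target(), "./tmp", "simple_model")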
                      • ipiszy 2 years ago
                        As @haolu7 mentioned, you can take a pre-trained model and use AITemplate to do model inference. All you need to do is rewrite the model using the AITemplate frontend and map the PyTorch params to AITemplate params. That said, AITemplate has limited operator coverage compared to mature frameworks like PyTorch, so you may need to implement your own kernels (though it already supports BERT, ViT, Stable Diffusion, ResNet, Detectron, and general recommendation models).
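
                        Sketching the param mapping and inference on top of the example above (names are illustrative; see the examples repo for the real thing):

                            import torch

                            # pt_model is the pre-trained PyTorch nn.Module.
                            # Map its weights onto the AIT module's named constants;
                            # the name translation (dots -> underscores, etc.) is model-specific.
                            weights = {
                                name.replace(".", "_"): w.half().cuda().contiguous()
                                for name, w in pt_model.state_dict().items()
                            }
                            module.set_many_constants_with_tensors(weights)
                            module.fold_constants(sync=True)

                            # Inference: outputs are preallocated torch tensors.
                            x = torch.randn(1, 512, dtype=torch.float16, device="cuda")
                            ys = {"Y": torch.empty(1, 512, dtype=torch.float16, device="cuda")}
                            module.run_with_tensors({"X": x}, ys)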
                      • fooblaster 2 years ago
                        How does the performance compare with TensorRT? I didn't see any benchmarks comparing against that. I expect it to be lower for now, but I'm excited to see what the future brings.
                        • upbeat_general 2 years ago
                          Do you know of any good explanations of the techniques you used, for those who only touch PyTorch eager + occasionally TorchScript?
                        • papersnake 2 years ago
                          Have you tested this on big models involving multi-gpu communication, or any plans?
                          • ipiszy 2 years ago
                            For now it's for single GPU inference only.
                          • pretty_dumm_guy 2 years ago
                            How do you verify the correctness of your fusion operations?
                            • ipiszy 2 years ago
                              We have a bunch of unit tests and E2E tests that compare numerical outputs between AITemplate and PyTorch eager.
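
                              The core of such a test is just this (a minimal sketch, reusing the compiled module and PyTorch model from the example upthread):

                                  import torch

                                  x = torch.randn(1, 512, dtype=torch.float16, device="cuda")
                                  y_ref = pt_model(x)  # PyTorch eager reference

                                  ys = {"Y": torch.empty_like(y_ref)}
                                  module.run_with_tensors({"X": x}, ys)  # AITemplate output

                                  # fp16 kernels won't match bit-for-bit, so compare with tolerance.
                                  torch.testing.assert_close(ys["Y"], y_ref, rtol=1e-2, atol=1e-2)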
                          • house_road 2 years ago
                              It supports both Nvidia and AMD, and both get pretty good speedups. This is a great achievement!
                            • enoch2090 2 years ago
                                How would this perform compared with TensorFlow?
                              • devcat 2 years ago
                                  Sadly it doesn't have an Apple GPU backend.
                                • mbroncano 2 years ago
                                  It mentions it is in the works
                                • throwaway81523 2 years ago
                                  Tldr?
                                  • theflyingelvis 2 years ago
                                    Unfortunately your comment was too long. I didn’t read it. Try being more succinct next time.