Silkenweb Example: Hackernews Clone

AITemplate, a revolutionary new inference engine by Meta AI

73 points by azurezyq 2 years ago | 35 comments

haolu7 2 years ago
AITemplate-PyTorch Stable Diffusion is the fastest stable diffusion inference solution by pushing image generation below one second on A100 (batch 1: 0.7s / 25 steps, 1.3s / 50 steps; batch 3: 1.6s / 25 steps, per image 0.55s; batch 16 7.9s / 25 steps, per image 0.49s) for the first time, 2.57X faster than Keras' XLA-based GPU compilation solution.
More benchmark numbers and repro at: https://github.com/facebookincubator/AITemplate/tree/main/ex...
- Llamamoe 2 years ago
  Wow. Considering that with the better samplers you can reduce steps to 10-15, this is getting close to near-instant results.
  One or two more optimizations and we're gonna have live-update results.
- tveita 2 years ago
  This lists "OOM" for PyTorch on a RTX 3080-10GB, but I believe people have optimized the PyTorch SD model to run on even 6GiB GPUs.
  Would AITemplate be able to run with those constraints?
  - ipiszy 2 years ago
    RTX 3080-10GB should work. You could check https://github.com/facebookincubator/AITemplate/tree/main/ex..., and https://www.reddit.com/r/StableDiffusion/comments/xv7m89/met....
- PresentHarmony 2 years ago
  Or if you count in another way. In one second, how many pictures it will be able to generate, with these parameters. It could be 1.05, 1.1, or say 1.5 or even 2 pictures. Thank you very much for your post! I will be very grateful for the answer!
- PresentHarmony 2 years ago
  Can you please eloborate, how many milliseconds does it take to generate 1 image with these wonderful improvements? I will be very grateful for your answer! Thank you very much!
- PresentHarmony 2 years ago
  Do I get it right, it takes 0.55 second or 0.49 second to generate an image depending on the batch?
  Thank you so much for your post! I would be very grateful for the response!
  - ipiszy 2 years ago
    Yes this is correct. batch 16 7.9s / 25 steps, per image 0.49s: it generates 16 images for each prompt within 7.9s, so it's 0.49s per image.
    - PresentHarmony 2 years ago
      One more question, if you don't mind. 1 image is generated in 0.7 seconds (25 steps ) and the same single image with 50 steps will be generated in 1.3 seconds. So it's much cheaper to generate more images for the same promt. Am I right or am I missing something ? Thanks in advance for your answer.
      P.S. Though it should be 1.4 seconds. 0.7*2=14.If you think twice the speps, twice the time.
    - PresentHarmony 2 years ago
      Thank you indeed, my friend!
ghoomketu 2 years ago
For all the hate that Facebook gets their only redeeming quality is these open source projects they have been releasing all along.
Maybe this is to attract better engineers but all in all this has been a net postive for software development. So credit where it is due.
- version_five 2 years ago
  Yes, it's hard to know in the balance whether the net contribution of these advertising companies (fb and google mainly) is a net positive, but their contribution to ML research is unmatched and has created an insane amount of value (I'd speculate rivaling their market caps but someone can probably prove me wrong) in business and research that uses the tools they've built.
  - ETH_start 2 years ago
    The net impact of these companies is massively positive. Facebook, with its trust-engendering social graph, enables huge numbers of businesses and social groups to exist that otherwise couldn't while Google has enabled so much information discovery that we just take for granted now.
    Of course I would argue there's a better way to provide these kinds of services that concentrates power less, and that's decentralization with cryptoeconomic incentives to maintain consensus, but for their generation, they did well.
azurezyq 2 years ago
https://github.com/facebookincubator/AITemplate
- yinghai83 2 years ago
  Very impressive results!
ipiszy 2 years ago
tl;dr:
Meta is open sourcing AITemplate, an inference engine for both Nvidia and AMD GPUs. Code: https://github.com/facebookincubator/AITemplate.
AITemplate delivers much better perf (1.9x ~ 12.8x) compared to PyTorch eager on SOTA models, including Bert, ResNet, VIT and StableDiffusion.
AITemplate also delivers high perf numbers using AMD GPUs (MI-250). With AITemplate, MI-250 achieves 80% ~ 96% A100 perf on various ResNet / Bert / VIT models.
AITemplate uses sophisticated fusion techniques to optimize perf, including vertical, horizontal, and memory fusions.
btw, I'm one of the authors of AITemplate, happy to answer any questions.
- Narew 2 years ago
  How did AITemplate performance to state of art inference engine like tvm or onnx runtime ? Did AITemplate optimize/quantify network?
  Edit: link for TVM https://tvm.apache.org/
  - ipiszy 2 years ago
    AITemplate only supports fp16 data types with fp16 or fp32 accumulation right now. We are working on supporting more data types and quantization.
    We don't have an official comparison between AITemplate and tvm / onnx for now, but we do have perf numbers like https://github.com/facebookincubator/AITemplate/tree/main/ex..., https://github.com/facebookincubator/AITemplate/tree/main/ex.... Feel free to run these examples on other frameworks and compare perf.
  - davidatbu 2 years ago
    I'd love to hear about this too: especially after running the model through an onnx optimizer, like this one [0].
    [0] https://github.com/daquexian/onnx-simplifier
- throwaway81523 2 years ago
  Thanks, that is very helpful. Do you have to train the model differently for use with AITemplate? Could it be helpful for Leela Chess Zero (LC0)? I think LC0 has a generic Pytorch backend, that is several times slower than its NVidia specific CUDA backend. I'm not very clueful about this stuff though.
  - haolu7 2 years ago
    No, you don't need to train the model differently to use it with AITemplate. Here is an intro example to do inference with AITemplate with a very simple PyTorch model: https://facebookincubator.github.io/AITemplate/tutorial/how_.... For more advanced examples, check out https://github.com/facebookincubator/AITemplate/tree/main/ex...
  - ipiszy 2 years ago
    As @haolu7 mentioned, you could take a pre-trained model and use AITemplate to do model inference. All you need to do is to re-write the model using AITemplate frontend and map PyTorch params to AITemplate params. Besides, AITemplate has a limited operator coverage compared to mature frameworks like PyTorch so you may need to implement your own kernels if necessary (though it already supports Bert, VIT, StableDiffusion, ResNet, Detectron, and general recommendation models).
- fooblaster 2 years ago
  How does the performance compare with tensor rt? I didn't see any benchmarks comparing against that. I expect it to be lower for now, but excited for see what the future brings.
- upbeat_general 2 years ago
  Do you know of any good explanations of the techniques you used for those who only touch PyTorch Eager + occasionally torchscript?
  - ipiszy 2 years ago
    You could check "AITemplate optimizations" section in the blog (https://ai.facebook.com/blog/gpu-inference-engine-nvidia-amd...), and https://github.com/facebookincubator/AITemplate#more-about-a.... The basic idea is to do aggressive kernel fusions.
- papersnake 2 years ago
  Have you tested this on big models involving multi-gpu communication, or any plans?
  - ipiszy 2 years ago
    For now it's for single GPU inference only.
- pretty_dumm_guy 2 years ago
  How do you verify the correctness of your fusion operation ?
  - ipiszy 2 years ago
    We have a bunch of unittests and E2E tests to compare numeric numbers between AITemplate and PyTorch eager.
house_road 2 years ago
It supports both Nvidia and AMD, and both got pretty good speedup. This is a great achievement!
enoch2090 2 years ago
How would this perform compared with Tensorflow?
devcat 2 years ago
Sadly it doesn't have Apple GPU backend
- mbroncano 2 years ago
  It mentions it is in the works
throwaway81523 2 years ago
Tldr?
- theflyingelvis 2 years ago
  Unfortunately your comment was too long. I didn’t read it. Try being more succinct next time.