TensorRT-LLM runtime now open-source

4 points by mmoskal 3 months ago | 1 comment
  • mmoskal 3 months ago
    Previously, the "Executor" runtime was shipped as binary blobs. This is the component that schedules requests and manages the KV cache (similar to the vLLM or SGLang server).