LLM Inference in TEE

LLM (Large Language Model) inference in a TEE can protect the model, the input prompt, and the output. The key challenges are:

  1. the performance of LLM inference inside a TEE (CPU-only)

  2. whether LLM inference can run inside a TEE at all

With the significant LLM inference speed-up brought by BigDL-LLM, combined with the Occlum LibOS, high-performance and efficient LLM inference in a TEE can now be realized.
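
For illustration, below is a minimal sketch of quantized CPU inference using the BigDL-LLM Transformers-style API. The model path, prompt, and generation parameters are placeholders, and the actual demo code may differ; refer to the LLM demo linked at the end of this section.

```python
# Minimal sketch: low-bit LLM inference with BigDL-LLM.
# MODEL_PATH and the prompt are placeholders, not part of the official demo.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

MODEL_PATH = "./models/llama-2-7b-chat-hf"  # hypothetical local model directory

# Load the model with INT4 low-bit optimization for CPU inference.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    load_in_4bit=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Run a single prompt through the model.
prompt = "What is a Trusted Execution Environment?"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```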

Overview

[Figure: LLM inference in Occlum (occlum-llm.png)]

The figure above shows the overall architecture and the workflow.

For step 3, users can use the Occlum init-ra AECS solution, which requires no modification to the application.
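
As a purely illustrative sketch (the file path and key handling below are assumptions, not part of the init-ra AECS interface): because remote attestation and secret provisioning are handled by the init process before the application starts, the inference application only needs ordinary file I/O to pick up a provisioned secret.

```python
# Hypothetical sketch: the application reads a secret that the Occlum init-ra
# process has already fetched (after remote attestation) and saved inside the
# enclave filesystem. SECRET_PATH is an assumption for illustration only.
SECRET_PATH = "/etc/secrets/model_key"

with open(SECRET_PATH, "rb") as f:
    model_key = f.read()

# The key could then be used to decrypt an encrypted model before loading it;
# the application itself contains no attestation logic.
```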

For more details, please refer to the LLM demo.