In this demo, we encrypt the user's input and then run the GPT-2 model on that encrypted data. Openvector computes the next token and sends it to the server, which relays it to the client for streaming; this repeats for the requested number of tokens. The model is currently deployed on an AWS c5a.16xlarge instance and performs optimally for up to 20 tokens.
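
The per-token round trip described above can be sketched as follows. Note this is purely an illustration of the loop structure, not of the cryptography: the `encrypt`/`decrypt` helpers are a toy reversible transform standing in for the real encryption scheme, and `model_next_token` is a dummy stand-in for GPT-2 operating on encrypted data (all names here are hypothetical, not the demo's actual API).

```python
MAX_TOKENS = 20  # the current deployment works optimally for up to 20 tokens
KEY = 0x5A       # toy client-side key, stand-in for real key material

def encrypt(token_id: int) -> int:
    """Toy reversible transform; NOT real encryption."""
    return token_id ^ KEY

def decrypt(ciphertext: int) -> int:
    return ciphertext ^ KEY

def model_next_token(encrypted_context: list[int]) -> int:
    """Dummy stand-in for the model computing on encrypted data.

    In the real system the next token is computed without decrypting;
    here we just derive a deterministic dummy token for illustration.
    """
    return encrypt((decrypt(encrypted_context[-1]) + 1) % 50257)

def stream_generation(prompt_ids: list[int], n_tokens: int) -> list[int]:
    # Client encrypts its input before anything leaves the device.
    context = [encrypt(t) for t in prompt_ids]
    generated = []
    for _ in range(min(n_tokens, MAX_TOKENS)):
        ct = model_next_token(context)  # next token computed on ciphertexts
        context.append(ct)              # server relays the ciphertext onward
        generated.append(decrypt(ct))   # client decrypts for streaming
    return generated

print(stream_generation([10, 11], 3))  # → [12, 13, 14]
```

Each loop iteration corresponds to one client/server round trip: one encrypted token is produced, relayed by the server, and decrypted client-side for streaming.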