AI Learning Record: Building a Simple LLM API, with Qwen as an Example

Most current large language models come with an accompanying HTTP API, including the one used in this article. Using the official API is not complicated, but it seems to require quite a few parameters. Building your own LLM API, in fact, comes down to receiving a GET or POST request on an HTTP server, passing the question to the LLM's chat function, and returning the model's response to the HTTP client.

This self-built approach feels somewhat more customizable, although it is quite possible that I am simply not familiar enough with the official API.

To build this service, you only need to add Python HTTP handling to a generic CLI example. Here is an example using Qwen, recorded from my own code.

LLM API program

Functions related to the language model:

  • load_model_tokenizer(), ask(): load_model_tokenizer() loads the model and its tokenizer; ask() sends a prompt to the LLM and returns its response.
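
As a rough illustration of these two functions, here is a minimal sketch and not the original listing. It assumes the Hugging Face transformers library and a first-generation Qwen chat checkpoint; the checkpoint name "Qwen/Qwen-7B-Chat", the device_map setting, and the exact signatures are my assumptions. The chat() helper comes from Qwen's own remote code, so other models would need a different call.

```python
def load_model_tokenizer(checkpoint="Qwen/Qwen-7B-Chat"):
    """Load the model and its tokenizer (checkpoint name is an assumption)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint,
        device_map="auto",        # assumption: let the library place the weights
        trust_remote_code=True,   # needed for Qwen's custom chat() helper
    ).eval()
    return model, tokenizer


def ask(model, tokenizer, prompt, history=None):
    """Send one prompt to the model; return (reply, updated history)."""
    # chat() is provided by first-generation Qwen chat checkpoints.
    reply, history = model.chat(tokenizer, prompt, history=history)
    return reply, history
```

Passing history=None starts a fresh conversation; feeding the returned history back into the next call keeps the context.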

Functions related to HTTP:

  • class S, run(): class S is the request handler of the HTTP server, which handles GET and POST requests; run() starts the service.
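
The HTTP side can be sketched with Python's standard http.server module. This is a minimal sketch under assumptions, not the original code: ask() is replaced by a stand-in that only echoes the prompt so the example runs without a model, question_from_path() is a hypothetical helper, and reading the POST question from the raw request body is my guess at the protocol.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def ask(prompt):
    """Stand-in for the real LLM call; it only echoes the prompt."""
    return "echo: " + prompt


def question_from_path(path):
    """Extract the q parameter from a request path such as '/?q=hello'."""
    params = parse_qs(urlparse(path).query)
    return params.get("q", [""])[0]


class S(BaseHTTPRequestHandler):
    def _reply(self, text):
        # Send a plain-text UTF-8 response back to the client.
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # The question arrives in the query string: /?q=...
        self._reply(ask(question_from_path(self.path)))

    def do_POST(self):
        # Assumption: the question is the raw request body.
        length = int(self.headers.get("Content-Length", 0))
        prompt = self.rfile.read(length).decode("utf-8")
        self._reply(ask(prompt))


def run(port=8080):
    """Start the service and block until interrupted."""
    HTTPServer(("", port), S).serve_forever()
```

Calling run() at the end of the script starts the server; swapping the stand-in ask() for the real model call is the only change needed to serve actual answers.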

Save the code above and start it; the service then accepts connections on port 8080.

With the GET method, just append “/?q=” followed by the question to the URL in the browser. The following is the answer to “who is Trump”; the %20 is inserted by the browser itself.
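
The same GET request can be made from a script instead of a browser. The sketch below uses only the standard library; the function names and the localhost address are my own choices for illustration. It also shows where the %20 comes from: spaces must be percent-encoded before they can appear in a URL.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_url(question, host="localhost", port=8080):
    """Build the GET URL, percent-encoding the question (space becomes %20)."""
    return f"http://{host}:{port}/?q={quote(question)}"


def ask_via_get(question, host="localhost", port=8080):
    """Send the question to the running service and return the reply text."""
    with urlopen(build_url(question, host, port)) as resp:
        return resp.read().decode("utf-8")


print(build_url("who is Trump"))
# prints: http://localhost:8080/?q=who%20is%20Trump
```

ask_via_get() only works while the service is running on the given port.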

For POST, a short shell script that calls curl is enough for simple tasks.
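
For readers who prefer to stay in Python, a POST request can be sketched with urllib as well. This is not the original shell script; it assumes the service reads the question from the raw request body, and the function names and URL are hypothetical.

```python
from urllib.request import Request, urlopen


def build_post(question, url="http://localhost:8080/"):
    """Build a POST request whose body is the question as UTF-8 text."""
    return Request(
        url,
        data=question.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def ask_via_post(question, url="http://localhost:8080/"):
    """Send the question to the running service and return the reply text."""
    with urlopen(build_post(question, url)) as resp:
        return resp.read().decode("utf-8")
```

Unlike GET, the question travels in the body, so it needs no percent-encoding and can be much longer.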


The advantage of building your own API is that you can define your own keywords to trigger special functions, such as turning the conversation history on or off, or fetching live data from the network. It is not particularly smart, but it is genuinely practical.
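
The keyword idea might look roughly like the sketch below. Everything in it is hypothetical: the keywords "history on" and "history off", the ChatSession class, and the ask_fn(prompt, history) signature are chosen for illustration and are not taken from my actual code.

```python
class ChatSession:
    """Routes keyword commands and ordinary questions for one conversation."""

    def __init__(self, ask_fn):
        self.ask_fn = ask_fn        # callable: (prompt, history) -> reply text
        self.keep_history = False
        self.history = []           # list of (prompt, reply) pairs

    def handle(self, text):
        command = text.strip().lower()
        if command == "history on":
            self.keep_history = True
            return "history enabled"
        if command == "history off":
            # Turning history off also discards what was collected so far.
            self.keep_history = False
            self.history = []
            return "history disabled"
        # Anything else is treated as a question for the model.
        context = list(self.history) if self.keep_history else []
        reply = self.ask_fn(text, context)
        if self.keep_history:
            self.history.append((text, reply))
        return reply
```

The HTTP handler would pass each incoming question to handle() instead of calling the model directly, so commands and questions share the same endpoint.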
