In the previous part of this series we looked at how litellm and the Theia-IDE can be used to take advantage of AI-based software development while retaining full control over our own data and costs. This article is about how that setup can be optimized, with the focus on operating it in as low-touch a manner as possible.
litellm as a service – only locally
The first step is to start litellm automatically at system start and let it run in the background. On Linux, this is best implemented with a systemd service, but there are options for Windows and macOS as well. If you run litellm as a Docker container, the container can also be started automatically. Forgive me that I can't cover every setup in this blog post.
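For the Docker variant, a restart policy takes care of the autostart. The following is only a sketch with Docker Compose; the image name, port and paths are assumptions based on the litellm documentation and must be adapted to your setup:

```yaml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest  # check for the current tag
    command: ["--config", "/app/config.yaml"]
    volumes:
      - ./config.yaml:/app/config.yaml
    env_file:
      - .env  # contains the LITELLM_* secrets used in config.yaml
    ports:
      - "4000:4000"
    restart: unless-stopped  # bring the container back up after a reboot
```

With "restart: unless-stopped", Docker restarts the container after a reboot as long as it was not stopped manually.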
As mentioned in the last part, I decided to implement as much of the litellm configuration as possible via the "config.yaml". Since I would like to store it in a git repository, the secret data first needs to be replaced with environment variables.
model_list:
  - model_name: MistralSmall24BInstruct
    litellm_params:
      model: openai/mistralai/Mistral-Small-24B-Instruct
      api_base: https://openai.inference.de-txl.ionos.com/v1
      api_key: "os.environ/LITELLM_IONOS_API_KEY"
      input_cost_per_token: 12e-08
      output_cost_per_token: 35e-08

general_settings:
  master_key: "os.environ/LITELLM_MASTER_KEY"
  salt_key: "os.environ/LITELLM_SALT_KEY"
  database_url: "os.environ/LITELLM_DATABASE_URL"

To start litellm automatically after system start, a systemd service is required. In addition to configuring the start of litellm, it must of course also contain the environment variables. Furthermore, the service should not run with root permissions. The logs of litellm are stored in a "logs" folder next to the litellm installation.
[Unit]
Description=LiteLLM Service
After=network.target
[Service]
User=benutzer
Group=benutzer
WorkingDirectory=/home/benutzer/workspace/litellm
Environment="LITELLM_IONOS_API_KEY=mein_ionos_api_key"
Environment="LITELLM_MASTER_KEY=sk-mein_sehr_sicherer_master_key"
Environment="LITELLM_SALT_KEY=sk-mein_sehr_sicherer_salt_key"
Environment="LITELLM_DATABASE_URL=meine_sehr_sichere_datenbank_url"
ExecStart=/home/benutzer/workspace/litellm/bin/python3 /home/benutzer/workspace/litellm/bin/litellm --config config.yaml
StandardOutput=append:/home/benutzer/workspace/litellm/logs/litellm.log
StandardError=append:/home/benutzer/workspace/litellm/logs/litellm.log
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
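As a variant, the secrets can also be kept out of the unit file itself. systemd offers the "EnvironmentFile" directive for this; the path below is an example:

```ini
# /etc/litellm/litellm.env - secrets for the litellm service (chmod 600)
LITELLM_IONOS_API_KEY=mein_ionos_api_key
LITELLM_MASTER_KEY=sk-mein_sehr_sicherer_master_key
LITELLM_SALT_KEY=sk-mein_sehr_sicherer_salt_key
LITELLM_DATABASE_URL=meine_sehr_sichere_datenbank_url
```

In the [Service] section, the four "Environment=" lines are then replaced by a single "EnvironmentFile=/etc/litellm/litellm.env" line.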
After creating the file, the access rights should first be adjusted. In addition, the service itself must be enabled and started.
sudo chown root:root /etc/systemd/system/litellm.service
sudo chmod 600 /etc/systemd/system/litellm.service
sudo systemctl daemon-reload
sudo systemctl enable litellm.service
sudo systemctl start litellm.service

Adjusting budgeting
Currently, we only have one user in litellm who has been assigned a budget; an overall maximum budget, however, is not yet defined. For this purpose, a team must be created and the user assigned to it. In addition, the user can currently create additional models and thus further increase the budget. We want to change both. First, we create a team and add the current user: click on the "Teams" tab in the WebUI and create a new team. Here we can again configure a budget and set the available models. This has the advantage that we can adjust the budget for individual users (or, as in our case, tools). This way, when experimenting with new tools, we can ensure that we still stay within our set budget.
After the team is created, we can add our Theia user as a member. To do so, we select the team, go to "Add Member", search by e-mail and complete the process by clicking on "Add Member". The permission to create new models is handled automatically: although the menu item is still displayed to the user in the current litellm version, an attempt to add a new model fails with an error message in the browser window.
Different models for different agents
Since the various AI models provided by the IONOS Model Hub also differ in cost, an optimization is worthwhile.

The Theia-IDE offers various agents, which certainly do not all need the most expensive model. Even more exciting is the ability to mix in models from other cloud providers and perhaps even local models. In the following, I will focus on the how-to of the configuration; testing how well the different models suit the different agents is certainly interesting at a later stage. A look at the above table suggests pairing the two coding agents from Theia with the "Code Llama 13b Instruct HF" model from IONOS. To introduce a new model into our environment, we first have to extend the configuration of litellm (config.yaml) with another model:
  - model_name: CodeLlama13BInstruct
    litellm_params:
      model: openai/meta-llama/CodeLlama-13b-Instruct-hf
      api_base: https://openai.inference.de-txl.ionos.com/v1
      api_key: "os.environ/LITELLM_IONOS_API_KEY"
      input_cost_per_token: 52e-08
      output_cost_per_token: 52e-08

Again, it should be noted that litellm calculates in US dollars: the token price of 0.45 € (per million tokens) corresponds to about 0.52 US dollars. Next, litellm must be restarted with the known commands and the model assigned to both the team and the user.
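The conversion can be sketched as follows; the EUR price is the one from the model hub, while the exchange rate is an assumed value that has to be adapted to the current rate:

```python
# Sketch: deriving litellm's per-token USD price from a EUR price
# quoted per million tokens. The exchange rate is an assumption.
EUR_PER_MILLION_TOKENS = 0.45
USD_PER_EUR = 1.15  # assumed exchange rate, adapt as needed

usd_per_token = EUR_PER_MILLION_TOKENS * USD_PER_EUR / 1_000_000
print(f"{usd_per_token:.2e}")  # roughly 5.2e-07, i.e. the 52e-08 used above
```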
Now we have to assign the new model to the code agents in the "settings.json" file of the Theia IDE. In the first step, the model itself must be configured in the file. The values "model" and "id" are important here; both must be adapted for the Code Llama model. The rest of the values remain identical.
{
"window.titleBarStyle": "native",
"redhat.telemetry.enabled": false,
"ai-features.AiEnable.enableAI": true,
"ai-features.openAiCustom.customOpenAiModels": [
{
"model": "MistralSmall24BInstruct",
"url": "http://0.0.0.0:4000",
"id": "IONOS Mistral Small 24b Instruct",
"apiKey": "sk-api-key-des-theia-nutzers",
"developerMessageSettings": "user"
},
{
"model": "CodeLlama13BInstruct",
"url": "http://0.0.0.0:4000",
"id": "IONOS Code Llama 13b Instruct",
"apiKey": "sk-api-key-des-theia-nutzers",
"developerMessageSettings": "user"
}
],
"ai-features.languageModelAliases":{
"default/code": {
"selectedModel": "IONOS Code Llama 13b Instruct"
},
"default/code-completion": {
"selectedModel": "IONOS Code Llama 13b Instruct"
},
"default/summarize": {
"selectedModel": "IONOS Mistral Small 24b Instruct"
},
"default/universal": {
"selectedModel": "IONOS Mistral Small 24b Instruct"
},
}
}

More information about the requests sent
Especially when working with unknown applications, it can be helpful to save the actual requests and not only their logs in the database. This requires adding two new values in the config.yaml file:
general_settings:
...
store_model_in_db: true
store_prompts_in_spend_logs: true
...

Now the "Logs" section of the web UI shows not only the costs and the exact number of tokens consumed per request, but also the complete request itself. If litellm runs as a service, it must first be restarted for the changes to take effect:
sudo systemctl daemon-reload
sudo systemctl restart litellm.service

The litellm interface is a bit confusing here: for individual requests, costs of 0.00 USD are shown, because not enough tokens were used for the cost to register beyond the second decimal place.
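The effect can be reproduced with a quick calculation; the token counts are made up, the per-token price is the Code Llama value configured above:

```python
# Why the litellm UI shows 0.00 USD for single requests: the real cost
# sits below the second decimal place.
PRICE_PER_TOKEN_USD = 52e-08

tokens_used = 1500 + 500  # assumed input + output tokens of one request
cost = tokens_used * PRICE_PER_TOKEN_USD

print(f"{cost:.5f}")  # 0.00104 -> the real cost in USD
print(f"{cost:.2f}")  # 0.00    -> what a two-decimal display shows
```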
Saving money by caching
Another possibility that can be implemented quite quickly is a cache, which saves money by answering identical requests locally instead of sending them to the LLM again. Although repeating the same question might be quite rare in software development, litellm makes it very easy to activate the cache. Various backends are available, from a Redis cache and a cloud cache to two local caches. Since we want to save costs, cloud-based services are not really interesting. Setting up a Redis server in addition to the PostgreSQL database would be possible, but probably overkill. And since my notebook currently has only eight gigabytes of memory, the local in-memory cache is not an option either. Instead, I opt for the cache of type "disk". To use it, the "caching" extra of litellm must be installed; it is important to use "pip3" from the virtual Python environment.
/home/benutzer/workspace/litellm/bin/pip3 install 'litellm[caching]'

After the successful installation, a new section has to be added to the "config.yaml" file and the service must be restarted with the same commands as above.
litellm_settings:
  cache: True
  cache_params:
    type: disk

In my tests with the Theia-IDE, however, my requests were not answered from the cache, despite sending the same request twice. For a later use case, about which I will publish a separate article, a cache is still helpful.
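One possible explanation for the cache misses is that the cache key is computed over the complete request, so if the IDE adds even slightly different context each time, no two requests are identical. For digging deeper, the cache section accepts further parameters. The following is a sketch based on the litellm documentation; the parameter names should be verified against the installed version:

```yaml
litellm_settings:
  cache: True
  cache_params:
    type: disk
    # Assumed parameters from the litellm docs: where the disk cache
    # lives and how long an entry stays valid (in seconds).
    disk_cache_dir: /home/benutzer/workspace/litellm/cache
    ttl: 600
```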