April 10, 2026, 10:46 p.m. by Gabe R
In this post, we will go through installing ContextBot and discuss the infrastructure required to run an LLM locally. I will not give full step-by-step instructions, since there are a lot of systems out there; instead, I will list the things that need installing and the parameters I'm running KoboldCPP under.
ContextBot uses plugins to connect to LLM APIs. The only plugin available right now handles connections to KoboldCPP running the Gemma-3 model from Google. The plugin interface is quite easy to extend, and for those who are interested, it can be found in the ContextBot repo in the net_runner package.
ContextBot is not a finished product as of now; it's more of an environment where I plan to test the feasibility of the whole idea.
Before going into it, there are a few things I'd like to mention, especially for those who, like me, have only dealt with high-end AIs like ChatGPT and Claude before jumping into local inference.
1. Running an LLM locally, especially on limited hardware, is nothing like the flagship AIs
A local AI will obviously have reduced capabilities, and it will not "feel" as smart. Don't get me wrong, some of them can definitely keep up a conversation; just don't expect Claude-level smarts.
2. Make sure your hardware is up for the job
The main bottleneck for running models locally is the GPU, more specifically the GPU's VRAM, so make sure to download a model which can fit. For example, I'm using a laptop equipped with an RTX 3050 with 4 GB of VRAM. It can "comfortably" fit the gemma-3-4b-it.Q6_K model, which only takes 3.2 GB. Quantisation is your friend here, and your mileage may vary.
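As a rule of thumb, a quantised model's file size is roughly parameter count × bits per weight ÷ 8. Here is a quick back-of-the-envelope check; the ~6.6 bits per weight for Q6_K and the 4-billion parameter count are approximations, not exact figures, and real GGUF files differ slightly because different tensors may use different quantisation levels:

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough model file size estimate: params * bits / 8, in gigabytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# gemma-3-4b at Q6_K (~6.6 bits/weight) -- close to the 3.2 GB mentioned above
estimate = model_size_gb(4, 6.6)
print(f"{estimate:.2f} GB")  # -> 3.30 GB
```

Running the same formula against your own GPU's VRAM budget is a quick way to decide which quantisation level to download.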
3. What I'm running
I'll share my specs, just in case anyone is curious. I'm running a three year old Asus ROG laptop equipped with:
I also need to mention that I'm a Linux guy, so I kind of assume that ContextBot will be run under Linux. It should definitely work on any OS which supports Python; it's just that at this stage of the project I'm not too worried about it.
In order to run an LLM locally, besides the actual hardware, we need the model itself (the LLM) and a piece of software which can load and run it while providing an interface for the user. For that purpose I'm using KoboldCPP.
In order to get your machine ready to run a model, the following should be installed:
All of the above are usually available in any major Linux distribution's repo.
After everything is installed, start KoboldCPP with the following:
koboldcpp-linux-x64 --contextsize 9216 --model models/gemma-3-4b-it.Q6_K.gguf --port 5001
where:
--contextsize - the size of the context window, in tokens
--model - the path to the GGUF model file to load
--port - the port on which KoboldCPP listens for requests
Once the above is running, there are two ways to interact with the LLM: through the web UI KoboldCPP serves in the browser, or programmatically through its HTTP API, which is what ContextBot uses.
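To give an idea of the programmatic route, here is a minimal sketch of a blocking request against KoboldCPP's generate endpoint. The endpoint path, payload fields, and response shape below follow KoboldCPP's API as I understand it (/api/v1/generate, max_length, temperature, a results list with a text field); double-check them against your KoboldCPP version:

```python
import json
import urllib.request

API_URL = "http://localhost:5001/api/v1/generate"

def build_payload(prompt: str, max_length: int = 256, temperature: float = 0.7) -> dict:
    """Assemble the JSON body for KoboldCPP's generate endpoint."""
    return {"prompt": prompt, "max_length": max_length, "temperature": temperature}

def generate(prompt: str) -> str:
    """Send a blocking generate request and return the generated text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Expected response shape: {"results": [{"text": "..."}]}
    return body["results"][0]["text"]

# text = generate("Say hello in five words.")  # requires KoboldCPP running
```

This is essentially what a ContextBot plugin has to do: serialise a prompt, POST it, and pull the text back out of the response.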
First things first, install Python on your machine, if it's not already there. I've developed ContextBot under Python 3.13.7.
mkdir env_dir
python -m venv --clear env_dir
source env_dir/bin/activate
mkdir contextbot
git clone https://github.com/byterows-dev/contextbot.git contextbot
cd contextbot
pip install -r requirements.txt
python contextbot.py
The above should give you access to the REPL, and you should be greeted with the following prompt: "Command:". It uses something similar to Vim commands, and executing :h should give you the help screen:
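Since ContextBot expects KoboldCPP to already be up, a small pre-flight check can save some head-scratching. This is just a convenience sketch, not part of ContextBot itself:

```python
import urllib.error
import urllib.request

def server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at the given URL."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status -- it is up.
        return True
    except (urllib.error.URLError, OSError):
        return False

# server_reachable("http://localhost:5001")  # True once KoboldCPP is running
```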
:q - exit
:i - invokes the interpreter. Press Enter when done
:m - shows memory. The memory is updated each time a successful statement is parsed by the interpreter
:s - sends a user input to the LLM and streams the response
:d - sends a user input to the LLM and waits for the response
:h - help
Let's go through a small usage example, while explaining some of the underlying concepts. At this point KoboldCPP should be running in the background and serving requests at http://localhost:5001.
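For flavour, the colon-command dispatch behind such a REPL can be imagined along these lines. This is an illustrative sketch, not ContextBot's actual implementation:

```python
def dispatch(command: str) -> str:
    """Map a Vim-style colon command to an action, returning a status string."""
    handlers = {
        ":q": lambda: "exit",
        ":i": lambda: "invoke interpreter",
        ":m": lambda: "show memory",
        ":s": lambda: "send input, stream response",
        ":d": lambda: "send input, wait for response",
        ":h": lambda: "help",
    }
    handler = handlers.get(command.strip())
    if handler is None:
        return f"unknown command: {command!r}"
    return handler()
```

A table of handlers like this keeps the REPL loop itself trivial: read a line, dispatch it, print the result.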
If you look in the root directory of the repo, you can see a file named pre_prompt.md. That file contains an explanation of the ContextScript language which will be added to every prompt before the user input.
Type :s followed by Enter. This instructs the REPL to send a command to the LLM and stream the response. The :d command does roughly the same thing; it just waits for the LLM to finish generating before it outputs anything. Then type "Give me ten statements in ContextScript about a subject of your liking". At this point, the machine should start streaming the response. What I got back was the following:
Node(name="GPT-3", data="Large Language Model", node_id="gpt3_123");
Node(name="LLM Research", data="Field of study", node_id="llm_research_456");
Edge(edge_id="1", from_node="gpt3_123", to_node="llm_research_456", verb="studies", weight=0.95);
Node(name="Neural Networks", data="Core technology", node_id="neural_nets_789");
Edge(edge_id="2", from_node="gpt3_123", to_node="neural_nets_789", verb="uses", weight=0.88);
Node(name="Data Training", data="Process of learning", node_id="data_training_101");
Edge(edge_id="3", from_node="gpt3_123", to_node="data_training_101", verb="requires", weight=0.75);
Node(name="Natural Language Processing", data="Domain of expertise", node_id="nlp_202");
Edge(edge_id="4", from_node="gpt3_123", to_node="nlp_202", verb="is_part_of", weight=0.92);
Node(name="Text Generation", data="Output capability", node_id="text_gen_303");
Edge(edge_id="5", from_node="gpt3_123", to_node="text_gen_303", verb="performs", weight=0.85);
Node(name="Human Interaction", data="User engagement", node_id="human_int_404");
Edge(edge_id="6", from_node="gpt3_123", to_node="human_int_404", verb="facilitates", weight=0.70);
The best way to check whether a ContextScript program is valid is to pass it to the interpreter. This can be done with the :i command. Paste the code above, and press Enter. If the code is correct, the interpreter will respond with "INTERPRETER OK.". Otherwise, it will print an error message. For example, if I try to pass the following statement: Node(name="Sun", data="Star at the center", node_id="sun_123") the interpreter will reply with "On line 0. All statements must end with a semicolon."
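The semicolon rule from that error is easy to pre-check before handing a program to the interpreter. A sketch of such a check follows; the error text mirrors the interpreter's message, but the function itself is hypothetical and not part of ContextBot:

```python
def check_semicolons(program: str) -> list[str]:
    """Return interpreter-style error messages for statements missing a ';'."""
    errors = []
    for line_no, line in enumerate(program.splitlines()):
        stripped = line.strip()
        if stripped and not stripped.endswith(";"):
            errors.append(
                f"On line {line_no}. All statements must end with a semicolon."
            )
    return errors
```

Note that, like the interpreter's own message, line numbering starts at 0.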
The interpreter holds an internal representation of the graph. Please keep in mind that at this point in the project's lifetime the graph is lost every time the program exits.
To see the memory represented as JSON, issue the :m command. The output for the graph above is:
{
"NODES": {
"gpt3_123": "Node(name=GPT-3, node_id=gpt3_123, data=Large Language Model)",
"llm_research_456": "Node(name=LLM Research, node_id=llm_research_456, data=Field of study)",
"neural_nets_789": "Node(name=Neural Networks, node_id=neural_nets_789, data=Core technology)",
"data_training_101": "Node(name=Data Training, node_id=data_training_101, data=Process of learning)",
"nlp_202": "Node(name=Natural Language Processing, node_id=nlp_202, data=Domain of expertise)",
"text_gen_303": "Node(name=Text Generation, node_id=text_gen_303, data=Output capability)",
"human_int_404": "Node(name=Human Interaction, node_id=human_int_404, data=User engagement)"
},
"EDGES": {
"1": "Edge(id=1, from_node=gpt3_123, verb=studies, to_node=llm_research_456, weight=0.95)",
"2": "Edge(id=2, from_node=gpt3_123, verb=uses, to_node=neural_nets_789, weight=0.88)",
"3": "Edge(id=3, from_node=gpt3_123, verb=requires, to_node=data_training_101, weight=0.75)",
"4": "Edge(id=4, from_node=gpt3_123, verb=is_part_of, to_node=nlp_202, weight=0.92)",
"5": "Edge(id=5, from_node=gpt3_123, verb=performs, to_node=text_gen_303, weight=0.85)",
"6": "Edge(id=6, from_node=gpt3_123, verb=facilitates, to_node=human_int_404, weight=0.7)"
},
"ADJACENT": {
"gpt3_123": {
"1": "",
"2": "",
"3": "",
"4": "",
"5": "",
"6": ""
},
"llm_research_456": {
"1": ""
},
"neural_nets_789": {
"2": ""
},
"data_training_101": {
"3": ""
},
"nlp_202": {
"4": ""
},
"text_gen_303": {
"5": ""
},
"human_int_404": {
"6": ""
}
}
}
As can be seen, there are three objects: NODES, EDGES and ADJACENT. NODES and EDGES are self-explanatory. ADJACENT holds all the Edges going from or leading to a specific Node.
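In other words, ADJACENT can be derived purely from the edge list: every edge id gets registered under both of its endpoints. A sketch of how such an index might be built; field names follow the JSON above, but this is not ContextBot's actual code:

```python
def build_adjacent(edges: dict[str, dict]) -> dict[str, dict[str, str]]:
    """Index edge ids under both their from_node and to_node, as in ADJACENT."""
    adjacent: dict[str, dict[str, str]] = {}
    for edge_id, edge in edges.items():
        for endpoint in (edge["from_node"], edge["to_node"]):
            adjacent.setdefault(endpoint, {})[edge_id] = ""
    return adjacent

# The first two edges from the example graph above
edges = {
    "1": {"from_node": "gpt3_123", "to_node": "llm_research_456"},
    "2": {"from_node": "gpt3_123", "to_node": "neural_nets_789"},
}
adj = build_adjacent(edges)
```

With the full six-edge graph this reproduces the ADJACENT object shown above: "gpt3_123" ends up keyed to all six edge ids, and each other node to the single edge touching it.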
Hopefully the above serves as a decent technical explanation of what the whole project is about and how to use the software. I also hope it provides some clarity on how I envision the concept of LLM knowledge expansion without retraining.
In the next post, I'm planning to go through some long overdue experiments which should check some of the basic premises necessary for the project's success.
->Next: Experiments