April 10, 2026, 10:46 p.m. by Gabe R
In this post, we will go through installing ContextBot and discuss the infrastructure required to run an LLM locally. I will not give full step-by-step instructions, since there are a lot of systems out there; instead, I will list the things that need installing and the parameters I'm running KoboldCPP under.
ContextBot uses plugins to connect to LLM APIs. The only plugin available right now handles connections to KoboldCPP running the Gemma-3 model from Google. The plugin interface is quite easy to extend, and for those who are interested, it can be found in the ContextBot repo in the net_runner package.
ContextBot is not a finished product as of now; it's more of an environment where I plan to test the feasibility of the whole idea.
Before going into it, there are a few things I'd like to mention, especially for those who, like me, have only dealt with high-end AIs like ChatGPT and Claude before jumping into local inference.
1. Running an LLM locally, especially on limited hardware, is nothing like the flagship AIs
A local AI will obviously have reduced capabilities, and it will not "feel" as smart. Don't get me wrong, some of them can definitely keep up a conversation; just don't expect Claude-level smarts.
2. Make sure your hardware is up for the job
The main bottleneck for running models locally is the GPU, more specifically the GPU's VRAM, so make sure to download a model which can fit. For example, I'm using a laptop equipped with an RTX 3050 with 4 GB of VRAM. It can "comfortably" fit the gemma-3-4b-it.Q6_K model, which only takes 3.2 GB. Quantisation is your friend here, and your mileage may vary.
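As a rule of thumb, a quantised model's file size is roughly parameter count × bits per weight ÷ 8. Here is a quick back-of-the-envelope check; the ~6.6 bits per weight for Q6_K and the 4-billion parameter count are approximations, not exact figures, and real GGUF files differ slightly because different tensors may use different quantisation levels:

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough model file size estimate: params * bits / 8, in gigabytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# gemma-3-4b at Q6_K (~6.6 bits/weight) -- close to the 3.2 GB mentioned above
estimate = model_size_gb(4, 6.6)
print(f"{estimate:.2f} GB")  # -> 3.30 GB
```

Running the same formula against your own GPU's VRAM budget is a quick way to decide which quantisation level to download.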
3. What I'm running
I'll share my specs, just in case anyone is curious. I'm running a three year old Asus ROG laptop equipped with:
I also need to mention that I'm a Linux guy, so I kind of assume that ContextBot will be run under Linux. It should definitely work on any OS which supports Python; it's just that at this stage of the project I'm not too worried about it.
In order to run an LLM locally, besides the actual hardware, we need the model itself (the LLM) and a piece of software which can load and run it while providing an interface for the user. For that purpose I'm using KoboldCPP.
In order to get your machine ready to run a model, the following should be installed:
All of the above are usually available in any major Linux distribution's repo.
After everything is installed, start KoboldCPP with the following:
koboldcpp-linux-x64 --contextsize 9216 --model models/gemma-3-4b-it.Q6_K.gguf --port 5001
where:
--contextsize - the size of the context window, in tokens
--model - the path to the GGUF model file to load
--port - the port on which KoboldCPP listens for requests
Once the above is running, there are two ways to interact with the LLM: through the web UI KoboldCPP serves in the browser, or programmatically through its HTTP API, which is what ContextBot uses.
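To give an idea of the programmatic route, here is a minimal sketch of a blocking request against KoboldCPP's generate endpoint. The endpoint path, payload fields, and response shape below follow KoboldCPP's API as I understand it (/api/v1/generate, max_length, temperature, a results list with a text field); double-check them against your KoboldCPP version:

```python
import json
import urllib.request

API_URL = "http://localhost:5001/api/v1/generate"

def build_payload(prompt: str, max_length: int = 256, temperature: float = 0.7) -> dict:
    """Assemble the JSON body for KoboldCPP's generate endpoint."""
    return {"prompt": prompt, "max_length": max_length, "temperature": temperature}

def generate(prompt: str) -> str:
    """Send a blocking generate request and return the generated text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Expected response shape: {"results": [{"text": "..."}]}
    return body["results"][0]["text"]

# text = generate("Say hello in five words.")  # requires KoboldCPP running
```

This is essentially what a ContextBot plugin has to do: serialise a prompt, POST it, and pull the text back out of the response.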
First things first, install Python on your machine, if it's not already there. I've developed ContextBot under Python 3.13.7.
mkdir env_dir
python -m venv --clear env_dir
source env_dir/bin/activate
mkdir contextbot
git clone https://github.com/byterows-dev/contextbot.git contextbot
cd contextbot
pip install -r requirements.txt
python contextbot.py
The above should give you access to the REPL, and you should be greeted with the following prompt: "Command:". It uses something similar to Vim commands, and executing :h should give you the help screen:
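Since ContextBot expects KoboldCPP to already be up, a small pre-flight check can save some head-scratching. This is just a convenience sketch, not part of ContextBot itself:

```python
import urllib.error
import urllib.request

def server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at the given URL."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status -- it is up.
        return True
    except (urllib.error.URLError, OSError):
        return False

# server_reachable("http://localhost:5001")  # True once KoboldCPP is running
```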
:q - exit
:i - invokes the interpreter. Press Enter when done
:m - shows memory. The memory is updated each time a successful statement is parsed by the interpreter
:s - sends a user input to the LLM and streams the response
:d - sends a user input to the LLM and waits for the response
:h - help
Let's go through a small usage example, while explaining some of the underlying concepts. At this point KoboldCPP should be running in the background and serving requests at http://localhost:5001.
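For flavour, the colon-command dispatch behind such a REPL can be imagined along these lines. This is an illustrative sketch, not ContextBot's actual implementation:

```python
def dispatch(command: str) -> str:
    """Map a Vim-style colon command to an action, returning a status string."""
    handlers = {
        ":q": lambda: "exit",
        ":i": lambda: "invoke interpreter",
        ":m": lambda: "show memory",
        ":s": lambda: "send input, stream response",
        ":d": lambda: "send input, wait for response",
        ":h": lambda: "help",
    }
    handler = handlers.get(command.strip())
    if handler is None:
        return f"unknown command: {command!r}"
    return handler()
```

A table of handlers like this keeps the REPL loop itself trivial: read a line, dispatch it, print the result.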
If you look in the root directory of the repo, you can see a file named pre_prompt.md. That file contains an explanation of the ContextScript language which will be added to every prompt before the user input.
Type :s followed by Enter. This instructs the REPL to send a command to the LLM and stream the response. The :d command does roughly the same thing; it just waits for the LLM to finish generating before it outputs anything. Then type "Give me ten statements in ContextScript about a subject of your liking". At this point, the machine should start streaming the response. What I got back was the following:
Node(name="GPT-3", data="Large Language Model", node_id="gpt3_123");
Node(name="LLM Research", data="Field of study", node_id="llm_research_456");
Edge(edge_id="1", from_node="gpt3_123", to_node="llm_research_456", verb="studies", weight=0.95);
Node(name="Neural Networks", data="Core technology", node_id="neural_nets_789");
Edge(edge_id="2", from_node="gpt3_123", to_node="neural_nets_789", verb="uses", weight=0.88);
Node(name="Data Training", data="Process of learning", node_id="data_training_101");
Edge(edge_id="3", from_node="gpt3_123", to_node="data_training_101", verb="requires", weight=0.75);
Node(name="Natural Language Processing", data="Domain of expertise", node_id="nlp_202");
Edge(edge_id="4", from_node="gpt3_123", to_node="nlp_202", verb="is_part_of", weight=0.92);
Node(name="Text Generation", data="Output capability", node_id="text_gen_303");
Edge(edge_id="5", from_node="gpt3_123", to_node="text_gen_303", verb="performs", weight=0.85);
Node(name="Human Interaction", data="User engagement", node_id="human_int_404");
Edge(edge_id="6", from_node="gpt3_123", to_node="human_int_404", verb="facilitates", weight=0.70);
The best way to check whether a ContextScript program is valid is to pass it to the interpreter. This can be done with the :i command. Paste the code above, and press Enter. If the code is correct, the interpreter will respond with "INTERPRETER OK.". Otherwise, it will print an error message. For example, if I try to pass the following statement: Node(name="Sun", data="Star at the center", node_id="sun_123") the interpreter will reply with "On line 0. All statements must end with a semicolon."
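The semicolon rule from that error is easy to pre-check before handing a program to the interpreter. A sketch of such a check follows; the error text mirrors the interpreter's message, but the function itself is hypothetical and not part of ContextBot:

```python
def check_semicolons(program: str) -> list[str]:
    """Return interpreter-style error messages for statements missing a ';'."""
    errors = []
    for line_no, line in enumerate(program.splitlines()):
        stripped = line.strip()
        if stripped and not stripped.endswith(";"):
            errors.append(
                f"On line {line_no}. All statements must end with a semicolon."
            )
    return errors
```

Note that, like the interpreter's own message, line numbering starts at 0.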
The interpreter holds an internal representation of the graph. Please keep in mind that at this point in the project's lifetime the graph is lost every time the program exits.
To see the memory represented as JSON, issue the :m command. The output for the graph above is:
{
"NODES": {
"gpt3_123": "Node(name=GPT-3, node_id=gpt3_123, data=Large Language Model)",
"llm_research_456": "Node(name=LLM Research, node_id=llm_research_456, data=Field of study)",
"neural_nets_789": "Node(name=Neural Networks, node_id=neural_nets_789, data=Core technology)",
"data_training_101": "Node(name=Data Training, node_id=data_training_101, data=Process of learning)",
"nlp_202": "Node(name=Natural Language Processing, node_id=nlp_202, data=Domain of expertise)",
"text_gen_303": "Node(name=Text Generation, node_id=text_gen_303, data=Output capability)",
"human_int_404": "Node(name=Human Interaction, node_id=human_int_404, data=User engagement)"
},
"EDGES": {
"1": "Edge(id=1, from_node=gpt3_123, verb=studies, to_node=llm_research_456, weight=0.95)",
"2": "Edge(id=2, from_node=gpt3_123, verb=uses, to_node=neural_nets_789, weight=0.88)",
"3": "Edge(id=3, from_node=gpt3_123, verb=requires, to_node=data_training_101, weight=0.75)",
"4": "Edge(id=4, from_node=gpt3_123, verb=is_part_of, to_node=nlp_202, weight=0.92)",
"5": "Edge(id=5, from_node=gpt3_123, verb=performs, to_node=text_gen_303, weight=0.85)",
"6": "Edge(id=6, from_node=gpt3_123, verb=facilitates, to_node=human_int_404, weight=0.7)"
},
"ADJACENT": {
"gpt3_123": {
"1": "",
"2": "",
"3": "",
"4": "",
"5": "",
"6": ""
},
"llm_research_456": {
"1": ""
},
"neural_nets_789": {
"2": ""
},
"data_training_101": {
"3": ""
},
"nlp_202": {
"4": ""
},
"text_gen_303": {
"5": ""
},
"human_int_404": {
"6": ""
}
}
}
As can be seen, there are three objects: NODES, EDGES and ADJACENT. NODES and EDGES are self-explanatory. ADJACENT holds all the Edges going from or leading to a specific Node.
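In other words, ADJACENT can be derived purely from the edge list: every edge id gets registered under both of its endpoints. A sketch of how such an index might be built; field names follow the JSON above, but this is not ContextBot's actual code:

```python
def build_adjacent(edges: dict[str, dict]) -> dict[str, dict[str, str]]:
    """Index edge ids under both their from_node and to_node, as in ADJACENT."""
    adjacent: dict[str, dict[str, str]] = {}
    for edge_id, edge in edges.items():
        for endpoint in (edge["from_node"], edge["to_node"]):
            adjacent.setdefault(endpoint, {})[edge_id] = ""
    return adjacent

# The first two edges from the example graph above
edges = {
    "1": {"from_node": "gpt3_123", "to_node": "llm_research_456"},
    "2": {"from_node": "gpt3_123", "to_node": "neural_nets_789"},
}
adj = build_adjacent(edges)
```

With the full six-edge graph this reproduces the ADJACENT object shown above: "gpt3_123" ends up keyed to all six edge ids, and each other node to the single edge touching it.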
Hopefully the above serves as a decent technical explanation of what the whole project is about and how to use the software. I also hope it provides some clarity on how I envision the concept of LLM knowledge expansion without retraining.
In the next post, I'm planning to go through some long overdue experiments which should check some of the basic premises necessary for the project's success.
->Next: Experiments