April 12, 2026, 3:16 p.m. by Gabe R
The REPL was, frankly, a bit of an afterthought. I had a complete picture in mind of how ContextBot's architecture would look, and I jumped straight to the most interesting part: implementing the interpreter. I went through a few iterations of that before deciding to do it properly, and then I realised I'd made some pretty bold assumptions, namely: "Can an LLM actually understand and use the ContextScript language?" and "Would an LLM be able to correctly use a predefined set of verbs?"
So after the rush of "OMG, I just created a language and wrote an interpreter" (an idea that not long ago sounded like black magic to me) dissolved, I realised that I had not tested the most basic assumptions on which the whole project relied. And if the answer to one or both of the questions above is "no", I'm left with three possibilities: it's not doable, it's doable with caveats, or a more powerful LLM running on a more powerful system might manage it.
This is why I've decided to pivot the project from a full-fledged LLM interface with memory to an environment where experiments can be performed.
In all honesty, I've already run some informal tests using the language, both on my machine with Gemma-3 and on more capable LLMs like Claude and ChatGPT.
Based on those, I think the most realistic goal at this point is to optimise ContextBot to behave as well as it can on my current hardware, in the hope that something more powerful will have an easier time with it. But I'm getting ahead of myself.
Let's give an overview of the experiments, how they will be performed, and what the results will mean for the project. Keep in mind that each round of experiments will be followed by changes to the software, and the process will be iterated until I decide I can no longer improve it.
The experiments will try to answer the questions I hinted at above:
All of the questions above will be tested by asking the machine to perform the corresponding task, given the appropriate data. If the answer to questions one to three is a resounding "no", then the project, at least as I see it, is dead in the water: it means the LLMs I have access to fundamentally cannot understand ContextScript.
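Part of the scoring for these tasks can be mechanical. As a sketch (this helper is hypothetical, not part of ContextBot; the field names follow the examples in this post), a first-pass validator could check that an LLM's output consists of well-formed Node/Edge statements with the expected keyword arguments before any deeper evaluation happens:

```python
import re

# Expected keyword arguments per statement type, taken from the
# ContextScript examples in this post.
REQUIRED_FIELDS = {
    "Node": {"node_id", "name", "data"},
    "Edge": {"edge_id", "node_from", "node_to", "verb", "weight"},
}

# Naive statement matcher: assumes values contain no semicolons.
STMT = re.compile(r'(Node|Edge)\s*\(([^;]*?)\)\s*;', re.DOTALL)
KWARG = re.compile(r'(\w+)\s*=')

def validate(script: str) -> list[str]:
    """Return a list of problems found; an empty list means the script passed."""
    problems = []
    for kind, body in STMT.findall(script):
        fields = set(KWARG.findall(body))
        missing = REQUIRED_FIELDS[kind] - fields
        if missing:
            problems.append(f"{kind} missing fields: {sorted(missing)}")
    if not STMT.search(script):
        problems.append("no Node/Edge statements found")
    return problems
```

This is deliberately crude (a value containing `=` or `;` would confuse it); the real interpreter would be the ground truth, but something this small is enough to bucket LLM outputs into "syntactically plausible" and "not even close".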
Question four is more open-ended. If the machine cannot stick to a predefined set of verbs, it could be asked to generate its own. While this could work, it would most probably lead to duplicate edges that fundamentally express the same connection. Take for example:
Node(
    node_id="star_0",
    name="Sun",
    data="Main Sequence star at the center of the Solar System");
Node(
    node_id="rocky_planet",
    name="Venus",
    data="Second planet from the Sun.");

The following two edges express the same thing:
Edge(
    edge_id="e1",
    node_from="rocky_planet",
    node_to="star_0",
    verb="orbits",
    weight=1.0);
Edge(
    edge_id="e2",
    node_from="rocky_planet",
    node_to="star_0",
    verb="revolves_around",
    weight=1.0);

This presents quite a few issues, the least of which is graph size. If a predefined set of verbs proves unfeasible, an attempt could be made to merge machine-generated edges based on the semantic closeness of the verbs they use, which presents its own set of problems.
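To make the merging idea concrete, here is a minimal sketch of what edge deduplication could look like. Everything in it is hypothetical: in particular, the `SYNONYMS` table is a hand-made stand-in for a real semantic-similarity measure (e.g. verb embeddings with a cosine-distance threshold), which is exactly the part that brings its own problems.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    edge_id: str
    node_from: str
    node_to: str
    verb: str
    weight: float

# Stand-in for real semantic similarity between verbs; a production
# version would use embeddings, not a hand-curated table.
SYNONYMS = [{"orbits", "revolves_around", "circles"}]

def same_meaning(a: str, b: str) -> bool:
    return a == b or any(a in group and b in group for group in SYNONYMS)

def merge_edges(edges: list[Edge]) -> list[Edge]:
    """Keep one edge per (node_from, node_to, verb-meaning); drop duplicates."""
    kept: list[Edge] = []
    for e in edges:
        duplicate = any(
            k.node_from == e.node_from
            and k.node_to == e.node_to
            and same_meaning(k.verb, e.verb)
            for k in kept
        )
        if not duplicate:
            kept.append(e)
    return kept
```

With the two edges from this post, `merge_edges([e1, e2])` keeps only the first: `orbits` and `revolves_around` land in the same synonym group, so the second edge is discarded as a duplicate. The hard part, of course, is everything the synonym table hand-waves away: where the similarity threshold sits, and what to do when two "close" verbs actually carry different meanings.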
All of the above is, of course, purely speculative. The next post will explore whether the machine can actually generate valid ContextScript.