Generating ContextScript

Intro

This post tests whether the machine can actually generate valid ContextScript. I ran the experiment by repeatedly asking it to generate 20-ish statements about a specific subject. Judging by the walkthrough in a previous post, the answer seemed to be a resounding "yes". Unfortunately that is not the case, but the good news is that the issues I've identified seem solvable with some light modifications to the language and to the way prompts are structured.

Issues

1. Weight boundaries

Sometimes the LLM will not respect the boundaries of the weight value, [0.0, 1.0], and will generate weights well above the maximum of 1.0. This happened with the prompt: "Please generate roughly twenty statements in contextscript. Make it about the Solar System". The machine would generate the planets in order, then use the verb orbits with increasing weights.
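A check after generation can catch this before the graph is stored. Below is a minimal Python sketch, assuming each parsed edge is available as a dictionary with a weight key. That shape, the node_to and verb keys, and the helper name find_out_of_range_weights are hypothetical and not part of ContextScript itself.

```python
WEIGHT_MIN = 0.0
WEIGHT_MAX = 1.0


def find_out_of_range_weights(edges):
    """Return the edges whose weight falls outside [WEIGHT_MIN, WEIGHT_MAX]."""
    return [e for e in edges if not (WEIGHT_MIN <= e["weight"] <= WEIGHT_MAX)]


# Hypothetical parsed output that mimics the Solar System run:
# the same verb is reused with ever-increasing weights.
edges = [
    {"node_from": "mercury", "node_to": "sun", "verb": "orbits", "weight": 0.5},
    {"node_from": "venus", "node_to": "sun", "verb": "orbits", "weight": 1.0},
    {"node_from": "earth", "node_to": "sun", "verb": "orbits", "weight": 1.5},
    {"node_from": "mars", "node_to": "sun", "verb": "orbits", "weight": 2.0},
]

bad_edges = find_out_of_range_weights(edges)
for edge in bad_edges:
    print(f"weight out of range: {edge['node_from']} -> {edge['weight']}")
```

Whether an offending weight should be clamped, rescaled or rejected is a separate design decision; the sketch only flags it.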

2. Misunderstanding constraints

Sometimes the LLM will use a node's name instead of its id in an edge.
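This particular mistake can often be detected mechanically, because the name that was used still belongs to a known node. The Python sketch below assumes the generated nodes are kept in a dictionary keyed by id, each with a name field; the structure and the helper resolve_node_reference are illustrative assumptions, not part of the interpreter.

```python
def resolve_node_reference(ref, nodes):
    """Map a reference found in an edge to a node id.

    Returns `ref` unchanged if it is a known id, the matching id if `ref`
    is a node name used by mistake, or None if nothing matches
    (or if the name is ambiguous).
    """
    if ref in nodes:
        return ref
    matches = [
        node_id
        for node_id, node in nodes.items()
        if node["name"].lower() == ref.lower()
    ]
    return matches[0] if len(matches) == 1 else None


# Hypothetical nodes from a first generation step.
nodes = {
    "sun": {"name": "The Sun"},
    "venus": {"name": "Venus"},
}

print(resolve_node_reference("venus", nodes))    # already an id
print(resolve_node_reference("The Sun", nodes))  # a name, mapped back to its id
print(resolve_node_reference("Pluto", nodes))    # unknown reference
```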

The best solution I found to both of the issues above is to follow a two-step approach: first ask it to generate nodes, aka concepts, then, in the second step, ask it to draw relationships between the nodes by generating edges. This brings us to the third problem:

3. Hallucinating nodes

When following the two-step approach above, the machine will respect the weight constraints, but it will hallucinate nodes. This happens especially when it is asked to generate a large number of connections (a hundred connections for the twenty nodes mentioned above). The hallucinated nodes are related and relevant to the subject under "discussion", but they were never generated before being used in the Edge creation statement. Frankly, I cannot see a way to solve this problem, and I don't think that prompt engineering is the solution, since one cannot just say "pretty please don't hallucinate" and have it happen. This leaves three options:

Ignore the whole run, and have the machine generate until a fully correct answer is given.
Discard the offending edge.
Ask the machine to generate the missing node.

Option one is obviously not the solution. One cannot determine how many iterations it will take until the machine generates a correct answer, if it ever does. Option two could work, and assuming the phantom nodes are not that many, a mostly correct graph would result. The drawback is that a discarded edge may carry a potentially relevant connection, one the machine would have been free to draw had it been writing prose instead of a constrained language such as ContextScript. Which is why option three seems like the best approach to me.
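Options two and three share the same first step: finding the edges that point at nodes which were never generated. Here is a minimal Python sketch of that step, assuming edges are dictionaries with node_from and node_to keys (node_to and verb are assumed names) and that the ids from the node-generation step are available as a set. Both helper names are hypothetical.

```python
def split_edges(edges, known_ids):
    """Separate edges into valid ones and ones that reference unknown nodes."""
    valid, phantom = [], []
    for edge in edges:
        if edge["node_from"] in known_ids and edge["node_to"] in known_ids:
            valid.append(edge)
        else:
            phantom.append(edge)
    return valid, phantom


def missing_node_ids(phantom_edges, known_ids):
    """Collect the ids that edges refer to but that were never generated."""
    missing = set()
    for edge in phantom_edges:
        for key in ("node_from", "node_to"):
            if edge[key] not in known_ids:
                missing.add(edge[key])
    return sorted(missing)


# Hypothetical data: "phobos" was never generated in the node step.
known_ids = {"sun", "earth", "mars"}
edges = [
    {"node_from": "earth", "node_to": "sun", "verb": "orbits"},
    {"node_from": "mars", "node_to": "sun", "verb": "orbits"},
    {"node_from": "phobos", "node_to": "mars", "verb": "orbits"},
]

valid, phantom = split_edges(edges, known_ids)
missing = missing_node_ids(phantom, known_ids)
print(len(valid), "valid edges;", "missing nodes:", missing)
```

Option two would keep only valid and stop there. Option three would feed missing back to the machine and re-check the phantom edges once the new nodes exist.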

Generating the missing node

Doing that raises some architectural issues. Consider the program below:

If we take the Edge statement in isolation, we have little to no information about the nodes themselves. Let's presume that the Venus node (node_id="rocky_planet") was the one hallucinated by the LLM. In this case we would have:

There is simply not enough information in node_from="rocky_planet" to deduce the fact that Venus was the intended planet. Based on the id, it could have been any of the four: Mercury, Venus, Earth or Mars.

Possible solution

Modify the Edge statements to include a free-form metadata parameter. This should be used by the machine to describe the connection (Edge). If a node instance is hallucinated, ask the machine to generate the data and name parameters based on that description and the relationship itself.

A hypothetical Edge with metadata might look like this:

At this point the metadata could be used to ask the machine to generate missing nodes. Notice that it contains enough information to unambiguously identify Venus as the missing node, regardless of what the node_id parameter suggests. The metadata itself is not relevant for graph correctness. It is merely an error correction system, and it can be discarded during the graph storage phase.
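To make the repair step concrete, here is a rough Python sketch of how such a follow-up prompt could be assembled from an edge and its metadata. The Edge dataclass, its node_to and verb fields, the prompt wording and the sample metadata text are all assumptions made for illustration; only node_from, the weight, the metadata and the name and data parameters come from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    node_from: str
    node_to: str   # assumed field name
    verb: str      # assumed field name
    weight: float
    metadata: str  # free-form description written by the machine


def build_repair_prompt(edge, missing_id):
    """Build a prompt asking the machine to define a node it used but never generated."""
    return (
        f'The node with id "{missing_id}" is referenced but was never defined.\n'
        f"It appears in this edge: {edge.node_from} {edge.verb} {edge.node_to} "
        f"(weight {edge.weight}).\n"
        f"Edge description: {edge.metadata}\n"
        "Generate the missing node, filling in its name and data parameters."
    )


# Hypothetical edge whose node_from was hallucinated.
edge = Edge(
    node_from="rocky_planet",
    node_to="sun",
    verb="orbits",
    weight=0.9,
    metadata="Venus, the second planet from the Sun, orbits it in about 225 Earth days.",
)

prompt = build_repair_prompt(edge, "rocky_planet")
print(prompt)
```

The id alone stays ambiguous, but the description travels with the edge, so the machine has what it needs to pick the right planet.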

What now?

The next step in ContextBot's life cycle is to modify the language so that the Edge data type will hold free-form metadata, as discussed above. If it's doable, I will expand the interpreter and discuss my findings in a new post.

Stay tuned.