Why ContextScript?

As mentioned in a previous post, I had the idea of creating a memory store for LLMs based on a directed graph (semantic network). As for the name, it's simple: the client was supposed to provide context for chat conversations, so I called it ContextBot. Then I realised I needed a language the LLM and the client could use to communicate graph operations, so it seemed only natural to call it ContextScript.

The idea was to create a language that would be simple to understand, simple to parse and simple to write in. It was clear from the idea of a directed graph that it would need two data types, Node and Edge, and three operations: create, update, and delete. I also decided to use the semicolon as the statement terminator.

What follows is an introduction to the language. While the language itself is extremely simple, the text that follows is relatively long, so prepare your favorite brew and maybe some biscuits before marching on. Also please keep in mind that the current implementation is a work in progress and not all constraints are enforced.

Data types and variables declaration

As mentioned above, there are two data types in the language, Node and Edge. A variable is declared and instantiated in a single statement, written directly in the source code as in the examples below.

String

A string within the ContextScript language is composed of any combination of letters, digits, the minus ( - ), the underscore ( _ ), the comma ( , ), the period ( . ) or the single quote ( ' ) symbols. All strings must be enclosed in double quotes.
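
To make the rule concrete, here is a minimal Python sketch of a check for string literals. It is an illustration only, not the ContextBot implementation: the names STRING_LITERAL and is_string_literal are made up for this example, and "letters" is assumed to mean ASCII letters. Note that the rule as written does not list the space character, so this sketch rejects strings containing spaces.

```python
import re

# Characters listed above as legal inside a string: letters, digits, minus,
# underscore, comma, period and single quote.
# Assumption: "letters" means ASCII letters only.
STRING_LITERAL = re.compile(r'"[A-Za-z0-9\-_,.\']*"')


def is_string_literal(text: str) -> bool:
    """Return True if text is a double-quoted string made of legal characters."""
    return STRING_LITERAL.fullmatch(text) is not None
```

For example, is_string_literal('"node_1"') is accepted, while the unquoted node_1 and the quoted "a;b" are both rejected.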

Float

The float is only used to define weights for the Edge type.

Node

A Node is intended to store concepts on which a user (machine or human) can reason. Each of the Node argument values is a string enclosed in double quotes. A Node has the following properties:

  • node_id - ids must be unique within the context of the graph. This also means that a Node and an Edge cannot have the same id. It can be any combination of letters, digits, the underscore ( _ ) and the minus sign ( - ).
  • name - shorthand description of the node data, encoded as a String.
  • data - whatever concept the Node needs to express, encoded as a String.

Node declaration

Node(
    node_id = "value",
    name = "value",
    data = "value");

Edge

An Edge stores a relationship between two Nodes. All Edge argument values are Strings, except for the weight which is a Float. An Edge has the following properties:

  • edge_id - ids must be unique within the context of the graph. This also means that a Node and an Edge cannot have the same id. It can be any combination of letters, digits, the underscore ( _ ) and the minus sign ( - ).
  • from_node - must be a previously declared Node's id
  • to_node - must be a previously declared Node's id
  • verb - a double-quoted string made of letters, digits, the underscore ( _ ) and the minus sign ( - ).
  • weight - a floating point number in the range [0.0, 1.0]

Edge declaration

Edge(
    edge_id = "value",
    from_node = "value",
    to_node = "value",
    verb = "value",
    weight = value);
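
On the client side, the two types could be mirrored by a pair of small Python classes. The sketch below is one possible shape, not the actual ContextBot code: the class layout, the pattern names and the choice to raise ValueError are assumptions. The id and verb patterns follow the EBNF given later in this post (an id starts with a letter or digit, a verb starts with a letter).

```python
import re
from dataclasses import dataclass

# Patterns taken from the EBNF further down; ASCII letters are assumed.
ID_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")
VERB_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


@dataclass(frozen=True)
class Node:
    node_id: str
    name: str
    data: str

    def __post_init__(self):
        if not ID_PATTERN.fullmatch(self.node_id):
            raise ValueError(f"invalid node_id: {self.node_id!r}")


@dataclass(frozen=True)
class Edge:
    edge_id: str
    from_node: str
    to_node: str
    verb: str
    weight: float

    def __post_init__(self):
        # edge_id, from_node and to_node are all ids.
        for field_name in ("edge_id", "from_node", "to_node"):
            value = getattr(self, field_name)
            if not ID_PATTERN.fullmatch(value):
                raise ValueError(f"invalid {field_name}: {value!r}")
        if not VERB_PATTERN.fullmatch(self.verb):
            raise ValueError(f"invalid verb: {self.verb!r}")
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError(f"weight out of range: {self.weight}")
```

These classes only validate the shape of a single element. Whether from_node and to_node point at existing nodes is a property of the whole graph and has to be checked elsewhere.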

Variable update

A previously declared variable is updated by re-declaring it with the same id:

# This is the original variable.
Node(
    node_id = "node_1_id",
    name = "Name",
    data = "Whatever data");

# Update
Node(
    node_id = "node_1_id",
    name = "Name",
    data = "Whatever data has been changed");

The same can be done for variables of the Edge type.

Variable deletion

An element (Node or Edge) is deleted by passing its id to del:

del(id="node_1_id");
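
One way to picture the three operations is as actions on a single dictionary keyed by id. The Python sketch below is hypothetical and not how ContextBot is implemented. It assumes that an update replaces the whole element, that re-declaring an id under a different type is rejected, and that deleting an unknown id is an error; the post does not specify the last point.

```python
# Hypothetical in-memory store: one dict keyed by id, shared by Nodes and
# Edges because an id must be unique across both types.
def declare(store: dict, kind: str, element_id: str, **fields) -> None:
    """Create the element, or replace it if the id already exists (update)."""
    existing = store.get(element_id)
    if existing is not None and existing["kind"] != kind:
        raise ValueError(f"id {element_id!r} already belongs to a {existing['kind']}")
    store[element_id] = {"kind": kind, **fields}


def delete(store: dict, element_id: str) -> None:
    """Remove the element (Node or Edge) with the given id."""
    if element_id not in store:
        # Assumption: deleting an id that does not exist is treated as an error.
        raise KeyError(f"unknown id: {element_id!r}")
    del store[element_id]


# Mirrors the ContextScript examples above: create, update, then delete.
store = {}
declare(store, "Node", "node_1_id", name="Name", data="Whatever data")
declare(store, "Node", "node_1_id", name="Name", data="Whatever data has been changed")
delete(store, "node_1_id")
```

Create and update share one code path here because the language itself does not distinguish them: the only difference is whether the id is already present.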

Rules and Invariants

At the end of the program, the following must be true:

  • Each edge must connect two nodes, which means that each edge must have a from_node and a to_node. No dangling references are allowed.
  • If a node is deleted, all edges connected to it must also be deleted.
  • All edge weights must be in the range [0.0, 1.0].
  • Re-declaring a graph element with an existing id is an update, and the del operator takes an id as its only argument. This means that:
      A. Before and after execution, ids must be unique.
      B. In the program text, a variable cannot be re-declared with an existing id if that id belongs to a different data type.
  • All nodes must be declared before an edge can use them.
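
The end-of-program rules lend themselves to a single validation pass over the final graph. The Python sketch below shows what such a pass might look like. The function name and the dictionary layout are assumptions made for this example, and it only covers the rules that can be checked on the final state: the declaration-order rule has to be enforced while the program is being executed.

```python
def check_invariants(nodes: dict, edges: dict) -> list[str]:
    """Return the violated invariants; an empty list means the graph is valid.

    Hypothetical layout: nodes maps node_id to any value, and edges maps
    edge_id to a dict with from_node, to_node and weight keys.
    """
    problems = []
    # An id may not be shared between a Node and an Edge.
    for shared_id in sorted(nodes.keys() & edges.keys()):
        problems.append(f"id {shared_id!r} is used by both a Node and an Edge")
    for edge_id, edge in edges.items():
        # No dangling references: both endpoints must still exist.
        for endpoint in ("from_node", "to_node"):
            if edge.get(endpoint) not in nodes:
                problems.append(f"edge {edge_id!r}: {endpoint} does not reference an existing node")
        if not 0.0 <= edge["weight"] <= 1.0:
            problems.append(f"edge {edge_id!r}: weight {edge['weight']} is outside [0.0, 1.0]")
    return problems
```

A node that was deleted without also deleting its edges shows up here as a dangling from_node or to_node, which covers the second rule in the list.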

EBNF

letter = "A" … "Z" | "a" … "z" ;
digit = "0" … "9" ;
float = ["-"], digit, {digit}, ".", digit, {digit};
symbol = "-" | "_" ;
punctuation = "," | "." | "'";
str_delim = '"';

name = letter, {letter | digit | symbol};
argument_name = letter, {letter | digit | symbol};
id = (letter | digit), {letter | digit | symbol};
verb = letter, {letter | digit | symbol};
str = str_delim, { letter | digit | symbol | punctuation }, str_delim;
assignment = argument_name, "=", (name | id | verb | str | float);

node_decl = "Node", "(", assignment, { ",", assignment }, ")", ";";
edge_decl = "Edge", "(", assignment, { ",", assignment }, ")", ";";
deletion = "del", "(", "id", "=", id, ")", ";";

statement = node_decl | edge_decl | deletion;
program = { statement };
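
To show how little machinery the grammar needs, here is a small tokenizer and parser sketch in Python. It is not the ContextBot parser. It follows the grammar strictly, so it does not accept # comments or spaces inside strings, and it is slightly more permissive than the grammar about argument names. All function names are invented for this example.

```python
import re

# One alternative per terminal of the grammar. Floats are tried before words
# so that 0.5 is not read as the word 0 followed by a stray period.
TOKEN = re.compile(
    r'(?P<str>"[A-Za-z0-9\-_,.\']*")'
    r'|(?P<float>-?[0-9]+\.[0-9]+)'
    r'|(?P<word>[A-Za-z0-9][A-Za-z0-9_-]*)'
    r'|(?P<punct>[(),;=])'
)


def tokenize(source: str) -> list[tuple[str, str]]:
    """Split the source into (kind, text) tokens, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse(source: str) -> list[tuple[str, dict]]:
    """Parse a program into a list of (keyword, arguments) pairs."""
    tokens = tokenize(source)
    pos = 0

    def expect(kind: str, text: str = None) -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok_kind, tok_text = tokens[pos]
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"unexpected token {tok_text!r}")
        pos += 1
        return tok_text

    statements = []
    while pos < len(tokens):
        keyword = expect("word")
        if keyword not in ("Node", "Edge", "del"):
            raise SyntaxError(f"unknown statement {keyword!r}")
        expect("punct", "(")
        args = {}
        while True:
            arg_name = expect("word")
            expect("punct", "=")
            if pos >= len(tokens) or tokens[pos][0] == "punct":
                raise SyntaxError(f"expected a value for {arg_name!r}")
            kind, text = tokens[pos]
            pos += 1
            if kind == "str":
                args[arg_name] = text[1:-1]  # strip the double quotes
            elif kind == "float":
                args[arg_name] = float(text)
            else:
                args[arg_name] = text  # bare name, id or verb
            if pos < len(tokens) and tokens[pos] == ("punct", ","):
                pos += 1
                continue
            break
        expect("punct", ")")
        expect("punct", ";")
        if keyword == "del" and list(args) != ["id"]:
            raise SyntaxError("del takes exactly one argument named id")
        statements.append((keyword, args))
    return statements
```

Calling parse on a program returns plain Python data, for instance a ("del", {"id": "e1"}) pair for a deletion, which an interpreter can then apply to the graph one statement at a time.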

Conclusion

If you got to this point, thank you for reading.

And if you're interested in what comes next:

-> The ContextBot REPL