One problem with a cell-oriented key-value store is dealing with large cells. Specifically, when modifying these cells, the copy-on-write paradigm creates potential memory pressure.

One alternative is to update in place for cells over a certain size and accept the concurrency and cache-invalidation implications. This approach still requires the ability to undo or roll back.
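The hybrid could look something like the sketch below: cells above a size threshold are mutated in place with only the overwritten span saved to an undo log, while small cells keep copy-on-write semantics. All names and the threshold are illustrative, not part of the design notes.

```python
class CellStore:
    """Sketch: update-in-place for large cells, with an undo log for rollback."""

    LARGE = 4096  # illustrative threshold; cells at or above this are updated in place

    def __init__(self):
        self.cells = {}  # cell_id -> bytearray
        self.undo = []   # (cell_id, offset, old_bytes) entries for rollback

    def write(self, cell_id, offset, data):
        cell = self.cells[cell_id]
        if len(cell) >= self.LARGE:
            # In place: log only the span being overwritten, then mutate.
            self.undo.append((cell_id, offset, bytes(cell[offset:offset + len(data)])))
            cell[offset:offset + len(data)] = data
        else:
            # Copy-on-write: log the whole old cell, replace it with a fresh copy.
            self.undo.append((cell_id, 0, bytes(cell)))
            new = bytearray(cell)
            new[offset:offset + len(data)] = data
            self.cells[cell_id] = new

    def rollback(self):
        # Undo in reverse order, restoring each saved span.
        while self.undo:
            cell_id, offset, old = self.undo.pop()
            self.cells[cell_id][offset:offset + len(old)] = old
```

The in-place branch is what creates the concurrency and cache-invalidation concerns noted above: readers can observe a partially applied write unless access to the cell is otherwise coordinated.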

Another alternative is to treat the memory trunk as a large linked list split into three logical stores: vertices, edges, and properties (properties will have sub-stores). This is similar in concept to Neo4j storage. Fixed record sizes offer a convenience when parsing data: offsets can be computed directly. That convenience matters less with a jagged-array approach, however, where IDs are used directly as array indexes. The advantage being considered here is the ability to manage concurrency at a small, consistently sized scope.

A disadvantage is that traversing these scoped linked lists is a small case of O(n). This can be optimized by distributing property and edge types as odds/evens, with odd types prepended and even types appended. This allows two concurrent traversal directions when matching, or eliminates one traversal direction and reduces n by roughly 50% (if balanced).
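The odds/evens scheme can be sketched with a deque standing in for the doubly linked list (names and record shapes are illustrative): odd-typed entries cluster at the front, even-typed at the back, so a lookup only scans the half that can contain its type.

```python
from collections import deque

def insert_edge(edges, edge):
    # Odds/evens distribution: odd types prepend, even types append.
    if edge["type"] % 2:
        edges.appendleft(edge)
    else:
        edges.append(edge)

def find_edges(edges, edge_type):
    # Scan only from the end where this parity clusters; stop as soon as
    # the traversal crosses into the other parity's half.
    it = edges if edge_type % 2 else reversed(edges)
    matches = []
    for e in it:
        if e["type"] % 2 != edge_type % 2:
            break  # reached the other half; nothing further can match
        if e["type"] == edge_type:
            matches.append(e)
    return matches
```

With a balanced split each scan touches roughly half the list, which is the ~50% reduction in n described above.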

A key difference from the Neo4j implementation is rooted in the idea that an end vertex can be on a remote partition. Because of this, edge representations must be split into two parts, one for each node. This introduces some write complexity when treating the edge as a single logical unit.
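A minimal sketch of the split-edge write, assuming each trunk keeps its own list of edge half-records (the dict-based layout and field names here are hypothetical): each half names the opposite endpoint, so a partition can traverse its own vertices without a remote lookup just to read the edge.

```python
def create_edge(trunks, start, end, edge_type):
    """Write one logical edge as two half-records, one on each trunk.

    `trunks` maps a trunk ID to its list of edge half-records;
    `start` and `end` are (trunk_id, local_vertex_id) pairs.
    """
    s_trunk, s_vertex = start
    e_trunk, e_vertex = end
    halves = [
        (s_trunk, s_vertex, e_trunk, e_vertex),
        (e_trunk, e_vertex, s_trunk, s_vertex),
    ]
    # Both halves must be written for the edge to exist as a single
    # logical unit -- this is the write complexity the notes mention.
    for trunk, local, end_t, end_v in halves:
        trunks[trunk].append({"local_vertex": local, "end_trunk": end_t,
                              "end_vertex": end_v, "type": edge_type})
```

When the two trunks live on different machines, the two appends become a distributed write that has to commit or roll back together.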

A vertex record will have the following structure (trunk ID is implied by location, cell ID is the index):
- Transaction ID: 8 bytes
- First Edge ID: 4 bytes
- First Property ID: 4 bytes
Total: 16 bytes

An edge record will have the following structure (trunk ID is implied by location, edge ID is the index):
- Transaction ID: 8 bytes
- Previous Edge ID: 4 bytes
- Next Edge ID: 4 bytes
- End Vertex Trunk ID: 2 bytes
- End Vertex Local ID: 4 bytes
- Edge Type: 4 bytes
- First Property ID: 4 bytes
Total: 30 bytes

Property records will vary in size by type:
- Transaction ID: 8 bytes
- Next Property ID: 4 bytes
- Previous Property ID: 4 bytes
- Property Type: 1 byte
- Property Value: 2-16 bytes, depending on store
Min 19 bytes, max 35 bytes, considering variable string storage
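The fixed-size layouts above can be sanity-checked with Python's struct module; the `<` prefix forces a packed little-endian encoding (the endianness is an assumption, field order follows the listings):

```python
import struct

# Packed encodings of the record layouts above.
VERTEX = struct.Struct("<QII")        # Transaction ID, First Edge ID, First Property ID
EDGE = struct.Struct("<QIIHIII")      # Txn, Prev Edge, Next Edge, End Trunk,
                                      # End Vertex, Edge Type, First Property
PROP_HEADER = struct.Struct("<QIIB")  # Txn, Next Prop, Prev Prop, Property Type
                                      # (the 2-16 byte value follows the header)

assert VERTEX.size == 16       # the 16-byte vertex record
assert EDGE.size == 30         # the 30-byte edge record
assert PROP_HEADER.size == 17  # 17-byte header + variable value
```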

The property store(s) may have to have a compressed form of block storage, TBD.

A variation on this design is to split linkage into a dedicated store. In this case we would have:
- A vertex store
- An edge linkage store
- An edge store
- A vertex property linkage store
- An edge property linkage store
- various property stores

The possible advantage of this approach is that a single variable-length record can be kept for linkage, which can be subject to different transaction-processing rules than actual data. Also, the list of links can be read from one pointer instead of chasing pointers. Adds can be appended or inserted, and deletes can be defragmented on the spot.
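A sketch of that linkage record, with a Python list standing in for the contiguous on-disk record (the class and tuple shape are illustrative): the whole link list is read from one pointer, adds append or insert, and deletes compact immediately.

```python
class EdgeLinkageRecord:
    """Sketch: one variable-length linkage record per vertex.

    Each ref is a (prop_linkage_id, end_vertex, end_trunk, edge_type)
    tuple, mirroring the 14-byte Edge Reference layout below.
    """

    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.refs = []  # read in one shot from a single pointer

    def add(self, ref, prepend=False):
        # Adds can be appended or inserted; no pointer chasing required.
        if prepend:
            self.refs.insert(0, ref)
        else:
            self.refs.append(ref)

    def remove(self, edge_type):
        # Deletes compact the record on the spot: no tombstones, no holes.
        self.refs = [r for r in self.refs if r[3] != edge_type]
```

Because the record holds only linkage, it can be versioned or locked under different rules than the vertex and property data it points at.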

What would each of these stores look like? Also, would they be 'index aligned'?

Vertex Store (16 byte fixed record size)
- Transaction ID: 8 bytes
- Edge Linkage ID: 4 bytes
- Property Linkage ID: 4 bytes

Edge Linkage Store (8 bytes + n * 14 bytes, variable record size)
- Transaction ID: 8 bytes
- Edge Reference: 14 bytes each, 1-*
  - Property Linkage ID: 4 bytes
  - End Vertex ID: 4 bytes
  - End Trunk ID: 2 bytes
  - Edge Type: 4 bytes

Property Linkage Store (16 byte fixed record size)
- Transaction ID: 8 bytes
- Property Type: 4 bytes
- Property Value ID: 4 bytes

...various property stores, by type...
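The variable-length edge linkage record can be packed the same way as the fixed records, a sketch assuming little-endian packed encoding:

```python
import struct

# 8-byte header plus n * 14-byte edge references.
LINKAGE_HEADER = struct.Struct("<Q")  # Transaction ID
EDGE_REF = struct.Struct("<IIHI")     # Property Linkage ID, End Vertex ID,
                                      # End Trunk ID, Edge Type

def pack_edge_linkage(txn_id, refs):
    """Pack one variable-length edge linkage record."""
    body = b"".join(EDGE_REF.pack(*r) for r in refs)
    return LINKAGE_HEADER.pack(txn_id) + body
```

Because only the reference count varies, the record length is always 8 + n * 14 bytes, so a reader can recover n from the record length alone.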

Last edited Oct 18, 2014 at 9:20 PM by bkjuice, version 25