Cell Format

A key design element of the system is how the data will be encoded. The design objective of the cell data format is to be as compact as possible to minimize the RAM footprint. Additionally, cells are considered variable in size to accommodate as much data locality as possible. The RAM Trunk cell heap will be implemented as a jagged array to facilitate copy and swap style operations.

The cell format is greatly influenced by support for concurrency and transaction isolation and concern for extremely large cells. It is considered undesirable to have to copy a large cell on update, as large cells will often be subject to high concurrency. It is also desirable to be able to have a fixed size internal layout for Edges, since these constructs are the primary mechanism to perform graph operations.

To facilitate above, the basic design constraints for a cell are as follows:
  • Cell properties will be referenced by a single 6 byte ID (for edges and vertexes)
  • A cell that supports transactions must have 2 8 byte transaction IDs that represent the add and delete times. This is the MVCC approach similar in concept to the approach used for SQL 2014 In-Memory transaction processing.
  • Cells will be partitioned.
  • Edges will only ever be added or deleted to vertices. If a property changes on an edge, this is done against the edge specific property blob.
  • A Master Cell will refer to the property blob.

Cell Identity

A cell will be identified by a monotonically increasing 4 byte non-negative signed integer value that is unique in the hosting RAM Trunk.

A cell must also be identified as local or remote. The trunk identifier is 2 bytes in size and is only used when a cell ID must be qualified for processing outside the local RAM Trunk.

Master Cell Layout

A master cell is simply a pairing of a property ID to a cell. Because a cell can be divided into shards, this is a singular representation of the "property pointer" for all such shards. The layout of the Master Cell is as follows:

{Cell ID - 4 bytes} {Trunk ID - 2 bytes} *({ Cell ID - 4 bytes } { Trunk ID - 2 bytes })

Of note is that Master Cells are not subject to the concurrency control mechanism and will be update in place on rare occasions, if ever. Therefore the 16 byte transaction header is omitted and the total layout is fixed at 6 bytes.

Vertex Cell Layout

The first 16 bytes are the cell version number which is used to handle concurrency resolution. By definition, the version number must be generated in a "shared nothing" manner. The cell version number will be generated using a variant of the Twitter Snowflake algorithm. Snowflakes will be generated using the 2 byte (16 bit) Trunk ID as the "node/data center name". The next 6 bytes of the cell represent the next cell in a chain of partitions. In most cases, this will be 0, which indicates no additional cells to traverse. This layout represents the "vertex head" and is expressed below:

{Added TX - 8 bytes} {Deleted TX - 8 bytes} {Next Shard ID - 4 bytes} {Next Shard Trunk ID - 2 bytes}

This fixed layout is 22 bytes and represents the minimum size for a vertex record. In addition to the header layout, the vertex will have an arbitrary number of edge records of fixed size, as follows:

{Edge Type - 4 bytes} {Property Cell ID - 4 bytes} {End Vertex ID- 4 bytes} {End Trunk ID - 2 bytes} {Property Trunk ID - 2 bytes}

Vertex cells will vary in size but have a predictable fixed internal layout. The decision to keep edge data a local as possible is for in-memory read performance.

Property Cell Layout

The conceptual design layout of a property cell is as follows:

{Added TX - 8 bytes} {Deleted TX - 8 bytes} *( {Property Type ID - 4 bytes} {Start Byte - 1 byte} { Data - n bytes } {End Byte - 1 byte} )

A single start byte determines the next embedded record type. Each record type has a known fixed size, but the number and types of records in a cell may vary according to its degree. This pattern continues until the end of the cell is reached (0 is considered end of cell).

It is expected that property blobs remain of reasonable size and do not need to be partitioned. It is rare in the real world for something to be modeled with the thousands of properties that would be needed to make such a blob prohibitively large.

A proposed definition of these values is as follows (expressed below as an enumeration, may be defined as a set of constants to eliminate design time casting):

namespace O1.Model.Internal
{
    /// <summary>
    /// Predefined constants that define markup that describes the layout of a cell.
    /// </summary>
    /// <remarks>
    /// THESE VALUES CANNOT CHANGE ONCE IN USE.
    /// </remarks>
    internal enum Markup : byte
    {
        /// <summary>
        /// Indicates the next value is a string property.
        /// </summary>
        StringProp = 0,

        /// <summary>
        /// indicates the next value is a hashed property.
        /// </summary>
        HashProp = 1,

        /// <summary>
        /// Indicates the next value is a decimal property.
        /// </summary>
        DecimalProp = 2,

        /// <summary>
        /// Indicates the next value is a date property.
        /// </summary>
        DateProp = 3,

        /// <summary>
        /// Indicates the next value is a date property.
        /// </summary>
        DateTimeProp = 4,

        /// <summary>
        /// Indicates the next value is a 16 bit integer property.
        /// </summary>
        Int16Prop = 5,

        /// <summary>
        /// Indicates the next value is a 32 bit integer property.
        /// </summary>
        Int32Prop = 6,

        /// <summary>
        /// Indicates the next value is a 64 bit integer property.
        /// </summary>
        Int64Prop = 7,
    }
}

The data for each type of record is generally described as follows:
  • End cell IDs for a given edge are 6 bytes in size: 2 bytes to identify the local RAM Trunk + 4 bytes for the local RAM Trunk ID.
  • Property type identifiers should be 2 bytes. This allows ~65K distinct property types, which is expected to be more than enough distinctions to identify the meaning of a property. This requires that property type definitions are globally controlled. Alternatively, in a shared nothing design, property IDs can be hashed string values of 16 bytes in size.
  • All other properties except string are the known CLR size in bytes. Dates are stored as DateTimeOffset types.
  • String property values are stored with an additional byte to indicate size. This infers a constraint of 255 byte string size (consider Unicode). Alternatively, since property matching will be most likely constrained to equality for strings, large strings can be hashed down to 16 bytes to provide the advantages of a fixed record size.

Cell Layouts Depicted

Cell Layouts.png

Cell Processing

Cells will not be instantiated into .NET run-time objects due to the overhead of doing so. Instead, cells will be read as binary data using a visitor pattern layered on top of a forward only streaming approach. Implementers may override a well-known visitor implementation to specify graph processing algorithms.

Additional Thoughts

Reactive extensions are under consideration for processing queries using an observer pattern. Additionally, Orleans (MS Research) provides an interesting "Giraph like" approach where a vertex is an Actor. Given the need to be as compact as possible in RAM, an actor instantiation would represent "some thing" that processes raw data in the context of a graph query, as opposed to a fully articulated run-time domain object.

Regarding storage and variable cell size, a "delta" buffer (similar in concept to the approach used in Hekaton) or jagged array may be useful for handling data modification conditions. A jagged array is interesting because memory compaction is delegated to the CLR GC; if this works, why bother with anything else? This is likely to be generation 2 garbage and the impacts of non-deterministic compaction must be assessed.

raw design notes
Storage
Global Shared Constructs
CAP
Query Processing

Last edited Feb 8, 2015 at 3:32 PM by bkjuice, version 34