OriginTrail

Data structure guidelines

GS1 EPCIS XML

The OriginTrail node supports the GS1 EPCIS 1.2 standard for importing and connecting data in the knowledge graph. You can learn more about the GS1 EPCIS standard here.
This document will show how the GS1 EPCIS data is represented in the Knowledge Graph inside one node.

Document data

The EPCIS guideline suggests the Standard Business Document Header (SBDH) standard for describing document data. This data resides in the EPCIS Header part of the file and contains basic information about the file (sender, receiver, ID, purpose, etc.).
Although OriginTrail receives the file and could be named as a receiver (SBDH allows defining multiple receivers), it is not necessary to include this. The receiver should be an entity involved in the business process, not in the data processing.
This data will be stored separately from the dataset contents within the knowledge graph, as metadata.

Master data

The EPCIS standard describes four ways to handle Master data. OriginTrail currently supports the most common one: including the Master data in the Header of an EPCIS XML document.
Since visibility event data contains only identifiers of objects, locations or parties, the Master data serves to further describe them in a more human-readable way. This data will be connected to the visibility event data whenever master data identifiers are found inside the visibility event data.

Visibility event data

The main focus of the EPCIS standard is formalizing the description of event data generated by activities within the supply chain. OriginTrail supports ObjectEvent, AggregationEvent, TransformationEvent and TransactionEvent, which are thoroughly described in the standard. We strongly advise reading the GS1 EPCIS implementation guideline and reviewing our example files.
Event data describes interactions between entities described with master data. OriginTrail distinguishes between two types of event data:
  • Internal events relate to processes of object movement or transformation (production, repackaging, etc.) within the scope of one supply chain participant’s business location (read point), as part of some business process.
    For example, this could be production or assembly that results in output used for further production or for sale (repackaging, labeling, etc.). The important distinction is that the ownership of the event objects does not change during the event.
  • External events relate to processes between different supply chain participants (sales/purchases, transport). They represent processes where the jurisdiction or ownership of the objects changes in the supply chain. These types of events should use connectors to connect data between parties.

How an event is represented in the graph

When converting an EPCIS Visibility Event to graph, a central vertex will be created for the event. Any event identifiers will be created as separate vertices in the graph, connected to the event vertex, in order to enable connection to other entities with the same identifier.
Any observed objects in the event (the name varies depending on the event type, see the EPCIS data structure) will be added as separate vertices, with the relation created from the object to the event. This enables the objects to connect to their respective master data if available, as the information about the object will be set as that object’s properties.
If the event contains bizLocation and/or readPoint attributes, those will be created as separate vertices, similar to the way it is done for observed objects in the event.
Another part of a visibility event that generates a separate vertex is a connector, which is explained in the following section.
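To make the mapping above concrete, here is a minimal sketch in Python of how an event could be turned into vertices and edges. The function name, vertex types and edge types below are illustrative assumptions, not the actual ot-node importer code.

# Illustrative sketch only: the function name, vertex types and edge types are
# assumptions, not the actual ot-node importer implementation.
def make_event_graph(event):
    vertices = []
    edges = []

    # Central vertex for the event itself
    event_key = event["eventId"]
    vertices.append({"key": event_key, "vertexType": "EVENT"})

    # Every observed object becomes a separate vertex with a relation towards
    # the event, so it can later be connected to its master data
    for epc in event.get("epcList", []):
        vertices.append({"key": epc, "vertexType": "OBJECT"})
        edges.append({"from": epc, "to": event_key, "edgeType": "OBSERVED_AT_EVENT"})

    # bizLocation and readPoint, when present, also become separate vertices
    for attribute in ("bizLocation", "readPoint"):
        if attribute in event:
            vertices.append({"key": event[attribute], "vertexType": attribute})
            edges.append({"from": event_key, "to": event[attribute], "edgeType": "AT"})

    return vertices, edges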

Connectors in EPCIS files

If the event is external (see above) and should be connected to an event from another data creator’s dataset (such as a business partner), the bizTransactionList should contain a bizTransaction attribute whose value is a connection identifier and the corresponding data creator’s decentralized identity (currently the Ethereum ERC-725 identity is supported), separated by a colon. This will create a connector vertex in the graph and connect it to the event it belongs to.
Once the corresponding data creator creates an event containing the same connection identifier together with your decentralized identity, an analogous connector vertex will be created and the two connector vertices will be connected together. This feature enables querying knowledge graph data belonging to multiple parties.
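As an illustration of the expected value format, the following sketch (Python, with made-up placeholder values) composes and splits a bizTransaction value of the form <connection identifier>:<ERC-725 identity>:

# Illustrative only: the connection identifier and ERC-725 identity are made-up placeholders.
connection_id = "CONNECTION-12345"
partner_identity = "0x0000000000000000000000000000000000000001"

# Value placed inside the bizTransaction element
biz_transaction_value = connection_id + ":" + partner_identity

# Reading the value back: split on the last colon
conn_id, identity = biz_transaction_value.rsplit(":", 1)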

Permissioned data in EPCIS files

In cases when disclosing the full data publicly is not applicable to the implementation, it is possible to add a visibility property to an attribute of a VocabularyElement in the EPCISMasterData section. The data marked as permissioned will be visible only to the data creator and the parties the data creator whitelists via the API. More information on permissioned data is available at Vertex Data permissioning.
There are two visibility options available:
If only the value of the attribute needs to be hidden, use visibility="permissioned.show_attribute". Example:
<VocabularyElement id="id:Company_Green_with_permissioned_data">
    <attribute id="id:name" visibility="permissioned.show_attribute">Green</attribute>
</VocabularyElement>
If the whole attribute needs to be hidden, use visibility="permissioned.hide_attribute". Example:
<VocabularyElement id="id:Company_Green_with_permissioned_data">
    <attribute id="id:wallet" visibility="permissioned.hide_attribute">0xBbAaAd7BD40602B78C0649032D2532dEFa23A4C0</attribute>
</VocabularyElement>
For more information on structuring XML EPCIS files, see XML EPCIS Examples.

Verifiable credentials data model

What is a Verifiable Credential
If we look at the physical world, a credential might consist of:
  • Information related to identifying the subject of the credential (for example, a photo, name, or identification number)
  • Information related to the issuing authority (for example, a city government, national agency, or certification body)
  • Information related to the type of credential this is (for example, a Dutch passport, an American driving license, or a health insurance card)
  • Information related to specific attributes or properties being asserted by the issuing authority about the subject (for example, nationality, the classes of vehicle entitled to drive, or date of birth)
  • Evidence related to how the credential was derived
  • Information related to constraints on the credential (for example, expiration date, or terms of use).
A verifiable credential can represent all of the same information that a physical credential represents. The addition of technologies, such as digital signatures, makes verifiable credentials more tamper-evident and more trustworthy than their physical counterparts.
Verifiable credential data can be placed inside a generic OT-JSON object (see OT-JSON Structure) with an additional identifier and can be queried using the local knowledge graph querying system (see Querying the data).
More detailed information about verifiable credentials can be found in the W3C Verifiable Credentials Data Model specification.
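As an illustration, a minimal credential following the W3C data model has roughly the shape below; all values are placeholders and the proof type is just one possible signature suite.

# Minimal verifiable credential shape per the W3C data model; all values are placeholders.
minimal_credential = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential"],
    "issuer": "did:example:issuer",
    "issuanceDate": "2020-01-01T00:00:00Z",
    "credentialSubject": {
        "id": "did:example:subject",
        "assertedAttribute": "example value"  # attribute asserted by the issuer
    },
    "proof": {
        "type": "EcdsaSecp256k1Signature2019",  # one possible signature suite
        "...": "..."
    }
}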

OT-JSON Data Structure and Guidelines

Introduction and Motivation

In order to have a database and standard agnostic data structure, the protocol utilizes a generic data structure format called OT-JSON, based on JSON-LD. The guiding principles for OT-JSON development are:
  • 1-1 convertibility from/to higher level data formats (XML, JSON, CSV, … )
  • 1-1 convertibility from/to generic graph data structure.
  • Generic, use case agnostic graph representation
  • Extendable for future use cases of the protocol
  • Versionable format

OT-JSON essentials

An OT-JSON document represents a dataset as a graph of interconnected dataset objects (use case entities), such as actors, products, batches, etc., together with the relations between them. The structure of dataset objects is generically defined, but extendable to support new use cases.
  • Objects - Use case entities (products, locations, vehicles, people, … )
  • Relations - Relations between use case entities (INSTANCE_OF, BELONGS_TO, … )
  • Metadata - Data about the dataset (integrity hashes, data creator, signature, transpilation data, etc.)
Example: Assume the use case requires connecting products with the factories where they are produced. The entities of the use case are Product and Producer. These entities are represented as objects in OT-JSON format. A product can have the relation PRODUCED_BY with the producer that produces it, and the producer can have the relation HAS_PRODUCED with the product. The product and producer have the unique identifiers Product1 and Producer1, respectively.
../_images/datalayer4.png
Figure 2. Diagram of the example entities and relations
{
    "@graph": [
        {
            "@id": "Product1",
            "@type": "OTObject",
            "identifiers": [
                {
                    "identifierType": "ean13",
                    "identifierValue": "0123456789123"
                }
            ],
            "properties": {
                "name": "Product 1",
                "quantity": {
                    "value": "0.5",
                    "unit": "l"
                }
            },
            "relations": [
                {
                    "@type": "OTRelation",
                    "linkedObject": {
                        "@id": "Producer1"
                    },
                    "properties": {
                        "relationType": "PRODUCED_BY"
                    }
                }
            ]
        },
        {
            "@id": "Producer1",
            "@type": "OTObject",
            "identifiers": [
                {
                    "identifierType": "sgln",
                    "identifierValue": "0123456789123"
                }
            ],
            "properties": {
                "name": "Factory 1",
                "geolocation": {
                    "lat": "44.123213",
                    "lon": "20.489383"
                }
            },
            "relations": [
                {
                    "@type": "OTRelation",
                    "linkedObject": {
                        "@id": "Product1"
                    },
                    "properties": {
                        "relationType": "HAS_PRODUCED"
                    }
                }
            ]
        }
    ]
}
Figure 3. OT-JSON graph representing example entities

Conceptual essentials

Here are some essential concepts related to the data in a dataset. As an illustration, consider a book as an object from the physical world, with its information as the data (see the sketch after the list below).
  • Every OT-JSON entity (Object) is identified with at least one unique identifier. An identifier is represented as a non-empty string.
  • Entities can have multiple identifiers in addition to the unique one, for example an EAN13 code, a LOT number, and the time of some event.
  • Data can be connected by arbitrary relations. A user can define their own relations, which can be used alongside those defined by the standard.
  • Relations are directed from one entity to another. It is possible to create multiple relations between two objects in both directions.
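A minimal sketch of the book example, mirroring the structure of the Product1/Producer1 example above; the identifier values and the PUBLISHED_BY relation are illustrative assumptions:

# Illustrative sketch of a book as an OT-JSON object; identifier values are made up.
book_object = {
    "@id": "Book1",
    "@type": "OTObject",
    "identifiers": [
        {"identifierType": "isbn13", "identifierValue": "9780000000000"},
        {"identifierType": "internalId", "identifierValue": "LIB-000123"}
    ],
    "properties": {
        "title": "Example Book",
        "language": "en"
    },
    "relations": [
        {
            "@type": "OTRelation",
            "linkedObject": {"@id": "Publisher1"},
            "properties": {"relationType": "PUBLISHED_BY"}  # user-defined relation
        }
    ]
}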
For more specific information about OT-JSON, see OT-JSON Structure.

Web of Things

WoT (Web of Things) provides mechanisms to formally describe IoT interfaces, allowing IoT (Internet of Things) devices and services to communicate with each other independently of their underlying implementation and across multiple networking protocols. The OriginTrail node supports the WoT standard for importing and connecting data in the knowledge graph.
The goals of WoT are to improve the interoperability and usability of the IoT. Through a collaboration involving many stakeholders over the past years, several building blocks have been identified that address these challenges. The first set of WoT building blocks is now defined:
  • the Web of Things (WoT) Thing Description
  • the Web of Things (WoT) Binding Templates
  • the Web of Things (WoT) Scripting API
  • the Web of Things (WoT) Security and Privacy Considerations
More details about the defined building blocks and use cases are available at the following link: https://www.w3.org/TR/wot-architecture/
The data model is composed of the following resources (see the sketch after this list):
  • Things – A web Thing can be a gateway to other devices that don’t have an internet connection. This resource contains all the web Things that are proxied by this web Thing. This is mainly used by clouds or gateways because they can proxy other devices.
  • Model – A web Thing always has a set of metadata that defines various aspects about it such as its name, description, or configurations.
  • Properties – A property is a variable of a web Thing. Properties represent the internal state of a web Thing. Clients can subscribe to properties to receive a notification message when specific conditions are met; for example, the value of one or more properties changed.
  • Actions – An action is a function offered by a web Thing. Clients can invoke a function on a web Thing by sending an action to the web Thing. Examples of actions are “open” or “close” for a garage door, “enable” or “disable” for a smoke alarm, and “scan” or “check in” for a bottle of soda or a place. The direction of an action is usually from the client to the web Thing. Actions represent the public interface of a web Thing and properties are the private parts.
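As a rough illustration of these resources, a single web Thing could be described along the following lines; the property and action names are made up, and this is not necessarily the exact structure expected by the OriginTrail WoT importer:

# Rough illustration of the resources above; names are made up and this is not
# necessarily the exact structure the OriginTrail WoT importer expects.
example_thing = {
    "name": "Warehouse temperature sensor",
    "description": "Sensor attached to a cold-storage unit",
    "properties": {
        "temperature": {"type": "number", "unit": "celsius"}
    },
    "actions": {
        "calibrate": {"description": "Recalibrate the sensor"}
    }
}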
All these resources are semantically described by simple models serialized in JSON. Resource findability is based on the Web Linking standard, and semantic extensions using JSON-LD are supported. This allows extending the basic descriptions with a well-known semantic format such as the GS1 Web Vocabulary. Using this approach, existing services like search engines can automatically discover and understand what Things are and how to interact with them. An example WoT file is linked in the following section.

How an event is represented in the graph

When converting a WoT file to the graph, a central vertex will be created for the device described in the file. All sensor measurements will be created as separate vertices in the graph, connected to the main event vertex, in order to enable connections to the rest of the graph via the main vertex. There are two custom vertices denoted as readPoint and observedLocation. These two vertices are considered connectors, which connect the data with the rest of the graph. An example WoT file with connectors is available at the following link: https://github.com/OriginTrail/ot-node/blob/develop/importers/use_cases/perutnina_kakaxi/kakaxi.wot

OT-JSON Structure

Dataset structure

The OT-JSON dataset is the native serialization of objects transferred in the OriginTrail network. The structure of a dataset consists of the dataset header, dataset graph and dataset signature. The dataset header contains dataset metadata, such as the dataset timestamp, data creator information, transpiler data, verification scheme versions, etc. The identifier of a dataset is calculated as a SHA3-256 digest of the dataset header and dataset graph sections. The dataset signature is calculated over the canonicalized form of the entire unsigned dataset object.
../_images/graphrepresentation.png
Figure 1. Graphic representation of a dataset
Example
{
    "@type": "Dataset",
    "@id": "0x123456789034567894567890",
    "datasetHeader": {...},
    "@graph": [...],
    "signature": {...}
}
Example 1. Dataset structure example

Attribute definitions

../_images/table4.1.png

Dataset header

The dataset header contains metadata information about the dataset and the transpilation process, including:
  • Version of OT-JSON document
  • Dataset creation timestamp
  • Dataset title
  • Dataset tags
  • Related datasets
  • Validation schemas
  • Data validation information
  • Data creator
  • Transpilation information
{
    "datasetHeader": {
        "OTJSONVersion": "1.0",
        "datasetCreationTimestamp": "2019-01-15T09:43:58Z",
        "datasetTitle": "",
        "datasetTags": ["gs1-datasets", "..."],
        "relatedDatasets": [{
            "datasetId": "0x232134875876125375761936",
            "relationType": "UPDATED",
            "relationDescription": "...",
            "relationDirection": "direct"
        }],
        "validationSchemas": {
            "erc725-main": {
                "schemaType": "ethereum-725",
                "networkId": "1",
                "networkType": "private",
                "hubContractAddress": "0x2345678902345678912321"
            },
            "merkleRoot": {
                "schemaType": "merkle-root",
                "networkId": "1",
                "networkType": "private",
                "hubContractAddress": "0x2345678902345678912321"
            }
        },
        "dataIntegrity": {
            "proofs": [
                {
                    "proofValue": "0x54364576754632364577543",
                    "proofType": "merkleRootHash",
                    "validationSchema": "/schemas/merkleRoot"
                }
            ]
        },
        "dataCreator": {
            "identifiers": [
                {
                    "identifierValue": "0x213182735128735218673587612",
                    "identifierType": "ERC725",
                    "validationSchema": "/schemas/erc725-main"
                }
            ]
        },
        "transpilationInfo": {
            "transpilerType": "GS1-EPCIS",
            "transpilerVersion": "1.0",
            "sourceMetadata": {
                "created": "",
                "modified": "",
                "standard": "GS1-EPCIS",
                "XMLversion": "1.0",
                "encoding": "UTF-8"
            },
            "diff": { "...": "..." }
        }
    }
}
Example 2. Dataset header structure example

Validation schemas

Validation schemas are objects that provide information on how to validate specific values, like identifiers and hashes. Schemas can contain addresses of smart contracts where identifiers are created, network identities, locations of proof hashes, etc.

Attribute definitions

../_images/table4.2.png

Hash structure

An OT-JSON document is uniquely identified by its data hash and root hash. These hashes are generated from the OT-JSON graph object, which stores the user-defined data. Before calculating dataset hashes, it is important to determine a uniform order of objects in the OT-JSON object in order to always obtain the same hash values. When a user imports a dataset, depending on the standard, the OT-Node converts the dataset to OT-JSON format, sorts the dataset and calculates the data hash and root hash.
The OT-JSON service supports versions 1.0 and 1.1, which differ in their sorting algorithms. The OT-JSON 1.0 service sorts the entire dataset before calculating hash values and saves the unsorted dataset in the graph database. The OT-JSON 1.1 service sorts the entire dataset except arrays in properties and saves the sorted dataset in the graph database. The newer version of the OT-JSON service improves overall performance and ensures data integrity by sorting datasets during the import process and when reading data from the graph database. Such an approach ensures that the dataset is always sorted during processing and requires only one sorting call per dataset processing functionality, such as import or replication.
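The difference between the two sorting approaches can be sketched as follows (Python, standard library only). The function names and the exact canonical string format are assumptions for illustration; the actual ot-node implementation is written in JavaScript and differs in detail.

import hashlib
import json

def sort_dataset(value, sort_arrays):
    # Recursively order object keys; optionally sort arrays as well.
    # sort_arrays=True roughly corresponds to OT-JSON 1.0 behaviour,
    # sort_arrays=False to OT-JSON 1.1 (user-defined arrays keep their order).
    if isinstance(value, dict):
        return {key: sort_dataset(value[key], sort_arrays) for key in sorted(value)}
    if isinstance(value, list):
        items = [sort_dataset(item, sort_arrays) for item in value]
        if sort_arrays:
            items.sort(key=lambda item: json.dumps(item, sort_keys=True))
        return items
    return value

def data_hash(graph, sort_arrays):
    # SHA3-256 digest of the canonical (sorted, stringified) form of the @graph section
    canonical = json.dumps(sort_dataset(graph, sort_arrays), separators=(",", ":"))
    return "0x" + hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()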
The following sequence diagrams describe the usage of sort methods for both versions of OT-JSON during the import process.
../_images/sortOtJson1.0.png
Figure 2. Import process for OT-JSON version 1.0
../_images/sortOtJson1.1.png
Figure 3. Import process for OT-JSON version 1.1

Signing

When the unsigned OT-JSON document is formed, the resulting object is canonicalized (serialized) and prepared for signing by the data creator. The dataset signing process can be done using different signature schemes/suites. Canonicalization of an OT-JSON dataset is performed by creating a sorted, stringified JSON object.
The structure of a signature object is defined according to the selected signature suite specification. Signing is done using Koblitz elliptic curve signatures (Ethereum private keys).
Additionally, if JSON-LD is used as the format for OT-JSON, the Koblitz 2016 signature suite can be used.
Example: JSON-LD Koblitz Signature 2016 Signature Suite
The entire JSON-LD dataset document is canonicalized using the URDNA2015 JSON-LD canonicalization algorithm. The resulting N-Quads data is digested using the SHA-256 algorithm. Finally, the digest is signed with an ECDSA private key on the Koblitz elliptic curve. The same curve is used for generating Ethereum and Bitcoin wallets, so private keys for Ethereum and Bitcoin wallets can be used for signing.
../_images/kobilitzSignature.png
Figure 4. Diagram of dataset signing procedure using Koblitz Signature 2016 Signature Suite
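As a sketch of signing with an Ethereum (Koblitz curve) private key, assuming the third-party eth_account Python package; this is illustrative and not the node's own signing code:

# Sketch only: assumes the third-party eth_account package; the node itself signs datasets in JavaScript.
import hashlib
import json

from eth_account import Account
from eth_account.messages import encode_defunct

def sign_dataset(unsigned_dataset, private_key_hex):
    # Canonicalize: sorted, stringified JSON of the unsigned dataset
    canonical = json.dumps(unsigned_dataset, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Sign the digest with a secp256k1 (Koblitz curve) Ethereum private key
    signed = Account.sign_message(encode_defunct(text=digest), private_key=private_key_hex)
    return signed.signature.hex()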

Object structure

OT-JSON dataset objects represent entities which can be interconnected with relations in a graph-like form. Every OT-JSON dataset object is required to have its unique identifier (@id), type (@type) and signature. Other required sections include identifiers, properties and relations, while optional sections include attachments.

Attribute definitions

../_images/table4.3.png
{
    "@id": "<UNIQUE_OBJECT_IDENTIFIER>",
    "@type": "<OBJECT_TYPE>",
    "identifiers": ["..."],
    "properties": {"...": "..."},
    "relations": ["..."],
    "attachments": ["..."],
    "signature": {"...": "..."}
}
Example 3. Dataset object structure template

Object identifiers section

The object identifiers section is a list of objects that represent identifier values for a certain object. Identifier objects contain information about the identifier type, the identifier value, and optionally a validation schema used for validating the identifier.
{
    "identifiers": [
        {
            "@type": "sgtin",
            "@value": "1234567.0001",
            "validationSchema": "/datasetHeader/validationSchemas/urn:ot:sgtin"
        },
        {
            "@type": "sgln",
            "@value": "3232317.0001",
            "validationSchema": "/datasetHeader/validationSchemas/urn:ot:sgln"
        }
    ]
}
Example 4. Example of identifiers section

Attribute definitions

../_images/table4.4.png

Object properties section

The object properties section is defined as a container for all object property attributes. OT-JSON does not prescribe specific rules for structuring object properties; those rules are defined within recommendations and data formatting guidelines.
The related objects section is a list of objects that represent information about other objects related to this object, together with definitions of those relations. Objects in the related objects list contain information about the linkedObject (@id), the related object type (@type), the relation direction, properties containing additional information about the relation, and the relation type.
{
    "relations": [
        {
            "@type": "otRelation",
            "linkedObject": {
                "@id": "<OBJECT ID>"
            },
            "properties": {"...": "..."},
            "relationType": "PART_OF",
            "direction": "direct"
        }
    ]
}
Example 5. Example of related entities section

Attribute definitions

../_images/table4.5.png

Attachments section

The attachments section contains a list of objects that represent metadata about files related to the object. Objects in the attachments list contain information about the related file id (@id, as a URI), the attachment type (@type), the attachment role (such as certificate, lab results, etc.), the attachment description, the attachment file type, and a SHA3-256 digest of the file content.
{
    "attachments": [
        {
            "@id": "0x4672354967832649786379821",
            "@type": "Attachment",
            "attachmentRole": "Certificate",
            "attachmentDescription": "...",
            "fileUri": "/path/file.jpg",
            "metadata": {
                "fileType": "image/jpeg",
                "fileSize": 1024
            }
        }
    ]
}
Example 6. Example of attachments section
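A short sketch of computing the SHA3-256 digest of an attachment file's content using the Python standard library; whether the node stores the digest with a 0x prefix is an assumption here:

import hashlib

def attachment_digest(path):
    # SHA3-256 digest of the raw file content; the "0x" prefix is an assumption
    with open(path, "rb") as attachment:
        return "0x" + hashlib.sha3_256(attachment.read()).hexdigest()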

Attribute definitions

../_images/table4.6.png

Connector objects

A special type of graph object is the Connector. Connectors are used to connect data from multiple datasets, possibly created by different data providers. Every connector contains a connectionId attribute, which represents the value on which connectors are matched to each other. In addition, the expectedConnectionCreators list contains the data creators that are allowed to connect to the connector.
{
    "@id": "urn:uuid:1230c84b-5cd6-45a7-b6b5-da7ab8b6f2dd",
    "@type": "otConnector",
    "identifiers": [
        {
            "@type": "id",
            "@value": "1A794-2019-01-01"
        }
    ],
    "properties": {
        "expectedConnectionCreators": [
            {
                "@type": "ERC725",
                "@value": "0x9353a6c07787170a43c4eb23f59567811336a8f3",
                "validationSchema": "../ethereum-erc"
            }
        ]
    },
    "relations": [
        {
            "@type": "otRelation",
            "direction": "direct",
            "linkedObject": {
                "@id": "urn:uuid:fe7d4949-6f34-4f4e-8a11-d048e9c0b835"
            },
            "properties": null,
            "relationType": "CONNECTOR_FOR"
        }
    ]
}
Example 7. Example of a connector object

OT-JSON Versions

In order to improve the simplicity and consistency of generating data integrity values, such as dataset signatures, dataset IDs and dataset root hashes, there have been revisions to how these values are calculated. Multiple OT-JSON versions are supported in order to preserve the ability to validate the integrity of datasets already published to the network.
The differences between OT-JSON versions are in how data is ordered when generating three different data integrity values:
  1. datasetID, which is generated as a hash of the @graph section of the dataset, and is used to verify the data integrity of the dataset
  2. rootHash, which is generated as a hash of the @graph section along with the dataset creator, and is used for verifying the dataset creator
  3. signature, which is generated as a signed hash of the entire dataset, and is used to verify the creator and integrity of a dataset off chain.

OT-JSON 1.2

Note
OT-JSON 1.2 was introduced in order to sort the dataset when generating a signature. Along with that, sorting of non-user-generated arrays (such as identifiers and relations) was reimplemented.
The datasetID for OT-JSON 1.2 is generated out of the @graph section after sorting every object and array, including the @graph array, without changing the order of any array inside a properties object.
The rootHash for OT-JSON 1.2 is generated out of the @graph section in the same way as it is for the datasetID.
The signature for OT-JSON 1.2 is generated out of the dataset with the datasetHeader attached, after sorting the dataset in the same way as was done for the datasetID and rootHash.

OT-JSON 1.1

Note
OT-JSON 1.1 was introduced in order to have the same sorting method for generating hashes. Along with that, sorting of arrays was removed in order to prevent unintentionally changing user defined data (such as properties of OT-JSON objects).
The datasetID for OT-JSON 1.1 is generated out of the @graph section after sorting every object in the @graph array, without changing the order of any array.
The rootHash for OT-JSON 1.1 is generated out of the @graph section in the same way as it is for the datasetID.
The signature for OT-JSON 1.1 is generated out of the dataset with the datasetHeader attached.

OT-JSON 1.0

The datasetID for OT-JSON 1.0 is generated out of the @graph section after sorting every object and array, including the @graph array.
The rootHash for OT-JSON 1.0 is generated out of the @graph section after sorting the relations and identifiers of each element, and sorting the @graph array by each array element @id.
The signature for OT-JSON 1.0 is generated out of the dataset after first sorting the relations and identifiers of each element, and sorting the @graph array by each array element @id, and then sorting every object in the dataset.

Sorting differences overview

Below is an image showing the differences in how the data integrity values are calculated between the OT-JSON versions.
../_images/sorting-process-overview.png