The network layer takes care of the node communication and network lookups of the underlying data layer. Access to the data is achieved through the provided data exchange API.
The peer-to-peer network is built on S/Kademlia, which is responsible for efficient routing within the network. Messages between peers are signed, and the Kademlia node ID is the OriginTrail Node ID contained in the node Profile. This enforces long-term identity, helps with Kademlia routing and makes Eclipse attacks harder to mount.
The peer-to-peer decentralized network operates as a serverless supply chain data storage, validation and serving network with built-in fault tolerance, DDoS resistance and tamper resistance. It is self-sustaining thanks to the incentive system explained in this document.
The intention of this paper is to document the research findings and mechanics behind the incentive model of OriginTrail, as well as to attract opinions and feedback from the community and researchers interested in the topic.
Network entities and classification¶
In order to better understand the OriginTrail P2P network structure and the incentive mechanisms within the protocol we have to understand the different roles within the context of the system.
The main premise is that different nodes have different interests depending on their roles. To ensure fair play and a fair market on the network, we have to understand the different entities, their aims, needs and relationships. Above all, we have to understand how different entities might collude and what their motives could be, so that incentives can be constructed to mitigate such collusion.
It is important to state that all nodes run the same software. It is their function in the context of the observed data that determines how the nodes are perceived, so one node can have different roles within different deals. Below is a list of the different entities and their roles in the system.
The Data Provider (DP) is an entity that publishes supply chain data to the network. A typical scenario would be a company that wants to publish and share data from its ERP system about the products that are part of the supply chain. Data Providers can also be consumers who interact with the network through applications, or devices such as sensors that provide information about significant events in the supply chain.
The interest of the Data Provider is to store the data safely on the network and to be able to connect and cross-check it with the data of other DPs within the network. Depending on the use case, providing data to the network can be incentivised with the TRAC token.
Data Creator Node¶
The Data Creator node (DC) is a node responsible for importing the data provided by the DP and making sure that all the DP's criteria are met, such as the availability of the data on the network for a desired time and the replication factor. While we expect that Data Providers will typically run their own Data Creator nodes, it is not a requirement: third-party DC nodes may provide the service for one or several Data Providers. The DC node is the entry point of information to the network, and the relationship between the DP and the DC is not regulated by the protocol.
The responsibility of the DC node is to negotiate, establish and maintain the service requested by the DP with its associated Data Holder nodes (DH). Furthermore, DC nodes are responsible for checking that the data is available on the network during the time of service and for initiating the litigation process in case of any dispute.
Data Holder Node¶
The Data Holder (DH) is a node that has committed itself to store the data provided by a DC node for a requested period of time and make it available for the interested parties (which can also be the DC node). For this service the Data Holder is compensated in TRAC tokens. The DH node has the responsibility to preserve the data intact in its unaltered, original form, as well as to provide high availability of the data in terms of bandwidth and uptime.
It is important to note that the DH node can be a DC node at the same time, in the context of the data that it has introduced to the network. As noted, the same software runs on all the nodes in the network, providing for symmetrical relations and thus not limiting scalability.
The Data Holder may also wish to find data that was not delivered to it directly by DCs but that is popular, and offer it to interested parties. It is therefore probable that Data Holders will listen to the network, search for data that is frequently requested, replicate it from other Data Holders, and then store, process and offer it to Data Viewers.
The Data Viewer (DV) is an entity that requests data from any network node able to provide it. The Data Viewer can send a request for data related to a specific set of supply chain identifiers it is interested in and retrieve all the connected data of the product trail. The Data Viewer receives offers from all the nodes that have the data, together with any charges for reading and the structure of the data that will be sent. The Data Viewer decides which offers to accept and, if needed, deposits the requested compensation on the escrow smart contract. The providing node then sends the encrypted data so that the Data Viewer can test its validity. Once the validity of the data is confirmed, the Data Viewer gets the key to decrypt the data, while the smart contract unlocks the funds for the party that provided the data.
The interest of the Data Viewer is to get the data as cheaply as possible, but also to be sure that the provided data is genuine. Therefore, the Data Viewer also has the opportunity to initiate the litigation procedure if the received data is not valid. If that happens, and it is proved that the Data Viewer received false data, the corresponding DH node loses its stake.
The complete picture of the interaction between participants in the OriginTrail system is presented in the data diagram (Figure 1).
To get data onto the OriginTrail network, the Data Provider sends tokens and data to the chosen DC node. The Data Creator sends tokens to the smart contract with tailored escrow functionalities and broadcasts a data holding request with the required terms of cooperation. All interested DH node candidates then respond and initiate a network handshake, which involves negotiation on the price of the service per data unit and the minimum time of providing the service.
Once the data set is registered in the smart contract with its corresponding fingerprints, the DH nodes that agree with the given offer conditions contact the DC node to fetch the data set. The DC node sends the data to the DH, and when the data is successfully replicated on the DH node, the DH node sends back a signed confirmation to the DC node. The DC collects signed confirmations and attempts to use the confirmed DH identities for resolving the replication task generated by the smart contract. When the task is solved by a proof of work mechanism, the DC node submits the arguments for the task solution calculation to the smart contract, and the DH identities used are selected for compensated data holding. The escrows are then created for the selected DH nodes. A DH node can be used for replication only if it has enough (non-staked) tokens on its profile for the required job stake and if it has not started the time-delayed withdrawal process of profile tokens.
The Data Creator will deposit the compensations in tokens for the Data Holders on an escrow smart contract that Data Holders will be able to progressively withdraw from as the time passes, and up to the full amount once the period of service is successfully finished. The smart contract will take care that the funds are unlocked incrementally. It is up to the Data Holder to decide how often it will withdraw the funds for the part of the service that is already delivered.
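The incremental unlocking described above can be pictured with a small sketch. The text only states that funds unlock progressively, so the linear release schedule used here is an assumption, and the `Escrow` class and its field names are hypothetical rather than taken from the actual smart contract.

```python
from dataclasses import dataclass


@dataclass
class Escrow:
    """Hypothetical escrow record; names and schedule are illustrative only."""
    total_amount: int   # tokens deposited by the DC for one DH
    start_time: int     # start of the service period, in seconds
    duration: int       # agreed length of the service period, in seconds
    withdrawn: int = 0  # tokens the DH has already withdrawn

    def unlocked(self, now: int) -> int:
        """Tokens unlocked so far, assuming a linear release over the period."""
        elapsed = min(max(now - self.start_time, 0), self.duration)
        return self.total_amount * elapsed // self.duration

    def withdraw(self, now: int) -> int:
        """Pay out everything that is unlocked but not yet withdrawn."""
        payout = self.unlocked(now) - self.withdrawn
        self.withdrawn += payout
        return payout
```

Under this assumption a Data Holder that withdraws a quarter of the way through the period receives a quarter of the compensation, and any later call pays out only the difference accumulated since the previous withdrawal.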
In order to participate in the service, the Data Holder also has to deposit a stake proportional to the job value. This stake serves as a security guarantee that the data will not be deleted or tampered with in any way, and that it will be provided to third parties according to the requirements.
During the agreement formation between the Data Creator and the Data Holders, the Data Creator prepares the data by splitting the graph vertex data into blocks and calculating a Merkle root hash, which is compared to the one stored on the blockchain. The root hash is stored permanently during the offer creation process so that anyone can prove the integrity of the data. The data is then encrypted using RSA encryption, and the encryption key is appended to it. A second Merkle tree is created for the encrypted data blocks, proving the integrity of the data that will be sent to the Data Holder. The root hash of the encrypted data is written to the escrow contract, and finally the data can be sent to the Data Holder. Upon receiving the data, the Data Holder verifies that the root hash of the received data matches the one written into the escrow contract. If it does, the testing and payment process can begin.
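As a rough illustration of the block-splitting and root-hash step, the sketch below splits a byte string into fixed-size blocks and computes a Merkle root over them. Several details are assumptions not fixed by the text: the block size, the 32-byte big-endian encoding of the block index, duplicating the last node on levels with an odd number of nodes, and the use of `hashlib.sha3_256` as a stand-in for whichever SHA3 variant the smart contract uses. The function names are hypothetical.

```python
import hashlib


def sha3(*parts: bytes) -> bytes:
    """SHA3-256 over the concatenation of the given byte strings."""
    h = hashlib.sha3_256()
    for part in parts:
        h.update(part)
    return h.digest()


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split data into consecutive blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def merkle_root(blocks: list[bytes]) -> bytes:
    """Root hash over the blocks, with each leaf hashed with its 1-based index."""
    if not blocks:
        raise ValueError("at least one block is required")
    # Leaf i is the hash of (block i, i); index encoding is an assumption.
    level = [sha3(b, i.to_bytes(32, "big")) for i, b in enumerate(blocks, start=1)]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed padding rule for odd levels
        level = [sha3(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

The same routine could be run once over the plain blocks and once over the encrypted blocks to obtain the two root hashes mentioned above. Because each leaf includes its index, reordering blocks changes the root.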
Testing and compensation¶
To ensure that the service is provided as requested, the Data Creator is able to test Data Holders by sporadically asking them for a random encrypted data block. If the Data Creator suspects that the data is no longer available or has been altered in any way, it can initiate the litigation procedure, in which the smart contract decides whether the Data Holder is able to prove that it still has the data available.
The litigation procedure involves a smart contract as a validator of the service. When the Data Creator challenges the Data Holder to prove to the smart contract that it is storing the agreed-upon data, it sends a test to the smart contract in the form of a requested data block number. In response, the Data Holder sends the requested block to the smart contract. The Data Creator then sends the Merkle proof for the requested data block, and the smart contract checks whether the provided Merkle proof and the requested data block comply with the already agreed data root hash.
There are two possible situations in which the Data Holder cannot prove that the data it holds is present and unchanged. The first is when the Data Holder is not available and thus unable to answer the challenge. In this case, the Data Creator will try to contact the Data Holder multiple times and, if that fails, will trigger litigation on the smart contract.
The other situation happens when the Data Holder answers the challenge with the wrong data. This can happen in two cases. The first occurs when the Data Holder does not store the agreed-upon data and is therefore not able to submit the correct answer. The second case occurs when the Data Creator has created and submitted a false (unanswerable) test. This dilemma can be solved by the Data Creator sending to the smart contract the correct data block, which fits the already submitted Merkle proof and Merkle root hash. If the Data Holder's block is incorrect for the given proof, then the Data Holder loses its deposited stake and the stake is transferred to the Data Creator. If it is proven that the Data Holder does not have the original data anymore, or it is not available to answer litigation, the smart contract will initiate the Data Holder replacement procedure.
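The decision logic of the cases above can be summarised in a short sketch. It is an interpretation of the text, not the contract's actual code: the `Outcome` names and the `resolve_litigation` function are hypothetical, `fits_proof` stands for the check of a block against the submitted Merkle proof and the agreed root hash, and the text does not say what happens to a Data Creator whose test turns out to be unanswerable, so that case is only reported.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    HOLDER_CLEARED = "holder answered correctly"
    HOLDER_PENALIZED = "holder stake goes to creator; holder is replaced"
    HOLDER_UNAVAILABLE = "holder did not answer; holder is replaced"
    INVALID_TEST = "creator's block does not fit the proof"


def resolve_litigation(
    holder_block: Optional[bytes],
    creator_block: bytes,
    fits_proof: Callable[[bytes], bool],
) -> Outcome:
    """Classify one litigation round; None means the holder never answered."""
    # A test is only answerable if the creator's own block fits the proof.
    if not fits_proof(creator_block):
        return Outcome.INVALID_TEST
    if holder_block is None:
        return Outcome.HOLDER_UNAVAILABLE
    if not fits_proof(holder_block):
        return Outcome.HOLDER_PENALIZED
    return Outcome.HOLDER_CLEARED
```

Checking the Data Creator's block first reflects the idea that a Data Holder should not be penalised for failing a test that nobody could answer.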
If the Data Holder answers correctly, the Data Creator needs to wait for a certain amount of time before it can start a new litigation. In that way, the Data Holder is protected from having to answer multiple litigation requests in a short window of time. That time restriction is part of the offer parameters. The Data Holder is able to choose whether or not to bid for the offer, based on the offer criteria.
The resolution of the litigation mechanism involves the replacement of the successfully litigated Data Holder node by the Hydra protocol.
The Hydra protocol is similar to the replication phase described earlier. Data Holders will be notified that the litigated node is being replaced. The Data Holders will then contact the Data Creator and take the replication. In that way, they will again participate in the algorithm that will choose which Data Holder will be paid for the offer and take the place of the litigated Data Holder. Upon successful replacement, the offer will be complete again.
This is the first iteration of the litigation mechanism and the Hydra protocol. The solution is subject to change and will be revisited in the future.
The Merkle tree for data blocks <B1, B2, … , Bn> is a balanced binary hash tree in which each internal node is calculated as the SHA3 hash of its concatenated child nodes. The i-th leaf node Li is calculated as Li = SHA3(Bi, i). The root hash R of the Merkle tree is the SHA3 hash of the root's child nodes. The Merkle proof for block Bi is a tuple of hashes <P(0), P(1), … , P(h−1)>, where h is the height of the Merkle tree. For the proof to be valid, it needs to satisfy the tuple of tests <T(0), T(1), … , T(h−1)> such that T(0) = SHA3(Li, P(0)), T(j) = SHA3(P(j), T(j−1)) for j > 0, and T(h−1) = R. To prove the integrity of the answer block Bk, the smart contract calculates the hash a = Lk = SHA3(Bk, k) and then calculates T(h−1) from the proof. If T(h−1) equals R, the answer block is unchanged from when it was created. The diagram of the proving mechanism is shown in Figure 2.
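The proof construction and check can be sketched as follows. This is a minimal illustration under stated assumptions, not the contract's implementation: `hashlib.sha3_256` stands in for the SHA3 function, the block index is encoded as 32 big-endian bytes, odd levels are padded by duplicating the last node, and at each step the pair is concatenated left child first, based on the position of the node in its level (the formulas above do not spell out the ordering). All function names are hypothetical.

```python
import hashlib


def sha3(*parts: bytes) -> bytes:
    """SHA3-256 over the concatenation of the given byte strings."""
    h = hashlib.sha3_256()
    for part in parts:
        h.update(part)
    return h.digest()


def leaf_hash(block: bytes, index: int) -> bytes:
    """Leaf for the block with the given 1-based index."""
    return sha3(block, index.to_bytes(32, "big"))


def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """All tree levels, leaves first; the last level holds only the root."""
    level = [leaf_hash(b, i) for i, b in enumerate(blocks, start=1)]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # assumed padding for odd levels
            levels[-1] = level
        level = [sha3(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(blocks: list[bytes], k: int) -> list[bytes]:
    """Sibling hashes P(0), P(1), ... for the block with 1-based index k."""
    pos = k - 1
    proof = []
    for level in build_levels(blocks)[:-1]:
        proof.append(level[pos ^ 1])  # sibling of the current node
        pos //= 2
    return proof


def verify(block: bytes, k: int, proof: list[bytes], root: bytes) -> bool:
    """Recompute the path from the leaf to the top and compare with the root."""
    node = leaf_hash(block, k)
    pos = k - 1
    for sibling in proof:
        # Left child goes first in the concatenation.
        node = sha3(node, sibling) if pos % 2 == 0 else sha3(sibling, node)
        pos //= 2
    return node == root
```

A caller would take `build_levels(blocks)[-1][0]` as the root R, produce a proof with `merkle_proof(blocks, k)` and check an answer block with `verify(block, k, proof, R)`. A block that was altered, or that is presented under a different index, no longer hashes up to R.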
A Data consumer broadcasts a query for the data it needs through its associated node. Any DH that stores the data can reply to the broadcast. The Data consumer then selects a DH by its own criteria and either gets the data directly via the DH read API (if the DH node allows it), or creates an escrow contract for a reimbursed read and deposits tokens for payment. The DH node then sends the encrypted data to the Data consumer, and the Data consumer randomly selects one data block and sends it to the escrow contract together with the block number. The DH node then needs to reply with the unencrypted block, the key that was used for encryption and the Merkle path proof showing that the block is valid. If the whole process is valid, the tokens are transferred to the DH node and the Data consumer can take the key for unlocking the data.
Conclusion and further research¶
This document is under constant improvement and is intended to illustrate network mechanics. The focus of the upcoming research on the incentive model will be on simulating the activities in the network based on larger-scale tests in real network conditions. We invite the community to provide opinions, ideas and feedback to further improve the model and the document.