On Web2, the traditional web, storage systems were designed to suit fast-growing, collaborative environments. These systems offered varying degrees of data availability, security, and protection from loss. As Web2 adoption grew, so did the volume and density of the data it generated, in many formats. This data often consists of large, non-transactional files, typically created by a single user and sometimes shared over some geographical distance. This design led to a reliance on central authorities to store the world's data.
Decentralization, as it relates to data storage, is about transferring authority from central entities to a peer-to-peer (p2p) file storage system. Filecoin, for example, is a decentralized storage network designed with advanced cryptographic methods and an incentive mechanism to encourage mass-scale adoption of p2p-first solutions.
In this post, we will explore the cryptographic methods that power Filecoin's protocol design and consensus mechanisms, and how they incentivize decentralized storage. Before we unveil the magic of decentralized storage, we will first look at the limitations of traditional storage solutions.
One of the major limitations of traditional data storage is its location-based addressing system. Central servers keep a list, or directory, of where data elements live, for example the path and filename of each one. These primary addresses are Uniform Resource Locators (URLs). Servers use the directories to locate and retrieve data. Information at a given location can usually be altered or completely overwritten without the server keeping any record of the modification.
With content addressing as implemented on Filecoin, we get more resilience: the network uses cryptographic hashes at its core, which eliminates problems caused by address changes, such as 404 errors. Data on this system is also protected from silent alteration, because any change to the data changes its hash. Traditional data storage also has several other limitations, including:
On top of addressing these issues, Filecoin scales with the number of users and nodes joining the network, which reduces latency and retrieval time while increasing storage capacity.
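
To make the contrast with location-based addressing concrete, here is a minimal sketch of a content-addressed store. It is an illustration only: the `ContentStore` class and `content_id` function are hypothetical names, and a plain hex SHA-256 digest stands in for a real CID, which has its own encoding.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Toy content identifier: the hex SHA-256 digest of the bytes.

    Assumption: real CIDs carry extra metadata and use a different encoding.
    """
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """In-memory store keyed by the hash of the content, not by a path."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._blocks[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blocks[cid]
        # Re-hash on read: altered bytes can no longer match the identifier.
        if content_id(data) != cid:
            raise ValueError("content does not match its identifier")
        return data


store = ContentStore()
original = store.put(b"hello filecoin")
edited = store.put(b"hello filecoin!")  # a one-byte change gives a new address
```

Because the address is derived from the bytes, editing the data cannot overwrite what `original` points to; it simply produces a second entry under a different identifier.
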
Before data can be stored on the Filecoin network, it must be packed into a CAR (Content Addressable aRchive) file, and a storage deal between a miner and a client must be initiated by the client and accepted by the miner. To initiate a storage deal, the client submits a deal proposal built around a Piece CID, wrapped together with the deal parameters such as the deal CID, miner ID, price, and duration.
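
The shape of a deal proposal can be sketched as a small record. This is a simplified illustration, not Filecoin's actual message format: the `DealProposal` class, its field names, the `miner_accepts` check, and the placeholder values are all assumptions made for this example, and the identifier is a plain SHA-256 digest rather than a real deal CID.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DealProposal:
    """Hypothetical deal proposal wrapping a Piece CID with deal parameters."""

    piece_cid: str
    client_id: str
    miner_id: str
    price_per_epoch: int
    duration_epochs: int

    def proposal_id(self) -> str:
        # Stand-in for the deal CID: a hash of the canonical JSON encoding.
        encoded = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def total_cost(self) -> int:
        return self.price_per_epoch * self.duration_epochs


def miner_accepts(proposal: DealProposal, min_price_per_epoch: int) -> bool:
    """Toy acceptance rule: the miner only checks the offered price."""
    return proposal.price_per_epoch >= min_price_per_epoch


proposal = DealProposal(
    piece_cid="example-piece-cid",  # placeholder, not a real CID
    client_id="example-client",     # placeholder
    miner_id="example-miner",       # placeholder
    price_per_epoch=5,
    duration_epochs=1000,
)
accepted = miner_accepts(proposal, min_price_per_epoch=3)
```
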
Once the data has been transferred to the miner, the miner places it in a sector (the storage unit on Filecoin), seals it, and starts submitting proofs to the network. The storage deal is then live on the network. Filecoin uses two cryptographic proofs to verify storage on the network: Proof of Replication (PoRep) and Proof of Spacetime (PoSt).
During Proof of Replication, a storage miner proves that it is storing a unique copy, or replica, of the data. The process happens once, when the data is first stored by the miner, in two steps: filling the sector and sealing the sector.
To fill a sector, a storage miner must sign multiple deals with clients. When the sector is full, an UnSealedSectorCID is generated. This CID, a Commitment of Data or CommD, is the root node of the tree built over all the Piece CIDs in the sector.
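
The idea of a single root committing to every piece in a sector can be sketched with a small binary Merkle tree. This is a toy version under stated assumptions: the `merkle_root` helper, the use of plain SHA-256, and the rule of duplicating the last node on odd levels are choices made for this example and are not Filecoin's actual piece-padding or hashing scheme.

```python
import hashlib


def hash_pair(left: bytes, right: bytes) -> bytes:
    """Parent node: the hash of its two children concatenated."""
    return hashlib.sha256(left + right).digest()


def merkle_root(leaves):
    """Fold a list of leaf hashes into one root hash."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Sketch-only padding rule: repeat the last node to form a pair.
            level.append(level[-1])
        level = [hash_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Stand-ins for the Piece CIDs of three deals packed into one sector.
pieces = [hashlib.sha256(p).digest() for p in (b"piece-a", b"piece-b", b"piece-c")]
comm_d = merkle_root(pieces)
```

Changing any single piece changes `comm_d`, which is what lets one short value commit to the whole sector.
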
To seal a sector, the miner runs a computationally heavy encoding process, which makes the proof difficult to spoof. The unsealed sector data committed to by the UnSealedSectorCID (CommD) is encoded through a sequence of graph and hashing processes to create a unique replica.
The resulting root hash of the replica Merkle tree, called CommRLast, is saved privately by the miner for use in Proof of Spacetime. CommRLast is then hashed with another Merkle root output from Proof of Replication to generate the SealedSectorCID. This CID, a Commitment of Replication or CommR, is recorded on the public blockchain.
Each storage deal generates a unique CommR, including deals where the same data is stored with multiple storage miners and deals where the same data is stored multiple times with a single miner.
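
Why the same data yields a different commitment each time can be illustrated with a toy encoding. This is emphatically not Filecoin's sealing algorithm: `toy_seal` is a hypothetical XOR keystream keyed by a miner ID and sector number, and `toy_comm_r` hashes the replica directly instead of combining Merkle roots. It is also fast, whereas real sealing is deliberately slow. The sketch only shows that tying the encoding to who stores the data, and where, produces distinct replicas.

```python
import hashlib


def toy_seal(data: bytes, miner_id: str, sector_number: int) -> bytes:
    """Encode data with a keystream derived from the miner and sector.

    Assumption: a stand-in for sealing, chosen only because it is short.
    """
    seed = f"{miner_id}:{sector_number}".encode("utf-8")
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        keystream.extend(hashlib.sha256(seed + counter.to_bytes(8, "big")).digest())
        counter += 1
    # zip stops at len(data), so any extra keystream bytes are ignored.
    return bytes(b ^ k for b, k in zip(data, keystream))


def toy_comm_r(replica: bytes) -> str:
    """Stand-in for CommR: a hash of the encoded replica."""
    return hashlib.sha256(replica).hexdigest()


data = b"the same client data"
comm_r_miner_a = toy_comm_r(toy_seal(data, "miner-a", 1))
comm_r_miner_b = toy_comm_r(toy_seal(data, "miner-b", 1))
comm_r_miner_a_again = toy_comm_r(toy_seal(data, "miner-a", 2))
```

All three commitments differ even though the underlying data is identical, and applying `toy_seal` a second time with the same parameters recovers the original bytes.
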
Proof of Spacetime runs repeatedly to prove that a miner continues to dedicate storage space to the same data over time. This process relies on Merkle inclusion proofs, which are regular checks that a random selection of the encoded data is present at the right location.
* The miner uses the privately stored CommRLast to match the randomly selected bytes against the replica Merkle root hash, without revealing the value of the hash.
* Miners pay collateral in order to store user data. If a miner fails a Proof of Spacetime at any point, they are penalized.
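
A Merkle inclusion proof for a randomly challenged chunk can be sketched as follows. This is a minimal illustration, not the PoSt implementation: the function names, the use of SHA-256, the eight-chunk replica, and the power-of-two restriction are assumptions made for this example, and the real protocol additionally wraps such proofs in a zk-SNARK.

```python
import hashlib
import secrets


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(chunks):
    """Return every level of the tree, from leaf hashes up to the root."""
    if len(chunks) == 0 or len(chunks) & (len(chunks) - 1):
        raise ValueError("this sketch needs a power-of-two number of chunks")
    levels = [[sha(c) for c in chunks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sha(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def prove(levels, index):
    """Collect the sibling hash at each level on the path to the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # the sibling differs in the lowest bit
        index //= 2
    return proof


def verify(root, chunk, index, proof):
    """Recompute the root from one chunk, its position, and its siblings."""
    node = sha(chunk)
    for sibling in proof:
        node = sha(node + sibling) if index % 2 == 0 else sha(sibling + node)
        index //= 2
    return node == root


replica = [f"encoded-chunk-{i}".encode("utf-8") for i in range(8)]
levels = build_levels(replica)
root = levels[-1][0]                          # plays the role of the replica root
challenge = secrets.randbelow(len(replica))   # random position to check
proof = prove(levels, challenge)
ok = verify(root, replica[challenge], challenge, proof)
```

The verifier needs only the root, the challenged chunk, and a handful of sibling hashes (three for eight chunks), so the check stays cheap even when the replica is large. A chunk that is missing, altered, or presented at the wrong position fails to reproduce the root.
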
Both PoRep and PoSt use zk-SNARKs for compression, during sector sealing and spacetime proving respectively. zk-SNARK stands for "Zero-Knowledge Succinct Non-Interactive Argument of Knowledge". zk-SNARKs let us prove that a proof has been carried out correctly without revealing the details of the proof itself or the underlying data on which it is based.
The compression process is computationally expensive, but the resulting proof is small and verification is very fast. zk-SNARKs therefore keep the chain small and reduce the time needed for verification.
Besides applying advanced cryptography to run publicly verifiable proofs of storage, and blockchain technology to build a native cryptocurrency, $FIL, Filecoin also incorporates collateral for storage contracts and an algorithmically run marketplace with efficient pricing. Together, these make the storage network a service with high availability, resilience, and market-determined prices.