Welcome Avatar!
As regulation increases and the world becomes more unstable geopolitically, true decentralization will become more valuable. We are extremely bullish on the ability to store and transmit value and engage in economic activity permissionlessly, anonymously, and without censorship. However, many projects that claim to be permissionless and decentralized really aren't.
Today we’ll explain why many of the web3 services you rely on aren’t really decentralized, with a particular focus on the storage layer (see our decentralized cloud piece for an overview of compute).
Why Your NFTs (Probably) Aren’t Decentralized
Ownership of digital assets is associated in the public mind with NFTs, since they can be (or represent) unique items. People become attached to 'their' Bored Ape or Punk PFP and build their online identities around their NFTs. As you probably know, premium NFTs frequently change hands for six-figure sums. Rightly or wrongly, holders and buyers of the largest projects have high confidence in the value of "their" NFT.
Last year a high profile debate arose concerning the issue of who should own or be able to commercially exploit intellectual property rights associated with high-end NFTs. Should the benefit of these rights flow to the buyer or the NFT creator?
What is less known is that the artwork associated with NFTs is usually not stored in a durable or tamper-resistant way. Instead, the images are hosted on the legacy world wide web, not a blockchain. There is no guarantee of availability now or in the future. Crypto wallets and online marketplaces like OpenSea simply pull the images from a web server which could one day be unavailable, or configured to serve up completely different images!
Most NFTs are stored on the web rather than a blockchain or decentralized storage!
This means that although you own the token in your wallet, the image content which contributes to the token value is hosted on a centralized server (controlled by a single individual or corporation). You are not paying to keep the data alive and the server owner owes no obligations to you to continue to store and publish it. The person who controls the server (or even the domain which points to the server) can arbitrarily change or delete the image you “bought”.
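For intuition, here is a minimal sketch (with hypothetical URLs) of what a wallet or marketplace typically does when it displays "your" NFT. Both hops are ordinary HTTPS requests to servers someone else controls:

```python
# Sketch of how a wallet/marketplace resolves an NFT image.
# The tokenURI below is hypothetical; in practice it comes from the
# NFT contract's tokenURI(tokenId) call.
import requests

token_uri = "https://api.example-nft-project.com/metadata/1234"  # hypothetical URL

metadata = requests.get(token_uri, timeout=10).json()
image_url = metadata["image"]  # e.g. "https://cdn.example-nft-project.com/1234.png"

image_bytes = requests.get(image_url, timeout=10).content
print(f"Fetched {len(image_bytes)} bytes from {image_url}")
# If either server disappears or starts returning different data,
# the token in your wallet still exists, but the artwork does not.
```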
Moxie’s Experiment
Signal co-founder and privacy OG Moxie Marlinspike created a custom NFT, "At My Whim", which uses server-side scripting to display a different image for the same NFT depending on whether you are looking at it on OpenSea, LooksRare, or from your MetaMask wallet!
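In rough terms (this is not Moxie's actual code, and the image filenames are made up), the trick only needs a few lines of server-side logic that inspect who is asking for the image:

```python
# Hypothetical sketch: a server behind a tokenURI serving different images
# to different viewers based on request headers.
from flask import Flask, request, send_file

app = Flask(__name__)

@app.route("/nft/<int:token_id>/image")
def nft_image(token_id):
    referer = request.headers.get("Referer", "").lower()
    user_agent = request.headers.get("User-Agent", "").lower()
    if "opensea" in referer:
        return send_file("opensea_version.png")    # what OpenSea visitors see
    if "looksrare" in referer:
        return send_file("looksrare_version.png")  # what LooksRare visitors see
    if "metamask" in user_agent:
        return send_file("wallet_version.png")     # what your wallet shows
    return send_file("default.png")
```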
If you want to see what an NFT looks like “under the hood” here it is.
Why Aren’t NFT Projects Storing Data On The Blockchain?
Blockspace on Ethereum is scarce and in high demand, making it extremely expensive and therefore unsuitable for storing image data.
Autist note: every non-zero 32-byte word of data requires roughly 68 gas to send and 20,000 gas to store on Ethereum. At ~20,068 gas per word (about 627,000 gas per KB), an ETH/USD price of $2,000 and a gas price of 50 gwei, it would cost roughly $62 to store a 1KB file (the length of a short email), and ~$1,881 to store a basic low-quality JPEG (30KB). That's ~$19 million to store all the images from a 10k PFP collection in low-quality format.
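A quick back-of-the-envelope script using the same assumptions reproduces those figures (to within rounding):

```python
# Storage cost math from the note above (all figures are assumptions, not live data).
GAS_PER_WORD = 20_000 + 68   # SSTORE per 32-byte word, plus gas to send it
ETH_USD = 2_000              # assumed ETH price
GAS_PRICE_GWEI = 50          # assumed gas price

def usd_to_store(num_bytes: int) -> float:
    words = num_bytes / 32
    gas = words * GAS_PER_WORD
    eth = gas * GAS_PRICE_GWEI * 1e-9   # 1 gwei = 1e-9 ETH
    return eth * ETH_USD

print(f"1 KB:  ${usd_to_store(1_000):,.0f}")                          # ~$63
print(f"30 KB: ${usd_to_store(30_000):,.0f}")                         # ~$1,881
print(f"10k PFP collection: ${10_000 * usd_to_store(30_000):,.0f}")   # ~$19 million
```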
Even if storage were not cost-prohibitive (e.g. using an EVM-compatible blockchain with lower gas fees), websites are set up to load images from web servers, not to parse through a blockchain looking for relevant image data. There would be a significant cost to re-engineer projects like MetaMask and OpenSea to display blockchain-native images. People would need to agree on a technical standard - not an easy task!
Where Should Data Be Stored to Ensure Decentralization?
The legacy web is not the answer. Nor are centralized data storage services like Amazon S3 or Dropbox. It is surprising that people are still using these for blockchain projects more than seven years after the launch of the InterPlanetary File System (IPFS), the first project to successfully tackle decentralized storage at scale.
Evolution of distributed file storage
Distributed file storage has its roots in the peer-to-peer file sharing craze which began with the launch of the Napster service in 1999. The growth of the Internet coincided with the rise of the MP3 audio compression format, which made music files small enough for users to share their CD collections over the Internet.
As the music was copyrighted, the files were not stored on any centralized server. Users simply connected directly to each other's personal computers over the Internet. This was an early example of distributed storage, but we wouldn't call it decentralized: files had to be looked up through a software program downloaded from the centralized Napster service, which a U.S. judge ordered to be taken offline in 2001.
The next key player in file sharing was BitTorrent. Again, this service started off as centralized. A web server called a “tracker” would maintain a list of which computers were sharing a file. Users wishing to download a file over BitTorrent would consult this tracker for a list of personal computers and servers to connect to. Also, the .torrent file which contained the address of ‘trackers’ and hash data about the file had to be served from a centralized website. This was not ideal.
The breakthrough came in the form of the Distributed Hash Table (DHT), a technology developed under a $12 million grant from the US National Science Foundation to build Infrastructure for Resilient Internet Systems. Let's break down the terms.
Hashing refers to using a cryptographic function to produce a digest, a summary of the data contained in a file. Any change to the file should result in a different hash. Therefore, a file can be uniquely and globally identified by its hash. If you know the hash of the file, you can always find the file in the database. Unlike in location-based addressing (website.com/image_name.jpg), file names do not matter. Using hashes allows for content-based addressing - in other words, the address of a file is derived from the hash of its content.
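A toy example of the idea, using a plain SHA-256 digest (IPFS actually wraps the hash in a multihash/CID format, but the principle is the same):

```python
# Content addressing in a nutshell: the address is derived from the bytes themselves.
import hashlib

def content_address(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

original = b"my bored ape artwork"
tampered = b"my bored ape artwork (edited)"

print(content_address(original))  # the "name" of the file on the network
print(content_address(tampered))  # any change to the bytes changes the address
```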
A hash table is simply a database containing the hash of each file together with which computers have a copy of it. Distributing this hash table means sharing it among many nodes in the network, so anyone can look up a file by consulting their own part of the table and, if the file is not found, asking nearby peers whether they know about it (a simplified description).
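A deliberately simplified sketch of how such a distributed index behaves (real DHTs such as Kademlia, which IPFS uses, route lookups far more cleverly; the node names here are hypothetical):

```python
# Toy DHT: each node holds a slice of the index, and a lookup is routed to
# whichever node is "responsible" for that hash.
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical peers
index = {node: {} for node in NODES}               # each node's slice of the table

def responsible_node(file_hash: str) -> str:
    # Assign each key to a node based on its hash value.
    return NODES[int(file_hash, 16) % len(NODES)]

def announce(file_hash: str, holder: str) -> None:
    index[responsible_node(file_hash)].setdefault(file_hash, []).append(holder)

def lookup(file_hash: str) -> list:
    return index[responsible_node(file_hash)].get(file_hash, [])

h = hashlib.sha256(b"cat.jpg contents").hexdigest()
announce(h, "alice-laptop")
announce(h, "bob-server")
print(lookup(h))   # ['alice-laptop', 'bob-server'] -- who to fetch the file from
```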
In summary:
Napster allowed people to search each other's computers to share files, but the application was centralized and got shut down
BitTorrent allowed people to download files from each other, but the list of who had each file needed to be centralized, which made it a target for lawsuits, and many 'tracker' sites were shut down
DHTs allowed anyone to host a copy of the database/tracker of where files were located and to look up any file by its hash alone, completing the objective of decentralized, distributed file storage
What is IPFS and How Does It Work?
The InterPlanetary File System (IPFS) was released in 2015 by Juan Benet and Protocol Labs. IPFS is a single global network used for peer-to-peer file sharing. Content addressing is combined with a Distributed Hash Table to create the database/index of available files.
IPFS also aims to replace protocols used for delivery of static assets on websites (e.g. images, videos, software code). An IPFS Gateway is a server which allows ordinary websites to download IPFS files using the HTTP protocol. In other words, by using an IPFS gateway you can host an image stored on IPFS on an ordinary website like OpenSea. Gateways are a bridge between the legacy web and the distributed web.
Autist note: there are several ‘public gateways’ available, including hosting by Cloudflare, IPFS.io, and Infura. However, these gateways may censor content for legal reasons. An alternative is to use a browser with built-in access to IPFS such as Brave Browser (BAT token), or Opera for Android.
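For example, a short script (with a placeholder CID, not a real one) can try a couple of public gateways in turn and fall back if one refuses to serve the content:

```python
# Fetching IPFS content through public HTTP gateways.
import requests

cid = "<your-cid-here>"  # placeholder; substitute the CID of the content you want
gateways = [
    "https://ipfs.io/ipfs/",
    "https://cloudflare-ipfs.com/ipfs/",
]

for gw in gateways:
    try:
        resp = requests.get(gw + cid, timeout=15)
        if resp.ok:
            print(f"Fetched {len(resp.content)} bytes via {gw}")
            break
    except requests.RequestException as exc:
        print(f"{gw} failed: {exc}")  # try the next gateway
```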
Did IPFS succeed?
We think so. When Turkey censored Wikipedia between 2017 and 2020, a snapshot was hosted on IPFS which Turkish residents could access. Also in 2017, during the Catalonia independence referendum in Spain, a constitutional court judge blocked some websites - these were made available on IPFS successfully instead.
IPFS can’t be blocked at the domain level (like putting a firewall rule to block *.wikipedia.org) because any web server can act as an IPFS gateway.
Files can’t be censored at the network level: it is up to each individual node whether it wants to store and share a file, and the network is global.
Gateways can censor, but anyone can be a gateway.
Finally, it isn’t possible to censor the file name / search terms - files are looked up based on their hash.
IPFS is therefore an ideal place to host the front end for a DeFi application which may be banned by regulators.
Other uses include hosting source code repositories, as centralized services like GitHub may be forced to censor them.
Data Persistence and Filecoin
The IPFS system depends on nodes being willing and able to store files and share them with the network. Storage is finite and has a cost. Therefore it is likely that nodes will clear out stored files which are infrequently accessed.
Persisting data on IPFS requires pinning the file to one or more IPFS nodes. You can pin a file to a local node that you run, or you can ask one or more nodes to pin your file.
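As a rough sketch, assuming you run a local Kubo (go-ipfs) daemon with its RPC API on the default port, adding and pinning a file looks like this:

```python
# Pinning a file on a local IPFS node via the daemon's HTTP RPC API
# (assumes a Kubo/go-ipfs daemon is running on the default port 5001).
import requests

API = "http://127.0.0.1:5001/api/v0"

# Add the file to the local node (this also pins it by default).
with open("artwork.png", "rb") as f:
    res = requests.post(f"{API}/add", files={"file": f}).json()
cid = res["Hash"]
print(f"Added and pinned locally as {cid}")

# Explicitly pin a CID (useful when the content already exists elsewhere on the network).
requests.post(f"{API}/pin/add", params={"arg": cid})
```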
When you ask other nodes to pin a file, you are asking them to incur an ongoing cost to host the data. There is a fee associated with this, similar to any cloud hosting service.
Filecoin-backed pinning services allow users to use the Filecoin (FIL) token to pin a file on IPFS.
Other decentralized storage options include Arweave, MaidSafe, Sia, Storj, and Swarm. We’ll be taking a deeper look at decentralized storage in future research items.
Stay toon’d.
Disclaimer: None of this is to be deemed legal or financial advice of any kind. These are opinions from an anonymous group of cartoon animals with Wall Street and Software backgrounds.