Context for this deck
About this material:
This 30-minute presentation introduces the audience to IPFS and Filecoin and how they accelerate the transition to Web3, a version of the web where ownership and control are more decentralized and distributed.
The audience will get a high-level introduction to the technologies’ core concepts, discuss their potential in the context of Web3, and learn about user-friendly tools and existing applications to kick-start their own development efforts.
About IPFS & Filecoin:
The Interplanetary Filesystem (IPFS) is a peer-to-peer network and protocol designed to make the web faster, safer, and more open. IPFS upgrades the web to work in a decentralized manner, addressing data by what it is instead of where it’s located on the network or who is hosting it.
Filecoin, the world’s largest decentralized storage network, allows users to store, request, and transfer data via a verifiable marketplace. Filecoin is an entirely open-source alternative to costly cloud storage, where a blockchain network offers efficiently priced and geographically decentralized storage that ensures the persistence of users’ data.
Please feel free to create a copy for non-permanent edits. For permanent edits, please notify the team to make them aware of the updates.
Introduction to IPFS & Filecoin
[Name of event]
[Name of speaker]
A peer-to-peer protocol making the web upgradeable, resilient, and more open
Think: Peer-to-peer version of HTTP
A decentralized storage network to store humanity’s most important information.
Think: Airbnb for storage
Protocol Labs is an open-source R&D lab building protocols, tools, and services to improve the internet
FF is the steward of the Filecoin community, aspiring to put the power of humanity’s most important information back into the hands of everyone.
Agenda
Core concepts of IPFS & Filecoin
Tools for builders
The possibilities of web3
Learning resources
Discussion and Q&A
6
Centralized
Decentralized
Distributed
Web3
Web2
Data, information, and knowledge are among the most important assets of our connected era. They are critical to human development!
That’s why we believe they must be kept open and in the hands of the people.
But in the current model they aren’t. Let’s start with that problem:
Currently, the Web 2.0 model is centralized: only a few companies offer storage (Microsoft, Google, Apple, Amazon). If any of these fails (which does happen occasionally), entire services can go down.
The more distributed a service becomes, the more resilient it is, because the users themselves are powering the service.
7
Centralized
Single points of failure
Data monetized by data monoliths
Decentralized
Distributed
Users power service
Censorship resistance
Privacy
Self-verified
Non-siloed data
Web3
Web2
So what’s the solution? How do we build a web that is distributed?
How do we get to a web that is resilient, performant, scalable, secure, efficient, trustless, censorship-resistant, free, private, and silo-busting?
Over the course of this presentation we’ll see that IPFS, a p2p hypermedia protocol for content addressing, and Filecoin, the world’s largest decentralized storage network, are important building blocks of this new Web3.
Files and folders - might sound boring but really they’re NOT!
IPFS Interplanetary File System
File system
Files + Folders
8
Interplanetary
Distributed (no central server)
Resilient / Offline first
Upgrading the web
What is a file system? Files and folders; any file with any content.
Why interplanetary? Because it was conceived as a way to upgrade the web so that it would still work when the network stretches across planets.
The idea is that if you are on Mars, a request to Earth and back can take tens of minutes.
However, if that content was already fetched by someone else on Mars, then that person should just serve the content instead of the request going back to Earth.
As we will see IPFS is distributed by design; no central authoritative servers are storing content and no central server needs to be contacted to obtain the content.
9
Computer
file://path/to/index.html
Web2
IP + port
http://domain.com/path/to/index.html
ipfs://[CID]/path/to/index.html
IPFS
Content ID
IPFS addresses content by what it is, instead of where it is
It replaces a folder or file location with a Content ID
[Read the LHS]
Let's think of the process we follow when we save data
Locally, it just goes on our drive and is addressed there by its path...
The web is no different: when we open a website, we are just opening some files. The difference is just that they need to be downloaded from a remote location and that we do that using a browser, which makes things pretty.
Now, with IPFS we are also obtaining files from a remote location.
However, the key difference here is that we don't need to know the location of the content, only what is called its content identifier.
Ultimately, the content can be at one or several locations but, as we will see, it does not matter anymore where it is.
10
Same content = Same CID
Content ID can be reproduced anytime from the original content
Copies of content are verifiable by their CID
Cryptographic Hash function (Secure Hash Algorithm-256)
Content hello world
Content Identifier (CID)
QmbfSNQ6h73kr72RQ5h8nX8s9aN7aVKNiwGEYabzxBsQT9
Next, let’s look at the key technologies making this possible.
In order to be able to have content addressing, we need to create content identifiers for each piece of information that we want to put on the network.
Think of this as kind of a cryptographic fingerprint of that piece of content.
Each distinct piece of content produces a different fingerprint.
All the fingerprints are of the same size, regardless of the amount of content that they represent.
This fingerprint, which we call the content ID, can be reproduced anytime from the original content by hashing it.
This means that if we obtain a piece of content after requesting a content ID, we can verify that we were given exactly what we asked for.
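The fingerprint idea can be sketched in a few lines of Python. This is a simplified illustration, not how IPFS builds a CID: it uses the bare SHA-256 digest as the identifier, whereas a real CID encodes more than the digest, so the output will not match the CID shown on the slide. The names `fingerprint` and `verify` are made up for this example.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Simplified stand-in for a CID: the SHA-256 digest of the content, in hex."""
    return hashlib.sha256(content).hexdigest()


def verify(content: bytes, expected: str) -> bool:
    """Re-hash what we received and compare it with the identifier we asked for."""
    return fingerprint(content) == expected


requested = fingerprint(b"hello world")

# The fingerprint has the same size no matter how large the content is.
print(len(requested))                     # 64 hex characters
print(len(fingerprint(b"x" * 1_000_000)))  # 64 hex characters

# Anyone can hand us the bytes; we only accept them if the hash matches.
print(verify(b"hello world", requested))  # True
print(verify(b"hello w0rld", requested))  # False
```

Because the check only needs the content and the identifier, it works the same way no matter who served the bytes.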
11
A folder is a special kind of file, which lists other files in it
The core principle remains the same
Cryptographic Hash function (SHA-256)
Content user1
Content Identifier (CID)
But what about folders?
Folders are really just special types of files, which have a list of the files in that folder as content.
That list provides the names of those files and, in the case of IPFS, their CIDs.
Since a folder is a type of file, a CID can be obtained in exactly the same way as for any other type of file.
This means that we can represent a folder, or even a full file system, using a content-addressed structure.
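A minimal sketch of "a folder is a file that lists other files", assuming a JSON listing of names and fingerprints as the folder's content. IPFS uses its own encoding for folders; the JSON format, the file contents, and the helper names here are assumptions made for illustration only.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Simplified stand-in for a CID: hex SHA-256 of the bytes."""
    return hashlib.sha256(data).hexdigest()


def folder_bytes(entries: dict) -> bytes:
    """Serialize a folder as a sorted list of (name, fingerprint) pairs."""
    return json.dumps(sorted(entries.items())).encode("utf-8")


# Hypothetical file contents for the folder "user1".
files = {"abc.doc": b"quarterly report", "pic.jpg": b"image bytes"}

# The folder's content is just the names of its files and their fingerprints.
user1 = {name: fingerprint(data) for name, data in files.items()}

# Hashing that listing gives the folder its own fingerprint.
user1_id = fingerprint(folder_bytes(user1))
print(user1_id)
```

Sorting the entries before hashing is a design choice in this sketch: it makes the folder's fingerprint depend only on what the folder contains, not on the order in which entries were added.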
12
Merkle DAGs are graph data structures in which each node is content-addressed
Root CID
abc.doc pic.jpg doge.png
pic2.jpg
user1
user2
Note: Colors indicate unique CIDs
Let’s bring it to life
As you see here on the right side, our top-level folder has a root content ID.
It has two entries corresponding to two folders, and those folders have other entries corresponding to files. Each entry has a different fingerprint; that’s visualized with the colors here.
This content-addressed type of graph is what we call a Merkle DAG (directed acyclic graph).
The Merkle DAGs used by IPFS allow us to move from location-based addressing to content addressing in a single step.
We are just replacing locations with the root CID of their content. The sub-paths stay the same.
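To make "root CID plus sub-path" concrete, here is a small sketch of resolving a path through a content-addressed graph. It assumes an in-memory dictionary as a stand-in for the blocks held by peers, JSON folder listings, and an arbitrary assignment of the slide's file names to the two folders; none of this reflects the actual IPFS data formats.

```python
import hashlib
import json

store = {}  # fingerprint -> raw bytes (stand-in for blocks available on the network)


def put(data: bytes) -> str:
    """Store bytes under their own SHA-256 fingerprint and return it."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key


def put_folder(entries: dict) -> str:
    """A folder is stored like any file; its content maps names to fingerprints."""
    return put(json.dumps(entries, sort_keys=True).encode("utf-8"))


def resolve(root: str, path: str) -> bytes:
    """Walk from the root fingerprint down the sub-path, one folder per step."""
    current = root
    for name in (part for part in path.split("/") if part):
        listing = json.loads(store[current])
        current = listing[name]
    return store[current]


# Which file lives in which folder is assumed here for illustration.
user1 = put_folder({"abc.doc": put(b"doc bytes"), "pic.jpg": put(b"jpg bytes")})
user2 = put_folder({"doge.png": put(b"png bytes"), "pic2.jpg": put(b"jpg2 bytes")})
root = put_folder({"user1": user1, "user2": user2})

print(resolve(root, "user1/abc.doc"))  # b'doc bytes'
```

The sub-path `user1/abc.doc` is the same one a location-based address would use; only the part in front of it changed from a location to a root fingerprint.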
13
Deduplication
To IPFS, CIDs of copies of the same file are the same thing
Verifiability
If the content changes, the CID changes
Root CID
abc.doc pic.jpg doge.png
pic2.jpg
user1
user2
new
Note: Colors indicate unique CIDs
What would happen if we wanted to copy file.txt to the folder of the second user? Two things:
Firstly, we don’t actually have to copy the file. We just have to modify the folder to reference the content.
Two copies of the same content have the same identifier, so to IPFS they are the same thing. We call that deduplication.
Secondly, since we changed a folder, the fingerprint of that folder changed, so we had to update the parent folder, too, to reference the new fingerprint.
This means that the fingerprint of the parent folder changed as well, resulting in a new root CID.
The fact that a CID will always represent exactly the same piece of information (unlike a location) makes it possible to verify any piece of data.
That matters because if the CID is guaranteed to give you the same content, you don't have to get that content from a trusted, centralized server. You can request that CID from anybody in the network, regardless of whether you trust them.
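Both effects can be observed in a short sketch. It assumes an in-memory store keyed by SHA-256 fingerprints and JSON folder listings (stand-ins, not the IPFS formats); the file contents and the entry name `new` are placeholders.

```python
import hashlib
import json

store = {}  # fingerprint -> raw bytes


def put(data: bytes) -> str:
    """Store bytes under their SHA-256 fingerprint; identical bytes share one entry."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key


def put_folder(entries: dict) -> str:
    """Store a folder listing (name -> fingerprint) like any other file."""
    return put(json.dumps(entries, sort_keys=True).encode("utf-8"))


shared = put(b"same bytes")
other = put(b"other bytes")
user1 = put_folder({"pic.jpg": shared})
user2_old = put_folder({"pic2.jpg": other})
root_old = put_folder({"user1": user1, "user2": user2_old})
blocks_before = len(store)

# "Copying" the file into user2: adding the same bytes again yields the same
# fingerprint, so no second copy of the file is stored (deduplication).
copy = put(b"same bytes")
user2_new = put_folder({"pic2.jpg": other, "new": copy})
root_new = put_folder({"user1": user1, "user2": user2_new})

print(copy == shared)               # True: one identifier for both copies
print(len(store) - blocks_before)   # 2: only the changed folder and the new root
print(root_new != root_old)         # True: the change propagates up to the root
```

Note that `user1` keeps its fingerprint: only the folders on the path from the change up to the root get new identifiers.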
14
Peer
Unique ID in the p2p network namespace
Provide services to other peers
Must be "discoverable"
Encrypted communication channels
Use services from other peers
Must be "routable" / reachable
Content routing | Peer
Node in a p2p system
Swarm
Network of peers
Got it - but how does the retrieval actually work? Let’s start with the peers
Forget for a moment about IPFS and think about a group of people. If I want to address a person and communicate, it helps if I can identify them (for example by their name), if we share a common language (so we can communicate), and if we have ways to verify that we are who we claim to be.
It is the same with IPFS and the peers in the network: each peer has a unique identifier (its peer ID). This identifier is linked to a cryptographic identity, which allows each peer to communicate securely through an encrypted channel.
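One way to picture the link between a peer ID and a cryptographic identity is to derive the ID by hashing the peer's public key. The sketch below assumes exactly that and uses fixed placeholder bytes instead of a real public key; it shows only the identifier check, not key generation, signatures, or the encrypted channel itself. The names `peer_id` and `key_matches_id` are hypothetical.

```python
import hashlib


def peer_id(public_key: bytes) -> str:
    """Assumed derivation for illustration: the peer ID is a hash of the public key."""
    return hashlib.sha256(public_key).hexdigest()


def key_matches_id(claimed_id: str, presented_key: bytes) -> bool:
    """Check that the key a peer presents is the one its ID was derived from."""
    return peer_id(presented_key) == claimed_id


# Placeholder bytes standing in for real public keys.
alice_key = b"\x01" * 32
mallory_key = b"\x02" * 32

alice_id = peer_id(alice_key)
print(key_matches_id(alice_id, alice_key))    # True
print(key_matches_id(alice_id, mallory_key))  # False
```

In a real network the peer would additionally have to prove that it holds the matching private key, which is what the secure handshake is for.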
15
Peers also need to be abl