Pastebin System Requirements
- The client must be able to upload text data and receive a unique URL
- The received URL is used to access the text data
Data storage
Database schema
- The primary entities of the database are the Pastes table and the Users table
- The relationship between the Users and the Pastes tables is one-to-many: one user can own many pastes
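The one-to-many relationship above can be sketched as a pair of tables. This is a minimal illustration, not the article's actual schema: the column names (user_id, name, paste_id, paste_url, created_at) are assumptions, and SQLite stands in for Postgres or MySQL only so the example runs without a database server.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE pastes (
    paste_id   TEXT PRIMARY KEY,   -- encoded paste ID
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    paste_url  TEXT NOT NULL,      -- where the content lives in object storage
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""

def create_schema(conn: sqlite3.Connection) -> None:
    """Create the Users and Pastes tables on the given connection."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)

# One user owning two pastes demonstrates the one-to-many relationship.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO users (user_id, name) VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO pastes (paste_id, user_id, paste_url) VALUES (?, ?, ?)",
    [
        ("a1", 1, "https://example.invalid/a1"),
        ("b2", 1, "https://example.invalid/b2"),
    ],
)
paste_count = conn.execute(
    "SELECT COUNT(*) FROM pastes WHERE user_id = 1"
).fetchone()[0]
```

The foreign key on pastes.user_id is what ties each paste to exactly one user, while a user may appear in any number of paste rows.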
Type of data store
- The content of a paste is stored in a managed object store such as AWS S3
- A SQL database such as Postgres or MySQL is used to store the metadata (paste URL) of the paste
High-level design
- The server generates a unique paste identifier (ID) for each new paste
- The server encodes the paste ID for readability
- The server persists the paste ID in the metadata store and the paste in the object storage
- When the client enters the paste ID, the server returns the paste
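The article does not say which encoding is used for the paste ID. One common choice, assumed here purely for illustration, is base62, which turns a numeric ID into a short alphanumeric string. The function names encode_base62 and decode_base62 are hypothetical.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is an arbitrary choice for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)

def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n < 0:
        raise ValueError("paste ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))

def decode_base62(s: str) -> int:
    """Decode a base62 string back into the original integer."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

With this encoding, a numeric ID such as 62 becomes the two-character string "10", and decoding any encoded ID returns the original number.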
Write path
- The client makes an HTTP connection to the server
- Writes to Pastebin are rate limited
- The Key Generation Service (KGS) creates a unique encoded paste ID
- The object storage returns a presigned URL
- The paste URL (http://presigned-url/paste-id) is created by appending the generated paste ID to the presigned URL
- The paste content is transferred directly from the client to the object storage using the paste URL, which reduces bandwidth costs on the server and improves performance
- The object storage persists the paste using the paste URL
- The metadata of the paste including the paste URL is persisted in the SQL database
- The server returns the paste ID to the client for future access
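The write path above can be sketched end to end with in-memory stand-ins. Everything here is an assumption made for illustration: the class names FixedWindowRateLimiter and PasteService, the fixed-window rate-limiting policy, random ID generation in place of a real KGS, and dictionaries in place of the object storage and SQL database. In the design described above, the client uploads the content directly to the object storage; the sketch collapses that step into a dictionary write.

```python
import secrets
import string
import time

ALPHABET = string.ascii_letters + string.digits

class FixedWindowRateLimiter:
    """Allow at most `limit` writes per client in each `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._counts = {}  # client_id -> (window_start, count)

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        start, count = self._counts.get(client_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the previous window expired
        if count >= self.limit:
            return False
        self._counts[client_id] = (start, count + 1)
        return True

class PasteService:
    """In-memory stand-in for the server, KGS, object storage, and SQL store."""

    def __init__(self, limiter, base_url: str = "http://presigned-url"):
        self.limiter = limiter
        self.base_url = base_url
        self.object_store = {}  # paste_url -> content (stand-in for S3)
        self.metadata = {}      # paste_id -> paste_url (stand-in for SQL)

    def generate_id(self, length: int = 8) -> str:
        # Stand-in for the KGS: retry until the random ID is unused.
        while True:
            paste_id = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if paste_id not in self.metadata:
                return paste_id

    def create_paste(self, client_id: str, content: str) -> str:
        if not self.limiter.allow(client_id):
            raise PermissionError("rate limit exceeded")
        paste_id = self.generate_id()
        paste_url = f"{self.base_url}/{paste_id}"
        self.object_store[paste_url] = content  # client upload in the real design
        self.metadata[paste_id] = paste_url
        return paste_id
```

A caller would construct PasteService(FixedWindowRateLimiter(limit=2)) and call create_paste("client-1", "hello"); a third write from the same client within the window raises PermissionError.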
Read path
- The client executes a DNS query to identify the server
- The CDN is queried to verify if the requested paste is in the CDN cache
- The client makes an HTTP connection to the load balancer or the reverse proxy server
- The read requests are rate limited
- The load balancer delegates the client connection to the server with free capacity
- The server verifies if the paste exists by querying the Bloom filter
- If the paste exists, the server checks whether the paste is stored in the cache server
- On a cache miss, the server fetches the metadata for the paste from the SQL database
- The server fetches the paste content from the object storage using the metadata
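The server-side lookup order in the read path (Bloom filter, then cache, then metadata store, then object storage) can be sketched as follows. This is an illustrative sketch under stated assumptions: the BloomFilter class, its default sizing, the read_paste function, and the dictionaries standing in for the cache server, SQL database, and object storage are all hypothetical. Populating the cache after a miss is also an assumption, since the steps above do not say when the cache is filled.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key: str):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )

def read_paste(paste_id, bloom, cache, metadata, object_store):
    """Return the paste content, or None if the paste does not exist."""
    if not bloom.might_contain(paste_id):
        return None  # definitely absent: skip the cache and the database
    if paste_id in cache:
        return cache[paste_id]
    paste_url = metadata.get(paste_id)
    if paste_url is None:
        return None  # the Bloom filter gave a false positive
    content = object_store[paste_url]
    cache[paste_id] = content  # assumption: fill the cache on a miss
    return content
```

The Bloom filter check comes first because a negative answer is always correct, so requests for nonexistent pastes never reach the cache or the database.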