Designing Dropbox (File Sharing Service)

Deepak Pant
3 min read · Sep 29, 2019

Key features and considerations:

  1. Upload and download files from the web client or through a REST API.
  2. How do you sync files across “multiple smart clients”? This is what sets this system apart from others: the client is not just a browser but a smart desktop/mobile app.
  3. How do you handle upload and download of large files? Chunk the file into smaller pieces.
  4. How do you upload only the pieces that changed on the client?
  5. Do we need to save the history of file changes? Yes.
  6. How do you save space in storage? Compression, and detecting identical chunks using SHA-256.
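To make points 3 and 6 concrete, here is a minimal sketch of chunking plus deduplication. It assumes fixed-size chunks (4 MB by default, the chunk size quoted from the talk) and an in-memory dictionary standing in for real storage; `ChunkStore`, `split_into_chunks` and `upload` are hypothetical names for illustration, not Dropbox's actual API.

```python
import hashlib
import zlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB default; an assumption taken from the talk notes


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the content into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class ChunkStore:
    """Hypothetical content-addressed store: chunks are keyed by their SHA-256."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, chunk: bytes) -> str:
        digest = hashlib.sha256(chunk).hexdigest()
        # An identical chunk hashes to the same key, so it is stored only once.
        if digest not in self._blobs:
            self._blobs[digest] = zlib.compress(chunk)
        return digest

    def get(self, digest: str) -> bytes:
        return zlib.decompress(self._blobs[digest])

    def __len__(self) -> int:
        return len(self._blobs)


def upload(data: bytes, store: ChunkStore, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Store every chunk and return the ordered list of chunk hashes."""
    return [store.put(chunk) for chunk in split_into_chunks(data, chunk_size)]
```

The ordered list of hashes returned by `upload` is all that is needed to rebuild the file later, and two files that share a chunk pay for its storage only once.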

The video below gives good insight into the features needed and hammers home why we need to chunk the file before we upload it.

Also notice that the file storage service (blob storage / S3 bucket) and the file metadata service (which chunk lives where, which chunk changed, and the list of all chunks that make up a file such as foo.txt) are two separate services. The metadata service can notify the client about changes to a file, and then the client can talk directly to the file storage service to fetch the chunks.
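The split between the two services can be sketched roughly as follows. This is only an in-memory illustration under my own assumptions: `BlockStore`, `MetadataService` and `download` are hypothetical names, and plain callbacks stand in for whatever notification channel a real system would use.

```python
import hashlib


class BlockStore:
    """Stand-in for blob storage: maps chunk hash -> chunk bytes."""

    def __init__(self):
        self.blocks = {}

    def put(self, chunk: bytes) -> str:
        digest = hashlib.sha256(chunk).hexdigest()
        self.blocks[digest] = chunk
        return digest

    def get(self, digest: str) -> bytes:
        return self.blocks[digest]


class MetadataService:
    """Stand-in for the metadata service: maps path -> ordered chunk hashes."""

    def __init__(self):
        self.manifests = {}
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def commit(self, path: str, chunk_hashes: list) -> None:
        # Record which chunks make up the file, then tell clients it changed.
        self.manifests[path] = list(chunk_hashes)
        for notify in self.subscribers:
            notify(path)

    def manifest(self, path: str) -> list:
        return list(self.manifests[path])


def download(path: str, metadata: MetadataService, blocks: BlockStore) -> bytes:
    """Ask metadata which chunks are needed, then fetch them from block storage."""
    return b"".join(blocks.get(h) for h in metadata.manifest(path))
```

Note that the metadata service never touches file bytes; it only hands out hashes, which keeps the heavy traffic on the storage side.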

Pictorial representation of the Dropbox system components

The following video talks a lot about why Dropbox chunks a file before uploading it. It is a good overview, although a lot of material is repeated throughout the video.

System Design with Naren

Scalability in Dropbox talk:

The following snapshots are taken from the talk given by “ “. Notice that the notification server needs a bidirectional connection so that it can notify the client; in plain HTTP, only the client can initiate a request to the server. A bidirectional connection (for example, a WebSocket) puts less load on the server than HTTP polling by the client.
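The push model can be illustrated with a small sketch. It is not a real WebSocket server: each `asyncio.Queue` merely stands in for one long-lived client connection, and `NotificationServer`, `connect` and `publish` are hypothetical names. The point is only that a waiting client wakes up when the server has something to say, instead of asking repeatedly.

```python
import asyncio


class NotificationServer:
    """Toy push server: one queue per connected client (stand-in for a socket)."""

    def __init__(self):
        self._connections = {}

    def connect(self, client_id: str) -> asyncio.Queue:
        queue = asyncio.Queue()
        self._connections[client_id] = queue
        return queue

    def publish(self, path: str, origin: str) -> None:
        # Push the change to every client except the one that made it.
        for client_id, queue in self._connections.items():
            if client_id != origin:
                queue.put_nowait(path)


async def demo():
    server = NotificationServer()
    laptop = server.connect("laptop")
    phone = server.connect("phone")
    server.publish("foo.txt", origin="laptop")
    # The phone simply waits on its connection; no polling loop is needed.
    changed = await asyncio.wait_for(phone.get(), timeout=1)
    return changed, laptop.empty()
```

Running `asyncio.run(demo())` delivers the change to the phone and leaves the laptop's queue empty, since the laptop made the edit itself.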

Chunk size = 4MB.

Dropbox uses a binary diff to figure out which chunk changed and updates only that chunk (this saves network bandwidth and disk storage).
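A simplified way to see the idea is to compare per-chunk hashes of the old and new versions. To be clear, this is a swapped-in technique and not a true binary diff: it assumes fixed-offset chunks, so inserting a single byte at the start of a file would shift every chunk and mark them all as changed. The function names are hypothetical.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int) -> list[str]:
    """SHA-256 of each fixed-size chunk, in file order."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def changed_chunk_indexes(old_hashes: list[str], new_hashes: list[str]) -> list[int]:
    """Indexes in the new file whose chunk is new or differs from the old version."""
    return [i for i, h in enumerate(new_hashes)
            if i >= len(old_hashes) or old_hashes[i] != h]


# Usage with a tiny 4-byte chunk size so the effect is easy to see.
old = chunk_hashes(b"AAAABBBBCCCC", 4)
new = chunk_hashes(b"AAAAXXXXCCCCDD", 4)
to_upload = changed_chunk_indexes(old, new)  # only chunks 1 and 3 need uploading
```

Only the second chunk (edited) and the fourth chunk (appended) would go over the network; the other two are already on the server.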

Dropbox also uses SQL, and the presentation talks about some aspects of the schema.

Always think about how you will maintain consistency. If two clients are writing to the same file, what will happen? Does the last write win? Or do you want to keep multiple copies of the file?
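The two options can be compared in a short sketch. It assumes each file carries a revision number and that a client sends the revision it last saw along with its write; the `Server` class, its methods and the "conflicted copy" naming are all hypothetical and only illustrate the trade-off.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    revision: int = 0
    content: bytes = b""


@dataclass
class Server:
    files: dict = field(default_factory=dict)

    def write_last_wins(self, path: str, content: bytes) -> int:
        """Last write wins: whoever writes later silently overwrites."""
        rec = self.files.setdefault(path, FileRecord())
        rec.revision += 1
        rec.content = content
        return rec.revision

    def write_checked(self, path: str, content: bytes,
                      base_revision: int, client_name: str) -> str:
        """Accept the write only if the client saw the latest revision;
        otherwise keep both versions by saving a separate copy.
        Returns the path that was actually written."""
        rec = self.files.setdefault(path, FileRecord())
        if base_revision != rec.revision:
            path = f"{path} ({client_name}'s conflicted copy)"
            rec = self.files.setdefault(path, FileRecord())
        rec.revision += 1
        rec.content = content
        return path
```

With `write_last_wins` one client's edit disappears without warning; with `write_checked` the stale writer ends up with a second file, and a human decides how to merge.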

Dropbox high-level architecture

The DB (for metadata) is MySQL for Dropbox. Data blocks are stored in S3 blob storage (buckets).

Notification servers (NotServers) push notifications to clients rather than having clients poll your server. The block server makes RPC calls to the load balancer to get metadata information.

memcached is not consistent when used in a distributed scenario: it is designed for availability, not consistency. Dropbox modified the memcached library to address that.

Metadata storage: Log of all the edits / server file journal

id ==> index in the log (can be thought of as a timestamp), filename, latest ==> marks the latest entry in the log for that file, namespace id.

On disk, rows are ordered by id.

Remove latest from the primary key to scale the writes.
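Here is a rough sketch of such a journal, using Python's built-in `sqlite3` as a stand-in for MySQL. The table name, column types and helper functions are my assumptions for illustration; the talk's real schema has more to it. The key points are that `id` alone is the primary key and that `latest` is an ordinary column, so flipping it never changes where a row sorts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE server_file_journal (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,  -- position in the log
        namespace_id INTEGER NOT NULL,
        filename     TEXT    NOT NULL,
        latest       INTEGER NOT NULL DEFAULT 1          -- 1 = newest entry for this file
    )
""")


def append_edit(conn, namespace_id: int, filename: str) -> int:
    """Append one edit to the log and mark it as the latest for that file."""
    with conn:  # one transaction: clear the old flag, insert the new row
        conn.execute(
            "UPDATE server_file_journal SET latest = 0 "
            "WHERE namespace_id = ? AND filename = ? AND latest = 1",
            (namespace_id, filename),
        )
        cur = conn.execute(
            "INSERT INTO server_file_journal (namespace_id, filename) VALUES (?, ?)",
            (namespace_id, filename),
        )
    return cur.lastrowid


def changes_since(conn, namespace_id: int, last_seen_id: int) -> list:
    """Everything a client missed: log entries after the id it last saw."""
    return conn.execute(
        "SELECT id, filename FROM server_file_journal "
        "WHERE namespace_id = ? AND id > ? ORDER BY id",
        (namespace_id, last_seen_id),
    ).fetchall()
```

A client that remembers the last id it processed can catch up with a single range query on `id`, which is why the id works like a timestamp.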
