building blocks:
memory
- in-memory cache
- relational database
- nosql database
- blob storage
- cdn (nodes on the edge of the network, that serve responses to users. commonly used to serve common responses, replicated across different geographical locations to be as close as possible to the end user)
compute
- server
- api gateway
- load balancer
hybrid
- queues (store units of work to be completed however they compute complicated priority stuff in some cases)
- search-optimised database (inverted index where word points resolves to documents that contain the word)
- stream (streaming events)
- distributed lock (sometimes implemented in an in-memory cache like redis. key refers to what the object is, then value is whether the lock is held. relies on the atomicity property of the cache)
relational db vs nosql:
- rdbs is known for it’s ACID properties. atomicity (stores transactions to happen almost in a queue), consistency (write ahead log), isolation (constraints checked before commit), durability (file sync kernel call to the journal then kernel call to the database file. wal does this differently by queueing it) which means uhh data integrity. does this suggest that nosql doesn’t have isolation or durability? i don’t think acid is necessarily where nosql fails, nosql can still have durability (replication and erasure coding)
- nosql for changing data schemas
cap theorem
availability vs consistency. it’s better if users see stale data (low consistency in accurate data) than no data at all (low availability blocked on waiting for strong consistency to occur). this is referred to as eventual consistency. when the system is network partitioned (microservices architecture), we may not be able to reach another node where accurate data is stored. prio availability unless it’s e.g. tickets or cinema seats being booked
general notes
- HTTP GET does not allow request bodies. Need to pass arguments as a path parameter or query parameter
- ssl termination being done on one node (api gateway) is performant, every server doesn’t need to ssl decrypt (cpu heavy), only one place to manage dns certs
- just had a situation where i opted for complexity where it wasn’t needed. in dropbox, i changed polling to event-driven. the intermediate step should’ve been stick to polling but switch to a smarter polling strategy taking into account client patterns, client battery power limits, time of day, ‘hot’ files
- full text search engine vs database text search plugin: elasticsearch allows fuzzy finding and apparently the text indexes on database plugin adds maintenance cost, more so than elasticsearch?
figures:
- 100ms-200ms latency
- 100M daily active users
- 10M concurrent requests for highest read scenario
- in-memory cache node tops at ~100k, 1TB storage on memory optimised vps, <1ms read
- database node tops at 50k transactions per second, 15-20k write tps, 64TB storage, 1-5ms cached read, 5-30ms disk, 5-15ms commit
- server, 100k requests, 500gbish memory
- sub 1ms in same availability zone (AZ), 1-2ms across AZ in same region, 50ms-150ms across AZ in different region
patterns:
pushing realtime updates
- pub/sub pattern generally. clients connected on a websocket to the server, server subscribes to updates from wherever the responses will come from. data flows through the long living connections constantly, meeting the realtime property
managing long-running tasks
- client request long-running task. server acknowledges instantly and 1) responds a uid 2) creates job in the database 3) appends job to the queue. queue gets consumed by workers, workers perform work and update the job status in database/other system. server notices that work is done somehow and then follow up response is given to client
dealing with contention
- cinema booking seats. key is to start with atomicity and transactional model where everything happens in one go or doesn’t happen, then transition (deadlock or too complicated) to more complex distributed synchronisation techniques (distributed lock, two phase commit protocol, queue based serialisation). make it deterministic
scaling reads
users are always reading more than writing, 10:1 ratio initially then becomes 100:1. first
1) optimise db, optimise sql queries, denormalise your data (instead of having 0 duplication of data so a query could be multiple joins, have data copied to each table so reads don’t have to join and are fast), indexes, perhaps different database technology (elasticsearch)
2) scale horizontally with read replicas (replication lag bad)
3) add caching considerations to the system network (cache invalidation, how to deal with a key that is particularly hot). trading off consistency for high availability. redis cache connected to server makes 10ms read to 1ms read. cdn for users, proximity basedbitly (url shortener)
i scaled the reads for this server by adding a redis cache that the server would read from first and if cache miss then go to database
scaling writes
optimise data model in database. normalise our data, remove indexes that aren’t required. batching techniques for business logic THEN horizontal sharding (hash ring complexity, difficult to reverse this decision once set, but it is hard to definitively know your future needs of your system so always has to be considered carefully) THEN if there’s unavoidable work to do then add a queue for the job, to be done asynchronously. cost of distributed synchronisation between workers and queue. OR add vertical partitioning, requests use a partition key to write to where their data is stored, challenging to get a good key, e.g. country is bad because it could be skewed towards one country, good is user ids, if its in 1xx range then location A, else it’s 2xx then location B.
handling large blobs
don’t try to stream blobs through application server, will just exhause memory of the server. instead blob services like s3 (simple storage service) and gcp buckets can provide presigned urls, so client uploads directly to blob service and reads directly from blob service. system stores the metadata of the blob and it’s location in the blob service, inside the system’s database. challenges of this is maintaining consistency between metadata and what’s in the blob, failures between client/blob service interactions. blob services anticipate this so provide an api for application servers, need to learn what’s available for you to use in your system
dropbox
i handled large blobs initially by splitting storage of file metadata from the raw bytes of the file. our database stores the metadata only, blob storage service is responsible for the file then. we then split the concern of upload blob/sync blob with 2 services. the client is also smart with polling strategy, sometimes turning off polling entirely so that we don’t download files multiple times. we also split the file into chunks (content defined chunking creates splits smartly, by chunking at series of 0s or something) and then only syncing the changed chunks, this massively helps bandwidth usage. we store these chunks in the metadata store
multi-step processes
instead of state of a process stored in different places and error handling of steps stored by the step, we can pull it out into central store of process state. client interaction kicks off the process, event is started in the store and pub sub topics are populated and subscribed to. the service that manages the “contract” of the process coordinates the different topics and workers and is responsible for kicking off error handling
proximity based services
suppose we want to search something close to us. take for example an uber driver that is close to me, we shouldn’t need to enumerate all drivers that are active o(N) we have a range query index instead o(logN) where we discard sections of our data entirely by location. implemented by special indexes in database. reducing search space well. costs a lot to accurately maintain this index. increases cost of adding a driver perhaps, and then replication of this index is complex as well