It is imperative for any file system to be highly scalable, performant, and fault tolerant. Otherwise, why would you even bother to store data there? In practice, fault tolerance is achieved through data redundancy, and the cost of that redundancy is increased storage overhead. There are two common schemes for achieving fault tolerance: triple mirroring (RF3) and erasure coding. To keep the Scale Data Distributed Filesystem (SDFS, codenamed “Atlas”) fault tolerant while increasing usable capacity and maintaining high performance, Rubrik uses a scheme called erasure coding.
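To see why erasure coding increases usable capacity, compare its raw-storage cost against triple mirroring. The sketch below uses an illustrative Reed-Solomon style (4 data + 2 parity) layout; these parameters are assumptions for comparison, not Rubrik's actual Atlas configuration.

```python
# Comparing storage overhead: triple mirroring (RF3) vs. an illustrative
# (4 data + 2 parity) erasure coding layout. Both tolerate the loss of
# any two blocks, but at very different raw-storage costs.

def storage_overhead(data_blocks: int, redundant_blocks: int) -> float:
    """Raw bytes stored per byte of user data."""
    return (data_blocks + redundant_blocks) / data_blocks

# RF3: each block is written three times (1 data copy + 2 extra copies).
rf3 = storage_overhead(1, 2)        # 3.0x raw storage per logical byte

# Erasure coding (4 data + 2 parity): same two-failure tolerance,
# half the raw-storage cost.
ec_4_2 = storage_overhead(4, 2)     # 1.5x raw storage per logical byte

print(f"RF3 overhead:    {rf3:.1f}x")
print(f"EC 4+2 overhead: {ec_4_2:.1f}x")
```

The trade-off is that reconstructing a lost block under erasure coding requires reading multiple surviving blocks and computing parity, whereas a mirror copy can be read directly.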
Month: October 2017
Understanding MongoDB’s Replica Sets
As a part of its native replication, MongoDB maintains multiple copies of data in a construct called a replica set.
Replica Sets
So, what is a replica set? A replica set in MongoDB is a group of mongod processes (mongod is the primary daemon process for the MongoDB system) that maintain the same data set. Put simply, it is a group of MongoDB servers operating in a primary/secondary failover fashion. Replica sets provide redundancy and high availability.
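As a concrete sketch, a three-member replica set can be initialized from the mongo shell on the intended primary with `rs.initiate()`. The replica set name and hostnames below are placeholders, not values from the original post.

```javascript
// Hypothetical three-member replica set initialization, run from the
// mongo shell on the member intended to become primary.
// Hostnames and the set name "rs0" are placeholders.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017" },
    { _id: 2, host: "mongodb2.example.net:27017" }
  ]
})
```

After initialization, the members hold an election; `rs.status()` then reports one PRIMARY and two SECONDARY members, and the secondaries replicate the primary's data set.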
Introduction to Sharding with MongoDB
What is Sharding?
A server’s capacity can be challenged by database systems with large data sets or high-throughput applications. For example, high query rates can exhaust the CPU capacity of the server; likewise, working sets larger than the system’s RAM stress the I/O capacity of its disks.
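Sharding addresses this by splitting a collection across multiple servers so that each one holds, and serves queries for, only a subset of the data. The sketch below illustrates the general idea of hashed routing on a shard key; the shard names, the `user_id` key, and the routing function are illustrative assumptions, not MongoDB's internal implementation.

```python
# Minimal sketch of hash-based sharding: each document is routed to a
# shard by hashing its shard key. Shard names and the "user_id" shard
# key are hypothetical, for illustration only.
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]

def route(shard_key: str) -> str:
    """Pick a shard deterministically from a hash of the shard key."""
    digest = hashlib.md5(shard_key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

docs = [{"user_id": f"user{i}"} for i in range(9)]
placement = {shard: [] for shard in SHARDS}
for doc in docs:
    placement[route(doc["user_id"])].append(doc["user_id"])

# Each shard stores only part of the collection, so CPU, RAM, and
# disk I/O pressure is divided across machines rather than
# concentrated on a single server.
for shard, users in placement.items():
    print(shard, users)
```

Because routing is deterministic, any query that includes the shard key can be sent straight to the one shard that owns the document, while queries without it must be broadcast to all shards.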