It is imperative for any file system to be highly scalable, performant, and fault tolerant. Otherwise…why would you even bother to store data there? But realistically, achieving fault tolerance is done through data redundancy. On the flipside, the cost of redundancy is increased storage overhead. There are two possible encoding schemes for fault tolerance: triple mirroring (RF3) and erasure coding. To ensure the Scale Data Distributed Filesystem (SDFS, codenamed “Atlas”) is fault tolerant while increasing capacity and maintaining higher performance, Rubrik uses a schema called erasure coding.
As a part of its native replication, MongoDB maintains multiple copies of data in a construct called a replica set.
So, what is a replica set? A replica set in MongoDB is a group of mongod (primary daemon process for the MongoDB system) process that maintains the same data set. Put simply, it is a group of MongoDB servers operating in a primary / secondary failover fashion. Replica sets provide redundancy and high availability.
What is Sharding?
A server’s capacity can be challenged by database systems with large data sets or high throughput applications. For example, high query rates can consume the CPU capacity of the server; likewise working set sizes larger than a system’s RAM stress the IO capacity of disks.