Cassandra is a distributed database management system designed for handling a high volume of structured data across commodity servers.
Cassandra handles huge amounts of data through its distributed architecture. Data is placed on different machines with a replication factor greater than one, which provides high availability and no single point of failure.
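The replica placement described above can be illustrated with a minimal sketch. It assumes a simplified ring with one token per node and uses MD5 purely for illustration; real Cassandra uses a configurable partitioner (Murmur3Partitioner by default) and virtual nodes, and the `Ring` class, `token` helper, and node names below are hypothetical, not part of any Cassandra API.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumption for this sketch)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: each node owns one token, keys go to the next nodes clockwise."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # Sort nodes by their token so the list represents the ring in clockwise order.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str):
        """Return the nodes that would hold copies of `key`."""
        # First node whose token is greater than the key's token, wrapping around the ring.
        start = bisect_right(self._tokens, token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct node names
```

With a replication factor of 3 on four nodes, every key lives on three different machines, so losing any single node still leaves two copies available.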
In the image below, the circles are Cassandra nodes and the lines between them show the distributed architecture, while the client sends data to a node.
- Facebook open sourced it in July 2008.
- The Apache Incubator accepted Cassandra in March 2009.
- Cassandra has been a top-level Apache project since February 2010.
- At the time of writing, the latest version of Apache Cassandra is 3.2.1.
First, let’s understand what a NoSQL database is.
NoSQL Cassandra Database
NoSQL databases are called “Not Only SQL” or “non-relational” databases. They store and retrieve data using models other than the tabular relations used in relational databases.
NoSQL databases include MongoDB, HBase, and Cassandra.
NoSQL databases have the following properties:
- Design Simplicity
- Horizontal Scaling
- High Availability
The data structures used in Cassandra are more specialized than those used in relational databases, which can make some operations faster than their relational equivalents.
NoSQL databases are increasingly used in Big Data and real-time web applications. The name “Not Only SQL” reflects that they may also support SQL-like query languages.
Apache Cassandra Features
Cassandra provides the following features:
- Massively Scalable Architecture: Cassandra has a masterless design in which all nodes are peers, which provides operational simplicity and easy scale-out.
- Masterless Architecture: Data can be written to and read from any node.
- Linear Scale Performance: As more nodes are added, Cassandra's throughput increases roughly in proportion.
- No Single Point of Failure: Cassandra replicates data across different nodes, which ensures there is no single point of failure.
- Fault Detection and Recovery: Failed nodes can easily be restored and recovered.
- Flexible and Dynamic Data Model: Supports a range of data types with fast writes and reads.
- Data Protection: Data is protected by the commit log design and by built-in safeguards such as backup and restore mechanisms.
- Tunable Data Consistency: Consistency can be tuned per operation, from eventual to strong, across the distributed architecture.
- Multi Data Center Replication: Cassandra can replicate data across multiple data centers.
- Data Compression: Cassandra can compress data by up to 80% with little performance overhead.
- Cassandra Query Language: Cassandra provides a query language (CQL) that is similar to SQL, which makes the move from a relational database to Cassandra much easier for developers.
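To make the tunable consistency feature above more concrete, here is a small sketch of the usual rule of thumb: a read is guaranteed to see the latest write when the number of replicas contacted for the read plus the number contacted for the write exceeds the replication factor. The helper names `quorum` and `is_strongly_consistent` are hypothetical and exist only for this example; they are not part of Cassandra or its drivers.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority for the given replication factor."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                                   # 2
print(is_strongly_consistent(q, q, rf))    # True: QUORUM reads with QUORUM writes
print(is_strongly_consistent(1, 1, rf))    # False: ONE reads with ONE writes
```

Lower levels such as ONE trade this guarantee for lower latency and higher availability, which is the tuning the feature name refers to.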