Apache HBase and Its Features
Apache HBase is an open-source, non-relational, distributed database written in Java and modeled after Google’s Bigtable. It is developed as part of the Apache Software Foundation’s Apache Hadoop project and runs on top of HDFS or Alluxio, providing Bigtable-like capabilities for Hadoop.
Apache HBase began as a project by the company Powerset out of a need to process massive amounts of data for natural-language search. It is now a top-level Apache project.
Facebook elected to implement its new messaging platform using HBase in November 2010, but migrated away from HBase in 2018.[4]
Apache HBase is a distributed, column-oriented database built on top of the Hadoop file system. HBase is an open-source project and is horizontally scalable.
Unlike relational database systems, HBase does not support a structured query language such as SQL; in fact, it is not a relational data store at all. HBase tables are partitioned into multiple regions, with each region storing a contiguous range of the table’s rows.
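Because each region covers a sorted range of row keys, finding the region that holds a given row amounts to a floor lookup on the region start keys. The sketch below illustrates this idea with plain Java collections; the region names and split keys are made up for illustration and this is not HBase's actual implementation.

```java
import java.util.TreeMap;

// Conceptual sketch: regions cover contiguous, sorted row-key ranges,
// so the region hosting a row is the one with the greatest start key
// less than or equal to that row key. Region names and split points
// here are hypothetical.
public class RegionLookup {
    // Maps each region's start row key to an illustrative region name.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public RegionLookup() {
        regionsByStartKey.put("", "region-1");  // rows before "g"
        regionsByStartKey.put("g", "region-2"); // rows from "g" up to "p"
        regionsByStartKey.put("p", "region-3"); // rows from "p" onward
    }

    // floorEntry finds the largest start key <= rowKey, i.e. the
    // region whose range contains the row.
    public String regionFor(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }
}
```

Every row key falls into exactly one region, and as a region grows it is split at a middle key, so this lookup structure stays balanced as the table scales.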
It provides a fault-tolerant way of storing large quantities of sparse data, that is, small amounts of meaningful information scattered within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items that represent less than 0.1% of a huge collection.
HBase is a column-oriented key-value data store that has been widely adopted because of its lineage with Hadoop and HDFS. HBase runs on top of HDFS and is well suited for fast read and write operations on large datasets, with high throughput and low input/output latency.
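The column-oriented key-value model can be pictured as a sorted map of sorted maps: a cell is addressed by (row key, column family, column qualifier), and absent cells take no space at all, which is why sparse data is stored efficiently. The following is a minimal conceptual sketch in plain Java, not the real HBase client API; the class and method names are made up for illustration.

```java
import java.util.TreeMap;

// Conceptual sketch of HBase's data model using plain Java maps.
// A cell is addressed by (row key, column family, column qualifier);
// rows are kept sorted by key, and cells that were never written
// simply do not exist, so sparse tables cost nothing for empty cells.
public class SparseTable {
    // row key -> ("family:qualifier" -> cell value)
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    public void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(family + ":" + qualifier, value);
    }

    public String get(String rowKey, String family, String qualifier) {
        TreeMap<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(family + ":" + qualifier);
    }
}
```

Reading a cell that was never written returns null rather than an "empty" value, mirroring how HBase stores only the cells that actually exist.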
HBase now serves several data-driven websites, although Facebook’s messaging platform migrated from HBase to MyRocks in 2018. Unlike relational and traditional databases, HBase does not support SQL scripting; instead, the equivalent functionality is written in Java, much like a MapReduce application.
HBase is linearly scalable and has automatic failure support. It provides consistent reads and writes, integrates with Hadoop as both a source and a destination, offers an easy Java API for clients, and provides data replication across clusters.