HBase
HBase is a column-oriented key-value store built on top of HDFS, which means it leverages the HDFS infrastructure for data replication. HBase is a CP system (Consistent and Partition-tolerant): it sacrifices availability for consistency [see CAP theorem].
HBase is a NoSQL system, so you don't normalize your data model and there are no joins to merge two tables. See Hive (and Pig) for a query language on top.
HBase automatically shards (splits) large tables into multiple pieces called regions. Regions are served by RegionServers.
Data Model
You can imagine a table that stores sparse data (many columns are missing). For any cell, the complete coordinates are given by
Table:Row:ColumnFamily:Column:Timestamp → Value
When you retrieve a value, the version with the latest timestamp is returned. By default three versions are stored (newer HBase releases default to one).
Deleted cells are marked with tombstones, to indicate they must not be returned by any query (the data is physically removed later, during major compaction).
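To make the coordinates concrete, here is a minimal sketch using the standard HBase Java client; the table `users`, column family `profile`, and qualifier `email` are made-up examples:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CellCoordinatesDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {

            // Write: row + column family + qualifier (timestamp is implicit) -> value
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("email"),
                    Bytes.toBytes("alice@example.com"));
            table.put(put);

            // Read: only the version with the latest timestamp is returned by default
            Get get = new Get(Bytes.toBytes("row1"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("profile"), Bytes.toBytes("email"))));

            // Delete writes tombstones; the data is physically removed at major compaction
            table.delete(new Delete(Bytes.toBytes("row1")));
        }
    }
}
```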
Components
In this post we will walk through the key concepts used in HBase. HBase has the following big architectural pieces:
1. HMaster: Manages the assignment of regions to RegionServers and the splitting of tables when they become huge. All DDL commands are executed by it (see the Admin sketch after this list). The HMaster is not in the read or the write path of a query. There is usually a standby HMaster for fault tolerance; at any point in time only one HMaster is active.
2. RegionServer: Clients contact RegionServers directly for read and write queries.
3. DataNode: Stores the actual data in HFiles. It is colocated with a RegionServer in most cases. After a table split, a RegionServer and the DataNodes holding its data can be on different machines until the data is moved back local during the compaction process.
4. ZooKeeper: The coordination service for detecting and managing failures of both RegionServers and the HMaster. It uses consensus to guarantee that only one HMaster is active, and ephemeral nodes (heartbeats) to detect failures. Of course, ZooKeeper itself needs to run on multiple nodes to be fault tolerant.
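As an example of the DDL path through the HMaster, here is a sketch using the HBase 2.x Admin API; the table and column family names are the same made-up ones as above:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateTableDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // DDL is executed by the HMaster, not by a RegionServer
            admin.createTable(TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("users"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("profile"))
                    .build());
        }
    }
}
```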
Terminology
Region: a *contiguous* range of row keys in a table
.META table (its location is stored in ZooKeeper): this table serves a purpose very similar to DNS. It is a mapping of table_name:region_start_key:region_id → RegionServer IP. This is how the client knows which RegionServer to contact for a particular read or write request. The .META table is cached at the client.
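A client can inspect this region-to-server mapping through the RegionLocator API; a minimal sketch (again assuming the made-up `users` table):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionLocationsDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("users"))) {
            // Each entry pairs a region's start key with the RegionServer serving it
            for (HRegionLocation loc : locator.getAllRegionLocations()) {
                System.out.println(Bytes.toStringBinary(loc.getRegion().getStartKey())
                        + " -> " + loc.getServerName());
            }
        }
    }
}
```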
Data structures maintained by RegionServers
WAL (write-ahead log): This is the classic technique for durability in HBase. Different databases have different names for it; it is the journal concept. DB write commands are appended to a log on disk before anything else, so a RegionServer can rebuild the MemStore after a crash by replaying the WAL.
MemStore: This is the in-memory (RAM) write cache. There is one MemStore per column family, which imposes a practical limit on how many column families you can have. Anything in the MemStore must also be present in the WAL, since the MemStore can be recreated by replaying the commands in the WAL. Once the data is in the WAL and the MemStore, the client gets the ACK for the write. From time to time a region flush happens:
Each time the MemStore is flushed, its contents are written to a new HFile in HDFS as one sequential write (fast).
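To make the write path concrete, here is a toy, self-contained sketch of the WAL → MemStore → HFile idea. This is not HBase code; all names and the file format are invented for illustration:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ToyRegionStore {
    private final Path walPath;
    private final NavigableMap<String, String> memStore = new TreeMap<>();
    private int flushCount = 0;

    ToyRegionStore(Path walPath) { this.walPath = walPath; }

    void put(String key, String value) throws IOException {
        // 1. Append the command to the WAL first (durability).
        Files.writeString(walPath, key + "\t" + value + "\n", StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        // 2. Apply it to the sorted in-memory MemStore; only now is the client ACKed.
        memStore.put(key, value);
    }

    // Flush: dump the sorted MemStore to a new immutable "HFile" in one sequential write.
    Path flush() throws IOException {
        Path hfile = walPath.resolveSibling("hfile-" + (flushCount++) + ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(hfile)) {
            for (Map.Entry<String, String> e : memStore.entrySet()) {
                w.write(e.getKey() + "\t" + e.getValue());
                w.newLine();
            }
        }
        memStore.clear(); // the WAL can be truncated up to this point as well
        return hfile;
    }

    public static void main(String[] args) throws IOException {
        ToyRegionStore store = new ToyRegionStore(Path.of("wal.log"));
        store.put("row2", "b");
        store.put("row1", "a"); // TreeMap keeps the keys sorted, like the MemStore
        System.out.println("flushed to " + store.flush());
    }
}
```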
BlockCache: an LRU-based read cache.
HFile: These are the files stored in HDFS (on the DataNodes), sorted by key. An HFile also has an index (B-tree-like) written with it, which is needed for fast access during reads. There are also Bloom filters, used to quickly check whether a value for a particular key might exist in the HFile.
To read a particular value, a RegionServer has to search all the HFiles it stores and return the latest version (one without a tombstone). To make this fast, Bloom filters are used to rule out HFiles that definitely do not contain the key.
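HBase ships its own Bloom filter implementation internally; to illustrate the idea, here is a sketch using Guava's `BloomFilter` as an assumed stand-in (not what HBase actually uses):

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class BloomDemo {
    public static void main(String[] args) {
        // One Bloom filter per HFile: ~10k expected keys, 1% false-positive rate
        BloomFilter<String> keysInHFile = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8), 10_000, 0.01);
        keysInHFile.put("row1");

        // false -> the HFile definitely lacks the key, so we can skip reading it
        System.out.println(keysInHFile.mightContain("row999"));
        // true -> the HFile *might* contain the key (false positives are possible),
        // so we must actually read the file
        System.out.println(keysInHFile.mightContain("row1"));
    }
}
```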
Read Ops
- Look in the BlockCache; if the value is present, return it
- Look in the MemStore
- Load the HFiles and look there (a toy sketch of this order follows)
Speeding up reads
To speed up reads, compaction happens from time to time (see the Admin sketch after this list):
- HBase minor compaction: merges multiple HFiles into fewer files. The output can still be multiple files.
- HBase major compaction: rewrites all the HFiles into one HFile per column family, dropping deleted (tombstoned) cells. Usually scheduled to run automatically at off-peak load.
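Compactions normally run automatically, but they can also be requested through the Admin API. A sketch, assuming the HBase 2.x client and the made-up `users` table; both calls are asynchronous requests:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CompactionDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            admin.compact(TableName.valueOf("users"));      // request a (minor) compaction
            admin.majorCompact(TableName.valueOf("users")); // request a major compaction
        }
    }
}
```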
Region split: When a table grows big, a region is split into two regions (which lie on the same server/physical node). The split is reported to the HMaster. For load balancing, the HMaster may schedule one of the new regions to move to another server (but NOT the underlying DataNode blocks). In that case the receiving RegionServer serves its data from remote DataNodes, until major compaction makes the data local to the server again.
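Splits and balancing also happen automatically, but both can be triggered by hand; a sketch with the HBase 2.x Admin API (note that `balance()` is the 2.x name; older clients call it `balancer()`):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SplitAndBalanceDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            admin.split(TableName.valueOf("users")); // ask HBase to split the table's regions
            admin.balance();                         // let the HMaster's balancer move regions
        }
    }
}
```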