Hive/HBase integration

by Vivek Mishra

Assuming you are aware of Hive and HBase basic concepts, my POC around Hive/HBase integration is : 1) Provide real time analytics on HBase.
2) Indexing mechanism.
3) Load Data from HDFS to HBase using Hive logical tables.
4) Bidirectional CRUD operation from Hive<->HBase on same dataset.
Please refer to Codebase: i took hive 0.6.0 tag version and applied patch commited on trunk for Hive-1264. HBase version: 0.20.6 To build code base please use “ant tar” task for proper build for $HIVE_SRC. Note: 

  1.      With cloudera distribution of HBase 0.89.X i was facing integration so i moved to Hbase 0.20.6, which worked well.
  2.   In case of getting any heap issue on starting Hive, please change HADOOP_HEAPSIZE in $HIVE_SRC\bin\ext\util\

Starting Zookeeper:sudo /usr/lib/zookeeper/bin/

  • Start HBase shell:
    sudo $HBASE_HOME/bin/hbase master start sudo bin/hbase shell from HBase shell issue “status” to status of master node running. If result is coming as “MasterNotRunningException:null”  then you need to troubleshoot your HBase configuration.
Start Hive shell: sudo /home/impetus/Hadoop/Hadoop-0.21/hive/build/dist/bin/hive --auxpath /home/impetus/Hadoop/Hadoop-0.21/hive/build/dist/lib/hive_hbase-handler.jar,/home/impetus/Hadoop/Hadoop-0.21/hive/build/dist/lib/hbase-0.20.6.jar,/home/impetus/Hadoop/Hadoop-0.21/hive/build/dist/lib/zookeeper-3.2.2.jar -hiveconf hbase.master=








Working Example:

create a new HBase table which is to be managed by Hive
CREATE TABLE hive_hbasetable_k(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("" = "hivehbasek");

Create a logical table pokes in Hive
CREATE TABLE pokes (foo INT, bar STRING);

Insert data in pokes from HDFS.
LOAD DATA LOCAL INPATH '/home/impetus/Hadoop/Hadoop-0.21/hive/build/dist/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Load data into HBase table using Hive.
INSERT OVERWRITE TABLE hive_hbasetable_k SELECT * FROM pokes WHERE foo=98;


select * from hive_hbasetable_k;
98 val_98

Let’s scan HBase to validate data is loaded or not by issuing:

hbase(main):015:0> scan "hivehbasek"
98 column=cf1:val, timestamp=1290502385735, value=val_98
1 row(s) in 0.1410 seconds

Before we try to load data from HBase and query it from Hive. Two points i would like to share:

  1.  If you have noticed that upon creating a HBase table managed by Hive you need to specify columnFamily and column name(cf1 is column family and val is it’s name). So while putting data from HBase make sure to define column name, else it will not be included when you query it via Hive.
  2.  Second is, You must have noticed that hive_hbasetable_k is holding key as an “int”. So while putting data please make sure to insert only integer value in rows, else it will come as NULL key while querying via Hive.(e.g. put ‘hivehbasek’,’r1:key’,’cf1:val’,’99_val’)

Load data from HBase

hbase(main):012:0> put 'hivehbasek','99','cf1:val','99_val'
hbase(main):012:0>put 'hivehbasek','r1:key','cf1:val','99_val'
hbase(main):015:0> scan "hivehbasek"
98 column=cf1:val, timestamp=1290502385735, value=val_98
99 column=cf1:val, timestamp=1290502652876, value=99_val
r1:key column=cf1:val, timestamp=1290502601128, value=99_val
3 row(s) in 0.1410 seconds

Now upon querying the same from Hive:
hive> select * from hive_hbasetable_k;
98 val_98
99 99_val
NULL 99_val

Example 2: Create Hive table on existing HBase Table and querying it via Hive.

hbase(main):023:0> create 'hbasetohive', 'colFamily'
hbase(main):023:0> put 'hbasetohive', '1s', 'colFamily:val','1strowval'

hbase(main):023:0> scan 'hbasetohive'
1s column=colFamily:val, timestamp=1290503237527, value=1strowval
1 row(s) in 0.0430 seconds

hive> CREATE EXTERNAL TABLE hbase_hivetable_k(key string, value string)
STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’
WITH SERDEPROPERTIES (“hbase.columns.mapping” = “colFamily:val”)
TBLPROPERTIES(“” = “hbasetohive”);

hive> select * from hbase_hivetable_k;
1s 1strowval
Time taken: 0.515 seconds



HBase 0.93-SNAPSHOT and Hive 0.7.1 Integration:

Today i spent some time for this integration and found out that it is now bit different. So thought to share my way to test this integration.


1)  build HBase (used mvn clean install -Dmaven.test.skip=true)

2) Copied hbase/conf/hbase-site.xml to generated HBase snapshot jar(i.e. target/HBase-0.93.0-SNAPSHOT)

3) build Hive using ” ant jar” and ” ant binary” task execution.

4) Copy generated Hbase jar to Hive/build/dist/lib(remove existing Hbase jars)

5) Follow steps given on top of post to start hbase master.

6)  give a command like /home/impadmin/source/hive/build/dist/bin/hive -hiveconf hive.root.logger=INFO,console,hbase.master=localhost:60000,hbase.zookeeper.quorum=localhost

Change to your hbase master and zookeeper settings. it should connect to you to Hbase master.

7) Follow above given examples to verify .

Happy programming :)