Vivek Mishra's Blog

Perform CRUD over HBase using HBaseTestingUtility/EmbeddedHBase


EmbeddedHBase is something similar to EmbeddedCassandraService. I had been looking for something like this since last year (March 2011) to incorporate into Kundera. HBaseTestingUtility really helped me achieve continuous integration and run JUnit tests against an in-memory HBase server. So here I am sharing some code snippets on how to configure and start an embedded HBase:

  • HBaseCli: Responsible for initializing and configuring HBaseTestingUtility.

public final class HBaseCli
{
    /** The Constant logger. */
    private static final Logger logger = LoggerFactory.getLogger(HBaseCli.class);

    /** The utility. */
    private static HBaseTestingUtility utility;

    private static Boolean isStarted = false;
}

  • Start cluster: (You can change settings for zookeeper port, timeout etc.)

/**
 * Starts a new cluster.
 */
public static void startCluster()
{
    if (!isStarted)
    {
        File workingDirectory = new File("./");
        Configuration conf = new Configuration();
        System.setProperty("test.build.data", workingDirectory.getAbsolutePath());
        conf.set("test.build.data", new File(workingDirectory, "zookeeper").getAbsolutePath());
        conf.set("fs.default.name", "file:///");
        conf.set("zookeeper.session.timeout", "180000");
        conf.set("hbase.zookeeper.peerport", "2888");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        try
        {
            conf.set(HConstants.HBASE_DIR, new File(workingDirectory, "hbase").toURI().toURL().toString());
        }
        catch (MalformedURLException e1)
        {
            logger.error(e1.getMessage());
        }

        Configuration hbaseConf = HBaseConfiguration.create(conf);
        utility = new HBaseTestingUtility(hbaseConf);
        try
        {
            MiniZooKeeperCluster zkCluster = new MiniZooKeeperCluster(conf);
            zkCluster.setClientPort(2181);
            zkCluster.setTickTime(18000);
            zkCluster.startup(utility.setupClusterTestBuildDir());
            utility.setZkCluster(zkCluster);
            utility.startMiniCluster();
            utility.getHbaseCluster().startMaster();
        }
        catch (Exception e)
        {
            logger.error(e.getMessage());
            throw new RuntimeException(e);
        }
        isStarted = true;
    }
}

This starts ZooKeeper, Hadoop (datanode, namenode, tasktracker, etc.) and the HBase master server.
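Since the whole point is running JUnit tests against this in-memory server, here is a minimal sketch of how HBaseCli might be wired into a JUnit 4 test lifecycle (the test class name is hypothetical):

import org.junit.AfterClass;
import org.junit.BeforeClass;

public class PersonHBaseTest
{
    /** Start the embedded cluster once, before any test runs. */
    @BeforeClass
    public static void startEmbeddedHBase()
    {
        HBaseCli.startCluster();
    }

    /** Shut the cluster down after the last test. */
    @AfterClass
    public static void stopEmbeddedHBase()
    {
        HBaseCli.stopCluster();
    }
}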

Create table

public static void createTable(String tableName)
{
    try
    {
        if (!utility.getHBaseAdmin().tableExists(tableName))
        {
            utility.createTable(tableName.getBytes(), tableName.getBytes());
        }
        else
        {
            logger.info("Table:" + tableName + " already exists.");
        }
    }
    catch (IOException e)
    {
        logger.error(e.getMessage());
    }
}

Add Column

public static void addColumn(String tableName, String columnFamily)
{
    try
    {
        utility.getHBaseAdmin().disableTable(tableName);
        utility.getHBaseAdmin().addColumn(tableName, new HColumnDescriptor(columnFamily));
        utility.getHBaseAdmin().enableTable(tableName);
    }
    catch (InvalidFamilyOperationException ife)
    {
        logger.info("Column family:" + columnFamily + " already exists!");
    }
    catch (IOException e)
    {
        logger.error(e.getMessage());
    }
}

Stop Cluster:

public static void stopCluster()
{
    try
    {
        if (utility != null)
        {
            utility.shutdownMiniCluster();
            utility = null;
        }
    }
    catch (IOException e)
    {
        logger.error(e.getMessage());
    }
}
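With the cluster up, basic CRUD works through the regular HBase client API. A minimal sketch (HBase 0.90.x-era client API; it assumes HBaseCli exposes the testing utility's Configuration through a hypothetical getConfiguration() helper):

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

HBaseCli.startCluster();
HBaseCli.createTable("PERSON");
HBaseCli.addColumn("PERSON", "personal");

// Create: write a row with one column.
HTable table = new HTable(HBaseCli.getConfiguration(), "PERSON"); // getConfiguration() is an assumed helper
Put put = new Put(Bytes.toBytes("1"));
put.add(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("vivek"));
table.put(put);

// Read: fetch the row back and pull out the column value.
Result result = table.get(new Get(Bytes.toBytes("1")));
byte[] name = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));

// Delete: remove the row and release the table.
table.delete(new Delete(Bytes.toBytes("1")));
table.close();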

Happy Programming :)

How to: CRUD and JPA association handling using Kundera


Recently we released Kundera-2.0.5. This post demonstrates how to perform CRUD and association handling using Kundera. Kundera can now use the secondary index support provided by Cassandra (0.7.x onwards), so this example will also demonstrate how to leverage that benefit using the same JPA style within Kundera.
The example I am referring to can be found Here.
Run this script to create a column family in Cassandra with indexes:
  • create keyspace KunderaExamples;
  • create column family PERSON with comparator=UTF8Type and column_metadata=[{column_name: PERSON_NAME, validation_class:UTF8Type, index_type: KEYS}, {column_name: AGE, validation_class:IntegerType, index_type: KEYS}];
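Before wiring up Kundera, you can verify the keyspace and index definitions from cassandra-cli:

describe keyspace KunderaExamples;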

Entity: PersonCassandra.java

package com.impetus.kundera.examples.crud;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

/**
 * The Class Person.
 */
@Entity
@Table(name = "PERSON", schema = "KunderaExamples@twissandra")
public class PersonCassandra
{
    /** The person id. */
    @Id
    @Column(name = "PERSON_ID")
    private String personId;

    /** The person name. */
    @Column(name = "PERSON_NAME")
    private String personName;

    /** The age. */
    @Column(name = "AGE")
    private Integer age;

    // Followed by getter and setter methods
}


Configuration : Persistence.xml

<persistence xmlns="http://java.sun.com/xml/ns/persistence"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/persistence
    https://raw.github.com/impetus-opensource/Kundera/Kundera-2.0.4/kundera-core/src/test/resources/META-INF/persistence_2_0.xsd"
    version="2.0">

    <!-- Persistence unit for the twissandra application -->
    <persistence-unit name="twissandra">
        <provider>com.impetus.kundera.KunderaPersistence</provider>
        <properties>
            <property name="kundera.nodes" value="localhost"/>
            <property name="kundera.port" value="9160"/>
            <property name="kundera.keyspace" value="KunderaExamples"/>
            <property name="kundera.dialect" value="cassandra"/>
            <property name="kundera.client" value="Pelops"/>
            <property name="kundera.cache.provider.class" value="com.impetus.kundera.cache.ehcache.EhCacheProvider"/>
            <property name="kundera.cache.config.resource" value="/ehcache-test.xml"/>
        </properties>
    </persistence-unit>
</persistence>

Now, if you notice the @Table annotation:

@Table(name = "PERSON", schema = "KunderaExamples@twissandra")

For Cassandra, "PERSON" is the specified column family, and schema denotes "keyspace@persistence-unit-name".

Test class definition:

public class PersonTest
{
    /** The emf. */
    private EntityManagerFactory emf;

    /** The em. */
    private EntityManager em;

    // ... methods defining various operations
}

Initialize entity manager
emf = Persistence.createEntityManagerFactory("twissandra");
em = emf.createEntityManager();
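And when you are done, release both (standard JPA cleanup):

em.close();
emf.close();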
Insert:

public void onInsertCassandra()
{
    Object p1 = prepareData("1", 10);
    Object p2 = prepareData("2", 20);
    Object p3 = prepareData("3", 15);
    em.persist(p1);
    em.persist(p2);
    em.persist(p3);
}

private PersonCassandra prepareData(String rowKey, int age)
{
    PersonCassandra o = new PersonCassandra();
    o.setPersonId(rowKey);
    o.setPersonName("vivek");
    o.setAge(age);
    return o;
}

Find By Id:
public void onFindByIdCassandra()
{
    PersonCassandra p = findById(PersonCassandra.class, "1", em);
}

private <E> E findById(Class<E> clazz, Object rowKey, EntityManager em)
{
    return em.find(clazz, rowKey);
}
Find By Name:
public void onFindByName()
{
    findByName(em, "PersonCassandra", PersonCassandra.class, "vivek", "PERSON_NAME");
}

/**
 * Assert find by name.
 *
 * @param <E> the element type
 * @param em the em
 * @param clazz the entity name
 * @param e the entity class
 * @param name the name
 * @param fieldName the field name
 */
private <E> void findByName(EntityManager em, String clazz, Class<E> e, String name, String fieldName)
{
    String query = "Select p from " + clazz + " p where p." + fieldName + " = " + name;
    // find by name.
    Query q = em.createQuery(query);
    List results = q.getResultList();
}

Find By Name and Age:

public void onFindByNameAndAge()
{
    findByNameAndAge(em, "PersonCassandra", PersonCassandra.class, "vivek", "10", "PERSON_NAME");
}

private <E> void findByNameAndAge(EntityManager em, String clazz, Class<E> e, String name, String minVal, String fieldName)
{
    Query q = em.createQuery("Select p from " + clazz + " p where p." + fieldName + " = " + name + " and p.AGE > " + minVal);
    List results = q.getResultList();
}

Find By Range:
public void onFindByRange()
{
    findByRange(em, "PersonCassandra", PersonCassandra.class, "10", "20", "PERSON_ID");
}

private <E> void findByRange(EntityManager em, String clazz, Class<E> e, String minVal, String maxVal, String fieldName)
{
    // find by range.
    Query q = em.createQuery("Select p from " + clazz + " p where p." + fieldName + " Between " + minVal + " and " + maxVal);
    List results = q.getResultList();
}

Find By Name and "<" and ">":

public void onFindByNameAndAgeGTAndLT()
{
    findByNameAndAgeGTAndLT(em, "PersonCassandra", PersonCassandra.class, "vivek", "10", "20", "PERSON_NAME");
}

private <E> void findByNameAndAgeGTAndLT(EntityManager em, String clazz, Class<E> e, String name, String minVal, String maxVal, String fieldName)
{
    // find by name and age clause
    Query q = em.createQuery("Select p from " + clazz
            + " p where p." + fieldName + " = " + name + " and p.AGE > " + minVal + " and p.AGE < " + maxVal);
    List results = q.getResultList();
}

Self Association:

An example demonstrating how to define and perform a bidirectional self-association is available Here.
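To give a feel for what such a mapping looks like, here is a hedged sketch of a bidirectional self-association in plain JPA terms (the field and mapping names are my own, not the linked example's actual code):

@Entity
@Table(name = "PERSON", schema = "KunderaExamples@twissandra")
public class Person
{
    @Id
    @Column(name = "PERSON_ID")
    private String personId;

    // Owning side: many persons report to one boss, who is also a Person.
    @ManyToOne
    private Person boss;

    // Inverse side of the same association, held by the boss.
    @OneToMany(mappedBy = "boss")
    private Set<Person> reports;

    // Getters and setters omitted.
}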

Using Default lucene index

There are still some operations not supported by Cassandra secondary indexes (e.g. indexing and search over super column values). Also, indexing support over HBase is not yet mature, so Kundera provides default Lucene indexing support for all such find operations. All you need to do is provide the property given below:

<property name="index_home_dir" value="$LUCENE_DIR_PATH"/>

This will start storing and indexing records at the specified local/remote location.

Can we use the same example for other supported data stores (e.g. MongoDB, HBase, MySQL)?

The answer is YES. Changes required:

  • Define a persistence unit in persistence.xml.
  • Create a script specific to the intended database.
  • Modify the entity definition (e.g. PersonCassandra) for the correct column family or table name (see the @Table annotation above).
  • Modify the entity manager factory instantiation for the correct persistence unit, as shown in the sketch after this list.
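For instance, retargeting the entity at a hypothetical MongoDB persistence unit (the unit name "mongo_pu" is an assumption for illustration) would look like:

// Assumes a persistence unit "mongo_pu" defined in persistence.xml.
@Entity
@Table(name = "PERSON", schema = "KunderaExamples@mongo_pu")
public class PersonMongo
{
    // same fields as PersonCassandra ...
}

// And bootstrap against that unit:
EntityManagerFactory emf = Persistence.createEntityManagerFactory("mongo_pu");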

That's it!

References:
  • An example covering different data type support can be found Here.
  • An example covering cross-data-store operations can be found Here.
  • An example of a Flickr-like application can be found here.

Setup Kundera with standalone Cassandra


 


Note: Currently Kundera supports Cassandra 0.8.2. You can download Cassandra releases from here.

1) Set an environment variable "CASSANDRA_HOME" pointing to your Cassandra home directory (e.g. D:\vivek\source\cassandra-0.8.2).

2) Download the Kundera configurator from here.

3) Change to the directory where this jar file is saved and run the command below:
java -jar kunderaConfigurator.jar

You'll see output from the configurator confirming the generated configuration.

4) In case you want to use the Kundera executable to develop your application, you can download it from here (else you can build the Kundera-2.0.3 source code for the same).

5) You need to add "log4j-server.properties", as unfortunately the Cassandra release is not able to locate this file inside the executable jar (you can find it under /src/main/resources). Put it on your application's classpath.

 

 


6) Modify your persistence.xml to include:

<property name="server.config" value="$CASSANDRA_HOME/conf/cassandra.yaml"/>

Replace $CASSANDRA_HOME with your Cassandra root directory.
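Putting it together, the persistence unit might look like the sketch below (based on the twissandra unit shown earlier in this blog, with the server.config property added):

<persistence-unit name="twissandra">
    <provider>com.impetus.kundera.KunderaPersistence</provider>
    <properties>
        <property name="server.config" value="$CASSANDRA_HOME/conf/cassandra.yaml"/>
        <property name="kundera.nodes" value="localhost"/>
        <property name="kundera.port" value="9160"/>
        <property name="kundera.keyspace" value="KunderaExamples"/>
        <property name="kundera.dialect" value="cassandra"/>
        <property name="kundera.client" value="Pelops"/>
    </properties>
</persistence-unit>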

 


And that's it. Enjoy working with Kundera!

Tesseract-Ocr/grails installation over amazon EC2.



Before we proceed:

Before proceeding with installation, the first thing is to choose the right AMI for your needs. There are a number of cloud images available. I prefer an EBS-backed AMI (so I can stop/save my instance), which should provide a built-in platform with the following already installed:

1) Java 6

2) 64-bit CentOS

I tried many EC2 instances, and finally settled on the image ami-4d42a924.

What do I need to install?

To set up Grails and Tesseract-OCR on your machine, you need to install:

1) Leptonica

2) Tesseract-OCR

3) Grails

I will cover them in detail in the installation steps.

Installation steps

   Svn:

1) Connect to your cloud instance as root.

2) Verify whether svn is installed: type "svn" at the command prompt. If you get something like "svn is not found or installed", go to step 3.

3) Type yum install svn to install svn.

Java:

1) Type java -version to check the version of installed Java. If the Sun JDK is not installed (to check, type echo $JAVA_HOME), go to step 2.

2) Download jdk-6u27-ea-bin-b03-linux-amd64-27_may_2011-rpm.bin (for more details, please refer here).

3) Once the download is complete, execute:

  • chmod +x jdk-6u27-ea-bin-b03-linux-amd64-27_may_2011-rpm.bin
  • ./jdk-6u27-ea-bin-b03-linux-amd64-27_may_2011-rpm.bin
  • ln -s /usr/java/jdk1.6.0_27/bin/java /usr/bin/java (if the soft link already exists, execute rm -rf /usr/bin/java and rm -rf /usr/bin/javac before this command)
  • ln -s /usr/java/jdk1.6.0_27/bin/javac /usr/bin/javac

Leptonica:

To install Leptonica, please execute the commands given below sequentially:

  • mkdir leptonica
  • cd leptonica
  • wget http://www.leptonica.com/source/leptonlib-1.67.tar.gz
  • tar -zxvf leptonlib-1.67.tar.gz
  • cd leptonlib-1.67
  • ./configure
  • make
  • make install
  • yum list (to verify the list of installed software)
  • yum install gcc gcc-c++ make (to verify a C++ compiler is installed)
  • yum install aclocal (to verify it is installed)
  • yum install automake (to verify it is installed)
  • yum install libtoolize (to verify it is installed)
  • yum install libtool (to verify it is installed)
  • yum install libjpeg-devel libpng-devel libtiff-devel zlib-devel

Tesseract-ocr:

  • mkdir tesseract
  • cd tesseract/
  • svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
  • cd tesseract-ocr
  • ./runautoconf
  • mkdir m4
  • ./configure
  • make
  • make install
  • tesseract (if it displays output other than "command not found", it means tesseract is successfully configured)
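Once configured, a typical invocation looks like the following (the file names are placeholders); the recognized text lands in sample-out.txt:

tesseract sample.tif sample-out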

Grails:

References:

Cassandra Play: Kundera ORM


If you need more information on Cassandra, please visit: Cassandra. In brief, Kundera is an ORM-based solution over Cassandra.

Kundera provides a way to map your existing Java objects onto NoSQL stores (e.g. Cassandra, HBase, MongoDB).

Let's proceed with an example. I am taking a super column family to demonstrate Kundera:

Post is a super column family and can be described as:

Post:
column_type: Super
subcomparator_type: UTF8
default_validation_class: UTF8

Post Entity:
@Entity
@Table(name = "Posts", schema = "Blog")
public class Post
{
    /** The permalink. */
    @Id
    // row identifier
    String permalink;

    @Embedded
    private PostData data;

    @Embedded
    private AuthorDetail author;
}

You need to generate getters and setters for the above fields.

Kundera is a JPA-compliant, annotation-based solution, so Post is annotated with @Entity.

On the next line, the declaration @Table(name = "Posts", schema = "Blog"):
name = "Posts" represents the SuperColumnFamily "Posts", and schema = "Blog" represents the "Blog" keyspace.

You must have noticed the @Embedded annotation on PostData. PostData is a super column defined within Posts, and as a POJO it is defined as follows:

package com.impetus.kundera.entity;

import java.util.Date;

import javax.persistence.Column;
import javax.persistence.Embeddable;

import org.apache.commons.lang.builder.HashCodeBuilder;

@Embeddable
public class PostData
{
/** The title. */
@Column(name = "title")
public String title;

/** The body. */
@Column(name = "body")
public String body;

/** The created. */
@Column(name = "created")
public Date created;

/**
*
*/
public PostData()
{
}

/**
* @return the title
*/
public String getTitle()
{
return title;
}

/**
* @param title
* the title to set
*/
public void setTitle(String title)
{
this.title = title;
}

/**
* @return the body
*/
public String getBody()
{
return body;
}

/**
* @param body
* the body to set
*/
public void setBody(String body)
{
this.body = body;
}

/**
* @return the created
*/
public Date getCreated()
{
return created;
}

/**
* @param created
* the created to set
*/
public void setCreated(Date created)
{
this.created = created;
}

@Override
public int hashCode()
{
return HashCodeBuilder.reflectionHashCode(this);
}

/*
* (non-Javadoc)
*
* @see java.lang.Object#toString()
*/
@Override
public String toString()
{
StringBuilder builder = new StringBuilder();
builder.append("Post [title=");
builder.append(title);
builder.append(", body=");
builder.append(body);
builder.append(", created=");
builder.append(created);
builder.append("]");
return builder.toString();
}

}

Similarly, AuthorDetail is another super column, defined as:
package com.impetus.kundera.entity;

import javax.persistence.Column;
import javax.persistence.Embeddable;

import org.apache.commons.lang.builder.HashCodeBuilder;

@Embeddable
public class AuthorDetail
{
/** The author. */
@Column(name = "authorname")
public String name;

@Column(name = "email")
public String email;

public AuthorDetail()
{
}

/**
* @return the author
*/
public String getAuthor()
{
return name;
}

/**
* @param author
* the author to set
*/
public void setAuthor(String author)
{
this.name = author;
}

/**
* @return the email
*/
public String getEmail()
{
return email;
}

/**
* @param email
* the email to set
*/
public void setEmail(String email)
{
this.email = email;
}

@Override
public int hashCode()
{
return HashCodeBuilder.reflectionHashCode(this);
}

/*
* (non-Javadoc)
*
* @see java.lang.Object#toString()
*/
@Override
public String toString()
{
StringBuilder builder = new StringBuilder();
builder.append(", [author=");
builder.append(name);
builder.append(", email=");
builder.append(email);
builder.append("]");
return builder.toString();
}
}

CRUD Operations Using Kundera over Posts:
First, you need to define your own persistence.xml with something like this:

<persistence-unit name="cassandra">
<provider>com.impetus.kundera.ejb.KunderaPersistence</provider>
<properties>
<property name="server.config" value="C:/Cassandra/cassandra-0.7.6/conf  /cassandra.yaml"/>
<property name="kundera.nodes" value="localhost"/>
<property name="kundera.port" value="9160"/>
<property name="kundera.keyspace" value="Blog"/>
<property name="kundera.dialect" value="cassandra"/>
<property name="kundera.client" value="Pelops"/>
<property name="kundera.cache.provider_class" value="com.impetus.kundera.cache.ehcache.EhCacheProvider"/>
</properties>
</persistence-unit>

Next, obtain an instance of EntityManager:
Configuration conf = new Configuration();
EntityManager manager = conf.getEntityManager(persistenceUnitName);

where persistenceUnitName is the one pointing to Cassandra.

Persist :

Post post = new Post();
String key = System.currentTimeMillis() + "-post";
AuthorDetail authorDetail = new AuthorDetail();
PostData data = new PostData();
data.setTitle("Vivek");
post.setPermalink(key);
data.setBody("KunderaPlay");
authorDetail.setAuthor("vivek");
authorDetail.setEmail("impetus@impetus.com");
post.setAuthor(authorDetail);
post.setData(data);
manager.persist(post);

Find:
Post post_db = manager.find(Post.class, key);
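Update and delete round out CRUD; a minimal sketch using the same manager (standard JPA operations, assumed to behave the same way through Kundera's EntityManager):

// Update: modify the fetched entity and merge it back.
post_db.getData().setBody("KunderaPlay-updated");
manager.merge(post_db);

// Delete: remove the entity altogether.
manager.remove(post_db);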

Search via Query:
String sql = "Select p.body from Post p where p.title like :title";
Query query = manager.createQuery(sql);
query.setParameter("title", "Vivek");
List posts = query.getResultList();

The list of posts will only hold values for the selected super column field; that is, it retrieves only the PostData super column, and AuthorDetail will have all values as null.

Thus Kundera also provides secondary index support over Cassandra.

It is really easy programming with Kundera over NoSQL.

More information on Kundera can be found here

Kundera-Examples are also present at

Hector-Kundera


Feature                        Kundera          GORA    Hector
ORM                            Yes              Yes     Yes
JPA Compliant                  Yes              No      -
Annotation Based               Yes              No      Yes
Index Support                  Yes              Yes     No
Second Level Cache             Yes (Optional)   No      No
NoSql Abstraction              Yes              -       Yes
Entity Relationship Support    Yes              No      No
Native Query Support           Yes              No      No

 

JPA Compliant & Annotation Based:

Kundera

Entity Definition:

@Entity
// makes it an entity class
@ColumnFamily(family = "Authors", keyspace = "Blog")
// assign ColumnFamily type and name
public class Author implements Serializable
{
    /** The username. */
    @Id
    // row identifier
    String username;

    /** The email address. */
    @Column(name = "email")
    // override column-name
    String emailAddress;

    /** The country. */
    @Column
    String country;

    /** The registered. */
    @Column(name = "registeredSince")
    @Temporal(TemporalType.DATE)
    @Basic
    Date registered;

    /** The name. */
    String name;

    /**
     * Instantiates a new author.
     */
    public Author() { // must have a default constructor
    }
}

@ColumnFamily: defines the column family and keyspace of a given entity.

Entity Manager:

Configuration conf = new Configuration();
conf.getEntityManager("unit-name");


 

Persistence and Search Using EntityManager :

/**
 * Test save authors.
 *
 * @throws Exception the exception
 */
public void testSaveAuthors() throws Exception {
    String key = System.currentTimeMillis() + "-author";
    Author aObj = createAuthor(key, "a@a.org", "India", new Date());
    manager.persist(aObj);

    // check if saved?
    Author aObj_db = manager.find(Author.class, key);
    assertEquals(aObj, aObj_db);
}

/**
 * Creates the author.
 *
 * @param username the user name
 * @param email the email
 * @param country the country
 * @param registeredSince the registered since
 *
 * @return the author
 */
private static Author createAuthor(String username, String email, String country, Date registeredSince) {
    Author author = new Author();
    author.setUsername(username);
    author.setCountry(country);
    author.setEmailAddress(email);
    author.setRegistered(registeredSince);
    return author;
}

 

HECTOR:

Entity Definition:

package com.mycompany.furniture;

import javax.persistence.Column;
import javax.persistence.DiscriminatorValue;
import javax.persistence.Entity;

@Entity
public class Chair {

    @Column(name = "recliner")
    private boolean recliner;

    @Column(name = "arms")
    private boolean arms;

    public boolean isRecliner() {
        return recliner;
    }

    public void setRecliner(boolean recliner) {
        this.recliner = recliner;
    }

    public boolean isArms() {
        return arms;
    }

    public void setArms(boolean arms) {
        this.arms = arms;
    }
}

 

Entity Manager:

EntityManagerFactory entityManagerFactory;
entityManagerFactory = Persistence.createEntityManagerFactory("unit-name");
EntityManager em = entityManagerFactory.createEntityManager();


 

 

Persistence and Search Using EntityManager :

Chair stb = new Chair();
stb.setId(1);
stb.setRecliner(true);
em.persist(stb);

 

However, Hector and Kundera are both JPA compliant. Additionally, Kundera provides a way to search for entities by executing a native query (i.e. search by any key).

 

Native Query(Search by any value):

Query q = entityManager.createQuery("select a from Author a where a.country like :country");
q.setParameter("country", country);
List<Author> authors = q.getResultList();

 

Secondary index support: secondary index support is built into Kundera for Cassandra, as well as for HBase using Lucene.

NoSql abstraction:

From a developer's perspective, Kundera and Hector are drop-dead simple to use and hide the complexity of the underlying APIs.

Additional Supports:

1) Kundera additionally provides support for relationships between entities.

2) Kundera provides second-level cache support (which might be useful for other NoSQL stores).

Hector looks to me like more of an object mapper, whereas Kundera is designed and developed with ORM and JPA in mind.

Hive/HBase integration


Assuming you are aware of Hive and HBase basic concepts, the goals of my POC around Hive/HBase integration are:

1) Provide real-time analytics on HBase.
2) An indexing mechanism.
3) Load data from HDFS into HBase using Hive logical tables.
4) Bidirectional CRUD operations on the same dataset from Hive <-> HBase.

Please refer to http://wiki.apache.org/hadoop/Hive/HBaseIntegration.

Codebase: I took the Hive 0.6.0 tag and applied the patch committed on trunk for HIVE-1264. HBase version: 0.20.6. To build the codebase, please use the "ant tar" task for a proper build of $HIVE_SRC.

Note:

  1. With the Cloudera distribution of HBase 0.89.x I was facing integration issues, so I moved to HBase 0.20.6, which worked well.
  2. In case you get a heap issue on starting Hive, please change HADOOP_HEAPSIZE in $HIVE_SRC\bin\ext\util\execHiveCmd.sh.

Starting ZooKeeper:

sudo /usr/lib/zookeeper/bin/zkServer.sh

  • Start the HBase master and shell:
    sudo $HBASE_HOME/bin/hbase master start
    sudo bin/hbase shell
    From the HBase shell, issue "status" to check the status of the running master node. If the result is "MasterNotRunningException: null", you need to troubleshoot your HBase configuration.

  • Start the Hive shell:
    sudo /home/impetus/Hadoop/Hadoop-0.21/hive/build/dist/bin/hive --auxpath /home/impetus/Hadoop/Hadoop-0.21/hive/build/dist/lib/hive_hbase-handler.jar,/home/impetus/Hadoop/Hadoop-0.21/hive/build/dist/lib/hbase-0.20.6.jar,/home/impetus/Hadoop/Hadoop-0.21/hive/build/dist/lib/zookeeper-3.2.2.jar -hiveconf hbase.master=127.0.1.1:60000

Working Example:

Create a new HBase table which is to be managed by Hive:
CREATE TABLE hive_hbasetable_k(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hivehbasek");

Create a logical table pokes in Hive
CREATE TABLE pokes (foo INT, bar STRING);

Load data into pokes (from the local file system):
LOAD DATA LOCAL INPATH '/home/impetus/Hadoop/Hadoop-0.21/hive/build/dist/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Load data into HBase table using Hive.
INSERT OVERWRITE TABLE hive_hbasetable_k SELECT * FROM pokes WHERE foo=98;

Issuing

select * from hive_hbasetable_k;
Result:
OK
98 val_98

Let's scan HBase to validate whether the data is loaded, by issuing:

hbase(main):015:0> scan "hivehbasek"
ROW COLUMN+CELL
98 column=cf1:val, timestamp=1290502385735, value=val_98
1 row(s) in 0.1410 seconds

Before we try to load data from HBase and query it from Hive, I would like to share two points:

  1. As you noticed, upon creating an HBase table managed by Hive, you need to specify the column family and column name (cf1 is the column family and val is the column name). So while putting data from HBase, make sure to specify the column name, else the value will not be included when you query via Hive.
  2. Second, you must have noticed that hive_hbasetable_k holds the key as an "int". So while putting data, please make sure to insert only integer values as row keys, else the key will come back as NULL when querying via Hive (e.g. put 'hivehbasek','r1:key','cf1:val','99_val').

Load data from HBase

hbase(main):012:0> put 'hivehbasek','99','cf1:val','99_val'
hbase(main):012:0> put 'hivehbasek','r1:key','cf1:val','99_val'
hbase(main):015:0> scan "hivehbasek"
ROW COLUMN+CELL
98 column=cf1:val, timestamp=1290502385735, value=val_98
99 column=cf1:val, timestamp=1290502652876, value=99_val
r1:key column=cf1:val, timestamp=1290502601128, value=99_val
3 row(s) in 0.1410 seconds

Now upon querying the same from Hive:
hive> select * from hive_hbasetable_k;
OK
98 val_98
99 99_val
NULL 99_val

Example 2: Create a Hive table on an existing HBase table and query it via Hive.


hbase(main):023:0> create 'hbasetohive', 'colFamily'
hbase(main):023:0> put 'hbasetohive', '1s', 'colFamily:val','1strowval'

hbase(main):023:0> scan 'hbasetohive'
ROW COLUMN+CELL
1s column=colFamily:val, timestamp=1290503237527, value=1strowval
1 row(s) in 0.0430 seconds

hive> CREATE EXTERNAL TABLE hbase_hivetable_k(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "colFamily:val")
TBLPROPERTIES("hbase.table.name" = "hbasetohive");

hive> select * from hbase_hivetable_k;
OK
1s 1strowval
Time taken: 0.515 seconds

 

—————————————————————————————————————————————————-

HBase 0.93-SNAPSHOT and Hive 0.7.1 Integration:

Today I spent some time on this integration and found out that it is now a bit different, so I thought I would share how I tested it.

Steps:

1) Build HBase (I used mvn clean install -Dmaven.test.skip=true).

2) Copy hbase/conf/hbase-site.xml into the generated HBase snapshot jar (i.e. target/HBase-0.93.0-SNAPSHOT).

3) Build Hive by executing the "ant jar" and "ant binary" tasks.

4) Copy the generated HBase jar to Hive/build/dist/lib (remove the existing HBase jars).

5) Follow the steps given at the top of this post to start the HBase master.

6) Issue a command like /home/impadmin/source/hive/build/dist/bin/hive -hiveconf hive.root.logger=INFO,console,hbase.master=localhost:60000,hbase.zookeeper.quorum=localhost

Change these to your HBase master and ZooKeeper settings; it should connect you to the HBase master.

7) Follow the examples given above to verify.

Happy programming :)
