Working with Sqoop/Hadoop (with example)…

by Vivek Mishra


Some days back started looking into Sql to Hadoop data transformation.(Relational databases->nosql). I tried my hands on Sqoop(Sql to Hadoop).http://www.cloudera.com/blog/2009/06/introducing-sqoop/. Reason for this post is i am not able to find any definit link to provide all steps together installation of Hadoop, sqoop and then working example with mySql in simple steps. Let’s start something in more simple steps.
1) Install Hadoop(Using Cloudera Hadoop installation).
command: sudo apt-get install hadoop(Current version is 0.20.2)

Note: In case you face an error like size mismatch in fetching http://archive.cloudera.com/$filepath.

Please check for your download limit or proxy settings. and download it seperatly and run:
A)sudo dpkg $PACKAGE_NAME$
B) run again sudo apt-get install hadoop.

Hadoop installation is done! next step is start services for hadoop.
Run given below command to start hadoop services on your linux/ubuntu box.
for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done

Sqoop Installation:
Upon start of Hadoop services, let’s move towards Sqoop installation with:
sudo apt-get install sqoop

MySql Installation:
sudo apt-get mysql-server(It will install mysql server 5.X)

During installation it will pop up window to enter password for root. Enter password as per your choice.

Database/table creation: First connect to mySql by following command:
mysql -u $USER_NAME$ -p
It will prompt then for password.So enter your password(same password which you entered above)

execute command
create myDB;(It will create a database named myDB.)
issuing “use myDB” will change database to myDB.

Create a table with:
create table employee(name VARCHAR(10), age VARCHAR(10);

Insert a row: insert into employee values(‘XXX’,’21’);

Once this this done. We are all set for sqoop play.

sudo sqoop import –connect jdbc:mysql://$HOST_NAME$:$PORT_NAME$/myDB –username $USER_NAME –password $PASSWORD$ –table employee -m 1

default port on mysql installation is 3306

If you have not defined primary key for your table then you have to give ‘-m 1’ option for import. else it will be fine.

Once import is done you can fine employee.jar at following location
/tmp/sqoop/compile/employee.jar
and corresponding autogenerated java file at
./employee.java

Note: Reason for this post is to bring installation with correct working example together on 1 place for sqoop and mySql.