First encounter with Hadoop

by Vivek Mishra


I can say that changing employers to Impetus after more than 5 years was a good decision. I am now on a new learning path. The HPC (high performance computing) team in Labs is doing research and development on a number of open source projects, and Hadoop is one very exciting, cutting-edge tech stack among them. Recently I started getting hands-on with the Hadoop 0.21 release. A number of issues were resolved in this release. As a first step, I tried to build the source code on my local machine. Given below are some issues that I fixed locally. This is my first time posting on the internet (it may be helpful for others too):

 Hadoop-common Build issues:

1) The build asks for a common/lib directory, which needs to be created manually.
2) compile-core-test task: the following changes were made:

Changed:
<javac
     encoding="${build.encoding}"
     srcdir="${test.src.dir}"
     includes="**/org/apache/hadoop/**/*.java"
     destdir="${test.core.build.classes}"
     debug="${javac.debug}"
     optimize="${javac.optimize}"
     target="${javac.version}"
     source="${javac.version}"
     deprecation="${javac.deprecation}">
  <compilerarg line="${javac.args} ${javac.args.warnings}" />
  <classpath refid="test.classpath"/>
</javac>

Original:
<javac
     encoding="${build.encoding}"
     srcdir="${test.src.dir}/core"
     includes="org/apache/hadoop/**/*.java"
     destdir="${test.core.build.classes}"
     debug="${javac.debug}"
     optimize="${javac.optimize}"
     target="${javac.version}"
     source="${javac.version}"
     deprecation="${javac.deprecation}">
  <compilerarg line="${javac.args} ${javac.args.warnings}" />
  <classpath refid="test.classpath"/>
</javac>
  
Problem: with srcdir limited to ${test.src.dir}/core, the task was not building the src/test/system Java classes (which were required for the mapred project).
Still a known issue: mvn-install does not refresh the .ivy2 cache; it only copies the artifact to the local Maven repo.
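
For issue 1 above, the manual step could probably be automated. Below is a minimal sketch, assuming the missing directory is lib under the project's basedir. The project name, the property name lib.dir.sketch and the target name init-lib are made up for illustration and are not part of the Hadoop build:

```xml
<project name="lib-dir-sketch" default="init-lib" basedir=".">
  <!-- Hypothetical property: adjust to wherever the build expects the lib directory -->
  <property name="lib.dir.sketch" value="${basedir}/lib"/>

  <!-- mkdir also creates missing parent directories and does nothing if the directory exists -->
  <target name="init-lib">
    <mkdir dir="${lib.dir.sketch}"/>
  </target>
</project>
```

Other targets could then declare depends="init-lib" so the directory is in place before compilation starts.
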
Hadoop-hdfs Build issues:

1) Added common.lib to pick up all third-party jars from baseDir/lib.
  <property name="common.lib" value="${basedir}/../lib"/>

2) Modified build.xml to set the ivy resolvers to internal.
  <property name="resolvers" value="internal"/>

3) Modified the classpath path id to pick up the common libraries.
    <fileset dir="${common.lib}">
      <include name="**.jar" />
    </fileset>
4) Changed libraries.properties to pick up the snapshot versions:
    hadoop-common.version=0.21.1-SNAPSHOT
    hadoop-hdfs.version=0.21.1-SNAPSHOT
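
Put together, the build.xml changes for hdfs look roughly like the standalone build file below. This is only a sketch: the project name, the show-classpath target and the path id classpath are assumptions for illustration, and it uses the usual Ant idiom **/*.jar to match jars at any depth. It expects ../lib to exist relative to the basedir.

```xml
<project name="hdfs-classpath-sketch" default="show-classpath" basedir=".">
  <!-- Steps 1 and 2: where the shared jars live, and which ivy resolver chain to use -->
  <property name="common.lib" value="${basedir}/../lib"/>
  <property name="resolvers" value="internal"/>

  <!-- Step 3: add every jar under common.lib to the classpath (path id is assumed) -->
  <path id="classpath">
    <fileset dir="${common.lib}">
      <include name="**/*.jar"/>
    </fileset>
  </path>

  <!-- Hypothetical helper target: print the resolved classpath, one entry per line -->
  <target name="show-classpath">
    <pathconvert property="classpath.text" refid="classpath" pathsep="${line.separator}"/>
    <echo message="${classpath.text}"/>
  </target>
</project>
```

Running the helper target is a quick way to check that the jars are actually being picked up before starting a full build.
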
Hadoop-mapred Build issues:

1) Modified build.xml to set the ivy resolvers to internal.
  <property name="resolvers" value="internal"/>
2) Changed libraries.properties to pick up the snapshot versions:
    hadoop-common.version=0.21.1-SNAPSHOT
    hadoop-hdfs.version=0.21.1-SNAPSHOT

Note: These are some issues that I faced. You might be lucky and not run into them 🙂

More on this: I added Eclipse project generation for hdfs and mapred, copied from the hadoop-common project.

I was also looking into the ant-task-download task (raised a JIRA: https://issues.apache.org/jira/browse/HADOOP-6955). I am not sure, but it looks like it might be a slight improvement.