Wednesday, November 23, 2011

Hadoop Installation

Hadoop has traditionally been a royal pain to set up and configure properly. With Cloudera’s recent distribution releases, this process has gotten simpler, but it is still a far cry from straightforward. We’ll try to, if not simplify it, at least document it thoroughly so that you can follow clear, step-by-step instructions to get your first Hadoop cluster up and running locally. Let’s dive in!

Prerequisites

This tutorial requires the following two hefty installers downloaded to your workstation:
  1. Oracle VirtualBox, in order to run virtual machine (VM) images on your machine. It is available from the VirtualBox download page.
  2. An Ubuntu 10 image that will house our Hadoop installation. You can grab one from the Ubuntu downloads page. NOTE: as of this writing, Cloudera’s Hadoop distribution is not compatible with Ubuntu 11. Pick version 10.04 LTS from the downloads drop-down menu to avoid any issues with your installation.

Install VirtualBox

  1. Download the installation package for your operating system (Windows or Mac OS X recommended).
  2. Close all applications and run the installation package, following the on-screen instructions.
    NOTE: The version tested for this tutorial is 4.0.8 (08/05/2011).

Install Ubuntu 10 Image

  1. Download Ubuntu OS Version 10.04 LTS.
  2. Start VirtualBox from your application selection menu.
  3. Click the New button to create a new virtual machine, then click Continue.
  4. Provide a name for your VM and select Linux and Ubuntu in the OS options.
  5. Keep the rest of the settings at their defaults and continue with the instructions.
  6. Once the VM has been created, start it by selecting it in the left pane and clicking the Start button.
  7. Select the downloaded Ubuntu installation package as the installation media.
  8. Proceed with the default settings during the installation.
    NOTE: the user name hadoop is reserved and should not be chosen as your user.
  9. Restart your VM OS after the installation has completed. You should see the following screen:

Install Java JDK and Hadoop

  1. Open a new terminal by going to Applications => Accessories => Terminal.
  2. Check the Ubuntu release codename by running the following command:

lsb_release -c

The expected output is Codename: lucid.
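
The codename check can also be scripted, which is handy before adding a release-specific repository. The following is a minimal sketch, not part of the Cloudera tooling: is_supported_codename is a hypothetical helper name, and the script assumes lucid is the only codename this tutorial supports.

```shell
#!/bin/sh
# Succeeds only for the codename this tutorial targets (Ubuntu 10.04, "lucid").
# is_supported_codename is a hypothetical helper, not part of any package.
is_supported_codename() {
  [ "$1" = "lucid" ]
}

# lsb_release -cs prints the bare codename; fall back to "unknown" if the
# command is missing or fails.
codename="$(lsb_release -cs 2>/dev/null || echo unknown)"

if is_supported_codename "$codename"; then
  echo "OK: running on $codename"
else
  echo "Warning: found '$codename', but the lucid-cdh3 repository expects lucid"
fi
```
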
  3. In the terminal, create the file /etc/apt/sources.list.d/cloudera.list by opening it in an editor with the following command:

sudo vi /etc/apt/sources.list.d/cloudera.list


  4. Paste the following two lines into the file, then save it and exit the editor:

deb http://archive.cloudera.com/debian lucid-cdh3 contrib 
deb-src http://archive.cloudera.com/debian lucid-cdh3 contrib
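
If you would rather not edit the file in vi, the same two lines can be written from a script. This is a sketch under assumptions: write_cloudera_list is a hypothetical helper name, and the example writes to a scratch file first so the result can be inspected before anything under /etc is touched.

```shell
#!/bin/sh
# Write the two Cloudera repository lines to the file named by $1.
# write_cloudera_list is a hypothetical helper name.
write_cloudera_list() {
  cat > "$1" <<'EOF'
deb http://archive.cloudera.com/debian lucid-cdh3 contrib
deb-src http://archive.cloudera.com/debian lucid-cdh3 contrib
EOF
}

# Dry run against a scratch file so the result can be inspected first.
scratch="$(mktemp)"
write_cloudera_list "$scratch"
cat "$scratch"
```

On the VM, the scratch file could then be copied into place with sudo cp "$scratch" /etc/apt/sources.list.d/cloudera.list.
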

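  5. Run the following commands in the terminal window to enable the Canonical partner repository (which provides the sun-java6-jdk package), refresh the package index, and install Java and Hadoop: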

sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
sudo apt-get update
sudo apt-get install sun-java6-jdk
sudo apt-get install hadoop-0.20
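
Hadoop requires Java 1.6 or later, so it can be worth confirming which Java the shell actually finds before moving on. The following is a rough sketch: java_version_ok is a hypothetical helper name, and the parsing assumes the usual java -version banner of the form java version "1.6.0_26".

```shell
#!/bin/sh
# Succeeds when a version string such as 1.6.0_26 denotes Java 1.6 or later.
# java_version_ok is a hypothetical helper name.
java_version_ok() {
  major="${1%%.*}"          # text before the first dot
  rest="${1#*.}"            # text after the first dot
  minor="${rest%%.*}"       # text up to the next dot
  [ "$major" -gt 1 ] 2>/dev/null && return 0
  [ "$major" -eq 1 ] 2>/dev/null && [ "$minor" -ge 6 ] 2>/dev/null
}

# java -version prints its banner to stderr, e.g.: java version "1.6.0_26"
version="$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)"

if [ -z "$version" ]; then
  echo "Java was not found on the PATH"
elif java_version_ok "$version"; then
  echo "Java $version is recent enough for Hadoop"
else
  echo "Java $version is older than 1.6"
fi
```
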
 
  6. Install the Hadoop daemons (NameNode, DataNode, JobTracker and TaskTracker):

sudo apt-get install hadoop-0.20-namenode
sudo apt-get install hadoop-0.20-datanode
sudo apt-get install hadoop-0.20-jobtracker
sudo apt-get install hadoop-0.20-tasktracker 
 
  7. Install the configuration for a pseudo-distributed cluster:
 
sudo apt-get install hadoop-0.20-conf-pseudo

  8. Start the services by running the following command in the terminal window:

for x in /etc/init.d/hadoop-* ; do sudo "$x" start; done
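
The same loop can be wrapped in a small function so that it reports which script failed and can be reused for other actions. This is only a sketch: hadoop_ctl and the RUN_AS variable are hypothetical names introduced here, and the function assumes the init scripts live in /etc/init.d as in the command above.

```shell
#!/bin/sh
# hadoop_ctl ACTION [DIR]: run ACTION on every executable hadoop-* script in DIR
# (default /etc/init.d). hadoop_ctl and RUN_AS are hypothetical names; setting
# RUN_AS to an empty string drops sudo, e.g. for a trial in a scratch directory.
hadoop_ctl() {
  action="$1"
  dir="${2:-/etc/init.d}"
  failed=0
  for script in "$dir"/hadoop-*; do
    [ -x "$script" ] || continue   # also skips the literal pattern when nothing matches
    ${RUN_AS-sudo} "$script" "$action" || { echo "FAILED: $script $action" >&2; failed=1; }
  done
  return "$failed"
}
```

With this in place, hadoop_ctl start would behave like the loop above, and hadoop_ctl stop would shut the daemons down again.
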
 
  9. Check your installation by opening the following links in your web browser (the NameNode and JobTracker status pages, respectively):
 
http://localhost:50070
http://localhost:50030 
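
The two pages can also be probed from the terminal, which helps when the browser just shows a connection error. The following is a minimal sketch, assuming wget is installed on the VM; check_ui is a hypothetical helper name, and the ports are the ones listed above (50070 for the NameNode, 50030 for the JobTracker).

```shell
#!/bin/sh
# check_ui URL: succeeds if the URL answers an HTTP request within 5 seconds.
# check_ui is a hypothetical helper name; it assumes wget is available.
check_ui() {
  wget -q -O /dev/null --timeout=5 --tries=1 "$1"
}

for url in http://localhost:50070 http://localhost:50030; do
  if check_ui "$url"; then
    echo "UP:   $url"
  else
    echo "DOWN: $url"
  fi
done
```

A DOWN line right after starting the services may just mean the daemon has not finished starting yet, so it is worth retrying after a few seconds.
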
 
4 comments:

  1. Do you have the tutorial for the Hadoop installation? I would really appreciate it if you could post it. I am trying to learn Hadoop.

  2. http://www.learncomputer.com/hadoop-install/

    I have it posted here

    1. Instructions are not good. It is a waste of time. Please review and revise!

  3. Hi,

    I tried to follow the installation. I had to install Java manually because 1.6 is now outdated. When I type java -version I get:

    -laptop:/$ java -version
    java version "1.6.0_37"
    Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
    Java HotSpot(TM) Client VM (build 20.12-b01, mixed mode, sharing)

    and I also have javac installed.
    But when I try the command "for x in /etc/init.d/hadoop-* ; do sudo $x start; done" in the last step, I get the following error:

    | Error: JAVA_HOME is not set and Java could not be found              |
    +----------------------------------------------------------------------+
    | Please download the latest Sun JDK from the Sun Java web site        |
    | > http://java.sun.com/javase/downloads/ <                            |
    |                                                                      |
    | Hadoop requires Java 1.6 or later.                                   |
    | NOTE: This script will find Sun Java whether you install using the   |
    | binary or the RPM based installer.                                   |



    Could you please help.

    Thank you
