Prerequisites
This tutorial requires the following two hefty installers downloaded to your workstation:
- Oracle VirtualBox, in order to run Virtual Machine images (VMs) on your machine. Here is the link to the VirtualBox download page:
- An Ubuntu 10 Image that will house our Hadoop installation. You can grab one from here:
Install VirtualBox
- Download the VirtualBox installation package for your operating system (Windows or Mac OS X recommended).
- Close all applications and run the installation package, following the on-screen instructions.
NOTE: The current tested version is 4.0.8 (08/05/2011).
Install Ubuntu 10 Image
- Download Ubuntu OS Version 10.04 LTS.
- Start VirtualBox from the application selection menu.
- Click the New button to create a new virtual machine, then click Continue.
- Provide a name for your VM and select Linux and Ubuntu in the OS options.
- Keep the rest of the settings at their defaults and continue with the instructions.
- Once the VM has been created, start it by selecting it in the left pane and clicking the Start button.
- Select the downloaded Ubuntu installation package as the installation media.
- Proceed with default settings during the installation.
NOTE: The user name hadoop is reserved, so do not choose it for your own account.
- Restart your VM OS after the installation has been completed. You should see the following screen:
Install Java JDK and Hadoop
1. Open a new terminal by going to Applications => Accessories => Terminal.
The expected output should be lucid, the codename of Ubuntu 10.04.
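The command for this check is not shown above. As a minimal sketch, assuming the lsb_release utility that ships with Ubuntu is available, the codename can be printed and compared against lucid. The helper name is_lucid is hypothetical and is not part of the original tutorial.

```shell
# Hypothetical helper: succeeds only when the given codename is "lucid".
is_lucid() {
    [ "$1" = "lucid" ]
}

# lsb_release -cs prints only the release codename.
# Fall back to "unknown" if the tool is not installed.
codename="$(lsb_release -cs 2>/dev/null || echo unknown)"

if is_lucid "$codename"; then
    echo "OK: this system reports the lucid codename"
else
    echo "Warning: expected lucid, found $codename"
fi
```

If the warning appears, the Cloudera repository lines used later in this section (which name lucid-cdh3) may not match your release.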
3. Inside the Terminal, create an empty file /etc/apt/sources.list.d/cloudera.list by running the following command:
4. Paste the following two lines into the file:
deb http://archive.cloudera.com/debian lucid-cdh3 contrib
deb-src http://archive.cloudera.com/debian lucid-cdh3 contrib
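Steps 3 and 4 can also be done without opening an editor. The sketch below is one possible approach and not necessarily the command the tutorial intended: a hypothetical helper, cloudera_sources, prints the two repository lines, and the commented pipeline writes them to the file with root privileges through sudo tee.

```shell
# Hypothetical helper: print the two repository lines from step 4.
cloudera_sources() {
    cat <<'EOF'
deb http://archive.cloudera.com/debian lucid-cdh3 contrib
deb-src http://archive.cloudera.com/debian lucid-cdh3 contrib
EOF
}

# On the VM, write the lines to the repository file as root.
# tee is used because a plain ">" redirection would run without sudo rights.
#   cloudera_sources | sudo tee /etc/apt/sources.list.d/cloudera.list > /dev/null
```

Running cloudera_sources on its own only prints the lines, which makes it easy to review them before touching anything under /etc.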
5. Run the following commands in the terminal window:
sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
sudo apt-get update
sudo apt-get install sun-java6-jdk
sudo apt-get install hadoop-0.20
6. Install Hadoop components:
sudo apt-get install hadoop-0.20-namenode
sudo apt-get install hadoop-0.20-datanode
sudo apt-get install hadoop-0.20-jobtracker
sudo apt-get install hadoop-0.20-tasktracker
7. Install the configuration for a pseudo-distributed cluster:
sudo apt-get install hadoop-0.20-conf-pseudo
8. Start services by running the following command in the terminal window:
for x in /etc/init.d/hadoop-* ; do sudo $x start; done
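To confirm that the services actually came up, one option is to list the running Java processes with jps, a tool included in the JDK. The sketch below is an assumption about how you might check, not part of the original procedure: the hypothetical helper missing_daemons reads jps output on standard input and prints any of the four daemons installed in step 6 that are not running.

```shell
# Hypothetical helper: read `jps` output on stdin and print the name of
# each expected Hadoop daemon that does not appear in it.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode DataNode JobTracker TaskTracker; do
        # -w matches whole words only, so "SecondaryNameNode"
        # is not mistaken for "NameNode".
        printf '%s\n' "$jps_output" | grep -qw "$daemon" || echo "$daemon"
    done
}

# On the VM (sudo is assumed to be needed because the daemons
# run under their own service accounts):
#   sudo jps | missing_daemons
```

Empty output means all four daemons were found. Any name that is printed points to a service whose log is worth inspecting.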
9. Check your installation by opening the following links in your internet browser: