
Friday, October 19, 2012

Hadoop Installation for Beginners

Well folks,
Here I will give you a step-by-step procedure to install and configure Hadoop (version 1.1.0) on Linux (a Debian-based distro) as a single-node cluster. This guide is for beginners, and you need to be logged in to your Linux machine as the root user.

Step 1: First you need to download the Hadoop release from the following URL
http://apache.techartifact.com/mirror/hadoop/common/hadoop-1.1.0/hadoop-1.1.0.tar.gz

Open a terminal

# cd <to directory where you downloaded hadoop>
# mv hadoop-1.1.0.tar.gz /usr/local/
# cd /usr/local/
# tar zxvf hadoop-1.1.0.tar.gz

With the above commands, you have moved the Hadoop tarball to /usr/local and extracted it there.

Step 2: Hadoop is a standalone Java-based application, so it requires Java 1.6 (or later) as a dependency, which you need to install yourself if it is not already present.
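A quick way to check whether a suitable Java is already installed:

# java -version

If the command is missing or it reports a version older than 1.6, install a JDK first. On a Debian-based distro the package below is one option (the package name is an example and may differ on your distro):

# apt-get install openjdk-6-jdk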

Step 3: Next you need to add a dedicated user to run Hadoop

# adduser hadoop

It prompts you for a password and a few other details:

             Adding user `hadoop' ...
             Adding new group `hadoop' (1001) ...
             Adding new user `hadoop' (1001) with group `hadoop' ...
             Creating home directory `/home/hadoop' ...
             Copying files from `/etc/skel' ...
             Enter new UNIX password:
             Retype new UNIX password:
             passwd: password updated successfully
             Changing the user information for hadoop
             Enter the new value, or press ENTER for the default
                   Full Name []:
                   Room Number []:
                   Work Phone []:
                   Home Phone []:
                   Other []:
             Is the information correct? [Y/n] Y


Step 4: Change the configuration files

Before we configure, type the following to identify your Java home:

# which java

If, for example, the output is

                /usr/bin/java

then

your JAVA_HOME is /usr
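(On many distros /usr/bin/java is a symlink managed by update-alternatives. JAVA_HOME=/usr still works, since Hadoop only needs $JAVA_HOME/bin/java to exist, but if you prefer to point JAVA_HOME at the actual JDK you can resolve the link:

# readlink -f "$(which java)"

If this prints, say, /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java, which is an example path and may differ on your machine, the corresponding JAVA_HOME is /usr/lib/jvm/java-6-openjdk-amd64.)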

Now,

# cd /usr/local/hadoop-1.1.0/
# cd conf/
# vi hadoop-env.sh

Find the following line

                  # export JAVA_HOME=/usr/lib/j2sdk1.5-sun

and replace it with

                  export JAVA_HOME=/usr/


Next paste the following content into the file core-site.xml (fs.default.name tells Hadoop where HDFS lives; here the NameNode listens on localhost, port 8020):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
   <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>

Next paste the following content into the file hdfs-site.xml (dfs.replication is set to 1 because a single-node cluster has only one DataNode to hold each block):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

</configuration>

Next paste the following content into the file mapred-site.xml (mapred.job.tracker is the address of the JobTracker, again on localhost):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<configuration>
    <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
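Optionally, if xmllint is installed (on Debian-based distros it comes with the libxml2-utils package), you can check all three files for paste errors; no output means the XML is well-formed:

# xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml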

Next check the file /etc/hosts; if the following content does not exist as the first line, add it:

127.0.0.1       localhost <your host name>

Where,
          <your host name> is the hostname of your machine.

You can find the hostname with

# hostname
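As a quick sanity check, the following should print the line you just added; if it prints nothing, the hostname is missing from /etc/hosts:

# grep -w "$(hostname)" /etc/hosts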

Step 5: Give the hadoop user ownership of the installation folder

# cd /usr/local/
# chown -R hadoop:hadoop hadoop-1.1.0

Step 6: Format the HDFS filesystem (NameNode)

# cd /usr/local/hadoop-1.1.0/bin
# su hadoop
# ./hadoop namenode -format

It prints information like

12/10/19 12:00:20 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = java.net.UnknownHostException: vignesh: vignesh
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.1.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1394289; compiled by 'hortonfo' on Thu Oct  4 22:06:49 UTC 2012
************************************************************/
12/10/19 12:00:20 INFO util.GSet: VM type       = 64-bit
12/10/19 12:00:20 INFO util.GSet: 2% max memory = 17.77875 MB
12/10/19 12:00:20 INFO util.GSet: capacity      = 2^21 = 2097152 entries
12/10/19 12:00:20 INFO util.GSet: recommended=2097152, actual=2097152
12/10/19 12:00:21 INFO namenode.FSNamesystem: fsOwner=hadoop
12/10/19 12:00:21 INFO namenode.FSNamesystem: supergroup=supergroup
12/10/19 12:00:21 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/10/19 12:00:21 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/10/19 12:00:21 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/10/19 12:00:21 INFO namenode.NameNode: Caching file names occuring more than 10 times 
12/10/19 12:00:21 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/10/19 12:00:21 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-hadoop/dfs/name/current/edits
12/10/19 12:00:21 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-hadoop/dfs/name/current/edits
12/10/19 12:00:21 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/10/19 12:00:21 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: vignesh: vignesh
************************************************************/

Note: the "java.net.UnknownHostException: vignesh" in the banner above appears because this machine's /etc/hosts entry (Step 4) was missing when the command was run; with the entry in place, the banner simply shows your hostname. Also, only the NameNode needs to be formatted. There is no "datanode -format" command in Hadoop 1.x; the DataNode initializes its own storage automatically the first time it starts.

Step 7: Set up passwordless SSH for the hadoop user

# ssh-keygen -t rsa -P ""

Press Enter when it prompts

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 

and it generates the key as

Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
f7:e3:1d:e6:2d:7d:23:2f:64:ea:1c:77:99:26:af:e0 hadoop@vignesh
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|                 |
|                 |
|        S .      |
|         . . o  o|
|           o*oo* |
|          oo+B*+o|
|          .E..B++|
+-----------------+

# cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
# chmod 600 /home/hadoop/.ssh/authorized_keys
# ssh hadoop@localhost

type "yes" if it prompts as below

The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 7e:4a:40:b5:57:06:0d:83:34:58:80:80:c3:e7:18:20.
Are you sure you want to continue connecting (yes/no)? 

After this it logs you in as the hadoop user without asking for a password, which means passwordless SSH is configured successfully.

Now type

# exit

Type exit only once, to close the SSH session you just opened. You are then back in your previous shell, still as the hadoop user.

Step 8: Start Hadoop services

# ./start-all.sh

It starts five daemons:

 NameNode
 SecondaryNameNode
 DataNode
 JobTracker
 TaskTracker

You can check whether the daemons are running with

# jps

You should see something like this (the process IDs will differ). If any of the five daemons is missing, something went wrong.

26207 TaskTracker
26427 Jps
25847 DataNode
25986 SecondaryNameNode
26089 JobTracker
25738 NameNode
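If one of the five daemons is missing, its log file usually tells you why. In Hadoop 1.x the logs live under the installation directory and are named hadoop-<user>-<daemon>-<hostname>.log, so for example:

# tail -n 50 /usr/local/hadoop-1.1.0/logs/hadoop-hadoop-namenode-*.log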

Open

       http://localhost:50030

in a browser for the Hadoop Map/Reduce administration page (optional)

Open

       http://localhost:50070

in a browser to browse the HDFS filesystem (optional)

Step 9: Try out HDFS and run a sample MapReduce job

# ./hadoop dfsadmin -report

This command reports the status of your HDFS cluster (capacity, live DataNodes, and so on)

# ./hadoop fs -mkdir test

This command creates a directory "test" in the HDFS filesystem (relative paths like this live under /user/hadoop)

# vi test_input

 In the text editor type

 hi all hello all

 then save and exit the file (or use the one-liner below)
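If you prefer not to open an editor, the same file can be created in one line:

# echo "hi all hello all" > test_input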

# ./hadoop fs -put test_input test/input

This command copies the file (test_input) that we just created into the HDFS filesystem, as test/input (inside the test folder)

# ./hadoop fs -ls test

This command lists all files in the "test" folder of the HDFS filesystem.

# ./hadoop jar ../hadoop-examples-1.1.0.jar wordcount test/input test/output

This command runs a MapReduce program (word count) on your input and writes its output to "test/output" in the HDFS filesystem.
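You can also print the result directly from HDFS; for the input above, wordcount should report each word with its count (all 2, hello 1, hi 1):

# ./hadoop fs -cat test/output/part-r-00000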

Alternatively, you can view the output in a browser at the following URL

http://localhost:50070

Browse the filesystem -> user -> hadoop -> test -> output -> part-r-00000

Step 10: To stop Hadoop (optional)

# ./stop-all.sh

Here ends our step-by-step guide to getting started with Hadoop (for beginners).

1 comment:

  1. How do I create a jar file for my own MapReduce program?
    For example, I have a Java file named Spatial and I need to create the jar file.

    In your example, hadoop-examples.jar is used for running...
    For my program, how can I create the jar, and how should I execute the program?

    Could you please help me?
