Monday, June 30, 2014

Installing Hadoop-1.2.1(Stable Ver 1 ) on Linux(Ubuntu) Part 2

Continued from Part 1 


Go to the bigdata directory and create two directories, name and data

$ cd /home/training/bigdata

$ chmod -R 755 hadoop-1.2.1

$ mkdir name

$ mkdir data

$ chmod -R 755 name

$ chmod -R 755 data
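
The same directory setup can be sketched as a small shell function. This is only an illustration: make_hdfs_dirs is a made-up helper name, not something shipped with Hadoop, and the base path is whatever you pass in.

```shell
# make_hdfs_dirs BASE: create BASE/name and BASE/data with mode 755.
# Hypothetical helper for illustration only.
make_hdfs_dirs() {
    # -p creates missing parent directories and does not fail
    # if the directories already exist
    mkdir -p "$1/name" "$1/data" || return 1
    chmod 755 "$1/name" "$1/data"
}
```

For the layout used in this post it would be called as make_hdfs_dirs /home/training/bigdata, and it can be re-run safely.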

To run Hadoop on a single node, the following files need to be configured

  1. hdfs-site.xml
  2. mapred-site.xml
  3. core-site.xml
  4. masters ( for multinode cluster)
  5. slaves (for multinode cluster)


Go to hadoop conf directory

$ cd /home/training/bigdata/hadoop-1.2.1/conf

Open the core-site.xml file

$ vim core-site.xml

Add the following inside configuration tag

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

Save and close the file.
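
To make clear where the property sits, here is a sketch of what the whole core-site.xml could look like after the edit, written out with a here-document. write_core_site is a hypothetical helper name, and the file shown is trimmed to the essentials; the file shipped with Hadoop may also contain a stylesheet line and comments, which can stay.

```shell
# write_core_site CONF_DIR: write a minimal core-site.xml into CONF_DIR.
# Hypothetical helper; it overwrites any existing core-site.xml there.
write_core_site() {
    cat > "$1/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}
```

The hdfs-site.xml and mapred-site.xml files follow the same pattern: every property block goes between the opening and closing configuration tags.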

Next, open hdfs-site.xml and add the following properties inside the configuration tag

$ vim hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/training/bigdata/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/training/bigdata/data</value>
</property>

Save and close the file.

Now, open mapred-site.xml and add the following property inside the configuration tag

$ vim mapred-site.xml

<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>

For pseudo-distributed mode, both the masters and slaves files already contain localhost, so we don't need to change them.


Next step is to format the namenode.

But before that, let's check that the bin directories of both Java and Hadoop are visible on the PATH.
At the terminal type

$ echo $PATH

It should display the bin directories of both Java and Hadoop, as we have already set the path for both.
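
Since echo $PATH only prints the PATH, a slightly more direct check is to look at JAVA_HOME and HADOOP_HOME themselves. The sketch below assumes both variables are meant to have their bin directory on the PATH; check_home_var is a hypothetical helper name.

```shell
# check_home_var NAME: report whether the variable NAME is set and
# whether its bin directory appears on PATH. Hypothetical helper.
check_home_var() {
    # Indirect lookup: copy the value of the variable named by $1
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "$1 is not set"
        return 1
    fi
    # Surround PATH with colons so the first and last entries match too
    case ":$PATH:" in
        *":$value/bin:"*) echo "$1=$value (bin is on PATH)" ;;
        *) echo "$1=$value (bin is NOT on PATH)"; return 1 ;;
    esac
}
```

It would be run as check_home_var JAVA_HOME followed by check_home_var HADOOP_HOME; a non-zero return means the .bashrc settings need another look.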


Now let's format the namenode.

Type the following commands at the terminal

$ cd

$ hadoop namenode -format

Note: Type an uppercase Y when asked to confirm; with a lowercase y the format is aborted.
Also, the namenode should be formatted only once. Formatting it again wipes the HDFS metadata, so all previous data is lost.
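
One way to avoid formatting twice by accident is to look for the files a previous format leaves behind. The sketch below rests on the assumption that a formatted Hadoop 1.x name directory contains a current/VERSION file; format_once is a hypothetical helper, not a Hadoop command.

```shell
# format_once NAME_DIR: run the format only if NAME_DIR does not look
# formatted yet. Hypothetical helper; the current/VERSION check is an
# assumption about the Hadoop 1.x name directory layout.
format_once() {
    if [ -f "$1/current/VERSION" ]; then
        echo "namenode at $1 looks formatted already, skipping"
        return 0
    fi
    hadoop namenode -format
}
```

With the paths from this post it would be called as format_once /home/training/bigdata/name.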


Note: In some cases JAVA_HOME is not detected and you get an error saying that JAVA_HOME is not set. If that happens, go to the hadoop conf directory and edit the hadoop-env.sh file.

$ cd /home/training/bigdata/hadoop-1.2.1/conf

$ vim hadoop-env.sh

Now remove the # comment sign from the JAVA_HOME line and set it to the same path as your JAVA_HOME. There must be no spaces around the = sign.
Here it is

export JAVA_HOME=/home/training/java

Now we are all set to start the cluster

At terminal type

$ start-all.sh

$ jps

It should display the process IDs and names of the running Java processes.
If everything is set up correctly, we should see the following processes running.

NameNode
DataNode
SecondaryNameNode
JobTracker
TaskTracker
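
Checking the list by eye works, but it can also be sketched as a small filter over the jps output. missing_daemons is a hypothetical helper; it assumes jps prints one process per line in the form "pid name".

```shell
# missing_daemons: read jps output on stdin and print every expected
# Hadoop 1.x daemon that is not listed. Returns non-zero if any is missing.
# Hypothetical helper for illustration.
missing_daemons() {
    jps_output=$(cat)
    status=0
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare the whole second field, so that SecondaryNameNode
        # is not mistaken for NameNode
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
            status=1
        fi
    done
    return $status
}
```

It would be used as jps | missing_daemons; no output means all five daemons are up.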

To stop the cluster type

$ stop-all.sh

Installing Hadoop-1.2.1(Stable Ver 1 ) on Linux(Ubuntu) Part 1

What is Hadoop?

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

Source : http://hadoop.apache.org/


There are three steps to install Hadoop as a single node cluster.

Step 1. Install Java

Follow this to install Java: How to Install/Setup Java Path on Linux

Step 2. Set up ssh for password-less access

Follow this to set up ssh: Create and Setup SSH Certificates

Step 3. Install Hadoop


Download the Hadoop binary from http://hadoop.apache.org/releases.html
Choose any stable version.
I have used Hadoop-1.2.1 here from this repository.

After downloading you would have a file similar to

hadoop-1.2.1.tar.gz

Extract the file using the command

$ tar -xzvf hadoop-1.2.1.tar.gz

Create a bigdata directory in the home directory.

Eg: Here the home directory path is

$ pwd
/home/training

Create a new directory bigdata

$ mkdir bigdata

Go back to the directory where the archive was extracted and
move the extracted hadoop folder to the bigdata directory

$ mv hadoop-1.2.1  /home/training/bigdata

$ cd /home/training/bigdata/hadoop-1.2.1

To check the current working directory type

$ pwd

It should display

/home/training/bigdata/hadoop-1.2.1

Now let's set the path for HADOOP_HOME

Go to home directory

$ cd

Open the .bashrc file in an editor (sudo is not needed, as the file belongs to your own user)

$ vim .bashrc (or vi .bashrc , if vim is not installed )

Note : To install vim we can use the following command

$ sudo apt-get install vim

At the end of the file copy the following settings

#my hadoop settings

export HADOOP_HOME=/home/training/bigdata/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin

Note : If you have used some other path please replace that accordingly.

Save and close the file. Reload the config using the command


$ source .bashrc
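
If you would rather script the .bashrc change than edit the file by hand, the sketch below appends a line only when it is not already there, so running it twice does not duplicate the settings. append_once is a hypothetical helper name.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line exists.
# Hypothetical helper for illustration.
append_once() {
    # -F fixed string, -x match the whole line, -q no output
    grep -Fxq -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

For the settings above it would be called once per line, for example append_once ~/.bashrc 'export HADOOP_HOME=/home/training/bigdata/hadoop-1.2.1' and then again with the PATH line in single quotes.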


Next, we edit the configuration files for pseudo-distributed mode in the next part

Installing Hadoop-1.2.1(Stable Ver 1 ) on Linux(Ubuntu) Part 2


Create and Setup SSH Certificates


What is SSH ?

Secure Shell (SSH) is a cryptographic network protocol for secure data communication, remote command-line login, remote command execution, and other secure network services between two networked computers. It connects, via a secure channel over an insecure network, a server and a client running SSH server and SSH client programs, respectively. The protocol specification distinguishes between two major versions that are referred to as SSH-1 and SSH-2.

Source Wikipedia : http://en.wikipedia.org/wiki/Secure_Shell

Now that we know what ssh is and why it is used, let's set up ssh for Ubuntu.

At the terminal type

$ sudo apt-get install openssh-server

Accept to install by typing Y when requested.

$ cd

$ ssh-keygen -t rsa

Press Enter at each prompt to accept the default key location and an empty passphrase. The .ssh folder will be created.

$ cd .ssh

Copy id_rsa.pub to authorized_keys file.

$ cp id_rsa.pub authorized_keys

Test the keys by connecting to localhost over ssh

$ ssh localhost

Now go back to .ssh directory

$ cd ~/.ssh

Now we can see that a new file, known_hosts, has been created automatically.
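
One caveat with the cp step: it replaces any authorized_keys file that already exists. A more careful variant, sketched below, appends the public key only if it is missing and tightens the permissions. install_pubkey is a hypothetical helper, and the 700/600 modes are the conventional values rather than something this post's setup strictly requires.

```shell
# install_pubkey PUB_FILE SSH_DIR: append the key in PUB_FILE to
# SSH_DIR/authorized_keys unless it is already present.
# Hypothetical helper for illustration.
install_pubkey() {
    mkdir -p "$2" || return 1
    chmod 700 "$2" || return 1
    touch "$2/authorized_keys" || return 1
    # Append only when no identical line exists, so existing keys survive
    grep -Fxq -- "$(cat "$1")" "$2/authorized_keys" ||
        cat "$1" >> "$2/authorized_keys"
    chmod 600 "$2/authorized_keys"
}
```

With the default key location it would be called as install_pubkey ~/.ssh/id_rsa.pub ~/.ssh.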

How to Install/Setup Java Path on Linux

Method 1:

If we do not require any particular version of Java, running the following command at the terminal installs the default OpenJDK Java.

On Ubuntu

$ sudo apt-get install openjdk-7-jre

On Redhat/CentOS/Fedora

$ su -c "yum install java-1.7.0-openjdk"

Agree to install and java should be up and running after the installation completes.

Method 2 :

In some cases we require a specific version of Java (e.g. Oracle) due to business requirements.
Then we need to manually download the binary, install it, and set the path so that applications can find Java.

Download the binary from the respective website. Here I am using Oracle JDK 7 for 64-bit Linux.

For example, the file was downloaded to /home/vipul/Downloads

I used the following link to download the setup.

Java SE Development Kit 7 Downloads

Now extract the file using the following command

$ tar -xzvf  jdk-7u60-linux-x64.tar.gz

This results in an extracted directory containing the JDK.

Now move the extracted folder to the path of your choice. Here I have first renamed the folder
jdk1.7.0_60 to java using the following command

$ mv jdk1.7.0_60 java

Next, move java to the desired location

$ mv java /home/vipul

Now let's set the path for Java.

To set JAVA_HOME / PATH for a single user, log in to your account and open the .bashrc file

$ vi ~/.bashrc

Set JAVA_HOME as follows, using the syntax export JAVA_HOME=<path-to-java>.

export JAVA_HOME=/home/vipul/java

Set PATH as follows:

export PATH=$PATH:$JAVA_HOME/bin

Please replace the path according to your Java home directory.
Log out and log back in, or restart, for the new settings to take effect.
Alternatively, type the following command to activate the new path settings immediately:

$ source ~/.bashrc

Verify new settings:

$ echo $JAVA_HOME
$ echo $PATH

Tip: Use the following command to find the exact path of the java executable under UNIX / Linux:

$ which java
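
On many systems which java returns a symlink such as /usr/bin/java rather than the real install location. The sketch below resolves the link and strips the trailing /bin/java to get a candidate for JAVA_HOME. java_home_from_bin is a hypothetical helper, it relies on readlink -f from GNU coreutils, and for layouts where java lives under a jre/bin directory the result will be the jre directory, so treat the output as a starting point.

```shell
# java_home_from_bin JAVA_PATH: resolve symlinks in JAVA_PATH and print
# the directory two levels up (the parent of bin).
# Hypothetical helper for illustration.
java_home_from_bin() {
    real=$(readlink -f "$1") || return 1
    # dirname once gives .../bin, twice gives the install directory
    dirname "$(dirname "$real")"
}
```

It would be used as java_home_from_bin "$(which java)".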

Please note that the file ~/.bash_profile is similar, with the exception that ~/.bash_profile runs only for Bash login shells, while .bashrc runs for every new interactive Bash shell that is not a login shell.

To set JAVA_HOME / PATH for all users, we need to set up the global config in /etc/profile or /etc/bash.bashrc:

# vi /etc/profile

Next, set up the PATH / JAVA_HOME variables as follows:

export JAVA_HOME=/home/vipul/java
export PATH=$PATH:$JAVA_HOME/bin

Save and close the file. Once again, type the following command to activate the path settings immediately:

# source /etc/profile

Next : Create and Setup SSH Certificates

For detailed reading, please refer to the following WikiHow article

Install-Oracle-Java-on-Ubuntu-Linux