Continued from Part 1
Go to the bigdata directory and create two directories, name and data, which will hold the NameNode metadata and the DataNode blocks:
$ cd /home/training/bigdata
$ chmod -R 755 hadoop-1.2.1
$ mkdir name
$ mkdir data
$ chmod -R 755 name
$ chmod -R 755 data
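The directory setup above can be wrapped in a small, repeatable shell function. This is only a sketch: the function name make_hadoop_dirs is hypothetical, and it takes the base directory (here /home/training/bigdata) as an argument instead of hard-coding it.

```shell
# Hypothetical helper: create the NameNode (name) and DataNode (data)
# directories under the given base directory and set them to mode 755.
make_hadoop_dirs() {
    base="$1"
    # -p makes the function safe to re-run when the directories already exist
    mkdir -p "$base/name" "$base/data" || return 1
    chmod 755 "$base/name" "$base/data"
}
```

Usage: make_hadoop_dirs /home/training/bigdata. Because the directories are freshly created and empty, a plain chmod is enough here; the -R flag only matters for directories that already have contents.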
To run Hadoop on a single node, the following files need to be configured:
- hdfs-site.xml
- mapred-site.xml
- core-site.xml
- masters (for a multinode cluster)
- slaves (for a multinode cluster)
Go to the Hadoop conf directory:
$ cd /home/training/bigdata/hadoop-1.2.1/conf
Open the core-site.xml file:
$ vim core-site.xml
Add the following inside the <configuration> tag:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
Save and close the file.
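If you prefer not to edit the file interactively, a sketch like the one below writes a complete core-site.xml with a here-document. The function name write_core_site is hypothetical, it overwrites any existing core-site.xml in the directory you pass, and the two header lines are assumed to mirror the stock Hadoop 1.x conf files.

```shell
# Hypothetical helper: write a complete core-site.xml into the given conf
# directory (overwriting it), using the NameNode URI from the text above.
write_core_site() {
    conf_dir="$1"
    # Quoted 'EOF' keeps the here-document literal (no variable expansion)
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}
```

Usage: write_core_site /home/training/bigdata/hadoop-1.2.1/conf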
Next, open hdfs-site.xml and add the following properties inside the <configuration> tag:
$ vim hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/training/bigdata/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/training/bigdata/data</value>
</property>
Save and close the file.
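The hdfs-site.xml settings can also be generated from the base directory, which keeps dfs.name.dir and dfs.data.dir in step with the directories created earlier. This is a sketch under assumptions: write_hdfs_site is a hypothetical name, it replaces the whole file, and the header lines are assumed to match the stock Hadoop 1.x conf files.

```shell
# Hypothetical helper: write hdfs-site.xml with replication 1 and the
# name/data directories located under the given base directory.
write_hdfs_site() {
    conf_dir="$1"
    base="$2"
    # Unquoted EOF so that $base is expanded inside the here-document
    cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>$base/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>$base/data</value>
  </property>
</configuration>
EOF
}
```

Usage: write_hdfs_site /home/training/bigdata/hadoop-1.2.1/conf /home/training/bigdata. A replication factor of 1 fits a single node, since there is only one DataNode to hold each block.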
Now, open mapred-site.xml and add the following property inside the <configuration> tag:
$ vim mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
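As with the other site files, mapred-site.xml can be written non-interactively. In this sketch the function name write_mapred_site is hypothetical, the whole file is overwritten, and the header lines are assumed to follow the stock Hadoop 1.x layout.

```shell
# Hypothetical helper: write mapred-site.xml pointing MapReduce clients at
# the JobTracker on localhost:9001, as configured in the text above.
write_mapred_site() {
    conf_dir="$1"
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}
```

Usage: write_mapred_site /home/training/bigdata/hadoop-1.2.1/conf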
For now, in pseudo-distributed mode, both the masters and slaves files contain localhost, so we don't need to change them.
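To confirm that the masters and slaves files really contain localhost before moving on, a quick check like the following can help. It is a sketch only; check_pseudo_hosts is a hypothetical name, and it assumes each file lists one host per line.

```shell
# Hypothetical helper: succeed only if both masters and slaves in the given
# conf directory list "localhost" on a line of its own.
check_pseudo_hosts() {
    conf_dir="$1"
    for f in masters slaves; do
        # -x requires a whole-line match; a missing file also fails the check
        if ! grep -qx 'localhost' "$conf_dir/$f" 2>/dev/null; then
            echo "$f does not list localhost" >&2
            return 1
        fi
    done
}
```

Usage: check_pseudo_hosts /home/training/bigdata/hadoop-1.2.1/conf && echo "masters and slaves are ready"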
The next step is to format the NameNode. But before that, let's check that both JAVA_HOME and HADOOP_HOME are set and visible.
In a terminal, type:
$ echo $PATH
It should display the paths for both Java and Hadoop, since we have already added both to PATH.
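Reading a long PATH by eye is error-prone, so a small check can report each requirement separately. This is a sketch: report_var and check_hadoop_env are hypothetical names, and the check only confirms that the variables are non-empty and that java and hadoop resolve on PATH, not that they point at working installations.

```shell
# Hypothetical helper: print NAME=value, or report that NAME is unset.
report_var() {  # $1 = variable name, $2 = its current value
    if [ -n "$2" ]; then
        echo "$1=$2"
    else
        echo "$1 is not set"
        return 1
    fi
}

# Hypothetical check: JAVA_HOME/HADOOP_HOME set, java/hadoop on PATH.
# Returns non-zero if any of the four checks fails.
check_hadoop_env() {
    status=0
    report_var JAVA_HOME "${JAVA_HOME:-}" || status=1
    report_var HADOOP_HOME "${HADOOP_HOME:-}" || status=1
    for cmd in java hadoop; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd found at $(command -v "$cmd")"
        else
            echo "$cmd not found on PATH"
            status=1
        fi
    done
    return "$status"
}
```

Usage: check_hadoop_env prints one line per check and exits with a non-zero status if anything is missing.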
Now let's format the NameNode. Type the following commands on the NameNode machine (in pseudo-distributed mode, this is the same machine):
$ cd
$ hadoop namenode -format
Note: Type an uppercase Y when prompted; otherwise the format is aborted.
Also, the NameNode should be formatted only once. Formatting it again wipes the existing HDFS metadata, so all previously stored data is lost.
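Since formatting should happen only once, a guard can skip the command when the name directory already looks formatted. This sketch assumes that a formatted Hadoop 1.x name directory contains current/VERSION; the function names namenode_formatted and format_namenode_once are hypothetical, and the guard is a convenience, not a replacement for care.

```shell
# Hypothetical check: treat <name_dir>/current/VERSION as evidence that the
# NameNode has already been formatted (assumed Hadoop 1.x on-disk layout).
namenode_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Hypothetical guard: run the format only if the name directory is not
# already formatted; the Y/N prompt still appears when the format runs.
format_namenode_once() {
    name_dir="$1"
    if namenode_formatted "$name_dir"; then
        echo "$name_dir already holds a formatted namespace; skipping format" >&2
        return 0
    fi
    hadoop namenode -format
}
```

Usage: format_namenode_once /home/training/bigdata/name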
Note: In some cases JAVA_HOME is not detected and you get an error saying that JAVA_HOME is not set. If so, go to the Hadoop conf directory and edit the hadoop-env.sh file:
$ cd /home/training/bigdata/hadoop-1.2.1/conf
$ vim hadoop-env.sh
Remove the # comment sign from the JAVA_HOME line and set it to the same Java path used earlier. Here it is:
export JAVA_HOME=/home/training/java
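The same edit can be made without opening an editor. This sketch assumes hadoop-env.sh contains a commented-out or active `export JAVA_HOME=` line (it does not add one if none exists); set_java_home is a hypothetical name, and the path must not contain "|", which serves as the sed delimiter.

```shell
# Hypothetical helper: replace the (possibly commented-out) JAVA_HOME line
# in hadoop-env.sh with an active export of the given Java path.
set_java_home() {
    env_file="$1"
    java_home="$2"
    # "^#* *export JAVA_HOME=" matches "# export JAVA_HOME=..." as well as
    # an already-active "export JAVA_HOME=..." line.
    # Writing to a temp file and renaming avoids relying on GNU "sed -i".
    sed "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file" > "$env_file.tmp" &&
        mv "$env_file.tmp" "$env_file"
}
```

Usage: set_java_home /home/training/bigdata/hadoop-1.2.1/conf/hadoop-env.sh /home/training/java. Note that there are no spaces around = in the resulting line; a shell assignment written as JAVA_HOME = /path would fail.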
Now we are all set to start the cluster. In a terminal, type:
$ start-all.sh
$ jps
It should display the process ID and name of each running Java process.
If everything is set up correctly, we should see the following processes running:
NameNode
DataNode
SecondaryNameNode
JobTracker
TaskTracker
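To check the jps output against the list above automatically, a filter like this can report which daemons are missing. It is a sketch that assumes jps prints "<pid> <name>" per line; missing_daemons is a hypothetical name.

```shell
# Hypothetical helper: read jps output on stdin and print each expected
# Hadoop 1.x daemon that does not appear in it (one name per line).
missing_daemons() {
    # Keep only the process-name column
    running=$(awk '{print $2}')
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -x: whole-line match, so SecondaryNameNode does not count as NameNode
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}
```

Usage: jps | missing_daemons prints nothing when all five daemons are up; any name it prints is a daemon that failed to start, and its log under the Hadoop logs directory is the place to look.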
To stop the cluster, type:
$ stop-all.sh