I ran Hadoop on Ubuntu in pseudo-distributed mode today. The detailed steps follow:
Install Ubuntu 11.10 i386 in VirtualBox. In this release, the JDK is located in /usr/lib/jvm/java-6-openjdk by default.
Add a dedicated Hadoop user account for running Hadoop
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
Configure SSH for Hadoop user
sudo apt-get install ssh
su - hadoop
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
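start-all.sh launches the daemons over SSH, so key-based login to localhost has to work without a prompt. The sketch below is one way to check that. The function names fix_ssh_perms and ssh_works are made up for illustration, and the permission step assumes OpenSSH's default StrictModes behaviour, which rejects keys when ~/.ssh or authorized_keys is writable by group or others.

```shell
# Restrict the SSH directory $1 and its authorized_keys to the owner only.
fix_ssh_perms() {
    chmod 700 "$1" && chmod 600 "$1/authorized_keys"
}

# Succeed only if key-based login to host $1 works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
ssh_works() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

As the hadoop user, this could be run as `fix_ssh_perms ~/.ssh && ssh_works localhost && echo ok`. On the very first connection, run a plain `ssh localhost` once to accept the host key, since batch mode cannot answer that prompt.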
Download the latest stable release of Hadoop from Hadoop’s homepage. I downloaded release 1.0.2 as a gzipped tar file (hadoop-1.0.2.tar.gz). Then uncompress hadoop-1.0.2.tar.gz.
tar zxvf hadoop-1.0.2.tar.gz
mv hadoop-1.0.2 hadoop
Configure Hadoop
The $HADOOP_INSTALL/hadoop/conf directory contains some configuration files for Hadoop.
hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
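If JAVA_HOME points at the wrong directory, the Hadoop scripts cannot find a java binary to launch. A minimal sketch for checking the path before going further; check_java_home is a hypothetical helper, and the path is the one from this post, so adjust it if your JDK lives elsewhere.

```shell
# Check that a candidate JAVA_HOME ($1) contains an executable bin/java.
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "ok: $1"
    else
        echo "missing: $1/bin/java"
        return 1
    fi
}

# Path taken from the hadoop-env.sh line above; "|| true" keeps the
# script going so the message is shown even when the check fails.
check_java_home /usr/lib/jvm/java-6-openjdk || true
```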
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
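To double-check what the configuration files actually contain, a small helper can print a property's value. This is a sketch only: get_property is a hypothetical helper, not part of Hadoop. It assumes each <name> and <value> sits on its own line, as in the files above, and that the loop runs from the Hadoop directory so that conf/ resolves.

```shell
# Print the <value> that follows <name>$2</name> in the config file $1.
get_property() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && match($0, /<value>.*<\/value>/) {
            # Skip the 7 characters of <value> and drop the 8 of </value>.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}

# Show two of the settings from this post, skipping files that are absent.
for pair in core-site.xml:fs.default.name hdfs-site.xml:dfs.replication; do
    file=conf/${pair%%:*}
    if [ -f "$file" ]; then
        echo "${pair#*:} = $(get_property "$file" "${pair#*:}")"
    fi
done
```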
Format the HDFS filesystem
bin/hadoop namenode -format
Start your single-node cluster
bin/start-all.sh
Run the WordCount example job
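Before submitting the job, it can be worth confirming that start-all.sh really brought up all five Hadoop 1.x daemons: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. A minimal sketch using the JDK's jps tool follows; missing_daemons is a hypothetical helper name, and it assumes jps is on the PATH and is run as the hadoop user.

```shell
# Read jps-style output ("<pid> <ClassName>" per line) on stdin and print
# the name of every expected Hadoop 1.x daemon that is not listed.
missing_daemons() {
    out=$(cat)
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        printf '%s\n' "$out" |
            awk -v d="$d" '$2 == d { f = 1 } END { exit !f }' ||
            echo "$d"
    done
}

# Only run the check where the JDK tools are installed.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

No output means all five daemons are running. If a name is printed, the logs directory under the Hadoop install is the place to look.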
bin/hadoop fs -copyFromLocal /home/hadoop/test_wc.txt test_wc.txt
bin/hadoop fs -ls
bin/hadoop jar hadoop-examples-1.0.2.jar wordcount test_wc.txt test_wc-output
bin/hadoop fs -cat test_wc-output/part-r-00000
bin/hadoop fs -copyToLocal test_wc-output /home/hadoop/test_wc-output
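For a small input file, the job's result can be cross-checked with plain shell tools. The sketch below assumes the WordCount example splits on whitespace and writes one "word<TAB>count" line per word, sorted bytewise by word. local_wordcount is a hypothetical helper, and it splits on every whitespace character, which may differ slightly from the example's tokenizer for unusual input.

```shell
# Count whitespace-separated words in file $1 and print "word<TAB>count",
# sorted bytewise (LC_ALL=C) by word.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" | grep -v '^$' |
        LC_ALL=C sort | LC_ALL=C uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Compare against the output copied out of HDFS, if both files exist.
input=/home/hadoop/test_wc.txt
result=/home/hadoop/test_wc-output/part-r-00000
if [ -f "$input" ] && [ -f "$result" ]; then
    if local_wordcount "$input" | diff - "$result" >/dev/null; then
        echo "local count matches the Hadoop output"
    else
        echo "local count differs from the Hadoop output"
    fi
fi
```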
Stop your single-node cluster
bin/stop-all.sh
References:
Hadoop: The Definitive Guide
