
How to Install Hadoop on Ubuntu 16.04 (Single Node)

Selvakumar

Hadoop is an Apache Software Foundation project for managing data across clusters of machines.

It is a Java-based framework that manages large data sets across a group of clustered machines.

Configuring a full Hadoop cluster is fairly involved, but you can also install Hadoop on a single machine to perform basic operations.

Hadoop may look like a single piece of software, but it is made up of several components.

Here they are:

Hadoop Common:

A collection of utilities and libraries that supports the other Hadoop modules.

HDFS:

The Hadoop Distributed File System, which is responsible for storing the data on disk across the cluster.

YARN:

YARN stands for Yet Another Resource Negotiator; it is the resource management and job scheduling layer of Hadoop.

MapReduce:

MapReduce is a programming model for processing and generating big data sets on a cluster using parallel, distributed algorithms.

It is the original processing model; Hadoop 2.x can also run other processing models on top of YARN.
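To get a rough intuition for the model, here is a shell analogy (an illustration only, not how Hadoop actually executes a job): the tr step "maps" each word onto its own line, sort groups identical keys together, and uniq -c "reduces" each group to a count.

$ printf "fish bird fish\nbird fish\n" | tr ' ' '\n' | sort | uniq -c
      2 bird
      3 fish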

Requirements

  • An Ubuntu 16.04 server configured according to the initial server setup guide.

Follow that guide to configure the server. After that, you can proceed with installing Hadoop and its dependencies.

Hadoop requires Java to run, so first we will install Java, and then Hadoop.

After that, we will configure Hadoop and run it.

Let us see how to install Hadoop on Ubuntu step by step in this tutorial.

Install Java

First, you have to update the package index.

$ sudo apt-get update

After that, install the default JDK, which provides OpenJDK on Ubuntu 16.04:

$ sudo apt-get install default-jdk

Now, check the Java version:

$ java -version

You will get the following output.

openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
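Since default-jdk installs the full JDK, you can also confirm the compiler is available (the exact version string depends on your system):

$ javac -version
javac 1.8.0_91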

Installing Hadoop

With Java installed, we can now install Hadoop. Go to the Apache Hadoop releases page to find the latest version of Hadoop.

Find the latest stable release, then right-click the link to the binary tarball and copy its address.

Here we are going to install Hadoop 2.7.3 on Ubuntu.

Use the command below to download the file:

$ wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

Note: You will be redirected to an available mirror, so the URL you use may not exactly match the one shown above.

Next, we should check that the file was not altered during download.

For that, we will perform an SHA-256 check.

Go back to the releases page and follow the Apache link into the web directory. There, find the .mds file for the version you downloaded.

Copy the link to that file and fetch it with wget, as shown below.

$ wget https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz.mds

After that, run the verification using the command below.

$ shasum -a 256 hadoop-2.7.3.tar.gz 

You will get the following output.

d489df3808244b906eb38f4d081ba49e50c4603db03efd5e594a1e98b09259c2 hadoop-2.7.3.tar.gz

Now, view the published SHA-256 value:

$ cat hadoop-2.7.3.tar.gz.mds

Both outputs should match.

~/hadoop-2.7.3.tar.gz.mds
...
hadoop-2.7.3.tar.gz: SHA256 = D489DF38 08244B90 6EB38F4D 081BA49E 50C4603D B03EFD5E 594A1E98 B09259C2
...

You can ignore the spaces and the letter case. This check verifies that the file was not corrupted or tampered with during download.
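If you prefer to let the shell do the comparison, here is a minimal sketch; the published value is pasted in by hand from the .mds output above, so treat it as an illustration rather than part of the official procedure:

$ published="D489DF38 08244B90 6EB38F4D 081BA49E 50C4603D B03EFD5E 594A1E98 B09259C2"
$ expected=$(echo "$published" | tr -d ' ' | tr '[:upper:]' '[:lower:]')
$ actual=$(shasum -a 256 hadoop-2.7.3.tar.gz | awk '{print $1}')
$ [ "$expected" = "$actual" ] && echo "OK: checksums match" || echo "MISMATCH"
OK: checksums match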

Once you have verified the file, use the tar command to extract it.

$ tar -xzvf hadoop-2.7.3.tar.gz

Here:

-x extracts the archive.

-z decompresses it (gzip).

-v prints verbose output.

-f tells tar to read from the given file.

Now, we will move the extracted directory to /usr/local:

$ sudo mv hadoop-2.7.3 /usr/local/hadoop
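You can confirm the move by listing the new location; for Hadoop 2.7.3 the top level should look roughly like this:

$ ls /usr/local/hadoop
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share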

With the software in place, the next step is to configure its environment.

Configuring Hadoop to Use Java

We have to tell Hadoop where to find Java, either in Hadoop's configuration file or through an environment variable.

On Ubuntu, /usr/bin/java is a symlink to /etc/alternatives/java, which in turn points to the actual Java binary.

We will use readlink with the -f flag to follow every symlink in each part of the path recursively.

Then, sed trims bin/java from the output, which gives us the correct value for JAVA_HOME.

To find the default Java path, run:

$ readlink -f /usr/bin/java | sed "s:bin/java::"

Output

/usr/lib/jvm/java-8-openjdk-amd64/jre/
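To see what each stage contributes, you can run readlink on its own: it resolves the full symlink chain to the real binary, and the sed expression then strips the trailing bin/java.

$ readlink -f /usr/bin/java
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java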

We can set Hadoop's Java home to this path so that it uses this version.

Alternatively, you can call readlink from the configuration file itself, so that the path is resolved dynamically and stays correct if the default Java version changes.

First, open hadoop-env.sh:

$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

There are two options available; choose one of the following.

Setting Up a Static Value

/usr/local/hadoop/etc/hadoop/hadoop-env.sh
 . . .
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
. . . 

Using Readlink Dynamically

/usr/local/hadoop/etc/hadoop/hadoop-env.sh
. . .
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
. . . 
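As a quick sanity check (optional, and assuming you have saved the file), you can source it in your current shell and print the variable:

$ source /usr/local/hadoop/etc/hadoop/hadoop-env.sh && echo "$JAVA_HOME"
/usr/lib/jvm/java-8-openjdk-amd64/jre/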

Running Hadoop

With the Java path set, we can now run Hadoop. (Multi-node cluster setups, for both Hadoop 2.6 and Hadoop 2.7, are more complex and will be covered in an upcoming article.)

Run the hadoop command without arguments:

$ /usr/local/hadoop/bin/hadoop

You will get the following output.

Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME            run the class named CLASSNAME
or
where COMMAND is one of:
fs                   run a generic filesystem user client
version              print the version
jar <jar>            run a jar file
                   note: please use "yarn jar" to launch
                         YARN applications, not this command.
checknative [-a|-h]  check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath            prints the class path needed to get the
                   Hadoop jar and the required libraries
credential           interact with credential providers
daemonlog            get/set the log level for each daemon

If you see this help output, it means Hadoop is installed and ready to run in standalone mode.
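As an additional check, the version subcommand listed in the usage output above prints the build you are running (output trimmed):

$ /usr/local/hadoop/bin/hadoop version
Hadoop 2.7.3
...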

Next, we will test whether it is configured properly by running an example MapReduce program that ships with Hadoop.

The first step is to create a directory called input in your home directory.

After that, copy Hadoop's configuration files into it to serve as our input data.

$ mkdir ~/input
$ cp /usr/local/hadoop/etc/hadoop/*.xml ~/input
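You can list the directory to confirm the copy worked; the exact set of .xml files may vary slightly between releases:

$ ls ~/input
capacity-scheduler.xml  hadoop-policy.xml  httpfs-site.xml  kms-site.xml
core-site.xml           hdfs-site.xml      kms-acls.xml     yarn-site.xml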

Use the command below to run hadoop-mapreduce-examples, a Java archive (JAR) that contains several example programs.

We are going to use the grep program from that archive.

The grep program counts the matches of a literal word or regular expression.

We are going to find occurrences of the word principal followed by zero or more periods, so the pattern also matches the word at the end of a declarative sentence.

Since the expression is case-sensitive, it will not match the word when it is capitalized at the beginning of a sentence.
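To preview what the pattern matches, here is the same regular expression with plain grep (an illustration only; Hadoop's grep example applies it across the whole input directory and counts the matches):

$ echo "A principal keeps the principal." | grep -o 'principal[.]*'
principal
principal.

Now run the job itself: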

$ /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep ~/input ~/grep_example 'principal[.]*'

Once the process completes, you will get output like the following.

 Output
. . .
    File System Counters
            FILE: Number of bytes read=1247674
            FILE: Number of bytes written=2324248
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
    Map-Reduce Framework
            Map input records=2
            Map output records=2
            Map output bytes=37
            Map output materialized bytes=47
            Input split bytes=114
            Combine input records=0
            Combine output records=0
            Reduce input groups=2
            Reduce shuffle bytes=47
            Reduce input records=2
            Reduce output records=2
            Spilled Records=4
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=61
            Total committed heap usage (bytes)=263520256
    Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
    File Input Format Counters
            Bytes Read=151
    File Output Format Counters
            Bytes Written=37

If you instead get an error like the following, it means the output directory already exists.

Output
. . .
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
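In that case, remove the old output directory and re-run the job (assuming you no longer need the previous results):

$ rm -rf ~/grep_example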

After a successful run, the results are stored in the output directory, and you can check them using cat.

 $ cat ~/grep_example/*

Output

6       principal
1       principal.

The output indicates that the word principal occurred six times on its own and once followed by a period.

This example verifies that the installation was successful and that Hadoop is working in standalone mode.

A non-privileged user can run Hadoop in this mode, which makes it convenient for exploring and debugging.

Conclusion

In this article, you have learned how to install Hadoop in standalone mode.

We have demonstrated a Hadoop single-node setup on Ubuntu.

You have also tested that the configuration works correctly.

Be sure to sign up to receive more tutorials that we are going to publish.
