Monday, 27 January 2014

Install Apache Ambari using a local repository

Hadoop cluster setup using Apache Ambari

Apache Ambari:

Apache Ambari is an open source tool from the Apache family that is used to build, provision, and monitor an Apache Hadoop cluster.

Apache Hadoop:

Apache Hadoop is an open source framework from the Apache family that is used to analyze large volumes of data. It is a massively parallel processing analytical tool, and an Apache Hadoop cluster is a commodity cluster.
Hadoop is a framework where you can store large volumes of data and run your analysis on it. Many other tools come under the Hadoop ecosystem, and you can use them for analytics and for storing data. We will discuss the tools later (see the topic Happiestminds).


A bunch of machines connected through a network is called a cluster. The machines communicate with each other through a switch. A switch has a minimum of 24 ports, and most companies use 48-port switches in their clusters.
A 48-port switch with a 2 GB network costs Rs. 15000; this is where you have to make your choices for the performance of the cluster. Many other factors affect cluster performance, which we will discuss later.


A rack holds a bunch of machines in a single place. Building a Hadoop cluster normally needs multiple racks, but we are not using racks for now because this is just a five-node (small) cluster.


Happiestminds is the name of the cluster that we will discuss now. It is built using Apache Ambari.
The Happiestminds cluster has five nodes: one acts as the master node and the other four act as slave nodes.
Happiestminds has most of the Hadoop ecosystem tools in it. The list of tools is as follows.

HDFS: Apache Hadoop Distributed File System.
YARN + MapReduce2: Apache Hadoop NextGen MapReduce (YARN).
Nagios: Nagios Monitoring and Alerting system.
Ganglia: Ganglia Metrics collection system.
Hive + HCatalog: Data warehouse system for ad-hoc queries & analysis of large datasets, and table & storage management service.
HBase + ZooKeeper: Non-relational distributed database, and centralized service for configuration management & synchronization.
Pig: Scripting platform for analyzing large datasets.
Sqoop: Tool for transferring bulk data between Apache Hadoop and structured data stores such as relational databases.
Oozie: System for workflow coordination and execution of Apache Hadoop jobs.
ZooKeeper: Centralized service which provides highly reliable distributed coordination.
Table 1
These are the Hadoop ecosystem tools that are installed when we build the cluster, each with a short description.

Hadoop Services in the Cluster:

Hadoop runs five services in the cluster:

Master Node: Hadoop NameNode, ResourceManager, Secondary NameNode.
Slave Node: Hadoop DataNode, NodeManager.
Table 2

Master Node:

In a Hadoop cluster the master node plays the key role because the NameNode runs on it. The master takes care of most things in the cluster and manages the data stored on the slave nodes.

Slave Node:

There can be any number of slave nodes in the cluster, depending on your resources and requirements. The data is stored on the DataNodes.
I will not discuss all the components here, just the cluster setup.

Installing Apache Ambari


Complete the following steps before building a Hadoop cluster with Ambari to monitor and provision it.


Check for existing installations. An existing installation of any of the tools listed in Table 1 may cause problems.
Set up passwordless SSH; this gives the master node access to all the slave nodes.
The steps involved in setting up passwordless SSH are as follows:
  • ssh-keygen
  • cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  • chmod 700 ~/.ssh
  • chmod 600 ~/.ssh/authorized_keys
  • ssh-copy-id -i ~/.ssh/id_rsa.pub root@ipaddress (for each slave node)
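With several slave nodes, the last step has to be repeated for each of them. A minimal sketch of that loop follows; the function name and the example host names are hypothetical, and DRY_RUN is an illustrative switch for this sketch, not part of ssh-copy-id. With DRY_RUN=1 the function only prints the commands it would run.

```shell
#!/bin/sh
# Copy the master's public key to every slave node so that root can log in
# without a password. The first argument is the public key file, the
# remaining arguments are slave host names or IP addresses.
# DRY_RUN is a hypothetical switch for this sketch: when it is 1 the
# commands are printed instead of executed.
copy_key_to_slaves() {
    pubkey="$1"
    shift
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh-copy-id -i $pubkey root@$host"
        else
            ssh-copy-id -i "$pubkey" "root@$host"
        fi
    done
}

# Example (placeholder host names):
#   copy_key_to_slaves ~/.ssh/id_rsa.pub slave1.example.com slave2.example.com
```

Each real run prompts once for the root password of the slave; after that, ssh root@slave should log in without asking.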
Add the hostnames of all the slave nodes to the /etc/hosts file.
NOTE: Each hostname should be a fully qualified domain name (FQDN).
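A quick way to catch short hostnames is to scan the hosts file for entries whose first name has no dot in it. This is only a sketch: the function name is hypothetical, and it assumes the usual layout of an IP address followed by the FQDN and then optional aliases.

```shell
#!/bin/sh
# Report hosts-file entries whose first host name is not fully qualified
# (contains no dot). Returns non-zero if any such entry is found.
check_fqdn_entries() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }          # skip comments and blank lines
        $1 ~ /^127\./ || $1 == "::1" { next }   # skip loopback entries
        $2 !~ /\./ { print "not an FQDN: " $0; bad = 1 }
        END { exit bad }
    ' "$1"
}

# Example:
#   check_fqdn_entries /etc/hosts
```

An entry such as 192.168.1.11 slave1.example.com slave1 passes (the address and names here are placeholders), while 192.168.1.11 slave1 is reported.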
Edit the network config file /etc/sysconfig/network.
  • Append the following line: NETWORKING_IPV6=no
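If the same change has to be made on every node, it helps to make it repeatable so that running it twice does not add the line twice. A small sketch, with a hypothetical function name:

```shell
#!/bin/sh
# Set NETWORKING_IPV6=no in a sysconfig-style file: replace an existing
# NETWORKING_IPV6 line if there is one, otherwise append the setting.
set_ipv6_off() {
    file="$1"
    if grep -q '^NETWORKING_IPV6=' "$file"; then
        # Rewrite through a temporary copy, then write back in place so
        # the original file keeps its owner and permissions.
        sed 's/^NETWORKING_IPV6=.*/NETWORKING_IPV6=no/' "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    else
        echo 'NETWORKING_IPV6=no' >> "$file"
    fi
}

# Example:
#   set_ipv6_off /etc/sysconfig/network
```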
Turn off iptables for now.
  • chkconfig iptables off (disables it at boot)
  • service iptables stop (stops it right away)
Disable SELinux.
  • setenforce 0 (switches SELinux to permissive mode until the next reboot)
Check the PackageKit plugin config /etc/yum/pluginconf.d/refresh-packagekit.conf and make the following change.
  • enabled=0
Make sure umask is set to 022.
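The last two items are easy to forget on one of the nodes, so a small check can be run on each machine before starting the installation. This is a sketch with a hypothetical function name; the path of the plugin config is passed in as an argument so the function can also be tried on a copy of the file.

```shell
#!/bin/sh
# Check two of the prerequisites: the refresh-packagekit plugin is
# disabled (enabled=0) and the current umask is 022. Prints one line per
# check and returns non-zero if any check fails.
check_prereqs() {
    conf="$1"
    status=0
    if grep -Eq '^[[:space:]]*enabled[[:space:]]*=[[:space:]]*0[[:space:]]*$' "$conf"; then
        echo "packagekit: ok"
    else
        echo "packagekit: enabled is not 0"
        status=1
    fi
    case "$(umask)" in
        022|0022) echo "umask: ok" ;;
        *) echo "umask: $(umask), expected 022"; status=1 ;;
    esac
    return "$status"
}

# Example:
#   check_prereqs /etc/yum/pluginconf.d/refresh-packagekit.conf
```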

Installation steps:

We downloaded all the RPMs to one local machine and created our own repository.
The steps involved in creating a local repository are as follows:
  • yum install createrepo
  • cd /var/www/html
  • mkdir HDP
  • move all the downloaded rpms
  • mv rpms /var/www/html/HDP
  • createrepo -dv /var/www/html/HDP
  • repodata will be created
  • yum clean all
  • yum repolist (should list out the HDP)
NOTE: In /etc/yum.repos.d, the .repo file must point to the path of the repository, that is, the directory whose repodata folder contains the repomd.xml file.
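As an illustration of what such a .repo file can look like, the sketch below writes one. The function name, the host name in the example, and the gpgcheck=0 setting are assumptions for this sketch and not taken from the setup above; adjust the base URL to wherever the HDP directory is served from.

```shell
#!/bin/sh
# Write a yum .repo file that points at the local repository.
# Arguments: the .repo file to write, and the base URL of the repository
# (the URL of the directory that contains repodata/).
# gpgcheck=0 is an assumption made for this sketch.
write_hdp_repo() {
    repo_file="$1"
    base_url="$2"
    cat > "$repo_file" <<EOF
[HDP]
name=HDP local repository
baseurl=$base_url
enabled=1
gpgcheck=0
EOF
}

# Example (the host name is a placeholder):
#   write_hdp_repo /etc/yum.repos.d/HDP.repo http://master.example.com/HDP
```

After writing the file, yum clean all followed by yum repolist should show the HDP repository.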
Install ambari-server:
  • yum install ambari-server
ambari-server setup is the command that sets up the server; it has many options:
  • ambari-server setup -s -j <path of the JDK>
Ambari-server is now installed; configuring the cluster is the next part.
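The server also has to be started before you can log in. The sketch below chains the install, setup, and start commands; the function names are hypothetical, and DRY_RUN is an illustrative switch for this sketch that prints the commands instead of running them.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is 1.
# DRY_RUN is a hypothetical switch for this sketch.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install, set up, and start ambari-server. The argument is the path of
# the JDK, as passed to the -j option of ambari-server setup.
install_ambari_server() {
    jdk_home="$1"
    run yum install -y ambari-server &&
        run ambari-server setup -s -j "$jdk_home" &&
        run ambari-server start
}

# Example (dry run, placeholder JDK path):
#   DRY_RUN=1 install_ambari_server /path/to/jdk
```

Once the server is running, the web interface is normally reachable on port 8080 of the Ambari host, and a fresh installation accepts admin/admin as the login.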
Log in to Apache Ambari.


Give the cluster a name.

Select Stack:

You would have to select a stack here if you were installing without a local repository, but in our case we are using the local repository.

Confirm Hosts:

Register the hostnames with the server, and make sure the host checks complete successfully.


The Ambari agent comes into the picture once a host is registered. The agent is the key component in the communication between master and slave.

Choose services:

You can choose whichever services you want installed on your system.

Assign Masters:

Assign services to the master node. If you are registering a slave, you can assign services to the slave.

Assign Slaves and Clients:

Assign services to the slave nodes.

Customize Services:

Set the required configurations for the services.


This step just gives you a brief summary of the installation.

Install Start and Test:

This step installs all the services and starts them.


This is just for reference.


Now you can see the services installed in the cluster. You can start and stop the services too.


You can monitor the cluster using the dashboard.


The Heatmaps view shows you the space checks.


You can add hosts using the following option:

Add Host:

If you want to add another node to the cluster, click the Add Host button and follow the steps in the wizard.

If you restart your cluster, you need to start ambari-agent on the slave machines manually.
Check the permissions of the SSH key and that ssh-copy-id was run.
Add the slave hostnames to the hosts file of the master.
Check for network issues (make sure there are no loose connections).
Memory usage plays an important role (make sure there is no extra burden on the CPU).
Check the HDP.repo file and make sure that it refers to the local repository that you created.
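The SSH permission check from the list above can be scripted. The sketch below verifies the modes set earlier with chmod (700 on the .ssh directory, 600 on authorized_keys); the function name is hypothetical, and it relies on GNU stat as found on CentOS/RHEL.

```shell
#!/bin/sh
# Verify the permissions used for passwordless SSH: 700 on the .ssh
# directory and 600 on authorized_keys. Prints a line for each problem
# and returns non-zero if anything is off. Uses GNU stat (-c %a).
check_ssh_perms() {
    ssh_dir="$1"
    status=0
    if [ "$(stat -c %a "$ssh_dir" 2>/dev/null)" != "700" ]; then
        echo "$ssh_dir should have mode 700"
        status=1
    fi
    if [ "$(stat -c %a "$ssh_dir/authorized_keys" 2>/dev/null)" != "600" ]; then
        echo "$ssh_dir/authorized_keys should have mode 600"
        status=1
    fi
    return "$status"
}

# Example:
#   check_ssh_perms ~/.ssh
```

Run it on the master and on each slave; sshd typically refuses key-based logins when these files are writable by other users.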

