Steps to Configure Hadoop and start cluster services using Ansible Playbook

Rahulkant
Dec 12, 2020

Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero-downtime rolling updates. Ansible's main goals are simplicity and ease of use. It also has a strong focus on security and reliability, featuring a minimum of moving parts, usage of OpenSSH for transport (with other transports and pull modes as alternatives), and a language that is designed around auditability by humans, even those not familiar with the program.

To configure the Hadoop cluster, we first have to set up Ansible. Here I am going to launch one Ansible host, one Hadoop NameNode, and two Hadoop DataNodes.

After that, we will create our inventory file, which contains the IP addresses of the NameNode and the DataNodes.
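For example, an inventory file grouping the nodes might look like the following. The group names match the `hosts:` values used in the playbooks below; the IP addresses are placeholders standing in for your instances' actual addresses:

```ini
[NameNode]
192.168.1.10

[DataNode]
192.168.1.11
192.168.1.12
```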

Now, we will create the Ansible configuration file, ansible.cfg.
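A minimal ansible.cfg for this setup might look like this. The inventory file name is an assumption; `remote_user = ec2-user` and privilege escalation to root follow from the paths used in the playbooks below (`/home/ec2-user/` sources, `/root/` destinations):

```ini
[defaults]
# Path to the inventory file created above (file name is an assumption)
inventory = /ansibleWS/ip.txt
remote_user = ec2-user
host_key_checking = False

[privilege_escalation]
# Tasks write to /root/ and /etc/hadoop, so escalate to root
become = True
become_user = root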

After that, we will create an Ansible playbook to configure the NameNode (the master node):

- hosts: NameNode
  vars:
    jdk: jdk-8u171-linux-x64.rpm
    hadoop: hadoop-1.2.1-1.x86_64.rpm

  tasks:
    - name: "Copy jdk file"
      copy:
        dest: "/root/"
        src: "/home/ec2-user/{{ jdk }}"

    - name: "Copy hadoop file"
      copy:
        dest: "/root/"
        src: "/home/ec2-user/{{ hadoop }}"

    - name: "Install jdk"
      command: "rpm -ivh {{ jdk }}"

    - name: "Install hadoop"
      command: "rpm -ivh {{ hadoop }} --force"

    - name: "Directory Creation"
      file:
        path: "/root/namenode"
        state: directory

    - name: "Copy core file"
      copy:
        dest: "/etc/hadoop"
        src: "/ansibleWS/Namenode/core-site.xml"

    - name: "Copy hdfs file"
      copy:
        dest: "/etc/hadoop"
        src: "/ansibleWS/Namenode/hdfs-site.xml"

    - name: "Formatting NameNode"
      shell: echo Y | hadoop namenode -format

    - name: "Start Hadoop Service"
      command: "hadoop-daemon.sh start namenode"
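The core-site.xml and hdfs-site.xml files copied by the playbook are not shown in this post. A minimal sketch of what they might contain for the NameNode is below; the port 9001 is an assumption, while `/root/namenode` matches the directory the playbook creates:

```xml
<!-- core-site.xml: the HDFS address clients and DataNodes connect to
     (port 9001 is an assumption, not from the original post) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
```

```xml
<!-- hdfs-site.xml: NameNode metadata goes in the directory
     created by the "Directory Creation" task -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/root/namenode</value>
  </property>
</configuration>
```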

Once our NameNode playbook is ready, we run it using the following command:

$ ansible-playbook hadoopNN.yml

Voila! Our NameNode is configured. Now it's time to write a playbook for the DataNodes. Let's do that:

- hosts: DataNode
  vars:
    jdk: jdk-8u171-linux-x64.rpm
    hadoop: hadoop-1.2.1-1.x86_64.rpm

  tasks:
    - name: "Copy jdk file"
      copy:
        dest: "/root/"
        src: "/home/ec2-user/{{ jdk }}"

    - name: "Copy hadoop file"
      copy:
        dest: "/root/"
        src: "/home/ec2-user/{{ hadoop }}"

    - name: "Install jdk"
      command: "rpm -ivh {{ jdk }}"
      ignore_errors: yes

    - name: "Install hadoop"
      command: "rpm -ivh {{ hadoop }} --force"
      ignore_errors: yes

    - name: "Directory Creation"
      file:
        path: "/root/datanode"
        state: directory

    - name: "Copy core file"
      copy:
        dest: "/etc/hadoop"
        src: "/ansibleWS/Datanode/core-site.xml"

    - name: "Copy hdfs file"
      copy:
        dest: "/etc/hadoop"
        src: "/ansibleWS/Datanode/hdfs-site.xml"

    - name: "Start Datanode Service"
      command: "hadoop-daemon.sh start datanode"
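Likewise, here are sketches of the DataNode's config files. The IP 192.168.1.10 is a placeholder for the NameNode's address, and the port must match whatever the NameNode's core-site.xml uses; `/root/datanode` matches the directory this playbook creates:

```xml
<!-- core-site.xml: points the DataNode at the NameNode
     (IP and port are placeholders, not from the original post) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>
```

```xml
<!-- hdfs-site.xml: block storage goes in the directory
     created by the "Directory Creation" task -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/root/datanode</value>
  </property>
</configuration>
```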

Running the DataNode playbook:

$ ansible-playbook hadoopDN.yml

Hurrah! Now our DataNodes are also configured. Let's check whether the DataNodes are connected to the NameNode or not:

$ hadoop dfsadmin -report
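If both DataNodes registered successfully, the report should include a summary line like the one below (purely illustrative, not from a real run):

```
Datanodes available: 2 (2 total, 0 dead)
```

If it shows 0 datanodes, the usual suspects are a wrong NameNode address/port in the DataNodes' core-site.xml or a firewall blocking the HDFS ports.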

Finally, we can see that our Hadoop cluster is ready, configured entirely through automation.

That’s all from my side.

Any query & suggestions, most welcome.

Keep Learning, keep growing.
