Hadoop Elasticity Using LVM😎

Upmanyu Sharma
6 min read · Nov 21, 2020

In this article, we are going to learn how to integrate LVM with Hadoop and provide elasticity to DataNode storage. 🧐

We are going to use EC2 instances on AWS.

First of all, we have to configure the Hadoop NameNode and DataNode.

You must ensure that you have the JDK and the Hadoop software package installed on your AWS instances.
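A quick way to confirm both are installed and on the PATH (a minimal check, assuming a Hadoop 1.x setup as used in this article):

# Verify the JDK is installed
java -version

# Verify the Hadoop package is installed
hadoop version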

How to Configure the NameNode and DataNode?

We have to change the configuration in the hdfs-site.xml and core-site.xml files on both the NameNode and the DataNode.

Name Node (hdfs-site.xml)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/n1</value>
  </property>
</configuration>

(Here /n1 is the name of a folder created in the root (/) directory of the NameNode instance.)

Data Node (hdfs-site.xml)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn1</value>
  </property>
</configuration>

(Here /dn1 is the name of a folder created in the root (/) directory of the DataNode instance.)

core-site.xml (NameNode)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>

core-site.xml (DataNodes and clients)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://13.233.21.212:9001</value>
  </property>
</configuration>

(Here 13.233.21.212 is the IP address of my NameNode and 9001 is the port number.)

After the configuration, format the NameNode by using the command: hadoop namenode -format

And then start the NameNode and DataNode by using the commands:

hadoop-daemon.sh start namenode (on the NameNode)

hadoop-daemon.sh start datanode (on the DataNode)
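To confirm that the daemons actually came up, you can check the running Java processes on each instance (jps ships with the JDK):

# On the NameNode instance: should list a NameNode process
jps

# On the DataNode instance: should list a DataNode process
jps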

---

What is LVM?

LVM is a tool for logical volume management which includes allocating disks, striping, mirroring, and resizing logical volumes.

With LVM, a hard drive or set of hard drives is allocated to one or more physical volumes. LVM physical volumes can be placed on other block devices which might span two or more disks.

The physical volumes are combined into logical volumes, with the exception of the /boot partition. The /boot partition cannot be placed on a logical volume because the boot loader cannot read it. If the root (/) partition is on a logical volume, create a separate /boot partition which is not a part of a volume group.

Since a physical volume cannot span over multiple drives, to span over more than one drive, create one or more physical volumes per drive.

---

Now, we are going to attach extra volumes to the DataNode that we configured, for the purpose of elasticity.

You can see the volumes attached to your DataNode by running the command: fdisk -l
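For example (assuming the newly attached EBS volume shows up as /dev/xvdf; the actual device name depends on the instance and volume type):

# List all disks and partitions attached to the instance
fdisk -l

# A more compact view of the block devices
lsblk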

Firstly, we have to create a Physical Volume.

For that, use the command: pvcreate <disk-name>

We can use the command pvdisplay to display information about the physical volume created.
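A minimal sketch, assuming the attached disk is /dev/xvdf (hypothetical device name):

# Initialize the disk as an LVM physical volume
pvcreate /dev/xvdf

# Show details of the new physical volume
pvdisplay /dev/xvdf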

Now we have to create a volume group from that physical volume.

For that, use the command: vgcreate <name of volume-group> <disk-name>

We can use the command vgdisplay to display information about the volume group created.
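Continuing the same sketch, with a volume group named vggroup (the name used later in this article):

# Create a volume group on the physical volume
vgcreate vggroup /dev/xvdf

# Show details of the volume group
vgdisplay vggroup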

Now we can create a Logical Volume from the volume group we created.

For that, use the command: lvcreate -n <name of LV> -L <size of LV> <name of VG>

Eg: lvcreate -n lvm1 -L 11G vggroup


You can use the command lvdisplay to display information about the logical volume created.

Now, to use the Logical Volume that we have created, we first have to format it with a file system.
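A minimal sketch, assuming the LV created above (/dev/vggroup/lvm1) and an ext4 file system:

# Create an ext4 file system on the logical volume
mkfs.ext4 /dev/vggroup/lvm1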

Now, finally, mount the Logical Volume on the /dn1 folder of the DataNode.
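For example:

# Mount the logical volume on the DataNode's storage directory
mount /dev/vggroup/lvm1 /dn1

# Confirm the mount and its size
df -h /dn1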

We have to check the report of the Hadoop Cluster.
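The report can be generated from the cluster (Hadoop 1.x command, as used in this setup):

# Show the cluster capacity and the storage contributed by each DataNode
hadoop dfsadmin -report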

Here you can see the storage contributed by the DataNode, which is 11 GB (shown as 10.76 GB).

What if we want to increase the storage capacity?

For that, we have to increase the size of the Logical Volume created on the DataNode.

We extend the Logical Volume and then resize the file system on it.
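A minimal sketch, assuming the LV is /dev/vggroup/lvm1, is formatted as ext4, and is grown from 11 GB to 19 GB (the sizes seen in this article):

# Add 8 GB to the logical volume (11 GB -> 19 GB)
lvextend -L +8G /dev/vggroup/lvm1

# Grow the ext4 file system to fill the extended volume; works while mounted
resize2fs /dev/vggroup/lvm1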

Now, checking the Hadoop report again, we can see the increased size, achieved on the fly.

Now, what if there is a requirement to decrease the size?

Firstly, unmount the mount point, then check and repair the file system.

Then shrink the file system and reduce the size of the Logical Volume.

Finally, remount the Logical Volume on the /dn1 folder.
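Putting those steps together, a hedged sketch (assuming the same LV path and a target size of 14 GB; lvreduce -r -L 14G would handle the file-system step automatically):

# 1. Unmount the logical volume
umount /dn1

# 2. Check and repair the file system (required before shrinking)
e2fsck -f /dev/vggroup/lvm1

# 3. Shrink the file system first, then reduce the logical volume to 14 GB
resize2fs /dev/vggroup/lvm1 14G
lvreduce -L 14G /dev/vggroup/lvm1

# 4. Remount the volume on the DataNode directory
mount /dev/vggroup/lvm1 /dn1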

Now, we can see the size reduced from 19 GB to 14 GB in the Hadoop report.

What if we want to extend the size by 19 GB more, i.e., 14 + 19 = 33 GB, but we have only 25 GB in the volume group?

We will get an insufficient space error.

What's the solution for this?

For this, we have to create another Physical Volume and add it to the Volume Group that we already have, by extending the VG.

Then we can extend the LV.

To extend the VG we can use the command: vgextend <name of VG> <disk-name>
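A minimal sketch, assuming the newly attached disk appears as /dev/xvdg (hypothetical device name):

# Initialize the new disk as a physical volume
pvcreate /dev/xvdg

# Add it to the existing volume group
vgextend vggroup /dev/xvdg

# Confirm the new total size of the volume group
vgdisplay vggroup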

Now we can see that the size of the VG has increased to 60 GB.

Now we can further increase the size of the LV, up to the size of the Volume Group.
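A sketch of that final extension (assuming the same LV path and a target size of 43 GB):

# Grow the logical volume to 43 GB (must fit in the free space of the VG)
lvextend -L 43G /dev/vggroup/lvm1

# Grow the ext4 file system to fill the extended volume
resize2fs /dev/vggroup/lvm1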

So finally, we can see the increased size, i.e., 43 GB.

Important Point: If we want to increase the size of the Logical Volume beyond the size of the Volume Group, we have to create another Physical Volume and use it to extend the Volume Group. Only then can we increase the size of the Logical Volume.

Thank You for reading my article.😊
