Completely Uninstall and Remove HDP from Nodes

Sometimes you might find yourself in a situation where it becomes inevitable to clean a node of an HDP install. Just as hardly any two installs are really the same, cleaning a node is not a straight path either. Removing the installed packages with the system's package manager, as the documentation advises, is a good start, but some folders might remain and databases will be ignored entirely.

This post does not make any guarantees about completely cleaning HDP from any node. If the same has to be done on multiple nodes, it is beneficial to set up a distributed shell environment with pdsh.
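For example, with pdsh installed and passwordless SSH in place, the same command can be pushed to a whole group of nodes at once. The host names below are placeholders for your own cluster nodes:

# Run a command on several nodes in parallel
$ pdsh -w node[01-05].example.com 'uptime'

# Or read the target hosts from a file, one host per line
$ pdsh -w ^/tmp/hdp_nodes 'uptime'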

1. Stopping all Services

First you need to stop all HDP services still running. Doing so via the Ambari control panel is a good idea, but if that does not work or does not stop all services, the commands below make sure to kill every process that still contains either ‘hdp’ or ‘hadoop’ somewhere in its command line.

$ kill -KILL `ps aux | grep '[h]dp' | awk '{ print $2 }'`
$ kill -KILL `ps aux | grep '[h]adoop' | awk '{ print $2 }'`

Please be aware that this might affect other services that were not part of an HDP install but still contain ‘hdp’ or ‘hadoop’ in their runtime environment.
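Before resorting to kill, it is usually worth stopping the Ambari processes cleanly and reviewing what is actually still running. The commands below are only a small sketch; adjust the grep pattern to your setup:

# Stop the Ambari agent (and the server, if this node hosts it)
$ ambari-agent stop
$ ambari-server stop

# Review what would be killed before actually doing it
$ ps aux | grep -E '[h]dp|[h]adoop'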

2. Ambari Agent Cleanup Script

The Ambari agent ships with a host cleanup script named HostCleanup.py that already takes care of cleaning up most of the components that are part of an HDP install. For more guidance on how to use it, refer to the help output of the script:

$ python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py -h

The script will try to delete and clean packages, users, directories, repositories, processes, and alternatives on the node.

Usage: HostCleanup.py [options]

Options:
  -h, --help            show this help message and exit
  -v, --verbose         output verbosity.
  -f FILE, --file=FILE  host check result file to read.
  -o FILE, --out=FILE   log file to store results.
  -k SKIP, --skip=SKIP  (packages|users|directories|repositories|processes|alt
                        ernatives). Use , as separator.
  -s, --silent          Silently accepts default prompt values

If you, for example, plan to keep the users for a new install, this could be done like this:

$ python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py -s -k users
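Multiple categories can be skipped as well by listing them comma-separated, for example to keep both the users and the directories the script would otherwise delete:

$ python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py -s -k users,directories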

3. Remove Packages

The cleanup script usually removes a good portion of the installed packages. You can also follow the steps provided in the HDP documentation to remove the packages manually.

Still, some packages could be left behind, which is why the commands below make sure those get removed as well:

$ yum erase -y `yum list installed | grep @HDP-2 | awk '{ print $1 }'`
$ yum erase -y `yum list installed | grep 2_3_ | awk '{ print $1 }'`

Please note that the commands make certain assumptions about the installed HDP version, in this case HDP 2.3. If your version differs, simply adjust the HDP-2 and 2_3_ patterns above.
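Before erasing anything it is worth previewing what the patterns actually match. For an HDP 2.6 install, for instance, the version infix would be 2_6_ (just an example, adjust to your version):

# Dry run: list what would be handed to yum erase
$ yum list installed | grep @HDP
$ yum list installed | grep 2_6_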

4. Clean Folders

If you followed the above steps you already have a pretty clean environment. But there might still be some folders left that you would like to remove from your system. Some folders, like those on the JBOD drives attached to a DataNode, are ones no automated process is likely going to offer to delete, because it is just too risky to introduce data loss at that stage.
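A quick look at how much data still sits on those drives helps to decide whether they really should go. The /grid* mount points below match the list further down; adjust them to your layout:

# Check how much data is still stored under the DataNode directories
$ du -sh /grid*/hadoop 2>/dev/null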

Deleting folders should rarely be done in an automated script and always with caution. Please verify that the list below fits what you want to achieve prior to execution (a scripted variant with a dry-run switch is sketched after the list):

$ rm -rf <directory>   # run separately for each of the directories listed below
# Log dirs
/var/log/ambari-metrics-monitor
/var/log/hadoop
/var/log/hbase
/var/log/hadoop-yarn 
/var/log/hadoop-mapreduce
/var/log/hive 
/var/log/oozie 
/var/log/zookeeper 
/var/log/flume 
/var/log/hive-hcatalog
/var/log/falcon 
/var/log/knox 
/var/lib/hive 
/var/lib/oozie

# DataNode HDFS dirs
/grid*/hadoop 

# Hadoop dirs
/usr/hdp
/usr/bin/hadoop  
/tmp/hadoop 
/var/hadoop 
/hadoop/*
/local/opt/hadoop

# Config dirs
/etc/hadoop
/etc/hbase
/etc/oozie
/etc/phoenix
/etc/hive 
/etc/zookeeper 
/etc/flume 
/etc/hive-hcatalog 
/etc/tez 
/etc/falcon 
/etc/knox 
/etc/hive-webhcat 
/etc/mahout 
/etc/pig
/etc/hadoop-httpfs

# PIDs
/var/run/hadoop
/var/run/hbase
/var/run/hadoop-yarn
/var/run/hadoop-mapreduce
/var/run/hive 
/var/run/oozie 
/var/run/zookeeper 
/var/run/flume
/var/run/hive-hcatalog 
/var/run/falcon 
/var/run/webhcat 
/var/run/knox 

# ZK db files 
/local/home/zookeeper/*         

# libs
/usr/lib/flume 
/usr/lib/storm 
/var/lib/hadoop-hdfs 
/var/lib/hadoop-yarn 
/var/lib/hadoop-mapreduce  
/var/lib/flume 
/var/lib/knox

# other
/var/tmp/oozie

Depending on which other services you have installed, you should also check whether the following directories exist and need to be removed:
/etc/<service_name>
/usr/lib/<service_name>
/var/lib/<service_name>
/var/log/<service_name>
/var/run/<service_name>
/var/tmp/<service_name>
/tmp/<service_name>

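If you do decide to script the removal, a small loop with a dry-run switch keeps the risk manageable. This is only a minimal sketch; the directory list is shortened and DRY_RUN=1 just prints what would be removed:

#!/bin/bash
# Remove leftover HDP directories; set DRY_RUN=0 to actually delete.
DRY_RUN=1
DIRS="/var/log/hadoop /var/log/hbase /etc/hadoop /var/run/hadoop /usr/hdp"

for dir in $DIRS; do
  if [ -d "$dir" ]; then
    if [ "$DRY_RUN" -eq 1 ]; then
      echo "Would remove $dir"
    else
      rm -rf "$dir"
    fi
  fi
done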

5. Clean Repository

Especially if you are planning a new install on the same node, it is important to clean up the repository definitions to avoid conflicting package resolution.

For Red Hat:

$ rm -rf /etc/yum.repos.d/HDP*.repo
$ yum clean all
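Afterwards you can verify that no HDP repository is still known to yum; the grep should return nothing once the repo files are gone:

$ yum repolist | grep -i hdp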
