Unravel installation and upgrade issues
This section provides information for troubleshooting and recovery.
Edge node fails to communicate with the core node displaying error in the daemon log: Caused by: java.io.IOException: User limit of inotify watches reached
Issue
Sometimes the edge node fails to communicate with the core node and a java.net.ConnectException: Connection refused error is displayed. The daemon log shows that the failure occurs because the user limit for inotify watches has been reached: Caused by: java.io.IOException: User limit of inotify watches reached
Solution
To resolve this issue, you can check and increase the threshold limit of the inotify watches on the core node as follows:
On the core node, check whether the maximum number of inotify watches has been reached.
After confirming that the inotify watches have reached the upper limit, open the /etc/sysctl.conf file as the root user.
Using an editor, update the /etc/sysctl.conf file and set the kernel parameter fs.inotify.max_user_watches to a higher limit. For example:
fs.inotify.max_user_watches=524288
Apply the changes.
sysctl -p
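The check in the first step can be sketched roughly as follows. This is not an Unravel tool: the function names are hypothetical, and the sketch assumes a Linux /proc layout in which each inotify watch appears as a line starting with "inotify" in a process's fdinfo files. The count covers every process visible to the caller, while the kernel limit applies per user, so treat the result as an approximation.

```shell
#!/bin/sh
# Hypothetical helpers; the optional argument exists only so that the
# functions can be pointed at a test directory instead of /proc.

# Print the configured value of fs.inotify.max_user_watches.
inotify_limit() {
    proc_root="${1:-/proc}"
    cat "$proc_root/sys/fs/inotify/max_user_watches"
}

# Count inotify watches held by all processes visible to the caller.
count_inotify_watches() {
    proc_root="${1:-/proc}"
    # Each watch is one line starting with "inotify" in an fdinfo file.
    cat "$proc_root"/[0-9]*/fdinfo/* 2>/dev/null | grep -c '^inotify' || true
}
```

Running both functions as root on the core node and comparing the two numbers gives an indication of how close the node is to the limit.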
Upgrading from 4.6.2x, the Precheck fails for Hadoop when you activate the 4.7x version
Issue
When you upgrade from Unravel version 4.6.2x in a multi-cluster environment and activate version 4.7x, the Precheck fails with the following Hadoop error:
Solution
This happens because of the com.unraveldata.multicluster.default_cluster.enabled property, which indicates whether the core node directly monitors the Hadoop cluster. By default, this property is set to true in Unravel 4.6.2x.
However, if you are not using the core node for Hadoop monitoring, you must manually set this property to false before you upgrade in a multi-cluster environment. This eliminates the Hadoop error in the Precheck when you upgrade from Unravel version 4.6.2x to 4.7x.
Before you upgrade to v4.7x, do the following:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Set the com.unraveldata.multicluster.default_cluster.enabled property to false.
<Unravel installation directory>/unravel/manager config properties set com.unraveldata.multicluster.default_cluster.enabled false
Apply the changes.
<Unravel installation directory>/unravel/manager refresh files
Start Unravel.
<Unravel installation directory>/unravel/manager start
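The four steps above can be chained so that a failure in one step prevents the next from running. This is a minimal sketch: the function name is hypothetical, and the path to the manager script is passed in as an argument because it depends on where Unravel is installed. The manager subcommands are the ones listed in the steps.

```shell
#!/bin/sh
# Run the pre-upgrade steps in order, stopping at the first failure.
# $1 is the path to the manager script, for example
# <Unravel installation directory>/unravel/manager.
disable_default_cluster() {
    manager="$1"
    "$manager" stop &&
    "$manager" config properties set \
        com.unraveldata.multicluster.default_cluster.enabled false &&
    "$manager" refresh files &&
    "$manager" start
}
```

A non-zero exit status from the function means that one of the steps failed and the later steps were skipped.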
Supplying a configuration for an instance group in a running cluster on EMR overwrites Unravel Sensor properties added by the bootstrap script
Issue
If you supply a configuration for an instance group in a running cluster on EMR, it overrides the Unravel Sensor properties added by the bootstrap script.
Solution
You must include the Unravel Sensor properties along with the new or modified configurations that you supply.
Diagnosing issues from log files
Whenever you face any issues during installation, you should first check the following log files to diagnose the issues:
The installation process is broken
Issue:
The installation process gets broken.
Solution:
Whenever the installation process gets broken, do the following:
Stop Unravel.
manager stop
If the manager does not work, open the services directory. Each service has a stop.sh script. Stop the service monitor (monit) first, and then run the stop.sh script of each remaining service.
If you do not have the stop.sh scripts, send SIGTERM to all the services, starting with the service monitor (monit).
Caution
Avoid using SIGKILL since that may cause some file corruption.
Reinstall Unravel using the content in the data directory.
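The manual stop described above, for the case where the manager does not work, might be scripted along these lines. This is only a sketch under an assumed layout: it supposes that each service has its own subdirectory under the services directory containing stop.sh, and that the service monitor lives in a subdirectory named monit. Neither detail is confirmed here, so check the actual directory before using anything like this. The function name is hypothetical.

```shell
#!/bin/sh
# Stop the service monitor (monit) first, then run every other stop.sh.
# ASSUMED layout: <services_dir>/<name>/stop.sh, monitor in <services_dir>/monit.
stop_all_services() {
    services_dir="$1"
    if [ -x "$services_dir/monit/stop.sh" ]; then
        "$services_dir/monit/stop.sh"
    fi
    for script in "$services_dir"/*/stop.sh; do
        [ -x "$script" ] || continue
        # The monitor was already stopped above.
        [ "$script" = "$services_dir/monit/stop.sh" ] && continue
        "$script"
    done
}
```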
Files got deleted or corrupted
Issue:
Files got deleted or corrupted.
Solution:
Stop Unravel.
Assuming that you have installed Unravel in /opt, run the following command:
/opt/unravel/manager refresh files
This regenerates all the scripts and configuration files.
If the refresh command did not regenerate the files or the manager is broken, check <Unravel installation directory>/data/conf/current.yaml, which shows the version that is currently installed, and then run the following:
<Unravel installation directory>/versions/X.Y.Z/setup --config=<Unravel installation directory>/data/conf/unravel.yaml
Start Unravel.
<Unravel installation directory>/unravel/manager start
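The refresh-then-fallback logic above can be sketched as a small function. The function name is hypothetical, and the sketch assumes that a single directory (for example /opt/unravel) contains the manager script, the versions directory, and the data directory. The version must be read from current.yaml by hand and passed in, since the format of that file is not described here.

```shell
#!/bin/sh
# $1: directory assumed to contain manager, versions/ and data/
# $2: installed version, as shown in data/conf/current.yaml
regenerate_files() {
    unravel_home="$1"
    version="$2"
    # First choice: let the manager regenerate scripts and configuration.
    if "$unravel_home/manager" refresh files; then
        return 0
    fi
    # Fallback: rerun setup for the installed version with the saved config.
    "$unravel_home/versions/$version/setup" \
        --config="$unravel_home/data/conf/unravel.yaml"
}
```

The fallback also runs when the manager script is missing or not executable, which matches the "manager is broken" case.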
Unravel software got deleted
Issue:
Unravel software got deleted.
Solution:
Stop Unravel.
Check <Unravel installation directory>/data/conf/current.yaml for the current version that is installed.
Unpack that same version in the exact location where it was deployed earlier.
tar zxf unravel-SAME-VERSION.tar.gz -C /opt
Run the following:
<Unravel installation directory>/versions/X.Y.Z/setup --config=<Unravel installation directory>/unravel/data/conf/unravel.yaml
Start the manager.
<Unravel installation directory>/unravel/manager start
Restoring Unravel from a backup
Issue:
How to restore Unravel from a backup?
Solution:
Stop Unravel.
Restore the backup of the data directory.
Open data/conf/unravel.effective.yaml and check for the following key paths:
base: <Unravel installation directory>
data: <Unravel installation directory>/data
Make sure that the data directory is restored to the right location.
Make sure the unravel user has full access and ownership of the base location and everything in it.
Check <Unravel installation directory>/data/conf/current.yaml for the current version that is installed.
Unpack that same version in the exact location where it was deployed earlier.
tar zxf unravel-SAME-VERSION.tar.gz -C /opt
Run the following:
<Unravel installation directory>/versions/X.Y.Z/setup --config=<Unravel installation directory>/data/conf/unravel.yaml
Start Unravel.
<Unravel installation directory>/manager start
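The ownership requirement in the restore steps can be verified before running setup. The following sketch uses the standard find utility to list anything under the base location that is not owned by the expected user. The function name is hypothetical.

```shell
#!/bin/sh
# List everything under $1 that is NOT owned by user $2 (name or numeric UID).
# No output means that ownership is consistent.
list_foreign_files() {
    base="$1"
    owner="$2"
    find "$base" ! -user "$owner" -print
}
```

For example, `list_foreign_files /opt/unravel unravel` would print any restored files that still belong to another user. Such files can then be corrected with chown, run as root.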
Troubleshooting Cloudera Distribution of Apache Hadoop (CDH) issues
Symptom | Problem | Remedy |
---|---|---|
|
| Install Unravel RPM on Unravel host. or Verify that user |
| Unravel hive hook JAR was not found in in | Confirm that the or Put the Unravel hive-hook JAR corresponding to cd /usr/local/unravel/hive-hook/; cp unravel-hive- |
Oozie shell action fails with ClassNotFoundException on Hcat call after Unravel Hive Hooks were added to the cluster
HCatalog is part of Apache Hive. In such a case, the Hive Hook configuration is found, but the libraries that execute Hive Hook are missing.
Since this is a shell action, the libraries must exist locally on every node so that the Sqoop command can locate them during execution. You can add the Unravel Hive Hook JAR to /var/lib/sqoop or to wherever the hive-hcatalog JARs are located in the cluster.
Unravel stop and start fails with an error
Issue:
When Unravel is stopped and restarted immediately, the following error is displayed:
[Errno 1] Operation not permitted
[Errno 1] Operation not permitted
INS00160: Process '3366' is not owned by unravel
INS00161: Process '3366' is not owned by unravel, this can come from a stale pid file '/opt/unravel/run/mysql.pid'
Solution:
After an ungraceful shutdown, the PID files remain, and if a PID is reused by another process, it can cause problems. Ensure that Unravel is stopped (it will be if the server was just restarted), and then delete the PID files in /opt/unravel/run.
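Before deleting anything, it can help to see which PID files are actually stale. The sketch below is an illustration only: the function name is hypothetical, it is Linux-specific (it relies on /proc and on stat -c from GNU coreutils or BusyBox), and it assumes that each PID file holds a single numeric PID on its first line.

```shell
#!/bin/sh
# Print every *.pid file in $1 whose PID is either not running or belongs
# to a process owned by a UID other than $2.
list_stale_pid_files() {
    run_dir="$1"
    expected_uid="$2"
    for pid_file in "$run_dir"/*.pid; do
        [ -f "$pid_file" ] || continue
        pid=$(head -n 1 "$pid_file" | tr -cd '0-9')
        if [ -z "$pid" ] || [ ! -d "/proc/$pid" ]; then
            echo "$pid_file"    # no such process
        elif [ "$(stat -c %u "/proc/$pid")" != "$expected_uid" ]; then
            echo "$pid_file"    # PID reused by another user's process
        fi
    done
}
```

A call such as `list_stale_pid_files /opt/unravel/run "$(id -u unravel)"` lists the candidates for removal. A PID that has been reused by a different process of the same user is not detected, so this check complements rather than replaces confirming that Unravel is stopped.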
Rolling back after a failed upgrade
Issue:
The upgrade of Unravel fails.
Solution:
You can roll back the upgrade of Unravel to the release from which you upgraded.
Caution
Ensure that you have taken a backup of the data directory and the external database (if you use one). For instructions, see Upgrading Unravel.
To revert to the original release, perform the following steps:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Restore the data directory from the archive.
If using an external database, restore the database from the backup.
Note
For the details on how to restore the database, refer to the documentation for your corresponding database.
Run the following command to reconfigure Unravel using the restored data:
/opt/unravel/versions/<ORIGINAL VERSION>/setup --config /opt/unravel/data/conf/unravel.yaml