Troubleshooting

This section provides information for troubleshooting and recovery.

Precheck fails for Hadoop when you activate version 4.7x after upgrading from 4.6.2x

Issue

When you upgrade a multi-cluster environment from Unravel version 4.6.2x and activate version 4.7x, the Precheck fails with a Hadoop error.

Solution

This error is caused by the com.unraveldata.multicluster.default_cluster.enabled property, which indicates whether the core node directly monitors the Hadoop cluster. In Unravel 4.6.2x, this property is set to true by default.

If you are not using the core node for Hadoop monitoring, you must manually set this property to false before you upgrade a multi-cluster environment from Unravel 4.6.2x to 4.7x. This prevents the Hadoop error in the Precheck.

Before you upgrade to v4.7x, do the following:

Stop Unravel.

<Unravel installation directory>/unravel/manager stop

Set the com.unraveldata.multicluster.default_cluster.enabled property to false.

<Unravel installation directory>/unravel/manager config properties set com.unraveldata.multicluster.default_cluster.enabled false

Apply the changes.

<Unravel installation directory>/unravel/manager refresh files

Start Unravel.

<Unravel installation directory>/unravel/manager start
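If you prefer to script these steps, the following is a minimal Bash sketch. The function name `disable_default_cluster` is hypothetical, and the sketch assumes you pass the full path to the manager tool, for example `<Unravel installation directory>/unravel/manager`. It runs the same four manager commands listed above and stops at the first command that fails, so that Unravel is not restarted after a failed step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: applies the pre-upgrade fix for the multi-cluster
# Precheck error. Argument 1 is assumed to be the full path to the manager tool.
disable_default_cluster() {
  local manager="$1"
  # Each step returns early on failure, so a failed step never leads to a restart.
  "$manager" stop || return 1
  "$manager" config properties set \
    com.unraveldata.multicluster.default_cluster.enabled false || return 1
  # Regenerate the configuration files so the new property value takes effect.
  "$manager" refresh files || return 1
  "$manager" start
}
```

For example, assuming Unravel is installed in /opt, you would call `disable_default_cluster /opt/unravel/manager`.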

Diagnosing issues from log files

If you encounter issues during installation, first check the following log files to diagnose them.

The following log files are located in the <Unravel installation directory>/logs directory.

Log file	Description
`manager.log`	Every command that runs through the manager tool logs to this file. For more detail, run the manager commands with `--debug`. Some commands also run the setup process, which logs to the `setup.log` file.
`setup.log`	During a first-time install, before the directory structure is created, this log is written to `/tmp/unravel-setup-YYYYMMDD-HHMMSS.log`, so that a log exists even if the installation breaks early in the process. As soon as the log directory is created, logging moves to `<Unravel installation directory>/logs/setup.log`, and the contents of the `/tmp/unravel-setup-YYYYMMDD-HHMMSS.log` file are copied to `setup.log`.
`monit.log`	Records the activity of the service monitor (monit), which tracks the status of the services and restarts any service that stops because of an issue. Logs generated by the stop/start scripts normally go to the respective `.control.log` file, but in some cases you can find them here.
`.control.log`	Each Unravel service has a stop/start script that logs to its own `.control.log` file; for example, Postgres logs to `pgsql.control.log`. This file contains the messages generated by the stop/start script and anything written to the standard output or standard error stream. Normal Postgres activity is logged to `pgsql.log`.

Sample `setup.log` output:

2020-06-09 21:23:12 Sending logs to: /tmp/unravel-setup.log
2020-06-09 21:23:12 Prepare to install with: /home/unravel/edge.yaml
2020-06-09 21:23:14 Sending logs to: <Unravel installation directory>/logs/setup.log

Sample `pgsql.control.log` output:

2020-06-11 18:28:54 INFO * Starting pgsql...
waiting for server to start....
2020-06-11 11:28:54.773 PDT [36976] LOG: listening on IPv4 address "0.0.0.0", port 4339
2020-06-11 11:28:54.773 PDT [36976] LOG: listening on IPv6 address "::", port 4339
2020-06-11 11:28:54.773 PDT [36976] LOG: listening on Unix socket "<Unravel installation directory>/run/.s.PGSQL.4339"
2020-06-11 11:28:54.860 PDT [36976] LOG: redirecting log output to logging collector process
2020-06-11 11:28:54.860 PDT [36976] HINT: Future log output will appear in directory "<Unravel installation directory>/logs".
done
server started
2020-06-11 18:28:54 INFO * Starting pgsql... Started
2020-06-11 18:30:07 INFO * Stopping pgsql...
waiting for server to shut down.... done
server stopped
2020-06-11 18:30:08 INFO * Stopping pgsql... Stopped
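As a first pass over these logs, you can search them for error-like lines. The following Bash sketch is illustrative only: the function name `scan_unravel_logs` and the `error|exception|fail` pattern are assumptions, not an Unravel-defined convention. It scans every `*.log` file in the log directory (which includes the `.control.log` files) and any early `unravel-setup-*.log` files left in `/tmp`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "file:line:text" for error-like lines.
# Argument 1: the Unravel log directory (assumed <Unravel installation directory>/logs).
# Argument 2 (optional): where early setup logs live; defaults to /tmp.
scan_unravel_logs() {
  local log_dir="$1" tmp_dir="${2:-/tmp}"
  local files=() f
  # Collect only files that exist; unmatched globs stay literal and are skipped.
  for f in "$log_dir"/*.log "$tmp_dir"/unravel-setup-*.log; do
    [ -f "$f" ] && files+=("$f")
  done
  [ "${#files[@]}" -gt 0 ] || return 1
  # -H: file name, -n: line number, -i: ignore case, -E: extended regex.
  grep -H -n -i -E 'error|exception|fail' "${files[@]}"
}
```

For example, run `scan_unravel_logs '<Unravel installation directory>/logs'`. If the output is not conclusive, rerun the failing manager command with `--debug` and check `manager.log` again.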

The installation process is broken

Issue:

The installation process breaks.

Solution:

If the installation process breaks, do the following:

Stop Unravel.
```
manager stop
```
If the manager does not work, go to the services directory, where each service has a stop.sh script. Stop the service monitor (monit) first, and then run the stop.sh script of each remaining service.
If the stop.sh scripts are not available, send SIGTERM to all the services, starting with the service monitor (monit).
Caution
Avoid using SIGKILL, because it may corrupt files.
Reinstall Unravel using the content in the data directory.
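The manual fallback in the steps above (stop monit first, then run each service's stop.sh) can be sketched in Bash as follows. The directory layout is a hypothetical assumption: each service is taken to live in its own subdirectory of the services directory with an executable stop.sh, and the service monitor's subdirectory is assumed to be named `monit`. The function name `stop_unravel_services` is also hypothetical. Check your actual services directory before relying on this.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stops services via their stop.sh scripts, monit first.
# Assumed layout: <services_dir>/<service>/stop.sh, with monit in <services_dir>/monit.
stop_unravel_services() {
  local services_dir="$1" script
  # Stop the service monitor first so it does not restart services stopped below.
  if [ -x "$services_dir/monit/stop.sh" ]; then
    "$services_dir/monit/stop.sh"
  fi
  for script in "$services_dir"/*/stop.sh; do
    [ -x "$script" ] || continue
    # monit was already stopped above.
    [ "$script" = "$services_dir/monit/stop.sh" ] && continue
    "$script"
  done
}
```

Stopping monit first matters because monit restarts services that stop unexpectedly, which would undo the later stop.sh calls.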

Files are deleted or corrupted

Issue:

Unravel files have been deleted or corrupted.

Solution:

Stop Unravel.
Assuming that you have installed Unravel in /opt, run the following command:
```
/opt/unravel/manager refresh files
```
This regenerates all the scripts and configuration files.
If the refresh command does not regenerate the files, or if the manager itself is broken, check <Unravel installation directory>/data/conf/current.yaml, which shows the currently installed version, and run the following command, replacing X.Y.Z with that version:
```
<Unravel installation directory>/versions/X.Y.Z/setup --config=<Unravel installation directory>/data/conf/unravel.yaml
```

Start Unravel.

<Unravel installation directory>/unravel/manager start

Unravel software is deleted

Issue:

The Unravel software has been deleted.

Solution:

Stop Unravel.
Check <Unravel installation directory>/data/conf/current.yaml for the current version that is installed.
Unpack that same version in the exact location where it was deployed earlier.
```
tar zxf unravel-SAME-VERSION.tar.gz -C /opt
```

Run the setup for that version, replacing X.Y.Z with the installed version:

<Unravel installation directory>/versions/X.Y.Z/setup --config=<Unravel installation directory>/data/conf/unravel.yaml

Start the manager.

<Unravel installation directory>/unravel/manager start

Restoring Unravel from a backup

Issue:

You need to restore Unravel from a backup.

Solution:

Stop Unravel.
Restore the backup of the data directory.
Open data/conf/unravel.effective.yaml and check the following key paths:
- base: <Unravel installation directory>
- data: <Unravel installation directory>/data
Make sure that the data directory is restored to the location given by the data path.
Make sure the unravel user has full access to, and ownership of, the base location and everything in it.
Check <Unravel installation directory>/data/conf/current.yaml for the current version that is installed.
Unpack that same version in the exact location where it was deployed earlier.
```
tar zxf unravel-SAME-VERSION.tar.gz -C /opt
```

Run the setup for that version, replacing X.Y.Z with the installed version:

<Unravel installation directory>/versions/X.Y.Z/setup --config=<Unravel installation directory>/data/conf/unravel.yaml

Start Unravel.

<Unravel installation directory>/manager start
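To verify the ownership step above before starting Unravel, you can list anything under the base location that the Unravel user does not own or cannot read and write. This is a minimal Bash sketch; the function name `check_unravel_ownership` is hypothetical, and it assumes the service account is named `unravel` unless you pass another user name or numeric UID. It uses GNU find's symbolic `-perm` mode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints every path under BASE that is not owned by USER
# or lacks owner read/write permission. No output means the check passed.
check_unravel_ownership() {
  local base="$1" user="${2:-unravel}"
  # "! -user" inverts the owner match; "-perm -u=rw" requires both owner bits.
  find "$base" \( ! -user "$user" -o ! -perm -u=rw \) -print
}
```

For example, `check_unravel_ownership /opt/unravel` (assuming Unravel was installed in /opt). If it prints any paths, you can fix ownership with `chown -R unravel /opt/unravel` before running `manager start`.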

Troubleshooting Cloudera Distribution of Apache Hadoop (CDH) issues

Symptom	Problem	Remedy
`hadoop fs -ls /user/unravel/HOOK_RESULT_DIR/` indicates that the directory does not exist	The Unravel Server RPM is not yet installed, the Unravel Server RPM is installed on a different HDFS cluster, the HDFS home directory for Unravel does not exist, or Kerberos/Sentry actions are needed.	Install the Unravel RPM on the Unravel host, or verify that the `unravel` user exists and has a `/user/unravel/` directory in HDFS with write access to it.
`ClassNotFound` error for `com.unraveldata.dataflow.hive.hook.UnravelHiveHook` during Hive query execution	The Unravel Hive hook JAR was not found in `$HIVE_HOME/lib/`.	Confirm that the `UNRAVEL_SENSOR` parcel was distributed and activated in Cloudera Manager, or put the Unravel Hive hook JAR corresponding to <hive-version> in <jar-destination> on each gateway as follows: `cd /usr/local/unravel/hive-hook/; cp unravel-hive-<hive-version>*hook.jar <jar-destination>`


Oozie shell action fails with ClassNotFoundException on Hcat call after Unravel Hive Hooks were added to the cluster

HCatalog is part of Apache Hive, so the HCatalog call picks up the Hive Hook configuration, but the libraries that execute the Hive Hook are missing.

Because this is a shell action, the libraries must exist locally on every node so that the Sqoop command can locate them when it runs. Add the Unravel Hive Hook JAR to /var/lib/sqoop, or to wherever the hive-hcatalog JARs are located in the cluster.
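To place the JAR next to the hive-hcatalog JARs on a node, the following Bash sketch may help; you would run it on every node. The function name `copy_hook_next_to_hcatalog` is hypothetical, the `hive-hcatalog*.jar` file-name pattern is an assumption about how those JARs are named in your cluster, and it relies on GNU find's `-printf`. Pass the Unravel Hive Hook JAR for your Hive version and a directory tree to search.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copies HOOK_JAR into every directory under SEARCH_ROOT
# that contains a hive-hcatalog JAR. Run it locally on each node.
copy_hook_next_to_hcatalog() {
  local hook_jar="$1" search_root="$2" dir
  # %h prints the directory of each match; sort -u removes duplicates and
  # also finishes reading find's output before any copy starts.
  find "$search_root" -type f -name 'hive-hcatalog*.jar' -printf '%h\n' | sort -u |
    while IFS= read -r dir; do
      cp "$hook_jar" "$dir/" && echo "copied to $dir"
    done
}
```

If you prefer the /var/lib/sqoop location instead, a plain `cp` of the hook JAR into that directory on each node is enough.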

Unravel stop and start fails with an error

Issue:

When Unravel is stopped and restarted immediately, the following error is displayed:

[Errno 1] Operation not permitted
[Errno 1] Operation not permitted
INS00160: Process '3366' is not owned by unravel
INS00161: Process '3366' is not owned by unravel, this can come from a stale pid file '/opt/unravel/run/mysql.pid'

Solution

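After an ungraceful shutdown, the PID files remain in place. If the operating system later reuses one of those PIDs for another process, Unravel finds a stale PID file that points to a process it does not own, which causes the errors above. Make sure that Unravel is stopped (it is, if the server was just restarted), and then delete the PID files in /opt/unravel/run.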
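A more cautious variant of deleting every PID file is to remove only the stale ones. The following Bash sketch assumes Linux (it reads `/proc`) and that each `*.pid` file holds a single PID; the function name `remove_stale_pid_files` is hypothetical. It keeps any PID file whose process is still alive and owned by the Unravel user, since that would suggest Unravel is not fully stopped. Pass the run directory and the numeric UID of the unravel user.

```shell
#!/usr/bin/env bash
# Hypothetical helper: removes PID files in RUN_DIR that are empty, point at a
# process that no longer exists, or point at a process owned by another UID.
remove_stale_pid_files() {
  local run_dir="$1" owner_uid="$2" pid_file pid
  for pid_file in "$run_dir"/*.pid; do
    [ -f "$pid_file" ] || continue
    # Keep digits only, dropping the trailing newline and any stray characters.
    pid=$(tr -dc '0-9' < "$pid_file")
    if [ -z "$pid" ] || [ ! -d "/proc/$pid" ] ||
       [ "$(stat -c %u "/proc/$pid" 2>/dev/null)" != "$owner_uid" ]; then
      rm -f "$pid_file" && echo "removed $pid_file"
    fi
  done
}
```

For example, `remove_stale_pid_files /opt/unravel/run "$(id -u unravel)"`, run after confirming that Unravel is stopped.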

Amazon EMR: Unravel sensor properties are overwritten when a configuration is supplied for an Instance group on a running cluster

Issue:

When you supply a configuration for an Instance group in a running cluster, the Unravel sensor properties added by the bootstrap script are overwritten.

Solution

Include the Unravel sensor properties in the new configuration that you supply, so that they are applied together with your other changes.
