Installing Unravel in a multi-cluster environment

The multi-cluster feature lets you manage multiple clusters from a single Unravel installation. Unravel 4.7 supports managing one or more clusters of the same cluster type. Supported cluster types include Cloudera Distribution of Apache Hadoop (CDH), Cloudera Data Platform (CDP), Hortonworks Data Platform (HDP), and Amazon Elastic MapReduce (EMR).

Note

Unravel multi-cluster support is available only for fresh installs. Unravel does not support multi-cluster management of combined on-prem and cloud clusters.

Multi-cluster deployment involves installing Unravel on the core node and one or more edge nodes. For more about the multi-cluster architecture, refer to the Multi-cluster deployment layout. The following image depicts the basic layout of a multi-cluster deployment.

To install and configure Unravel in a multi-cluster setup, check that the prerequisites are fulfilled, and then complete the two procedures that follow, first on the core node and then on each edge node.

After the installation, you can also set additional configurations if required.

1. Install and set up Unravel on core node

To install and set up Unravel on the core node, do the following:

Notice

If you already have a single-cluster installation of Unravel, you can skip the following instructions to set up Unravel on a core node and proceed to Step 2 (Install and set up Unravel on the edge node).

1. Download Unravel

2. Deploy Unravel binaries

Unravel binaries are available as a tar file or RPM package. You can deploy the Unravel binaries in any directory on the server. However, the user who installs Unravel must have write permission to the directory where the Unravel binaries are deployed.

If the binaries are deployed to <Unravel_installation_directory>, Unravel will be available in <Unravel_installation_directory>/unravel. The directory layout for both the tar file and the RPM package will be unravel/versions/<Directories and files>.

Option 1: Deploy Unravel from a tar file

The following steps to deploy Unravel from a tar file must be performed by a user who will run Unravel.

Create an Installation directory.
```
mkdir /path/to/installation/directory
```
For example: mkdir /opt/unravel
Note
Some locations may require root access to create a directory. In such a case, after the directory is created, change the ownership to unravel user and continue with the installation procedure as the unravel user.
```
chown -R username:groupname /path/to/installation/directory
```
For example: chown -R unravel:unravelgroup /opt/unravel
Extract the Unravel tar file into the installation directory, which was created in the first step. After you extract the contents of the tar file, an unravel directory is created within the installation directory.
```
tar -zxf unravel-<version>.tar.gz -C /<Unravel-installation-directory>
```
For example: tar -zxf unravel-4.7.x.x.tar.gz -C /opt
The unravel directory will be available within /opt.
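The extraction step can be wrapped in a small helper that checks its inputs first. This is a minimal sketch, not part of Unravel: `deploy_unravel_tar` is a hypothetical function name, and it assumes the tar file unpacks into a top-level unravel directory, as described above. It does not change ownership, so run chown separately if the directory was created as root.

```shell
# Sketch only: deploy_unravel_tar is a hypothetical helper, not an Unravel command.
# Usage: deploy_unravel_tar <path to unravel tar.gz file> <directory to extract into>
deploy_unravel_tar() (
    tarball=$1
    parent_dir=$2
    # Fail early if the tar file is missing or the target is not writable.
    [ -f "$tarball" ] || { echo "tar file not found: $tarball" >&2; exit 1; }
    mkdir -p "$parent_dir" || exit 1
    [ -w "$parent_dir" ] || { echo "no write permission: $parent_dir" >&2; exit 1; }
    # Extract; the archive is assumed to unpack into an unravel directory.
    tar -zxf "$tarball" -C "$parent_dir" || exit 1
    [ -d "$parent_dir/unravel" ] || { echo "unravel directory not found after extraction" >&2; exit 1; }
    # Print the resulting Unravel directory.
    echo "$parent_dir/unravel"
)
```

For example, `deploy_unravel_tar unravel-4.7.x.x.tar.gz /opt` would print /opt/unravel on success.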

Option 2: Deploy Unravel from an RPM package

Important

A root user should perform the following steps to deploy Unravel from an RPM package. After the RPM package is deployed, the remaining installation procedures should be performed by the unravel user.

Create an installation directory.
```
mkdir /usr/local/unravel
```
Run the following command:
```
rpm -i unravel-<version>.rpm
```
For example: rpm -i unravel-4.7.x.x.rpm
If you want to install to a different location, use the --prefix option. For example:
```
mkdir /opt/unravel
chown -R username:groupname /opt/unravel
rpm -i unravel-4.7.0.0.rpm --prefix /opt
```
The Unravel directory is available in /opt.
Grant ownership of the directory to a user who runs Unravel. This user executes all the processes involved in Unravel installation.
```
chown -R username:groupname /usr/local/unravel
```
For example: chown -R unravel:unravelgroup /usr/local/unravel
The Unravel directory is available in /usr/local.
Continue with the installation procedures as the unravel user.

3. Run setup

You can install Unravel with an Unravel-supported database using the setup command. The setup command can be run with many other options. Refer to Setup options.

Whenever the setup command is run, the Precheck program runs automatically. This program detects issues that would prevent a successful installation and provides suggestions to resolve them. Check Precheck filters for the expected value for each filter.

The setup command can be run with additional parameters to install Unravel with any of the following Unravel-supported databases. Unravel managed PostgreSQL, shipped with Unravel, does not need extra parameters with the setup command.

Tip

The Unravel data and configurations are located in the data directory. By default, the installer maintains the data directory under <Unravel installation directory>/unravel/data.

It is recommended to keep the data directory outside the unravel directory. To provide a different data directory location, other than the default location, pass an extra parameter (--data-directory path/to/the/data/directory) with the setup command.

Install Unravel with PostgreSQL

This section provides instructions to install Unravel with Unravel managed PostgreSQL and external PostgreSQL.

Unravel Managed PostgreSQL

After deploying the binaries, if you are the root user, switch to Unravel user.
```
  su - <unravel user>
```
Run the setup command. Unravel managed PostgreSQL is shipped with Unravel, so no extra database parameters are needed.
Note
The commands are different based on whether the core node, where you run the setup command, is a Hadoop client node or not.
- Core node with Hadoop configuration
```
<Unravel installation directory>/unravel/versions/<Unravel version>/setup
##Example: /opt/unravel/versions/abcd.1234/setup
```
- Core node without Hadoop configuration
```
<Unravel installation directory>/unravel/versions/<Unravel version>/setup --enable-core
##Example: /opt/unravel/versions/abcd.1234/setup --enable-core
```
Precheck is run automatically when you run the setup command. Check the Precheck output to view issues that prevented a successful installation and suggestions to resolve them. You must resolve each of the issues before proceeding. See Precheck filters.
After the prechecks are resolved, you must re-login or reload the shell to execute the setup command again.

Notice

If you are using the Unravel managed PostgreSQL database and the Hive metastore is using MySQL, refer to Set up Unravel Managed PostgreSQL for Hive metastore with MySQL.

External PostgreSQL

After deploying the binaries, if you are the root user, switch to Unravel user.
```
  su - <unravel user>
```

Run the setup command and pass the extra parameters to install Unravel with an external PostgreSQL database.

Note

The commands are different based on whether the core node, where you run the setup command, is a Hadoop client node or not.

Core node with Hadoop configuration

<unravel_installation_directory>/versions/<Unravel version>/setup --external-database postgresql <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
##Example: /opt/unravel/versions/abcd.1234/setup --external-database postgresql xyz.unraveldata.com 5432 unravel_db_prod unravel unraveldata

Core node without Hadoop configuration

<unravel_installation_directory>/versions/<Unravel version>/setup --enable-core --external-database postgresql <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
##Example: /opt/unravel/versions/abcd.1234/setup --enable-core --external-database postgresql xyz.unraveldata.com 5432 unravel_db_prod unravel unraveldata

Notice

The <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD> are optional fields and are prompted if missing.
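The two variants above differ only in the --enable-core flag, and the database fields are optional. As a sketch of how the command line is assembled, the helper below only prints the command so that it can be reviewed before it is run. The function name `core_setup_command` and its yes/no argument are illustrative assumptions, not part of Unravel.

```shell
# Hypothetical helper: prints the setup command line for a core node.
# $1 = path to setup
# $2 = "yes" if the core node is a Hadoop client node, anything else otherwise
# remaining arguments = optional external database parameters
#   (engine HOST PORT SCHEMA USERNAME PASSWORD), passed through unchanged
core_setup_command() (
    setup_path=$1
    hadoop_client=$2
    shift 2
    cmd=$setup_path
    # A core node without Hadoop configuration needs --enable-core.
    [ "$hadoop_client" = "yes" ] || cmd="$cmd --enable-core"
    # Append --external-database only when database parameters were given.
    [ "$#" -eq 0 ] || cmd="$cmd --external-database $*"
    printf '%s\n' "$cmd"
)
```

For example, `core_setup_command /opt/unravel/versions/abcd.1234/setup no postgresql xyz.unraveldata.com 5432 unravel_db_prod unravel unraveldata` prints the same command as the example for a core node without Hadoop configuration.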

Precheck is run automatically when you run the setup command. Check the Precheck output to view issues that prevented a successful installation and suggestions to resolve them. You must resolve each of the issues before proceeding. See Precheck filters.
After the prechecks are resolved, you must re-login or reload the shell to execute the setup command again.
Note
In certain situations, you can skip the precheck using the setup --skip-precheck command.
For example:
```
/opt/unravel/versions/<Unravel version>/setup --skip-precheck
```
You can also skip the checks that you know can fail. For example, if you want to skip the Check limits option and the Disk freespace option, pick the command within the parentheses corresponding to these failed options and run the setup command as follows:
```
setup --filter-precheck ~check_limits,~check_freespace 
```
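When several checks need to be skipped, the filter value can be generated rather than typed by hand. A small sketch: `precheck_skip_filter` is a hypothetical helper (not an Unravel command) that prefixes each check name with ~ and joins the names with commas, matching the form of the command above.

```shell
# Hypothetical helper: prints a --filter-precheck value that skips the named checks.
precheck_skip_filter() (
    filter=""
    for check in "$@"; do
        # Each skipped check is prefixed with "~"; entries are comma-separated.
        filter="${filter:+$filter,}~$check"
    done
    printf '%s\n' "$filter"
)

# Possible usage (the check names come from the Precheck output):
#   setup --filter-precheck "$(precheck_skip_filter check_limits check_freespace)"
```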

Install Unravel with MySQL

This section provides instructions to install Unravel with Unravel managed MySQL and external MySQL.

Unravel managed MySQL

Important

The following packages must be installed for fulfilling the OS level requirements for MySQL:

numactl-libs (for libnuma.so)
libaio (for libaio.so)

After deploying the binaries, if you are the root user, switch to Unravel user.
```
  su - <unravel user>
```
Create a directory, for example, a mysql directory in /tmp. Set its permissions so that it is accessible to the user who installs Unravel.
Download the following tar files (MySQL 5.7 or 8.0) to this directory and provide the directory path when you run setup to install Unravel with Unravel managed MySQL.
For MySQL 5.7
- mysql-5.7.x-linux-glibc2.12-x86_64.tar.gz
- mysql-connector-java-5.1.x.tar.gz
For MySQL 8.0
- mysql-8.0.x-linux-glibc2.12-x86_64.tar.gz
- mysql-connector-java-8.0.x.tar.gz
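Before running setup, it can help to confirm that the directory holds both files. A minimal sketch, assuming the file names follow the patterns listed above; `check_mysql_extra_dir` is a hypothetical helper, not an Unravel command.

```shell
# Hypothetical helper: checks that a directory holds a MySQL server tar file
# and a MySQL connector tar file, matching the names listed above.
check_mysql_extra_dir() (
    dir=$1
    missing=0
    for pattern in 'mysql-[0-9]*-linux-glibc2.12-x86_64.tar.gz' 'mysql-connector-java-*.tar.gz'; do
        # ls fails when the pattern matches nothing in the directory.
        if ! ls "$dir"/$pattern >/dev/null 2>&1; then
            echo "missing: $pattern" >&2
            missing=1
        fi
    done
    # Exit status 0 means both files were found.
    exit "$missing"
)
```

For example, `check_mysql_extra_dir /tmp/mysql && echo ready` prints ready only when both files are present.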

Run the setup command and pass the extra parameter to install Unravel with Unravel managed MySQL.

Note

The commands are different based on whether the core node, where you run the setup command, is a Hadoop client node or not.

Core node with Hadoop configuration

<unravel_installation_directory>/versions/<Unravel version>/setup --extra </path/to/directory/with/MySQL/tar/files>
##Example: /opt/unravel/versions/abcd.1234/setup --extra /tmp/mysql

Core node without Hadoop configuration

<unravel_installation_directory>/versions/<Unravel version>/setup --enable-core --extra </path/to/directory/with/MySQL/tar/files>
##Example: /opt/unravel/versions/abcd.1234/setup --enable-core --extra /tmp/mysql

Precheck is run automatically when you run the setup command. Check the Precheck output to view issues that prevented a successful installation and suggestions to resolve them. You must resolve each of the issues before proceeding. See Precheck filters.
After the prechecks are resolved, you must re-login or reload the shell to execute the setup command again.

External MySQL

For installing Unravel with an external MySQL database, you must provide the JDBC connector. This can either be as a tar file or as a jar file.

Create unravel schema and user on the target database where the unravel user should have full access to the schema.

##Example:
CREATE DATABASE unravel_mysql_prod;
CREATE USER 'unravel'@'localhost' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON unravel_mysql_prod.* TO 'unravel'@'localhost';

Create a directory for MySQL in /tmp. Set its permissions so that it is accessible to the user who installs Unravel.
Add the JDBC connector to /tmp/<MySQL-directory/jdbcconnector> directory. This can be either a tar file or a jar file.

Run the setup command, as an Unravel user, to install Unravel with an external MySQL database:

Note

The commands are different based on whether the core node, where you run the setup command, is a Hadoop client node or not.

Core node with Hadoop configuration

<unravel_installation_directory>/versions/<Unravel version>/setup --extra /tmp/<mysql-jdbc-directory> --external-database mysql <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
##Example: /opt/unravel/versions/abcd.1234/setup --extra /tmp/mysql-jdbc-connector --external-database mysql xyz.unraveldata.com 3306 unravel_db_prod unravel unraveldata

Core node without Hadoop configuration

<unravel_installation_directory>/versions/<Unravel version>/setup --enable-core --extra /tmp/<mysql-jdbc-directory> --external-database mysql <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
##Example: /opt/unravel/versions/abcd.1234/setup  --enable-core --extra /tmp/mysql-jdbc-connector --external-database mysql xyz.unraveldata.com 3306 unravel_db_prod unravel unraveldata

Notice

The <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD> are optional fields and are prompted if missing.

Precheck is run automatically when you run the setup command. Check the Precheck output to view issues that prevented a successful installation and suggestions to resolve them. You must resolve each of the issues before proceeding. See Precheck filters.
After the prechecks are resolved, you must re-login or reload the shell to execute the setup command again.

Install Unravel with MariaDB

This section provides instructions to install Unravel with Unravel managed MariaDB and external MariaDB.

Unravel managed MariaDB

Important

The following packages must be installed for fulfilling the OS level requirements for MariaDB:

numactl-libs (for libnuma.so)
libaio (for libaio.so)

After deploying the binaries, if you are the root user, switch to Unravel user.
```
  su - <unravel user>
```
Create a directory, for example, a mariadb directory in /tmp. Set its permissions so that it is accessible to the user who installs Unravel.
Download the following files to this directory and provide the directory path when you run setup to install Unravel with Unravel managed MariaDB.
- mariadb-<version>-linux-x86_64.tar.gz
- mariadb-java-client-<version>.jar

Run the setup command and pass the extra parameter to install Unravel with Unravel managed MariaDB.

Note

The commands are different based on whether the core node, where you run the setup command, is a Hadoop client node or not.

Core node with Hadoop configuration

<unravel_installation_directory>/versions/<Unravel version>/setup --extra </path/to/directory/with/MariaDB/files>
##Example: /opt/unravel/versions/abcd.1234/setup --extra /tmp/mariadb

Core node without Hadoop configuration

<unravel_installation_directory>/versions/<Unravel version>/setup --enable-core --extra </path/to/directory/with/MariaDB/files>
##Example: /opt/unravel/versions/abcd.1234/setup --enable-core --extra /tmp/mariadb

Precheck is run automatically when you run the setup command. Check the Precheck output to view issues that prevented a successful installation and suggestions to resolve them. You must resolve each of the issues before proceeding. See Precheck filters.
After the prechecks are resolved, you must re-login or reload the shell to execute the setup command again.

External MariaDB

Create unravel schema and user on the target database where the unravel user should have full access to the schema.

##Example:
CREATE DATABASE unravel_mariadb_prod;
CREATE USER 'unravel'@'localhost' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON unravel_mariadb_prod.* TO 'unravel'@'localhost';

Create a directory, for example, a mariadb directory in /tmp. Set its permissions so that it is accessible to the user who installs Unravel.
Add the JDBC connector to /tmp/<MariaDB-directory/jdbcconnector> directory. This can be either a tar file or a jar file.

Run the setup command, as an Unravel user, to install Unravel with an external MariaDB database:

Note

The commands are different based on whether the core node, where you run the setup command, is a Hadoop client node or not.

Core node with Hadoop configuration

<unravel_installation_directory>/versions/<Unravel version>/setup --extra /path/to/mariadb-jdbc-directory --external-database mariadb <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
##Example: /opt/unravel/versions/abcd.1234/setup --extra /tmp/mariadb-jdbc-connector --external-database mariadb xyz.unraveldata.com 3306 unravel_db_prod unravel unraveldata

Core node without Hadoop configuration

<unravel_installation_directory>/versions/<Unravel version>/setup --enable-core --extra /path/to/mariadb-jdbc-directory --external-database mariadb <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
##Example: /opt/unravel/versions/abcd.1234/setup --enable-core --extra /tmp/mariadb-jdbc-connector --external-database mariadb xyz.unraveldata.com 3306 unravel_db_prod unravel unraveldata

Notice

The <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD> are optional fields and are prompted if missing.

Precheck is run automatically when you run the setup command. Check the Precheck output to view issues that prevented a successful installation and suggestions to resolve them. You must resolve each of the issues before proceeding. See Precheck filters.
After the prechecks are resolved, you must re-login or reload the shell to execute the setup command again.

Setup Options

Tip

Run the setup command with --help, alone or in combination with other options, for complete usage details.

<unravel_installation_directory>/versions/<Unravel version>/setup --help

setup Options	Description
-h, --help	Shows help for setup.
--config CONFIG	Specify a different path to the configuration file. For example: <unravel_installation_directory>/unravel/versions/<Unravel version>/setup --config path/to/config/directory
--enable-core	Enables core node support for non-Hadoop clusters.
--cluster-access (Edge node parameter)	Enables cluster access to the core node in a multi-cluster environment.
--data-forwarder host:port cluster-type cluster-id	Data forwarder, main unravel node.
--data-directory	Specify a different path to the data directory.
--external-database [param [param ...]]	Enable external database.
--external-database-ssl	Enable external database with SSL.
--log-file	Setup log file location. Default is `/tmp/unravel-setup-YYYYMMDD-HHMMSS.log`.
--extra DIR, -e DIR	Specify extra packages location.
--precheck	Run the preinstallation check.
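Because the default log name embeds a timestamp, the most recent setup log can be found by sorting the file names. A small sketch under that assumption; `latest_setup_log` is a hypothetical helper, not an Unravel command, and it only looks in the directory it is given (/tmp by default).

```shell
# Hypothetical helper: prints the newest setup log in a directory (default /tmp),
# relying on the unravel-setup-YYYYMMDD-HHMMSS.log naming described above.
latest_setup_log() (
    log_dir=${1:-/tmp}
    # The timestamp format sorts chronologically, so the last name is the newest.
    ls "$log_dir"/unravel-setup-*.log 2>/dev/null | sort | tail -n 1
)
```

If setup was run with --log-file, the log is wherever that option pointed, and this helper will not find it.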

Precheck

Note

In certain situations, you can skip the precheck using the setup --skip-precheck command.

For example:

/opt/unravel/versions/<Unravel version>/setup --skip-precheck

You can also skip the checks that you know can fail. For example, if you want to skip the Check limits option and the Disk freespace option, pick the command within the parentheses corresponding to these failed options and run the setup command as follows:

setup --filter-precheck ~check_limits,~check_freespace

The following is a sample of the Precheck run result:

/opt/unravel/versions/abcd.1011/setup --enable-core --extra /tmp/mysql
2021-04-06 16:30:19 Sending logs to: /tmp/unravel-setup-20210406-163019.log
2021-04-06 16:30:19 Running preinstallation check...
2021-04-06 16:30:21 Gathering information ................ Ok
2021-04-06 16:30:21 Running checks ............... Ok
--------------------------------------------------------------------------------
system
 Check limits        : PASSED
 Clock sync          : PASSED
 CPU requirement     : PASSED, Available cores: 8 cores
 Disk access         : PASSED, /opt/unravel/versions/abcd.1011/healthcheck/healthcheck/plugins/system is writable
 Disk freespace      : PASSED, 213 GB of free disk space is available for precheck dir.
 Kerberos tools      : PASSED
 Memory requirement  : PASSED, Available memory: 95 GB
 Network ports       : PASSED
 OS libraries        : PASSED
 OS release          : PASSED, OS release version: centos 7.9
 OS settings         : PASSED
 SELinux             : PASSED
Healthcheck report bundle: /tmp/healthcheck-20210406163019-xyz.unraveldata.com.tar.gz
2021-04-06 16:30:21 Found package: /tmp/mysql/mysql-5.7.27-linux-glibc2.12-x86_64.tar.gz
2021-04-06 16:30:21 Found package: /tmp/mysql/mysql-connector-java-5.1.48.tar.gz
2021-04-06 16:30:21 Prepare to install with: /opt/unravel/versions/abcd.1011/installer/installer/../installer/conf/presets/default.yaml
2021-04-06 16:30:25 Sending logs to: /opt/unravel/logs/setup.log
2021-04-06 16:30:25 Installing mysql server ........ Ok
2021-04-06 16:30:42 Instantiating templates ........ Ok
2021-04-06 16:30:47 Creating parcels .................................... Ok
2021-04-06 16:31:00 Installing sensors file ............................ Ok
2021-04-06 16:31:00 Installing pgsql connector ... Ok
2021-04-06 16:31:00 Installing mysql connector ... Ok
2021-04-06 16:31:02 Starting service monitor ... Ok
2021-04-06 16:31:07 Request start for elasticsearch_1 .... Ok
2021-04-06 16:31:07 Waiting for elasticsearch_1 for 120 sec ......... Ok
2021-04-06 16:31:14 Request start for zookeeper .... Ok
2021-04-06 16:31:14 Request start for kafka .... Ok
2021-04-06 16:31:14 Waiting for kafka for 120 sec ...... Ok
2021-04-06 16:31:16 Waiting for kafka to be alive for 120 sec ..... Ok
2021-04-06 16:31:20 Initializing mysql ... Ok
2021-04-06 16:31:27 Request start for mysql .... Ok
2021-04-06 16:31:27 Waiting for mysql for 120 sec ...... Ok
2021-04-06 16:31:29 Creating database schema ........ Ok
2021-04-06 16:31:31 Generating hashes .... Ok
2021-04-06 16:31:32 Loading elasticsearch templates ............ Ok
2021-04-06 16:31:35 Creating kafka topics .................... Ok
2021-04-06 16:32:10 Creating schema objects ........ Ok
2021-04-06 16:33:07 Request stop .................................................................... Ok
2021-04-06 16:33:26 Done

Precheck filters

Filters	Description	Expected Value
System
Check uptime	Verifies the period since the last server reboot.	>24h
Clock sync	Verifies if the clock synchronization service is running on the server.	The clock synchronization service is up and running.
CPU requirement	Verifies if the server has enough CPUs to run Unravel efficiently.	Check requirements.
Memory requirement	Verifies that the server has enough memory to run Unravel efficiently.	Check requirements.
Disk access	Verifies that the user who runs unravel has access to the configured disk locations.	Unravel users can access the configured disk locations.
Disk Freespace	Verifies if the disk locations have enough free space.	Check requirements
Kerberos tools	Verifies that the Kerberos tools are available on the server to support kerberized environments.	Kerberos tools are installed.
Network ports	Verifies that the network ports used by Unravel are available.	Check requirements
OS libraries	Verifies the required libraries if run with Unravel managed MySQL.	You must install the following packages for fulfilling the OS level requirements for MySQL: `numactl-libs` (for libnuma.so) `libaio` (for libaio.so)
OS release	Verifies that the OS distribution is supported.	Check compatibility matrix
OS settings	Verifies that vm.max_map_count is set to the recommended value.	Check requirements
SELinux	Verifies if the SELinux status is enabled or not and provides the mode (Permissive, Disabled, Enforcing).	Check product documentation.
Check limits	Verifies that user limits are set to the required values.	Check requirements
Hadoop
Clients	Ensures that the following Hadoop clients are installed and configured on the server: Apache Hadoop Distributed File System (HDFS), Apache Hadoop YARN, Apache Hive, and Apache Beeline. You can ignore the following Precheck limitations on MapR: the Hadoop client check reports missing clients (HDFS and Beeline), and any check that depends on the HDFS client (for example, the HDFS Access check) reports the message `HADH0070: hdfs client is not available`. Ensure that Unravel has access to files in the MapR file system, or provide access manually.	Check compatibility matrix
Distribution	Verifies that the Hadoop distribution is a supported version.	Check compatibility matrix
RM HA Enabled/Disabled	Verifies if RM is running in HA mode.
Healthcheck report bundle	Healthcheck report tarball. This report provides the summary and information gathered by the healthcheck, along with its location.

4. Add configurations

Optional: Set up Kerberos to authenticate Hadoop services.

Kerberos authentication

If you use Kerberos authentication, set the keytab path and principal, enable Kerberos authentication, and apply the changes.

<Unravel installation directory>/unravel/manager config kerberos set --keytab </path/to/keytab file> --principal <server@example.com>
<Unravel installation directory>/unravel/manager config kerberos enable
<Unravel installation directory>/unravel/manager config apply
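The three commands must run in order, and there is little point in enabling Kerberos if setting the keytab failed. A minimal sketch of a wrapper that stops at the first failure; `enable_kerberos` is a hypothetical name, and the manager path, keytab, and principal are supplied by the caller.

```shell
# Hypothetical wrapper: runs the three manager commands in order
# and stops at the first one that fails.
# $1 = path to manager, $2 = keytab file, $3 = principal
enable_kerberos() (
    manager=$1
    keytab=$2
    principal=$3
    # Check the keytab before changing any configuration.
    [ -r "$keytab" ] || { echo "keytab not readable: $keytab" >&2; exit 1; }
    "$manager" config kerberos set --keytab "$keytab" --principal "$principal" &&
    "$manager" config kerberos enable &&
    "$manager" config apply
)
```

For example: `enable_kerberos /opt/unravel/manager /path/to/keytab server@example.com`.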

Truststore certificates

If you are using Truststore certificates, run the following steps from the manager tool to add certificates to the Truststore.

Download the certificates to a directory.
Give the user who installs Unravel permission to access the certificates directory.
```
chown -R username:groupname /path/to/certificates/directory
```

Upload the certificates using any of the following options:

## Option 1
<unravel_installation_directory>/unravel/manager config tls trust add </path/to/the/certificate/files>

or 

## Option 2
<unravel_installation_directory>/unravel/manager config tls trust add --pem </path/to/the/certificate/files>
<unravel_installation_directory>/unravel/manager config tls trust add --jks </path/to/the/certificate/files>
<unravel_installation_directory>/unravel/manager config tls trust add --pkcs12 </path/to/the/certificate/files>
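For Option 2, the format option can be chosen from the file name. The sketch below is illustrative only: `truststore_format_flag` is a hypothetical helper, and the mapping from file extension to format is an assumption based on common naming conventions, so confirm the actual format of each certificate file.

```shell
# Hypothetical helper: prints the manager format option for a certificate file.
# The extension-to-format mapping is an assumption based on common naming.
truststore_format_flag() (
    case $1 in
        *.pem)       printf '%s\n' "--pem" ;;
        *.jks)       printf '%s\n' "--jks" ;;
        *.p12|*.pfx) printf '%s\n' "--pkcs12" ;;
        *) echo "unknown certificate format: $1" >&2; exit 1 ;;
    esac
)

# Possible usage:
#   <unravel_installation_directory>/unravel/manager config tls trust add "$(truststore_format_flag ca.pem)" ca.pem
```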

Enable the Truststore.

<unravel_installation_directory>/unravel/manager config tls trust <enable|disable>
<unravel_installation_directory>/unravel/manager config apply

Verify the connection.

<unravel_installation_directory>/unravel/manager verify connect <Cluster Manager-host> <Cluster Manager-port>

For example: /opt/unravel/manager verify connect xyz.unraveldata.com 7180
-- Running: verify connect xyz.unraveldata.com 7180
 - Resolved IP: 111.17.4.123
 - Reverse lookup: ('xyz.unraveldata.com', [], ['111.17.4.123'])
 - Connection:   OK
 - TLS:      No
-- OK

TLS for Unravel UI
If you want to enable TLS for Unravel UI, refer to Enabling Transport Layer Security (TLS) for Unravel UI.

Start all the services.

<unravel_installation_directory>/unravel/manager start

Check the status of services.
```
<unravel_installation_directory>/unravel/manager report 
```
The following service statuses are reported:
- OK: Service is up and running
- Not Monitored: Service is not running. (Has stopped or has failed to start)
- Initializing: Services are starting up.
- Does not exist: The process unexpectedly disappeared. Restarts will be attempted 10 times.
You can also get the status and information for a specific service. Run the manager report command as follows:
```
<unravel_installation_directory>/unravel/manager report <service> 
## For example: /opt/unravel/manager report auto_action
```
Optionally, run healthcheck at this point to verify that the configurations are valid and the services are running.
```
<unravel_installation_directory>/unravel/manager healthcheck
```
Healthcheck is run automatically on an hourly basis in the backend. You can set your email to receive the healthcheck reports. Refer to Healthcheck for more details.

2. Install and set up Unravel on the edge node

To install and set up Unravel on the edge node, do the following:

1. Download Unravel

2. Deploy Unravel binaries

Unravel binaries are shipped as a tar file and an RPM package. You can deploy the Unravel binaries in any directory on the server. However, the user who installs Unravel must have write permissions to the directory where Unravel binaries are deployed.

After the Unravel binaries are deployed, the directory layout for both the tar file and the RPM package will be unravel/versions/<Directories and files>. If the binaries are deployed to <Unravel_installation_directory>, Unravel will be available in <Unravel_installation_directory>/unravel.

Deploy Unravel from a tar file

The following steps to deploy Unravel from a tar file should be performed by a user who will run Unravel.

Create an Installation directory.
```
mkdir /path/to/installation/directory
## For example: mkdir /opt/unravel
```
Note
Some locations may require root access to create a directory. In such a case, after the directory is created, change the ownership to unravel user and continue with the installation procedure as the unravel user.
```
chown -R username:groupname /path/to/installation/directory
## For example: chown -R unravel:unravelgroup /opt/unravel
```
Extract the Unravel tar file to the installation directory that you created in the first step. After you extract the contents of the tar file, the unravel directory is created within the installation directory.
```
tar -zxf unravel-<version>.tar.gz -C /path/to/installation/directory
## For example: tar -zxf unravel-4.7.0.0.tar.gz -C /opt
## The unravel directory will be available within /opt
```
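Because the installing user needs write permission on the target directory, it can help to confirm this before extracting. The following is a minimal sketch of such a check. The function name `can_deploy_to` is hypothetical and is not part of Unravel.

```shell
#!/bin/sh
# Hypothetical helper (not part of Unravel): succeeds only if the target
# directory exists and the current user can write to it.
can_deploy_to() {
  target_dir=$1
  if [ ! -d "$target_dir" ]; then
    echo "missing: $target_dir" >&2
    return 1
  fi
  if [ ! -w "$target_dir" ]; then
    echo "not writable by $(id -un): $target_dir" >&2
    return 1
  fi
  return 0
}

# Usage (run as the user who will run Unravel):
#   can_deploy_to /opt && tar -zxf unravel-4.7.0.0.tar.gz -C /opt
```

If the check fails, create the directory or change its ownership as described in the note above, and then retry.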

Deploy Unravel from an RPM package

A root user should perform the following steps to deploy Unravel from an RPM package. After the RPM package is deployed, the remaining installation procedures should be performed by the unravel user.

Create an installation directory.
```
mkdir /usr/local/unravel
```

Run the following command:

rpm -i unravel-<version>.rpm
## For example: rpm -i unravel-4.7.0.0.rpm 
## The unravel directory will be available in /usr/local

To install to a different location, use the --prefix option. For example:

mkdir /opt/unravel
chown -R username:groupname /opt/unravel
rpm -i unravel-4.7.0.0.rpm --prefix /opt

## The unravel directory will be available in /opt

Grant ownership of the directory to a user who will run Unravel. This user executes all the processes involved in Unravel installation.
```
chown -R username:groupname /usr/local/unravel
## For example: chown -R unravel:unravelgroup /usr/local/unravel
```
Continue with the installation procedures as unravel user.

3. Install and set up Unravel on the edge nodes

Notice

Perform the following steps on each of the edge nodes in the cluster.

After deploying the binaries, if you are the root user, switch to the Unravel user.
```
  su - <unravel user>
```

Run setup as follows:

<installation_directory>/versions/4.7.x.x/setup --cluster-access <Unravel-host>
## <Unravel-host>: specify the FQDN or the logical hostname of Unravel core node.

/opt/unravel/versions/develop.1002/setup  --cluster-access xyz.unraveldata.com
2021-04-05 12:36:08 Sending logs to: /tmp/unravel-setup-20210405-123608.log
2021-04-05 12:36:08 Running preinstallation check...
2021-04-05 12:36:11 Gathering information ................. Ok
2021-04-05 12:36:35 Running checks .................. Ok
--------------------------------------------------------------------------------
system
  Check limits                  : PASSED
  Clock sync                    : PASSED
  CPU requirement               : PASSED, Available cores: 8 cores
  Disk access                   : PASSED, /opt/unravel/versions/develop.1002/healthcheck/healthcheck/plugins/system is writable
  Disk freespace                : PASSED, 228 GB of free disk space is available for precheck dir.
  Kerberos tools                : PASSED
  Memory requirement            : PASSED, Available memory: 95 GB
  Network ports                 : PASSED
  OS libraries                  : PASSED
  OS release                    : PASSED, OS release version: centos 7.6
  OS settings                   : PASSED
  SELinux                       : PASSED
--------------------------------------------------------------------------------
hadoop
  Clients                       : PASSED
                                  - Found hadoop
                                  - Found hdfs
                                  - Found yarn
                                  - Found hive
                                  - Found beeline
  Distribution                  : PASSED, found CDP 7.1.3
  RM HA Enabled/Disabled        : PASSED, Disabled
Healthcheck report bundle: /tmp/healthcheck-20210405123609-wnode58.unraveldata.com.tar.gz
2021-04-05 12:36:37 Prepare to install with: /opt/unravel/versions/develop.1002/installer/installer/../installer/conf/presets/cluster-access.yaml
2021-04-05 12:36:42 Sending logs to: /opt/unravel/logs/setup.log
2021-04-05 12:36:42 Instantiating templates ................................ Ok
2021-04-05 12:36:53 Starting service monitor ... Ok
2021-04-05 12:36:57 Generating hashes .... Ok
2021-04-05 12:37:00 Request stop ..... Ok
2021-04-05 12:37:02 Done
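The precheck section of the setup output can be long, so you may want to list only the checks that did not pass. The following sketch filters output shaped like the sample above. The function name `failed_checks` is hypothetical, and it assumes that any status other than PASSED needs attention; the sample only shows PASSED, so other status words are not confirmed here.

```shell
#!/bin/sh
# Hypothetical helper: reads setup/precheck output on stdin and prints the
# name of every check whose status does not start with PASSED.
# Assumes check lines look like "  <name>   : <status>[, details]".
failed_checks() {
  awk -F' : ' '
    /^  [A-Za-z]/ && NF >= 2 {
      name = $1
      sub(/^ +/, "", name)
      sub(/ +$/, "", name)
      if ($2 !~ /^PASSED/) print name
    }
  '
}

# Usage: failed_checks < /tmp/unravel-setup-20210405-123608.log
```

An empty result means every check line in the input reported PASSED.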

Run autoconfig and apply changes.
```
<unravel_installation_directory>/unravel/manager config auto
<unravel_installation_directory>/unravel/manager config apply
```
When prompted, you can provide the following:
- Cluster manager URL: Provide the URL of the cluster manager. For example: http://abcd79.unraveldata.com:3000, https://xyz.unraveldata.com:7100
- Username
- Password
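Before you answer the prompt, you can sanity-check the shape of the cluster manager URL. This is a sketch only: the function name `is_cluster_manager_url` is hypothetical, and the pattern assumes the scheme://host:port form used in the examples above, which may be stricter than what Unravel itself accepts.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the argument looks like
# http(s)://<host>:<port>, the form used in the examples above.
is_cluster_manager_url() {
  printf '%s\n' "$1" | grep -Eq '^https?://[A-Za-z0-9.-]+:[0-9]+/?$'
}

# Usage:
#   is_cluster_manager_url "https://xyz.unraveldata.com:7100" && echo "looks fine"
```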

Optional: Set up Kerberos authentication and secure access to Unravel UI.

Kerberos authentication

If you are using Kerberos authentication, set the keytab path and the principal, enable Kerberos authentication, and apply the changes.

<Unravel installation directory>/unravel/manager config kerberos set --keytab </path/to/keytab file> --principal <server@example.com>
<Unravel installation directory>/unravel/manager config kerberos enable
<unravel_installation_directory>/unravel/manager config apply

Truststore certificates

If you are using Truststore certificates, run the following steps from the manager tool to add certificates to the Truststore.

Download the certificates to a directory.
Grant the user who installs Unravel permission to access the certificates directory.
```
chown -R username:groupname /path/to/certificates/directory
```

Upload the certificates using any of the following options:

## Option 1
<unravel_installation_directory>/unravel/manager config tls trust add </path/to/the/certificate/files>

or 

## Option 2
<unravel_installation_directory>/unravel/manager config tls trust add --pem </path/to/the/certificate/files>
<unravel_installation_directory>/unravel/manager config tls trust add --jks </path/to/the/certificate/files>
<unravel_installation_directory>/unravel/manager config tls trust add --pkcs12 </path/to/the/certificate/files>

Enable the Truststore.

<unravel_installation_directory>/unravel/manager config tls trust <enable|disable>
<unravel_installation_directory>/unravel/manager config apply

Verify the connection.

<unravel_installation_directory>/unravel/manager verify connect <Cluster Manager-host> <Cluster Manager-port>

For example: /opt/unravel/manager verify connect xyz.unraveldata.com 7180
-- Running: verify connect xyz.unraveldata.com 7180
 - Resolved IP: 111.17.4.123
 - Reverse lookup: ('xyz.unraveldata.com', [], ['111.17.4.123'])
 - Connection:   OK
 - TLS:      No
-- OK
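If you run this verification from a script, you can test the result instead of reading it. The following sketch checks output shaped like the sample above for a successful connection line. The function name `verify_connect_ok` is hypothetical, and only the `Connection:   OK` line shown in the sample is assumed; the wording of a failed connection is not known from this page.

```shell
#!/bin/sh
# Hypothetical helper: reads "manager verify connect" output on stdin and
# succeeds only if it contains a "- Connection: OK" line.
verify_connect_ok() {
  grep -Eq '^[[:space:]]*-[[:space:]]*Connection:[[:space:]]*OK[[:space:]]*$'
}

# Usage:
#   /opt/unravel/manager verify connect xyz.unraveldata.com 7180 | verify_connect_ok \
#     || echo "cluster manager is not reachable" >&2
```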

Start all the services.

<unravel_installation_directory>/unravel/manager start

Check the status of services.
```
<unravel_installation_directory>/unravel/manager report 
```
The following service statuses are reported:
- OK: Service is up and running.
- Not Monitored: Service is not running. (Has stopped or has failed to start)
- Initializing: Services are starting up.
- Does not exist: The process unexpectedly disappeared. Restarts will be attempted 10 times.
You can also get the status and information for a specific service or for all services. Run the manager report command as follows:
```
<unravel_installation_directory>/unravel/manager report <service> 
## For example: /opt/unravel/manager report healthcheck
```
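The four statuses listed above can be mapped to a simple "needs attention" decision, for example in a monitoring script. This is an illustrative sketch: the function name `status_needs_attention` is hypothetical, and treating Initializing as not needing attention is an assumption, since it only means that services are still starting.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 means the service status needs attention,
# 1 means it does not, 2 means the status is not one of the four listed above.
status_needs_attention() {
  case $1 in
    "OK"|"Initializing") return 1 ;;
    "Not Monitored"|"Does not exist") return 0 ;;
    *) echo "unknown status: $1" >&2; return 2 ;;
  esac
}

# Usage:
#   status_needs_attention "Not Monitored" && echo "check this service"
```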
Enable additional instrumentation for your platform.
Set additional configurations.
Optionally, you can run healthcheck to verify that all the configurations and services are running successfully.
```
<unravel_installation_directory>/unravel/manager healthcheck
```
Healthcheck is run automatically, on an hourly basis, in the backend. You can set your email to receive the healthcheck reports.

3. Configure core node with edge node settings

Log in to the core node as an Unravel user.
Obtain the Cluster Access ID, which must be provided when you add the edge nodes. Run the following command on the edge node:
```
<unravel_installation_directory>/unravel/manager support show cluster_access_id
```

Run the following command to get the <EDGE_KEY>.

<unravel_installation_directory>/unravel/manager config edge show

-- Running: config edge show
------------ | ---------------------------------------- | ------------
    EDGE KEY | - edge-a                                 | Enabled
             |     Cluster manager:                     | Enabled
             |     Clusters:                            | 
------------ | ---------------------------------------- | ------------
-- OK

Add each of the edge nodes, involved with Unravel monitoring, to the core node.
```
<unravel_installation_directory>/unravel/manager config edge add <EDGE_KEY> <CLUSTER_ACCESS_ID>
```
Example: /opt/unravel/manager config edge add edge-a 123-123-123-123
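When you have several edge nodes, you can generate one `config edge add` command per node from a list of key and ID pairs, review the commands, and then run them. The following is a sketch only: the function name `print_edge_add_commands` and the two-column input format are assumptions made for illustration, and the function only prints commands; it does not run them.

```shell
#!/bin/sh
# Hypothetical helper: reads "<EDGE_KEY> <CLUSTER_ACCESS_ID>" pairs on stdin
# and prints the matching "manager config edge add" command for each pair.
# $1 is the path to the manager tool on the core node.
print_edge_add_commands() {
  manager=$1
  while read -r edge_key cluster_access_id; do
    # Skip blank lines and lines that start with "#".
    case $edge_key in
      ''|'#'*) continue ;;
    esac
    printf '%s config edge add %s %s\n' "$manager" "$edge_key" "$cluster_access_id"
  done
}

# Usage:
#   printf 'edge-a 123-123-123-123\n' | print_edge_add_commands /opt/unravel/manager
```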
Notice
When you add edge nodes for CDH or CDP platforms and want to configure Migration or Forecasting reports for these clusters, you must have one of the following roles:
- Full Administrator
- Cluster Administrator
- Operator
- Configurator
These roles are required only in the case of Cloudera Manager.
Run auto-configuration.
```
<unravel_installation_directory>/unravel/manager config edge auto <EDGE_KEY> 
```
For example: /opt/unravel/manager config edge auto edge-a
When prompted, you can provide the following:
- Cluster manager URL: Provide the URL of the cluster manager. For example: https://abcd773.unraveldata.com:8443
- Username
- Password
If multiple clusters are handled by Cloudera Manager or Ambari, you are prompted to enable the cluster that you want to monitor. Run the following command to enable the cluster.
```
<unravel_installation_directory>/unravel/manager config edge cluster enable
```
Tip
When prompted, you can provide the following keys:
- <EDGE_KEY>: The edge node that you have provided in Step 3.
- <CLUSTER_KEY>: Name of the cluster that you want to enable for Unravel monitoring. You can retrieve it from the output shown for the manager config auto command.
- <SERVICE_KEY>: Name of the service.

The Hive metastore database password can be recovered automatically only for a cluster manager with an administrative account. Otherwise, it must be set manually as follows:

Run the manager config edge show command to get the <EDGE_KEY>, <HIVE_KEY>, and <CLUSTER_KEY>, which must be provided when you set the Hive metastore password.

<EDGE_KEY> is the label you provide to identify the edge node when you add the edge node in Step 3.
<CLUSTER_KEY> is the name of the cluster where you set the Hive configurations.
<HIVE_KEY> is the definition of the Hive service. In the output of the manager config edge show command, this is shown as the <SERVICE_KEY> for Hive.

-- Running: config edge show
------------ | ---------------------------------------- | ------------
    EDGE KEY | - edge-a                                 | Enabled
             |     Cluster manager:                     | Enabled
             |     Clusters:                            | 
 CLUSTER KEY |       - Cluster_Name                     | Enabled
             |           HBASE:                         | 
 SERVICE KEY |             - hbase                      | Enabled
             |           HDFS:                          | 
 SERVICE KEY |             - hdfs                       | Enabled
             |           HIVE:                          | 
 SERVICE KEY |             - hive                       | Enabled
 SERVICE KEY |             - hive2                      | Enabled
             |           IMPALA:                        | 
 SERVICE KEY |             - impala                     | Enabled
 SERVICE KEY |             - impala2                    | Enabled
             |           KAFKA:                         | 
 SERVICE KEY |             - kafka                      | Enabled
 SERVICE KEY |             - kafka2                     | Enabled
             |           SPARK_ON_YARN:                 | 
 SERVICE KEY |             - spark_on_yarn              | Enabled
             |           YARN:                          | 
 SERVICE KEY |             - yarn                       | Enabled
             |           ZOOKEEPER:                     | 
 SERVICE KEY |             - zookeeper                  | Enabled
------------ | ---------------------------------------- | ------------
-- OK
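To pick the keys out of this table without reading it by eye, you can filter the rows that carry a key label. The following sketch parses output shaped like the sample above. The function name `list_edge_keys` is hypothetical, and the parsing assumes the three-column, pipe-separated layout shown in the sample.

```shell
#!/bin/sh
# Hypothetical helper: reads "manager config edge show" output on stdin and
# prints one "<KIND> <key>" line per EDGE KEY, CLUSTER KEY, or SERVICE KEY row.
list_edge_keys() {
  awk -F'|' '
    {
      label = $1
      value = $2
      gsub(/^[ \t]+|[ \t]+$/, "", label)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
    }
    label ~ /^(EDGE|CLUSTER|SERVICE) KEY$/ {
      sub(/^- /, "", value)      # drop the leading "- " list marker
      split(label, parts, " ")   # parts[1] is EDGE, CLUSTER, or SERVICE
      print parts[1], value
    }
  '
}

# Usage:
#   /opt/unravel/manager config edge show | list_edge_keys
```

For the sample above, the output starts with `EDGE edge-a` and `CLUSTER Cluster_Name`, followed by one `SERVICE` line per service key. The Hive service keys are the `SERVICE` lines that appear under the HIVE heading in the original table.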

In a multi-cluster deployment, where edge nodes monitor the clusters, set the password on the core node as follows:
```
<Unravel installation directory>/unravel/manager config edge hive metastore password <EDGE_KEY> <CLUSTER_KEY> <HIVE_KEY> <password>
```
Example: /opt/unravel/manager config edge hive metastore password local-node cluster1 hive password
If the core node monitors the Hadoop cluster directly, run the following command from the core node.
```
<Unravel installation directory>/unravel/manager config hive metastore password <CLUSTER_KEY> <HIVE_KEY> <password> 
```
Example: /opt/unravel/manager config hive metastore password cluster1 hive P@SSw0rd
If you do not provide a password on the command line, the manager prompts for it. In this case, your password is not displayed on the screen.

Configure the FSImage. Refer to configuring FSImage.
Apply changes.
```
<unravel_installation_directory>/unravel/manager config apply
```
You may be prompted to stop Unravel. Run manager stop to stop Unravel.

Start all the services.

<unravel_installation_directory>/unravel/manager start

Check the status of services.

<unravel_installation_directory>/unravel/manager report

Check the list of services, which are enabled for the edge node after running the auto-configurations.

<unravel_installation_directory>/unravel/manager config edge show

-- Running: config edge show
------------ | ---------------------------------------- | ------------
    EDGE KEY | - edge-a                                 | Enabled
             |     Cluster manager:                     | Enabled
             |     Clusters:                            | 
 CLUSTER KEY |       - Cluster_Name                     | Enabled
             |           HBASE:                         | 
 SERVICE KEY |             - hbase                      | Enabled
             |           HDFS:                          | 
 SERVICE KEY |             - hdfs                       | Enabled
             |           HIVE:                          | 
 SERVICE KEY |             - hive                       | Enabled
 SERVICE KEY |             - hive2                      | Enabled
             |           IMPALA:                        | 
 SERVICE KEY |             - impala                     | Enabled
 SERVICE KEY |             - impala2                    | Enabled
             |           KAFKA:                         | 
 SERVICE KEY |             - kafka                      | Enabled
 SERVICE KEY |             - kafka2                     | Enabled
             |           SPARK_ON_YARN:                 | 
 SERVICE KEY |             - spark_on_yarn              | Enabled
             |           YARN:                          | 
 SERVICE KEY |             - yarn                       | Enabled
             |           ZOOKEEPER:                     | 
 SERVICE KEY |             - zookeeper                  | Enabled
------------ | ---------------------------------------- | ------------
-- OK

You can disable any of the services. For example, to disable the HBase service:

<unravel_installation_directory>/unravel/manager config edge hbase disable <EDGE_KEY> <CLUSTER_KEY> <SERVICE_KEY>

Example: /opt/unravel/manager config edge hbase disable edge-a Cluster_Name hbase

Run healthcheck to verify that all the configurations and services are running successfully.
```
<unravel_installation_directory>/unravel/manager healthcheck
```
Healthcheck is run automatically, on an hourly basis, in the backend. You can set your email to receive the healthcheck reports.

4. Install Unravel with Interactive Precheck

The Interactive Precheck utility validates the required configurations before you install Unravel. When you run the utility, it prompts you through a series of checks to gather configuration information. Your responses are used to generate a bootstrap configuration file, which is then used to install Unravel. For details, refer to Using the Interactive Precheck utility.

Do the following to install and configure Unravel with Interactive Precheck.

After you download and deploy Unravel, run the precheck.sh script from unravel/versions/X.Y.Z/healthcheck/.
For example:
/opt/unravel/versions/X.Y.Z/healthcheck/precheck.sh

Enter the necessary details when you are prompted for the following configuration information:

This section covers general information about your Unravel install. You are prompted for the following:

Data platform you want to monitor.
For Hadoop: type of Unravel node you want to configure.
For edge nodes: core node location and test connectivity.

You must answer the following prompts:

Core node - Multi cluster installation

-- General information
   Which data platform are you installing for?
   1- Hadoop
   2- EMR
   3- HDI
   4- Databricks
   5- Dataproc
   6- BigQuery

   Select one of the above [Hadoop]: 1
   Selected: Hadoop

   Which type of unravel node is this?
   1- Single node installation
   2- Core node - Multi cluster installation
   3- Edge node - Multi cluster installation

   Select one of the above [Single node installation]: 2
   Selected: Core node - Multi cluster installation
   User that will run unravel [unravel]: unravel

Edge-node - Multi cluster installation

If you select Edge-node multi-cluster installation, you are prompted to provide details of the following:

Log Receiver hostname
Log Receiver port
Is the Log Receiver using TLS

Note

If you use the interactive precheck to configure the edge node, you still need to configure the edge node on the core node. For steps, see the "Configure the core node with edge node settings" section in Installing Unravel in a multi-cluster environment.

-- General information
Which type of Unravel node is this?
   1- Single node installation
   2- Core node - Multi cluster installation
   3- Edge node - Multi cluster installation

## You can choose number 3 corresponding to Edge node - Multi cluster installation. For example:
    Select one of the above [Single node installation]: 3
    Selected: Edge node - Multi cluster installation
    User that will run unravel [unravel]:
    -- Core node - Log receiver information
       The unravel edge node needs to know the location of the unravel core node to send information to the Log Receiver service.

       Log Receiver hostname [None]: myhostname111
    
       Log Receiver Port (integer) [4043]: 4043

       Is the Log Receiver using TLS (y/n) [No]: n
       Attempting to connect to: http://congo119:4043/status
       Testing connection...

This check allows you to configure database-related information and an external database for Unravel.

-- Database configuration
Unravel comes with PostgreSQL and can configure it automatically as its database.
Optionally you can:
- Configure it to use an external database (MySQL, MariaDB or PostgreSQL).
- Have it configure and manage MySQL or MariaDB for you.


Configure an external database? (y/n) [No]: n

Do you wish to use an Unravel managed MySQL or MariaDB (no = Internal Postgresql)? (y/n) [No]: n

Will Unravel connect to a MySQL or MariaDB database (ex: hive metastore) ? (y/n) [No]: n

If you answer No, you are further prompted to select an Unravel-managed database for the installation.

If you answer Yes, you are further prompted for the type of external database that you want to configure.

-- Database configuration
   
   Configure an external database? (y/n) [No]: y

   Type
   1- PostgresQL
   2- MySQL
   3- MariaDB

After you choose the type of external database, you are prompted for the following database information and can test connectivity to that database. Refer to Integrating Database for more details. For example: Integrate database (On-prem)

-- Database configuration
  
   Configure an external database? (y/n) [No]: y

   Type
   1- PostgresQL
   2- MySQL
   3- MariaDB

   Select one of the above []: 1
   Selected: PostgresQL

   Database hostname [None]: unrave

   Database port (integer) [None]: 666

   Database schema [None]: unravel

   Does the database use TLS (y/n) [No]: n

   Database username [None]: admin

   Database password [None] (no echo):

   Do you wish to test connecting to the external database? (y/n) [Yes]:

-- Database configuration
 Configure an external database? (y/n) [No]:

If you choose a MySQL or MariaDB database, you are further prompted for extra packages. If you answer Yes, the Extra packages section searches for the required JDBC drivers.
```
-- Database configuration
 Will Unravel connect to a MySQL or MariaDB database (ex: hive metastore) ? (y/n) [No]: 
```

The Extra packages check is shown only if you use Unravel-managed MySQL/MariaDB or need JDBC drivers. Otherwise, this check is automatically skipped.

-- Extra package location

*** JDBC drivers are required for Unravel managed MySQL or MariaDB.
*** Database software package is required for Unravel managed MySQL or MariaDB.

External package location [None]: /<my-extra-packages>
##This is the path to the directory where the required packages are located.

If the required packages are located, then the following message is shown:

The following packages will be installed:
   Database server: /my-extra-packages/mysql-5.7.27-linux-glibc2.12-x86_64.tar.gz
   JDBC driver:
     - /my-extra-packages/mysql-connector-java-5.1.48.tar.gz

 External package: Ok

If the required packages are not found, then the following error message is shown:

External package: ERROR
 - ERROR: Couldn't find jdbc drivers in /my-extra-packages
 - ERROR: Looked for:
   mysql-connector-java-*.tar.gz
   mysql-connector-java-*.jar
   mariadb-java-client-*.jar
 - ERROR: Couldn't find database server package in /my-extra-packages
 - ERROR: Looked for:
   mysql-*-linux-glibc2.12-x86_64.tar.gz
   mariadb-*-linux-x86_64.tar.gz
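You can check the extra packages directory yourself before you run the precheck, using the file name patterns listed in the error message above. The following is a minimal sketch; the function names `has_jdbc_driver` and `has_db_server_package` are hypothetical, and the check only looks at file names, not at file contents.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the directory holds at least one file that
# matches the JDBC driver patterns from the error message above.
has_jdbc_driver() {
  dir=$1
  for f in "$dir"/mysql-connector-java-*.tar.gz \
           "$dir"/mysql-connector-java-*.jar \
           "$dir"/mariadb-java-client-*.jar; do
    # An unmatched pattern stays literal and fails the -e test.
    [ -e "$f" ] && return 0
  done
  return 1
}

# Hypothetical helper: same idea for the database server package patterns.
has_db_server_package() {
  dir=$1
  for f in "$dir"/mysql-*-linux-glibc2.12-x86_64.tar.gz \
           "$dir"/mariadb-*-linux-x86_64.tar.gz; do
    [ -e "$f" ] && return 0
  done
  return 1
}

# Usage:
#   has_jdbc_driver /my-extra-packages || echo "JDBC driver is missing" >&2
```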

This check allows you to configure and test HTTPS for the Unravel UI. This check prompts you for the certificate, key, password, and hostname details used to access Unravel.

Use HTTPS to access unravel? (y/n) [Yes]:  
##If you answer “Yes”, you are prompted for the path to the certificate and key. Unravel uses this information to configure TLS during installation. 
##If you answer “No”, you are shown a warning message for confirmation.

The information provided is verified for the following:

If the Key and Certificate match
If the certificate is valid
If the certificate applies to the provided hostname

This check allows you to set the Unravel UI port and verify the connectivity.

-- Unravel default port
  
  Port number (integer) [3000]: 

  Do you want to test if the port is accessible? (y/n) [Yes]: 
    
  This will open port 3000 and listen for connection for 120 seconds.
  Use your browser to test if the Unravel UI will be accessible on that port.
    
  We have detected the following hostnames:
  - some.host.example
    
 Browse to: http://some.host.example:3000
    
 ATTENTION: This address is an example. You should test with the URL that will be used to access Unravel.

A connection on port 3000 is tried and established. If the connection is successful, Unravel Port Test: OK is shown on the browser, and Unravel port: Reached is shown on the server.

This check allows you to set a custom data directory and, if the directories exist, verify access to them. The software location is always the directory where you deployed the Unravel binaries. This check tests only the available space and access. The data location that you configure here is the one that Unravel uses.

-- Unravel directories

  Software [/opt/unravel]: 

  Data [/opt/unravel/data]: 

  Directories: ERROR
  - OK: 33 GB of free disk space for software.
  - ERROR: SYSH0026: Space for data 33 GB is low, recommended minimum is 100 GB.
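To see in advance whether a data directory is likely to pass, you can compare its free space with the 100 GB recommended minimum shown in the sample. This is a sketch under stated assumptions: the function names `free_gb` and `meets_data_minimum` are hypothetical, and the precheck may measure or round free space differently than `df` does.

```shell
#!/bin/sh
# Hypothetical helper: prints the free space, in whole GB, of the filesystem
# that holds the given directory.
free_gb() {
  # POSIX df: -P gives one line per filesystem, -k reports 1024-byte blocks;
  # column 4 is the available space.
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Hypothetical helper: succeeds if $1 (GB) is at least the minimum in $2,
# which defaults to the 100 GB recommended in the sample above.
meets_data_minimum() {
  [ "$1" -ge "${2:-100}" ]
}

# Usage:
#   meets_data_minimum "$(free_gb /opt/unravel/data)" || echo "low data space" >&2
```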

This check allows you to configure and test email. You are prompted for host and credentials, and the following items are tested:

Connectivity
Authentication, only if provided.
Optional: Send test mail.

Following is a sample:

-- Mail server (SMTP) configuration

Unravel can send notification and alert emails.
   This will allow you to configure and test connection to a SMTP server.
   Optionally, it can also send a test email.

   You will have to provide:
   - Protocol, hostname and port
   - Credentials if required



   Configure a SMTP server? (y/n) [No]: y

   SMTP hostname [None]: smtphostname.gmail.com

   SMTP port (usually 25 for clear text, 465 for SSL, 587 for STARTLS) (integer) [None]: 587

   Security protocol
   1- None
   2- SSL
   3- StartTLS

   Select one of the above [None]: 3
   Selected: StartTLS

   Authentication required? (y/n) [Yes]: y

   Username [None]: daemon@unraveldata.com

   Password [None] (no echo):

   From [None]: daemon@unraveldata.com

   To [None]: user@unraveldata.com

   Send test email (y/n) [No]: y

Note

For more information, refer to Using the Interactive Precheck utility.

The responses you provided for the configuration information are used to generate a configuration file. You can use this configuration file when you run the setup command to install and configure Unravel.

After you have completed the responses, you are prompted to confirm if you want to generate the bootstrap configuration file. Press ENTER if you want to generate the bootstrap configuration file.
```
-- Unravel bootstrap configuration

   Generate a unravel bootstrap configuration file? (y/n) [Yes]:
```
The bootstrap configuration file is generated and located at $HOME/unravel-interactive-precheck/unravel-bootstrap.yaml.

Install Unravel with the bootstrap configuration file.

<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --bootstrap $HOME/unravel-interactive-precheck/unravel-bootstrap.yaml

Apply the changes.

<Unravel installation directory>/unravel/manager config apply

Start all the services.

<unravel_installation_directory>/unravel/manager start

Check the status of services.
```
<unravel_installation_directory>/unravel/manager report 
```
The following service statuses are reported:
- OK: Service is up and running.
- Not Monitored: Service is not running. (Has stopped or has failed to start)
- Initializing: Services are starting up.
- Does not exist: The process unexpectedly disappeared. A restart will be attempted ten times.
You can also get the status and information for a specific service. Run the manager report command as follows:
```
<unravel_installation_directory>/unravel/manager report <service> 
```
For example: /opt/unravel/manager report auto_action

4. Enable additional instrumentation

This section provides information about enabling additional instrumentation for the following platforms:

Enable additional instrumentation for CDH

This topic explains how to enable additional instrumentation on your gateway/edge/client nodes that are used to submit jobs to your big data platform. Additional instrumentation can include:

Sensor jars packaged in a parcel on Unravel server.
Hive queries in Hadoop that are pushed to Unravel Server by the Hive Hook sensor, a JAR file.
Spark job performance metrics that are pushed to Unravel Server by the Spark sensor, a JAR file.
Copying Hive hook and Btrace jars to HDFS shared library path.
Impala queries that are pulled from the Cloudera Manager or from the Impala daemon.

1. Download, distribute, and activate Unravel sensor

Sensor JARs are packaged in a parcel on Unravel server. Run the following steps from the Cloudera Manager to download, distribute, and activate this parcel.

Note

Ensure that Unravel is up and running before you perform the following steps.

In Cloudera Manager, click the parcel icon. The Parcel page is displayed.
On the Parcel page, click Configuration or Parcel Repositories & Network settings. The Parcel Configurations dialog box is displayed.
Go to the Remote Parcel Repository URLs section, click + and enter the Unravel host along with the exact directory name for your CDH version.
```
http://<unravel-host>:<port>/parcels/<cdh-version>/
// For example: http://xyznode46.unraveldata.com:3000/parcels/cdh6.3/
```
- <unravel-host> is the hostname or LAN IP address of Unravel. In a multi-cluster scenario, this would be the host where the log_receiver daemon is running.
- <port> is the Unravel UI port. The default is 3000. If you have customized the port, use that port number.
- <cdh-version> is your version of CDH. For example, cdh5.16 or cdh6.3.
  You can go to http://<unravel-host>:<port>/parcels/ directory (For example: http://xyznode46.unraveldata.com:3000/parcels) and copy the exact directory name of your CDH version (CDH<major.minor>).
Note
If you're using Active Directory Kerberos, unravel-host must be a fully qualified domain name or IP address.
Tip
If you are running more than one version of CDH (for example, you have multiple clusters), you can add more than one parcel entry for unravel-host.
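The repository URL is assembled from the three parts described above. The following sketch builds it so that you can paste the result into Cloudera Manager. The function name `parcel_repo_url` is hypothetical; the directory name that you pass must still match the exact name listed under /parcels/ on your Unravel host.

```shell
#!/bin/sh
# Hypothetical helper: prints the remote parcel repository URL.
#   $1 = Unravel host, $2 = CDH directory name (for example cdh6.3),
#   $3 = Unravel UI port (optional, defaults to 3000)
parcel_repo_url() {
  host=$1
  cdh_version=$2
  port=${3:-3000}
  printf 'http://%s:%s/parcels/%s/\n' "$host" "$port" "$cdh_version"
}

# Usage:
#   parcel_repo_url xyznode46.unraveldata.com cdh6.3
```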
Click Save Changes.
In the Cloudera Manager, click Check for new parcels, find the UNRAVEL_SENSOR parcel that you want to distribute, and click the corresponding Download button.
In the Cloudera Manager, from Location > Parcel Name, find the UNRAVEL_SENSOR parcel that you want to distribute and click the corresponding Download button.
After the parcel is downloaded, click the corresponding Distribute button. This will distribute the parcel to all the hosts.
After the parcel is distributed, click the corresponding Activate button. The status column will now display Distributed, Activated.
Note
If you have an old sensor parcel from Unravel, you must deactivate it now.

2. Put the Hive Hook JAR in AUX_CLASSPATH

In Cloudera Manager, select the target cluster from the drop-down, click Hive > Configuration, and search for hive-env.
In Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hive-env.sh, click View as text and enter the following exactly as shown, with no substitutions:
```
AUX_CLASSPATH=${AUX_CLASSPATH}:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar
```
If Sentry is enabled, you may also need to grant access to the Hive Hook JAR file. Grant privileges on the JAR file to the Sentry roles that run Hive queries: log in to Beeline as user hive and use the Hive SQL GRANT statement.
For example (substitute the role as appropriate):
```
GRANT ALL ON URI 'file:///opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar' TO ROLE <role>
```

3. Oozie: Copy Hive Hook and BTrace JARs to HDFS shared library path

In Cloudera Manager, select the target cluster from the drop-down, click Oozie > Configuration, and check the path shown in ShareLib Root Directory.
From a terminal on the Unravel node (the edge node in a multi-cluster deployment), list the ShareLib Root Directory and note the lib directory with the latest timestamp.
```
hdfs dfs -ls <path to ShareLib directory>
// For example: hdfs dfs -ls /user/oozie/share/lib/
```
Important
The jars must be copied to the lib directory (with the latest timestamp), which is shown in ShareLib Root Directory.

Copy the Hive Hook JAR /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar and the Btrace JAR, /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar to the specified path in ShareLib Root Directory.

For example, if the path specified in ShareLib Root Directory is /user/oozie, run the following commands to copy the JAR files.

hdfs dfs -copyFromLocal /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar /user/oozie/share/lib/<latest timestamp lib directory>/

//For example: 
hdfs dfs -copyFromLocal /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar /user/oozie/share/lib/lib_20210326035616/

hdfs dfs -copyFromLocal /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar /user/oozie/share/lib/<latest timestamp lib directory>/

//For example: 
hdfs dfs -copyFromLocal /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar /user/oozie/share/lib/lib_20210326035616/

From a terminal application, copy the Hive Hook JAR /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar and the Btrace JAR, /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar to the specified path in ShareLib Root Directory.
Caution
Jobs controlled by Oozie 2.3+ fail if you do not copy the Hive Hook and BTrace JARs to the HDFS shared library path.

4. Set Hive Hook configuration

In Cloudera Manager, click the Hive service and then click the Configuration tab.
Search for hive-site.xml to locate the Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml section.

Specify the Hive Hook configurations. You can use either the XML text field or the Editor.

Option 1: XML text field

Click View as XML to open the XML text field and copy-paste the following.

<property>
  <name>com.unraveldata.host</name>
  <value><UNRAVEL HOST NAME></value> 
  <description>Unravel hive-hook processing host</description>
</property>
<property>
  <name>com.unraveldata.hive.hook.tcp</name>
  <value>true</value>
</property>
<property>
  <name>com.unraveldata.hive.hdfs.dir</name>
  <value>/user/unravel/HOOK_RESULT_DIR</value>
  <description>destination for hive-hook, Unravel log processing</description>
</property>
<property>
  <name>hive.exec.driver.run.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
<property>
  <name>hive.exec.pre.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
<property>
  <name>hive.exec.post.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
<property>
  <name>hive.exec.failure.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>

Replace <UNRAVEL HOST NAME> with the Unravel hostname. In a multi-cluster deployment, use the hostname of the edge node.

Option 2: Editor

Click + and enter the property, value, and description (optional).

Property	Value	Description
com.unraveldata.host	Replace with Unravel hostname or with the hostname of the edge node in case of a multi-cluster deployment.	Unravel hive-hook processing host
com.unraveldata.hive.hook.tcp	true	Hive hook tcp protocol.
com.unraveldata.hive.hdfs.dir	/user/unravel/HOOK_RESULT_DIR	Destination directory for hive-hook, Unravel log processing.
hive.exec.driver.run.hooks	com.unraveldata.dataflow.hive.hook.UnravelHiveHook	Hive hook
hive.exec.pre.hooks	com.unraveldata.dataflow.hive.hook.UnravelHiveHook	Hive hook
hive.exec.post.hooks	com.unraveldata.dataflow.hive.hook.UnravelHiveHook	Hive hook
hive.exec.failure.hooks	com.unraveldata.dataflow.hive.hook.UnravelHiveHook	Hive hook

Note

If you configure CDH with Cloudera Navigator's safety valve setting, you must edit the following keys and add the value com.unraveldata.dataflow.hive.hook.UnravelHiveHook to the existing comma-separated list, without any spaces.

hive.exec.post.hooks
hive.exec.pre.hooks
hive.exec.failure.hooks

For example:

<property>  
<name>hive.exec.post.hooks</name>  
<value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook,com.cloudera.navigator.audit.hive.HiveExecHookContext,org.apache.hadoop.hive.ql.hooks.LineageLogger</value>  
<description>for Unravel, from unraveldata.com</description>
</property>
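
Hand-editing a comma-separated hook list makes it easy to leave a stray space or add the Unravel hook twice. The following is a minimal sketch, not an Unravel tool, of a helper that builds the new value from the existing one. The function name add_unravel_hook is hypothetical; it places the Unravel hook first, as in the example above, and strips all spaces.

```shell
UNRAVEL_HOOK="com.unraveldata.dataflow.hive.hook.UnravelHiveHook"

# Hypothetical helper: prints the hook list with the Unravel hook present
# exactly once and with all spaces removed.
add_unravel_hook() {
  existing=$(printf '%s' "$1" | tr -d ' ')
  case ",$existing," in
    *",$UNRAVEL_HOOK,"*) printf '%s\n' "$existing" ;;                # already present
    ",,")                printf '%s\n' "$UNRAVEL_HOOK" ;;            # no existing hooks
    *)                   printf '%s\n' "$UNRAVEL_HOOK,$existing" ;;  # put Unravel first
  esac
}

# Usage: pass the current value of hive.exec.post.hooks (or pre/failure hooks)
#   add_unravel_hook "com.cloudera.navigator.audit.hive.HiveExecHookContext"
```

Paste the printed value into the corresponding key in the safety valve.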

Similarly, add the same Hive Hook configurations to the HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml.
Optionally, add a comment in Reason for change and then click Save Changes.
In Cloudera Manager, click the Stale configurations icon to deploy the configuration and restart the Hive services.
Check the Unravel UI to see whether all Hive queries are running.
- If queries are running fine and appearing in the Unravel UI, you have successfully added the Hive Hook configurations.
- If queries are failing with a class not found error or permission problems:
  - Undo the hive-site.xml changes in Cloudera Manager.
  - Deploy the hive client configuration.
  - Restart the Hive service.
  - Follow the steps in Troubleshooting.

5. Configure Spark properties in spark-defaults.conf

In Cloudera Manager, select the target cluster and then click Spark.
Select Configuration.
Search for spark-defaults.
In Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf, enter the following text, replacing placeholders with your particular values:
```
spark.unravel.server.hostport=<unravel-host>:<port>
spark.driver.extraJavaOptions=-javaagent:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar=config=driver,libs=<spark-version>
spark.executor.extraJavaOptions=-javaagent:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar=config=executor,libs=<spark-version>
spark.eventLog.enabled=true
```
- <unravel-host>: Specify the Unravel hostname. In a multi-cluster deployment, use the FQDN or logical hostname of the edge node.
- <port>: 4043 is the default port. If you have customized the port, specify that port number here.
- <spark-version>: Use a Spark version that is compatible with this version of Unravel. You can check your Spark version with the spark-submit --version command and specify the matching value. For example,
  spark-1.6 for Spark 1.6.x
  spark-2.0 for Spark 2.0.x
  spark-2.1 for Spark 2.1.x
  spark-2.2 for Spark 2.2.x
  spark-2.3 for Spark 2.3.x
  spark-2.4 for Spark 2.4.x
  spark-3.0 for Spark 3.0.x
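
The mapping from a full Spark version to the libs value can be sketched as a small shell function. This is an illustration only: the function name spark_libs_value is hypothetical, and it accepts only the values listed above, which may not be exhaustive for your Unravel version.

```shell
# Hypothetical helper: prints the libs value (spark-<major>.<minor>) for a
# full Spark version string such as 2.4.8. Versions outside the list in
# this guide produce no output and a non-zero return code.
spark_libs_value() {
  libs=$(printf '%s\n' "$1" | sed -n -E 's/^([0-9]+)\.([0-9]+)(\..*)?$/spark-\1.\2/p')
  case "$libs" in
    spark-1.6|spark-2.0|spark-2.1|spark-2.2|spark-2.3|spark-2.4|spark-3.0)
      printf '%s\n' "$libs" ;;
    *)
      return 1 ;;
  esac
}

# Usage: pass the version number reported by spark-submit --version
#   spark_libs_value 2.4.8
```

A non-zero return code is a prompt to confirm compatibility before setting the property.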
Click Save changes.
Click the Stale configurations icon or use the Actions pull-down menu to deploy the client configuration. Your spark-shell then ensures that new JVM containers are created with the necessary extraJavaOptions for the Spark drivers and executors.
Enable Spark streaming.
Check the Unravel UI to see whether all Spark jobs are running.
- If jobs are running and appearing in the Unravel UI, you have deployed the Spark JAR successfully.
- If jobs are failing with a class not found error or permission problems:
  - Undo the spark-defaults.conf changes in Cloudera Manager.
  - Deploy the client configuration.
  - Investigate and fix the issue.
  - Follow the steps in Troubleshooting.

Note

If you have YARN-client mode applications, the default Spark configuration is not sufficient, because the driver JVM starts before the configuration set through the SparkConf is applied. For more information, see Apache Spark Configuration. In this case, configure the Unravel Sensor for Spark to profile specific Spark applications only (in other words, per-application profiling rather than cluster-wide profiling).

7. Retrieve Impala data from Cloudera Manager

Impala properties are automatically configured. Refer to Impala properties for the list of properties that are automatically configured. If a property is not already set by auto-configuration, set it as follows:

<Unravel installation directory>/manager config properties set <PROPERTY> <VALUE>

For example,

<Unravel installation directory>/manager config properties set com.unraveldata.data.source cm 
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.url http://my-cm-url  
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.username mycmname 
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.password mycmpassword

For multi-cluster, use the following format and set these on the edge node:

<Unravel installation directory>/manager config properties set com.unraveldata.data.source cm 
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.url http://my-cm-url  
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.username mycmname 
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.password mycmpassword

Note

By default, the Impala sensor task is enabled. To disable it, set the following property:

<Unravel installation directory>/manager config properties set com.unraveldata.sensor.tasks.disabled iw

Optionally, you can change the Impala lookback window. By default, when Unravel Server starts, it retrieves the last 5 minutes of Impala queries. To change this, do the following:

Change the value for com.unraveldata.cloudera.manager.impala.look.back.minutes property.

<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.impala.look.back.minutes -<period>
For example: <Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.impala.look.back.minutes -7

Note

Include a minus sign in front of the new value.

8. Add more configurations

Set additional Unravel configurations

Creating an alternate principal

For quick initial installation, you can use the hdfs principal and its keytab. However, for production use, you may want to create an alternate principal with restricted access to specific areas and use its corresponding keytab. This topic explains how to do this.

You can name the alternate principal whatever you prefer; these steps name it unravel. Its name doesn't need to be the same as the local username.

The steps apply only to CDH and have been tested using Cloudera Manager with the recommended Sentry configuration.

Check the HDFS default umask.
For access via ACL, the group part of the HDFS default umask needs to have read and execute access. This allows Unravel to see subdirectories and read files. The default umask setting on HDFS for both CDH and HDP is 022. The middle digit controls the group mask, and ACLs are masked using this default group mode.
You can check the HDFS umask setting from either Cloudera Manager or in hdfs-site.xml:
- In Cloudera Manager, check the value of dfs.umaskmode and make sure the middle digit is 2 or 0.
- In hdfs-site.xml file search for fs.permissions.umask-mode and make sure the middle digit is 2 or 0.
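
The middle-digit rule can be checked mechanically. The sketch below is an illustration under stated assumptions: the function name umask_allows_group_rx is hypothetical, and it only understands three-digit octal umask values, so a symbolic umask is reported as a failure and needs a manual check.

```shell
# Hypothetical helper: returns 0 if the group (middle) digit of a
# three-digit octal umask is 0 or 2, as this guide requires.
umask_allows_group_rx() {
  case "$1" in
    [0-7][02][0-7]) return 0 ;;
    *)              return 1 ;;
  esac
}

# Usage on a cluster node:
#   umask_allows_group_rx "$(hdfs getconf -confKey fs.permissions.umask-mode)" \
#     && echo "umask OK" || echo "check the umask manually"
```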
Enable ACL inheritance.
In Cloudera Manager's HDFS configuration, search for namenode advanced configuration snippet, and set the dfs.namenode.posix.acl.inheritance.enabled property to true in hdfs-site.xml. This is a workaround for an issue where HDFS was not compliant with the POSIX standard for ACL inheritance. For details, see Apache JIRA HDFS-6962. Cloudera backported the fix for this issue into CDH 5.8.4, CDH 5.9.1, and later. The property defaults to false in Hadoop 2.x and to true in Hadoop 3.x.
Restart the cluster for the change to dfs.namenode.posix.acl.inheritance.enabled to take effect.

Change the ACLs of the target HDFS directories.

Run the following commands as global hdfs to change the ACLs of the following HDFS directories. Run these in the order presented.

Set the ACL for future directories.

Note

Be sure to set the permissions at the /user/history level. Files are first written to an intermediate_done folder under /user/history and then moved to /user/history/done.

hadoop fs -setfacl -R -m default:user:unravel:r-x /user/spark/applicationHistory
hadoop fs -setfacl -R -m default:user:unravel:r-x /user/history
hadoop fs -setfacl -R -m default:user:unravel:r-x /tmp/logs
hadoop fs -setfacl -R -m default:user:unravel:r-x /user/hive/warehouse

If you have Spark2 installed, set the ACL of the Spark2 application history folder:

hadoop fs -setfacl -R -m default:user:unravel:r-x /user/spark/spark2ApplicationHistory

Set ACL for existing directories.

hadoop fs -setfacl -R -m user:unravel:r-x /user/spark/applicationHistory
hadoop fs -setfacl -R -m user:unravel:r-x /user/history
hadoop fs -setfacl -R -m user:unravel:r-x /tmp/logs
hadoop fs -setfacl -R -m user:unravel:r-x /user/hive/warehouse

If you have Spark2 installed, set the ACL of the Spark2 application history folder:

hadoop fs -setfacl -R -m user:unravel:r-x /user/spark/spark2ApplicationHistory

Verify the ACL of the target HDFS directories.

hdfs dfs -getfacl /user/spark/applicationHistory
hdfs dfs -getfacl /user/spark/spark2ApplicationHistory
hdfs dfs -getfacl /user/history
hdfs dfs -getfacl /tmp/logs
hdfs dfs -getfacl /user/hive/warehouse
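
Reading several getfacl listings by eye is tedious, so the check can be sketched as a filter over the command output. This is an illustration, not an Unravel tool: the function name has_unravel_acl is hypothetical, and it assumes the ACL entries appear in the usual user:<name>:<perms> and default:user:<name>:<perms> form.

```shell
# Hypothetical helper: reads `hdfs dfs -getfacl <dir>` output on stdin and
# succeeds only if both the access entry and the default entry grant r-x
# to the given user (unravel when no user is given).
has_unravel_acl() {
  user=${1:-unravel}
  acl=$(cat)
  printf '%s\n' "$acl" | grep -q "^user:$user:r-x" &&
    printf '%s\n' "$acl" | grep -q "^default:user:$user:r-x"
}

# Usage on a cluster node:
#   hdfs dfs -getfacl /user/history | has_unravel_acl unravel && echo "ACL OK"
```

Run it once per directory listed above. A failure means either the existing-directory ACL or the default ACL is missing.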

On the Unravel Server, verify the HDFS permissions on the folders as the target user (unravel, hdfs, mapr, or custom) with a valid Kerberos ticket corresponding to the keytab principal.

sudo -u unravel kdestroy
sudo -u unravel kinit -kt keytab-file principal
sudo -u unravel hadoop fs -ls /user/history
sudo -u unravel hadoop fs -ls /tmp/logs
sudo -u unravel hadoop fs -ls /user/hive/warehouse

Find and verify the keytab:
```
klist -kt keytab-file
```
Warning
If you're using KMS and HDFS encryption and the hdfs principal, you might need to adjust kms-acls.xml permissions in Cloudera Manager for DECRYPT_EEK if access is denied. In particular, the "done" directory might not allow decryption of logs by the hdfs principal.
If you're using "JNI" based groups for HDFS (a setting in Cloudera Manager), you need to add this line to /usr/local/unravel/etc/unravel.ext.sh:
```
export LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
```

If Kerberos is enabled, set the new values for keytab-file and principal:

<Unravel installation directory>/manager config kerberos set --keytab /etc/security/keytabs/unravel.service.keytab --principal unravel/server@example.com

<Unravel installation directory>/manager config kerberos enable

Important

Whenever you change Kerberos tokens or the principal, restart all services with <Unravel installation directory>/manager restart.


Enable additional instrumentation for CDP

This topic explains how to enable additional instrumentation on your gateway/edge/client nodes that are used to submit jobs to your big data platform. Additional instrumentation can include:

Hive queries in Hadoop that are pushed to Unravel Server by the Hive Hook sensor, a JAR file.
Spark job performance metrics that are pushed to Unravel Server by the Spark sensor, a JAR file.
Impala queries that are pulled from Cloudera Manager or the Impala daemon.
Sensor JARs packaged in a parcel on Unravel Server.
Tez DAG information that is pushed to Unravel Server by the Tez sensor, a JAR file.

1. Download, distribute, and activate Unravel sensor

Sensor JARs are packaged in a parcel on Unravel server. Run the following steps from the Cloudera Manager to download, distribute, and activate this parcel.

Note

Ensure that Unravel is up and running before you perform the following steps.

In Cloudera Manager, click the Parcels icon. The Parcel page is displayed.
On the Parcel page, click Configuration or Parcel Repositories & Network settings. The Parcel Configurations dialog box is displayed.
Go to the Remote Parcel Repository URLs section, click +, and enter the Unravel host along with the exact directory name for your CDH version.
http://<unravel-host>:<port>/parcels/<cdh-version>/
For example: http://xyz.unraveldata.com:3000/parcels/cdh7.1/
- <unravel-host> is the hostname or LAN IP address of Unravel. In a multi-cluster scenario, this would be the host where the log_receiver daemon is running.
- <port> is the Unravel UI port. The default is 3000. If you have customized the default port, use that port number.
- <cdh-version> is your version of CDP. For example, cdh7.1.
  You can browse to http://<unravel-host>:<port>/parcels/ (for example, http://xyznode46.unraveldata.com:3000/parcels) and copy the exact directory name for your CDH version (CDH<major.minor>).
Note
If you're using Active Directory Kerberos, unravel-host must be a fully qualified domain name or IP address.
Tip
If you're running more than one version of CDP (for example, you have multiple clusters), you can add more than one parcel entry for unravel-host.
Click Save Changes.
In Cloudera Manager, click Check for new parcels, find the UNRAVEL_SENSOR parcel that you want to distribute, and click the corresponding Download button.
After the parcel is downloaded, click the corresponding Distribute button. This distributes the parcel to all the hosts.
After the parcel is distributed, click the corresponding Activate button. The status column then displays Distributed, Activated.
Note
If you have an old sensor parcel from Unravel, you must deactivate it now.

2. Put the Hive Hook JAR in AUX_CLASSPATH

In Cloudera Manager, select the target cluster from the drop-down, click Hive on Tez > Configuration, and search for Service Environment.
In Hive on Tez Service Environment Advanced Configuration Snippet (Safety Valve), enter the following exactly as shown, with no substitutions:
```
AUX_CLASSPATH=${AUX_CLASSPATH}:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar
```
Ensure that the user running the Hive server has read and execute access to the Unravel Hive Hook JAR.

3. Oozie: Copy Hive Hook and BTrace JARs to HDFS shared library path

In Cloudera Manager, select the target cluster from the drop-down, click Oozie > Configuration, and check the path shown in ShareLib Root Directory.
From a terminal on the Unravel node (the edge node in a multi-cluster deployment), list the contents of the ShareLib directory and note the lib directory with the latest timestamp.
```
hdfs dfs -ls <path to ShareLib directory>
# For example: hdfs dfs -ls /user/oozie/share/lib/
```
Important
The jars must be copied to the lib directory (with the latest timestamp), which is shown in ShareLib Root Directory.

Copy the Hive Hook JAR, /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar, and the BTrace JAR, /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar, to the lib directory with the latest timestamp under the ShareLib Root Directory.

For example, if the path specified in ShareLib Root Directory is /user/oozie, run the following commands to copy the JAR files.

hdfs dfs -copyFromLocal /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar /user/oozie/share/lib/<latest timestamp lib directory>/

# For example:
hdfs dfs -copyFromLocal /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar /user/oozie/share/lib/lib_20210326035616/

hdfs dfs -copyFromLocal /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar /user/oozie/share/lib/<latest timestamp lib directory>/

# For example:
hdfs dfs -copyFromLocal /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar /user/oozie/share/lib/lib_20210326035616/

Caution
Jobs controlled by Oozie 2.3+ fail if you do not copy the Hive Hook and BTrace JARs to the HDFS shared library path.

4. Deploy the BTrace JAR for Tez service

In Cloudera Manager, go to Tez > Configuration and search for the following properties:
- tez.am.launch.cmd-opts
- tez.task.launch.cmd-opts
Append the following to tez.am.launch.cmd-opts and tez.task.launch.cmd-opts properties:
```
-javaagent:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar=libs=mr,config=tez -Dunravel.server.hostport=<unravel-host>:4043
```
Note
For unravel-host, specify the FQDN or the logical hostname of Unravel or of the edge node in the case of multi-cluster.
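
Because the options are appended to values that may already contain other JVM flags, the new value can be assembled with a small helper before pasting it into Cloudera Manager. This is a sketch only: the function name append_unravel_tez_opts is hypothetical, the port defaults to 4043 as in the line above, and the helper leaves the value unchanged if the agent is already present.

```shell
AGENT_JAR=/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar

# Hypothetical helper: prints the existing cmd-opts value with the Unravel
# agent options appended once. Arguments: existing opts, host, optional port.
append_unravel_tez_opts() {
  existing=$1
  host=$2
  port=${3:-4043}
  unravel_opts="-javaagent:$AGENT_JAR=libs=mr,config=tez -Dunravel.server.hostport=$host:$port"
  case "$existing" in
    *"-javaagent:$AGENT_JAR"*) printf '%s\n' "$existing" ;;               # already configured
    '')                        printf '%s\n' "$unravel_opts" ;;           # nothing to append to
    *)                         printf '%s\n' "$existing $unravel_opts" ;; # append after existing flags
  esac
}

# Usage: pass the current value of tez.am.launch.cmd-opts or
# tez.task.launch.cmd-opts and the Unravel (or edge node) hostname
#   append_unravel_tez_opts "-Xmx1g" edge01.example.com
```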
Click the Stale configurations icon to deploy the client configuration and restart the Tez services.

5. Set Hive Hook configuration

In Cloudera Manager, click Hive on Tez and then click the Configuration tab.
Search for hive-site.xml to locate the Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml section.

Specify the Hive Hook configurations. You can use either the XML text field or the Editor.

Option 1: XML text field

Click View as XML to open the XML text field and copy-paste the following.

<property>
  <name>com.unraveldata.host</name>
  <value><UNRAVEL HOST NAME></value> 
  <description>Unravel hive-hook processing host</description>
</property>
<property>
  <name>com.unraveldata.hive.hook.tcp</name>
  <value>true</value>
</property>
<property>
  <name>com.unraveldata.hive.hdfs.dir</name>
  <value>/user/unravel/HOOK_RESULT_DIR</value>
  <description>destination for hive-hook, Unravel log processing</description>
</property>
<property>
  <name>hive.exec.driver.run.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
<property>
  <name>hive.exec.pre.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
<property>
  <name>hive.exec.post.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
<property>
  <name>hive.exec.failure.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>

Replace <UNRAVEL HOST NAME> with the Unravel hostname. In a multi-cluster deployment, use the hostname of the edge node.

Option 2: Editor

Click + and enter the property, value, and description (optional).

Property	Value	Description
com.unraveldata.host	Replace with Unravel hostname or with the hostname of the edge node in case of a multi-cluster deployment.	Unravel hive-hook processing host
com.unraveldata.hive.hook.tcp	true	Hive hook tcp protocol.
com.unraveldata.hive.hdfs.dir	/user/unravel/HOOK_RESULT_DIR	Destination directory for hive-hook, Unravel log processing.
hive.exec.driver.run.hooks	com.unraveldata.dataflow.hive.hook.UnravelHiveHook	Hive hook
hive.exec.pre.hooks	com.unraveldata.dataflow.hive.hook.UnravelHiveHook	Hive hook
hive.exec.post.hooks	com.unraveldata.dataflow.hive.hook.UnravelHiveHook	Hive hook
hive.exec.failure.hooks	com.unraveldata.dataflow.hive.hook.UnravelHiveHook	Hive hook

Similarly, add the same Hive Hook configurations to the HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml.
Optionally, add a comment in Reason for change and then click Save Changes.
In Cloudera Manager, click the Stale configurations icon to deploy the configuration and restart the Hive services.
Check the Unravel UI to see whether all Hive queries are running.
- If queries are running fine and appearing in the Unravel UI, you have successfully added the Hive Hook configurations.
- If queries are failing with a class not found error or permission problems:
  - Undo the hive-site.xml changes in Cloudera Manager.
  - Deploy the hive client configuration.
  - Restart the Hive service.
  - Follow the steps in Troubleshooting.

6. Set Kafka configuration

In Cloudera Manager, select the target cluster, click Kafka service > Configuration, and search for broker_java_opts.

In Additional Broker Java Options enter the following:

-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 -XX:+DisableExplicitGC -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.local.only=true -Djava.rmi.server.useLocalHostname=true -Dcom.sun.management.jmxremote.rmi.port=9393

Click Save Changes.

7. Configure Spark properties in spark-defaults.conf

In Cloudera Manager, select the target cluster and then click Spark.
Select Configuration.
Search for spark-defaults.
In Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf, enter the following text, replacing placeholders with your particular values:
```
spark.unravel.server.hostport=<unravel-host>:<port>
spark.driver.extraJavaOptions=-javaagent:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar=config=driver,libs=<spark-version>
spark.executor.extraJavaOptions=-javaagent:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar=config=executor,libs=<spark-version>
spark.eventLog.enabled=true
```
- <unravel-host>: Specify the Unravel hostname. In a multi-cluster deployment, use the FQDN or logical hostname of the edge node.
- <port>: 4043 is the default port. If you have customized the port, specify that port number here.
- <spark-version>: Use a Spark version that is compatible with this version of Unravel. You can check your Spark version with the spark-submit --version command and specify the matching value. For example,
  spark-1.6 for Spark 1.6.x
  spark-2.0 for Spark 2.0.x
  spark-2.1 for Spark 2.1.x
  spark-2.2 for Spark 2.2.x
  spark-2.3 for Spark 2.3.x
  spark-2.4 for Spark 2.4.x
  spark-3.0 for Spark 3.0.x
Click Save changes.
Click the Stale configurations icon to deploy the client configuration and restart the Spark services. Your spark-shell then ensures that new JVM containers are created with the necessary extraJavaOptions for the Spark drivers and executors.
Enable Spark streaming.
Check the Unravel UI to see whether all Spark jobs are running.
- If jobs are running and appearing in the Unravel UI, you have deployed the Spark JAR successfully.
- If jobs are failing with a class not found error or permission problems:
  - Undo the spark-defaults.conf changes in Cloudera Manager.
  - Deploy the client configuration.
  - Investigate and fix the issue.
  - Follow the steps in Troubleshooting.

8. Retrieve Impala data from Cloudera Manager

<Unravel installation directory>/manager config properties set <PROPERTY> <VALUE>

For example,

<Unravel installation directory>/manager config properties set com.unraveldata.data.source cm 
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.url http://my-cm-url  
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.username mycmname 
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.password mycmpassword

For multi-cluster, use the following format and set these on the edge node:

<Unravel installation directory>/manager config properties set com.unraveldata.data.source cm 
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.url http://my-cm-url  
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.username mycmname 
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.password mycmpassword

Note

By default, the Impala sensor task is enabled. To disable it, set the following property:

<Unravel installation directory>/manager config properties set com.unraveldata.sensor.tasks.disabled iw

Optionally, you can change the Impala lookback window. By default, when Unravel Server starts, it retrieves the last 5 minutes of Impala queries. To change this, do the following:

Change the value for com.unraveldata.cloudera.manager.impala.look.back.minutes property.

<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.impala.look.back.minutes -<period>
For example: <Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.impala.look.back.minutes -7

Note

Include a minus sign in front of the new value.


Enable additional instrumentation for HDP

This topic explains how to configure Unravel to retrieve additional data from Hive, Tez, Spark, and Oozie, such as Hive queries, application timelines, Spark jobs, YARN resource management data, and logs. You can do this by generating Unravel's JARs and distributing them to every node that runs queries in the cluster. Later, after the JARs are distributed to the nodes, you can integrate Hive, Tez, and Spark data with Unravel.

1. Generate and distribute Unravel's Hive Hook and Spark Sensor JARs

Create a directory, for example, /usr/local/unravel-jars, for the JARs on the edge node in a multi-cluster environment.
```
mkdir /usr/local/unravel-jars
chown -R unravel:unravel /usr/local/unravel-jars
```
Generate the JARs using the unravel_hdp_setup.py script.
```
<Unravel installation directory>/unravel/manager run script unravel_hdp_setup.py --sensor-only --unravel-server <unravel-host>:3000 --spark-version <spark-version> --hive-version <hive-version> --ambari-server <ambari-host> --btrace-dir /usr/local/unravel-jars --hive-hook-dir /usr/local/unravel-jars/
```
Replace the values for unravel-host, spark-version, hive-version, and ambari-host with appropriate values.
Tip
For unravel-host, specify the protocol (HTTP or HTTPS) and use the fully qualified domain name (FQDN) or IP address of Unravel Server. For example, https://playground3.unraveldata.com:3000.
For spark-version, use a Spark version that is compatible with this version of Unravel. For example,
spark-1.6 for Spark 1.6.x
spark-2.0 for Spark 2.0.x
spark-2.1 for Spark 2.1.x
spark-2.2 for Spark 2.2.x
spark-2.3 for Spark 2.3.x
spark-2.4 for Spark 2.4.x
spark-3.0 for Spark 3.0.x
For hive-version, use a Hive version that is compatible with this version of Unravel. For example,
HDP 3.x:
  3.1.0 for Hive 3.1.0
HDP 2.x:
  1.2.0 for Hive 1.2.0 or 1.2.1
  0.13.0 for Hive 0.13.0
Distribute /usr/local/unravel-jars to all worker, edge, and master nodes that run the queries.
For example,
```
scp -r /usr/local/unravel-jars root@hostname:/usr/local/unravel-jars
```
Make sure each node can reach port 4043 of Unravel Server.

2. Configure Ambari to work with Unravel

Hive configurations
1. Import the Hive Hook sensor JAR into the classpath
  On the Ambari UI, click Hive > Configs > Advanced > Advanced hive-env. At the end of the hive-env template, add:
```
export AUX_CLASSPATH=${AUX_CLASSPATH}:<path to unravel hive hook sensor jar>/unravel-hive-<version>-hook.jar 
```
  For example:
```
export AUX_CLASSPATH=${AUX_CLASSPATH}:/usr/local/unravel-jars/unravel-hive-1.2.0-hook.jar 
```
2. Configure hive hook
  On the Ambari UI, click Hive > Configs > Advanced. In the General section, search for the following hive hooks:
  hive.exec.failure.hooks
  hive.exec.post.hooks
  hive.exec.pre.hooks
  hive.exec.driver.run.hooks
  Append ,com.unraveldata.dataflow.hive.hook.UnravelHiveHook to the existing value of each of these hooks.
  Important
  Be sure to append with no space before or after the comma, for example, property=existingValue,newValue
  For example:
```
hive.exec.failure.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
hive.exec.post.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
hive.exec.pre.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
hive.exec.driver.run.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
```
  If you do not find these Hive hooks, go to the Custom hive-site section, click Add Property, and add them as one key=value pair per line in the Properties text box.
  For example:
```
hive.exec.pre.hooks=com.unraveldata.dataflow.hive.hook.UnravelHiveHook
```
  Similarly, in the Custom hive-site section, set com.unraveldata.host to the edge node's hostname.

Configure HDFS

Click HDFS > Configs > Advanced > Advanced hadoop-env. In the hadoop-env template, look for export HADOOP_CLASSPATH and append Unravel's JAR path as shown.

export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:<Unravel sensor installation directory>/unravel-hive-<version>-hook.jar

For example:

export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/opt/unravel-hive-1.2.0-hook.jar

Configure the BTrace agent for Tez
From the Ambari UI, go to Tez > Configs > Advanced and, in the General section, append the following Java options to tez.am.launch.cmd-opts and tez.task.launch.cmd-opts:
```
-javaagent:<Unravel installation directory>/unravel-jars/jars/btrace-agent.jar=libs=mr,config=tez -Dunravel.server.hostport=<unravel-host>:4043
```
<unravel-host> should be the hostname of the Unravel edge node. For example: abcd.unraveldata.com
Tip
In a Kerberos environment, set the tez.am.view-acls property to *.
Configure the Application Timeline Server (ATS)
Note
From Unravel v4.6.1.6, this step is not mandatory.
1. In yarn-site.xml:
```
yarn.timeline-service.enabled=true
yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes=org.apache.tez.dag.history.logging.ats.TimelineCachePluginImpl
yarn.timeline-service.version=1.5 or yarn.timeline-service.versions=1.5f,2.0f
```
2. If yarn.acl.enable is true, add unravel to yarn.admin.acl.
3. In hive-env.sh, add:
```
Use ATS Logging: true
```
4. In tez-site.xml, add:
```
tez.dag.history.logging.enabled=true
tez.am.history.logging.enabled=true
tez.history.logging.service.class=org.apache.tez.dag.history.logging.ats.ATSV15HistoryLoggingService
tez.am.view-acls=*
```
  If tez-site.xml is not available, you can add these properties from the Ambari UI instead. Go to Tez > Configs > Custom tez-site and add the above properties as key=value pairs, one per line, in the Properties text box.
  Note
  From HDP version 3.1.0 onwards, this Tez configuration must be done manually.
Configure Spark-on-Yarn
Tip
For unravel-host, use Unravel Server's fully qualified domain name (FQDN) or IP address.
For spark-version, use a Spark version that is compatible with this version of Unravel. For example,
- spark-1.6 for Spark 1.6.x
- spark-2.0 for Spark 2.0.x
- spark-2.1 for Spark 2.1.x
- spark-2.2 for Spark 2.2.x
- spark-2.3 for Spark 2.3.x
- spark-2.4 for Spark 2.4.x
- spark-3.0 for Spark 3.0.x
1. Add the location of the Spark JARs.
  Click Spark > Configs > Custom spark-defaults > Add Property and use Bulk property add mode, or edit spark-defaults.conf as follows:
  Tip
  If your cluster has only one Spark 1.X version, spark-defaults.conf is in /usr/hdp/current/spark-client/conf.
  If your cluster is running Spark 2.X, spark-defaults.conf is in /usr/hdp/current/spark2-client/conf.
  This example uses default locations for Spark JARs. Your environment may vary.
```
spark.unravel.server.hostport=unravel-host:4043
spark.driver.extraJavaOptions=-javaagent:/usr/local/unravel-jars/jars/btrace-agent.jar=config=driver,libs=<spark-version>
spark.executor.extraJavaOptions=-javaagent:/usr/local/unravel-jars/jars/btrace-agent.jar=config=executor,libs=<spark-version>
spark.eventLog.enabled=true 
```
  For example:
```
spark.unravel.server.hostport=xyznode.unraveldata.com:4043
spark.driver.extraJavaOptions=-javaagent:/usr/local/unravel-jars/jars/btrace-agent.jar=config=driver,libs=spark-2.3
spark.executor.extraJavaOptions=-javaagent:/usr/local/unravel-jars/jars/btrace-agent.jar=config=executor,libs=spark-2.3
spark.eventLog.enabled=true 
```
  Note
  If you have multiple Spark services in the same cluster, you must set the Spark default configuration on each of them.
2. Enable Spark streaming.
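
The spark-defaults properties from step 1 can also be generated rather than typed by hand. The following is a minimal sketch, assuming the sensor JARs are in /usr/local/unravel-jars/jars as in the example above. The function names are hypothetical, and the helper does not check that the resulting Spark version is one of the supported values listed in the tip.

```shell
#!/bin/sh
# Hypothetical helper: derive the libs value (for example spark-2.3) from a
# full Spark version string (for example 2.3.1).
spark_libs() {
    major=$(printf '%s' "$1" | cut -d. -f1)
    minor=$(printf '%s' "$1" | cut -d. -f2)
    printf 'spark-%s.%s\n' "$major" "$minor"
}

# Print the four spark-defaults properties from this guide.
# $1 = Unravel host, $2 = Spark version, $3 = sensor JAR directory (optional).
spark_defaults() {
    jars="${3:-/usr/local/unravel-jars/jars}"
    libs=$(spark_libs "$2")
    echo "spark.unravel.server.hostport=$1:4043"
    echo "spark.driver.extraJavaOptions=-javaagent:${jars}/btrace-agent.jar=config=driver,libs=${libs}"
    echo "spark.executor.extraJavaOptions=-javaagent:${jars}/btrace-agent.jar=config=executor,libs=${libs}"
    echo "spark.eventLog.enabled=true"
}

# Example:
# spark_defaults xyznode.unraveldata.com 2.3.1
```

The output can be pasted into Bulk property add mode in Ambari or appended to spark-defaults.conf.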
Configure Oozie
1. In Ambari, click Oozie > Configs > Advanced.
2. In the Filter box, search for oozie.service.WorkflowAppService.system.libpath and check the path shown.
3. From a terminal on the Unravel edge node, list the ShareLib Root Directory and note the lib directory with the latest timestamp.
```
hdfs dfs -ls <path to ShareLib directory>
# For example: hdfs dfs -ls /user/oozie/share/lib/
```
  Important
  The jars must be copied to the lib directory (with the latest timestamp), which is shown in ShareLib Root Directory.
4. From a terminal, copy the Hive Hook JAR, /usr/local/unravel-jars/unravel-hive-<version>-hook.jar, and the BTrace JAR, /usr/local/unravel-jars/jars/btrace-agent.jar, to the lib directory with the latest timestamp in ShareLib Root Directory.
```
hdfs dfs -copyFromLocal /usr/local/unravel-jars/unravel-hive-<version>-hook.jar /user/oozie/share/lib/<latest timestamp lib directory>/

##For example: 
hdfs dfs -copyFromLocal /usr/local/unravel-jars/unravel-hive-3.1.0-hook.jar /user/oozie/share/lib/lib_20210504054909
```
```
hdfs dfs -copyFromLocal /usr/local/unravel-jars/jars/btrace-agent.jar /user/oozie/share/lib/<latest timestamp lib directory>/

##For example: 
hdfs dfs -copyFromLocal /usr/local/unravel-jars/jars/btrace-agent.jar /user/oozie/share/lib/lib_20210504054909
```
  Caution
  Jobs controlled by Oozie 2.3+ fail if you do not copy the Hive Hook and BTrace JARs to the HDFS shared library path.
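
Picking the lib directory with the latest timestamp can be scripted. The following is a minimal sketch, assuming the directories follow the lib_<timestamp> naming shown in the examples above, with timestamps of equal length so that a plain sort orders them chronologically. The function name `latest_sharelib` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and print the
# lib_<timestamp> directory with the latest timestamp.
latest_sharelib() {
    # The path is the last field of each listing line; the "Found N items"
    # header and any non-matching entries are filtered out by grep.
    awk '{print $NF}' | grep '/lib_[0-9][0-9]*$' | sort | tail -n 1
}

# Example (requires an HDFS client; paths are the ones used in this guide):
# target=$(hdfs dfs -ls /user/oozie/share/lib/ | latest_sharelib)
# hdfs dfs -copyFromLocal /usr/local/unravel-jars/jars/btrace-agent.jar "$target/"
```

Check the printed directory against the ShareLib path shown in Ambari before copying the JARs.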

3. Configure the Unravel Host

Define the Tez and ATS properties using the manager service.

Stop Unravel.

<Unravel installation directory>/unravel/manager stop

Set the properties shown in the tables below for Tez and, if it requires authentication, for the Application Timeline Server (ATS). Use the manager config properties set command to set each property.

<Unravel installation directory>/unravel/manager config properties set <key> <value>

##For example: 
/opt/unravel/manager config properties set yarn.ats.webapp.username user1
/opt/unravel/manager config properties set yarn.ats.webapp.password 'pa$$w0rD'

Tez

Property/Description	Set by user	Unit	Default
com.unraveldata.yarn.timeline-service.webapp.address The HTTP address of the Timeline service web application.	Optional	string (URL)	-
com.unraveldata.yarn.timeline-service.port Timeline service port.		number	8188

Note

In a multi-cluster environment, you must add these properties to the Edge node.

Application Timeline Server (ATS) requires authentication

Property/Description	Set by user	Unit	Default
yarn.ats.webapp.username Username required for authentication to the Application Timeline Server (if authentication is required).	Optional	string	-
yarn.ats.webapp.password Password required for authentication to the Application Timeline Server (if authentication is required).	Optional	string	-

Apply the changes.

<Unravel installation directory>/unravel/manager config apply

Start Unravel.

<Unravel installation directory>/unravel/manager start
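
The stop, set, apply, and start sequence above can be wrapped in a small script so that the steps always run in order. The following is a sketch only: `UNRAVEL_MANAGER` and `DRY_RUN` are illustrative variables, not Unravel settings, and the default path /opt/unravel/manager is taken from the example above and may differ in your installation.

```shell
#!/bin/sh
# Sketch of the stop / set / apply / start sequence from this step.
# UNRAVEL_MANAGER and DRY_RUN are illustrative, not Unravel settings.
UNRAVEL_MANAGER="${UNRAVEL_MANAGER:-/opt/unravel/manager}"

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"          # dry run: only print the command
    else
        "$@"
    fi
}

# Arguments are key/value pairs: key1 value1 key2 value2 ...
set_unravel_properties() {
    run "$UNRAVEL_MANAGER" stop || return 1
    while [ "$#" -ge 2 ]; do
        run "$UNRAVEL_MANAGER" config properties set "$1" "$2" || return 1
        shift 2
    done
    run "$UNRAVEL_MANAGER" config apply || return 1
    run "$UNRAVEL_MANAGER" start
}

# Example (prints the commands without running them):
# DRY_RUN=1
# set_unravel_properties com.unraveldata.yarn.timeline-service.port 8188
```

Running with DRY_RUN set to 1 first makes it easy to review the exact commands before they touch a live installation.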

4. Optional: Confirm that Unravel UI shows Tez data.

Run the hive_test_simple.sh script on the HDP cluster or on any cloud environment where hive.execution.engine=tez.
```
<Unravel installation directory>/unravel/manager run script hive_test_simple.sh
```
Log in to the Unravel UI, go to the Applications page, and check for Tez jobs.
The Unravel UI may take a few seconds to load Tez data.

5. Add more configurations

Refer to Additional configurations.
