Cloudera Data Platform (CDP)
Before installing, check and complete the installation requirements. Follow the instructions below to download, install, and set up Unravel on the CDP platform.
Note
These instructions are for a single cluster environment. For installing Unravel on a multi-cluster environment, refer to Multi-cluster install.
1. Download Unravel
2. Deploy Unravel binaries
Unravel binaries are available as a tar file or an RPM package. You can deploy the binaries in any directory on the server, but the user who installs Unravel must have write permission on that directory.
If the binaries are deployed to <Unravel_installation_directory>, Unravel will be available in <Unravel_installation_directory>/unravel. The directory layout for both the tar file and the RPM package is unravel/versions/<Directories and files>.
The following steps to deploy Unravel from a tar file must be performed by the user who will run Unravel.
Create an installation directory.
mkdir /path/to/installation/directory
For example: mkdir /opt/unravel
Note
Some locations may require root access to create a directory. In that case, after the directory is created, change its ownership to the unravel user and continue with the installation procedure as the unravel user.
chown -R username:groupname /path/to/installation/directory
For example: chown -R unravel:unravelgroup /opt/unravel
Extract the Unravel tar file into the installation directory that you created in the first step. Extracting the tar file creates the unravel directory within the installation directory.
tar -zxf unravel-<version>.tar.gz -C /<Unravel-installation-directory>
For example: tar -zxf unravel-4.7.x.x.tar.gz -C /opt
The Unravel directory will be available within /opt.
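The effect of the -C option can be tried safely before touching a real server. The following is a minimal sketch, assuming a throwaway archive built in a temporary directory as a stand-in for unravel-<version>.tar.gz; the names work_dir, install_dir, and extract_result are illustrative and not part of Unravel.

```shell
#!/bin/sh
# Sketch only: a dummy tarball stands in for unravel-<version>.tar.gz.

work_dir=$(mktemp -d)              # scratch area, removed at the end
install_dir="$work_dir/install"    # plays the role of the installation directory

# Build a dummy archive whose top-level directory is "unravel".
mkdir -p "$work_dir/src/unravel/versions"
tar -zcf "$work_dir/unravel-demo.tar.gz" -C "$work_dir/src" unravel

# Same two steps as in the procedure: create the directory, then extract into it.
mkdir -p "$install_dir"
tar -zxf "$work_dir/unravel-demo.tar.gz" -C "$install_dir"

# Check that the unravel directory now exists inside the installation directory.
if [ -d "$install_dir/unravel/versions" ]; then
    extract_result="unravel directory created"
else
    extract_result="unravel directory missing"
fi
echo "$extract_result"

rm -rf "$work_dir"
```

The same check ([ -d <installation directory>/unravel ]) can be run after extracting the real tar file.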
Important
A root user should perform the following steps to deploy Unravel from an RPM package. After the RPM package is deployed, the remaining installation procedures should be performed by the unravel user.
Create an installation directory.
mkdir /usr/local/unravel
Run the following command:
rpm -i unravel-<version>.rpm
For example: rpm -i unravel-4.7.x.x.rpm
To install to a different location, use the --prefix option. For example:
mkdir /opt/unravel
chown -R username:groupname /opt/unravel
rpm -i unravel-4.7.0.0.rpm --prefix /opt
The Unravel directory is available in /opt.
Grant ownership of the directory to the user who runs Unravel. This user executes all the processes involved in the Unravel installation.
chown -R username:groupname /usr/local/unravel
For example: chown -R unravel:unravelgroup /usr/local/unravel
The Unravel directory is available in /usr/local.
Continue with the installation procedures as the unravel user.
3. Run the setup
You can run the setup command to install Unravel. The setup command does the following:
Runs Precheck automatically to detect issues that would prevent a successful installation and suggests how to resolve them. Refer to Precheck filters for the expected value for each filter.
Lets you pass extra parameters to integrate the database of your choice.
The setup command can use a managed database shipped with Unravel or an external database. When run without any additional parameters, setup uses the Unravel managed PostgreSQL database. Otherwise, you can specify one of the following types of databases in the setup command:
MySQL (Unravel managed as well as external MySQL database)
MariaDB (Unravel managed as well as external MariaDB database)
PostgreSQL (External PostgreSQL)
Refer to Integrate database for details.
Lets you specify a path for the data directory other than the default path.
The Unravel data and configurations are stored in the data directory. By default, the installer maintains the data directory under <Unravel installation directory>/data. You can change this location by passing additional parameters to the setup command.
Provides more setup options.
Notice
The Unravel user who owns the installation directory should run the setup command to install Unravel.
To install Unravel with the setup command, do the following:
Switch to the unravel user.
su - <unravel user>
Run the setup command:
Note
Refer to setup Options for all the additional parameters that you can run with the setup command.
Before running the setup command with any database other than the Unravel managed PostgreSQL that is shipped with the product, refer to the Integrate database topic and complete the prerequisites. Extra parameters must be passed with the setup command when using another database.
Tip
Optionally, to use a different data directory, pass the --data-directory parameter with the setup command as follows:
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --data-directory /the/data/directory
Similarly, you can configure separate locations for other Unravel directories. Contact support for assistance.
PostgreSQL
Unravel managed PostgreSQL
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup
External PostgreSQL
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --external-database postgresql <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
The HOST, PORT, SCHEMA, USERNAME, and PASSWORD fields are optional; you are prompted for any that are missing.
For example: /opt/unravel/versions/abcd.992/setup --external-database postgresql xyz.unraveldata.com 5432 unravel_db_prod unravel unraveldata
MySQL
Unravel managed MySQL
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --extra /tmp/mysql
External MySQL
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --extra /tmp/<MySQL-directory> --external-database mysql <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
The HOST, PORT, SCHEMA, USERNAME, and PASSWORD fields are optional; you are prompted for any that are missing.
MariaDB
Unravel managed MariaDB
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --extra /tmp/mariadb
External MariaDB
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --extra /tmp/<MariaDB-directory> --external-database mariadb <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
The HOST, PORT, SCHEMA, USERNAME, and PASSWORD fields are optional; you are prompted for any that are missing.
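The setup variants above differ only in the --extra and --external-database parameters. The following is a minimal sketch that prints the matching command line instead of running it; build_setup_cmd and its database-kind names (managed-postgresql, external-mysql, and so on) are hypothetical helpers for illustration, and only the setup options shown in this section are used.

```shell
#!/bin/sh
# Sketch only: build_setup_cmd is a hypothetical helper, not part of Unravel.
# It prints a setup command line; it does not run setup.
# Usage: build_setup_cmd <install_dir> <version> <kind> [extra_dir] [HOST PORT SCHEMA USERNAME PASSWORD]

build_setup_cmd() {
    setup="$1/unravel/versions/$2/setup"
    kind="$3"
    shift 3
    case "$kind" in
        managed-postgresql)
            cmd="$setup" ;;
        external-postgresql)
            cmd="$setup --external-database postgresql" ;;
        managed-mysql | managed-mariadb | external-mysql | external-mariadb)
            # MySQL and MariaDB variants take the --extra directory first.
            if [ "$#" -lt 1 ]; then
                echo "missing --extra directory for $kind" >&2
                return 1
            fi
            cmd="$setup --extra $1"
            shift
            case "$kind" in
                external-*) cmd="$cmd --external-database ${kind#external-}" ;;
            esac ;;
        *)
            echo "unknown database kind: $kind" >&2
            return 1 ;;
    esac
    # Any remaining arguments are the optional connection fields.
    if [ "$#" -gt 0 ]; then
        cmd="$cmd $*"
    fi
    printf '%s\n' "$cmd"
}

build_setup_cmd /opt abcd.992 external-postgresql xyz.unraveldata.com 5432 unravel_db_prod unravel unraveldata
```

Printing the command first makes it easy to review the parameters before running setup as the unravel user.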
Precheck is automatically run when you run the setup command. Refer to Precheck filters for the expected value for each filter.
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start all the services.
<unravel_installation_directory>/unravel/manager start
Check the status of services.
<unravel_installation_directory>/unravel/manager report
The following service statuses are reported:
OK: Service is up and running.
Not Monitored: Service is not running. (Has stopped or has failed to start)
Initializing: Services are starting up.
Does not exist: The process unexpectedly disappeared. A restart will be attempted ten times.
You can also get the status and information for a specific service. Run the manager report command as follows:
<unravel_installation_directory>/unravel/manager report <service>
For example: /opt/unravel/manager report auto_action
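The apply, start, and report commands above are always run in that order, so they can be wrapped in a small script. The following is a minimal sketch, assuming an installation directory such as /opt; run_manager_steps and the DRY_RUN variable are illustrative names, and with DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch only: run_manager_steps and DRY_RUN are illustrative names.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run_manager_steps() {
    manager="$1/unravel/manager"
    # Same order as the procedure: apply the configuration, start, then report.
    for step in "config apply" "start" "report"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "$manager $step"
        else
            # $step is intentionally unquoted so "config apply" becomes two arguments.
            "$manager" $step || { echo "failed: manager $step" >&2; return 1; }
        fi
    done
}

run_manager_steps /opt
```

Set DRY_RUN=0 to execute the commands; the function stops at the first command that returns a non-zero status.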
The Precheck output displays the issues that prevent a successful installation and provides suggestions to resolve them. You must resolve each of the issues before proceeding. See Precheck filters.
After resolving the precheck issues, you must re-login or reload the shell to execute the setup command again.
Note
You can skip the precheck using the setup --skip-precheck command in certain situations.
For example:
/opt/unravel/versions/<Unravel version>/setup --skip-precheck
You can also skip the checks that you know can fail. For example, if you want to skip the Check limits option and the Disk freespace option, pick the command within the parenthesis corresponding to these failed options and run the setup command as follows:
setup --filter-precheck ~check_limits,~check_freespace
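The --filter-precheck value is a comma-separated list in which each skipped check is prefixed with a tilde. The following is a minimal sketch that builds the value from a list of check names; precheck_filter is a hypothetical helper, and the check names are the two from the example above.

```shell
#!/bin/sh
# Sketch only: precheck_filter is a hypothetical helper, not part of Unravel.

precheck_filter() {
    filter=""
    for check in "$@"; do
        # Prefix each check with "~" and join the entries with commas.
        filter="${filter:+$filter,}~$check"
    done
    printf '%s\n' "$filter"
}

echo "setup --filter-precheck $(precheck_filter check_limits check_freespace)"
```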
Tip
Run the setup command with --help, alone or combined with other setup options, for complete usage details.
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --help
Precheck Sample
/opt/unravel/versions/abcd.1004/setup
2021-04-05 15:51:30 Sending logs to: /tmp/unravel-setup-20210405-155130.log
2021-04-05 15:51:30 Running preinstallation check...
2021-04-05 15:51:31 Gathering information ................. Ok
2021-04-05 15:51:51 Running checks .................. Ok
--------------------------------------------------------------------------------
system
Check limits : PASSED
Clock sync : PASSED
CPU requirement : PASSED, Available cores: 8 cores
Disk access : PASSED, /opt/unravel/versions/develop.1004/healthcheck/healthcheck/plugins/system is writable
Disk freespace : PASSED, 229 GB of free disk space is available for precheck dir.
Kerberos tools : PASSED
Memory requirement : PASSED, Available memory: 79 GB
Network ports : PASSED
OS libraries : PASSED
OS release : PASSED, OS release version: centos 7.6
OS settings : PASSED
SELinux : PASSED
--------------------------------------------------------------------------------
Healthcheck report bundle: /tmp/healthcheck-20210405155130-xyz.unraveldata.com.tar.gz
2021-04-05 15:51:53 Prepare to install with: /opt/unravel/versions/abcd.1004/installer/installer/../installer/conf/presets/default.yaml
2021-04-05 15:51:57 Sending logs to: /opt/unravel/logs/setup.log
2021-04-05 15:51:57 Instantiating templates ........................................ Ok
2021-04-05 15:52:05 Creating parcels .................................... Ok
2021-04-05 15:52:20 Installing sensors file ............................ Ok
2021-04-05 15:52:20 Installing pgsql connector ... Ok
2021-04-05 15:52:22 Starting service monitor ... Ok
2021-04-05 15:52:27 Request start for elasticsearch_1 .... Ok
2021-04-05 15:52:27 Waiting for elasticsearch_1 for 120 sec ......... Ok
2021-04-05 15:52:35 Request start for zookeeper .... Ok
2021-04-05 15:52:35 Request start for kafka .... Ok
2021-04-05 15:52:35 Waiting for kafka for 120 sec ...... Ok
2021-04-05 15:52:37 Waiting for kafka to be alive for 120 sec ..... Ok
2021-04-05 15:52:42 Initializing pgsql ... Ok
2021-04-05 15:52:46 Request start for pgsql .... Ok
2021-04-05 15:52:46 Waiting for pgsql for 120 sec ..... Ok
2021-04-05 15:52:47 Creating database schema ................. Ok
2021-04-05 15:52:50 Generating hashes .... Ok
2021-04-05 15:52:52 Loading elasticsearch templates ............ Ok
2021-04-05 15:52:55 Creating kafka topics .................... Ok
2021-04-05 15:53:36 Creating schema objects ........................................ Ok
2021-04-05 15:54:03 Request stop ....................................................... Ok
2021-04-05 15:54:16 Done
[unravel@xyz ~]$
4. Add configurations
Run the manager config auto command to automatically pull in all the Hadoop configurations. You will be prompted to provide the location and credentials for Cloudera Manager or Ambari UI.
<unravel_installation_directory>/unravel/manager config auto
If more than one cluster is handled by Cloudera Manager or Ambari, you will be prompted to enable the cluster that you want to monitor. Run the following command to enable a cluster.
<unravel_installation_directory>/unravel/manager config cluster enable <CLUSTER_KEY>
Example: /opt/unravel/manager config cluster enable cluster1
Tip
Here <CLUSTER_KEY> is the name of the cluster that you want to enable for Unravel monitoring. This can be retrieved from the output shown for the manager config auto command.
The Hive metastore database password can be recovered automatically only for a cluster manager with an administrative account. Otherwise, it must be set manually as follows:
<Unravel installation directory>/unravel/manager config hive metastore password <CLUSTER_KEY> <HIVE_KEY> <PASSWORD>
Example: /opt/unravel/manager config hive metastore password cluster1 HIVE p@P@SsWorD
Tip
Here <CLUSTER_KEY> is the name of the cluster where you want to set the Hive configurations.
Also, refer to Connecting to Hive metastore in a single cluster environment.
Optional: Set up Kerberos to authenticate Hadoop services.
If you are using Kerberos authentication, set the principal and the keytab path, enable Kerberos authentication, and apply the changes.
<Unravel installation directory>/unravel/manager config kerberos set --keytab </path/to/keytab file> --principal <server@example.com>
<Unravel installation directory>/unravel/manager config kerberos enable
<unravel_installation_directory>/unravel/manager config apply
If you are using Truststore certificates, run the following steps from the manager tool to add certificates to the Truststore:
Download the certificates to a directory.
Give the user who installs Unravel permission to access the certificates directory.
chown -R username:groupname /path/to/certificates/directory
Upload the certificates.
## Option 1
<unravel_installation_directory>/unravel/manager config tls trust add </path/to/the/certificate/files>
or
## Option 2
<unravel_installation_directory>/unravel/manager config tls trust add --pem </path/to/the/certificate/files>
<unravel_installation_directory>/unravel/manager config tls trust add --jks </path/to/the/certificate/files>
<unravel_installation_directory>/unravel/manager config tls trust add --pkcs12 </path/to/the/certificate/files>
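When several certificate files are involved, the format option can be chosen per file. The following is a minimal sketch that prints the trust add command for a certificate; trust_add_cmd is a hypothetical helper, and the mapping from file extension to --pem, --jks, or --pkcs12 is an assumption made for illustration, not documented Unravel behavior. Files with other extensions fall back to Option 1, with no format option.

```shell
#!/bin/sh
# Sketch only: trust_add_cmd is a hypothetical helper, not part of Unravel.
# The extension-to-option mapping below is an assumption for illustration.

trust_add_cmd() {
    manager="$1/unravel/manager"
    cert="$2"
    case "$cert" in
        *.pem | *.crt) flag="--pem" ;;
        *.jks)         flag="--jks" ;;
        *.p12 | *.pfx) flag="--pkcs12" ;;
        *)             flag="" ;;    # Option 1: no explicit format option
    esac
    # Print the command instead of running it.
    printf '%s\n' "$manager config tls trust add${flag:+ $flag} $cert"
}

trust_add_cmd /opt /path/to/certificates/ca.jks
```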
Enable the Truststore.
<unravel_installation_directory>/unravel/manager config tls trust <enable|disable>
<unravel_installation_directory>/unravel/manager config apply
Verify the connection.
<unravel_installation_directory>/unravel/manager verify connect <Cluster Manager-host> <Cluster Manager-port>
For example:
/opt/unravel/manager verify connect xyz.unraveldata.com 7180
-- Running: verify connect xyz.unraveldata.com 7180
- Resolved IP: 111.17.4.123
- Reverse lookup: ('xyz.unraveldata.com', [], ['111.17.4.123'])
- Connection: OK
- TLS: No
-- OK
If you are using TLS protocol, refer to Enabling Transport Layer Security (TLS) for Unravel UI.
Apply changes.
<unravel_installation_directory>/unravel/manager config apply
Start all the services.
<unravel_installation_directory>/unravel/manager start
Check the status of services.
<unravel_installation_directory>/unravel/manager report
The following service statuses are reported:
OK: Service is up and running.
Not Monitored: Service is not running. (Has stopped or has failed to start)
Initializing: Services are starting up.
Does not exist: The process unexpectedly disappeared. A restart will be attempted ten times.
You can also get the status and information for a specific service. Run the manager report command as follows:
<unravel_installation_directory>/unravel/manager report <service>
For example: /opt/unravel/manager report auto_action
Set additional configurations, if required.
Optionally, run healthcheck at this point to verify the configurations and confirm that the services are running.
<unravel_installation_directory>/unravel/manager healthcheck
Healthcheck is run automatically on an hourly basis in the backend. You can set your email to receive the healthcheck reports.
5. Enable additional instrumentation for CDP
This topic explains how to enable additional instrumentation on the gateway, edge, or client nodes that are used to submit jobs to your big data platform. Additional instrumentation can include:
Hive queries in Hadoop that are pushed to Unravel Server by the Hive Hook sensor, a JAR file.
Spark job performance metrics that are pushed to Unravel Server by the Spark sensor, a JAR file.
Impala queries that are pulled from Cloudera Manager.
Sensor JARs packaged in a parcel on Unravel Server.
Tez DAG information that is pushed to Unravel Server by the Tez sensor, a JAR file.
1. Download, distribute, and activate Unravel sensor
Sensor JARs are packaged in a parcel on Unravel server. Run the following steps from the Cloudera Manager to download, distribute, and activate this parcel.
Note
Ensure that Unravel is up and running before you perform the following steps.
In Cloudera Manager, open the Parcel page.
On the Parcel page, click Configuration or Parcel Repositories & Network settings. The Parcel Configurations dialog box is displayed.
Go to the Remote Parcel Repository URLs section, click +, and enter the Unravel host along with the exact directory name for your CDH version.
http://<unravel-host>:<port>/parcels/<cdh-version>/
For example: http://xyz.unraveldata.com:3000/parcels/cdh7.1
<unravel-host> is the hostname or LAN IP address of Unravel. In a multi-cluster scenario, this is the host where the log_receiver daemon is running.
<port> is the Unravel UI port. The default is 3000. If you have customized the default port, use that port number.
<cdh-version> is your version of CDP. For example, cdh7.1.
You can go to the http://<unravel-host>:<port>/parcels/ directory (for example: http://xyznode46.unraveldata.com:3000/parcels) and copy the exact directory name of your CDH version (CDH<major.minor>).
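The repository URL is assembled from the three values described above. The following is a minimal sketch; parcel_repo_url is a hypothetical helper, and the host name is the sample value used in this section.

```shell
#!/bin/sh
# Sketch only: parcel_repo_url is a hypothetical helper, not part of Unravel.
# Usage: parcel_repo_url <unravel-host> <port> <cdh-version>

parcel_repo_url() {
    host="$1"
    port="${2:-3000}"    # 3000 is the default Unravel UI port
    cdh_version="$3"
    printf 'http://%s:%s/parcels/%s/\n' "$host" "$port" "$cdh_version"
}

parcel_repo_url xyz.unraveldata.com 3000 cdh7.1
# prints http://xyz.unraveldata.com:3000/parcels/cdh7.1/
```

Copy the directory name from the /parcels/ listing rather than typing it, so that the value passed as the version matches exactly.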
Note
If you're using Active Directory Kerberos, unravel-host must be a fully qualified domain name or IP address.
Tip
If you're running more than one version of CDP (for example, you have multiple clusters), you can add more than one parcel entry for unravel-host.
Click Save Changes.
In Cloudera Manager, click Check for new parcels, find the UNRAVEL_SENSOR parcel that you want to distribute, and click the corresponding Download button.
After the parcel is downloaded, click the corresponding Distribute button. This distributes the parcel to all the hosts.
After the parcel is distributed, click the corresponding Activate button. The status column will now display Distributed, Activated.
Note
If you have an old sensor parcel from Unravel, you must deactivate it now.
2. Put the Hive Hook JAR in AUX_CLASSPATH
In Cloudera Manager, select the target cluster from the drop-down, click Hive on Tez > Configuration, and search for Service Environment.
In Hive on Tez Service Environment Advanced Configuration Snippet (Safety Valve), enter the following exactly as shown, with no substitutions:
AUX_CLASSPATH=${AUX_CLASSPATH}:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar
Ensure that the user running the Hive server has read and execute access to the Unravel Hive Hook JAR.
3. Oozie: Copy Hive Hook and BTrace JARs to HDFS shared library path
In Cloudera Manager, select the target cluster from the drop-down, click Oozie > Configuration, and check the path shown in ShareLib Root Directory.
From a terminal application on the Unravel node (the edge node in a multi-cluster deployment), list the ShareLib Root Directory and pick the lib directory with the latest timestamp.
hdfs dfs -ls <path to ShareLib directory>
For example: hdfs dfs -ls /user/oozie/share/lib/
Important
The JARs must be copied to the lib directory (with the latest timestamp) under the path shown in ShareLib Root Directory.
Copy the Hive Hook JAR, /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar, and the BTrace JAR, /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar, to the specified path in ShareLib Root Directory.
For example, if the path specified in ShareLib Root Directory is /user/oozie, run the following commands to copy the JAR files.
hdfs dfs -copyFromLocal /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar /user/oozie/share/lib/<latest timestamp lib directory>/
For example: hdfs dfs -copyFromLocal /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/unravel_hive_hook.jar /user/oozie/share/lib/lib_20210326035616/
hdfs dfs -copyFromLocal /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar /user/oozie/share/lib/<latest timestamp lib directory>/
For example: hdfs dfs -copyFromLocal /opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar /user/oozie/share/lib/lib_20210326035616/
Caution
Jobs controlled by Oozie 2.3+ fail if you do not copy the Hive Hook and BTrace JARs to the HDFS shared library path.
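Picking the lib directory with the latest timestamp can be scripted. The following is a minimal sketch that reads a directory listing on standard input and prints the newest lib_<timestamp> path; latest_sharelib_dir is a hypothetical helper, and the two listing lines in the usage example are made-up sample data. It assumes that the path is the last field of each listing line and that the timestamps have a fixed width, so a plain sort orders them.

```shell
#!/bin/sh
# Sketch only: latest_sharelib_dir is a hypothetical helper, not part of Unravel or Hadoop.

latest_sharelib_dir() {
    # Take the last field of each line, keep only lib_<digits> paths,
    # sort them, and print the last (newest) one.
    awk '{ print $NF }' | grep -E '/lib_[0-9]+/?$' | sort | tail -n 1
}

# Sample listing lines (made up for illustration).
printf '%s\n' \
    "drwxr-xr-x   - oozie oozie 0 2021-01-10 08:00 /user/oozie/share/lib/lib_20210110080000" \
    "drwxr-xr-x   - oozie oozie 0 2021-03-26 03:56 /user/oozie/share/lib/lib_20210326035616" |
    latest_sharelib_dir
# prints /user/oozie/share/lib/lib_20210326035616
```

On a cluster node, the listing would come from hdfs dfs -ls piped into the function, and the printed path would be the target of the two hdfs dfs -copyFromLocal commands.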
4. Deploy the BTrace JAR for Tez service
In Cloudera Manager, go to Tez > Configuration and search for the following properties:
tez.am.launch.cmd-opts
tez.task.launch.cmd-opts
Append the following to the tez.am.launch.cmd-opts and tez.task.launch.cmd-opts properties:
-javaagent:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar=libs=mr,config=tez -Dunravel.server.hostport=<unravel_host>:4043
Note
For <unravel_host>, specify the FQDN or the logical hostname of Unravel, or of the edge node in a multi-cluster deployment.
Note
If you are using JDK version 9 or later, add the following to the existing Java options:
--add-exports java.base/jdk.internal.perf=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED --add-exports java.management/sun.management.counter.perf=ALL-UNNAMED --add-exports java.management/sun.management.counter=ALL-UNNAMED
For example, the complete Java options are specified as follows:
-javaagent:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar=libs=mr,config=tez -Dunravel.server.hostport=<unravel_host>:4043 --add-exports java.base/jdk.internal.perf=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED --add-exports java.management/sun.management.counter.perf=ALL-UNNAMED --add-exports java.management/sun.management.counter=ALL-UNNAMED
Click the Stale configurations icon to deploy the client configuration and restart the Tez services.
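Because the options string is long and differs by JDK version, it can help to generate it. The following is a minimal sketch that prints the value to append to both Tez properties; tez_unravel_opts is a hypothetical helper, unravel.example.com is a placeholder host, and the port 4043 is the one shown above.

```shell
#!/bin/sh
# Sketch only: tez_unravel_opts is a hypothetical helper, not part of Unravel.
# Usage: tez_unravel_opts <unravel_host> <jdk_major_version>

tez_unravel_opts() {
    unravel_host="$1"
    jdk_major="$2"
    opts="-javaagent:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar=libs=mr,config=tez"
    opts="$opts -Dunravel.server.hostport=$unravel_host:4043"
    # JDK 9 or later needs the extra module options listed in the note.
    if [ "$jdk_major" -ge 9 ]; then
        opts="$opts --add-exports java.base/jdk.internal.perf=ALL-UNNAMED"
        opts="$opts --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED"
        opts="$opts --add-exports java.management/sun.management.counter.perf=ALL-UNNAMED"
        opts="$opts --add-exports java.management/sun.management.counter=ALL-UNNAMED"
    fi
    printf '%s\n' "$opts"
}

tez_unravel_opts unravel.example.com 11
```

The printed line is pasted at the end of the existing value of tez.am.launch.cmd-opts and tez.task.launch.cmd-opts.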
5. Set Hive Hook configuration
In Cloudera Manager, click Hive on Tez > Configuration tab.
Search for hive-site.xml, which leads to the Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml section.
Specify the Hive Hook configurations. You can use either the XML text field or the Editor.
Option 1: XML text field
Click View as XML to open the XML text field and copy-paste the following.
<property>
  <name>com.unraveldata.host</name>
  <value><UNRAVEL HOST NAME></value>
  <description>Unravel hive-hook processing host</description>
</property>
<property>
  <name>com.unraveldata.hive.hook.tcp</name>
  <value>true</value>
</property>
<property>
  <name>com.unraveldata.hive.hdfs.dir</name>
  <value>/user/unravel/HOOK_RESULT_DIR</value>
  <description>destination for hive-hook, Unravel log processing</description>
</property>
<property>
  <name>hive.exec.driver.run.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
<property>
  <name>hive.exec.pre.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
<property>
  <name>hive.exec.post.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
<property>
  <name>hive.exec.failure.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
Replace UNRAVEL HOST NAME with the Unravel hostname. In a multi-cluster deployment, replace it with the hostname of the edge node.
Option 2: Editor
Click + and enter the property, value, and description (optional).
Property | Value | Description
com.unraveldata.host | Replace with the Unravel hostname, or with the hostname of the edge node in a multi-cluster deployment. | Unravel hive-hook processing host
com.unraveldata.hive.hook.tcp | true | Hive hook tcp protocol.
com.unraveldata.hive.hdfs.dir | /user/unravel/HOOK_RESULT_DIR | Destination directory for hive-hook, Unravel log processing.
hive.exec.driver.run.hooks | com.unraveldata.dataflow.hive.hook.UnravelHiveHook | Hive hook
hive.exec.pre.hooks | com.unraveldata.dataflow.hive.hook.UnravelHiveHook | Hive hook
hive.exec.post.hooks | com.unraveldata.dataflow.hive.hook.UnravelHiveHook | Hive hook
hive.exec.failure.hooks | com.unraveldata.dataflow.hive.hook.UnravelHiveHook | Hive hook
Note
If you configure CDP with Cloudera Navigator's safety valve setting, you must edit the following keys and add the value com.unraveldata.dataflow.hive.hook.UnravelHiveHook to the existing list, separated by a comma and without any spaces.
hive.exec.post.hooks
hive.exec.pre.hooks
hive.exec.failure.hooks
For example:
<property>
  <name>hive.exec.post.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook,com.cloudera.navigator.audit.hive.HiveExecHookContext,org.apache.hadoop.hive.ql.hooks.LineageLogger</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
Similarly, add the same Hive Hook configurations in HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml.
Optionally, add a comment in Reason for change and then click Save Changes.
In Cloudera Manager, click the Stale configurations icon to deploy the configuration and restart the Hive services.
Check Unravel UI to see if all Hive queries are running.
If queries are running fine and appearing in Unravel UI, then you have successfully added the Hive Hook configurations.
If queries are failing with a class not found error or permission problems:
Undo the hive-site.xml changes in Cloudera Manager.
Deploy the Hive client configuration.
Restart the Hive service.
Follow the steps in Troubleshooting.
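When hook properties already carry other hooks, as in the Cloudera Navigator note in this step, the Unravel hook has to be merged into the existing comma-separated value without spaces. The following is a minimal sketch of that merge; add_unravel_hook is a hypothetical helper, and it places the Unravel hook first to match the example value shown in the note.

```shell
#!/bin/sh
# Sketch only: add_unravel_hook is a hypothetical helper, not part of Unravel.
# Usage: add_unravel_hook "<existing comma-separated hook list>"

add_unravel_hook() {
    hook="com.unraveldata.dataflow.hive.hook.UnravelHiveHook"
    existing="$1"
    case ",$existing," in
        *",$hook,"*) printf '%s\n' "$existing" ;;          # already present, leave unchanged
        ",,")        printf '%s\n' "$hook" ;;              # no hooks configured yet
        *)           printf '%s\n' "$hook,$existing" ;;    # comma-separated, no spaces
    esac
}

add_unravel_hook "com.cloudera.navigator.audit.hive.HiveExecHookContext,org.apache.hadoop.hive.ql.hooks.LineageLogger"
```

The printed value is what goes into the <value> element of hive.exec.pre.hooks, hive.exec.post.hooks, and hive.exec.failure.hooks.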
6. Set Kafka configuration
In Cloudera Manager, select the target cluster, click Kafka service > Configuration, and search for broker_java_opts.
In Additional Broker Java Options, enter the following:
-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 -XX:+DisableExplicitGC -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.local.only=true -Djava.rmi.server.useLocalHostname=true -Dcom.sun.management.jmxremote.rmi.port=9393
Click Save Changes.
7. Configure Spark properties in spark-defaults.conf
In Cloudera Manager, select the target cluster and then click Spark.
Select Configuration.
Search for spark-defaults.
In Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf, enter the following text, replacing placeholders with your particular values:
spark.unravel.server.hostport=<unravel-host>:<port>
spark.driver.extraJavaOptions=-javaagent:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar=config=driver,libs=<spark-version>
spark.executor.extraJavaOptions=-javaagent:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar=config=executor,libs=<spark-version>
spark.eventLog.enabled=true
Note
If you are using JDK version 9 or later, add the following to the existing Java options:
--add-exports java.base/jdk.internal.perf=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED --add-exports java.management/sun.management.counter.perf=ALL-UNNAMED --add-exports java.management/sun.management.counter=ALL-UNNAMED
For example, the complete Java options are specified as follows:
-javaagent:/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar=libs=mr -Dunravel.server.hostport=<unravel-host>:4043 --add-exports java.base/jdk.internal.perf=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED --add-exports java.management/sun.management.counter.perf=ALL-UNNAMED --add-exports java.management/sun.management.counter=ALL-UNNAMED
<unravel-host>: Specify the Unravel hostname. In a multi-cluster deployment, use the FQDN or logical hostname of the edge node.
<port>: 4043 is the default port. If you have customized the ports, specify that port number here.
<spark-version>: Use a Spark version that is compatible with this version of Unravel. You can check the Spark version with the spark-submit --version command and specify the same version.
Click Save changes.
Click the Stale configurations icon to deploy the client configuration and restart the Spark services. Your spark-shell will ensure new JVM containers are created with the necessary extraJavaOptions for the Spark drivers and executors.
Check Unravel UI to see if all Spark jobs are running.
If jobs are running and appearing in Unravel UI, you have deployed the Spark JAR successfully.
If jobs are failing with a class not found error or permission problems:
Undo the spark-defaults.conf changes in Cloudera Manager.
Deploy the client configuration.
Investigate and fix the issue.
Follow the steps in Troubleshooting.
Note
If you have YARN-client mode applications, the default Spark configuration is not sufficient, because the driver JVM starts before the configuration set through the SparkConf is applied. For more information, see Apache Spark Configuration. In this case, configure the Unravel Sensor for Spark to profile specific Spark applications only (in other words, per-application profiling rather than cluster-wide profiling).
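The four spark-defaults.conf lines in this step share the same host, port, and Spark version values, so they can be generated from one place. The following is a minimal sketch; spark_defaults_snippet is a hypothetical helper, unravel.example.com is a placeholder host, and the Spark version is passed through as given because its exact format is not specified here.

```shell
#!/bin/sh
# Sketch only: spark_defaults_snippet is a hypothetical helper, not part of Unravel.
# Usage: spark_defaults_snippet <unravel-host> <port> <spark-version>

spark_defaults_snippet() {
    unravel_host="$1"
    port="${2:-4043}"    # 4043 is the default port
    spark_version="$3"
    agent="/opt/cloudera/parcels/UNRAVEL_SENSOR/lib/java/btrace-agent.jar"
    cat <<EOF
spark.unravel.server.hostport=$unravel_host:$port
spark.driver.extraJavaOptions=-javaagent:$agent=config=driver,libs=$spark_version
spark.executor.extraJavaOptions=-javaagent:$agent=config=executor,libs=$spark_version
spark.eventLog.enabled=true
EOF
}

spark_defaults_snippet unravel.example.com 4043 '<spark-version>'
```

Replace the quoted '<spark-version>' argument with the value that matches your cluster before pasting the output into the safety valve.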
8. Retrieve Impala data from Cloudera Manager
Impala properties are automatically configured. Refer to Impala properties for the list of properties that are automatically configured. If a property is not already set by auto-configuration, set it as follows:
<Unravel installation directory>/manager config properties set <PROPERTY> <VALUE>
For example,
<Unravel installation directory>/manager config properties set com.unraveldata.data.source cm
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.url http://my-cm-url
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.username mycmname
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.password mycmpassword
For multi-cluster, use the following format and set these on the edge node:
<Unravel installation directory>/manager config properties set com.unraveldata.data.source cm
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.url http://my-cm-url
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.username mycmname
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.password mycmpassword
Note
By default, the Impala sensor task is enabled. To disable it, set the following property:
<Unravel installation directory>/manager config properties set com.unraveldata.sensor.tasks.disabled iw
Optionally, you can change the Impala lookback window. By default, when Unravel Server starts, it retrieves the last 5 minutes of Impala queries. To change this, do the following:
Change the value of the com.unraveldata.cloudera.manager.impala.look.back.minutes property.
<Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.impala.look.back.minutes -<period>
For example: <Unravel installation directory>/manager config properties set com.unraveldata.cloudera.manager.impala.look.back.minutes -7
Note
Include a minus sign in front of the new value.
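Since the minus sign is easy to forget, the command can be generated with the sign added automatically. The following is a minimal sketch that prints the command rather than running it; lookback_cmd is a hypothetical helper, and its first argument is the directory that contains the manager tool.

```shell
#!/bin/sh
# Sketch only: lookback_cmd is a hypothetical helper, not part of Unravel.
# Usage: lookback_cmd <directory containing manager> <minutes>

lookback_cmd() {
    manager_dir="$1"
    minutes="${2#-}"    # drop a leading minus if the caller already supplied one
    printf '%s\n' "$manager_dir/manager config properties set com.unraveldata.cloudera.manager.impala.look.back.minutes -$minutes"
}

lookback_cmd /opt/unravel 7
# prints /opt/unravel/manager config properties set com.unraveldata.cloudera.manager.impala.look.back.minutes -7
```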
9. Enable Impala Monitoring
Refer to Monitoring Impala.
10. Add more configurations
References
For more information on creating permanent functions, see Cloudera documentation.