Install Unravel on GCP Dataproc

Before installing Unravel on Google Cloud Dataproc, make sure that the Unravel installation requirements are met. Then follow these instructions to install and configure Unravel:

1. Create and configure the GCE instance

In the GCP console, go to the GCE dashboard and click Create Instance.
Select the following options based on Unravel's instance requirements:
- Base OS
- Instance type and size
- GCE instance's Firewall Rules
- Ports
- Networking
  The GCE instance must be in the same network as the target Dataproc clusters, which the Unravel compute node is monitoring.
- Firewall rules or policies
  - Create a read-only Cloud Storage IAM role and assign it to the Unravel GCE instance so that it can read the archived logs in the Cloud Storage bucket configured for the Dataproc cluster.
  - Allow TCP and UDP connections from the Dataproc master node to the Unravel compute node.
  - Create a firewall rule that allows ports 3000 and 4043 from the IP addresses of the Dataproc cluster nodes, and include the members of the firewall rules used on the Dataproc cluster in this rule.
  Sample inbound rule
  Type	Protocol	Port range	Source
  All traffic	All	All	For example, 10.10.0.0/16
  SSH	TCP	22	0.0.0.0/0 or trusted public IP for SSH access
  Custom TCP Rule	TCP	3000
  Custom TCP Rule	TCP	4043
  Sample outbound rule
  Type	Protocol	Port range	Source
  All traffic	All	All	0.0.0.0/0
  Note
  The GCE instance should have all TCP access to the Dataproc cluster (server/parent or worker) nodes. You can grant this access by adding firewall rules for the Dataproc server/parent and worker nodes that allow all TCP traffic on all port ranges.
  If you cannot allow the Unravel VM to access all traffic to the Dataproc cluster, you must at least allow it to access TCP ports 9870, 9866, and 9867 on the cluster nodes.
  While creating the GCE instance, add the firewall properties: enable HTTP and HTTPS traffic, then go to the Network tab and add network tags. (This is the firewall rule that you already created.)
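
The rule for ports 3000 and 4043 can also be created from the command line. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The rule name, network, source range, and network tag are hypothetical placeholders, and the function only prints the command unless DRY_RUN=0 is set, so that the rule can be reviewed first.

```shell
#!/bin/sh
# Sketch: allow Dataproc nodes to reach Unravel on TCP ports 3000 and 4043.
# Rule name, network, CIDR, and tag are hypothetical placeholders.

create_unravel_firewall_rule() {
  # $1 rule name, $2 network, $3 source CIDR, $4 network tag of the Unravel instance
  set -- gcloud compute firewall-rules create "$1" \
    --network="$2" \
    --direction=INGRESS \
    --allow=tcp:3000,tcp:4043 \
    --source-ranges="$3" \
    --target-tags="$4"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print the command instead of running it.
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Example (prints the command only):
# create_unravel_firewall_rule unravel-ingress my-vpc 10.10.0.0/16 unravel-server
```

The network tag passed as the last argument should match the tag that you add to the GCE instance.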

Configure the GCE instance

Set SELinux to permissive mode.
```
sudo setenforce Permissive
```
To make the setting persist after a reboot, edit /etc/selinux/config and set SELINUX=permissive.
```
sudo vi /etc/selinux/config
```
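
If you prefer not to edit the file by hand, the same change can be scripted. This is a minimal sketch, assuming GNU sed (as shipped with CentOS/RHEL); the function name is hypothetical, and it takes the file path as an argument so that it can be tried on a copy first.

```shell
#!/bin/sh
# Sketch: persistently set SELINUX=permissive in an SELinux config file.

set_selinux_permissive() {
  config_file="${1:-/etc/selinux/config}"
  # Keep a backup, then rewrite only the SELINUX= line (SELINUXTYPE= is untouched).
  cp "$config_file" "$config_file.bak" &&
    sed -i 's/^SELINUX=.*/SELINUX=permissive/' "$config_file"
}

# Example (run as root on the Unravel node):
# set_selinux_permissive /etc/selinux/config
# grep '^SELINUX=' /etc/selinux/config
```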

Install libaio.x86_64, lzop.x86_64, and ntp.x86_64.

sudo yum install -y libaio.x86_64
sudo yum install -y lzop.x86_64
sudo yum install -y ntp.x86_64

Start ntpd and check the system time.
```
sudo service ntpd start
sudo ntpq -p
```
Create a new user named hadoop.
```
sudo useradd hadoop
```

2. Download Unravel

Important

Before you download Unravel for your platform, make sure that you obtain the download username and password from Unravel Support.

Download the Unravel package (RPM or tar). Refer to the Download section for the complete list of Unravel product downloads and their corresponding md5sum values.

For example:

Tar

curl -v https://preview.unraveldata.com/unravel/RPM/4.7.2/unravel-4.7.2.0.tar.gz -o unravel-4.7.2.0.tar.gz -u username:password

RPM

curl -v https://preview.unraveldata.com/unravel/RPM/4.7.2/unravel-4.7.2.0.rpm -o unravel-4.7.2.0.rpm -u username:password
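
After the download, you can compare the file's checksum with the published md5sum. The following is a minimal sketch, assuming the md5sum utility is available; the function name is hypothetical, and the expected value must be taken from the Download section.

```shell
#!/bin/sh
# Sketch: compare a downloaded file's MD5 checksum with the published value.

verify_md5() {
  file="$1"
  expected="$2"
  actual=$(md5sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}

# Example; replace the second argument with the published md5sum:
# verify_md5 unravel-4.7.2.0.tar.gz <published-md5sum>
```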

3. Deploy Unravel binaries

Unravel binaries are available as a tar file or an RPM package. You can deploy the Unravel binaries in any directory on the server. However, the user who installs Unravel must have write permission on the directory where the Unravel binaries are deployed.

After you deploy the Unravel binaries, the directory layout for both tar and RPM is unravel/versions/<Directories and files>. The binaries are deployed to <Unravel_installation_directory>, and Unravel is available in <Unravel_installation_directory>/unravel.

The following steps to deploy Unravel from a tar file should be performed by the user who will run Unravel.

Create an installation directory.
```
mkdir /path/to/installation/directory
```
For example: mkdir /opt/
Extract the Unravel tar file to the installation directory that you created in the first step. After you extract the contents of the tar file, the unravel directory is created within the installation directory.
```
tar zxf unravel-<version>.tar.gz -C /path/to/installation/directory
```
For example: tar zxf unravel-4.7.x.x.tar.gz -C /opt
The unravel directory will be available within /opt
Grant ownership of the directory to the user who will run Unravel.
```
chown -R username:groupname /opt/unravel/
```
For example: chown -R hadoop:hadoop /opt/unravel/

Important

A root user should perform the following steps to deploy Unravel from an RPM package. After the RPM package is deployed, the remaining installation procedures should be performed by the Unravel user.

Create an installation directory.
```
mkdir /usr/local/unravel
```
Run the following command:
```
rpm -i unravel-<version>.rpm
```
For example: rpm -i unravel-4.7.0.0.rpm
The unravel directory will be available in /usr/local
If you want to provide a different location, use the --prefix option. For example:
```
mkdir /opt/unravel
rpm -i unravel-4.7.0.0.rpm --prefix /opt
```
The unravel directory will be available in /opt.
Grant ownership of the directory to a user who will run Unravel. This user executes all the processes involved in Unravel installation.
```
chown -R username:groupname /usr/local/unravel
```
For example: chown -R hadoop:hadoop /usr/local/unravel
Continue with the installation procedures as the Unravel user.

4. Install Unravel on GCE

You can install Unravel either with Interactive Precheck or manually without Interactive Precheck.

Note

Unravel recommends installation with Interactive Precheck.

Install Unravel with Interactive Precheck on GCE

The Interactive Precheck utility validates the required configurations before you install Unravel. When you run the utility, it takes you through a series of checks that prompt for configuration information. The responses you provide are used to generate a bootstrap configuration file. This file, which contains the configuration information, is then used to install Unravel.

Do the following to install and configure Unravel with Interactive Precheck.

After you download and deploy Unravel, run the precheck.sh script from unravel/versions/X.Y.Z/healthcheck/.
For example:
/opt/unravel/versions/X.Y.Z/healthcheck/precheck.sh

Enter the necessary details when you are prompted for the following configuration information:

This section covers general information about your Unravel install. You are prompted for the following:

Data platform you want to monitor.
For Hadoop: type of Unravel node you want to configure.
For edge nodes: core node location and test connectivity.

You must answer the following prompts:

-- General information
Which data platform are you installing for?
   1- Hadoop
   2- EMR
   3- HDI
   4- Databricks
   5- Dataproc
   6- BigQuery

Select one of the above [Hadoop]: 

## You can choose a number corresponding to the platform.

This check allows you to configure database-related information and an external database for Unravel.

-- Database configuration
 Configure an external database? (y/n) [No]:

If you answer No, an Unravel-managed database is used for the installation.

If you answer Yes, you are further prompted for the type of external database that you want to configure.

-- Database configuration
   
   Configure an external database? (y/n) [No]: y

   Type
   1- PostgresQL
   2- MySQL
   3- MariaDB

If you choose a specific type of external database, you are prompted for the following database information and can test connectivity to that database. Refer to Integrating Database for more details. For example:

-- Database configuration
  
   Configure an external database? (y/n) [No]: y

   Type
   1- PostgresQL
   2- MySQL
   3- MariaDB

   Select one of the above []: 1
   Selected: PostgresQL

   Database hostname [None]:

   Database port (integer) [None]:

   Database schema [None]:

   Does the database use TLS (y/n) [No]: 

   Database username [None]:

   Database password [None] (no echo):

   Do you wish to test connecting to the external database? (y/n) [Yes]:

If you choose a MySQL or MariaDB database, you are further prompted for extra packages. If you answer Yes, the Extra packages section searches for the required JDBC drivers.
```
-- Database configuration
 Will Unravel connect to a MySQL or MariaDB database (ex: hive metastore) ? (y/n) [No]: 
```

The Extra packages check is shown only if you use Unravel-managed MySQL/MariaDB or need JDBC drivers. Otherwise, this check is skipped automatically.

-- Extra package location

*** JDBC drivers are required for Unravel managed MySQL or MariaDB.
*** Database software package is required for Unravel managed MySQL or MariaDB.

External package location [None]: /<my-extra-packages>
##This is the path to the directory where the required packages are located.

If the required packages are located, then the following message is shown:

The following packages will be installed:
   Database server: /my-extra-packages/mysql-5.7.27-linux-glibc2.12-x86_64.tar.gz
   JDBC driver:
     - /my-extra-packages/mysql-connector-java-5.1.48.tar.gz

 External package: Ok

If the required packages are not found, then the following error message is shown:

External package: ERROR
 - ERROR: Couldn't find jdbc drivers in /my-extra-packages
 - ERROR: Looked for:
   mysql-connector-java-*.tar.gz
   mysql-connector-java-*.jar
   mariadb-java-client-*.jar
 - ERROR: Couldn't find database server package in /my-extra-packages
 - ERROR: Looked for:
   mysql-*-linux-glibc2.12-x86_64.tar.gz
   mariadb-*-linux-x86_64.tar.gz

This check allows you to configure and test HTTPS for the Unravel UI. This check prompts you for the certificate, key, password, and hostname details used to access Unravel.

Use HTTPS to access unravel? (y/n) [Yes]:  
##If you answer “Yes”, you are prompted for the path to the certificate and key. Unravel uses this information to configure TLS during installation. 
##If you answer “No”, you are shown a warning message for confirmation.

The information provided is verified for the following:

If the Key and Certificate match
If the certificate is valid
If the certificate applies for the provided hostname
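
You can run the first two of these checks yourself before starting the precheck. The following is a minimal sketch, assuming the openssl command-line tool is installed and the private key is not password protected; the function names are hypothetical, and this is not how the precheck itself is implemented.

```shell
#!/bin/sh
# Sketch: sanity-check a TLS certificate and private key with openssl.

cert_matches_key() {
  # $1 certificate file, $2 private key file.
  # Compare the public key embedded in the certificate with the one derived from the key.
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
  key_pub=$(openssl pkey -in "$2" -pubout) || return 1
  [ "$cert_pub" = "$key_pub" ]
}

cert_not_expired() {
  # -checkend 0 exits non-zero if the certificate has already expired.
  openssl x509 -in "$1" -noout -checkend 0 >/dev/null
}

# Example (hypothetical file names):
# cert_matches_key unravel.crt unravel.key && cert_not_expired unravel.crt && echo "Certificate and key look usable"
```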

This check allows you to set the Unravel UI port and verify the connectivity.

-- Unravel default port
  
  Port number (integer) [3000]: 

  Do you want to test if the port is accessible? (y/n) [Yes]: 
    
  This will open port 3000 and listen for connection for 120 seconds.
  Use your browser to test if the Unravel UI will be accessible on that port.
    
  We have detected the following hostnames:
  - some.host.example
    
 Browse to: http://some.host.example:3000
    
 ATTENTION: This address is an example. You should test with the URL that will be used to access Unravel.

A connection on port 3000 is attempted. If the connection is successful, Unravel Port Test: OK is shown in the browser, and Unravel port: Reached is shown on the server.

This check allows you to set a custom data directory and verifies access to the directories if they exist. The software location is always the directory where you deployed the Unravel binaries. This check tests only free space and access. The data location that you configure here is the one that is used.

-- Unravel directories

  Software [/opt/unravel]: 

  Data [/opt/unravel/data]: 

  Directories: ERROR
  - OK: 33 GB of free disk space for software.
  - ERROR: SYSH0026: Space for data 33 GB is low, recommended minimum is 100 GB.
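
To avoid the low-space error shown above, you can check free space before running the precheck. This is a minimal sketch using df; the function names are hypothetical, and the 100 GB threshold is the recommended minimum quoted in the sample output. If the data directory does not exist yet, check its parent directory instead.

```shell
#!/bin/sh
# Sketch: report free space for a directory and compare it with a minimum in GB.

free_gb() {
  # df -P gives stable one-line-per-filesystem output; column 4 is available 1 KB blocks.
  df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

has_min_free_gb() {
  # $1 directory, $2 required free space in GB
  [ "$(free_gb "$1")" -ge "$2" ]
}

# Example:
# has_min_free_gb /opt/unravel/data 100 || echo "Less than 100 GB free for data"
```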

This check allows you to configure and test email. You are prompted for host and credentials, and the following items are tested:

Connectivity
Authentication, only if provided.
Optional: Send test mail.

Following is a sample:

-- Mail server (SMTP) configuration

Unravel can send notification and alert emails.
   This will allow you to configure and test connection to a SMTP server.
   Optionally, it can also send a test email.

   You will have to provide:
   - Protocol, hostname and port
   - Credentials if required



   Configure a SMTP server? (y/n) [No]: y

   SMTP hostname [None]: smtphostname.gmail.com

   SMTP port (usually 25 for clear text, 465 for SSL, 587 for STARTLS) (integer) [None]: 587

   Security protocol
   1- None
   2- SSL
   3- StartTLS

   Select one of the above [None]: 3
   Selected: StartTLS

   Authentication required? (y/n) [Yes]: y

   Username [None]: daemon@unraveldata.com

   Password [None] (no echo):

   From [None]: daemon@unraveldata.com

   To [None]: user@unraveldata.com

   Send test email (y/n) [No]: y

Note

For more information, refer to Using the Interactive Precheck utility.

The responses that you have provided for the configuration information are used to generate a configuration file. You can use this configuration file when you run the setup command to install and configure Unravel.

After you have completed the responses, you are prompted to confirm if you want to generate the bootstrap configuration file. Press ENTER if you want to generate the bootstrap configuration file.
```
-- Unravel bootstrap configuration

   Generate a unravel bootstrap configuration file? (y/n) [Yes]:
```
The bootstrap configuration file is generated and located at /tmp/unravel-interactive-precheck/unravel-bootstrap.yaml.

Install Unravel with the bootstrap configuration file.

<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --bootstrap /tmp/unravel-interactive-precheck/unravel-bootstrap.yaml

Apply the changes.

<unravel_installation_directory>/unravel/manager config apply

Start all the services.

<unravel_installation_directory>/unravel/manager start

Check the status of services.
```
<unravel_installation_directory>/unravel/manager report 
```
The following service statuses are reported:
- OK: Service is up and running.
- Not Monitored: Service is not running. (Has stopped or has failed to start)
- Initializing: Services are starting up.
- Does not exist: The process unexpectedly disappeared. A restart will be attempted ten times.
You can also get the status and information for a specific service. Run the manager report command as follows:
```
<unravel_installation_directory>/unravel/manager report <service> 
```
For example: /opt/unravel/manager report auto_action

Install Unravel manually

You can run the setup command to install Unravel. The setup command does the following:

Runs Precheck automatically to detect possible issues that prevent a successful installation. Suggestions are provided to resolve issues. Refer to Precheck filters for the expected value for each filter.
Lets you pass extra parameters to integrate the database of your choice.
The setup command allows you to use a managed database shipped with Unravel or an external database. When run without any additional parameters, setup uses the Unravel managed PostgreSQL database. Otherwise, you can specify one of the following types of databases in the setup command:
- MySQL (Unravel managed as well as external MySQL database)
- MariaDB (Unravel managed)
- PostgreSQL (Unravel managed)
Refer to Integrate database for details.
Lets you specify a path for the data directory other than the default path.
The Unravel data and configurations are located in the data directory. By default, the installer maintains the data directory under <Unravel installation directory>/data. You can change the default location of the data directory by passing additional parameters to the setup command.
Provides more setup options.

Notice

The Unravel user who owns the installation directory should run the setup command to install Unravel.

To install Unravel with the setup command, do the following:

After deploying the binaries, if you are the root user, switch to the Unravel user.
```
  su - <unravel user>
```

Run setup command:

Note

Refer to Setup Options for all the additional parameters that can be passed to the setup command.

Refer to the Integrate database topic and complete the prerequisites before running the setup command with any database other than the Unravel managed PostgreSQL, which is shipped with the product. Extra parameters must be passed to the setup command when you use another database.

Tip

Optionally, if you want to provide a different data directory, you can pass an extra parameter (--data-directory) with the setup command as shown below:

<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --data-directory /the/data/directory

Similarly, you can configure separate locations for other Unravel directories. Contact support for assistance.

PostgreSQL

Unravel managed PostgreSQL

<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-dataproc

External PostgreSQL

<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-dataproc --external-database postgresql <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>

##The HOST, PORT, SCHEMA, USERNAME, PASSWORD are optional fields and are prompted if missing.

##For example:
/opt/unravel/versions/abcd.992/setup --enable-dataproc --external-database postgresql xyz.unraveldata.com 5432 unravel_db_prod unravel unraveldata

MySQL

Unravel managed MySQL

<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-dataproc --extra /tmp/mysql

External MySQL

<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-dataproc --extra /tmp/<MySQL-directory> --external-database mysql <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>

##The HOST, PORT, SCHEMA, USERNAME, PASSWORD are optional fields and are prompted if missing.

MariaDB

Unravel managed MariaDB

<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-dataproc --extra /tmp/mariadb

External MariaDB

<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-dataproc --extra /tmp/<MariaDB-directory> --external-database mariadb <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>

##The HOST, PORT, SCHEMA, USERNAME, PASSWORD are optional fields and are prompted if missing.

Precheck is automatically run when you run the setup command. Refer to Precheck filters for the expected value for each filter.

Set the following property:

<unravel_installation_directory>/unravel/manager config properties set com.unraveldata.process.event.log false

Apply the changes.

<unravel_installation_directory>/unravel/manager config apply

Start all the services.

<unravel_installation_directory>/unravel/manager start

Check the status of services.
```
<unravel_installation_directory>/unravel/manager report 
```
The following service statuses are reported:
- OK: Service is up and running.
- Not Monitored: Service is not running. (Has stopped or has failed to start)
- Initializing: Services are starting up.
- Does not exist: The process unexpectedly disappeared. Restarts will be attempted 10 times.
You can also get the status and information for a specific service. Run the manager report command as follows:
```
<unravel_installation_directory>/unravel/manager report <service> 
## For example: /opt/unravel/manager report auto_action
```

Precheck

The Precheck output displays the issues that prevent a successful installation and provides suggestions to resolve them. You must resolve each issue before proceeding. See Precheck filters.

After the precheck issues are resolved, log in again or reload the shell before you run the setup command again.

Here is a sample of the Precheck run result:

/opt/unravel/versions/abcd.1004/setup 
2021-04-05 15:51:30 Sending logs to: /tmp/unravel-setup-20210405-155130.log
2021-04-05 15:51:30 Running preinstallation check...
2021-04-05 15:51:31 Gathering information ................. Ok
2021-04-05 15:51:51 Running checks .................. Ok
--------------------------------------------------------------------------------
system
 Check limits        : PASSED
 Clock sync          : PASSED
 CPU requirement     : PASSED, Available cores: 8 cores
 Disk access         : PASSED, /opt/unravel/versions/develop.1004/healthcheck/healthcheck/plugins/system is writable
 Disk freespace      : PASSED, 229 GB of free disk space is available for precheck dir.
 Kerberos tools      : PASSED
 Memory requirement  : PASSED, Available memory: 79 GB
 Network ports       : PASSED
 OS libraries        : PASSED
 OS release          : PASSED, OS release version: centos 7.6
 OS settings         : PASSED
 SELinux             : PASSED
--------------------------------------------------------------------------------
Healthcheck report bundle: /tmp/healthcheck-20210405155130-xyz.unraveldata.com.tar.gz
2021-04-05 15:51:53 Prepare to install with: /opt/unravel/versions/abcd.1004/installer/installer/../installer/conf/presets/default.yaml
2021-04-05 15:51:57 Sending logs to: /opt/unravel/logs/setup.log
2021-04-05 15:51:57 Instantiating templates ................................................................................................................................................................................................................................ Ok
2021-04-05 15:52:05 Creating parcels .................................... Ok
2021-04-05 15:52:20 Installing sensors file ............................ Ok
2021-04-05 15:52:20 Installing pgsql connector ... Ok
2021-04-05 15:52:22 Starting service monitor ... Ok
2021-04-05 15:52:27 Request start for elasticsearch_1 .... Ok
2021-04-05 15:52:27 Waiting for elasticsearch_1 for 120 sec ......... Ok
2021-04-05 15:52:35 Request start for zookeeper .... Ok
2021-04-05 15:52:35 Request start for kafka .... Ok
2021-04-05 15:52:35 Waiting for kafka for 120 sec ...... Ok
2021-04-05 15:52:37 Waiting for kafka to be alive for 120 sec ..... Ok
2021-04-05 15:52:42 Initializing pgsql ... Ok
2021-04-05 15:52:46 Request start for pgsql .... Ok
2021-04-05 15:52:46 Waiting for pgsql for 120 sec ..... Ok
2021-04-05 15:52:47 Creating database schema ................. Ok
2021-04-05 15:52:50 Generating hashes .... Ok
2021-04-05 15:52:52 Loading elasticsearch templates ............ Ok
2021-04-05 15:52:55 Creating kafka topics .................... Ok
2021-04-05 15:53:36 Creating schema objects ................... Ok
2021-04-05 15:54:03 Request stop ....................................................... Ok
2021-04-05 15:54:16 Done
[unravel@xyz ~]$

Note

In certain situations, you can skip the precheck by running the setup command with the --skip-precheck option.

For example:

/opt/unravel/versions/<Unravel version>/setup --skip-precheck

You can also skip individual checks that you know will fail. For example, to skip the Check limits and Disk freespace checks, take the filter name shown in parentheses for each of these failed checks and run the setup command as follows:

setup --filter-precheck ~check_limits,~check_freespace
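
When several checks must be skipped, the filter value can be assembled from a list of names. This is a minimal sketch; it assumes, based only on the example above, that each skipped check is written as ~<name> and that the names are joined with commas. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: build a --filter-precheck value that skips the named checks.
# Assumption: skipped checks are written as ~<name>, separated by commas.

build_precheck_filter() {
  filter=""
  for check in "$@"; do
    # Add a comma only between entries, never before the first one.
    filter="${filter:+$filter,}~$check"
  done
  printf '%s\n' "$filter"
}

# Example:
# setup --filter-precheck "$(build_precheck_filter check_limits check_freespace)"
```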

Tip

Run the setup command with --help, alone or combined with other setup options, for complete usage details.

<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --help

Precheck filters

Filters	Description	Expected Value
System
Check uptime	Verifies the period since the last server reboot.	>24h
Clock sync	Verifies if the clock synchronization service is running on the server.	The clock synchronization service is up and running.
CPU requirement	Verifies if the server has enough CPUs to run Unravel efficiently.	Check requirements.
Memory requirement	Verifies that the server has enough memory to run Unravel efficiently.	Check requirements.
Disk access	Verifies that the user who runs unravel has access to the configured disk locations.	Unravel users can access the configured disk locations.
Disk Freespace	Verifies if the disk locations have enough free space.	Check requirements
Kerberos tools	Verifies that the Kerberos tools are available on the server to support kerberized environments.	Kerberos tools are installed.
Network ports	Verifies that the network ports used by Unravel are available.	Check requirements
OS libraries	Verifies the required libraries if run with Unravel managed MySQL.	The following packages must be installed for fulfilling the OS level requirements for MySQL: `numactl-libs` (for libnuma.so) `libaio` (for libaio.so)
OS release	Verifies that the OS distribution is supported.	Check compatibility matrix
OS settings	Verifies that vm.max_map_count is set to the recommended value.	Check requirements
SELinux	Verifies whether SELinux is enabled and reports its mode (Permissive, Disabled, or Enforcing).	Check product documentation.
Check limits	Verifies that user limits are set to the required values.	Check requirements
Healthcheck report bundle	Healthcheck report tarball. This report provides the summary and information gathered by the healthcheck with the location.

Setup Options

setup Options	Description
-h, --help	Shows help for setup.
--config CONFIG	Specify a different path to the configuration file. <unravel_installation_directory>/unravel/versions/`<Unravel version>`/setup --config `path/to/config/directory`
--enable-core	Enables core node support for non-Hadoop clusters.
--cluster-access (Edge node parameter)	Enables cluster access to the core node in a multi-cluster environment.
--data-forwarder host:port cluster-type cluster-id	Data forwarder, main unravel node.
--data-directory	Specify a different path to the data directory.
--external-database [param [param ...]]	Enable external database.
--external-database-ssl	Enable external database with SSL.
--log-file	Setup log file location. Default is `/tmp/unravel-setup-YYYYMMDD-HHMMSS.log`.
--extra DIR, -e DIR	Specify extra packages location.
--precheck	Run the preinstallation check

5. Connecting Unravel Server to a new Dataproc cluster

This section explains how to set up and configure your Dataproc cluster so Unravel can begin monitoring jobs running on the cluster.

Assumptions

The GCE instance for Unravel Server has been created.
Unravel services are running.
The Unravel GCE instance and Dataproc clusters allow all outbound traffic.
The nodes in the Dataproc cluster allow all traffic from the Unravel GCE instance. This implies one of the following configurations:
- The Dataproc cluster is on the same VPC and subnet as the Unravel GCE instance.
- The Dataproc cluster is on a different VPC, and you have configured VPC peering, created route tables, and updated your firewall policy.
Network ACL on VPC allows all traffic.
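
Once a cluster node is running, you can spot-check these network assumptions from the Unravel node. The following is a minimal sketch that probes the minimum TCP ports named in the firewall note in step 1 (9870, 9866, and 9867). It assumes bash and the coreutils timeout command are available on the Unravel node; the function names and the example IP address are hypothetical.

```shell
#!/bin/sh
# Sketch: test TCP reachability from the Unravel node to a Dataproc cluster node.

check_port() {
  # Succeeds only if a TCP connection to host $1, port $2 opens within 3 seconds.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

check_cluster_ports() {
  host="$1"
  for port in 9870 9866 9867; do
    if check_port "$host" "$port"; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
    fi
  done
}

# Example (hypothetical internal IP of a cluster node):
# check_cluster_ports 10.10.0.5
```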

Connect to a new Dataproc cluster

Perform the following steps to run the initialization action script, unravel_dataproc_init.py, on all nodes in the cluster. The bootstrap script makes the following changes:

On the server/parent node:
- On Hive clusters, it updates /etc/hive/conf/hive-site.xml.
- On Spark clusters, it updates /etc/spark/conf/spark-defaults.conf.
- It updates /etc/hadoop/conf/mapred-site.xml.
- It updates /etc/hadoop/conf/yarn-site.xml.
- If Tez is installed, it updates /etc/tez/conf/tez-site.xml.
- It installs and starts the unravel_es daemon in /usr/local/unravel_es.
- It installs the Spark and MapReduce sensors in /usr/local/unravel-agent.
- It installs the Hive Hook sensor in /usr/lib/hive/lib/.
On all other nodes:
- It installs the Spark and MapReduce sensors in /usr/local/unravel-agent.
- It installs Hive sensors in /usr/lib/hive/lib.

Be sure to substitute your specific bucket location for my-bucket.

Download Unravel's bootstrap script, unravel_dataproc_init.py, using curl or gsutil.

curl

curl https://storage.cloud.google.com/unraveldata.com/unravel_dataproc_init.py -o /tmp/unravel_dataproc_init.py

gsutil

gsutil cp gs://unraveldata.com/unravel_dataproc_init.py /tmp/unravel_dataproc_init.py

Upload the bootstrap script to a Google Cloud Storage bucket.
Permissions needed
You need write access to the Cloud Storage bucket to which you want to upload the init action script. In addition, the GCP account that you use to create the Dataproc cluster must have read access to the init action script in order to run it.
Use gsutil to upload the init action script to the default Dataproc logging bucket.
```
gsutil cp unravel_dataproc_init.py gs://my-bucket/unravel_dataproc_init.py
```
In the GCP console, select the Dataproc service and click Create cluster.
In the Create Dataproc cluster window, click CREATE for the Cluster on Compute Engine option.
In the Set up cluster section, enter the cluster name and select the Standard or Single Node cluster type.
In the Versioning section, ensure that the 2.0 (Debian 10, Hadoop 3.2, Spark 3.1) standard Dataproc image is selected.
You can skip the Configure nodes section.

In the Customize cluster section, perform the following actions:

Section	Option
Network configuration	Specify the network options, such as the VPC and subnet. Important: Ensure that the Dataproc cluster and the Unravel server are created on the same VPC and subnet. For more information, see prerequisites.
Initialization actions	Click ADD INITIALIZATION ACTION and select the `<my-bucket>/unravel_dataproc_init.py` script to connect your Dataproc cluster to the Unravel node. For example, `unraveldata.com/unravel_dataproc_init.py`.
Custom cluster metadata	Add the Unravel server details in the following fields: Key: `unravel-server`, Value: `<your-unravel-server-public-IP-address>`. Caution: If the Unravel server name is not configured, the cluster is not integrated with the virtual machine (Unravel node).

Skip the Manage security section.
Click CREATE.
A new Dataproc cluster is created.
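
The console steps above can also be approximated with the gcloud CLI. The following is a minimal sketch, assuming gcloud is installed and authenticated. The cluster name, region, subnet, bucket, and IP address are hypothetical placeholders, and the function only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: create a Dataproc cluster that runs the Unravel init action.
# All argument values are placeholders that you must replace.

create_monitored_cluster() {
  # $1 cluster name, $2 region, $3 subnet, $4 bucket, $5 Unravel server IP
  set -- gcloud dataproc clusters create "$1" \
    --region="$2" \
    --subnet="$3" \
    --image-version=2.0-debian10 \
    --initialization-actions="gs://$4/unravel_dataproc_init.py" \
    --metadata="unravel-server=$5"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print the command instead of running it.
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Example (prints the command only):
# create_monitored_cluster demo-cluster us-central1 my-subnet my-bucket 203.0.113.10
```

The unravel-server metadata key and the init action path mirror the Custom cluster metadata and Initialization actions rows in the table above.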

Sanity check

After you connect the Unravel GCE instance to your Dataproc cluster, run some jobs on the Dataproc cluster and monitor the information displayed in the Unravel UI (http://unravel_VM_node_public_IP:3000).
