Enabling multi-node deployment of Spark workers for high-volume data processing
You can deploy additional Spark workers on a server separate from the server where Unravel and its services are installed, in order to process high-volume data. This section provides instructions for the multi-node setup of the Spark worker.
Setting multi-node deployment for Spark workers
The multi-node setting must first be established on the Unravel main node, where Unravel is installed with all the services, and then on the worker node.
Install Unravel on the main node using the setup command, set the license, and set the LR endpoint. Refer to Unravel Databricks installation.
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Set the worker node details in the unravel.yaml file. Run the following manager commands:
Make the Unravel daemons accessible to the Spark daemons over the network.
<Unravel installation directory>/unravel/manager config multinode public-listen enable
Add the hostname of the Spark daemon to the unravel.yaml file.
<Unravel installation directory>/unravel/manager config multinode add <host-key> <name> <hostname> --role spark_worker
Enter the following values:
host-key - Node identifier.
name - A name to identify the node.
hostname - The hostname of the new node, which is used for generating the configuration.
For example:
/opt/unravel/manager config multinode add workers-1 "My Spark Workers" my.server.domain --role spark_worker
Optionally, for advanced configuration, you can add --host-alias so that the configuration can be looked up when the server name used in the configuration differs from the name used when looking it up.
Tip
You can run the following commands for assistance:
<Unravel installation directory>/unravel/manager config multinode public-listen --help
<Unravel installation directory>/unravel/manager config multinode add --help
Set the JAVA_HOME environment variable and increase the number of partitions of the spark topic. Setting JAVA_HOME ensures that the multi-node deployment uses the JVM provided by Unravel.
Note
This has to be configured only when you increase the number of Spark consumers. Refer to Advanced Spark configurations. Preferably, the number of consumers should be equal to the number of partitions, or half of it. By default, there are 8 partitions.
export JAVA_HOME=/<unravel_installation_directory>/unravel/versions/<version>/java
/<unravel_installation_directory>/unravel/versions/<version>/kafka/bin/kafka-topics.sh --bootstrap-server localhost:4091 --alter --topic spark --partitions <count of partitions>
For example:
/opt/unravel/versions/<version>/kafka/bin/kafka-topics.sh --bootstrap-server localhost:4091 --alter --topic spark --partitions 16
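The sizing rule in the note above can be sketched as a small shell helper. This is illustrative only; the consumer count is an assumed input, not something read from Unravel:

```shell
# Sizing rule from the note above: the spark topic should have as many
# partitions as there are Spark consumers, or twice as many (default: 8).
consumers=8                       # assumed/illustrative consumer count
partitions=$((consumers * 2))     # upper end of the recommended ratio
echo "--alter --topic spark --partitions ${partitions}"
```

With 8 consumers this prints a partition count of 16, matching the example above.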
Apply changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
Multi-node setup on the Worker node
After you set up the multi-node configurations on the main node, do the following on the worker node:
Copy the unravel.yaml file from the main node to the worker node. For example:
scp /opt/unravel/data/conf/unravel.yaml example@myserver:/tmp
Install Unravel on the worker node using the unravel.yaml file copied from the main node. Refer to Unravel Databricks installation > Manual installation. Run the setup command as follows:
<Unravel installation directory>/unravel/versions/<Unravel version>/setup --config /tmp/unravel.yaml
Set the Unravel license. Also, refer to Setting Unravel license.
Set the Spark worker instance count using the manager command if it is not done from the main node. Refer to Enabling multiple daemon workers for high-volume data.
Set the count for Spark consumers. Unravel supports the processing of multiple records in parallel in a single Spark daemon. The number of Spark consumers defines how many records are processed simultaneously.
<Unravel installation directory>/unravel/manager config worker set spark_worker consumer_count <count>
For example:
/opt/unravel/manager config worker set spark_worker consumer_count 4
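To illustrate why the consumer count should track the partition count: Kafka distributes a topic's partitions across the consumers in a group, so consumers beyond the partition count receive nothing. A minimal sketch, assuming a simple round-robin spread (illustrative, not Unravel's exact assignment strategy):

```shell
# Illustrative only: spread 16 spark-topic partitions over 4 consumers
# round-robin, printing which consumer would read each partition.
partitions=16
consumers=4
for p in $(seq 0 $((partitions - 1))); do
  echo "partition ${p} -> consumer $((p % consumers))"
done
```

Each consumer ends up with 4 partitions; setting consumer_count higher than the partition count would leave some consumers idle.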
Apply changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
Note
All the workspaces included in the main node must be registered on the worker node.
When you set up the multi-node on the Spark worker node, all the workspaces on the main node are automatically registered with the worker node. However, if you later add or delete a workspace on the main node, you must register that workspace manually on the worker node. Refer to Importing workspaces to Spark worker node.
Upgrading a multi-node deployment
Run the following command on both the main node and the worker node to stop Unravel.
/<unravel_installation_directory>/unravel/manager stop
Upgrade the main node.
/<unravel_installation_directory>/unravel/manager activate <unravel-version>
Start Unravel on the main node.
/<unravel_installation_directory>/unravel/manager start
Upgrade the Spark worker node.
/<unravel_installation_directory>/unravel/manager activate <unravel-version>
Start Unravel on the Spark worker node.
/<unravel_installation_directory>/unravel/manager start
Importing workspaces to Spark worker node
When you set up the multi-node on the Spark worker node, all the workspaces are automatically registered with the worker node. However, if you later add or remove a workspace from the main node using the UI, it does not get registered on the worker node. In such a scenario, you must import the workspaces on the worker node.
Copy the unravel.yaml file from the main node to the worker node. This registers all the workspaces onto the Spark worker node. For example:
scp /opt/unravel/data/conf/unravel.yaml example@myserver:/tmp
If any of the workspaces do not get registered on the Spark worker node, you can do either of the following:
Add all the workspaces in the workspaces: block of the unravel.yaml file, as shown:
unravel:
  config:
    databricks:
      workspaces:
        ...
Otherwise, you can create a custom file with only the workspaces and provide the file path to that custom file.
Specify the file path to the unravel.yaml file or the custom file and run the following command:
<Unravel installation directory>/unravel/manager config databricks import /path/to/file
For example:
/opt/unravel/manager config databricks import /tmp/unravel.yaml