Multi-cluster
This document describes the properties that are handled in a multi-cluster deployment.
Layout of multi-cluster configurations for a component:
com.unraveldata.<component>.list = Comma-delimited list of variables representing the component instances.
com.unraveldata.<component>.<variable>.<property> = Value
The variables are for configuration purposes only and do not need to correspond to any hostnames or other properties of the components. Acceptable characters: alphanumerics, hyphens, and underscores only.
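As an illustration of the layout above, here is a minimal sketch in properties syntax. The component name `mycomponent`, the instance variables `cluster_east` and `cluster-west`, the `host` property, and the hostnames are all hypothetical placeholders chosen for this example. They are not actual Unravel property names.

```properties
# Hypothetical names throughout: "mycomponent", "cluster_east", "cluster-west",
# and "host" are placeholders, not real Unravel properties.

# 1. Declare the instances as a comma-delimited list of variables.
com.unraveldata.mycomponent.list = cluster_east,cluster-west

# 2. Set each property once per instance, keyed by the variable name.
com.unraveldata.mycomponent.cluster_east.host = east.example.com
com.unraveldata.mycomponent.cluster-west.host = west.example.com
```

The variable names use only alphanumerics, hyphens, and underscores, and they need not match the hostnames they are mapped to.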
Core node configurations for components
The following components are accessed from the Core node, where Unravel is installed.
Cloudera Manager (CM)
Ambari
Hive Metastore
Kafka
Pipeline (Workflows)
The following properties must be configured on the Core node in a multi-cluster setup.
You must set the following property when you install the Core node on a server that has no Hadoop configuration:
Edge node configurations for components
In a multi-cluster deployment on on-prem platforms, add the following properties to the Edge node server so that MR jobs can load jhist files and logs. The properties specify the HDFS paths for the jhist/conf files and the YARN logs:
Property | Description | Default |
---|---|---|
com.unraveldata.min.job.duration.for.attempt.log | Minimum duration (in milliseconds) of a successful application for which executor logs are processed. | 600000 (10 mins) |
com.unraveldata.job.collector.log.aggregation.base | HDFS path to the aggregated container logs (logs to process). Don't include the hdfs:// prefix. The log format defaults to TFile. You can specify multiple logs and log formats (TFile or IndexedFormat). Example: For HDP set this to: | /tmp/logs/*/logs/ |
com.unraveldata.job.collector.done.log.base | HDFS path to the "done" directory of MR logs. Do not include the hdfs:// prefix. For HDP set this to: | hdfs:///user/spark/applicationHistory/ |
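As a sketch of how the first two of these properties might look once added on the Edge node, the fragment below sets them to the default values listed in the table. The values are illustrative only and should be adjusted to match the paths on your cluster. The "done" directory property is left out because its path depends on the distribution.

```properties
# Illustrative values taken from the table's defaults; adjust for your cluster.

# Process executor logs only for successful applications that ran
# at least 600000 ms (10 minutes).
com.unraveldata.min.job.duration.for.attempt.log = 600000

# HDFS path to the aggregated container logs, without the hdfs:// prefix.
com.unraveldata.job.collector.log.aggregation.base = /tmp/logs/*/logs/
```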
For Tez, the following properties must be added to the Edge node server in a multi-cluster deployment on on-prem platforms.