Prerequisites - Multi-cluster (On-prem)
To deploy Unravel, ensure that your environment meets these requirements:
Each version of Unravel has specific platform requirements. Check the compatibility matrix to confirm that your cluster meets the requirements for the version of Unravel that you are installing.
In a multi-cluster deployment of Unravel, you must fulfill the following requirements for the host on the core node and edge node:
Core node
Accessible to Unravel Admins.
The server should be dedicated only to Unravel. Must have no other Hadoop service or third-party applications installed.
Edge node
Be managed by Ambari/Cloudera.
Must have Hadoop clients pre-installed.
Must have no other Hadoop service or third-party applications installed.
Accessible to only Hadoop and Unravel Admins.
PATH includes the path to the HDFS+Hive+YARN+Spark client/gateway, Hadoop commands, and Hive commands.
Clock synchronization service (such as NTP) is running and in sync with the cluster.
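The PATH requirement for the edge node can be spot-checked with a short script. The following is a minimal sketch, not an Unravel tool: the command names in REQUIRED_COMMANDS are assumptions about what a typical client/gateway installation provides, so adjust them to your distribution.

```python
import shutil

# Hypothetical list of client commands; not taken from the Unravel documentation.
REQUIRED_COMMANDS = ["hdfs", "hadoop", "yarn", "hive", "spark-submit"]


def missing_commands(commands=REQUIRED_COMMANDS, which=shutil.which):
    """Return the commands that cannot be resolved through PATH."""
    return [cmd for cmd in commands if which(cmd) is None]
```

Run missing_commands() as the user who installs Unravel; an empty list means every listed command resolves through that user's PATH.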
Database connectivity
Fulfill the following prerequisites for database connectivity:
MySQL
Create a mysql directory in /tmp. Grant permissions on the directory so that it is accessible to the user who installs Unravel.
Download the following tar files to the /tmp/mysql directory:
mysql-5.7.27-linux-glibc2.12-x86_64.tar.gz
mysql-connector-java-<version>.tar.gz
For an external MySQL database, add the JDBC connector to the /tmp/<MySQL-directory>/<jdbcconnector> directory. This can be either a tar file or a jar file.
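Before starting the installer, it can help to confirm that the staging directory holds the expected MySQL files. The sketch below is an illustration under assumptions: the function name and the wildcard used for the connector version are not from the Unravel documentation, and the check only looks at file names, not permissions.

```python
import fnmatch
import os

# File names from the MySQL prerequisites; the connector version is a
# wildcard because the documentation leaves it as <version>.
MYSQL_PATTERNS = [
    "mysql-5.7.27-linux-glibc2.12-x86_64.tar.gz",
    "mysql-connector-java-*.tar.gz",
]


def missing_files(directory, patterns=MYSQL_PATTERNS, listdir=os.listdir):
    """Return the patterns that match no file name in the directory."""
    try:
        names = listdir(directory)
    except FileNotFoundError:
        # A missing directory means every expected file is missing.
        return list(patterns)
    return [p for p in patterns if not fnmatch.filter(names, p)]
```

For example, missing_files("/tmp/mysql") returns an empty list once both archives are in place.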
MariaDB
Create a mariadb directory in /tmp. Grant permissions on the directory so that it is accessible to the user who installs Unravel.
Download the following files to the /tmp/mariadb directory:
mariadb-10.4.13-linux-x86_64.tar.gz
mariadb-java-client-2.6.0.jar
For an external MariaDB database, add the JDBC connector to the /tmp/<MariaDB-directory>/<jdbcconnector> directory. This can be either a tar file or a jar file.
Unravel managed database service
Install the following packages to meet the OS-level requirements for the Unravel managed database service:
numactl-libs (for libnuma.so)
libaio (for libaio.so)
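One way to confirm that both shared libraries are visible to the dynamic linker is Python's ctypes.util.find_library. This is a minimal sketch with a hypothetical helper name; it reports which package is probably missing, but does not install anything.

```python
from ctypes.util import find_library

# Maps the short library name (libnuma.so, libaio.so) to the package that ships it.
REQUIRED_LIBS = {"numa": "numactl-libs", "aio": "libaio"}


def missing_packages(libs=REQUIRED_LIBS, finder=find_library):
    """Return the packages whose shared library cannot be located."""
    return sorted(pkg for lib, pkg in libs.items() if finder(lib) is None)
```

An empty result from missing_packages() suggests that both libraries are installed on the host.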
Minimum requirements to install Unravel:
Cores: 8
RAM: 96 GB
The following tables provide estimated sizing details for default data retention and lookback settings. For calculating accurate server requirements for your environment, contact Unravel Support.
The Unravel server/core node sizing table lists the estimated server requirements in a typical environment.
The Unravel data server sizing table lists the estimated server requirements for a data server.
Tip
In production environments, you can keep the Unravel software and Data directory on separate disks. Unravel recommends putting data on high bandwidth/low latency networks.
Table 3. Unravel server/core node sizing
YARN jobs per day | Impala jobs per day | vCores | RAM | I/O | Software | Retention (ES) | Retention (MySQL) | Storage (ES) | Storage (MySQL) |
---|---|---|---|---|---|---|---|---|---|
Less than 50,000 | Less than 50,000 | 1 x 8 vCores | 96 GB | 15K IOPS | 8 GB free | 6 months | 6 months | 0.5 TB | 1.5 TB |
100,000 | 100,000 | 1 x 16 vCores | 128 GB | 15K IOPS | 8 GB free | 6 months | 30 days | 1 TB | 3 TB |
200,000 | 200,000 | 1 x 24 vCores | 192 GB | 15K IOPS | 8 GB free | 6 months | 30 days | 2 TB | 6 TB |
400,000 | 400,000 | 1 x 56 vCores | 256 GB | 15K IOPS | 8 GB free | 6 months | 30 days | 4 TB | 10 TB |
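The tiers in the Unravel server/core node sizing table can be expressed as a small lookup for capacity planning. This sketch rests on two assumptions that the table does not state: a workload between two listed job counts is rounded up to the next tier, and the larger of the YARN and Impala job counts decides the tier. For accurate sizing, contact Unravel Support.

```python
# Rows transcribed from the Unravel server/core node sizing table.
# Each tuple: (max jobs per day, vCores, RAM in GB, ES storage in TB, MySQL storage in TB)
CORE_NODE_TIERS = [
    (49_999, 8, 96, 0.5, 1.5),  # "Less than 50,000"
    (100_000, 16, 128, 1, 3),
    (200_000, 24, 192, 2, 6),
    (400_000, 56, 256, 4, 10),
]


def core_node_tier(yarn_jobs, impala_jobs=0):
    """Return the first tier that covers the busier of the two workloads."""
    jobs = max(yarn_jobs, impala_jobs)  # assumption: the larger count drives sizing
    for max_jobs, vcores, ram_gb, es_tb, mysql_tb in CORE_NODE_TIERS:
        if jobs <= max_jobs:
            return {
                "vcores": vcores,
                "ram_gb": ram_gb,
                "es_storage_tb": es_tb,
                "mysql_storage_tb": mysql_tb,
            }
    return None  # beyond the table; sizing needs input from Unravel Support
```

For instance, core_node_tier(150_000, 20_000) selects the 200,000 row (24 vCores, 192 GB RAM).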
Table 4. MySQL server sizing table
YARN jobs per day | Impala jobs per day | vCores | RAM | I/O | Retention | Storage |
---|---|---|---|---|---|---|
Less than 50,000 | Less than 50,000 | 16 vCores | 96 GB | 10K IOPS | 30 days | 1.5 TB |
100,000 | 100,000 | 24 vCores | 128 GB | 10K IOPS | 30 days | 3 TB |
200,000 | 200,000 | 24 vCores | 192 GB | 10K IOPS | 30 days | 6 TB |
400,000 | 400,000 | 36 vCores | 256 GB | 10K IOPS | 30 days | 10 TB |
Notice
If specific features such as Kafka and HBase monitoring are enabled, the memory requirements will increase and vary from the above table.
Architecture: x86_64
vm.max_map_count is set to 262144.
nproc limit is set to unlimited.
Add an unravel.conf file with the following settings to /etc/security/limits.d:
unravel soft nproc unlimited
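The two kernel and limit settings can be verified from Python on a Linux host. The sketch below uses hypothetical helper names and assumes that a vm.max_map_count value higher than 262144 is also acceptable; run it as the Unravel user so that the nproc limit reflects that user's session.

```python
import resource

REQUIRED_MAX_MAP_COUNT = 262144


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the current vm.max_map_count value from procfs (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


def max_map_count_ok(value, required=REQUIRED_MAX_MAP_COUNT):
    """Assumption: values above the documented setting are also fine."""
    return value >= required


def nproc_unlimited(limits=None):
    """True when the soft nproc limit of the current user is unlimited."""
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_NPROC)
    soft, _hard = limits
    return soft == resource.RLIM_INFINITY
```

A typical check is max_map_count_ok(read_max_map_count()) and nproc_unlimited(); both should return True after the settings are applied and the user has logged in again.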
A dedicated node is not required for Unravel. An existing node can be shared with other applications as long as it has adequate resources. If no node is available for sharing, a VM with the following resources can be used:
vCores: 8
RAM: 32 GB
Disk: 50 GB
Core node
Create an installation directory and grant ownership of the directory to the user who installs Unravel. This user executes all the processes involved in running Unravel.
If you are using Kerberos, you must create a principal and keytab for Unravel daemons to use.
Unravel needs access to the YARN Resource Manager's REST API.
Unravel needs read-only access to the database used by the Hive metastore.
Unravel users should have read-only access to HiveServer2.
Keep the URL and credentials of the cluster manager (Cloudera/Ambari) handy.
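Access to the YARN Resource Manager's REST API can be tested from the Unravel host by requesting the cluster information endpoint, /ws/v1/cluster/info. The following is a minimal sketch: the host name is a placeholder, port 8088 is only the common default, and the code does not handle Kerberos (SPNEGO) or TLS, which a secured cluster may require.

```python
import json
import urllib.request


def rm_info_url(host, port=8088, scheme="http"):
    """Build the URL of the Resource Manager cluster information endpoint."""
    return f"{scheme}://{host}:{port}/ws/v1/cluster/info"


def cluster_state(payload):
    """Extract the Resource Manager state from the endpoint's JSON body."""
    return json.loads(payload)["clusterInfo"]["state"]


def fetch_cluster_state(host, port=8088, timeout=5):
    """Call the endpoint and return the reported state, for example STARTED."""
    with urllib.request.urlopen(rm_info_url(host, port), timeout=timeout) as response:
        return cluster_state(response.read())
```

Calling fetch_cluster_state("rm.example.com") raises an exception when the Resource Manager is unreachable, which is itself a useful signal that the port or network path needs attention.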
Edge node
Create an installation directory and grant ownership of the directory to the user who installs Unravel. This user executes all the processes involved in Unravel installation.
If you are using Kerberos, you must create a principal and keytab for Unravel daemons to use.
Keep the URL and credentials of the cluster manager (Cloudera/Ambari) handy.
Unravel must have read access to these HDFS resources:
MapReduce logs (hdfs://user/history)
YARN's log aggregation directory (hdfs://tmp/logs)
Spark and Spark2 event logs (hdfs://user/spark/applicationHistory and hdfs://user/spark/spark2ApplicationHistory)
File and partition sizes in the Hive warehouse directory (typically hdfs://apps/hive/warehouse)
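Read access to these HDFS locations can be probed by listing each directory with the hdfs client as the Unravel user. This sketch assumes the locations are absolute paths on the cluster's default file system and that the hdfs command is on PATH. The helper names are hypothetical. A successful listing shows read access to the directory itself, not necessarily to every file beneath it.

```python
import subprocess

# Locations from the list above, written as absolute HDFS paths (assumption).
HDFS_PATHS = [
    "/user/history",
    "/tmp/logs",
    "/user/spark/applicationHistory",
    "/user/spark/spark2ApplicationHistory",
    "/apps/hive/warehouse",
]


def ls_command(path):
    """Build the hdfs CLI call that lists a directory."""
    return ["hdfs", "dfs", "-ls", path]


def unreadable_paths(paths=HDFS_PATHS, run=subprocess.run):
    """Return the paths whose listing exits with a non-zero status."""
    failed = []
    for path in paths:
        result = run(
            ls_command(path),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failed.append(path)
    return failed
```

On a Kerberos-enabled cluster, obtain a ticket for the Unravel principal before calling unravel-side checks such as unreadable_paths().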
Unravel needs access to the YARN Resource Manager's REST API.
Unravel needs read-only access to the database used by the Hive metastore.
Unravel users should have read-only access to HiveServer2.
Unravel users should keep the cluster access ID handy.
If you plan to use Unravel's move or kill AutoActions, the Unravel username needs to be added to YARN's yarn.admin.acl property.
If you're using Impala, Unravel needs access to the Cloudera Manager API. Read-only access is sufficient.
Note
You can customize all the Unravel ports. Refer to Configuring custom ports.
On the new node, open the following ports.
Port(s) | Direction | Description |
---|---|---|
3000 | Both | Traffic to and from Unravel UI. |
3316 | Both | Database traffic. This can be MySQL or PostgreSQL database traffic. For MySQL, the default port is 3306. For PostgreSQL, the default port is 5432. |
4020 | Both | Unravel APIs. This is an internal. |
4021 | Both | Host monitoring of JMX on |
4043 | In | UDP and TCP ingest traffic from the entire cluster to Unravel Server(s). Unravel host listens on this port. The cluster nodes should be able to send traffic to the Unravel host's 4043 port. |
4044-4049 | In | UDP and TCP ingest spares for |
4091-4099 | Both | Kafka brokers. Refers to Unravel internal Kafka service. |
4171-4174, 4176-4179 | Both | Elasticsearch; localhost communication between Unravel daemons or Unravel Servers in a multi-host deployment. Refers to Unravel's internal Elasticsearch service. |
4181-4189 | Both | ZooKeeper daemons. Refers to Unravel's internal ZooKeeper service. |
4210 | Both | Cluster access service. This is only required for multi-cluster setup. |
5432 | | Database traffic. The default port for PostgreSQL. |
HDFS ports | Both | Traffic to/from the cluster to Unravel Server(s). Unravel server should be able to access the HDFS NameNode and DataNode ports; the defaults are listed after this table. |
Hive metastore port | Out | For YARN only. Traffic from Hive to Unravel Server(s) for partition reporting. |
8080/8088 | Out | Traffic from Unravel Server(s) to the Resource Manager (RM) API. Unravel host server should be able to access these ports on RM. |
11000 | Out | For Oozie only. Traffic from Unravel Server(s) to the Oozie server. Unravel host needs to be able to access this port on the Oozie service. |
For HDFS, you must provide access to the NameNode and DataNode. The default NameNode port is 8020, and the default DataNode ports are 9866 and 9867. However, these can be configured to any other ports.
Services | Default port | Direction | Description |
---|---|---|---|
NameNode | 8020 | Both | Traffic to/from the cluster to Unravel servers. |
DataNode | 9866,9867 | Both | Traffic to/from the cluster to Unravel servers. |
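When preparing firewall rules or testing outbound reachability, the port specifications in the tables above can be expanded into individual ports and probed over TCP. The sketch below is illustrative only: the helper names are hypothetical, it covers TCP but not the UDP ingest traffic, and a successful connection only shows that something is listening on the port.

```python
import socket


def expand_ports(spec):
    """Expand a port spec such as "4171-4174, 4176-4179" into a list of ints."""
    ports = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            ports.extend(range(start, end + 1))
        else:
            ports.append(int(part))
    return ports


def can_connect(host, port, timeout=3.0):
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, [p for p in expand_ports("8020, 9866, 9867") if not can_connect("namenode.example.com", p)] lists the default HDFS ports that the Unravel host cannot reach on a placeholder host name.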
CDH specific port requirements
Port(s) | Direction | Description |
---|---|---|
3000 | Both | Traffic to and from Unravel UI. If you plan to use Cloudera Manager to install Unravel's sensors, the Cloudera Manager service must also be able to reach the Unravel host on port 3000. |
7180 (or 7183 for HTTPS) | Out | Traffic from Unravel Server(s) to Cloudera Manager |