Prerequisites
To deploy Unravel, first ensure that your environment meets these requirements.
Important
You must use an independent host for the Unravel server.
This host must:
Be managed by Cloudera.
Have Hadoop clients pre-installed.
Have no other Hadoop service or third-party applications installed.
Be accessible only to Hadoop and Unravel administrators.
Platform
Each version of Unravel has specific platform requirements. Check the compatibility matrix to confirm that your cluster meets the requirements for the version of Unravel that you are installing. Your CDP environment must be running Cloudera Manager (CM).
Sizing
Software
If the Unravel host is running Red Hat Enterprise Linux (RHEL) 6.x, set bootstrap.system_call_filter to false in elasticsearch.yml:
bootstrap.system_call_filter: false
libaio.x86_64 is installed.
PATH includes the path to the HDFS+Hive+YARN+Spark client/gateway, Hadoop commands, and Hive commands.
If the Spark2 service is installed, the Unravel host should be a client/gateway.
Zookeeper is not installed on the same host as the Unravel host.
NTP is running and in-sync with the cluster.
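A small Python sketch of the PATH check above (the command names are illustrative; adjust them to match your cluster's client binaries):

```python
#!/usr/bin/env python3
"""Preflight sketch: confirm the Hadoop/Hive client commands are on PATH."""
import shutil

def missing_commands(commands):
    """Return the subset of `commands` not found on PATH."""
    return [c for c in commands if shutil.which(c) is None]

if __name__ == "__main__":
    # Illustrative client binaries; substitute your cluster's actual set.
    required = ["hadoop", "hdfs", "hive", "yarn"]
    gaps = missing_commands(required)
    if gaps:
        print("Missing from PATH:", ", ".join(gaps))
    else:
        print("All required client commands found on PATH")
```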
Permissions
Tip
The installation creates a local user, unravel:unravel, but you can change this later.
You must have root access or "sudo root" permission to install the Unravel Server RPM.
If you're using Kerberos, we'll explain how to create a principal and keytab for Unravel daemons to use to access these HDFS resources:
YARN's log aggregation directory (hdfs://tmp/logs)
Spark and Spark2 event logs (hdfs://user/spark/applicationHistory and hdfs://user/spark/spark2ApplicationHistory)
File and partition sizes in the Hive warehouse directory (typically hdfs://apps/hive/warehouse)
Unravel needs access to the YARN Resource Manager's REST API, so that it can determine which Resource Manager is active.
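In an HA setup there are two Resource Managers, only one of which is active at a time. A minimal Python sketch of how the active one can be identified from each RM's standard /ws/v1/cluster/info response (the haState field is part of the YARN ResourceManager REST API; the hostname below is a placeholder):

```python
import json

def is_active_rm(cluster_info_json):
    """Given the body of GET /ws/v1/cluster/info from a ResourceManager,
    report whether that RM is the active one (haState == "ACTIVE")."""
    info = json.loads(cluster_info_json)
    return info.get("clusterInfo", {}).get("haState") == "ACTIVE"

# Example usage against a hypothetical RM host:
#   body = urllib.request.urlopen("http://rm1.example.com:8088/ws/v1/cluster/info").read()
#   is_active_rm(body)
```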
Unravel needs JDBC access to the Hive Metastore. Read-only access is sufficient.
If you're using Impala, Unravel needs access to the Cloudera Manager API. Read-only access is sufficient.
Network
For HDFS, provide access to the NameNode and DataNodes. The default port for the NameNode is 8020; the DataNode defaults are 9866 and 9867. These can be configured to other ports.
Services | Default port | Direction | Description
---|---|---|---
NameNode | 8020 | Both | Traffic between the cluster and the Unravel server.
DataNode | 9866, 9867 | Both | Traffic between the cluster and the Unravel server.
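As a quick connectivity check from the Unravel host, a short Python sketch (the hostname is a placeholder, not from this document) can probe the ports above:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage with a hypothetical NameNode host:
#   port_open("namenode.example.com", 8020)
```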