Prerequisites
Platform
Each version of Unravel has specific platform requirements. Check the Compatibility Matrix for Google Cloud (Dataproc) to confirm that your Dataproc platform meets the requirements for the Unravel version you are installing.
Hardware
Compute Engine (GCE) machine type, general-purpose:
Minimum: n2-standard-16 / n1-standard-16 (64 GiB RAM)
Maximum: n2-standard-64 / n1-standard-64 (256 GiB RAM)
Recommended: n2-standard-32 / n1-standard-32 (128 GiB RAM)
Virtualization type: HVM
Root device type: Standard Persistent Disk / SSD persistent disks
Volume specifications:
Minimum: 200 GiB.
In a PoC or evaluation, the minimum root disk space should be sufficient.
When monitoring multiple Dataproc clusters or a large number of jobs, we recommend 300-500 GB SSD persistent disks, which can handle high IOPS rates.
For production use, we recommend 500 GiB SSD persistent disks.
The Baseline IOPS (3 IOPS per GiB with a minimum of 100 IOPS, burstable to 3000 IOPS) is sufficient for Unravel.
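As a rough illustration of the baseline rule quoted above, the following Python sketch computes the baseline IOPS for a few of the disk sizes mentioned in this section. The function name `baseline_iops` is hypothetical, and the sketch assumes the rule applies exactly as stated (3 IOPS per GiB, floor of 100); it does not model bursting.

```python
def baseline_iops(size_gib: int) -> int:
    """Baseline IOPS under the stated rule: 3 per GiB, never below 100.

    Assumption: the rule is applied exactly as written in this section.
    Burst behavior (up to 3000 IOPS) is not modeled here.
    """
    if size_gib <= 0:
        raise ValueError("disk size must be a positive number of GiB")
    return max(100, 3 * size_gib)


# Disk sizes taken from the volume specifications above.
for size in (200, 300, 500):
    print(f"{size} GiB -> {baseline_iops(size)} baseline IOPS")
```

Under this rule, the 200 GiB minimum works out to 600 baseline IOPS and the 500 GiB production recommendation to 1500.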
Note
Unravel Server doesn't require heavy resources, but it's best to check your Dataproc Quotas as you proceed.
Sizing
Important
You must have separate nodes for the Unravel server and for the external database.
The minimum requirements for cores, RAM, and disk.
Software
Operating system: Red Hat Enterprise Linux (RHEL) / CentOS 6.4 to 7.4
Network
The following ports must be open on the Unravel GCE. In addition, the Unravel GCE must be able to access all ports on the Dataproc cluster.
To manage, monitor, and optimize the modern data applications running on your Dataproc cluster, Unravel needs data from the cluster as well as from the apps running on it. This data includes metrics, configuration information, and logs. Part of this data is pushed to Unravel, and part is pulled by the daemons running on Unravel Server. For all of this data to be accessible, there must be both inbound and outbound access between Unravel Server (on the GCE) and the Dataproc cluster.
The Unravel Server must be in the same region as the target Dataproc clusters it is monitoring. There are two possible scenarios:
The Dataproc cluster and the Unravel server are both created on the same VPC and the same subnet, and the firewall rules allow all traffic within that subnet.
The Dataproc cluster is located on a different VPC than the Unravel server. In this case, you must configure VPC peering, create the required routes, and update the firewall policy.
The Unravel Server needs a TCP and UDP connection to the Dataproc master node. To implement this, do either of the following:
Create a firewall rule that allows traffic on ports 3000 and 4043 from the Dataproc cluster nodes' IP addresses. Configure the firewall rule on Unravel Server to allow TCP traffic on port 3000 from the Dataproc cluster nodes.
Put the member of the firewall rule used on the Dataproc cluster in this rule.
The Unravel Server and Dataproc clusters must allow all outbound traffic.
Dataproc cluster nodes must allow all traffic from Unravel Server. If you can't allow all traffic from Unravel Server, you must at minimum allow it to reach the cluster nodes' TCP ports 9870, 9866, and 9867.
Ports | Direction | Description |
---|---|---|
3000 | Both | Non-HTTPS traffic to and from Unravel UI. |
4043 | In | UDP and TCP ingest traffic from the entire cluster to Unravel Servers. |
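Once the firewall rules are in place, a quick connectivity check can confirm that the TCP ports listed in this section are reachable. The Python sketch below is one possible way to do that, using only the standard library. The helper names `tcp_port_open` and `check_ports` are hypothetical, and so are any host names you pass in. The check covers TCP only, so it does not verify the UDP ingest traffic on port 4043.

```python
import socket

# Port numbers taken from this section.
UNRAVEL_PORTS = (3000, 4043)            # check from a cluster node toward Unravel Server
CLUSTER_MIN_PORTS = (9870, 9866, 9867)  # check from Unravel Server toward cluster nodes


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Map each port to whether a TCP connection to it succeeded."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, running `check_ports("unravel-host.example.internal", UNRAVEL_PORTS)` on a Dataproc node (with the placeholder replaced by your Unravel Server address) returns a dictionary such as `{3000: True, 4043: True}` when both ports accept TCP connections. Running `check_ports` with `CLUSTER_MIN_PORTS` from Unravel Server against a cluster node checks the opposite direction.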
Skill set
These instructions are self-contained and require only basic knowledge of GCP. You don't need to create any scripts or be familiar with any specific programming or scripting language.
These instructions assume you're proficient in:
Provisioning GCEs.
Creating and configuring the required IAM roles, firewall rules, etc.
Understanding GCP networking concepts such as virtual private clouds (VPCs) and subnets.
Running Ansible scripts, basic Unix commands, and gcloud CLI commands.