Prerequisites (Amazon EMR)
Platform
Each version of Unravel has specific platform requirements. Check Unravel's Amazon EMR compatibility matrix to confirm that your Amazon EMR platform meets the requirements for the Unravel version you are installing.
Hardware
EC2 instance type:
Minimum: r4.2xlarge (61 GiB RAM)
Maximum: r4.8xlarge (244 GiB RAM)
Recommended: r4.4xlarge (122 GiB RAM)
Virtualization type: HVM
Root device type: EBS
EBS volume specifications:
Minimum: 100 GiB.
In a PoC or evaluation, the minimum root disk space should be sufficient.
When monitoring more EMR clusters or many jobs, we recommend a 300-500 GB Provisioned IOPS SSD (io1) volume with 3000 IOPS.
For production use, we recommend a 200 GB Provisioned IOPS volume for both EBS and RDS.
The baseline IOPS (3 IOPS per GiB with a minimum of 100 IOPS, burstable to 3000 IOPS) is sufficient for Unravel.
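As a rough illustration of the baseline IOPS rule quoted above (3 IOPS per GiB with a floor of 100 IOPS), the arithmetic can be sketched as follows. The function name `baseline_iops` is hypothetical, and the sketch applies only the rule as stated here; it does not model the burst behavior or any upper limit that AWS imposes on a volume.

```python
def baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a volume of size_gib GiB.

    Applies only the rule stated in this section: 3 IOPS per GiB,
    with a minimum of 100 IOPS. Burst credits and AWS-side upper
    limits are intentionally not modeled.
    """
    if size_gib <= 0:
        raise ValueError("volume size must be a positive number of GiB")
    return max(100, 3 * size_gib)


# Volume sizes mentioned in this section, plus a small one to show the floor.
for size in (20, 100, 200, 300, 500):
    print(f"{size} GiB -> {baseline_iops(size)} baseline IOPS")
```

Under this rule, the 100 GiB minimum root volume has a baseline of 300 IOPS, which can help when deciding whether the baseline is enough or a Provisioned IOPS volume is the better fit.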
(Optional) RDS specifications:
DB instance class: db.r3.xlarge (4 vCPU, 30.5 GiB RAM)
Storage type: Provisioned IOPS (SSD)
Allocated storage: 200 GiB or above
Provisioned IOPS: 1000
Also, refer to setting up Amazon RDS.
Note
Unravel Server does not require heavy resources, but it's best to check your AWS Service Limits as you proceed. For example, if you provision an Unravel EC2 instance from our CloudFormation template, check Virtual Private Cloud (Amazon VPC) Limits.
Sizing
Important
You must have separate nodes for the Unravel server and the external database.
The minimum requirements for cores, RAM, and disk.
Access permissions
The Unravel EC2 instance must have read permission on the S3 bucket used by EMR clusters.
You need an AWS account. You must be able to connect to AWS for the deployment process.
Create a read-only IAM role for S3 and assign it to Unravel Server so that it can read the archived logs in the S3 bucket configured for the EMR cluster. In other words, create an IAM role whose policy grants read-only access to the specific S3 bucket used by the EMR cluster, then create an EC2 instance profile and add the IAM role to it.
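A minimal sketch of what such a read-only policy might look like, built as plain Python dictionaries in the standard IAM JSON policy format. The bucket name `my-emr-logs`, the `Sid` values, and the function names are hypothetical. The choice of `s3:ListBucket` and `s3:GetObject` is an assumption about what "read" requires; this page does not list the exact actions Unravel needs.

```python
import json


def s3_read_only_policy(bucket: str) -> dict:
    """Build an IAM policy document that allows only listing and reading one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListLogBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "ReadLogObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


def ec2_trust_policy() -> dict:
    """Trust policy that lets EC2 instances assume the role via an instance profile."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# "my-emr-logs" is a placeholder; substitute the bucket your EMR cluster logs to.
print(json.dumps(s3_read_only_policy("my-emr-logs"), indent=2))
print(json.dumps(ec2_trust_policy(), indent=2))
```

The two documents correspond to the two halves of the step above: the first is the permissions policy attached to the role, and the second is the trust policy that allows the Unravel EC2 instance to use the role through its instance profile.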
AWS Permissions and Access
You must have permission to:
Create EC2 instances
Connect to EC2 instances
Install software on EC2 instances (you must have root access or "sudo root" permission in order to install the Unravel Server RPM)
Create security groups and IAM roles
Update IAM roles for the EMR cluster and the corresponding S3 storage
If you want to deploy Unravel for a new EMR cluster, you also need AWS permissions to create an EMR cluster and necessary S3 buckets, create and configure VPCs, etc.
Network
The following ports must be open on the Unravel EC2 instance. In addition, the Unravel EC2 instance must be able to access all ports on the EMR cluster.
To manage, monitor, and optimize the modern data applications running on your EMR cluster, Unravel needs data from the cluster as well as from apps running on the cluster. This data includes metrics, configuration information, and logs. Part of this data is pushed to Unravel, and part is pulled by the daemons running on Unravel Server. For Unravel to access all of this data, there must be both inbound and outbound access between Unravel Server (on the EC2 instance) and the EMR cluster.
The Unravel Server must be in the same region as the target EMR cluster(s) it will be monitoring. There are two possible scenarios:
Both the EMR cluster and the Unravel Server are created in the same VPC and the same subnet, and the security group allows all traffic from that subnet.
The EMR cluster is located in a different VPC than Unravel Server. In this case, you must configure VPC peering, create route tables, and update the security policy.
The Unravel Server needs TCP and UDP connections to the EMR master node. To implement this, do either of the following:
Create a security group that allows traffic on ports 3000 and 4043 from the EMR cluster nodes' IP addresses. Configure the security group on Unravel Server to allow TCP traffic on port 3000 from the EMR cluster nodes.
Add the security group used by the EMR cluster to this rule as the source.
The Unravel Server and EMR cluster(s) must allow all outbound traffic.
EMR cluster nodes must allow all traffic from Unravel Server. If you can't allow all traffic from Unravel Server, you must at minimum allow Unravel Server to access TCP ports 8020, 50010, and 50020 on the cluster nodes.
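One way to confirm the minimum access described above is to attempt a TCP connection from the Unravel EC2 instance to each of the three ports on a cluster node. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the host is whatever address you pass on the command line. A successful connection shows only that the port is reachable, not that the service behind it is healthy.

```python
import socket
import sys

# TCP ports named in this section as the minimum Unravel Server must reach.
REQUIRED_TCP_PORTS = (8020, 50010, 50020)


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_node(host: str, ports=REQUIRED_TCP_PORTS, timeout: float = 3.0) -> dict:
    """Map each port to True (reachable) or False (not reachable) for one node."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_ports.py <emr-node-address>
    for port, ok in check_node(sys.argv[1]).items():
        print(f"{sys.argv[1]}:{port} {'reachable' if ok else 'NOT reachable'}")
```

Run it from the Unravel EC2 instance against the master node and at least one core node, since a security group rule can differ between node groups.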
| Port(s) | Direction | Description |
|---|---|---|
| 3000 | Both | Non-HTTPS traffic to and from Unravel UI |
| 4043 | In | UDP and TCP ingest traffic from the entire cluster to Unravel Server(s) |
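The inbound side of the port table can be expressed as security group rules. The sketch below only builds the rule data; it makes no AWS calls. The structure follows the `IpPermissions` shape accepted by the EC2 `AuthorizeSecurityGroupIngress` API, and the rules use the EMR cluster's security group as the traffic source. The function name and the security group ID are hypothetical. Access to the Unravel UI on port 3000 from users' machines would need its own source and is not covered here.

```python
import json


def unravel_ingress_rules(emr_security_group_id: str) -> list:
    """Build inbound rules for the Unravel Server security group.

    Ports and protocols come from the table above:
    3000 (TCP) and 4043 (TCP and UDP), sourced from the EMR security group.
    """
    port_specs = [("tcp", 3000), ("tcp", 4043), ("udp", 4043)]
    return [
        {
            "IpProtocol": protocol,
            "FromPort": port,
            "ToPort": port,
            # Allow traffic from instances that belong to the EMR security group.
            "UserIdGroupPairs": [{"GroupId": emr_security_group_id}],
        }
        for protocol, port in port_specs
    ]


# "sg-0123456789abcdef0" is a placeholder ID, not a real security group.
rules = unravel_ingress_rules("sg-0123456789abcdef0")
print(json.dumps(rules, indent=2))
```

A list like this could be passed as the `IpPermissions` argument when authorizing ingress on the Unravel Server security group, for example through an AWS SDK. Outbound rules are not included because this section requires all outbound traffic to be allowed.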
Skillset
These instructions assume that you are proficient in:
Provisioning EC2 instances and RDS instances
Creating and configuring the required IAM roles, security groups, and so on
Understanding AWS networking concepts such as virtual private clouds (VPCs), subnets, and so on
Running Ansible scripts, basic Unix commands, and AWS CLI commands
These instructions are self-contained and require only basic knowledge of AWS. Expert-level knowledge of AWS is not required.