v4.7.3.0 Release notes
Software version
Release Date: 28/Jan/2022
See 4.7.3.0 for download information.
Software upgrade support
Fresh installations are supported along with the following upgrade path:
v4.7.0.x, v4.7.1.x, v4.7.2.x → v4.7.3.0
v4.6.2.x → v4.7.3.0
v4.6.1.9 → v4.7.3.0
v4.6.1.8 or earlier → v4.6.1.9 → v4.7.3.0
Refer to Upgrading Unravel server for instructions to upgrade to Unravel version 4.6.1.9.
Refer to Upgrading Unravel for instructions to upgrade to Unravel version 4.7.3.0.
Refer to Installing Unravel for fresh installations.
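The upgrade paths listed above can be expressed as a small lookup. The following Python sketch is illustrative only: the function names upgrade_path and parse_version are hypothetical and are not part of Unravel's tooling. It encodes only the paths stated in these notes and refuses to guess for anything else.

```python
def parse_version(version: str) -> tuple:
    """Turn 'v4.6.1.8' or '4.6.1.8' into a tuple of integers."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def upgrade_path(current: str) -> list:
    """Return the versions to install, in order, to reach v4.7.3.0.

    Only the paths listed in the release notes are encoded; any other
    starting version raises ValueError.
    """
    target = "v4.7.3.0"
    v = parse_version(current)
    if len(v) != 4:
        raise ValueError(f"expected a four-part version, got {current!r}")
    # v4.7.0.x, v4.7.1.x, v4.7.2.x upgrade directly.
    if v[:3] in {(4, 7, 0), (4, 7, 1), (4, 7, 2)}:
        return [target]
    # v4.6.2.x and v4.6.1.9 upgrade directly.
    if v[:3] == (4, 6, 2) or v == (4, 6, 1, 9):
        return [target]
    # v4.6.1.8 or earlier must go through v4.6.1.9 first.
    if v <= (4, 6, 1, 8):
        return ["v4.6.1.9", target]
    raise ValueError(f"no documented upgrade path from {current!r}")
```

For example, upgrade_path("v4.6.1.8") returns the two-step path through v4.6.1.9, while upgrade_path("v4.7.1.0") returns the single target version.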
Sensor upgrade
Sensor upgrade is mandatory.
Refer to Upgrading Sensors.
Certified platforms
The following platforms are tested and certified in this release:
Cloudera Distribution of Apache Hadoop (CDH)
Cloudera Data Platform (CDP)
Hortonworks Data Platform (HDP)
Amazon Elastic MapReduce (EMR)
Databricks (Azure Databricks and AWS Databricks)
Google Cloud Platform (Dataproc, BigQuery)
Review your platform's compatibility matrix before you install Unravel.
Updates to Unravel's configuration properties
Refer to 4.7.x - Updates to Unravel properties.
New features
App store
The App store page is new. From the App store, you can install apps, run administrative tasks to manage them, navigate between apps, and open and run them.
Billing service - EMR
The Billing service is now supported for the EMR platform. The Billing tab in Unravel shows Unravel's charges for its EMR support. The following pricing plans are supported:
pay-as-you-go
Under this plan, Unravel tracks the number of instance hours that you have incurred and shows the charges based on that usage.
pay-in-advance
Under this plan, you pay in advance for a specific number of instance hours. Credits are deducted based on usage, and the remaining credits are shown on the Billing tab daily, so you can monitor when you will run out of credits.
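The pay-in-advance accounting described above amounts to subtracting the instance hours used from the credits purchased. The following is a minimal sketch, assuming that one credit corresponds to one instance hour and that usage is reported once per day. The function names are hypothetical, and this is not Unravel's actual billing computation.

```python
def remaining_credits(purchased_hours: float, daily_usage_hours: list) -> list:
    """Return the credit balance after each day of usage.

    Assumption: one credit corresponds to one instance hour.
    The balance goes negative once usage exceeds the purchased credits.
    """
    balance = purchased_hours
    balances = []
    for used in daily_usage_hours:
        balance -= used
        balances.append(balance)
    return balances


def first_day_exhausted(balances: list):
    """Return the 0-based index of the first day with no credits left, or None."""
    for day, balance in enumerate(balances):
        if balance <= 0:
            return day
    return None
```

With 100 purchased hours and daily usage of 30, 30, and 50 hours, the balances are 70, 40, and -10, so the credits run out on the third day (index 2).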
Cost page
From the Cost page, you can govern your cloud spending and optimize your jobs to manage cost. This page is accessible only to users with admin or read-only admin permissions. The Cost page has the following sections:
Trends
You can view the trends of DBUs, cost (in $), and the number of clusters for the total usage. These trends help you identify periods with anomalies (for example, a sudden spike in cost).
Chargeback
You can view the Chargeback cost attributed to cluster creators and job run creators for the specified time range and filters, which are organized based on the selected Group by option.
Budget
The budget page helps you to control costs by comparing the target budget to the actual incurred cost. You can set a target budget (DBUs) based on workspaces, users, clusters, and tags and check if an incurred cost is approaching or has already exceeded the target budget.
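The comparison that the Budget section describes can be sketched as a three-way classification. This is an illustration only: the 80% threshold for "approaching" (warn_ratio) is a hypothetical value chosen for the example, not an Unravel setting, and the function name is made up.

```python
def budget_status(target_dbus: float, incurred_dbus: float,
                  warn_ratio: float = 0.8) -> str:
    """Classify incurred usage against a target budget, both in DBUs.

    warn_ratio is a hypothetical threshold for "approaching"; it is not
    an Unravel setting.
    """
    if target_dbus <= 0:
        raise ValueError("target budget must be positive")
    if incurred_dbus > target_dbus:
        return "exceeded"
    if incurred_dbus >= warn_ratio * target_dbus:
        return "approaching"
    return "within budget"
```

The same check could be applied per workspace, user, cluster, or tag by calling it once for each budget scope.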
Support for Delta Lake on Databricks
You can configure Unravel to fetch the metadata of the Delta tables and monitor them from the Data page on Unravel UI.
Google Cloud BigQuery monitoring (Private Preview)
A private preview of support for BigQuery, which includes:
Observability
View the jobs running on the cluster
View details of errors encountered by the jobs
Governance
Chargeback view of resources used in BigQuery.
Associating jobs with business priority tags
Optimization
Analysis of the job execution based on the resource usage and time spent
Visibility into data/tables usage (hot, warm, cold)
Interactive Precheck
Interactive Precheck validates your configuration before you install Unravel and reuses that information to bootstrap an Unravel installation. It guides you through a series of questions to collect and verify your environment details and basic configuration.
Interactive Precheck is available as a standalone package and as part of the full Unravel package.
Multi-cluster support
Multi-cluster support is now available for the Dashboard app on the App store.
Multi-cluster support is now available for Notebooks.
Reports
TopX reports are now supported on EMR and Databricks platforms.
Support multiple LDAP servers
Unravel now supports users on multiple LDAP servers.
Improvements and enhancements
Applications
RM tracking URL is shown under the Logs tab of the application details page. (ASP-1300)
Live resource metrics on the Application details page for running applications. (ASP-1270)
The application name is shown on the Spark application details page.
Databricks
Cloud Clusters page renamed to Compute. (DT-1017)
Capture cost of untracked Databricks clusters. (DT-964)
Capture cost of a Databricks cluster in a running state. (DT-965)
Under the new Compute tab, the UIX is improved to have pages for each cluster showing metadata, KPI, analysis, and trends.
Unravel Init script for Databricks as global init script. (DT-1017)
Support Chargeback by Databricks tags. (DT-967, CUSTOMER-1881)
Azure AD: Microsoft Graph API: fetch group names API fetches only the required fields. (CDI-333)
Healthcheck
Healthcheck implementation for the App store: checks whether the App store is running. (APP-490)
Impala
Properties added that let you control the following timeouts for the Impala CM connector:
HTTP connection timeout for the Impala connector.
HTTP read timeout for the Impala connector.
HTTP client backoff time for the Impala connector, that is, the time for which the HTTP client sleeps before reattempting after an unsuccessful read attempt. (ASP-1354)
Properties added that let you specify the number of retries made to fetch the profile tree for any Impala query. (DOC-1066)
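The backoff time and retry count described above interact in the usual way for an HTTP client. The sketch below shows the general retry-with-backoff pattern in Python. It is not the Impala connector's code, and the parameter names (max_retries, backoff_s) are placeholders rather than Unravel property names.

```python
import time


def fetch_with_retries(fetch, max_retries=3, backoff_s=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the retries are used up.

    fetch is any zero-argument callable that raises OSError when a
    connection or read fails (socket timeouts are OSError subclasses).
    The client sleeps backoff_s seconds before each reattempt.
    The last failure is re-raised once max_retries reattempts have failed.
    """
    attempts = 0
    while True:
        try:
            return fetch()
        except OSError:
            if attempts >= max_retries:
                raise
            attempts += 1
            sleep(backoff_s)
```

A caller could pass, for example, lambda: urllib.request.urlopen(url, timeout=5).read() as fetch. Note that urllib exposes a single timeout value that covers both connecting and reading, whereas the connector properties above control the two separately.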
Insights
Add links to Operator and Stage ID for all Insight events. (INSIGHTS-219)
Process the Metadata file and update the Table info table. (INSIGHTS-198)
Migration
Handle multiple name services for HDFS connector. (MIG-180)
Add cdp-7.1.7 service definitions for Migration reports. (MIG-177)
Cluster Discovery report
Show jobs with unknown queues. (MIG-160)
Workload Fit report
DFS and Non-DFS: Minimum storage selected should be the default storage. (MIG-156)
Cloud Mapping Per Host report
Format numbers for cost values. (MIG-138)
Platform
Log Receiver (LR): use DocumentStorage by default instead of the file system. (CDI-329)
Log processing for Spark and Tez moved to a task_worker for better performance. (ASP-1212)
Reports
Support customizable ports for ondemand. (REPORT-1477)
RBAC
Move LDAP APIs to datastore. (RBAC-64)
Support for custom roles and permissions (RBAC-29)
Define custom roles beyond the defaults: admin, read-only, and user. (RBAC-54)
Define views that a role can see. (RBAC-60)
Define data filters to apply. You can choose from user tags, app tags, and app data fields, or write an Elasticsearch query filter to meet your requirements. (RBAC-68, RBAC-69, RBAC-70, RBAC-71)
Generate user tags using user tagging script. (RBAC-73)
Spark
Storage and performance for Spark SQL data. (ASP-667)
UI
Add headers for Manage pages. (UIX-4449)
The Last 1 hour filter is fixed for the Databricks Compute page. (UIX-4438)
App store icon placement enhancement. (UIX-4435)
Bring back the API Token in the User profile to copy the current token. (UIX-4423)
Move Manage page items to the top-right header section on Unravel UI. (UIX-4414)
Show Platform information in the Help Center dropdown. (UIX-4411)
Show size column in data table and size KPI in the Data details page for Databricks cluster. (UIX-4404)
Lint fixes for Manage views #2. (UIX-4074)
Upgraded the software to use NodeJS 14.17.6. (UIX-3955, UIX-3786)
Support to provide feedback within the product. (UIX-3880)
Spark App Name/Id should be displayed on the Spark application page. (UIX-2137)
Utility Upgrade
Upgrade log4j2 from 2.17.0 to 2.17.1. (CDI-419)
Upgrade Kafka from 2.2.0 to 3.0.0. (CDI-384)
Unsupported
Unravel does not support Billing on on-prem platforms.
On the Data page, File Reports, Small File reports, and file size information are not supported for MapR and cloud (EMR, Databricks, GCP) clusters.
Impala jobs are not supported on the HDP platform.
Monitoring the expiration of SSL certificates and Kerberos principals is not supported in Unravel multi-cluster deployments.
MapR
The following features are not supported for MapR:
Impala applications
Kerberos
The following features are not supported on the Data page:
Forecasting
Small Files
File Reports
The following reports are not supported on MapR:
File Reports
Small Files Report
Capacity Forecasting
Migration Planning
The Tuning report is supported only for MR jobs.
Migration Planning
AutoAction is not supported for Impala applications
Migration
Billing
Insights Overview
Migration Planning is not supported for the following regions for Azure Data Lake:
Germany Central (Sovereign)
Germany Northeast (Sovereign)
Forecasting and Migration: In a multi-cluster environment, you can configure only a single cluster at a time. Hence reports are generated only for that single cluster.
Migration Planning is not supported for MapR.
Unravel does not support multi-cluster management of combined on-prem and cloud clusters.
In a multi-cluster environment, Unravel does not support pipelines whose apps are sourced from different clusters. A pipeline can only contain apps that belong to the same cluster.
On Databricks and EMR, no reports other than the TopX report are supported.
Memory and CPU usage metrics are not supported for TopX reports on Databricks.
In Jobs > Sessions, the feature of applying recommendations and then running the newly configured app is not supported.
Pig and Cascading applications are not supported.
Bug fixes
Applications
The number of output rows under the App Summary > SQL tab is computed incorrectly. (ASP-1088)
On the Chargeback page, no applications are listed in the table. (CD1-429)
Tez apps with insights do not show the Insight icon on the job listing page. (INSIGHTS-113)
Impala
Impala pipeline improvements to retry for CM API failures to return query profile data. (ASP-1322)
Insights
The table involved in the join parameter is null if the operator is joining the output of other joins. (INSIGHTS-138)
Recommendation to set 'hive.exec.reducers.bytes.per.reducer' value keeps oscillating. (INSIGHTS-153)
Incorrect dot displayed after the DAG ID in App Summary page > Analysis tab for a Hive query. (INSIGHTS-215)
NullPointerException (ERROR events.SparkEvents: Event for generator com.unraveldata.spark.events.SparkSQLEventGenerator could not be generated due to {} java.lang.NullPointerException at SparkEvents.generateEvents()) is resolved. (INSIGHTS-227)
Removed duplicate table names involved in InefficientJoinConditionEvent. (INSIGHTS-236)
Migration
PDFs downloaded for migration are incomplete. (MIG-153)
Workload Fit report:
The error message is shown on UI when the report is running. (MIG-154)
Cost is not displayed for Azure (Australia Central). (MIG-178)
The pie chart does not work appropriately. (MIG-223)
Cloud Mapping per Host report:
Host resource usages are shown as 0 for HDP. (MIG-182)
An error message is not shown when the backend returns 500 Internal Server Error. (MIG-176)
Cluster Discovery report: The pie chart shows some issues. (MIG-211)
The costs available in AWS EC2 and the cost in Unravel are different. (MIG-231)
Workflow/Jobs page displays empty for Analysis, Resources, Daggraph, and Errors tab. (DT-1093)
Event logs and YARN logs are not loaded for some applications in Google Dataproc clusters. (PG-170)
Incorrect data is displayed in the Number of Queries KPI/Trend graph on the Overview page. (DATAPAGE-502)
The create time of a partition is not captured in the Hive metastore if the partition is created dynamically. This limits Unravel's ability to show Last Day KPIs in the partition section.
Wrong data is displayed for the Number of Partitions Created KPI/trend graph under the Partitions KPIs - Last Day section on the Data page. (DATAPAGE-473)
Table names are not captured properly in some scenarios for Databricks runtime 8.x and above. (PG-252)
Databricks jobs are being missed intermittently in Unravel. (PG-232)
Google Cloud Dataproc: Executor logs are not loaded for Spark applications. (PG-229)
Exception: Problem when retrieving bootstrap actions for cluster is seen in the aws_worker daemon logs.
Workaround: While creating an AWS account for the EMR Chargeback/Insights Overview feature, you must include an additional entry for "elasticmapreduce:ListBootstrapActions" in the Policy JSON file, as follows:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "pricing:GetProducts",
        "elasticmapreduce:ListClusters",
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:ListInstanceFleets",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:ListBootstrapActions",
        "elasticmapreduce:ListInstances",
        "ec2:DescribeSpotPriceHistory"
      ],
      "Resource": "*"
    }
  ]
}
Even if the AWS account was already created without this entry (elasticmapreduce:ListBootstrapActions), you can add the entry to the policy later.
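To confirm that an existing policy document already grants the action named in the workaround, a check along the following lines may help. This is a sketch using only the Python standard library: the function names are hypothetical, and the check is an exact string match, so it does not expand wildcard actions such as "elasticmapreduce:*".

```python
import json

REQUIRED_ACTION = "elasticmapreduce:ListBootstrapActions"


def allowed_actions(policy_json: str) -> set:
    """Collect every action listed in the Allow statements of a policy document."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    actions = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        actions.update([action] if isinstance(action, str) else action)
    return actions


def has_bootstrap_permission(policy_json: str) -> bool:
    """Return True if the policy explicitly allows ListBootstrapActions."""
    return REQUIRED_ACTION in allowed_actions(policy_json)
```

If has_bootstrap_permission returns False for the policy attached to the account, add the missing action as described in the workaround.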
Unravel node fails to send email notifications. (INSTALL-1694)
The Insights Overview tab uses UTC as the timezone while other pages use local time. Hence, the date and time that are shown on the Insights Overview tab and the other pages after redirection can be different. (UIX-4176)
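The effect of mixing UTC and local time can be seen by formatting one instant for two viewers. The example below is a generic Python illustration, not Unravel code. The UTC+05:30 offset and the sample timestamp are arbitrary choices for the example.

```python
from datetime import datetime, timedelta, timezone


def as_displayed(instant: datetime, utc_offset_hours: float) -> str:
    """Format a timezone-aware instant for a viewer at a fixed UTC offset."""
    local = instant.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.strftime("%Y-%m-%d %H:%M")


event = datetime(2022, 1, 28, 23, 30, tzinfo=timezone.utc)
utc_view = as_displayed(event, 0)      # "2022-01-28 23:30", as a UTC-based page shows it
local_view = as_displayed(event, 5.5)  # "2022-01-29 05:00", as a local-time page shows it
```

The two strings describe the same instant, but both the time and the date differ, which is the kind of mismatch that can appear after redirection from the Insights Overview tab.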
Kerberos can only be disabled manually from the unravel.yaml file:
kerberos:
  enabled: False
Cloud Mapping Per Host report: Failure to get instance list for certain cloud providers. (MIG-171)
Workaround:
Run dbcli.
<Unravel installation directory>/unravel/manager run dbcli
Make the following change in the database schema:
ALTER TABLE celery_taskmeta CHANGE COLUMN result result MEDIUMBLOB;
Cluster discovery
If the metric retrieval for a host fails, then the CPU and memory capacity/usage graphs and heatmaps are not displayed.
This happens on a CDH cluster when the Cloudera Manager agent of a host does not send any heartbeats to the Cloudera Manager server. Such a host is shown as Bad Health in Cloudera Manager. (REPORT-1706)
Workaround: Ensure that the Cloudera Manager agent sends heartbeats to the Cloudera Manager on all hosts and that none of the hosts are shown as Bad Health.
The On-prem Cluster Identity may show an incorrect Spark version on CDH. The report may incorrectly show Spark 1 when Spark 2 is installed on the CDH cluster. (REPORT-1702)
When using PostgreSQL, the % sign is duplicated and displayed in the Workload Fit report > Map to single cluster tab. (MIG-42)
Cloud Mapping Per Host report scheduled in v4.6.1.x will not work in v4.7.1.0. Users must schedule a new report. (REPORT-1886)
The TopX report email contains a link to the Unravel TopX report instead of showing the report content in the email as in the old reports.
Queue analysis: The log file name (unravel_us_1.log) displayed in the error message is incorrect. The correct name of the log file is unravel_sensor.log. (REPORT-1663)
The sensor setup script fails with unrecognized arguments. (INSTALL-1667)
There is a lag seen for SQL Streaming applications. (PLATFORM-2764)
If the customer uses an active directory for Kerberos and the samAccountName and principal do not match, this can cause errors when accessing HDFS. (DOC-755)
In AAD login mode, when an external logout happens, the user still has access to the currently logged-in UI. (UIX-4125)
For PySpark applications, the processCPUTime and the processCPULoad are not captured properly. (ASP-626)
The SQL events generator generates a SQL Like clause event even when the like pattern appears only in a literal in the query. (TEZLLAP-349)
Notebooks will not work after upgrading to v4.7.1.0. You can configure them separately. (REPORT-1895)
If you have configured a single-cluster deployment for Unravel and the cluster name is not default, the Data page feature may not work properly.
To fix this, you must explicitly set the following property after upgrading. (INSTALL-2151)
<Unravel installation directory>/unravel/manager stop
<Unravel installation directory>/unravel/manager config properties set hive.metastore.cluster.ids=<cluster-name>
<Unravel installation directory>/unravel/manager apply
<Unravel installation directory>/unravel/manager start
After you upgrade from v4.6.x to v4.7.1.0, the Tez application details page does not initially show DAG data. The DAG data is visible only after you refresh the page. (ASP-1126)
On the Manage page, the DB Stats are not displayed for untracked clusters. (UIX-4171)
The new user interface (UI) can be accessed only from Chrome.
On the App summary page for Impala, the Query > Operator view is visible only after scrolling down. (UIX-3536)
Jobs are falsely labeled as Tez apps for Oozie Sqoop and Shell actions. (PLATFORM-2403)