Detecting apps using resources inefficiently
As a cluster operator, you will want to surface apps that use their resources inefficiently. You can then help developers remediate an app's issues and, in doing so, improve your cluster's health.
One of the unique aspects of Unravel is its insight into an app's run and the recommendations it provides to improve the app's efficiency. Unravel deploys sensors that hook into the JVMs of running YARN containers. These sensors capture actual memory usage, which you can then compare to the amount of YARN memory the developer allocated: (memory allocated per YARN container) x (number of YARN containers).
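The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are hypothetical and are not part of Unravel.

```python
def allocated_memory_mb(memory_per_container_mb: int, num_containers: int) -> int:
    """Total YARN memory allocated: per-container allocation times container count."""
    return memory_per_container_mb * num_containers


def utilization(used_mb: float, allocated_mb: float) -> float:
    """Fraction of the allocated memory that was actually used."""
    if allocated_mb <= 0:
        raise ValueError("allocated_mb must be positive")
    return used_mb / allocated_mb


# Hypothetical app: 10 containers of 4096 MB each, 12288 MB actually used in total.
allocated = allocated_memory_mb(4096, 10)
ratio = utilization(12288, allocated)
print(f"allocated={allocated} MB, utilization={ratio:.0%}")
```

A low ratio suggests the app was allocated far more YARN memory than it needed.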
For a given app, you can see JVM memory consumption on the Resource tab in the app's APM. Select the metric VmRss (resident set size, the portion of the process's memory held in RAM) for this information. For Spark apps, as in the following example, you can compare these time series against spark.executor.memory and spark.driver.memory.
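As a rough sketch of that comparison, the following Python snippet checks the peak of a VmRss series against a configured spark.executor.memory value. The sample readings, the helper names, and the simplified size parser (single-letter k/m/g/t suffixes only) are assumptions made for illustration, and nothing here reads data from Unravel.

```python
# Multipliers that convert a single-letter size suffix to MiB (simplified).
UNITS_MB = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def spark_memory_to_mb(value: str) -> float:
    """Convert a size string such as '4g' or '512m' to MiB."""
    value = value.strip().lower()
    return float(value[:-1]) * UNITS_MB[value[-1]]


def peak_fraction(vmrss_samples_mb, configured: str) -> float:
    """Peak observed VmRss as a fraction of the configured memory setting."""
    return max(vmrss_samples_mb) / spark_memory_to_mb(configured)


# Hypothetical VmRss readings (MiB) for one executor over time.
samples = [900, 1150, 1300, 1240]
frac = peak_fraction(samples, "4g")
print(f"peak VmRss is {frac:.0%} of spark.executor.memory")
```

A peak that stays well below the configured value over the whole run is a hint that the setting could be lowered.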
Operations > Dashboard contains a list of Inefficient Applications, if any. The list is filtered by app type and then by the type of inefficiency (Event Name). This information is also available through a REST API.
Select the tab for the app type you want to examine to see a list of every event (Event Name) that apps of that type have experienced, along with the number of apps that experienced each event. The table is sorted in descending order on the # of App Found column. Click the event type you are interested in to display a table of all the apps that experienced it. The event type is noted in the upper left-hand corner. In this example, UnderutilizedNodeMemoryEvent was selected, which indicates that the apps made low use of their memory resources.
Unravel uses a glyph to note when it has tuning suggestions for an app. Sort the list on this column to see all the apps that have recommendations.
Click the app to bring up its APM. Although you are examining an app based on the event you selected, an app can have multiple events. In this case the app has four events and four recommendations. Click the event box to see Unravel's recommendation and where the app used resources inefficiently. The Recommendations tab lists only the recommendations, for quick access. The Efficiency tab lists all the events the app experienced and the recommendations, if any, for each event. For the following app, Unravel recommends adjusting the spark.executor.memory, spark.default.parallelism, spark.yarn.executor.memoryOverhead, and spark.executor.instances properties. The developer can implement these recommendations as --conf arguments to the spark-submit command.
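As a minimal sketch of how such recommendations might be passed to spark-submit, the Python snippet below builds the argument list from a dictionary of properties. The property values and the application file name my_app.py are hypothetical placeholders rather than values Unravel produced, and the executor count uses Spark's property name spark.executor.instances.

```python
import shlex


def spark_submit_command(app: str, conf: dict) -> list:
    """Build a spark-submit argument list with one --conf pair per property."""
    cmd = ["spark-submit"]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


# Hypothetical recommended values; substitute the ones shown in the app's APM.
recommended = {
    "spark.executor.memory": "2g",
    "spark.default.parallelism": "200",
    "spark.yarn.executor.memoryOverhead": "512",
    "spark.executor.instances": "8",
}

cmd = spark_submit_command("my_app.py", recommended)
print(shlex.join(cmd))  # shell-quoted form, suitable for copying into a terminal
```

Building the command as a list keeps each property in a separate --conf argument, so values never need manual quoting.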
Each app type has its own set of events, and how you implement the recommendations depends on the app type.