Monitoring individual Spark apps
This topic explains how to set up per-app monitoring of Spark (also called "dev mode"). This is different from cluster-wide monitoring. To monitor individual Spark apps, you must submit them through spark-submit.
The information here applies to Spark versions 1.5.x through 3.0.x.
Note
Spark 3.0 is supported from Unravel v4.6.1.6 onwards.
unravel-host must be a fully qualified domain name or IP address.
Get Unravel's Spark sensor.
The sensor is included in the Unravel Server RPM installation. After installing the Unravel Server RPM on unravel-host, obtain the sensor either from the file system on the Unravel Server host (/usr/local/unravel/webapps/ROOT/hh/unravel-agent-pack-bin.zip), or from http://unravel-host:3000/hh/unravel-agent-pack-bin.zip.
If you run Spark apps in YARN-cluster mode (default):
Put the sensor on the host node(s) from which you will run spark-submit by first creating a destination directory that is readable by all users.
Tip
We suggest that unravel-sensor-path be /usr/local/unravel-spark.
If spark-submit is used from a single client node:
mkdir unravel-sensor-path
cd unravel-sensor-path
wget http://unravel-host:3000/hh/unravel-agent-pack-bin.zip
If spark-submit is used from multiple client nodes, copy the sensor .zip file to HDFS instead of copying it to every client node, and set UNRAVEL_SENSOR_PATH accordingly. For example, copy it to hdfs:///tmp:
mkdir unravel-sensor-path
cd unravel-sensor-path
wget http://unravel-host:3000/hh/unravel-agent-pack-bin.zip
hdfs dfs -copyFromLocal unravel-agent-pack-bin.zip /tmp
export UNRAVEL_SENSOR_PATH="hdfs:///tmp"
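The download URL and the staging commands above follow one pattern, so they can be derived from the host name and the target directories. The following is a minimal sketch, assuming bash; sensor_url and print_staging_plan are hypothetical helper names, not part of Unravel, and unravel.example.com is a placeholder host. The sketch only prints the commands, so nothing is downloaded or copied.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Unravel): build the sensor URL and
# print, without executing, the staging commands described above.

SENSOR_ZIP="unravel-agent-pack-bin.zip"

# sensor_url HOST -> URL of the sensor archive on the Unravel Server.
sensor_url() {
  printf 'http://%s:3000/hh/%s\n' "$1" "$SENSOR_ZIP"
}

# print_staging_plan HOST LOCAL_DIR [HDFS_DIR]
# With HDFS_DIR, the plan also copies the archive to HDFS and points
# UNRAVEL_SENSOR_PATH at the HDFS directory instead of the local one.
print_staging_plan() {
  local host="$1" dir="$2" hdfs_dir="${3:-}"
  echo "mkdir -p $dir"
  echo "cd $dir"
  echo "wget $(sensor_url "$host")"
  if [ -n "$hdfs_dir" ]; then
    echo "hdfs dfs -copyFromLocal $SENSOR_ZIP $hdfs_dir"
    echo "export UNRAVEL_SENSOR_PATH=hdfs://$hdfs_dir"
  else
    echo "export UNRAVEL_SENSOR_PATH=$dir"
  fi
}

# Placeholder host; prints the multi-client (HDFS) variant of the plan.
print_staging_plan unravel.example.com /usr/local/unravel-spark /tmp
```

Omitting the third argument prints the single-client variant, where UNRAVEL_SENSOR_PATH is the local directory.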
Define spark.driver.extraJavaOptions and spark.executor.extraJavaOptions as part of your spark-submit command.
Substitute your local values for:
unravel-sensor-path: Parent directory of the Unravel Sensor .zip file, unravel-agent-pack-bin.zip. If you put this file on HDFS, unravel-sensor-path is the parent directory on HDFS.
unravel-host-ip-port: IP address and port of the log_receiver service in the format ip:port. The default port is 4043. Sample value: 10.0.0.142:4043.
spark-event-log-dir: Location of the event log directory on HDFS, S3, or local file system. If a remote address is used, include the NameNode IP address and port.
spark-sample-jar-path: Absolute path to the jar file used in the spark-submit command.
spark-version: Spark version to be instrumented. Valid options are 1.5 for Spark 1.5.x, 1.6 for Spark 1.6.x, 2.0 for Spark 2.0.x, 2.1 for Spark 2.1.x, 2.2 for Spark 2.2.x, 2.3 for Spark 2.3.x, 2.4 for Spark 2.4.x and 3.0 for Spark 3.0.x.
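Because unravel-host-ip-port must follow the ip:port format, a quick check before exporting it can catch a missing or malformed port. This is a minimal sketch, assuming bash; is_host_port is a hypothetical helper, not an Unravel tool, and it checks only the shape of the value, not whether the service is reachable.

```shell
# Hypothetical helper: succeed only if the argument looks like ip:port
# or host:port with a numeric port in the range 1-65535.
is_host_port() {
  local value="$1" host port
  host="${value%:*}"    # text before the last colon
  port="${value##*:}"   # text after the last colon
  # Require a colon, a non-empty host, and a non-empty port.
  [ "$host" != "$value" ] && [ -n "$host" ] && [ -n "$port" ] || return 1
  case "$port" in
    *[!0-9]*) return 1 ;;   # reject any non-digit character
  esac
  [ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}

# Sample value from the list above.
if is_host_port "10.0.0.142:4043"; then
  echo "ok: 10.0.0.142:4043"
fi
```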
export UNRAVEL_SENSOR_PATH=unravel-sensor-path
export UNRAVEL_SERVER_IP_PORT=unravel-host-ip-port
export SPARK_EVENT_LOG_DIR=spark-event-log-dir
export PATH_TO_SPARK_EXAMPLE_JAR=spark-sample-jar-path
export SPARK_VERSION=spark-version
export ENABLED_SENSOR_FOR_DRIVER="spark.driver.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=driver"
export ENABLED_SENSOR_FOR_EXECUTOR="spark.executor.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=executor"
spark-submit \
  --class org.apache.spark.examples.sql.RDDRelation \
  --master yarn-cluster \
  --archives $UNRAVEL_SENSOR_PATH/unravel-agent-pack-bin.zip \
  --conf "$ENABLED_SENSOR_FOR_DRIVER" \
  --conf "$ENABLED_SENSOR_FOR_EXECUTOR" \
  --conf "spark.unravel.server.hostport=$UNRAVEL_SERVER_IP_PORT" \
  --conf "spark.eventLog.dir=${SPARK_EVENT_LOG_DIR}" \
  --conf "spark.eventLog.enabled=true" \
  $PATH_TO_SPARK_EXAMPLE_JAR
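The spark-version value is the major.minor part of the Spark release, and the driver and executor settings differ only in the role name. The sketch below, assuming bash, derives both; spark_libs_version and sensor_java_option are hypothetical helper names, not part of Unravel, and the accepted versions are simply the ones listed in this topic.

```shell
# Hypothetical helpers (not part of Unravel).

# spark_libs_version FULL_VERSION -> major.minor value for spark-version,
# or a non-zero status if the release line is not listed in this topic.
spark_libs_version() {
  local short
  short="$(printf '%s' "$1" | cut -d. -f1,2)"
  case "$short" in
    1.5|1.6|2.0|2.1|2.2|2.3|2.4|3.0) printf '%s\n' "$short" ;;
    *) echo "unsupported Spark version: $1" >&2; return 1 ;;
  esac
}

# sensor_java_option ROLE VERSION -> spark.<role>.extraJavaOptions setting,
# where ROLE is "driver" or "executor".
sensor_java_option() {
  local role="$1" version="$2"
  printf 'spark.%s.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-%s,config=%s\n' \
    "$role" "$version" "$role"
}

# Example with an illustrative full version string.
SPARK_VERSION="$(spark_libs_version 2.4.5)"
sensor_java_option driver "$SPARK_VERSION"
sensor_java_option executor "$SPARK_VERSION"
```

Each printed line can be passed to spark-submit with --conf, in the same way as ENABLED_SENSOR_FOR_DRIVER and ENABLED_SENSOR_FOR_EXECUTOR in the example.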
If you run Spark apps in YARN-client mode:
To intercept Spark apps running in yarn-client mode, unzip the Unravel Sensor .zip file on the client node at a location readable by all users, referred to as unzipped-archive-dest below. We suggest /usr/local/unravel-spark.
Important
Keep the original unravel-agent-pack-bin.zip file inside unzipped-archive-dest.
If you use multiple hosts as clients, repeat these steps on each client.
mkdir unzipped-archive-dest
cd unzipped-archive-dest
wget http://unravel-host:3000/hh/unravel-agent-pack-bin.zip
unzip unravel-agent-pack-bin.zip
Define spark.executor.extraJavaOptions as part of your spark-submit command.
To use the example below, substitute your local values for:
unzipped-archive-dest: Directory of the unzipped Unravel Sensor files.
unravel-host-ip-port: IP address and port of the log_receiver service in the format ip:port. The default port is 4043. Sample value: 10.0.0.142:4043.
spark-event-log-dir: Location of the event log directory on HDFS, S3, or local file system. If a remote address is used, include the NameNode IP address and port.
spark-sample-jar-path: Absolute path to the jar file used in the spark-submit command.
spark-version: Spark version to be instrumented. Valid options are 1.5 for Spark 1.5.x, 1.6 for Spark 1.6.x, 2.0 for Spark 2.0.x, 2.1 for Spark 2.1.x, 2.2 for Spark 2.2.x, 2.3 for Spark 2.3.x, 2.4 for Spark 2.4.x and 3.0 for Spark 3.0.x.
export UNZIPPED_ARCHIVE_DEST=unzipped-archive-dest
export UNRAVEL_SERVER_IP_PORT=unravel-host-ip-port
export SPARK_EVENT_LOG_DIR=spark-event-log-dir
export PATH_TO_SPARK_EXAMPLE_JAR=spark-sample-jar-path
export SPARK_VERSION=spark-version
export ENABLED_SENSOR_FOR_EXECUTOR="spark.executor.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=executor"
spark-submit \
  --class org.apache.spark.examples.sql.RDDRelation \
  --master yarn-client \
  --archives $UNZIPPED_ARCHIVE_DEST/unravel-agent-pack-bin.zip \
  --driver-java-options "-javaagent:$UNZIPPED_ARCHIVE_DEST/btrace-agent.jar=config=driver,libs=spark-$SPARK_VERSION" \
  --conf "$ENABLED_SENSOR_FOR_EXECUTOR" \
  --conf "spark.unravel.server.hostport=$UNRAVEL_SERVER_IP_PORT" \
  --conf "spark.eventLog.dir=${SPARK_EVENT_LOG_DIR}" \
  --conf "spark.eventLog.enabled=true" \
  $PATH_TO_SPARK_EXAMPLE_JAR
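In yarn-client mode the client node needs both the original archive and its unzipped contents, so a pre-flight check before spark-submit may save a failed run. The following is a minimal sketch, assuming bash; check_unzipped_sensor is a hypothetical helper, not part of Unravel. It also assumes that btrace-agent.jar sits at the top level of the archive, which the agent path unravel-agent-pack-bin.zip/btrace-agent.jar used for the executors suggests.

```shell
# Hypothetical pre-flight check (not part of Unravel): confirm that DIR
# holds both the original archive and the unzipped agent jar.
# Assumption: btrace-agent.jar is at the top level of the archive.
check_unzipped_sensor() {
  local dir="$1" missing=0 f
  for f in unravel-agent-pack-bin.zip btrace-agent.jar; do
    if [ ! -r "$dir/$f" ]; then
      echo "missing or unreadable: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the suggested location and report the result.
if check_unzipped_sensor /usr/local/unravel-spark 2>/dev/null; then
  echo "sensor files found"
else
  echo "sensor files not found in /usr/local/unravel-spark"
fi
```

The function returns a non-zero status when either file is missing, so it can gate the spark-submit call in a wrapper script.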