Tagging workflows
About Unravel workflow tags
You can add two Unravel tags (key-value pairs) to mark queries and jobs that belong to a particular workflow:

unravel.workflow.name: a string that represents the name of the workflow. The recommended format is TenantName-ProjectName-WorkflowName.

unravel.workflow.utctimestamp: a timestamp in yyyyMMddThhmmssZ format representing the logical time of a run of the workflow in UTC/ISO format. In a UNIX/Linux bash shell, you can get a timestamp in UTC format by running the command $(date -u '+%Y%m%dT%H%M%SZ'); a short sketch appears at the end of this section.

Note
Do not put quotes ("") or blank spaces in or around the tag keys or values. For example:

SET unravel.workflow.name="ETL-Workflow";   [Incorrect syntax]
SET unravel.workflow.name=ETL-Workflow;     [Correct syntax]
Different runs of the same workflow have the same value for unravel.workflow.name but different values for unravel.workflow.utctimestamp. Different workflows have different values for unravel.workflow.name.
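For reference, here is a minimal bash sketch that generates the timestamp and sanity-checks its format (the TS variable name and the format check are illustrative only, not part of Unravel):

# Generate the workflow run timestamp in UTC (yyyyMMddThhmmssZ).
TS=$(date -u '+%Y%m%dT%H%M%SZ')
echo "$TS"    # e.g. 20160201T000000Z

# Optional sanity check that the value matches the expected pattern.
[[ "$TS" =~ ^[0-9]{8}T[0-9]{6}Z$ ]] && echo "timestamp format OK"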
Hive query example
This is a Hive query that was marked as part of the Financial-Tenant-ETL-Workflow workflow that ran on February 1, 2016:

SET unravel.workflow.name=Financial-Tenant-ETL-Workflow;
SET unravel.workflow.utctimestamp=20160201T000000Z;
SELECT foo FROM table WHERE …   -- your Hive query text goes here
Easy recipes for tagging workflows
Export the workflow name and UTC timestamp from your top-level script that schedules each run of the workflow. Here, we use bash's date command to generate the timestamp.

export WORKFLOW_NAME=Financial-Tenant-ETL-Workflow
export UTC_TIME_STAMP=$(date -u '+%Y%m%dT%H%M%SZ')
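For example, a top-level wrapper might look like the following sketch (run_workflow.sh and the step scripts extract.sh, transform.sh, and load.sh are hypothetical names, not part of Unravel):

#!/bin/bash
# run_workflow.sh (hypothetical): schedules one run of the workflow.
# Export the tags once so every child job launched below inherits them.
export WORKFLOW_NAME=Financial-Tenant-ETL-Workflow
export UTC_TIME_STAMP=$(date -u '+%Y%m%dT%H%M%SZ')

# Hypothetical workflow steps; each reads $WORKFLOW_NAME and $UTC_TIME_STAMP.
./extract.sh && ./transform.sh && ./load.sh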
Follow the instructions for your job type.
Examples by job type
Hive

hive -f hive/simple_wf.hql

In hive/simple_wf.hql:

SET unravel.workflow.name=Financial-Tenant-ETL-Workflow;
SET unravel.workflow.utctimestamp=20160201T000000Z;
SELECT foo FROM table WHERE …   -- your Hive query text goes here
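This .hql file hardcodes the tag values. To pass in the values exported above instead, one option is Hive's standard variable substitution; here is a sketch (the wf.name and wf.ts property names are arbitrary choices, not Unravel settings):

hive \
  --hiveconf wf.name=$WORKFLOW_NAME \
  --hiveconf wf.ts=$UTC_TIME_STAMP \
  -f hive/simple_wf.hql

Then reference the properties inside hive/simple_wf.hql:

SET unravel.workflow.name=${hiveconf:wf.name};
SET unravel.workflow.utctimestamp=${hiveconf:wf.ts};
SELECT foo FROM table WHERE …   -- your Hive query text goes here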
Sqoop

sqoop export \
  -D"unravel.workflow.name=$WORKFLOW_NAME" \
  -D"unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
  --connect jdbc:mysql://127.0.0.1:3316/unravel_mysql_prod --table settings -m 1 \
  --export-dir /tmp/sqoop_test --username unravel --verbose --password foobar
Note

Sqoop has known bugs in its handling of quotes, so keep the quoting of the -D options exactly as shown above.
MapReduce

Substitute your file names for /tmp/data/small and /tmp/outsmoke.

hadoop jar libs/ooziemr-1.0.jar com.unraveldata.mr.apps.Driver \
  -D"unravel.workflow.name=$WORKFLOW_NAME" \
  -D"unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
  -p /wordcount.properties -input /tmp/data/small -output /tmp/outsmoke

The -D options are parsed as Hadoop generic options, so they must appear before the application-specific arguments.
Spark

Note

For Spark jobs, you must prefix the Unravel tags with "spark.". For example, unravel.workflow.name becomes spark.unravel.workflow.name.
spark-submit \
  --conf "spark.unravel.workflow.name=$WORKFLOW_NAME" \
  --conf "spark.unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
  --conf "spark.eventLog.enabled=true" \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --deploy-mode cluster \
  <path-to-spark-examples.jar>

Substitute the path to your application JAR for <path-to-spark-examples.jar>.
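To avoid repeating the --conf flags on every submission, one option is spark-submit's --properties-file flag; here is a sketch (the /tmp/unravel-tags.conf path is an arbitrary choice). Note that a file given with --properties-file is read in place of conf/spark-defaults.conf, not merged with it.

# Write the tags (with the required spark. prefix) to a properties file.
cat > /tmp/unravel-tags.conf <<EOF
spark.unravel.workflow.name          $WORKFLOW_NAME
spark.unravel.workflow.utctimestamp  $UTC_TIME_STAMP
EOF

spark-submit \
  --properties-file /tmp/unravel-tags.conf \
  --conf "spark.eventLog.enabled=true" \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --deploy-mode cluster \
  <path-to-spark-examples.jar>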
Pig

pig \
  -param WORKFLOW_NAME=$WORKFLOW_NAME \
  -param UTC_TIME_STAMP=$UTC_TIME_STAMP \
  -x mapreduce -f pig/simple.pig

In pig/simple.pig:

SET unravel.workflow.name $WORKFLOW_NAME;
SET unravel.workflow.utctimestamp $UTC_TIME_STAMP;
lines = LOAD '/tmp/data/small' USING PigStorage('|') AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount;
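For a quick syntax check before the cluster run, the same tagged script can be run in Pig's local mode; here is a sketch (the test values are illustrative, and in local mode /tmp/data/small is read from the local filesystem):

pig \
  -param WORKFLOW_NAME=Test-Tenant-Test-Workflow \
  -param UTC_TIME_STAMP=$(date -u '+%Y%m%dT%H%M%SZ') \
  -x local -f pig/simple.pig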
Impala

impala-shell -i <impalad_host:port> \
  -f simpleImpala.sql \
  --var=workflowname='ourImpalaWorkflow' \
  --var=utctimestamp=$(date -u '+%Y%m%dT%H%M%SZ')

In simpleImpala.sql:

SET DEBUG_ACTION="::::unravel.workflow.name::${var:workflowname}::::unravel.workflow.utctimestamp::${var:utctimestamp}::::";
select * from usstates;
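For a one-shot invocation that uses the variables exported earlier, impala-shell's -q flag accepts the statements inline; here is a sketch (usstates is the sample table from the example above):

impala-shell -i <impalad_host:port> -q "
SET DEBUG_ACTION=\"::::unravel.workflow.name::$WORKFLOW_NAME::::unravel.workflow.utctimestamp::$UTC_TIME_STAMP::::\";
select * from usstates;"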
Finding pipelines in the Unravel Web UI

Once your tagged workflows have run, log in to the Unravel Web UI and select Jobs > Pipeline to start exploring Unravel's workflow management features.