Tagging applications
You can define tags for groups of applications using a Python script. Unravel retrieves the script from the property com.unraveldata.app.tagging.script.path, so you must define all your application tags in that file. You can also use this script to set workflow tags.
You can think of the script as building a database made up of a list of keys, their associated values, and the applications associated with each specific <key, value> pair.
For example, suppose you have three departments: finance, hr, and marketing. You would create the key dept and give it three values: finance, hr, and marketing. You would then associate applications with one or more <key, value> pairs. One Hive query might be associated with dept:marketing while another is associated with dept:finance.
Note
You cannot associate an application with more than one value per key. Given the example above, an application cannot be associated with both dept:marketing and dept:finance.
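One way to picture this model is a per-application dictionary that maps each tag key to a single value. The sketch below is only an illustration and not how Unravel stores tags; the application IDs and the helper apps_with_tag are made up for the example.

```python
# Illustrative model only: each application maps tag keys to exactly one value.
# The application IDs and this helper are hypothetical, not part of Unravel.
app_tags = {
    "hive_query_1": {"dept": "marketing"},
    "hive_query_2": {"dept": "finance"},
    "spark_job_1": {"dept": "finance", "team": "reporting"},
}


def apps_with_tag(tags_by_app, key, value):
    """Return the sorted IDs of applications tagged with <key, value>."""
    return sorted(
        app_id
        for app_id, tags in tags_by_app.items()
        if tags.get(key) == value
    )


finance_apps = apps_with_tag(app_tags, "dept", "finance")
print(finance_apps)  # ['hive_query_2', 'spark_job_1']
```

Because a dictionary holds one value per key, an application in this model can carry dept:marketing or dept:finance but never both, which mirrors the rule in the note.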
See What is tagging? for more information on tagging, its purpose and a more comprehensive description.
Your Python script must be idempotent, i.e., it must produce the same result over multiple invocations with different input (metadata) for the same application.
Application tags are immutable: once created, they cannot be changed.
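The idempotency requirement can be checked locally before deploying a script. The following is a minimal sketch, assuming the application name and queue stay fixed for the lifetime of an application; FakeApp, its progress field, and is_idempotent are hypothetical stand-ins and not part of Unravel.

```python
# Hypothetical stand-in for the object Unravel passes to the script; only
# accessor names that appear in this document are mimicked.
class FakeApp(object):
    def __init__(self, name, queue, progress):
        self._name = name
        self._queue = queue
        self.progress = progress  # made-up metadata that changes during a run

    def getAppName(self):
        return self._name

    def getQueue(self):
        return self._queue


def stable_tags(app_obj):
    # Depends only on values assumed to stay fixed for the application.
    return {"dept": app_obj.getAppName() + "_" + app_obj.getQueue()}


def unstable_tags(app_obj):
    # Depends on metadata that differs between invocations: not idempotent.
    return {"stage": "progress-%d" % app_obj.progress}


def is_idempotent(tag_func, snapshots):
    """Return True if tag_func yields the same tags for every snapshot."""
    results = [tag_func(snapshot) for snapshot in snapshots]
    return all(result == results[0] for result in results)


# Two snapshots of the same application taken at different points in its run.
snapshots = [FakeApp("etl", "engr", 10), FakeApp("etl", "engr", 90)]
print(is_idempotent(stable_tags, snapshots))    # True
print(is_idempotent(unstable_tags, snapshots))  # False
```

Since tags cannot be changed once created, a function like unstable_tags would leave the application with whichever value was produced first.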
Using a Python script
See Writing a Python script and the example script for tips on how to write a script.
Set the following properties in /usr/local/unravel/etc/unravel.properties. Here python_script is the path to your script file and method_name is the name of the method in that script.
com.unraveldata.tagging.script.enabled=true
com.unraveldata.app.tagging.script.path=python_script
com.unraveldata.app.tagging.script.method.name=method_name
Restart the following daemons. You must restart them whenever you change the property values above or edit the referenced script.
/etc/init.d/unravel_all.sh stop-etl
/etc/init.d/unravel_all.sh start
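Before restarting, it can help to confirm that all three properties are present. The following is a rough sketch of such a check, not an Unravel tool: the property names come from the step above, while parse_properties, missing_tagging_properties, and the sample text are hypothetical, and the parser handles only simple key=value lines.

```python
# Property names are taken from this document; the checker is a hypothetical
# helper written for illustration.
REQUIRED = (
    "com.unraveldata.tagging.script.enabled",
    "com.unraveldata.app.tagging.script.path",
    "com.unraveldata.app.tagging.script.method.name",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_tagging_properties(text):
    """Return the required tagging properties that are absent or empty."""
    props = parse_properties(text)
    return [name for name in REQUIRED if not props.get(name)]


# Sample input with the method name left out on purpose.
sample = """
com.unraveldata.tagging.script.enabled=true
com.unraveldata.app.tagging.script.path=/usr/scripts/Tagging.py
"""
print(missing_tagging_properties(sample))
```

To check a real installation, pass in the contents of /usr/local/unravel/etc/unravel.properties instead of the sample string.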
Writing a Python script
You can add print/debugging statements to the script, but they are logged each time the script is run. Because the script is invoked multiple times during an application's run, the log contains many duplicate entries. You can also specify workflow tags in your script.
Format
In the Python script, you set a tag_key to a tag_value.
Your tag_value can be a string, the return value of a method, or a concatenation of both.
tags["auth"]="admin"
tags["scope"]=app_obj.getQueue()
tags["dept"]=app_obj.getAppName() + "_" + app_obj.getQueue()
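Concatenation with + only works when both sides are strings. Whether the accessor methods can return None (for example, getAppConf for a key that is not set) is not stated here, so the following is a defensive sketch under that assumption; join_tag_parts is a hypothetical helper and not part of Unravel.

```python
def join_tag_parts(*parts):
    """Join the non-empty parts with "_", converting each one to a string."""
    return "_".join(str(part) for part in parts if part not in (None, ""))


print(join_tag_parts("nightly_etl", "engr"))  # nightly_etl_engr
print(join_tag_parts("nightly_etl", None))    # nightly_etl
```

Inside a tagging method this could be used as tags["dept"] = join_tag_parts(app_obj.getAppName(), app_obj.getQueue()).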
Example Python script
The following script creates seven tag_keys for applications and then populates them, generating the tagging dictionary.
hive_query_id
dept
team
auth
scope
unravel.workflow.name and unravel.workflow.utctimestamp (see tagged workflows)
The tagging properties are set to the script file and method name.
com.unraveldata.app.tagging.script.path=/usr/scripts/Tagging.py
com.unraveldata.app.tagging.script.method.name=get_tags
# filename: /usr/scripts/Tagging.py
from datetime import datetime

# get_tags is the method so com.unraveldata.app.tagging.script.method.name=get_tags
def get_tags(app_obj):
    tags = {}

    # MR apps get the hive_query_id tag
    if app_obj.getAppType() == "mr":
        tags["hive_query_id"] = app_obj.getAppConf("hive.query.id")

    # every app gets a dept and team tag
    tags["dept"] = app_obj.getAppName() + "_" + app_obj.getQueue()
    tags["team"] = app_obj.getUsername()

    # Only apps with username=admin get this tag
    if app_obj.getUsername() == "admin":
        tags["auth"] = "admin"

    # Every app gets a scope tag based upon queue they are in
    if app_obj.getQueue() == "engr":
        # All apps in the "engr" queue get this tag
        tags["scope"] = "engineering-application"
    elif app_obj.getQueue() == "qa":
        # All apps in the "qa" queue get this tag
        tags["scope"] = "qa-application"
    else:
        # All apps not in the "engr" or "qa" queues get this tag
        tags["scope"] = "daily-application"

    # creates the workflow tags, these are Unravel tags and you should
    # contact support@unraveldata.com before using them
    tags["unravel.workflow.name"] = "Workflow-" + tags["team"]
    tags["unravel.workflow.utctimestamp"] = app_obj.getAppType() + "-" + str(datetime.utcnow())

    return tags
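The queue-to-scope branching in the example script could also be written as a lookup table, which may be easier to extend as queues are added. This is an illustrative alternative and not the form the example script uses; the queue names and scope values are copied from the script, while SCOPE_BY_QUEUE, DEFAULT_SCOPE, and scope_for_queue are hypothetical names.

```python
# Queue names and scope values are copied from the example script; the
# lookup-table form is an illustrative alternative to the if/elif chain.
SCOPE_BY_QUEUE = {
    "engr": "engineering-application",
    "qa": "qa-application",
}
DEFAULT_SCOPE = "daily-application"


def scope_for_queue(queue):
    """Return the scope tag value for a queue, falling back to the default."""
    return SCOPE_BY_QUEUE.get(queue, DEFAULT_SCOPE)


# "adhoc" is a made-up queue name used to exercise the default branch.
for queue in ("engr", "qa", "adhoc"):
    print(queue + " -> " + scope_for_queue(queue))
```

Inside get_tags, the whole chain would then reduce to tags["scope"] = scope_for_queue(app_obj.getQueue()).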
Running scripts
The tags computed in the Python script feed into Unravel's core ETL pipeline. The script is invoked in the ingestion pipeline and has access to application metadata, so it creates tags on the fly. The first time an application runs, it is not listed while it is still running when you filter applications by tags. Debug and print statements are logged multiple times because the script is invoked multiple times over a run.
References
You can download example tagging scripts. (This is currently private; please contact Unravel Support.)