Same logical operator
SAME: logically ANDs the rules, with the further constraint that the rules must be violated within the same scope to trigger an AutoAction.
Example - a rule designed to alert rogue users.
Human-readable form
If any user has more than 10 apps running on a cluster and the same user has more than 5 apps pending, then report that user as rogue.
More formally
(any user has > 10 running apps) SAME (any user has > 5 pending apps)
JSON definition
"rules":[
  "SAME":[
    { "scope":"users", "metric":"appCount", "operator":">", "value":10, "state":"running" },
    { "scope":"users", "metric":"appCount", "operator":">", "value":5, "state":"pending" }
  ]
]
Implementation
Internally, the back end uses a clustering technique to implement the SAME operator. AutoActions runs all metric aggregations simultaneously. Once the metrics are received and aggregated, it evaluates all rules and expressions, starting at the leaf expressions of the evaluation tree and working up to the root expression.
Assume the above rule, three users (A, B, and C), and the following conditions:
User A has 12 running and 3 pending apps.
User B has 7 running and 1 pending app.
User C has 21 running and 11 pending apps.
First, the two simple rules are evaluated:
Does the user have more than 10 apps running?
User A has 12 → TRUE
User B has 7 → FALSE
User C has 21 → TRUE
Does the user have more than 5 apps pending?
User A has 3 → FALSE
User B has 1 → FALSE
User C has 11 → TRUE
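The first step can be sketched in Python as follows. This is a minimal illustration, not the back-end code: the `app_counts` dictionary, the `leaf_rules` list, and the `evaluate_leaves` function are hypothetical names, and the numbers are those of the example above.

```python
# Hypothetical per-user app counts taken from the example above.
app_counts = {
    "A": {"running": 12, "pending": 3},
    "B": {"running": 7, "pending": 1},
    "C": {"running": 21, "pending": 11},
}

# The two leaf rules of the SAME expression as (state, threshold) pairs.
leaf_rules = [("running", 10), ("pending", 5)]


def evaluate_leaves(counts, rules):
    """Return {user: [bool, ...]} with one result per leaf rule, in rule order."""
    return {
        user: [states[state] > threshold for state, threshold in rules]
        for user, states in counts.items()
    }


leaf_results = evaluate_leaves(app_counts, leaf_rules)
for user, results in leaf_results.items():
    print(user, results)
```

Each leaf rule is evaluated independently for every user; nothing at this stage ties the two results for a user together.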
Second, it applies clustering by the scope, and for each cluster, it counts the number of rules triggered. In the back-end code, this procedure is called the “linking” of rules (see Ruleset.java).
Cluster “User A”, link count = 1.
User A > 10 running apps? → TRUE
User A > 5 pending apps? → FALSE
Cluster “User B”, link count = 0.
User B > 10 running apps? → FALSE
User B > 5 pending apps? → FALSE
Cluster “User C”, link count = 2.
User C > 10 running apps? → TRUE
User C > 5 pending apps? → TRUE
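The linking step amounts to grouping the leaf results by scope key and counting the triggered rules in each group. The sketch below assumes a simple tuple layout for the results; it does not reproduce the code in Ruleset.java, and `link_counts` is a hypothetical name.

```python
from collections import defaultdict

# Leaf results from the first step as (scope key, rule label, triggered) triples.
# The tuple layout and the rule labels are assumptions made for this sketch.
evaluations = [
    ("User A", "running > 10", True),
    ("User A", "pending > 5", False),
    ("User B", "running > 10", False),
    ("User B", "pending > 5", False),
    ("User C", "running > 10", True),
    ("User C", "pending > 5", True),
]


def link_counts(evaluations):
    """Group results by scope key and count the triggered rules per cluster."""
    counts = defaultdict(int)
    for scope_key, _rule, triggered in evaluations:
        # Adding 0 for an untriggered rule still registers the cluster.
        counts[scope_key] += int(triggered)
    return dict(counts)


counts = link_counts(evaluations)
print(counts)  # {'User A': 1, 'User B': 0, 'User C': 2}
```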
Third, all clusters with fewer than the required number of links (2 in this case) are discarded. Any rule that was triggered within a discarded cluster is reset for that cluster.
Cluster “User A” has a link count = 1, so it's reset and discarded.
User A > 10 running apps? → TRUE, reset to FALSE
User A > 5 pending apps? → FALSE
Cluster “User B”, link count = 0, so it's discarded.
User B > 10 running apps? → FALSE
User B > 5 pending apps? → FALSE
Finally, only the users that have triggered all rules remain.
Cluster “User C”, link count = 2:
User C > 10 running apps? → TRUE
User C > 5 pending apps? → TRUE
User C meets the criteria for the Rogue User AutoAction. Therefore, User C triggers the AutoAction, and the alert is sent and/or the actions are performed.
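The discard-and-reset step, and the resulting set of users that trigger the AutoAction, might look like the following. This is a sketch under the assumption that each cluster holds one boolean per rule; `clusters` and `keep_fully_linked` are hypothetical names, not back-end identifiers.

```python
# Clusters after linking: {scope key: {rule label: triggered}}.
clusters = {
    "User A": {"running > 10": True, "pending > 5": False},
    "User B": {"running > 10": False, "pending > 5": False},
    "User C": {"running > 10": True, "pending > 5": True},
}


def keep_fully_linked(clusters, needed_links):
    """Reset and drop clusters whose link count is below needed_links."""
    survivors = {}
    for scope_key, results in clusters.items():
        if sum(results.values()) < needed_links:
            # Reset every rule in the discarded cluster to False.
            for rule in results:
                results[rule] = False
        else:
            survivors[scope_key] = results
    return survivors


rogue_users = keep_fully_linked(clusters, needed_links=2)
print(sorted(rogue_users))  # ['User C']
```

Here `needed_links` equals the number of rules combined by SAME, so only a cluster in which every rule was triggered survives.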
Comparison to AND
Both User A and User C would have triggered the above rule had AND been used instead of SAME, that is, (any user has > 10 running apps) AND (any user has > 5 pending apps).
To achieve the same result as the above example using AND instead of SAME, you would need to create the following AutoAction rule for each user on the cluster:
(Username has > 10 running apps) AND (Username has > 5 pending apps)
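The difference between the two operators can be illustrated with set operations. The sketch below models AND as "each rule is violated by some user, and every violating user is implicated", which is an assumption chosen to match the outcome described above rather than a description of the back end; `over`, `same_users`, and `and_users` are hypothetical names.

```python
# Hypothetical per-user app counts taken from the example above.
app_counts = {
    "A": {"running": 12, "pending": 3},
    "B": {"running": 7, "pending": 1},
    "C": {"running": 21, "pending": 11},
}


def over(state, threshold):
    """Return the set of users whose app count in `state` exceeds `threshold`."""
    return {user for user, c in app_counts.items() if c[state] > threshold}


running_violators = over("running", 10)  # users A and C
pending_violators = over("pending", 5)   # user C only

# SAME: both rules must be violated within the same scope (the same user).
same_users = running_violators & pending_violators

# AND (as modeled here): each rule only needs some violator, so every user
# that violated either rule is implicated once both rules have fired.
and_users = (
    running_violators | pending_violators
    if running_violators and pending_violators
    else set()
)

print(sorted(same_users))  # ['C']
print(sorted(and_users))   # ['A', 'C']
```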