Calculating a Running Average in KRL


Summary

Calculating the running average of the data from a series of events in KRL can be tricky unless you let an eventex do the hard work.

Moving average of my office temperature

For the Kynetx thermostat project I'm working on, I need to calculate the running average of the last k temperature readings. Turns out, there's a hard way and an easy way. Let's start with the hard way and then I'll show you the easy way. Naturally, the hard way is to do it yourself.

The Long Road

The trick to doing it ourselves is to keep an array of the most recent k values. To do that, for each new temperature, we drop a value from the front of the array (with the tail() operator) and then append() the new value. If the array's empty we initialize it. If it's not long enough yet, we add new values without dropping any. The new array is stored in the persistent entity variable ent:temps and the new average is stored in ent:avg_temp. Here's a rule that does that for k=5:

rule set_average_temperature {
  select when thermostat new_temperature
  pre {
    temperature = event:attr("temperature");
    tl = ent:temps.length();
    // full window: drop the oldest reading, then add the new one
    // empty array: start a fresh one; otherwise just keep growing
    new_array = 
      (tl == 5) => ent:temps.tail().append(temperature) |
      (tl == 0) => [temperature]                        |
                   ent:temps.append(temperature)        ;
    // average() is the helper function defined below
    avg_temp = average(new_array);
  }
  always {
    set ent:temps new_array;
    set ent:avg_temp avg_temp
  }
}

This isn't bulletproof. If we want to change k to 3, for example, the array doesn't automatically get shortened. We'd have to devise some method for resetting the array.

Calculating the average of an array isn't straightforward if you're thinking imperatively, since KRL offers no explicit looping over arrays. KRL does, however, have a number of array operators which, combined with first-class functions (closures or lambdas), make it even easier than writing a loop. Here's how we calculate the average of an arbitrary array of numbers:

average = function(l){l.reduce(function(a,b){a + b})/l.length()};

The reduce() operator sums the array nicely and then we divide by its length. Think functional.
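
As a quick check of how the reduction unfolds, here is a small sketch with made-up readings. The names readings and office_avg and the sample values are assumptions for illustration only, and the declarations would sit in a rule prelude or global block alongside average:

readings = [70, 72, 74, 76, 78];   // hypothetical sample temperatures
// reduce() folds left to right: 70+72 = 142, 142+74 = 216, 216+76 = 292, 292+78 = 370
// 370 / readings.length() = 370 / 5 = 74
office_avg = average(readings);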

The Power of an Eventex

Manually keeping track of the last k values in an array can be a pain. Fortunately, there's an easier way. This rule uses the eventex (event expression) to do it all. The repeat group operator, combined with the avg() aggregator, calculates the running average and places it in the variable avg_temp:

rule set_avg_temperature {
  select when repeat 5 
    (thermostat new_temperature temperature re/(.*)/) avg(avg_temp)
  always {
    set ent:avg_temp avg_temp;
  }
}

The repeat eventex operator fires after the event it encloses (thermostat:new_temperature in this case) has repeated a certain number of times (five in this case). The event filter (temperature re/(.*)/) is there to capture the value of the temperature event attribute so that it can be averaged by the avg() aggregator. We store the result, avg_temp, in a persistent entity variable in the rule postlude.

This rule is functionally equivalent to the earlier rule that calculated the running average. The difference is that we let the event expression do the work for us. We may not want to use the eventex to calculate the running average if we need to massage the data in the event attribute in some way, but usually it saves a lot of effort. Other aggregators, min(), max(), sum(), and push() can be used for other operations on event attributes.
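
For instance, swapping the aggregator should be all it takes to track the highest of the last five readings instead of their average. This is only a sketch that follows the same pattern as the rule above; the rule name, max_temp, and ent:max_temp are names made up for illustration:

rule set_max_temperature {
  select when repeat 5 
    (thermostat new_temperature temperature re/(.*)/) max(max_temp)
  always {
    // max_temp is bound by the max() aggregator in the eventex
    set ent:max_temp max_temp;
  }
}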

A Detailed Explanation of the repeat Eventex

Update: I got a few questions about the repeat eventex in the rule above, so let me unpack it a bit more.

Suppose we had a rule like this:

rule set_avg_temperature {
  select when thermostat new_temperature
  notify("Saw a new temperature", "Saw a new temperature")
}

This rule will be selected whenever there is a thermostat:new_temperature event.

Events can have attributes. The thermostat:new_temperature event has an attribute named temperature that gives the new temperature. Event expressions can set up filters based on the values in the attributes. Consider the following rule:

rule set_avg_temperature {
  select when thermostat new_temperature temperature re/\d\d/
  notify("Saw a new temperature", "Saw a new temperature")
}

This rule will only be selected for temperatures containing at least two consecutive digits because of the regular expression re/\d\d/. Regular expressions allow portions of the matched string to be "captured" using parentheses. We can set variables in the event expression from captured values for later use in the rule:

rule set_avg_temperature {
  select when thermostat new_temperature
       temperature re/^(\d)\d/ setting (first_digit)
  notify("Saw a new temperature", 
         "Saw a new temperature in the #{first_digit}0's")
}

This rule captures the leading digit of a temperature, binds first_digit to that value, and creates a message using the variable.

A group event expression operator like repeat encloses another eventex. In the select statement shown above, the enclosed eventex is

thermostat new_temperature temperature re/(.*)/

The filter in this case isn't actually filtering anything since re/(.*)/ matches everything. But notice that the parentheses are capturing the value of the temperature. Think of this like the $_ variable in Perl. The aggregator, avg() in this case, uses that captured value to compute an average. The average will be over the last N thermostat:new_temperature events since that's what repeat does. The variable in the aggregator, avg_temp in this case, is bound to the computed value. That variable is available within the rule for later use.

So, the following event expression is selected after five thermostat:new_temperature events (and every one after) and computes the average of the last five temperatures in those events:

select when repeat 5 
  (thermostat new_temperature temperature re/(.*)/) avg(avg_temp)

I hope that helps explain this. Like regular expressions, eventexes pack a lot of power in a compact, declarative package. So they can be a little dense.



Last modified: Wed Feb 12 18:54:56 2020.