The Play! Iteratee framework is very nice for reactively handling data streams. If you haven’t worked with Iteratees yet, think of them as consumers/sinks with a state. For example, in the chat room example provided with the Play source, Iteratee.foreach imperatively consumes new messages, passing them to an actor to process.
    // Create an Iteratee to consume the feed
    val iteratee = Iteratee.foreach[JsValue] { event =>
      default ! Talk(username, (event \ "text").as[String])
    }.mapDone { _ =>
      default ! Quit(username)
    }
There’s an interesting subtlety that could be missed in this example. Iteratee.foreach internally uses fold and simply discards the function’s result on each step. This may be surprising (in Scala collections it’s the other way around: fold is implemented with foreach), but it makes sense, since an Iteratee is a state machine. A side effect of this implementation is that foreach still steps through input one frame at a time, even though the result is always Unit:
    def foreach[E](f: E => Unit): Iteratee[E, Unit] =
      fold[E, Unit](())((_, e) => f(e))
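To make the two directions concrete, here is a minimal sketch using plain Scala collections rather than Iteratees. The object and method names (FoldVsForeach, foldViaForeach, foreachViaFold) are made up for illustration; only foreach and foldLeft come from the standard library.

```scala
object FoldVsForeach {
  // The collections direction: a fold built on top of foreach,
  // threading the accumulator through a local variable.
  def foldViaForeach[A, B](xs: Iterable[A])(zero: B)(op: (B, A) => B): B = {
    var acc = zero
    xs.foreach(x => acc = op(acc, x))
    acc
  }

  // The Iteratee.foreach direction: a foreach built on top of a fold
  // whose "state" is always Unit and whose step result is thrown away.
  def foreachViaFold[A](xs: Iterable[A])(f: A => Unit): Unit =
    xs.foldLeft(())((_, x) => f(x))
}
```

In both cases elements are still visited one at a time, in order; the second form just carries a state that never holds anything useful.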
So, if you’re dealing with a high volume of data into one Iteratee, or processing a frame is not immediate, you may be surprised to find blocking behavior.
If you’re always side effecting, an easy solution is to immediately offload the request to an actor system (preferably multiple actors to concurrently process the data).
    // If you have a consumer like this:
    val iterateeBlocking = Iteratee.foreach[JsValue] { data => process(data) }

    // Here are some alternatives:
    val iterateeFuture = Iteratee.foreach[JsValue] { data => Akka.future { process(data) } }
    // Send the data itself; the actor calls process(data) when it handles the message
    val iterateeActor = Iteratee.foreach[JsValue] { data => actor ! data }

    // or to understand what's happening under the covers, or
    // if you'd prefer to work with the state machine directly:
    import play.api.libs.iteratee._

    private def asyncIteratee[T](f: T => Unit): Iteratee[T, Unit] = {
      def step(i: Input[T]): Iteratee[T, Unit] = i match {
        case Input.EOF => Done((), Input.EOF)
        case Input.Empty => Cont[T, Unit](i => step(i))
        case Input.El(e) =>
          Akka.future { f(e) } // or another async handler
          Cont[T, Unit](i => step(i))
      }
      Cont[T, Unit](i => step(i))
    }

    val iterateeAsync = asyncIteratee[JsValue] { data => process(data) }
If the last bit is confusing, I encourage you to read the Iteratee.scala source. An Iteratee is in one of three states: Done, Cont, or Error. New Input is either EOF, Empty, or El(value). With EOF, we return an Iteratee that is forever in the Done state. With Empty, we’re happy to continue accepting input by returning an Iteratee in the Cont state. When we have input with El(value), we asynchronously process it and return an Iteratee in the Cont state.
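If you want to poke at these transitions without pulling in Play, the following is a simplified, self-contained model. The types In, It, Done and Cont here are stand-ins I made up for this sketch, not Play's real classes (there is no Error state and Done carries no result), and scala.concurrent.Future stands in for Akka.future.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ToyIteratee {
  // Simplified stand-in for Play's Input: end of stream, nothing, or one element.
  sealed trait In[+E]
  case object EOF extends In[Nothing]
  case object Empty extends In[Nothing]
  final case class El[E](value: E) extends In[E]

  // Simplified stand-in for Play's Iteratee states (Error is left out).
  sealed trait It[E]
  final case class Done[E]() extends It[E]
  final case class Cont[E](k: In[E] => It[E]) extends It[E]

  // Hands each element to f on the execution context and immediately
  // returns a Cont, so the next input never waits for f to finish.
  def asyncConsumer[E](f: E => Unit)(implicit ec: ExecutionContext): It[E] = {
    def step(in: In[E]): It[E] = in match {
      case EOF   => Done[E]()
      case Empty => Cont[E](step)
      case El(e) =>
        Future { f(e) }
        Cont[E](step)
    }
    Cont[E](step)
  }

  // Pushes one input into a consumer; a Done consumer ignores further input.
  def feed[E](it: It[E], in: In[E]): It[E] = it match {
    case Cont(k) => k(in)
    case done    => done
  }
}
```

Feeding El or Empty always gives back a Cont, and feeding EOF gives back Done, which then stays Done no matter what else is fed in.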
This way, the Iteratee can immediately process more input. This is especially appropriate when your data is arriving in quick bursts, so each input can be processed independently to minimize the time between arrival and the end of processing. Obviously, this breaks FIFO ordering (inputs may finish processing in a different order than they arrived) and any ability for input (besides EOF) to affect the state. For most Iteratee.foreach uses that I’ve seen, however, processing input asynchronously will greatly reduce average input processing time.
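The ordering caveat is easy to see in isolation. This sketch uses only the standard library; FifoDemo and the sleep-based "processing cost" are made up for the demonstration and stand in for whatever your real handler does.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import scala.collection.mutable.ListBuffer
import scala.concurrent.{ExecutionContext, Future}

object FifoDemo {
  // Processes every arrival on its own thread and returns the order
  // in which the inputs finished, which may differ from arrival order.
  def run(arrivals: List[Int]): List[Int] = {
    val pool = Executors.newFixedThreadPool(math.max(1, arrivals.size))
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val finished = ListBuffer.empty[Int]
    val allDone = new CountDownLatch(arrivals.size)

    arrivals.foreach { n =>
      Future {
        Thread.sleep(n * 100L) // hypothetical cost: larger inputs take longer
        finished.synchronized { finished += n }
        allDone.countDown()
      }
    }

    allDone.await(5, TimeUnit.SECONDS)
    pool.shutdown()
    finished.synchronized { finished.toList }
  }
}
```

Calling FifoDemo.run(List(3, 2, 1)) will typically hand back the cheap inputs first, even though they arrived last. If order matters for your use case, a single actor or a single-threaded executor keeps processing off the Iteratee while preserving arrival order, at the cost of concurrency.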