Apache Beam on Azure Databricks
Apache Beam is an open source, unified model for batch and streaming processing that runs on multiple execution engines, including Spark. It has powerful semantics that elegantly solve real-world challenges in both streaming and batch processing. It has also recently gained some Scala-based abstractions on top of it, which enable succinct and correct expression of windowing, triggering, out-of-order events, and more. It has also been chosen by some successful cloud-born companies that are challenged with vast amounts of data.
Andrej Medic commented
Jason Wolosonovich - I'm interested in your progress. Do you have a link to the Slack channel?
jason wolosonovich commented
I'm currently working on this with some of the Beam folks on the ASF Slack channel, if you'd like to join the conversation. So far I'm able to get the job server up and running using the PortableRunner; however, upon pipeline submission, I'm running into issues with the runners taking control of the Spark context.
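For anyone following along, a pipeline submission against a running job server might look roughly like the sketch below, using the Beam Python SDK. This is only an illustration under assumptions, not the exact setup from the comment above: the endpoint `localhost:8099`, the `LOOPBACK` environment type, and the helper names `portable_runner_args` and `run` are all chosen for the example. It does not address the Spark context conflict itself.

```python
from typing import List, Optional


def portable_runner_args(job_endpoint: str = "localhost:8099",
                         environment_type: str = "LOOPBACK") -> List[str]:
    """Build pipeline options for submitting to a Beam job server.

    The default endpoint and environment type are assumptions for
    illustration; adjust them to match your own job server.
    """
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        f"--environment_type={environment_type}",
    ]


def run(argv: Optional[List[str]] = None) -> None:
    """Submit a tiny pipeline to the job server (hypothetical helper)."""
    # Imported lazily so the option helper above works without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(argv if argv is not None else portable_runner_args())
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(["beam", "on", "spark"])
            | "Upper" >> beam.Map(str.upper)
            | "Print" >> beam.Map(print)
        )
```

Calling `run()` requires `apache-beam` to be installed and a job server listening at the given endpoint. `LOOPBACK` runs the SDK workers in the submitting process, which is mostly useful for local testing.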