e-journal
Large-Scale Estimation in Cyberphysical Systems Using Streaming Data: A Case Study With Arterial Traffic Estimation
Controlling and analyzing cyberphysical and robotics systems is increasingly becoming a Big Data challenge. We study the case of predicting drivers’ travel times in a large urban area from sparse GPS traces. We present a framework that can accommodate a wide variety of traffic distributions and distributes all the computations across a cluster to achieve low latency. Our framework is built on Discretized Streams, a recently proposed approach to stream processing at scale. We demonstrate the usefulness of Discretized Streams with a novel algorithm to estimate vehicular traffic in urban networks. Our online EM algorithm can estimate traffic on a very large city network (the San Francisco Bay Area) by processing tens of thousands of observations per second, with a latency of a few seconds.
Note to Practitioners: This work was driven by the need to estimate vehicular traffic at a large scale, in an online setting, using commodity hardware. Combining machine learning algorithms with streaming data is not new, but achieving large-scale computations in a tractable manner still requires deep expertise in both machine learning and computer systems. The Streaming Spark project aims to provide an interface that abstracts away the technical details of the computation platform (cloud, HPC, workstation, etc.). As shown in this work, Streaming Spark is suitable for implementing and calibrating nontrivial algorithms on a large cluster, and provides an intuitive yet powerful programming interface. Readers are invited to consult the source code referenced in this article for more examples.
This paper presents algorithms to sample and compute densities for Gamma random variables restricted to a hyperplane (i.e., the distribution of a vector of independent Gamma random variables conditioned on lying in a hyperplane). It is common in this case to use Gaussian random variables because they admit closed-form solutions. If one considers positive-valued distributions with heavy tails, our formulas using Gamma distributions may be more suitable.
Index Terms: Arterial traffic, arterial traffic estimation, expectation-maximization, large-scale estimation, streaming, Streaming Spark, travel times.
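To make the idea of Gamma random variables restricted to a hyperplane more concrete, the following is a minimal sketch of the simplest special case only, and not the algorithm of the paper. It assumes unit weights (the hyperplane is "the components sum to a fixed total") and a common scale parameter for all components; under those assumptions the conditional distribution is the fixed total multiplied by a Dirichlet vector, which can be sampled by normalizing independent Gamma draws. The function name `sample_gamma_given_sum` and the example values are hypothetical.

```python
import random


def sample_gamma_given_sum(shapes, total, rng=random):
    """Sample independent Gamma variables conditioned on their sum.

    Assumptions (illustrative special case, not the paper's general method):
    - every component has the same scale parameter, and
    - the hyperplane is sum(T_i) == total (all weights equal to one).
    Under these assumptions T / sum(T) follows a Dirichlet(shapes)
    distribution and is independent of the sum, so the conditional
    sample is simply total * Dirichlet(shapes).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not shapes or any(k <= 0 for k in shapes):
        raise ValueError("all shape parameters must be positive")
    # A Dirichlet draw is obtained by normalizing unit-scale Gamma draws.
    draws = [rng.gammavariate(k, 1.0) for k in shapes]
    normalizer = sum(draws)
    return [total * g / normalizer for g in draws]


if __name__ == "__main__":
    # Hypothetical example: split a 120-second trip across three road links.
    link_times = sample_gamma_given_sum([2.0, 3.5, 1.2], total=120.0)
    print(link_times, sum(link_times))
```

In a travel-time setting, this corresponds to allocating one observed trip duration across the links that were traversed. With unequal weights or unequal scales the normalization shortcut no longer applies, which is the harder case the abstract refers to.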