Tutorials
All tutorials will take place on July 1st in two parallel tracks.
Access Control in Publish/Subscribe Systems
Jean Bacon, David Eyers, Jatinder Singh - University of Cambridge, Computer Laboratory
Peter Pietzuch - Imperial College London, Department of Computing
Two compelling paradigms have emerged for achieving scalability in widely distributed systems: publish/subscribe communication between loosely coupled components, and role-based, policy-driven control of access to the system by applications.
A strength of pub/sub is its many-to-many communication paradigm: publishers need not know the recipients of their data, and subscribers need not know the number and location of publishers. But some data is sensitive, and its visibility must be controlled carefully for personal and legal reasons. Also, databases may subscribe to events to maintain a persistent record, alongside subscribers concerned with real-time response. We describe the requirements of several application domains where the event-based paradigm is appropriate yet security is an issue. Typical examples are the large-scale systems required by government and public bodies for domains such as healthcare, policing, transport and environmental monitoring.
We discuss how a pub/sub communication service can be secured: first, by specifying and enforcing access control policy at the service API, and second, by enforcing the security and privacy aspects of these policies within the service itself. We outline our investigations and findings from several research projects in these areas and highlight remaining challenges.
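As a concrete illustration of the first enforcement point, the sketch below shows role-based access control applied at a pub/sub service API. It is not the authors' system; the roles, topic names, broker class and deny-by-default policy table are illustrative assumptions.

```python
# Minimal sketch of role-based access control enforced at a pub/sub API.
# Roles, topics, and the Broker class are illustrative, not a real system.
from collections import defaultdict

# Policy: which roles may publish or subscribe to which topics.
POLICY = {
    ("doctor", "publish", "patient/vitals"): True,
    ("doctor", "subscribe", "patient/vitals"): True,
    ("researcher", "subscribe", "patient/vitals"): False,  # sensitive data
}

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def _allowed(self, role, action, topic):
        return POLICY.get((role, action, topic), False)  # deny by default

    def subscribe(self, role, topic, callback):
        if not self._allowed(role, "subscribe", topic):
            raise PermissionError(f"role '{role}' may not subscribe to '{topic}'")
        self.subscribers[topic].append(callback)

    def publish(self, role, topic, event):
        if not self._allowed(role, "publish", topic):
            raise PermissionError(f"role '{role}' may not publish to '{topic}'")
        for deliver in self.subscribers[topic]:
            deliver(event)

broker = Broker()
broker.subscribe("doctor", "patient/vitals", lambda e: print("delivered:", e))
broker.publish("doctor", "patient/vitals", {"pulse": 72})
```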
Jean Bacon, Professor of Distributed Systems, leads the Opera research group, whose focus is large-scale, multi-domain, secure distributed systems. Ongoing themes include event-based communication, role-based access control and policy-driven systems. She is PI on four grants in Cambridge in the area of the tutorial: EDSAC21, TIME-EACM, CareGrid, and Smart Flow (soon to start). See http://www.cl.cam.ac.uk/research/srg/opera/. She is a Fellow of the BCS and the IEEE, was founding Editor-in-Chief of IEEE Distributed Systems Online 2000-2007, and served on the IEEE Computer Society Board of Governors 2002-2007.
David Eyers is a post-doctoral research associate with interests in event-driven systems, distributed access control, networking, and policy representation and management. He has worked within the EDSAC21 and CareGrid grants, and has been involved with the Smart Flow grant since its inception. He is an experienced lecturer, and has led many small-group teaching sessions. He serves on a number of conference program committees.
Peter Pietzuch is a lecturer in the Distributed Software Engineering (DSE) group in the Department of Computing at Imperial College London. His work spans distributed systems, peer-to-peer computing, event-based systems, networking and databases. His main research interests are new abstractions and infrastructures for the design and implementation of adaptive Internet-scale applications, and addressing this area's unique data management challenges. Before joining Imperial, he held a post-doctoral research fellowship at Harvard University. He has co-authored a book on distributed event-based systems.
Jatinder Singh is a PhD student on the CareGrid project; his research concerns information control in healthcare environments. He has worked both in academia and in industry, and has many years' experience supervising students. His publications directly match the subject of this tutorial.
CEP: Functionality, Technology and Context
Dieter Gawlick, Shailendra Mishra - Oracle
During the last few years CEP has become the focus of managing events; consequently, CEP is undergoing rapid evolution. There are several ways to support CEP; arguably the most promising approach is CQL (Continuous Query Language), an extension of SQL. CQL provides support for collections of data called streams; streams are sets of time-ordered tuples. CQL allows users to continuously monitor a set of streams and to determine whether a new element has been added to a result set due to newly arrived tuples or to the passage of time. If one considers the tuples in streams as events, an element in the result set of a CQL query represents a complex event.
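To make this concrete, here is a minimal Python sketch, not Oracle CQL itself, of a continuous query over a time-ordered stream: a sliding time window whose new result-set elements can be read as complex events. The stream contents, window length and threshold are illustrative assumptions.

```python
# Illustrative sketch (not CQL itself): a continuous query over a stream of
# time-ordered tuples, roughly in the spirit of a CQL query with a
# [RANGE 60 SECONDS] window over a sensor stream.
from collections import deque

class SlidingAvg:
    """Keeps tuples from the last `range_s` seconds and reports when the
    windowed average crosses a threshold (a 'complex event')."""
    def __init__(self, range_s=60, threshold=30.0):
        self.range_s = range_s
        self.threshold = threshold
        self.window = deque()  # (timestamp, value) pairs, time-ordered

    def on_tuple(self, ts, value):
        self.window.append((ts, value))
        # Expire tuples that have fallen out of the time range.
        while self.window and self.window[0][0] <= ts - self.range_s:
            self.window.popleft()
        avg = sum(v for _, v in self.window) / len(self.window)
        if avg > self.threshold:
            return {"event": "HIGH_AVG_TEMP", "at": ts, "avg": avg}
        return None

q = SlidingAvg()
for ts, temp in [(0, 25.0), (30, 33.0), (60, 36.0)]:
    result = q.on_tuple(ts, temp)
    if result:
        print(result)  # a new element of the result set, i.e. a complex event
```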
CQL allows researchers, developers and users to leverage their knowledge of the very rich and well understood SQL technology. By accessing and manipulating streams as well as classical SQL data in a single statement, CQL integrates CEP fully into existing data processing technology. The CQL technology has matured and found its way into the products of start-up companies as well as into well-established companies and their application suites.
The tutorial provides an in-depth discussion of CQL: the data model, the language and the implementation. Additionally, the creation of original streams, the dissemination of streams, and the management of streams as part of large and highly flexible Event Processing Networks (EPNs) will be discussed. The material will be illustrated with several use cases.
Dieter Gawlick is an architect at Oracle. He architected the first messaging system fully integrated into a database and was a key contributor to Oracle's integration and sensor technologies. Dieter's current focus is leveraging and evolving database technologies to accelerate the evolution of event processing. Additionally, he works on database support for long-running transactions. Dieter was a key contributor to the development of high-end database and transaction systems.
Shailendra Mishra is Director of Development for Complex Event Processing at Oracle. He has worked in several areas of distributed databases, contributing heavily to the design, development and architecture of Oracle AQ and Data Streams. His current interests include event data streams, pattern recognition, approximate query processing, and data mining in event data streams.
Large-scale publish-subscribe systems: state of the art and research directions
Peter Triantafillou - University of Patras, Department of Computer Engineering and Informatics
Anne-Marie Kermarrec - INRIA Rennes, Bretagne Atlantique
Publish-subscribe systems are event-based systems receiving increasing attention in several disciplines of modern computer science, ranging from distributed computing, where they are viewed as an appealing alternative to traditional remote procedure call infrastructures, to information systems, where they are viewed as a natural computing paradigm that enables users to filter the enormous volumes of information produced nowadays and to receive only information that is truly relevant to them. In such systems, subscribers register their interest in specific (possibly data-carrying) events and are asynchronously notified of any matching event published by publishers. Subscribers and publishers are decoupled in space and time, and the system is in charge of mapping events to matching subscriptions and ensuring the delivery of events to users with matching subscriptions. Publish-subscribe systems differ in the degree of expressiveness they offer to subscribers: subscribers may subscribe to events by indicating a name, a set of keywords, or a set of tuples of attribute-value pairs.
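The following minimal sketch, with illustrative attribute names and predicates, shows the most expressive of these subscription flavours: content-based matching of an event, published as attribute-value pairs, against a subscription expressed as predicates over attributes.

```python
# Sketch of content-based subscription matching over attribute-value pairs.
# Attribute names, values, and predicates are illustrative.

def matches(subscription, event):
    """Content-based match: every predicate in the subscription must hold."""
    return all(pred(event.get(attr)) for attr, pred in subscription.items())

# An event published as a set of attribute-value pairs.
event = {"type": "trade", "symbol": "ACME", "price": 101.5}

# A subscription expressed as predicates over attributes.
subscription = {
    "type": lambda v: v == "trade",
    "symbol": lambda v: v == "ACME",
    "price": lambda v: v is not None and v > 100,
}

print(matches(subscription, event))  # True: the subscriber is notified
```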
The publish-subscribe paradigm is inherently scalable, yet the key to fully realizing and exploiting its scalability lies in a (fully) decentralized (also called peer-to-peer) architecture, in contrast to server-based architectures. Several classes of distributed publish-subscribe architectures have been proposed in the research literature, ranging from unstructured (typically gossip-based) to fully structured (a.k.a. DHT-based). The differences typically reflect variants in functionality with respect to the expressiveness of the subscriptions that can be supported, as well as the efficiency of the underlying dissemination and maintenance schemes.
The aim of this tutorial, in a nutshell, is for participants to take home an understanding of the overall solution design space, the positions occupied in this space by the most well-known works in the area, and the related solution trade-offs.
Peter Triantafillou is a Professor in the Department of Computer Engineering and Informatics at the University of Patras, where he is Director of the Software Division and of the Network-Centric Information Systems laboratory. Peter received his Ph.D. from the University of Waterloo in 1991. He has also held professorial positions at Simon Fraser University and at the Technical University of Crete. Peter was on sabbatical leave at the Max Planck Institute for Informatics in 2004-2005.
Anne-Marie Kermarrec has been a senior researcher with INRIA Rennes, France since 2004, where she leads the ASAP (As Scalable As Possible) research group focusing on large-scale dynamic distributed systems. Her main current research areas are peer-to-peer overlays, search in large-scale distributed systems, and gossip-based computing. Before joining INRIA, she was a researcher with Microsoft Research in Cambridge (2000-2004) and with Vrije Universiteit in Amsterdam (1996-97).
Interest clustering techniques for efficient event routing in large-scale settings
Leonardo Querzoni - Sapienza University of Rome, Department of Computer and Systems Sciences
The adoption of middleware platforms providing publish/subscribe-based communication primitives is today restricted to specific settings characterized by a limited set of dedicated machines, usually managed by a central administrator. The birth of new applications that run continuously on a huge number of non-dedicated nodes (system monitoring for large enterprises, Internet-based collaborative applications, etc.) has raised the need for a new generation of publish/subscribe systems. These systems must be able to cope with a very large population of heterogeneous nodes, with unreliable shared network channels, and with a dynamic environment whose characteristics can change over time.
One of the most important problems these systems must deal with is how to efficiently route each event to the set of intended recipients, despite the fact that this set can be very small (one subscriber) or very large (up to tens of thousands of nodes) and its composition can change over time. A technique widely adopted in modern event routing mechanisms to improve efficiency is interest clustering: nodes sharing similar interests have a good probability of receiving similar events, so it makes sense to cluster them as much as possible in the system in order to reduce the number of distinct paths that an event must travel to reach them.
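One common ingredient of such schemes, shown in the sketch below with illustrative topic sets and node names, is a similarity measure over subscriptions (here Jaccard similarity) used to link each node to the peers whose interests overlap most.

```python
# Sketch of one common ingredient of interest clustering: measuring how
# similar two nodes' subscriptions are (Jaccard similarity over subscribed
# topics) and linking each node to its most similar peers.
# The node names and topic sets below are illustrative.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

subscriptions = {
    "n1": {"sports", "weather"},
    "n2": {"sports", "finance"},
    "n3": {"weather", "traffic"},
    "n4": {"finance", "stocks"},
}

def closest_peers(node, k=1):
    others = [(peer, jaccard(subscriptions[node], topics))
              for peer, topics in subscriptions.items() if peer != node]
    return sorted(others, key=lambda p: p[1], reverse=True)[:k]

for node in subscriptions:
    print(node, "->", closest_peers(node))
# Nodes with overlapping interests end up adjacent, so an event matching a
# popular interest traverses fewer distinct paths.
```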
With this tutorial we want to illustrate how interest clustering is adopted and implemented in various publish/subscribe systems to improve the performance of their event routing mechanisms.
Leonardo Querzoni is a post-doc researcher in Computer Engineering at Sapienza Università di Roma. He obtained his PhD from the same university in 2007. His research interests are in the area of distributed systems, peer-to-peer applications, event-driven architectures and mobile and sensor networks. His work is currently focussed on problems related to efficient event dissemination for large-scale distributed publish/subscribe systems. He is active in various industrial research projects as well as projects funded by the EC, and regularly serves as a reviewer for various conferences and journals in the distributed systems area.
Events and Streams: Harnessing and Unleashing Their Synergy
Sharma Chakravarthy - University of Texas at Arlington, Computer Science and Engineering Department
Although research on event processing started a long time ago, there is renewed impetus behind complex event processing. Separately, stream processing has been researched more recently, and a number of techniques and systems have been developed. However, the differences and similarities between the two have not been well understood. For example, the QoS requirements that are extremely important for stream applications have not been addressed in the context of events. In addition, the two are synergistic: their combination can be greater than the sum of its parts, with tremendous impact on a large class of sensor-based applications.
In this tutorial, we first discuss the underpinnings of complex event processing and stream processing, and their similarities and differences. We will discuss the primary techniques that have been developed for the two topics. We will then discuss alternatives for integrating the two, as most applications will require both stream and event processing. Finally, we will discuss a few commercial and prototype systems to understand what features have been incorporated into them. We will highlight the research issues in both areas, which of them have been addressed, and which remain open.
We will cover the following topics in detail during the tutorial: complex event processing fundamentals, stream processing fundamentals, similarities and differences between the two, why their integration is needed, how the two can be integrated synergistically, and how different commercial and research systems have tackled this problem. The implementation of an event/stream processing system will also be covered, using the MavEStream implementation at UTA as a running example. Some of this work can be found at http://itlab.uta.edu/sharma under publications.
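As a flavour of this integration (a minimal sketch only, not MavEStream itself; the operators, data and thresholds are illustrative), the code below feeds the output of a windowed stream operator into a simple detector that raises a complex event.

```python
# Minimal sketch of stream/event integration: a stream operator computes
# windowed aggregates, and its output tuples feed an event detector that
# raises a complex event when a pattern occurs. All values are illustrative.

def windowed_average(stream, size=3):
    """Stream stage: tumbling-window average over a stream of readings."""
    window = []
    for reading in stream:
        window.append(reading)
        if len(window) == size:
            yield sum(window) / size
            window.clear()

def detect_rising(averages, threshold=2.0):
    """Event stage: raise a complex event when consecutive averages jump."""
    prev = None
    for avg in averages:
        if prev is not None and avg - prev > threshold:
            yield {"event": "SUDDEN_RISE", "from": prev, "to": avg}
        prev = avg

readings = [1, 1, 1, 2, 5, 6, 6, 7, 8]
for e in detect_rising(windowed_average(readings)):
    print(e)
```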
Sharma Chakravarthy is a Professor in the Computer Science and Engineering Department at The University of Texas at Arlington. He established the Information Technology Laboratory at UT Arlington in January 2000 and currently heads it. He also established the NSF-funded Distributed and Parallel Computing Cluster (DPCC@UTA) at UT Arlington in 2003. He is the recipient of the college-level "Excellence in Research" award in 2006, the university-level "Creative Outstanding Researcher" award in 2003, and the department-level senior outstanding researcher award in 2002.
The OMG Data Distribution Service for Real-Time Systems
Angelo Corsaro - PrismTech
The Data Distribution Service for Real-Time Systems (DDS) is a relatively new standard, adopted by the Object Management Group in 2004. DDS has experienced extremely swift adoption due to (1) its ability to address the challenging requirements of real-time, high-throughput data distribution in business- and mission-critical systems, such as financial services, air-traffic management and control, and next-generation military land and naval combat systems, and (2) its support for complex QoS along with data-centric features typical of Complex Event Processing platforms, such as relational information modeling, continuous queries, topic projection, and windows. This tutorial will provide an in-depth overview of DDS, explaining its unique data-centric approach as well as its extremely rich QoS support, which controls every aspect of data distribution, timeliness, and availability. The tutorial will conclude by presenting some of the most important DDS design patterns and providing an outlook on upcoming extensions to the standard and on research challenges.
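To hint at the data-centric model without relying on any real DDS binding, the sketch below is an illustrative assumption only: it mimics a per-instance HISTORY-like QoS (keep the last N samples per key) and a content-filtered reader on a topic; the class, topic and field names are made up for the example.

```python
# Illustrative sketch only, not a real DDS API: it mimics two ideas named
# above, a per-instance HISTORY-like QoS (keep the last N samples per key)
# and a content-filtered subscription over a topic.
from collections import defaultdict, deque

class Topic:
    def __init__(self, name, history_depth=2):
        self.name = name
        # HISTORY-like QoS: retain the last `history_depth` samples per key.
        self.samples = defaultdict(lambda: deque(maxlen=history_depth))
        self.readers = []  # (content_filter, callback) pairs

    def write(self, key, sample):
        self.samples[key].append(sample)
        for flt, callback in self.readers:
            if flt(sample):
                callback(key, sample)

    def create_reader(self, content_filter, callback):
        self.readers.append((content_filter, callback))

track = Topic("FlightTrack", history_depth=2)
track.create_reader(lambda s: s["altitude"] < 1000,
                    lambda k, s: print("low-altitude alert:", k, s))
track.write("AF447", {"altitude": 35000})
track.write("AF447", {"altitude": 900})   # triggers the filtered reader
print(len(track.samples["AF447"]))        # 2: history keeps the last two samples
```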
Dr. Angelo Corsaro is currently affiliated with PrismTech, where he leads the OpenSplice DDS product, addressing technology positioning, planning, evolution, and evangelism, as well as extending the technology's adoption to new application domains and verticals. He is also responsible for strategic standardization at the Object Management Group (OMG), where he is co-chair of the Data Distribution Service (DDS) Special Interest Group and of the Real-Time Embedded and Specialized Services Task Force. Angelo received a Ph.D. and an M.S. in Computer Science from Washington University in St. Louis, and a Laurea Magna cum Laude in Computer Engineering from the University of Catania, Italy.
Event Processing - Architecture and patterns
Opher Etzion - IBM
The tutorial is intended for a technical audience interested in a deep dive into the architectural side of event processing networks and the semantic side of complex event processing. It is self-contained and thus does not assume an event processing background; however, the deep-dive aspects will also benefit event processing experts.
Session I - Event Processing Architectures
- Bird's eye view of event processing architecture.
- The event producer perspective
- The event consumer perspective
- Event Processing Network - concepts and facilities
- Layering, stratification and hierarchy.
- Parallelism and partition
- Additional viewpoints
- Database centric viewpoint
- BPM viewpoint
- Sense and response viewpoint
- EPN and semantic networks.
- Enabler for new concepts:
- Extreme Transaction Processing (XTP).
- Context-Driven Architecture (CODA)
- Ubiquitous processing.
- Future trends.
Session II - Complex Event Patterns
- The notion of pattern and pattern language - Top-Down patterns and bottom-up patterns, business-oriented patterns vs. technical-oriented patterns.
- Use case introduction
- Information needed about the pattern to support unambiguous interpretation - making the design decisions explicit through policies
- Roots for policies in active databases - e.g. consumption modes.
- Pattern types
- Predicates
- Absent patterns (see the sketch after this outline)
- Observation oriented patterns
- Diagnostics patterns
- Information Dissemination patterns
- Retrospective patterns
- Some advanced patterns - converse events, not-selected events, probabilistic patterns.
- Back to top down patterns
- Conclusion
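As a taste of the pattern types listed in Session II, the sketch below, using illustrative event names and a made-up time window, detects an "absent" pattern: a derived event is raised when no ACK follows a REQUEST within the window.

```python
# Minimal sketch, with illustrative event names, of an "absent" pattern:
# derive an event when no ACK follows a REQUEST within a time window.

WINDOW = 10  # seconds

def detect_missing_ack(events, window=WINDOW):
    """events: time-ordered (timestamp, kind, request_id) tuples."""
    pending = {}  # request_id -> request timestamp
    derived = []
    for ts, kind, rid in events:
        # Any pending request whose window has expired yields a derived event.
        for req_id, req_ts in list(pending.items()):
            if ts - req_ts > window:
                derived.append(("MISSING_ACK", req_id, req_ts))
                del pending[req_id]
        if kind == "REQUEST":
            pending[rid] = ts
        elif kind == "ACK":
            pending.pop(rid, None)
    return derived

events = [(0, "REQUEST", "r1"), (2, "REQUEST", "r2"),
          (5, "ACK", "r1"), (20, "REQUEST", "r3")]
print(detect_missing_ack(events))  # [('MISSING_ACK', 'r2', 2)]
```

This sketch only re-evaluates pending requests when a new event arrives; a production engine would also rely on timers and on policies such as the consumption modes mentioned above.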
Opher Etzion is an IBM Senior Technical Staff Member and the Event Processing Scientific Leader at the IBM Haifa Research Lab. Previously he was lead architect of event processing technology in IBM WebSphere, and a Senior Manager in the IBM Research division, where he managed a department that carried out one of the pioneering projects that shaped the area of complex event processing. He is the chair of EPTS (the Event Processing Technical Society). In parallel, he is an adjunct professor at the Technion - Israel Institute of Technology.