Data Stream Processing and Analytics, Spring 2020
This project is maintained by vasia
Welcome to CS 591 K1: Data Stream Processing and Analytics - Spring 2020.
This course is generously supported by a Google Cloud Platform Education Grant.
Modern data-driven applications require continuous, low-latency processing of large-scale, fast-arriving data events, such as videos, images, emails, chats, clicks, search queries, financial transactions, traffic records, and sensor measurements. Extracting knowledge from these data streams is particularly challenging due to their high speed and massive volume.
Distributed stream processing has recently become highly popular across industry and academia because it can both improve established data processing tasks and enable novel applications with real-time requirements.
In this course, we will study the design and architecture of modern distributed streaming systems as well as fundamental algorithms for analyzing data streams.
Specifically, we will cover the following topics:
The course consists of lectures, exercises, and a final semester project. There is no formal examination at the end of the course. Instead, student grades will be based on:
Every other week, we will start the lecture with a 10-minute in-class quiz, which will be used to assess your understanding of the material covered in the previous week(s). These quizzes will be of low difficulty and should require no preparation beyond paying attention in class and studying the lecture notes and related reading assignments. There will be a total of 5 quizzes, which will collectively contribute 10% towards your final grade.
There will be three (3) assignments during the semester, with availability and due dates as shown below:
Assignment | Available | Due | Grade Contribution |
---|---|---|---|
1 | 1/30 | 2/12 | 10% |
2 | 2/13 | 3/2 | 10% |
3 | 3/3 | 3/23 | 20% |
Each of the first two assignments contributes 10% towards your final grade, while the third assignment contributes 20%. All assignments are due by 11:59pm at the latest on the day of the respective deadline.
For the final project, you will use Apache Flink and Kafka to build a real-time monitoring and anomaly detection framework for datacenters, similar to what is described in the SAQL paper.
The project contributes 50% towards your final grade and its deliverables consist of:
The project description will be announced on 3/17 and its deliverables will be due on 4/30, 11:59pm.
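Taken together, the grading components add up to 100%: quizzes 10%, assignments 10% + 10% + 20%, and the final project 50%. As a hypothetical illustration only (not an official grading tool), the Python sketch below computes a weighted final grade. It assumes every score is on a 0-100 scale and that the five quizzes count equally (2% each); the function name `final_grade` is made up for this example.

```python
# Hypothetical sketch: the weights come from the syllabus, while the 0-100
# score scale and equal weighting of the five quizzes are assumptions.
WEIGHTS = {
    "quizzes": 0.10,       # 5 quizzes, 10% in total
    "assignment_1": 0.10,
    "assignment_2": 0.10,
    "assignment_3": 0.20,
    "project": 0.50,
}

# Sanity check: the weights cover the whole grade (tolerance for float rounding).
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def final_grade(quiz_scores, a1, a2, a3, project):
    """Return the weighted final grade on a 0-100 scale.

    Assumes all scores are on a 0-100 scale and that the five quizzes
    are averaged with equal weight.
    """
    if len(quiz_scores) != 5:
        raise ValueError("expected scores for exactly 5 quizzes")
    quiz_avg = sum(quiz_scores) / len(quiz_scores)
    components = {
        "quizzes": quiz_avg,
        "assignment_1": a1,
        "assignment_2": a2,
        "assignment_3": a3,
        "project": project,
    }
    # Weighted sum of all components.
    return sum(WEIGHTS[name] * score for name, score in components.items())
```

For example, `final_grade([80, 90, 100, 70, 60], 85, 90, 75, 88)` averages the quizzes to 80 and gives 8 + 8.5 + 9 + 15 + 44 = 84.5 (up to floating-point rounding).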