DSPA '20

Data Stream Processing and Analytics, Spring 2020

This project is maintained by vasia

Welcome to CS 591 K1: Data Stream Processing and Analytics - Spring 2020.


Course information

This course is generously supported by a Google Cloud Platform Education Grant.

Overview

Modern data-driven applications require continuous, low-latency processing of large-scale, rapid data events such as videos, images, emails, chats, clicks, search queries, financial transactions, traffic records, and sensor measurements. Extracting knowledge from these data streams is particularly challenging due to their high speed and massive volume.

Distributed stream processing has recently become highly popular across industry and academia because it can both improve established data processing tasks and facilitate novel applications with real-time requirements.

In this course, we will study the design and architecture of modern distributed streaming systems as well as fundamental algorithms for analyzing data streams.

Specifically, we will cover the following topics:

Grading scheme

The course consists of lectures, exercises, and a final semester project. There is no formal examination at the end of the course. Instead, student grades will be based on:

In-class quizzes

Every other week, we will start the lecture with a 10-minute in-class quiz to assess your understanding of the material covered in the previous week(s). These quizzes will be of low difficulty and should require no preparation other than paying attention in class and studying the lecture notes and related reading assignments. There will be five quizzes in total, which will collectively contribute 10% towards your final grade.

Hands-on assignments

There will be three (3) assignments during the semester, with availability and due dates as shown below:

Assignment   Available   Due    Grade contribution
1            1/30        2/12   10%
2            2/13        3/2    10%
3            3/3         3/23   20%

Each of the first two assignments contributes 10% towards your final grade, while the third assignment contributes 20%. All assignments are due by 11:59pm on the day of the respective deadline.

Final project

For the final project, you will use Apache Flink and Kafka to build a real-time monitoring and anomaly detection framework for datacenters, similar to what is described in the SAQL paper.

The project contributes 50% towards your final grade and its deliverables consist of:

The project description will be announced on 3/17, and its deliverables will be due on 4/30 at 11:59pm.
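To see how the grade components fit together, here is a minimal Python sketch of the weighting. The weights come from the grading scheme on this page (quizzes 10%, assignments 10% + 10% + 20%, project 50%). This page does not state how the components are combined, so the linear weighted sum, the 0-100 scale for each component, the function name `final_grade`, and the example scores are all assumptions for illustration.

```python
# Weights taken from the grading scheme on this page (fractions of the final grade).
WEIGHTS = {
    "quizzes": 0.10,
    "assignment_1": 0.10,
    "assignment_2": 0.10,
    "assignment_3": 0.20,
    "project": 0.50,
}


def final_grade(scores):
    """Combine per-component scores into one weighted grade.

    Assumption: every component is scored on a 0-100 scale and the
    components are combined as a plain weighted sum.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "quizzes": 90,
    "assignment_1": 80,
    "assignment_2": 100,
    "assignment_3": 70,
    "project": 85,
}
print(round(final_grade(example), 2))
```

With these hypothetical scores the components contribute 9 + 8 + 10 + 14 + 42.5 points. Because the project carries half of the total weight, it contributes more than all assignments and quizzes combined.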