Data Stream Processing and Analytics, Spring 2020
This project is maintained by vasia
*Note: This schedule is tentative.*
Make sure to become familiar with the Official Semester Dates.
Some of the critical Semester Dates are:
| Date | Topic | Slides | Note |
|---|---|---|---|
| 01/21 | Course introduction | dspa20-1.pdf | Optional reading: The 8 Requirements of Real-Time Stream Processing; Streaming 101 |
| 01/23 | Stream processing fundamentals | dspa20-2.pdf | Optional videos: The Evolution of (Open Source) Data Processing by Aljoscha Krettek; The Evolution of Massive Scale Data Processing by Tyler Akidau |
| 01/28 | Stream ingestion and pub/sub systems | dspa20-3.pdf | Follow the Flink setup tutorial |
| 01/30 | Introduction to Apache Flink and Apache Kafka | dspa20-4.pdf | Assignment #1 available |
| 02/04 | Streaming languages and operator semantics | dspa20-5.pdf | Quiz #1 |
| 02/06 | Notions of time and progress | dspa20-6.pdf | Optional reading: Streaming 102: The world beyond batch; Watermarks, Tables, Event Time, and the Dataflow Model |
| 02/11 | Windows and triggers | dspa20-7.pdf | |
| 02/12 | Assignment #1 due | ||
| 02/13 | Assignment #1 discussion and feedback | | Assignment #2 available |
| 02/18 | No class | | Substitute Monday |
| 02/20 | Guest Lecture: Learning How to Build Event Streaming Applications with Pac-Man | | Ricardo Ferreira, Developer Advocate at Confluent |
| 02/25 | State management | dspa20-8.pdf | Quiz #2 |
| 02/27 | No class | | Videos to watch: Managing State in Apache Flink by Tzu-Li (Gordon) Tai; Webinar: Deep Dive on Apache Flink State by Seth Wiesman |
| 03/02 | Assignment #2 due | ||
| 03/03 | Guest Lecture: Streaming in the Real-World: Cyber security event correlation and triage | | Carolyn Duby, Solutions Engineer and Lead Cybersecurity SME at Cloudera; Assignment #3 available |
| 03/05 | Assignment #2 feedback | ||
| 03/07-15 | Spring break | ||
| 03/17 | High-availability, recovery semantics, and guarantees | dspa20-9.pdf | Final project available |
| 03/19 | Guest Lecture: From data swamp to insight clarity: using Flink at scale at NetApp | | Francisco Rosa & Paul Freeman, NetApp |
| 03/23 | Assignment #3 due | ||
| 03/24 | Exactly-once fault-tolerance in Apache Flink | dspa20-10.pdf | Optional reading: An example run of the Chandy-Lamport snapshot algorithm |
| 03/26 | Exactly-once fault-tolerance in Apache Flink (cont.) | dspa20-11.pdf | Incremental checkpoints; Unaligned checkpoints |
| 03/31 | Fault-tolerance demo & reconfiguration | dspa20-12.pdf | Quiz #3 (Take-home) |
| 04/02 | Elasticity and state migration: Part I | dspa20-13.pdf | Video to watch: Dhalion: towards self-regulating stream processing |
| 04/07 | Elasticity and state migration: Part II | ||
| 04/09 | Flow control and load shedding | dspa20-14.pdf | Video to watch: Improving throughput and latency with Flink's network stack |
| 04/14 | Streaming optimizations | dspa20-15.pdf | Quiz #4 (Take-home) |
| 04/16 | Skew mitigation | dspa20-16.pdf | Video to watch: Efficient Window Aggregation with Stream Slicing |
| 04/21 | Filtering and sampling streams | dspa20-17.pdf | |
| 04/23 | Cardinality and frequency estimation | dspa20-18.pdf | |
| 04/28 | Graph streaming algorithms | dspa20-19.pdf | Quiz #5 (Take-home) |
| 04/30 | Course recap | | Final project due |