Data Stream Processing and Analytics, Spring 2020
This project is maintained by vasia
Note: This schedule is tentative.
Make sure to become familiar with the Official Semester Dates.
Some of the critical Semester Dates are:
Date | Topic | Slides | Note |
---|---|---|---|
01/21 | Course introduction | dspa20-1.pdf | Optional reading: The 8 Requirements of Real-Time Stream Processing; Streaming 101 |
01/23 | Stream processing fundamentals | dspa20-2.pdf | Optional videos: The Evolution of (Open Source) Data Processing by Aljoscha Krettek; The Evolution of Massive Scale Data Processing by Tyler Akidau |
01/28 | Stream ingestion and pub/sub systems | dspa20-3.pdf | Follow the Flink setup tutorial |
01/30 | Introduction to Apache Flink and Apache Kafka | dspa20-4.pdf | Assignment #1 available |
02/04 | Streaming languages and operator semantics | dspa20-5.pdf | Quiz #1 |
02/06 | Notions of time and progress | dspa20-6.pdf | Optional reading: Streaming 102: The world beyond batch; Watermarks, Tables, Event Time, and the Dataflow Model |
02/11 | Windows and triggers | dspa20-7.pdf | |
02/12 | Assignment #1 due | ||
02/13 | Assignment #1 discussion and feedback | | Assignment #2 available |
02/18 | No class | Substitute Monday | |
02/20 | Guest Lecture: Learning How to Build Event Streaming Applications with Pac-Man | Ricardo Ferreira, Developer Advocate at Confluent | |
02/25 | State management | dspa20-8.pdf | Quiz #2 |
02/27 | No class | Videos to watch: Managing State in Apache Flink - Tzu-Li (Gordon) Tai and Webinar: Deep Dive on Apache Flink State - Seth Wiesman | |
03/02 | Assignment #2 due | ||
03/03 | Guest Lecture: Streaming in the Real-World: Cyber security event correlation and triage | Carolyn Duby, Solutions engineer and lead Cybersecurity SME at Cloudera | Assignment #3 available |
03/05 | Assignment #2 feedback | ||
03/07-15 | Spring break | ||
03/17 | High-availability, recovery semantics, and guarantees | dspa20-9.pdf | Final project available |
03/19 | Guest Lecture: From data swamp to insight clarity: using Flink at scale at NetApp | Francisco Rosa & Paul Freeman, NetApp | |
03/23 | Assignment #3 due | ||
03/24 | Exactly-once fault-tolerance in Apache Flink | dspa20-10.pdf | Optional reading: An example run of the Chandy-Lamport snapshot algorithm |
03/26 | Exactly-once fault-tolerance in Apache Flink (cont.) | dspa20-11.pdf | Incremental checkpoints; Unaligned checkpoints |
03/31 | Fault-tolerance demo & reconfiguration | dspa20-12.pdf | Quiz #3 (Take-home) |
04/02 | Elasticity and state migration: Part I | dspa20-13.pdf | Video to watch: Dhalion: towards self-regulating stream processing |
04/07 | Elasticity and state migration: Part II | ||
04/09 | Flow control and load shedding | dspa20-14.pdf | Video to watch: Improving throughput and latency with Flink's network stack |
04/14 | Streaming optimizations | dspa20-15.pdf | Quiz #4 (Take-home) |
04/16 | Skew mitigation | dspa20-16.pdf | Video to watch: Efficient Window Aggregation with Stream Slicing |
04/21 | Filtering and sampling streams | dspa20-17.pdf | |
04/23 | Cardinality and frequency estimation | dspa20-18.pdf | |
04/28 | Graph streaming algorithms | dspa20-19.pdf | Quiz #5 (Take-home) |
04/30 | Course recap | | Final project due |