DSPA '20

Data Stream Processing and Analytics, Spring 2020

This project is maintained by vasia


*Note: This schedule is tentative.

Special Dates

Make sure to become familiar with the Official Semester Dates.
Some of the critical Semester Dates are:

Tentative lecture schedule

| Date | Topic | Slides | Note |
|---|---|---|---|
| 01/21 | Course introduction | dspa20-1.pdf | Optional reading: The 8 Requirements of Real-Time Stream Processing; Streaming 101 |
| 01/23 | Stream processing fundamentals | dspa20-2.pdf | Optional videos: The Evolution of (Open Source) Data Processing by Aljoscha Krettek; The Evolution of Massive Scale Data Processing by Tyler Akidau |
| 01/28 | Stream ingestion and pub/sub systems | dspa20-3.pdf | Follow the Flink setup tutorial |
| 01/30 | Introduction to Apache Flink and Apache Kafka | dspa20-4.pdf | Assignment #1 available |
| 02/04 | Streaming languages and operator semantics | dspa20-5.pdf | Quiz #1 |
| 02/06 | Notions of time and progress | dspa20-6.pdf | Optional reading: Streaming 102: The world beyond batch; Watermarks, Tables, Event Time, and the Dataflow Model |
| 02/11 | Windows and triggers | dspa20-7.pdf | |
| 02/12 | | | Assignment #1 due |
| 02/13 | Assignment #1 discussion and feedback | | Assignment #2 available |
| 02/18 | No class | | Substitute Monday |
| 02/20 | Guest Lecture: Learning How to Build Event Streaming Applications with Pac-Man | | Ricardo Ferreira, Developer Advocate at Confluent |
| 02/25 | State management | dspa20-8.pdf | Quiz #2 |
| 02/27 | No class | | Videos to watch: Managing State in Apache Flink - Tzu-Li (Gordon) Tai and Webinar: Deep Dive on Apache Flink State - Seth Wiesman |
| 03/02 | | | Assignment #2 due |
| 03/03 | Guest Lecture: Streaming in the Real-World: Cyber security event correlation and triage | | Carolyn Duby, Solutions engineer and lead Cybersecurity SME at Cloudera; Assignment #3 available |
| 03/05 | Assignment #2 feedback | | |
| 03/07-15 | Spring break | | |
| 03/17 | High-availability, recovery semantics, and guarantees | dspa20-9.pdf | Final project available |
| 03/19 | Guest Lecture: From data swamp to insight clarity: using Flink at scale at NetApp | | Francisco Rosa & Paul Freeman, NetApp |
| 03/23 | | | Assignment #3 due |
| 03/24 | Exactly-once fault-tolerance in Apache Flink | dspa20-10.pdf | Optional reading: An example run of the Chandy-Lamport snapshot algorithm |
| 03/26 | Exactly-once fault-tolerance in Apache Flink (cont.) | dspa20-11.pdf | Incremental checkpoints; Unaligned checkpoints |
| 03/31 | Fault-tolerance demo & reconfiguration | dspa20-12.pdf | Quiz #3 (Take-home) |
| 04/02 | Elasticity and state migration: Part I | dspa20-13.pdf | Video to watch: Dhalion: towards self-regulating stream processing |
| 04/07 | Elasticity and state migration: Part II | | |
| 04/09 | Flow control and load shedding | dspa20-14.pdf | Video to watch: Improving throughput and latency with Flink's network stack |
| 04/14 | Streaming optimizations | dspa20-15.pdf | Quiz #4 (Take-home) |
| 04/16 | Skew mitigation | dspa20-16.pdf | Video to watch: Efficient Window Aggregation with Stream Slicing |
| 04/21 | Filtering and sampling streams | dspa20-17.pdf | |
| 04/23 | Cardinality and frequency estimation | dspa20-18.pdf | |
| 04/28 | Graph streaming algorithms | dspa20-19.pdf | Quiz #5 (Take-home) |
| 04/30 | Course recap | | Final project due |