Reactive Fast Data & the Data Lake

Track: Data Science and Machine Learning
Skill Level: Beginner
Room: Room A403
Time Slot: Thu 2/23, 2:30 PM
Tags: streaming, data, akka, kafka, reactive, lightbend, big data, spark
Abstract

The first act of this talk will begin at a high level, discussing reactive architecture tenets, distributed “fast data” streams, and an application- and analytics-focused Data Lake.  Enterprise-level concerns, including the importance of holistic governance, operational management, and a Metadata Lake, will be investigated conceptually.  The next level of detail will explore what a prospective architecture looks like at scale with terabytes of ingestion per day, how scale puts pressure on an architecture, and how to succeed without losing data in a mission-critical system by using resilient, self-healing, scalable technologies.  DevOps and application architecture concerns will be first-class themes throughout.
 
Reactive principles and technology will be the second act of this talk.  Kafka.  Akka.  Spark.  Various streaming technologies (Kafka Streams, Akka Streams, Spark Streaming) will be reviewed to identify what each is best suited for.  The fast data pipeline discussion will center on Kafka, Akka, and Apache Flink (Lightbend Fast Data platform).  We’ll also walk through an exciting addition to the Akka family, Alpakka, which is comparable to Apache Camel for Enterprise Integration Patterns.
 
The final act will dive into the Data Lake from both an analytics and an application development perspective.  Technologies used to explain concepts will include Amazon and Hadoop.  A Data Lake may serve multiple analytics consumers with various “views” (and access levels) of data.  It may also participate in various applications, perhaps by acting as a centralized source for reference data or common middleware (in turn feeding the analytics aspect).  The concept of the Metadata Lake, which applies structure, meaning and purpose, will be an overarching success factor for a Data Lake.  The difference between the Data Lake and the Metadata Lake is conceptually similar to a halocline…  Various technologies (Iglu/Snowplow and more) will be discussed from a feature standpoint to flesh out the technology capabilities needed for Data Lake governance.
 
Expect an extremely dense, content-rich experience, akin to my previous presentations about Docker, microservices/APIs and development methodology.

Todd Fritz

Todd leads software development teams to design and implement innovative, enterprise-class, secure, distributed, mission-critical systems via effective alignment of technology (and development process) to business needs. He has led numerous efforts to design and implement scalable, maintainable and extensible platforms and products, many of which are still in production. He regularly learns new technology to stay current and to understand it from the bottom up, which is invaluable for formulating enterprise strategy in addition to tactical planning. He is a strong proponent of reactive, streaming architectures that leverage technologies such as Akka, Kafka and Scala.

Fritz leads by example and is a proponent of “servant leadership” and agile development methods (Scrum, Kanban). He abhors the travesty of waterfall and micromanagement, and is an advocate of following sound SDLC process to provide long-term business value. His passion for programming and software design began on an Apple ][+ in 1980 at age 11, progressed to assembly language by age 13, and has continued 35 years later across numerous technology innovations and trends. How fortunate to have a childhood hobby grow into a rewarding, lifelong career!

His experience includes more than 18 years in significant roles of responsibility, and he possesses a diverse skill set that spans from the back end to the front end, with emphasis on high-volume, message-oriented middleware and back-end solutions (e.g. big data).

He is sometimes an evangelist of a technology before it becomes “cool” (microservices, EDA/CEP, Docker, reactive programming and more). He is a versatile, personable, passionate, hands-on contributor and a team player with a proven ability to relate to different cultures, personalities and generations.

Previous topics he has presented on include Docker and microservice-based architectures for decomposing legacy monoliths.