Batch Processing: Module 5 of the Data Engineering Zoomcamp 2024

ãndré

I just completed Module 5 of the Data Engineering Zoomcamp, an online course organized by DataTalksClub on GitHub.

This module is about batch processing, and it covers the following topics:

Introduction to batch processing
First look at Spark
Working with Group By and Joins in Spark
Spark's DataFrames, Actions, Transformations, and UDFs
Exploring the PySpark API

Big thanks to the DataTalksClub team, especially the instructor, Alexey Grigorev!

#dezoomcamp #dataengineering #learn_in_public 

