Gain real-world experience on Databricks as a Data Engineer

Kick off your Data Engineer career on Databricks

Are you looking for a sneak peek into what the job of a Data Engineer on Databricks looks like?

What you’ll learn

  • Gain real-world experience of working as a Data Engineer on Databricks.
  • Work on Databricks to ingest data of different formats with Apache Spark 3.0 (Python).
  • Enhance your skills with the DataFrame API and the Delta API on Databricks.
  • Prepare for the “Databricks Certified Associate Developer for Apache Spark 3.0” certification.

Course Content

  • Setup –> 1 lecture • 3min.
  • Notebooks –> 4 lectures • 58min.

Requirements

  • Access to a Databricks Workspace is required. If you don’t have one, the course shows you how to create a free account.
  • All the code and step-by-step instructions are provided, but the skills below will make the course easier to follow.
  • Basic knowledge of Python and SQL.

Would you like to gain some real-world experience of what it feels like to work with Databricks as a Data Engineer?

Are you eager to enhance your skills as a Data Engineer and prepare for the Databricks Certified Associate Developer for Apache Spark 3.0 exam in Python?

Then look no further.

This course gives you basic yet real-world experience of what a Data Engineering job involves, so that you can judge whether it suits you.

In no more than 1 hour, through several practical examples, you will learn how to ingest data of different formats in Databricks (comma-separated text files, XML files, tab-separated text files, and fixed-width files).
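Of the formats listed above, fixed-width files are usually the least familiar, because each field is identified by its position in the line rather than by a delimiter. The snippet below is a minimal plain-Python sketch of that idea, not the course's own code. The layout, the column names, and the sample record are all made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (column name, start offset, width) for each field.
# A real file would come with its own layout specification.
LAYOUT: List[Tuple[str, int, int]] = [
    ("country_code", 0, 3),
    ("year", 3, 4),
    ("value", 7, 8),
]


def parse_fixed_width(line: str, layout: List[Tuple[str, int, int]] = LAYOUT) -> Dict[str, str]:
    """Slice one fixed-width record into named, whitespace-trimmed fields."""
    return {
        name: line[start:start + width].strip()
        for name, start, width in layout
    }


# Made-up sample record: 3-character code, 4-character year, 8-character value.
record = parse_fixed_width("ITA2015   93.50")
print(record)
```

In PySpark the same slicing is typically done by reading the file with `spark.read.text`, which yields a single `value` column, and applying `substring` (whose positions are 1-based) once per field.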

By the end of this course, you will know how to work with the DataFrame API and the Delta API just like a real Data Engineer does!

COURSE CONTENT

  • In Notebook 1, you get an overview of the project, the data, and the exercises.
  • In Notebook 2, you walk through the code that ingests the data into Delta Tables.
  • In Notebook 3, you walk through the code that combines all the datasets into one and answers a handful of business questions.

The idea behind this project is to ingest data from a variety of file types and load it into Delta Tables for further analysis.
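As a rough illustration of that idea, the sketch below reads a delimited or XML file with the PySpark DataFrame reader and saves it as a Delta Table. It is a minimal sketch under stated assumptions, not the notebooks' actual code: the function name, the `READER_CONFIGS` mapping, the option values, and the `rowTag` value are hypothetical. It also assumes that `spark` is an active SparkSession (as provided in a Databricks notebook) and that an XML data source is available on the cluster, for example through the spark-xml library.

```python
from typing import Any, Dict

# Hypothetical reader settings per file kind; adjust to the real files.
READER_CONFIGS: Dict[str, Dict[str, Any]] = {
    "csv": {"format": "csv", "options": {"header": "true", "sep": ","}},
    "tsv": {"format": "csv", "options": {"header": "true", "sep": "\t"}},
    # Assumption: an XML data source is installed; "record" is a made-up row tag.
    "xml": {"format": "xml", "options": {"rowTag": "record"}},
}


def ingest_to_delta(spark, source_path: str, kind: str, table_name: str):
    """Read one source file and overwrite a Delta Table with its contents."""
    config = READER_CONFIGS[kind]
    df = (
        spark.read.format(config["format"])
        .options(**config["options"])
        .load(source_path)
    )
    # Save the DataFrame as a managed table in Delta format.
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    return df
```

In a notebook, a call might look like `ingest_to_delta(spark, "/path/to/file.csv", "csv", "my_table")`, where both the path and the table name are placeholders. Keeping the reader settings in one mapping makes it easy to see how the formats differ: the tab-separated case reuses the CSV reader with a different separator.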

This project is self-contained: all the code required to complete it is provided, and you just have to run each cell of each notebook.

In other words, this project walks you through several exercises that reflect the day-to-day work of a Data Engineer, at least for the very first steps of data ingestion.

The data covers global educational indicators and is derived from publicly available data from The World Bank. Please note that the author has modified the data to keep the project simple. It therefore does not represent the actual figures from the source and is meant only to demonstrate how to work with PySpark on Databricks.

To close the project, you are going to answer a handful of simple business questions based on the combination of the data you previously loaded.
