Convert Parquet Format to Delta Format Using Azure Data Factory or Azure Synapse Analytics

Mohan Balaji
6 min read · Jun 22, 2023

Let’s first look at the prerequisites for this task and at the two file formats named in the title.

After that, I will explain the step-by-step process for converting Parquet to Delta.

Prerequisites:

  • An Azure account, either free or pay-as-you-go.
  • An Azure storage account.
  • An Azure Data Factory or Azure Synapse Analytics workspace.

What is the Parquet File Format?

  • Parquet is a columnar storage file format that is highly optimized for analytical workloads.
  • It provides efficient compression and encoding techniques, making it suitable for big data processing and analytics.
  • Parquet files are self-describing, meaning they include metadata that describes the schema and structure of the data stored within them.
  • Parquet supports schema evolution: because each file carries its own schema, new files can add columns and readers can merge compatible schemas, so existing files do not have to be rewritten.
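
To make the "self-describing" point concrete, here is a minimal sketch in plain Python, assuming a standard Parquet file with an unencrypted footer. Such a file starts and ends with the 4-byte marker PAR1, and the 4 bytes before the trailing marker hold the length of the footer metadata (which contains the schema) as a little-endian integer. The function name parquet_footer_length is a hypothetical helper for illustration, not part of any library.

```python
import struct
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # marker at the start and end of a Parquet file


def parquet_footer_length(path):
    """Return the size in bytes of the footer metadata of a Parquet file.

    Hypothetical helper for illustration. Raises ValueError if the file
    does not have the expected PAR1 markers.
    """
    # Smallest possible layout: magic (4) + footer length (4) + magic (4).
    if Path(path).stat().st_size < 12:
        raise ValueError("file is too small to be a Parquet file")

    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, 2)      # jump to the last 8 bytes of the file
        tail = f.read(8)   # 4-byte footer length + 4-byte magic

    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic bytes")

    # The footer length is stored as a little-endian unsigned 32-bit integer.
    (footer_length,) = struct.unpack("<I", tail[:4])
    return footer_length
```

In practice you would read the schema itself with a Parquet library rather than by hand; the sketch only shows where the metadata lives in the file.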

What is the Delta File Format?

  • Delta (Delta Lake) is an open-source storage layer that adds ACID (Atomicity, Consistency, Isolation, Durability) transactions and time travel (querying earlier versions of a table) on top of data lakes.
  • A Delta table is stored as a collection of Parquet data files plus a transaction log (the _delta_log folder) that records every operation performed on the data.
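
The relationship between the Parquet files and the transaction log can be sketched in plain Python. Each commit in the _delta_log folder is a JSON file named with a zero-padded version number, holding one action per line; "add" and "remove" actions say which Parquet files belong to the table. The sketch below replays those actions to list the active data files, and stopping at an earlier version gives a rough picture of time travel. It is a simplification under stated assumptions: it ignores checkpoint files and other action types, and active_data_files and as_of_version are hypothetical names chosen for illustration.

```python
import json
import re
from pathlib import Path

# Commit files look like 00000000000000000000.json (20-digit version number).
COMMIT_NAME = re.compile(r"(\d{20})\.json")


def active_data_files(table_path, as_of_version=None):
    """List the Parquet files that make up a Delta table at a given version.

    Hypothetical helper for illustration; it ignores checkpoints, so it only
    works when every JSON commit file is still present in the log.
    """
    log_dir = Path(table_path) / "_delta_log"
    commits = []
    for entry in log_dir.iterdir():
        match = COMMIT_NAME.fullmatch(entry.name)
        if match:
            commits.append((int(match.group(1)), entry))

    active = set()
    for version, commit in sorted(commits):
        if as_of_version is not None and version > as_of_version:
            break  # "time travel": stop replaying at the requested version
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return sorted(active)
```

For example, if version 0 adds two files and version 1 replaces one of them, calling active_data_files(path, as_of_version=0) returns the original two files, while active_data_files(path) returns the current set.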
