Convert Parquet format to Delta format using Azure Data Factory or Azure Synapse Analytics
6 min read · Jun 22, 2023
Let’s first go through the terms used in the title and the prerequisites for this task.
After that, I will explain the step-by-step process to achieve this goal.
Prerequisites:
- An Azure account, either free or pay-as-you-go.
- Azure storage account.
- Azure Data Factory / Azure Synapse Analytics workspace.
What is the Parquet File Format?
- Parquet is a columnar storage file format that is highly optimized for analytical workloads.
- It provides efficient compression and encoding techniques, making it suitable for big data processing and analytics.
- Parquet files are self-describing, meaning they include metadata that describes the schema and structure of the data stored within them.
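- Parquet supports a limited form of schema evolution: because every file carries its own schema, later files can add columns, and readers that merge compatible schemas can handle this without the existing files being rewritten. Removing or changing the type of a column generally depends on how the reading engine handles the mismatch.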
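To make the "self-describing" point a bit more concrete: a Parquet file starts and ends with the 4-byte marker `PAR1`, and the schema and other metadata sit in a footer whose length is stored in the 4 bytes just before the closing marker. The sketch below only locates that footer using the Python standard library; it does not decode the schema itself (a Parquet library would be needed for that). The function name `parquet_footer_length` is made up for this illustration.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # marker at the start and end of a plain Parquet file


def parquet_footer_length(path):
    """Return the size in bytes of the metadata footer of a Parquet file.

    Hypothetical helper for illustration: it checks the magic bytes and
    reads the footer length, but does not parse the footer contents.
    """
    # Smallest possible layout: 4 (magic) + 4 (footer length) + 4 (magic)
    if os.path.getsize(path) < 12:
        raise ValueError(f"{path} is too small to be a Parquet file")
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)  # last 8 bytes: footer length + magic
        tail = f.read(8)
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        raise ValueError(f"{path} does not look like a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[:4])  # little-endian uint32
    return footer_len
```

Calling `parquet_footer_length("some_file.parquet")` on a real file (the file name here is a placeholder) returns how many bytes of metadata the file carries about itself.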
What is the Delta File Format?
- Delta (Delta Lake) is an open-source storage layer that provides ACID (Atomicity, Consistency, Isolation, Durability) transactions and time travel capabilities on top of data lakes.
- A Delta table is stored as a collection of Parquet files together with a transaction log (the `_delta_log` folder) that keeps a record of every operation performed on the data.
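The role of the transaction log can be sketched in a few lines of Python. Each commit in `_delta_log` is a JSON file named with a zero-padded version number, and each line in it is an action such as `add` or `remove` for a data file. Replaying those actions up to a given version yields the set of Parquet files that make up the table at that version, which is the idea behind time travel. This is a simplified illustration under some assumptions: the helper name `active_files` is made up, and checkpoint files are ignored, so it only works when the full JSON history is still present.

```python
import json
import os


def active_files(table_path, version=None):
    """Replay the JSON commits of a Delta table and return its data files.

    Hypothetical helper for illustration only. If `version` is given,
    commits newer than that version are skipped (a toy form of time travel).
    Checkpoint files are not read, which real Delta readers do use.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    # Commit files look like 00000000000000000000.json, 00000000000000000001.json, ...
    commits = sorted(
        (name for name in os.listdir(log_dir)
         if name.endswith(".json") and name[:-5].isdigit()),
        key=lambda name: int(name[:-5]),
    )
    files = set()
    for name in commits:
        if version is not None and int(name[:-5]) > version:
            break
        with open(os.path.join(log_dir, name), encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)  # one action per line
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files
```

For example, `active_files(path)` returns the files of the latest version, while `active_files(path, version=0)` returns the files as they were after the first commit. Note that removed Parquet files are only dropped from the log's view; they stay in storage until they are cleaned up, which is what makes reading older versions possible.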