Essential Data Engineering File Types

Rakesh singhania
Oct 25, 2023
Photo by Viktor Talashuk on Unsplash

Data engineers work with a variety of file types, depending on the specific needs of their project.

Below are some common file types in data engineering along with Python code examples to read them using popular libraries.

CSV (Comma-Separated Values):

CSV files are a simple and widely supported format for storing tabular data. Each row in a CSV file represents a record, and each column represents a field.

CSV files are often used to store data from databases or to exchange data between different software applications.

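```python
import csv

# newline='' lets the csv module handle line endings inside quoted fields
with open('data.csv', 'r', newline='') as file:
    csv_reader = csv.reader(file)
    # Each row is returned as a list of strings
    for row in csv_reader:
        print(row)
```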
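A variant of the reader above, offered as a sketch: `csv.DictReader` treats the first row as a header, so each record comes back as a dict keyed by field name, matching the "row is a record, column is a field" idea. The sample data is hypothetical and is read from an in-memory string with `io.StringIO` instead of `data.csv`, so the snippet runs on its own. It also shows that every value arrives as a string, so converting types is left to the caller.

```python
import csv
import io

# Hypothetical sample standing in for data.csv; the first row is the header
SAMPLE = "id,name,score\n1,Alice,92.5\n2,Bob,88\n"


def read_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


records = read_records(SAMPLE)
print(records[0])

# CSV stores no type information, so each field is converted explicitly
typed = [
    {"id": int(r["id"]), "name": r["name"], "score": float(r["score"])}
    for r in records
]
print(typed[1])  # {'id': 2, 'name': 'Bob', 'score': 88.0}
```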
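  • Advantages: supported by virtually every spreadsheet, database, and programming language; easy to create and read; lightweight plain text.
  • Considerations: not suited to nested or hierarchical data, and it has no built-in data types, so every value is read back as a string and must be converted by the reader.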
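To illustrate the "easy to create" point, here is a small sketch that writes records with `csv.DictWriter`. The records and column names are hypothetical, and the output goes to an in-memory buffer; writing a real file would use `open(..., 'w', newline='')` instead. Using the `csv` module rather than joining strings by hand matters because a value containing a comma is quoted automatically.

```python
import csv
import io


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps the output simple; the csv default is "\r\n"
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records, e.g. rows exported from a database table
rows = [
    {"id": 1, "city": "Pune", "note": "has, a comma"},
    {"id": 2, "city": "Delhi", "note": "plain"},
]
text = to_csv(rows, ["id", "city", "note"])
print(text)
```

The field containing a comma is written as `"has, a comma"`, so reading the text back with `csv.reader` or `csv.DictReader` recovers the original value intact.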

JSON (JavaScript Object Notation):

JSON is a lightweight, text-based format for storing and exchanging data as key-value objects and arrays, which can be nested inside each other. JSON files are often used to store API responses and configuration settings.

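```python
import json

# json.load parses the whole file into Python objects (dict, list, str, int, float, bool, None)
with open('data.json', 'r') as file:
    data = json.load(file)

print(data)
```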
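As a sketch of how nested JSON maps onto Python objects, the snippet below parses a hypothetical configuration (the keys `service`, `retries`, `debug`, and `sources` are made up for illustration) with `json.loads`, the string counterpart of `json.load`. JSON objects become dicts, arrays become lists, numbers become `int` or `float`, and `true`/`false` become `True`/`False`. `json.dumps` goes the other way.

```python
import json

# Hypothetical config / API payload with a nested array of objects
raw = """
{
  "service": "ingest",
  "retries": 3,
  "debug": false,
  "sources": [
    {"name": "orders", "format": "csv"},
    {"name": "events", "format": "json"}
  ]
}
"""

config = json.loads(raw)  # parse from a string instead of a file

print(type(config["retries"]).__name__)        # int
print(config["debug"])                         # False
print([s["name"] for s in config["sources"]])  # ['orders', 'events']

# Serialize back to text; indent=2 makes the output human-readable
print(json.dumps(config["sources"][0], indent=2))
```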
