Ingest and Transform Data in Microsoft Fabric | No-Code and Pro-Code (Day 4)
Published: July 5, 2025
🚀 Introduction
Now that you’ve created your first Microsoft Fabric workspace, it’s time to bring in some data! In this article, you’ll learn how to ingest data into your Lakehouse or Warehouse and transform it using both no-code (Dataflow Gen2) and pro-code (Notebooks) methods.
Whether you’re a business analyst who prefers drag-and-drop tools or a data engineer writing Spark code — Fabric has something powerful for you.
🎯 What You’ll Learn Today
- Connecting to external data sources
- Using Dataflow Gen2 for no-code ingestion
- Transforming data with Power Query
- Using Notebooks for Spark-based transformation
- Storing clean data into a Lakehouse or Warehouse
🔌 Step 1: Connect to Data Sources
Microsoft Fabric supports 200+ data connectors, including:
- Azure SQL, SQL Server, Oracle, PostgreSQL
- Excel, SharePoint, OneDrive, Google Sheets
- Dataverse, Salesforce, SAP, Dynamics 365
- Web APIs, Flat files (CSV, JSON, XML)
💡 Use these connectors to pull in operational, transactional, or master data into your analytics system.
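If you’d rather connect from code, many of these sources can also be read directly in a Spark notebook (covered in Step 3), which comes with a ready-made `spark` session. Here’s a minimal sketch using Spark’s built-in JDBC reader against Azure SQL; the server, database, table, and credentials are placeholders you’d replace with your own:

```python
# Minimal sketch: read an Azure SQL table from a Fabric notebook via Spark's JDBC source.
# All connection values below are placeholders - swap in your own server, database, and credentials.
jdbc_url = "jdbc:sqlserver://<your-server>.database.windows.net:1433;database=<your-db>"

orders_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.Orders")        # source table to ingest
    .option("user", "<sql-user>")
    .option("password", "<sql-password>")   # prefer a secret store over literal credentials
    .load()
)

orders_df.show(5)
```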
🧩 Step 2: Use Dataflow Gen2 (No-Code ETL)
🔄 What is Dataflow Gen2?
Dataflow Gen2 is the latest generation of Fabric’s low-code/no-code ETL experience, built on the familiar Power Query engine.
🛠️ How to Create One:
- In your workspace, click + New → Dataflow Gen2
- Select Blank Dataflow
- Click Add New Data Source and choose one (e.g., Azure SQL DB)
- Use Power Query Editor to filter, clean, and transform your data
- Choose your destination: Lakehouse or Warehouse
- Click Publish
✨ Key Features:
- Data refresh scheduling
- Built-in join, pivot, filter, group, merge
- Data lineage tracking and governance
- Perfect for analysts and citizen developers
💻 Step 3: Use Spark Notebooks (Pro-Code)
🧠 What are Notebooks?
Notebooks in Microsoft Fabric let you write and run code on the Apache Spark engine, with support for languages including:
- Python (PySpark)
- SQL
- R
- Scala
🧪 How to Use a Notebook:
- In your workspace, click + New → Notebook
- Choose your language, e.g. PySpark (Python) or Spark SQL
- Use Spark commands to query, join, and transform data
- Write results back to Lakehouse as Delta tables
- Save and optionally schedule via pipelines
```python
# Sample PySpark code: read a CSV from the Lakehouse Files area,
# keep only completed orders, and write the result as a Delta table
df = spark.read.load("Files/orders.csv", format="csv", header=True)
df_filtered = df.filter(df["order_status"] == "Completed")
df_filtered.write.mode("overwrite").format("delta").save("Tables/CompletedOrders")
```
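Once the cell finishes, the new table should appear under the Lakehouse’s Tables section. A quick way to verify it from the same notebook, assuming the path used above:

```python
# Read the Delta table back by path to confirm the write succeeded
completed = spark.read.format("delta").load("Tables/CompletedOrders")
print(completed.count())
completed.show(5)
```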
🔥 Benefits:
- Scalable for big data workloads
- Supports custom logic, ML, and advanced joins
- Ideal for data engineers and data scientists
🏗️ Step 4: Write to Lakehouse or Warehouse
After ingestion and transformation:
- Write curated data into Lakehouse tables (Delta)
- Or use Data Warehouse for structured OLAP workloads
Fabric automatically provides a SQL analytics endpoint and a default Power BI semantic model for your Lakehouse or Warehouse.
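From a notebook, saving curated output as a managed Delta table is a one-liner. A small sketch, where `df_clean` and `curated_orders` are illustrative names standing in for your own transformed data:

```python
# df_clean stands in for your transformed DataFrame; a tiny example for illustration
df_clean = spark.createDataFrame([(1, "Completed"), (2, "Completed")], ["order_id", "order_status"])

# Save it as a managed Delta table in the attached Lakehouse
df_clean.write.mode("overwrite").format("delta").saveAsTable("curated_orders")

# The table is then queryable via Spark SQL, the SQL analytics endpoint, and Power BI
spark.sql("SELECT COUNT(*) AS row_count FROM curated_orders").show()
```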
🌐 Step 5: Automate with Pipelines (Optional)
Use Data Pipelines in Fabric to schedule:
- Dataflow refreshes
- Notebook execution
- Notification alerts
This ensures data is always fresh and ready for reports or ML models.
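Pipelines themselves are built visually in the Fabric UI, but if you prefer to orchestrate from code, Fabric Spark notebooks also expose a NotebookUtils helper that can run one notebook from another. A rough sketch with hypothetical notebook names; check your runtime’s documentation for the exact helper available to you:

```python
# Rough sketch: run downstream notebooks from a parent notebook using NotebookUtils,
# which is available in the Fabric Spark notebook environment.
# "Ingest_Orders" and "Transform_Orders" are hypothetical notebook names in the same workspace.
notebookutils.notebook.run("Ingest_Orders", 600)                        # 600-second timeout
notebookutils.notebook.run("Transform_Orders", 600, {"mode": "full"})   # pass parameters to the child
```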
🧠 Summary Table
| Tool | Purpose | Best For |
|---|---|---|
| Dataflow Gen2 | No-code ETL using Power Query | Analysts, business users |
| Notebooks | Code-first transformation with Spark | Data engineers, data scientists |
| Lakehouse | Central data storage with SQL & file access | All personas |
| Pipeline | Orchestration and automation | Ops & DevOps |
✅ Conclusion
Microsoft Fabric gives you both the simplicity of no-code tools and the power of code-first environments. By using Dataflow Gen2 and Notebooks together, you can build scalable, flexible data pipelines inside a single platform, with zero duplication and full governance.
In the next article, we’ll explore how to create Power BI reports using Lakehouse data directly inside Fabric.
🔮 Coming Up Tomorrow:
Day 5: Visualizing Data with Power BI in Fabric – Real-Time Reports & Dashboards

