Ingest and Transform Data in Microsoft Fabric | No-Code and Pro-Code (Day 4)
Published: July 5, 2025
🚀 Introduction
Now that you’ve created your first Microsoft Fabric workspace, it’s time to bring in some data! In this article, you’ll learn how to ingest data into your Lakehouse or Warehouse and transform it using both no-code (Dataflow Gen2) and pro-code (Notebooks) methods.
Whether you’re a business analyst who prefers drag-and-drop tools or a data engineer writing Spark code — Fabric has something powerful for you.
🎯 What You’ll Learn Today
- Connecting to external data sources
- Using Dataflow Gen2 for no-code ingestion
- Transforming data with Power Query
- Using Notebooks for Spark-based transformation
- Storing clean data into a Lakehouse or Warehouse
🔌 Step 1: Connect to Data Sources
Microsoft Fabric supports 200+ data connectors, including:
- Azure SQL, SQL Server, Oracle, PostgreSQL
- Excel, SharePoint, OneDrive, Google Sheets
- Dataverse, Salesforce, SAP, Dynamics 365
- Web APIs, Flat files (CSV, JSON, XML)
💡 Use these connectors to pull in operational, transactional, or master data into your analytics system.
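If you’d rather connect from code, many of these sources can also be read directly in a Spark notebook (covered in Step 3), which comes with a ready-made `spark` session. Here’s a minimal sketch using Spark’s built-in JDBC reader against Azure SQL; the server, database, table, and credentials are placeholders you’d replace with your own:

```python
# Minimal sketch: read an Azure SQL table from a Fabric notebook via Spark's JDBC source.
# All connection values below are placeholders - swap in your own server, database, and credentials.
jdbc_url = "jdbc:sqlserver://<your-server>.database.windows.net:1433;database=<your-db>"

orders_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.Orders")        # source table to ingest
    .option("user", "<sql-user>")
    .option("password", "<sql-password>")   # prefer a secret store over literal credentials
    .load()
)

orders_df.show(5)
```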
🧩 Step 2: Use Dataflow Gen2 (No-Code ETL)
🔄 What is Dataflow Gen2?
Dataflow Gen2 is the latest generation of Fabric’s low-code/no-code ETL experience, built on the familiar Power Query engine.
🛠️ How to Create One:
- In your workspace, click + New → Dataflow Gen2
- Select Blank Dataflow
- Click Add New Data Source and choose one (e.g., Azure SQL DB)
- Use Power Query Editor to filter, clean, and transform your data
- Choose your destination: Lakehouse or Warehouse
- Click Publish
✨ Key Features:
- Data refresh scheduling
- Built-in join, pivot, filter, group, merge
- Data lineage tracking and governance
- Perfect for analysts and citizen developers
💻 Step 3: Use Spark Notebooks (Pro-Code)
🧠 What are Notebooks?
Notebooks in Microsoft Fabric let you write and run code on the Apache Spark engine, with support for languages including:
- Python (PySpark)
- SQL
- R
- Scala
🧪 How to Use a Notebook:
- In your workspace, click + New → Notebook
- Choose your language, e.g. PySpark (Python) or Spark SQL
- Use Spark commands to query, join, and transform data
- Write results back to Lakehouse as Delta tables
- Save and optionally schedule via pipelines
```python
# Sample PySpark code: read a CSV from the Lakehouse Files area,
# keep only completed orders, and write the result as a Delta table
df = spark.read.load("Files/orders.csv", format="csv", header=True)
df_filtered = df.filter(df["order_status"] == "Completed")
df_filtered.write.mode("overwrite").format("delta").save("Tables/CompletedOrders")
```
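Once the cell finishes, the new table should appear under the Lakehouse’s Tables section. A quick way to verify it from the same notebook, assuming the path used above:

```python
# Read the Delta table back by path to confirm the write succeeded
completed = spark.read.format("delta").load("Tables/CompletedOrders")
print(completed.count())
completed.show(5)
```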
🔥 Benefits:
- Scalable for big data workloads
- Supports custom logic, ML, and advanced joins
- Ideal for data engineers and data scientists
🏗️ Step 4: Write to Lakehouse or Warehouse
After ingestion and transformation:
- Write curated data into Lakehouse tables (Delta)
- Or use Data Warehouse for structured OLAP workloads
Fabric automatically provides a SQL analytics endpoint and a default Power BI semantic model for your Lakehouse or Warehouse.
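From a notebook, saving curated output as a managed Delta table is a one-liner. A small sketch, where `df_clean` and `curated_orders` are illustrative names standing in for your own transformed data:

```python
# df_clean stands in for your transformed DataFrame; a tiny example for illustration
df_clean = spark.createDataFrame([(1, "Completed"), (2, "Completed")], ["order_id", "order_status"])

# Save it as a managed Delta table in the attached Lakehouse
df_clean.write.mode("overwrite").format("delta").saveAsTable("curated_orders")

# The table is then queryable via Spark SQL, the SQL analytics endpoint, and Power BI
spark.sql("SELECT COUNT(*) AS row_count FROM curated_orders").show()
```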
🌐 Step 5: Automate with Pipelines (Optional)
Use Data Pipelines in Fabric to schedule:
- Dataflow refreshes
- Notebook execution
- Notification alerts
This ensures data is always fresh and ready for reports or ML models.
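Pipelines themselves are built visually in the Fabric UI, but if you prefer to orchestrate from code, Fabric Spark notebooks also expose a NotebookUtils helper that can run one notebook from another. A rough sketch with hypothetical notebook names; check your runtime’s documentation for the exact helper available to you:

```python
# Rough sketch: run downstream notebooks from a parent notebook using NotebookUtils,
# which is available in the Fabric Spark notebook environment.
# "Ingest_Orders" and "Transform_Orders" are hypothetical notebook names in the same workspace.
notebookutils.notebook.run("Ingest_Orders", 600)                        # 600-second timeout
notebookutils.notebook.run("Transform_Orders", 600, {"mode": "full"})   # pass parameters to the child
```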
🧠 Summary Table
| Tool | Purpose | Best For |
|---|---|---|
| Dataflow Gen2 | No-code ETL using Power Query | Analysts, business users |
| Notebooks | Code-first transformation with Spark | Data engineers, data scientists |
| Lakehouse | Central data storage with SQL & file access | All personas |
| Pipeline | Orchestration and automation | Ops & DevOps |
✅ Conclusion
Microsoft Fabric gives you both the simplicity of no-code tools and the power of code-first environments. By using Dataflow Gen2 and Notebooks together, you can build scalable, flexible data pipelines inside a single platform, with zero duplication and full governance.
In the next article, we’ll explore how to create Power BI reports using Lakehouse data directly inside Fabric.
🔮 Coming Up Tomorrow:
Day 5: Visualizing Data with Power BI in Fabric – Real-Time Reports & Dashboards

