A New Tapestry: Exploring Microsoft’s Fabric in Analytics

Blanka

Thoughts from an Analytics Engineer

MS released Fabric just as we were all getting ready to check out for the summer. The excitement was high, but the reasons were unclear to me. I’ve since tried it out, and this post is my take on what it is and why it could be considered exciting. I’m basing it on the experience I’ve gathered since joining the analytics field about 1.5 years ago, working mainly in maintenance and support, primarily on a solution consisting of Power BI sitting atop a dbt transformation layer. Let’s dive in and see what synergies come with this unification.

Basics: what is MS Fabric?

MS has set out to create a “one-stop-shop” for the whole analytics stack, bringing together the following tools under the same umbrella:

  • Power BI: dashboarding
  • Azure Data Factory: ingestion and data pipelining
  • Synapse Data Engineering: data processing via e.g. Spark
  • Synapse Data Science: wrangling and ML capabilities via e.g. Spark and Python
  • Synapse Data Warehouse: storage
  • Synapse Real-Time Analytics: real-time processing and analytics

The tools cover the whole process, from data ingestion and transformation to data science, visualization, and real-time processing and alerting. Most of these are existing MS products, such as Azure Data Factory and Power BI, but now they’re packaged together behind a unified UI, similar to when Excel, Word, and the rest were packaged into Office.

Image by Author using MS logos and excalidraw showing the packaging of MS tools to Fabric

From the front-end perspective, the cool thing is that it looks like Power BI can be plugged in anywhere in the data flow. It can sit right above ingestion, on a real-time stream, atop data science stuff — the possibilities are endless. Which brings me to:

Unification → efficiency

With the various tools well integrated through the unified interface, data doesn’t move around between them. When data enters the Fabric universe, it gets converted to an efficient format (Delta and Parquet) and is put in OneLake, a metaphorical equivalent of OneDrive for datasets. After that, it doesn’t move. It can be used in the various faces of Fabric without leaving the comfort of the lakehouse, meaning no copying and, in some instances, no dedicated datasets that need to be reloaded.

Furthermore, with one source of data, there’s only one security and governance system and, perhaps most importantly, one pricing model. Fabric is sold by capacity, measured in Capacity Units (CUs), with computing power moving between components as needed, which is a very straightforward model.

Generated using Bing Image Generator — powered by DALL-E 3 [prompt: Unification of data tools illustrated minimalistically, nature theme with a tree]

Traceability?

My one reservation, being a dbt-spoiled brat working in maintenance, is traceability. When end users come in with questions of the form “the number in the dashboard doesn’t look like the number in my source system?!”, dbt provides excellent lineage documentation where you can see how various tables are connected and which transformations and joins are performed, an example of which is shown in the image below. This lets you trace data fields all the way from an output layer down to the source through a web interface, instead of having to open various SSIS packages or similar solutions. From what I’ve seen of Fabric, that is not readily available, or at least not clearly so. This is a shame, as it would be neat to see how a data field flows through the various components. But maybe that’s coming later; maintenance is rarely considered “cool” or advertised loudly in a first release.

Example of dbt generated DAG lineage, source: https://docs.getdbt.com/terms/data-lineage
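
To make the tracing idea concrete, here is a minimal sketch of what “following a number back to its source” looks like on a lineage graph. It is not dbt’s implementation, and every model and source name in it is made up for illustration; it only assumes that each model knows which models or sources it reads from (dbt records comparable parent information in its `manifest.json` artifact).

```python
from collections import deque

# Hypothetical lineage graph: each node maps to the nodes it selects from.
# All names are invented for illustration, not taken from a real project.
PARENTS = {
    "dashboard_sales": ["fct_orders"],
    "fct_orders": ["stg_orders", "stg_customers"],
    "stg_orders": ["source.erp.orders"],
    "stg_customers": ["source.crm.customers"],
    "source.erp.orders": [],
    "source.crm.customers": [],
}


def upstream(node, parents):
    """Return every ancestor of `node`, nearest first (breadth-first walk)."""
    seen = []
    queue = deque(parents.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a node can be reachable through several paths
        seen.append(current)
        queue.extend(parents.get(current, []))
    return seen


def sources(node, parents):
    """Return the ancestors of `node` that have no parents, i.e. raw sources."""
    return [n for n in upstream(node, parents) if not parents.get(n)]


if __name__ == "__main__":
    print(upstream("dashboard_sales", PARENTS))
    print(sources("dashboard_sales", PARENTS))
```

With a graph like this, the support question “where does this dashboard number come from?” becomes a single lookup instead of opening one pipeline after another, which is the convenience I’d like to see across the Fabric components.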

Conclusion

MS has created a one-stop-shop for analytics. With the unified UI and straightforward pricing model, I think the simplicity of the system is what’s really going to hit big, with nontechnical stakeholders being able to understand and interact with the system and get into the nitty-gritty of it. What can be questioned is whether the components Fabric provides are better than those from a supplier focused on just one thing, along the lines of “don’t buy hiking boots from The North Face, which tries to make everything; buy them from Hanwag, which specializes in shoes”. But then again, integration is not a thing with shoes, so the analogy might not hold.

Overall, I’m very excited to see this implemented on a larger scale! Getting all these chefs to cook up soup IN THE SAME KITCHEN is going to be very interesting to see and be a part of!

Dive in

To take a closer look at this topic, here are some links that I found useful to get an overview of Fabric: