Microsoft FabCon Wrapped: Day 4 and 5. Microsoft Fabric and Data Governance Insights

For Day 1 Microsoft FabCon Blog and Updates: Read here.

For Day 2: Read here

Day 3 Blog and Insights: Read here

Weather report:

  • Las Vegas – 15 degrees, overcast with strong winds.
  • Burnley – 18 degrees with clear skies and light winds.

Another benefit of being here is that I’m getting my steps in – almost 18,000 yesterday.  All I’m doing is walking to and from the conference from my hotel and moving around the cavernous convention centre.

My walk to the convention centre now takes in the Excalibur, New York and MGM Grand casino floors.   Excalibur is an odd place; it seems permanently full of people dragging luggage around – which may seem like a normal thing to be happening in a hotel this size but it’s a localised phenomenon which doesn’t seem to manifest anywhere else.  There is also an unusually large number of people who seem to think this is an appropriate place for a family holiday. 

It has been quite some time since I’ve witnessed a softly spoken man with a ponytail bound onto a stage declaring himself to be “super excited”, but today was the day that brought those memories flooding back.

Exhibition Hall: Vendors, Stalls and a Few Beers

As I’ve said, I’m not much of a mingler but armed with a single free beer token I went along to last night’s exhibition hall event – basically vendors manning stalls, delegates milling about, and the endless game of trying to avoid having your badge scanned. Once my beer was finished, I shelled out $12 for a bottle of Mexican lager (that’s about TWENTY QUID A PINT), then managed to scrounge two Las Vegas Craft Lagers from a vendor (DQ Labs I think) which stretched things out a bit.

I spoke with Snowflake, CluedIn (MDM), Profisee, and the Fabric Database expert at the “Ask The Expert” area. Snowflake finds itself in an interesting position in that Microsoft Fabric offers both an opportunity and a potential challenge, much like the dynamic Fabric has with Databricks.

The fact that it was Snowflake up on stage at the keynote and not Databricks, combined with the recently announced partnership between Databricks and SAP has me putting 2+2 together and getting about 14.  Databricks do have a stand in the exhibition hall (it doesn’t mention Fabric on it anywhere), and they sponsored this morning’s coffee break so they haven’t disappeared but there doesn’t seem to be the same level of love being shown as there has been previously.

Meanwhile, Snowflake are strengthening ties with Fabric. My pontifications about positioning and what I’ve seen so far this week suggest that Fabric is chasing gold-layer work, and that acting as a supportive partner steering customers toward Snowflake works for now.  Fabric is still behind both Snowflake and Databricks in terms of capability, but I’m sure both are keeping an eye on Fabric’s progress; significant advances could tilt “coopetition” toward competition.

CluedIn and Profisee compete in the MDM and matching space, offering capabilities beyond the probabilistic matching of the MoJ Splink engine that we use on many of our PubSec projects.  Profisee’s integration as a Fabric workload catches attention, but the pricing for both vendors is linked to record volumes and so is at best hazy.

My obsession with Fabric SQL Database grows by the day, not helped by the Fabric database expert who couldn’t give me a reason why you couldn’t run modest analytical workloads on it.  More on this in a bit.

Data Engineering in Microsoft Fabric

There are quite a few roadmap sessions which expand the 15 minutes each area was given in the keynote out to an hour.   I could end up repeating a lot of keynote content, so I’ll summarise:

  • In the Lakehouse, materialised views, due in a couple of months, arrive declaratively via extended SQL syntax, mimicking Delta tables with added data quality (DQ) constraints:

CREATE MATERIALISED VIEW
CONSTRAINT […] ON MISMATCH DROP
AS SELECT

These go beyond open-source Spark, offering nested views and dependency diagrams (currently Lakehouse-only, with workspace-wide transformation visibility promised later) – there’s a fuller illustrative sketch of the syntax after this list.

  • Scheduling’s here, event-driven triggers are coming, and DQ breaches can halt pipelines, with a semantic model for custom failure reports on the horizon.
  • Spark gets a practical overhaul. Every workspace has a starter pool, custom pools allow node sizing and auto scaling, and high concurrency lets notebooks share resources which should be more cost-effective than one pool per notebook.
  • Auto-scale, in preview, shifts to a serverless billing model, decoupling Spark from Fabric capacity.  It’s a pay-as-you-go billing construct.  Your F2 Fabric Capacity can handle your Power BI workloads, but you can now scale your Spark pool to 1,000 nodes to support a really heavy ETL process that only needs to run once per month.  It’s for spiky, irregular workloads.
  • Spark jobs are currently optimised for processing gold layer / Power BI type transactions.  That isn’t great for write-heavy bronze layer operations, so Spark is gaining a new default “writeHeavy” profile which should bring 3-5x performance improvements in bronze layer processing.  You can also define your own Spark resource profile configurations to optimise workloads.
  • ArcGIS analytics support – 180+ geospatial analytics functions available in Fabric Spark.
  • Copilot can be used directly within notebooks.  It can fix bugs when you encounter an error.  It can also be used for “AI transformations” – the developer writes natural language requests to analyse data (e.g. for sentiment) and the results can be appended to the dataframe.
  • Notebook UDF Integration – user data functions are in public preview and can be called from within notebooks.  Developers can run multiple UDFs and orchestrate data processing without needing to refactor code.
  • Python notebooks are available (a lightweight notebook with a pure Jupyter/Python experience), giving better resource utilisation for smaller data sets.  Python notebook support for VSCode.dev is new, and “T-SQL magic” commands in Python notebooks are on the roadmap.
  • CI/CD support for shortcuts in Lakehouse and deployment pipeline support for Spark Job Definitions (SJD) are available.  CI/CD notebook enhancements are on the roadmap.
  • Variable Library support is available, meaning there’s no need to manually update e.g. storage account links when notebooks are deployed through environments.
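To make the materialised view syntax from earlier in this list a little more concrete, here’s a minimal sketch of what a full statement might look like. The schema, table, column and constraint names are all hypothetical, the keyword will presumably use the US spelling, and the exact syntax may well change before the feature ships:

CREATE MATERIALIZED VIEW silver.customer_orders_mv
-- hypothetical DQ rule: any row failing the check is dropped rather than failing the refresh
CONSTRAINT valid_order_value CHECK (order_value > 0) ON MISMATCH DROP
AS
SELECT customer_id, order_id, order_value, order_date
FROM bronze.orders
WHERE order_date >= '2024-01-01';

The appeal is that the view behaves like a Delta table while the DQ rule is enforced as rows are materialised, and the dependency diagrams mentioned above are generated from these definitions.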

Dynamics 365 and Microsoft Fabric: Bridging the Data Divide

The “Dynamics 365 and Fabric” session outlined two options for integrating Dynamics 365’s Dataverse with Fabric: Fabric Data Link and Synapse Data Link. Both require row version change tracking to be enabled on each D365 entity.

Fabric Data Link provides a direct OneLake shortcut, retaining Delta Parquet files in Dataverse at £31/GB ($40) which is fine if you aren’t close to your storage limit but if you are it’s about as expensive as cloud storage could be. There is currently a 1-hour latency in preview (15 minutes promised) but it’s effectively a no-ETL, fully managed SaaS service.

Synapse Data Link routes data to Azure Storage via Synapse Spark, offering a 15-minute SLA and low storage costs, but its architecture is complex, and it could be the more expensive option due to the necessity for Spark compute.

Fabric Link stands out for simplicity and a claimed 70% cost saving over Synapse Link’s PaaS/SaaS overhead (assuming no additional storage costs). Synapse Link delivers flexibility and faster updates but demands more setup and incurs variable Spark costs.

Fabric Link leans toward Lakehouse, Synapse Link towards broader Azure integration.

Fabric Link does look like the pragmatic choice – its simplicity suits straightforward mirroring requirements, assuming storage costs align with licences. Basically, neither is perfect: pick the one that meets the requirement at a cost that is affordable, but look at Fabric Link first.

Azure SQL Database vs Fabric SQL Database

The “Azure SQL Database vs. Fabric SQL Database” session hammered home a familiar soundbite from the conference: Fabric SQL Database is “just SQL Server” – same engine, same tools (BTW Azure Data Studio is being deprecated in 2026), same query processing engine.

In Fabric, SQL Database is pitched at OLTP workloads or underpinning apps – i.e. NOT reporting.  Fabric Lakehouse fills the gap for those looking to build a PySpark, notebook-oriented architecture.  SQL Warehouse is the replacement for Synapse Dedicated Pools and provides a relational MPP service for those with big data and a preference for T-SQL.

That leaves an obvious gap in the market for a platform within Fabric that can service the needs of those customers who don’t want to stray too far from a relational solution but who are only working with relatively modest data volumes.    Can anyone think of anything?  No, me neither [IT’S BEHIND YOU………….! (OH NO IT ISN’T)].

Underneath, Fabric SQL Database mirrors Azure SQL Database’s General-Purpose tier.  Use of blob storage for files introduces a bit of latency, offset by use of a local SSD for TempDB.  Azure SQL Database offers provisioned or pausable serverless SKUs; Fabric’s serverless-only setup pauses aggressively and can take 60 seconds to 2 minutes to restart, an issue Microsoft acknowledges and something likely to be fixed by GA.

High Availability comes from three blob-stored replicas, but Disaster Recovery has no support for geo-replication, failover groups, or geo-restore (all Azure SQL Database strengths), just 7-day point-in-time restores vs. Azure SQL Database’s 35. This is all important stuff if you’re planning to run operational OLTP workloads on the platform (which is what it’s for, right?). For modest BI workloads this is less important – these solutions are rarely mission critical and the whole environment can usually be recreated and repopulated from scratch overnight.  What a shame it doesn’t support that [OH YES IT DOES].

Microsoft Purview: Enhancing Data Governance and Security

Microsoft Purview was a key emphasis on Day 5 of FabCon, with three critical problems in data governance highlighted:

  • Oversharing, permission creep, and lack of data mapping/lineage are widespread issues.
  • AI Applications are being built on potentially poor-quality datasets – reinforcing the need for quality data foundations.
  • Data movement and leakage concerns are growing, especially as AI agents become more powerful than humans in their ability to create, interpret, read, write and share data.

Chapter 1: Seamlessly Secure

This chapter covers addressing oversharing concerns, protecting against data loss and insider risks, governing Artificial Intelligence (AI) use to meet regulations and policies, and implementing DLP and insider risk protection for Copilot.

The team announced several new features:

  • Data protections for Copilot in Power BI.
  • DLP for mirrored sources.
  • Enhanced user signals in Insider Risk.

Confidently Activate

Data observability in the Purview unified catalogue provides visibility of the entire data estate in one view. Recent announcements focused on Data Quality enhancements, including:

  • Data products with natural language search capabilities.
  • Comprehensive data product views with clear descriptions, underlying use cases, and Data Quality scores.
  • Aggregated views of Data Quality across assets.

Business users can drill into individual data assets to gain Data Quality insights.  Copilot integration allows users to ask questions like: “Will this Data Quality impact my model?” – with Copilot analysing Purview Glossary Terms and use cases before providing information on potential data bias and recommendations on production readiness.

This approach increases data literacy for users without them even realising it. The Health Management view provides a graphical representation of data across all domains, showing how data products are consumed throughout the organisation. This functionality extends to the asset level as well.

Visualisation has been enhanced to include Data Quality rules at the column level, but you’re going to need a bigger monitor to make use of this.

A new Chief Data Officer view has been added under Reports in Health Management. This includes an embedded Power BI report providing a bird’s-eye view of Data Quality across a multi-cloud data estate. The “Get to Green” scorecard shows Data Quality status by dimension, serving business users, Data Quality managers, and data leaders.

As the presenter emphasised, trusted data needs to be the foundation for every decision made in an organisation.

Note: Further investigation is needed to differentiate between Purview for Fabric and full Purview capabilities demonstrated in these presentations as they seemed to drift between the two.

Power Hour

Last night was the fabled “Power Hour” where the promise of “we guarantee you won’t learn anything” was expertly delivered.  It started with a game of Family Fortunes which annoyed me (more in a minute) and followed up with a demo of a solution taking biometric data and making predictions.  It was the hottest ticket of the week, and we had to queue round the block to get in.

Further thoughts on Fabric SQL Database

Yes, I am still banging on about this. Given there is a relational component that mirrors a copy of its data out to the lake, where it can be integrated with other lake data for querying, why wouldn’t you use it for those modest analytical workloads?
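As a rough illustration of the point, here’s a minimal sketch of joining that mirrored data to lakehouse data through the SQL analytics endpoint. The database, schema and table names are entirely hypothetical, and I’m assuming the usual three-part naming across items in the same workspace:

-- orders mirrored automatically from a (hypothetical) Fabric SQL Database called SalesDb,
-- joined to a customer segmentation table sitting in a lakehouse in the same workspace
SELECT
    o.customer_id,
    c.segment,
    SUM(o.order_value) AS total_order_value
FROM SalesDb.dbo.orders AS o
JOIN SalesLakehouse.dbo.customer_segments AS c
    ON c.customer_id = o.customer_id
GROUP BY o.customer_id, c.segment;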

Session Notes

There was a session on SQL Data Warehouse limitations. I’ve captured a few here. I think this emphasises that Warehouse is still a relatively new platform with plenty of work to be done to get it up to feature parity with the Synapse platform it is intended to replace.

Feature | Status | Workaround
Temporary Tables | Coming in Q1 (delayed) | Physically materialise tables into a temporary schema
INSERT INTO…EXEC | Not on roadmap | Alternative approaches required
MERGE Statement | Due in Q1 (delayed) | Use separate INSERT and UPDATE statements
IDENTITY Column | Planned for H2 (likely Q4) | Use ROW_NUMBER() function with a seed value
Primary/Foreign Keys | Present but non-functional | Be aware that relationships are not enforced, and PKs do not ensure uniqueness
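As an example of the IDENTITY workaround, here’s a minimal T-SQL sketch that generates surrogate keys with ROW_NUMBER() seeded from the current maximum key. The dim.customer and staging.customer tables are hypothetical:

-- insert only brand-new customers, generating surrogate keys without an IDENTITY column;
-- the "seed" is the current maximum key (0 if the dimension is empty)
INSERT INTO dim.customer (customer_key, customer_id, customer_name)
SELECT
    seed.max_key + ROW_NUMBER() OVER (ORDER BY s.customer_id) AS customer_key,
    s.customer_id,
    s.customer_name
FROM staging.customer AS s
CROSS JOIN (SELECT COALESCE(MAX(customer_key), 0) AS max_key FROM dim.customer) AS seed
LEFT JOIN dim.customer AS d
    ON d.customer_id = s.customer_id
WHERE d.customer_id IS NULL;

The MERGE workaround is similar in spirit: an UPDATE for the rows that already exist, followed by an INSERT for the rows that don’t.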

Fabric Capacities

There are some new features coming (or already arrived) which make the management of capacities simpler and more transparent.

  • Capacities reside in a region of the buyer’s choice.  Multiple capacities can be deployed in a single tenant allowing business units to pay for their own consumption.
  • Capacity Units (CUs) are the base compute unit, consumed as CU-seconds. SKU size determines the number of CUs you have available.
  • Capacities are self-managing with bursting and smoothing.  When there’s too much smoothed usage, throttling is applied: interactive jobs are delayed by 20 seconds once smoothed usage covers between 10 and 60 minutes of future capacity; interactive jobs are rejected between 60 minutes and 24 hours; and beyond 24 hours, background jobs start getting rejected too (see the worked example after this list).
  • New – Surge Protection – a simple experience that limits overuse by background jobs. It throttles background jobs before 24 hours of CUs is consumed, and keeps throttling until the capacity is “healthy” as defined by the customer.  Throttling background jobs will help the 40-60% of capacities experiencing interactive rejections.
  • Surge Protection V2 – blocks a workspace for 24 hours if it exceeds a limit set by the capacity admin.  Mission-critical workspaces are excluded.
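To make those smoothing thresholds a bit more concrete, here’s a rough worked example (the SKU and job size are mine, not the session’s): an F64 provides 64 CUs, i.e. 64 CU-seconds of compute every second. A background job that burns 230,400 CU-seconds is smoothed forward and accounts for 230,400 / 64 = 3,600 seconds – 60 minutes – of future capacity, which on its own is roughly the point at which interactive jobs move from being delayed to being rejected.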

Auto scale billing for Spark

  • Spark jobs can be billed separately.  If Spark jobs call other workloads like OneLake, those costs go against the capacity.  Jobs are billed when they execute.
  • Can set a max limit on CU for use by Spark. Spark manages the limit ensuring pools don’t over-consume.  Available in public preview.
  • A new autoscale compute for Spark page shows Spark compute consumed through auto-scale, making it easy to track against the configured auto-scale limit. It’s the same experience as the capacity compute page and helps you understand the Spark-specific compute that will be reflected on your bill.
  • This is a capacity level setting.

Microsoft Fabric Copilot Capacity

  • All users can use Copilot experiences.  The cost of Copilot goes only to the selected capacity.  Expected to be available April 30th.
  • Select who can use a Copilot capacity.  Helps departments pay for their own usage within their own capacity.  An F2+ SKU can be used for the Copilot capacity, removing the F64+ requirement.
  • Can designate an existing capacity as the Copilot capacity.

Fabric SKU Estimator (Preview)

  • Delivers a capacity recommendation based on workload details that you enter through a web portal. It’s a high-level estimator designed to identify the appropriate SKU for your needs.  Customers should try out the free trial and monitor usage.  Microsoft Sellers and partners have access to a highly granular tool for estimating (the Excel spreadsheet!).
  • Capacity Chargeback reporting – Helps allocate costs across the org.  Rolls up usage by workspace / item / user.

Recommended Approach

  • Ensure your capacities are correctly sized. Use dedicated capacities to optimize quality of experience and costs. Isolate production, development, testing in separate capacities. Budget for variability.
  • Enable Surge Protection. Monitor usage using metrics app. Adjust workload limits like pools, memory, and timeouts. Share best practices with colleagues.
  • Consider Autoscale billing for Spark. Consider Fabric Copilot Capacities. Leverage pause/resume. Resize capacities as needed. Move problematic content to rescue, time-out, or testing capacities.

These updates and recommendations bring a bit more order to Fabric capacities – multiple capacities per tenant with potential to allocate costs by business units, and tools like Surge Protection and Spark auto-scale tracking are intended to help manage costs. However, Spark as PAYG pulls it out of the main capacity, creating a separate bill for heavy data engineering workloads, whilst a Copilot capacity (even on an F2) adds another layer of expense.  Presumably complex Spark use and the proliferation of AI upset the capacity estimator.

The “Fabric Capacities” updates encourage isolating workloads – production, development, testing, Spark, and Copilot – into separate capacities.  Keeping ETL and reporting workloads apart (on separate physical servers) was considered best practice back in the day, precisely to avoid contention for resources.  Fabric’s latest features and some of the recommendations that are emerging, with Surge Protection curtailing background tasks (such as ETL) to preserve interactive operations (like reporting), align with that logic.

And Now, The End is Near

This will be my last daily update from FabCon.  I’ll reflect on the way home and write a final summary of how reality fared against initial expectations.  Early thoughts are that it has been a good conference with messaging at the right level, and insights to take back to help Simpson Associates customers unlock the power of their data. Thank you very much. Elvis has left the building.

Blog Author: Mick Horne, Data Analytics Practice Manager at Simpson Associates.

Read the full series here and dive into the latest FabCon updates and announcements.

If you would like to explore the potential of Microsoft Fabric for your organisation, get in touch now.