
Governance in Microsoft Fabric: Current Limitations of Lakehouse Scans with Purview

As more organizations embrace Microsoft Fabric for unified data analytics, governance is quickly becoming a hot and much-needed topic, especially for those relying on Microsoft Purview. A recent internal discussion highlighted some key pain points and lessons learned when trying to scan and classify Lakehouse tables using Microsoft Purview.

In this article, I walk through a scenario in which I collaborated with a peer from the community who was attempting to enable data classification in their Fabric Lakehouse environment, and the lessons we learned along the way.

Initial Challenge: Should You Create a Dedicated Scan Rule Set?

My peer kicked things off with a straightforward question:

“Hey, does it make sense to create a dedicated Scan Rule Set for the Lakehouse tables (single schema tables, not multi schema)? Or the default behaviour should be good enough?”

They were trying to assess whether it was worth customizing scan rules for Lakehouse, especially in environments without multiple schemas (multi-schema support is still in preview). The goal was to find easily implemented best practices for:

  • Scan Rule Set configuration
  • Working with schemaless or single schema tables
  • Optimizing classification performance and accuracy

Reality Check: No Auto-Classification Yet!

I searched through the docs and tips from the product team and Purview experts, and learned the following:

“There is a limitation for Classification, there is no auto classification for Fabric Lakehouse right now.”

  • You cannot create Scan Rule Sets for Lakehouse yet.
  • The only scoping level available is Workspace level.
  • There is no granular drill-down: scoping a scan to just the Lakehouse, or defining table-level rule sets, is not supported yet.

This is expected to be addressed in future releases (possibly six months or more away), but for now rule customization is not an option, so plan accordingly.

Lesson Learned During Implementation: Missing Tables in Scan Results

Another issue surfaced when Lakehouse tables did not show up in the scan results, even though Purview and Fabric were successfully connected using a Service Principal (SP). The obvious question was: what exact permissions or roles does the SP (or its security group) need in Fabric to access Lakehouse schemas and view Delta Lake tables?

The answer was Contributor access for the SP in the Fabric workspace that contains the Lakehouse. After some trial and error, we added the security group containing the SP to the workspace as a Contributor, and the scan started working correctly.
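If you prefer to script the role assignment rather than click through the workspace settings, the following is a minimal sketch of how it might look. It assumes the Fabric REST API's workspace role assignment endpoint (POST /v1/workspaces/{workspaceId}/roleAssignments) and its payload shape as I understand them, so verify both against the current Fabric API reference before relying on this. The workspace ID, group ID, and access token are placeholders, and acquiring the token is not shown.

```python
import json
import urllib.request

# Assumption: base URL of the Fabric REST API; confirm in the official docs.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_role_assignment(workspace_id, principal_id,
                          principal_type="Group", role="Contributor"):
    """Return the (url, payload) pair for a workspace role assignment.

    principal_type "Group" matches our setup, where the SP sits inside
    a security group; use "ServicePrincipal" to add the SP directly.
    """
    url = f"{FABRIC_API}/workspaces/{workspace_id}/roleAssignments"
    payload = {
        "principal": {"id": principal_id, "type": principal_type},
        "role": role,
    }
    return url, payload


def add_role_assignment(token, workspace_id, principal_id, **kwargs):
    """Send the request; the caller must be allowed to manage workspace access."""
    url, payload = build_role_assignment(workspace_id, principal_id, **kwargs)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Building the request separately from sending it lets you inspect the URL and payload before anything touches your tenant. After the assignment is in place, rerun the Purview scan to confirm the Lakehouse tables appear.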

Key Takeaways for Now

Until Microsoft enables deeper integration between Purview and Fabric Lakehouse, here is what you should keep in mind:

  1. Scan Rule Sets Cannot Be Customized for Lakehouse Yet
    Stick with default rules and scope at the workspace level. Fine-tuned classification is not yet supported.
  2. No Auto Classification on Lakehouse Tables
    Manual classification or external enrichment may be required until this limitation is addressed.
  3. Grant Proper Permissions to the SP
    Ensure the Service Principal (or its security group) has Contributor access in the relevant Fabric workspace.
  4. Expect Growing Pains
    These gaps are expected for early-stage integrations. Microsoft has likely already placed this on its product roadmap, so stay tuned.
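For the manual classification in takeaway 2, one possible form of "external enrichment" is a small script that proposes classifications from column names, which a data steward then reviews and applies by hand in Purview. The sketch below is purely illustrative: the rule names and patterns are hypothetical, they are not Purview's built-in classification names, and matching on names alone will miss columns whose names do not hint at their content.

```python
import re

# Hypothetical name-based rules; labels are illustrative only and are
# NOT Purview's built-in classification names.
RULES = {
    "Email Address": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "Phone Number": re.compile(r"phone|mobile", re.IGNORECASE),
    "Date of Birth": re.compile(r"(^|_)dob(_|$)|birth", re.IGNORECASE),
}


def suggest_classifications(columns):
    """Map each column name to the labels whose pattern matches it.

    Columns with no matching rule are left out of the result.
    """
    suggestions = {}
    for column in columns:
        labels = [label for label, pattern in RULES.items()
                  if pattern.search(column)]
        if labels:
            suggestions[column] = labels
    return suggestions


if __name__ == "__main__":
    # Example column names; replace with the columns of your Lakehouse table.
    columns = ["customer_email", "order_id", "MobilePhone", "birth_date"]
    for column, labels in suggest_classifications(columns).items():
        print(f"{column}: {', '.join(labels)}")
```

Treat the output as a review list, not as a classification result: it only narrows down which columns a person should look at first.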

Final Thoughts

Microsoft Fabric is moving fast, and Purview’s integration with it is improving, but it is not fully there yet for Lakehouse governance. If you are setting up scans and classification, avoid over-engineering Scan Rule Sets for now (there is no alternative anyway) and double-check SP permissions in your Fabric environment.
