The Background
Back in 2010, the following players comprised the upper-right "Leaders" section of Gartner's Magic Quadrant for BI and Analytics Platforms: Oracle BI, Microsoft Reporting Solutions, IBM Cognos Analytics, SAS, MicroStrategy, and SAP. Today in 2022, these often get labeled with the "legacy" moniker and are said to have made way for the latest generation of tools such as Microsoft Power BI, Tableau, Qlik, Looker, Domo, ThoughtSpot, Sisense, TIBCO Spotfire, and more.

This latest generation of business intelligence tools, often branded as "data visualization" and "self-service" tools, marked a notable shift from the previous generation by leveraging low/no-code environments and user-friendly drag-and-drop configurations to create more visual reports and dashboards that better engage analysts and surface more insightful findings. This shift was concurrent with, and influential in, the democratization of data away from just IT developers and into the hands of the business analysts who were closer to the meaning of the data and the story that needed to be told to functional leaders.
The Problem Statement
The adoption of tools by data citizens outside of IT has helped ensure the world runs on data and effective story-telling, but this has also come at a cost: for the better part of a decade, tools like Power BI and Tableau lacked the functionality to be easily integrated into enterprise-grade solutions with the proper level of data governance, security, and support for modern DevOps approaches like continuous integration and continuous deployment (CI/CD). As a result, it has been easy for business users and units to develop standalone point solutions, but these eventually become isolated within the organization, difficult to maintain, and difficult to reconcile with sources of truth from other business units' reporting solutions.

Even today, it's not easy to develop solutions with Power BI, Tableau, and others at scale. Things have gotten better, though, and as these tools become ever more commonplace, the analytics community needs more architecture and development process patterns that can be leveraged to build and scale enterprise-grade solutions.
The aim of this post is to help share the DevOps patterns and processes we recently leveraged when building an Azure-based end-to-end modern data solution with Power BI reporting for a large client in the Consumer Packaged Goods (CPG) industry.
An Example Solution Architecture and DevOps Processes
Our solution architecture leveraged data sources ingested into an Azure Synapse "lakehouse" architecture and fed to a consumption layer comprised of Azure Analysis Services (AAS) and Power BI. The Power BI "stack" included development in Power BI Desktop, leveraging DirectQuery to multiple AAS cubes (often referred to as "composite models"), publishing to Power BI Service workspaces, and surfacing via workspace apps for end-user consumption. Note: although this solution leveraged AAS cubes, a solution hosting the same semantic layer in Power BI datasets on Power BI Service with XMLA endpoints enabled can be considered logically equivalent, and would be preferred in future deployments.

To manage the Power BI solution as part of an end-to-end, modern software/analytical solution, we leveraged the following DevOps architecture and processes:
- Source Control via Azure Git Repos. Here, we stored Power BI files (PBIXs) connected to TEST databases. We briefly entertained the notion of version control via OneDrive, but ultimately wanted the Power BI development team to operate with the same general tools and principles as the rest of the software and data engineering development team.
- Backlog Management and User Story Definition via Azure DevOps. Here, we leveraged agile principles and user-story-writing best practices to ensure well-structured and well-defined work items. More than this, we leveraged "tags" to more easily conduct Sprint Planning and work assignments, ensuring that a single PBIX was worked on by only one developer at a time: a necessity, given the inability to perform "merges" in a tool like Power BI. This rule can even be audited programmatically, as sketched below.
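  Because each story carries the name of the PBIX it touches as a tag, the "one developer per file" rule can be checked with a query. A minimal sketch using the Azure DevOps WIQL REST API; the organization, project, and token values are placeholders:

  ```python
  # Sketch: list open work items tagged with a given PBIX name, assuming a
  # personal access token (PAT) with work-item read scope.
  import requests

  ORG, PROJECT = "contoso", "cpg-analytics"  # hypothetical org/project names
  PAT = "<personal-access-token>"            # placeholder credential

  def open_stories_tagged(pbix_name: str) -> list[int]:
      """Return ids of non-closed work items tagged with the PBIX name."""
      wiql = {
          "query": (
              "SELECT [System.Id] FROM WorkItems "
              f"WHERE [System.Tags] CONTAINS '{pbix_name}' "
              "AND [System.State] <> 'Closed'"
          )
      }
      resp = requests.post(
          f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/wit/wiql?api-version=7.0",
          json=wiql,
          auth=("", PAT),  # Azure DevOps accepts an empty username with a PAT
      )
      resp.raise_for_status()
      return [item["id"] for item in resp.json()["workItems"]]

  # Flag a planning conflict if more than one open story touches the same file.
  if len(open_stories_tagged("dashboard.pbix")) > 1:
      print("Conflict: multiple open stories tagged 'dashboard.pbix'")
  ```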
- TEST / PROD Workspace Pattern on Power BI Service. We employed a pattern of only publishing PBIX files connected to TEST databases in the TEST Power BI workspaces. Similarly, PBIX files promoted to PROD workspaces would always have connection strings updated to point to PROD databases (a rebind that can be scripted, as sketched below). Note: we did not leverage a DEV environment for Power BI development due to the lack of meaningful data volume and data quality in DEV databases. In addition, Power BI Desktop can generally be considered a "DEV" environment of sorts, so this may be a preferred and more efficient pattern regardless of DEV data quality. As a result, an important distinction to be aware of is that all development in Power BI Desktop was completed against TEST databases. This may seem counter-intuitive; however, we found it to be the best balance given our level of data quality. Completing Power BI development against PROD data may seem desirable, but it would require new back-end features to be released into PROD before front-end developers could leverage them. We found this to be slow and undesirable, especially as front-end exploration was often necessary to confirm and validate that back-end features were working properly before they could be promoted to PROD.
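  The TEST-to-PROD connection string update can be performed via the Power BI REST API rather than by hand in Power BI Desktop. A hedged sketch using the UpdateDatasources endpoint; the workspace/dataset ids, AAS server names, and token acquisition are all placeholders:

  ```python
  # Sketch: repoint a published dataset's AAS connection from TEST to PROD.
  import requests

  TOKEN = "<aad-access-token>"          # assumed acquired via MSAL or similar
  WORKSPACE_ID = "<prod-workspace-id>"  # placeholder GUIDs
  DATASET_ID = "<dataset-id>"

  body = {
      "updateDetails": [{
          "datasourceSelector": {  # which existing connection to replace
              "datasourceType": "AnalysisServices",
              "connectionDetails": {
                  "server": "asazure://region.asazure.windows.net/aas-test",
                  "database": "SalesModel",
              },
          },
          "connectionDetails": {   # the PROD connection to swap in
              "server": "asazure://region.asazure.windows.net/aas-prod",
              "database": "SalesModel",
          },
      }]
  }
  resp = requests.post(
      f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
      f"/datasets/{DATASET_ID}/Default.UpdateDatasources",
      json=body,
      headers={"Authorization": f"Bearer {TOKEN}"},
  )
  resp.raise_for_status()
  ```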
- Short-Lived "Development" Files and Branches. In traditional applications of version control, a branching strategy is employed to govern work being completed by many developers on the same software solution in parallel. In Power BI and other most data visualizations tools, this is not possible due to the inability to compare PBIX files at a line-by-line level to identify differences between the two files. Instead, a process is required to ensure a merge of a "development" branch into a "main" branch introduces the correct features and does not "blow away" any other features in the process. To address this, we implemented a naming convention process for "development" files/branches that appends the User Story Number from the Azure DevOps Backlog to the PBIX title. For example: "dashboard_001_002.pbix" would be the short-lived working title for "dashboard.pbix" that exists in the PROD and has User Story #001 and #002 as requested features from the business. This allows the publishing of "development" files to the TEST and PROD workspaces to verify/QA features without over-writing the "official" file in PROD that end-users are accessing. Once features are verified/validated/QA'd and the file is ready to be merged into the main branch, it is retitled as "dashboard.pbix" and a Pull Request created and completed.
- Development and Testing with Files (and not just Workspaces). In reading that no "DEV" workspace exists, it may be difficult to reconcile traditional approaches to "development" and "testing". Similarly, in reading that a "TEST" workspace exists in the solution architecture, it may be natural to assume that User Acceptance Testing with business/end users is conducted from that workspace. This would be an incorrect assumption! With this framework, we shift our thinking of development, testing, and UAT toward how we leverage specific PBIX files, not specific workspaces. "Development" occurs within Power BI Desktop by the primary developer in a file titled as described in the previous section ("dashboard_001_002.pbix"). The developer should test their features within Power BI Desktop and, when ready, publish the "development" file to Power BI Service. As described earlier, Power BI Desktop development is conducted against TEST databases. Therefore, for a Power BI developer to properly "test" their features, they need to promote the file "dashboard_001_002.pbix" to the PROD workspace to see it with PROD data. Here, the developer should complete their own "testing" of the new features, and also solicit at least one colleague to perform a secondary round of "testing" (validating against the well-written User Story Acceptance Criteria). Once "testing" is complete, we have confidence that the "development" file can be merged into the "main" branch and file. You'll notice that both the TEST and PROD workspaces were leveraged to complete what we would normally consider the "testing" process. Publishing the "development" file can also be automated, as sketched below.
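  Uploading a "development" file to a workspace can be automated with the Power BI Imports REST API, which is useful when repeatedly promoting "dashboard_001_002.pbix" for testing. A minimal sketch; the workspace id and token are placeholders:

  ```python
  # Sketch: upload a PBIX to a workspace, overwriting any same-named content.
  import requests

  TOKEN = "<aad-access-token>"     # assumed acquired via MSAL or similar
  WORKSPACE_ID = "<workspace-id>"  # placeholder GUID

  def publish_pbix(path: str, display_name: str) -> str:
      url = (
          f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}/imports"
          f"?datasetDisplayName={display_name}&nameConflict=CreateOrOverwrite"
      )
      with open(path, "rb") as f:
          resp = requests.post(
              url,
              headers={"Authorization": f"Bearer {TOKEN}"},
              files={"file": f},  # multipart/form-data upload of the PBIX
          )
      resp.raise_for_status()
      return resp.json()["id"]  # import id, which can be polled for status

  publish_pbix("dashboard_001_002.pbix", "dashboard_001_002")
  ```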
- User Acceptance Testing with Files (and not just Workspaces). We can extend the discussion in the previous section to formulate an approach to User Acceptance Testing with a broader group of business power users. When a dashboard or set of features is ready for User Acceptance Testing, the main branch file can be retitled "dashboard_UAT.pbix" and promoted to the PROD workspace. Here, it will, of course, be connected to PROD data, which is necessary for User Acceptance Testing. As feedback is received and bugs or new feature requests are raised, the normal processes described above can be followed. Let's imagine Work Item #003 is a new User Story that emerges from User Acceptance Testing, and Work Item #004 is a bug. The Power BI developer will create a new "development" file titled "dashboard_003_004.pbix" and complete the normal development and testing processes described. After it has been merged into the main branch, a file with the title "dashboard_UAT.pbix" can simply be published and promoted again, over-writing the previous version. This has the important benefit of ensuring the PBIX and URL that business users are accessing never change or get taken down during the User Acceptance Testing process. Once User Acceptance Testing is completed, the "dashboard_UAT.pbix" file can be removed, and the proper "dashboard.pbix" can remain the only relevant file in the PROD workspace and app.
- Workbook Promotion via Power BI Deployment Pipelines. Using PBI Deployment Pipelines as the sole method to promote content into the PROD workspace and over-write end-user-facing content ensured the level of governance needed for a solution of this size, and also reduced the manual errors introduced when developers change connection strings in PBI Desktop and forget which database environment is active when publishing to source control or to Power BI Service. Notably, when we first built the solution, we leveraged Azure DevOps Deployment Pipelines to deploy PBIX files. This allowed the entire set of end-to-end solution artifacts to be versioned and released together with a single pipeline. However, it became clumsy and slow for front-end Power BI deployments to be dependent on back-end solution builds and releases, which could take 5-10 minutes on their own. This became particularly frustrating when numerous minor or cosmetic Power BI changes needed to be made rapidly or in relatively quick succession. It became clear that a standalone pipeline for Power BI deployments would be a much better solution. In addition, because the newly released DirectQuery data connections were compatible with nothing but PBI Deployment Pipelines, the switch was made, and we happily never looked back. Deployments can also be triggered programmatically, as sketched below.
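  For completeness, the same promotion can be triggered outside the Power BI portal via the Deployment Pipelines REST API, e.g. from a release task. A sketch of deploying everything from the TEST stage to PROD; the pipeline id and token are placeholders, and the stage order assumes the standard Dev/Test/Prod layout:

  ```python
  # Sketch: deploy all content from the TEST stage (order 1) to PROD (order 2).
  import requests

  TOKEN = "<aad-access-token>"   # assumed acquired via MSAL or similar
  PIPELINE_ID = "<pipeline-id>"  # placeholder GUID

  body = {
      "sourceStageOrder": 1,  # 0 = Development, 1 = Test, 2 = Production
      "options": {
          "allowCreateArtifact": True,
          "allowOverwriteArtifact": True,
      },
  }
  resp = requests.post(
      f"https://api.powerbi.com/v1.0/myorg/pipelines/{PIPELINE_ID}/deployAll",
      json=body,
      headers={"Authorization": f"Bearer {TOKEN}"},
  )
  resp.raise_for_status()
  print("Deployment operation id:", resp.json()["id"])
  ```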