By Simon Fryett
The world of Business Intelligence (BI) has been evolving rapidly in recent years, and the pace has accelerated with the advent of new reporting tools and applications, rising expectations, and the demands of Self Service, Cloud and Artificial Intelligence (AI).
For many years the traditional BI architecture has been based around a large, established reporting toolset connected mainly to a Data Warehouse, plus perhaps a small number of other supporting sources (Operational Data Stores, or ODSs), normally for operational reporting and data not destined for the warehouse. This environment is managed and controlled centrally by the BI team, with access to the DW only via the reporting tools. Self Service is generally avoided, with extracts provided by the BI team preferred instead.
The traditional architecture and tools cannot meet these new and evolving requirements, which has resulted in other tools and applications being introduced and supported outside of the central reporting tool. The needs of AI and other analytics approaches have increased the demand for customised data sets, putting a greater strain on the BI team.
Modern BI Requirements:
- Quick, tactical reporting tools that are easy to use and visually appealing, often favoured by business users (for example Tableau and Qlik).
- Real-time dashboards and graphical analytics imagery are becoming increasingly popular (the UK Government's Covid reports are one example).
- Data Scientists and AI workloads need access to data that is often broader than the traditional DW. They often take a prototype approach to analysis, trialling data sources that may not be needed after the initial analysis.
- Real-time reporting is becoming more important, often aligned with digital transformation and the need for greater agility in business decisions.
- Broader self-service access to data by different areas of the business is in demand, with users not wanting to wait on IT to provision data sets, as gaining secure access often takes weeks or months.
- Many new data sources and types are growing rapidly, encompassing formats such as XML, JSON and CSV, from both on-premises systems and cloud storage such as Amazon S3 or Azure storage.
- Data Lake support and access, along with the many data types that may reside within them.
- Data Governance and compliance are now becoming integrated into BI.
Challenges of meeting modern BI requirements:
This has meant a growing demand for the provision of data sets for self-analysis in the business. This growing ecosystem of siloed reporting tools becomes increasingly hard and time-consuming to manage and control. It also leads to inconsistency in results as each tool harvests and presents its own version of the truth from its own identified sources, resulting in time-consuming reconciliation and justification of differing outputs.
For years, self-service access has been generally avoided by the BI function for good reason, but this results in delays in getting information to the users who need it and brings an increasing workload on the teams as requests grow. Modern analytics techniques that are discovery based can require quick, successive data requests as insights are found; having to wait days or weeks is not practical. It is becoming more and more difficult for requests for self-service access to be denied. The reasons for resisting, such as users not understanding which data to retrieve, security / data protection, and load on source systems, need to be addressed.
All these different methods of access produce a data security and access problem as each tool has its own direct access to information. Attempting to log access for regulatory purposes and to ensure users can consistently only access and see data they are authorised to or is not sensitive becomes increasingly challenging.
The very issues we tried to avoid with a data warehouse solution have come back as the requirements of the business have evolved. These challenges are not going to disappear, and it is unlikely that a single tool will satisfy such varied requirements. Self-service is going to have to be accepted, as it will be too time-consuming and slow not to do so.
The new challenge is how you manage the inevitable multiple reporting and analysis tools in your environment.
Many organisations find themselves caught in the headlights of the challenge of providing 'modern Business Intelligence (BI)' tools and solutions, incorporating elements such as AI and controlled self-service, to end-users who want and need to use familiar tools, while continuing to comply with increasing data protection regulation, secure sensitive data, and ensure consistent and timely access to data for analysis or reporting.
The Solution: A Modern Data Platform:
All of these new challenges can be addressed by a modern data platform that provides a managed and curated single point of access to all of your data sources for reporting or analysis, be it your Data Warehouse, real-time operational systems, weblogs or other externally sourced data such as social media feeds. This approach provides an agile and secure architecture for delivering the information needs of your business, whether provisioned or managed self-service.
With a data platform, you access all of your source systems via the platform, using data virtualisation to reach the data in source systems as needed. The data catalog and data protection layers should be integrated to provide access and data protection based on user / role and data policy. This approach ensures that different tools accessing the same data are subject to the same data protection rules, providing consistency and security. As everything passes through a single point, access can be centrally controlled and monitored.
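As a minimal sketch of this single-point-of-access idea (all table names, roles and policies here are hypothetical, not a particular vendor's API), every query passes through one entry point that consults the catalog's policy for the caller's role before touching the underlying, virtualised source:

```python
# Hypothetical catalog: per-table policy describing who may read what.
CATALOG = {
    "customers": {
        "sensitive_columns": {"email", "date_of_birth"},
        "allowed_roles": {"analyst", "marketing"},
    }
}

# Stand-in for a virtualised source system surfaced as a simple table.
SOURCE = {
    "customers": [
        {"id": 1, "name": "Ann", "email": "ann@example.com", "region": "UK"},
        {"id": 2, "name": "Bob", "email": "bob@example.com", "region": "FR"},
    ]
}

def query(user_role, table, columns):
    """Single point of access: authorise the role, then strip sensitive columns."""
    policy = CATALOG.get(table)
    if policy is None or user_role not in policy["allowed_roles"]:
        raise PermissionError(f"role {user_role!r} may not read {table!r}")
    visible = [c for c in columns if c not in policy["sensitive_columns"]]
    return [{c: row[c] for c in visible} for row in SOURCE[table]]
```

Because every tool calls the same `query` path, an unauthorised role is refused and sensitive columns never leave the platform, regardless of which reporting tool asked.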
With this security and protection "baked in" to the platform, self-service becomes viable, as user- and data-based restrictions can be applied in real time, stopping users from accessing sensitive or prohibited data or taking too much. To support self-service, a simple and intuitive data preparation tool is required that enables non-technical users to collate, join and prepare data for their own use. This is not an ETL tool, but something simpler and stripped back, with an interface no more complicated than a reporting tool.
- The data virtualisation layer enables data from disparate and varying data sources to be surfaced as simple relational database tables, which means the reporting and analysis tools don't have to support all the different types of data required. As new data sources arise, the complexity of accessing or converting them is dealt with once, in the platform, future-proofing your investment in your reporting / analytics tools and prolonging the viability of your legacy and existing tools, allowing them to sit alongside the new ones. It also means you can seamlessly change the source of data without the consuming systems being aware, so databases or operational systems can be changed or moved without changes to multiple systems.
Role-based restrictions can be applied in terms of the number of rows and columns visible. Integration with the data protection module provides further security for sensitive data by redacting or obscuring it in real time, based on your access permissions.
- A central Data Catalog provides a single source of metadata and definitions. It should also support business terms and data lineage, so users can shop for data by business terminology or by technical metadata definitions. The catalog should also be the source of all data policies and data protection rules, which are linked to and govern access to the underlying data via the business and technical metadata assets. Lineage shows how data is processed from source to report or analytics tool, simplifying reconciliation and impact analysis.
- Data profiling, automatic data discovery and data rules ensure that the data quality of your critical data assets is known, monitored and managed. Within an integrated data platform these activities and their results are linked to the metadata in the catalog, giving users of the data a view of the quality and trustworthiness of the data they are using.
- Data protection should be managed and configured in the data catalog based on data policies and then assigned to the assets it needs to govern. Rules should be user / role based, and applicable to both company-sensitive and personally sensitive data as determined by data policies and regulation. They should be applied in real time and either redact, obfuscate or remove protected data based on policy.
- Self-service / data wrangling capabilities to prepare, enrich and store data for your own use, and for others should you wish to share it. This lightweight data preparation tooling enables power users to manipulate and enrich their own data in a controlled and protected environment, creating repeatable processes that can be scheduled to run periodically and whose results can be shared with others.
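The real-time redaction and obfuscation described above could look like the following sketch (the policy actions, column names and the privileged "steward" role are all illustrative assumptions, not a specific product's behaviour):

```python
import hashlib

# Hypothetical per-column protection policy drawn from the catalog:
# "redact" removes the value entirely; "obfuscate" replaces it with a
# deterministic hash, so the column can still be joined on without
# exposing the raw value.
POLICY = {
    "email": "redact",
    "name": "obfuscate",
}

def protect(row, role):
    """Apply column-level protection in real time, based on the caller's role."""
    if role == "steward":  # hypothetical privileged role that sees raw data
        return dict(row)
    out = {}
    for col, val in row.items():
        action = POLICY.get(col)
        if action == "redact":
            out[col] = "***"
        elif action == "obfuscate":
            out[col] = hashlib.sha256(str(val).encode()).hexdigest()[:8]
        else:
            out[col] = val
    return out
```

Because the policy lives in one place, every tool reading through the platform sees the same redacted or obfuscated values for the same role.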
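The data rules mentioned above can be as simple as named predicates run against a data set, with the pass rate per rule published back to the catalog so users can judge how far to trust the asset. A minimal sketch, with invented rule names and sample rows:

```python
def check_rules(rows, rules):
    """Run simple data-quality rules over a data set; report a pass rate per rule."""
    report = {}
    for name, rule in rules.items():
        passed = sum(1 for row in rows if rule(row))
        report[name] = passed / len(rows)
    return report

# Hypothetical sample data and rules for illustration only.
rows = [
    {"id": 1, "email": "ann@example.com", "age": 34},
    {"id": 2, "email": "not-an-email", "age": -5},
]
rules = {
    "email_has_at": lambda r: "@" in r["email"],
    "age_positive": lambda r: r["age"] >= 0,
}
```

Here `check_rules(rows, rules)` would report a 50% pass rate for each rule, flagging the asset as one to treat with caution.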
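The "repeatable processes" in the self-service bullet amount to chaining simple preparation steps into one named pipeline that can be re-run on a schedule. A sketch under that assumption (the step names and fields are hypothetical):

```python
def pipeline(*steps):
    """Chain prep steps into one repeatable process that can be re-run later."""
    def run(rows):
        for step in steps:
            rows = step(rows)
        return rows
    return run

# Steps a power user might define in a lightweight prep tool:
uk_only = lambda rows: [r for r in rows if r["region"] == "UK"]
flag_high_spend = lambda rows: [{**r, "priority": r["spend"] > 100} for r in rows]

# A named, shareable, schedulable preparation process.
prepare_uk_customers = pipeline(uk_only, flag_high_spend)
```

Because the pipeline is just a named object, it can be saved, scheduled to run periodically, and its output shared with other users, rather than the preparation being redone by hand each time.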
It is critically important that all of these capabilities are fully integrated into a single architecture, with each layer using and supporting the others. Without this tight integration you don't have a data platform; you just have a selection of loosely coupled data tools that will need time-consuming and costly integration (if it is possible at all) and will not be able to provide the security and agility of a single platform.
Beware: many vendors claim their data catalog, data warehouse or data virtualisation tool is a data platform, but the reality is that if they don't have all the capabilities above fully integrated, they are not going to be able to deliver on the data platform promise.
We at SmallNet have spent 21 years helping our clients solve data integration and analysis challenges, building data architectures to support all of their data and reporting needs.
If you want to know more about how to modernise your BI architecture, or about any of the items discussed above, then please get in touch. We won't bombard you with PowerPoints or white papers; we have real demonstration solutions set up to show you the software directly.