What is a Modern Data Warehouse?
A Modern Data Warehouse is, in simple terms, a way to streamline data workflows to make them more efficient. Key to the Modern Data Warehouse is, of course, technology enhancements; however, this is not the only thing that makes a Data Warehouse modern. It is also about the processes and capability employed to deliver the insights, as the changes in skills and data development approaches are equally, if not arguably more important in what a Modern Data Warehouse is.
Common features of a Modern Data Warehouse
On the basis that a Modern Data Warehouse is made up of Technology, Processes and Capabilities, I would suggest some of the common features are:
- Move to Cloud – Hybrid/IaaS as minimum, however PaaS is truly what makes it modern.
- The use of OLAP cubes only where there is a clear rationale and justification – understanding that High Performance computing has alternative approaches for certain use cases.
- Modern Development Operations Process – using Source Control and CI/CD release processes and workflows.
- An ability to utilise multiple data workloads – rather than just using one type of data pathway/ETL approach.
- Ability to utilise more agnostic data skills – remove the need to have specific Technology related capabilities.
- Inclusion of Data Science and comprehensive Analytics – Machine Learning and AI capabilities.
Focusing on the Technology aspect, I would like to expand one particular pattern that could make up the Modern Data Warehouse: Big Data.
Over 15 years ago, the way we talked about data analytical platforms was simple: a standard Batch Data Warehouse approach and an emerging “Big Data” approach. Many organisations utilised the emerging Big Data approach, as the technology that would support this was mainly HDFS and Hadoop.
This batch approach often used a Data Warehouse, which took data from several sources with various degrees of structure and quality, cleaned it and modelled it in a way that’s super simple for reporting and has some descriptive and historical analysis.
The Big Data approach (aka Data Lake) utilises both High Performance Computing on top of unstructured and raw data normally stored in a partition storage. This approach is different from the traditional Data Warehouse approach as it creates flexibility and efficiency in development and performance.
A Modern Data Warehouse can take the form of what is commonly known as a Data Lakehouse*, which combines Partitioned Storage as the Landing and Staging area with High Performance Computing (normally in the form of a distributed database), to prepare and model the data for reporting consumption. Combining the benefits of Big Data and Traditional Data Warehousing is one technical aspect that can make up a Modern Data Warehouse.
Why use a Modern Data Warehouse?
Well, the answer lies in value for effort and money as well as the ability to deliver high quality insights to users in a more timely way.
Many traditional data warehouse approaches will process all data in the same way, regardless of the frequency of use, volume, quality, and particular use case. In other words, a ‘one size fits all’ approach. This drives a potential imbalance between the cost and benefit of processing the data. The costs involved are not just those of hardware and software and processing time, but also development time and the opportunity cost of not having the information to make the best decision for the business. The Modern Approach utilising High Performance Computing allows for flexibility in how we deal with different workloads based on their attributes. This ultimately means saving time and money.
The old approach to a Modern Data Warehouse
Let’s look at an example. At Simpson Associates, we are Microsoft Gold Partner and have deployed many Modern Data Warehouses involving Microsoft Azure and Power BI. In the past we may have utilised Microsoft SQL Server Integration Services to extract data, transform and load it from the source through to a dimensional model in either SQL Server or Analysis Services. Then present the data in SQL Server Reporting Services.
The new approach to a Modern Data Warehouse
The new approach could be to use Microsoft Data Factory, Azure Integrate or indeed Azure Synapse to ingest data and store in Azure Data Lake. We wouldn’t attempt to reduce the data that’s ingested, stored, and consolidated as the storage is relatively inexpensive and would allow:
- Only data which needs to be processed into a Data Warehouse Model to be processed in this way.
- Alternative workloads such as Azure Machine Learning to be able to use the Azure Data Lake without needing the data processed through a Dimensional Model.
- The ability to quickly react to additional Reporting Requirements in the Model without having to go back to the original source, thus improving the reporting Lifecyle timescales.
Again, this solution offers flexibility with the fact that alternative patterns can be used, but also the batch processes can use more performant and modern computer capabilities to deliver insights quickly and in a more cost-effective manner.
This is just a bit of an insight into what we, at Simpson Associates, would consider a Modern Data Warehouse. Of course, there are many aspects to what, why and how this would work in your organisation, which is probably the most important point that is mentioned through this blog; flexibility. In my opinion, a Modern Data Warehouse is one pattern or approach designed to deliver the goals of an organisation, dependent on where you are in your data journey. It’s easy for others to comment on whether or not a simple migration of an existing on-premise solution to something like Azure is truly modernising the platform, without knowing the ins and outs of your particular situation. If it delivers on current objects of the Data Warehouse and enables your organisation’s future corporate, data strategy and road map to be realised through flexibility, I suggest that is a Modern Data Warehouse.
*I have a love/hate relationship with the water-based naming conventions that are common in data vernacular, but I would like to take this opportunity to coin the phrase data Wharf. A Wharf is a warehouse at a river front, so a Data Wharf could refer to a Data Warehouse being supplied directly by event based streamed data.
Blog Author
Andrew Edge, Business Development Manager, Simpson Associates
Don’t forget to join us on Linkedin