In 2003, I was working for a global pharmaceutical company in the UK, helping them build their enterprise data warehouse. We had nine different subject areas, and it took this customer four years to implement the data warehouse.
Even in the pilot phase of the project, we were asked to load 100,000 rows of research data into the data warehouse. The data warehouse was supposed to store and process 10 terabytes of data in production. With the legacy technology we were given to work with, we ended up testing the limits of the chosen platform to ensure we got what we wanted.
Sixteen years later, we live in a completely different world. Data storage is very cheap. Cloud solutions promise 99.99% availability, and you can hold terabytes of data on any cloud platform and just forget about the cost. However, the key question is how much analysis you would like to do with your data. The more analysis you want to do, the more computing power you need.
While reflecting on this topic, I realized that in the last six years I have met many CIOs, CDOs and chief architects who keep asking me the following questions while they are modernizing their data ecosystem.
- Do we need to build a data lake?
- Do we still need to have our data warehouse when we already have a data lake?
- Should I move my data warehouse to the cloud?
- What are the key attributes of a modern data warehouse?
If you have any of these questions, this blog is for you. Please note that the data warehouse has risen from its own ashes like a phoenix. But the new data warehouse accommodates the best of all technologies for your digital agenda and your AI/ML strategy. Please review the following four areas of your data warehouse to be sure that you are not building an old-school solution.
Seamless integration with structured, semi-structured and unstructured data
Your new data warehouse should give your development team an easy and seamless working environment across all types of data. Data sources can be structured (e.g. Excel, CSV or relational databases), semi-structured (e.g. JSON, XML, log files) or unstructured (e.g. video/audio feeds). It should be easy to integrate these types of data when you are building your new data warehouse.
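To make the idea of integrating different data types a little more concrete, here is a minimal sketch in Python using only the standard library. It brings a structured source (CSV) and a semi-structured source (JSON Lines with one level of nesting) into the same row shape. The sample data, the function names and the column names (`customer_id`, `country`) are all assumptions made up for illustration; a real data warehouse platform would typically offer its own ingestion tooling for this.

```python
import csv
import io
import json

# Hypothetical sample inputs, invented for this illustration.
CSV_SAMPLE = "customer_id,country\n1,UK\n2,DE\n"
JSON_SAMPLE = '{"customer_id": 3, "address": {"country": "FR"}}\n'


def rows_from_csv(text):
    """Parse CSV text (structured) into a list of dicts keyed by header."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_json_lines(text):
    """Parse JSON Lines (semi-structured), flattening one level of nesting."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        flat = {}
        for key, value in record.items():
            if isinstance(value, dict):
                # Nested object: prefix child keys with the parent key.
                for sub_key, sub_value in value.items():
                    flat[f"{key}_{sub_key}"] = sub_value
            else:
                flat[key] = value
        rows.append(flat)
    return rows


def load_all():
    """Combine both sources into one list of rows with the same columns."""
    combined = rows_from_csv(CSV_SAMPLE)
    for row in rows_from_json_lines(JSON_SAMPLE):
        combined.append(
            {
                "customer_id": str(row["customer_id"]),
                "country": row["address_country"],
            }
        )
    return combined


if __name__ == "__main__":
    for row in load_all():
        print(row)
```

The point of the sketch is only that, once each source is mapped to a common row shape, downstream loading and analysis do not need to care where a row came from. Unstructured data such as audio or video would need a separate extraction step before it could be handled this way.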
Cloud and hybrid deployment options
Your data warehouse should be able to provide you with cloud and hybrid deployment options. A recent Forrester study found that 47% of organizations were increasing their cloud deployments specifically for big data. These new deployment options also make integration with SaaS solutions easy and simple.
Depending on the sensitivity of the data, you can choose which deployment option suits each data domain in your new data warehouse.
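One way to picture this choice is as a simple mapping from each data domain to a deployment option, driven by how sensitive the domain is. The sketch below is purely illustrative: the sensitivity levels, the domain names and the rule "restricted data stays on premises, everything else goes to the cloud" are assumptions, not a recommendation. Your own policy will depend on your regulatory and security requirements.

```python
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical sensitivity levels for a data domain."""
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


def choose_deployment(sensitivity):
    """Assumed policy: restricted data stays on premises, the rest goes to cloud."""
    if sensitivity is Sensitivity.RESTRICTED:
        return "on_premises"
    return "cloud"


# Hypothetical data domains and their assumed sensitivity.
DATA_DOMAINS = {
    "marketing": Sensitivity.PUBLIC,
    "sales": Sensitivity.INTERNAL,
    "clinical_research": Sensitivity.RESTRICTED,
}


def deployment_plan(domains):
    """Return a mapping of data domain name to deployment option."""
    return {name: choose_deployment(level) for name, level in domains.items()}


if __name__ == "__main__":
    for domain, target in deployment_plan(DATA_DOMAINS).items():
        print(f"{domain}: {target}")
```

A hybrid deployment is then simply a plan in which different domains end up with different targets.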
Batch and real-time integrations
Real-time analysis should be feasible on your data when it's needed. Trading and many other businesses need extremely low latency. Your business may likewise want to leverage the power of optimization, prediction and recommendation done in real time to bring you closer to your customers.
Your new data warehouse should be able to equip you with technology that makes all of this possible in real time.
Bundled with advanced analytics and machine learning capabilities
Everyone is talking about machine learning and AI. If you still don't know how you can use their power, think outside the box. You may suddenly realize that the issues most commonly discussed by your customer team or your finance team can be solved by adding a few lines of machine learning code in the right place. Your modern data warehouse solution should provide AI/ML capabilities anywhere, anytime to help you make critical decisions.
In summary, if you are planning a new data warehouse or revamping your old one, please look for all the new opportunities that modern technology offers. Think long term and design a data warehouse that will cater to all the current and future needs of your business.