Data...What? Whatever You Call It, Stay Away From a Data Mess!
Data Warehouse, Data Lake, Data Mesh, Data...What? If you are confused by the proliferation of these IT terms, allow us to shed some light.
Let's face it. We IT people love buzzwords. Sometimes we think the quality of a solution improves with the quantity of flashy words or acronyms used to define it. And we couldn't be more wrong! Each day we must learn a new word or acronym to stay up to date. But are things really changing that much at their core? Or does the old mantra "the more things change, the more they stay the same" still rule our world? Let's dig a little deeper into that.
Setting the background for data architecture
In the '80s, we saw a new paradigm emerge -- the Data Warehouse architecture -- built to consolidate data from different systems into a reporting infrastructure that served business users, while also saving some dollars by offloading processing from expensive mainframes and transactional platforms. As time went on and business requirements grew more complex, the Data Warehouse also evolved into a more mature and robust architecture serving more and more use cases, from batch reporting to complex near-real-time event processing. We at Teradata have been advocating this for decades, as depicted in Figure 1.
Figure 1. Data Warehouse evolution
The '90s and 2000s brought an increased focus on the speed and complexity of the data already used for analytics, and on enriching it with non-traditional sources (like web application logs or CRM outputs). Operational Intelligence was a key theme that made Data Warehouses more "active," with intra-day or even intra-hour data loads alongside the standard daily batches. There was also a shift from "looking in the mirror" to "looking ahead," with predictive analytics as a key enabler of business decisions. As Teradata's CTO Stephen Brobst used to say, "It makes no sense driving a car by only looking at the rear mirror."
Although you may view this as "old school" or legacy, we need to know where we came from to plan where we are going. We experienced a prosperous era of Data Warehouses delivering value to business users, with relational databases like Teradata dominating the market and bringing lots of business value and customer satisfaction to the world. On the dark side, we also saw failed implementations that were unable to deliver much value because they were too "IT-centric," with little connection to business needs. The lesson learned was that architecture deployments must be aligned with, and focused on, meeting business needs. Spoiler alert: does this sound familiar today? We'll get back to this later.
The “Data Something” architecture issue
Over the years we've seen a huge explosion in data volume from numerous data sources, some producing enormous numbers of records at high speed. This variety and volume of information introduced new requirements for capturing and curating it. New ways of consuming information extended the need for real-time processing, along with other challenges such as higher concurrency and more complex data structures. Congratulations! We just entered the Big Data Era! Buckle up, this is going to get bumpy.
The architecture principles that we'd seen for decades started to shift to a different paradigm. We realized that a one-size-fits-all, multi-purpose platform wasn't enough, and suddenly we saw the birth of the hybrid analytical ecosystem -- the "Logical Data Warehouse" or the "Enterprise Data Hub," among many other names (depending on which analyst firm's lexicon you prefer). Going one step further, we met the new kid on the block: the distributed file system, powered especially by the Apache Hadoop project. This became the great enabler of a bright new architecture pattern: the Data Lake.
Figure 2. Reference Information Architecture
Data Lakes became the de facto standard for many analytical ecosystems, to the point that if you weren't building a Data Lake, your analytical ecosystem wasn't cool and you weren't perceived as modern. We saw the great and mighty Silicon Valley players building their Data Lakes and contributing new tools and knowledge to the ecosystem. But guess what? Most implementations failed (the thinking was "if we build it, they will come," but they didn't), although not because of the design pattern itself. Most Data Lakes were built without any clear business purpose and without proper data management practices. They were "cheap" and "good enough," based on open source software running on commodity hardware. No matter what companies thought, even open source and commodity hardware have a cost and take a huge amount of effort to deliver real business value. We may agree that some social network and content streaming companies, among others, have succeeded on this journey, but the vast majority of companies do business and build teams very differently from the Valley companies. So, no single recipe works for everyone.
Figure 3. Data Integration on an Analytical Ecosystem
Fast forward to the present: emerging from the swampy mud of the Data Lakes, we see that Data Warehouses are still playing their part. Wait… to make things more modern, we also see new frameworks rising. We've started to hear about the Data Lake House and the Data Mesh, among a few others.
Data is the air in the wheel
We are living in a highly competitive era, with a marketplace full of great data solutions (and not-so-great ones), but let's focus on what is important: the data. Sadly, many solution providers try to differentiate themselves by adding new spokes to the data wheel, trying to shake up the status quo without much real substance. Is change bad? Certainly not. We advocate and embrace change to enable innovation. But remember that data is what powers the wheel of progress. We can improve the efficiency and modernize the design of the wheel to make it safer and adapt it to different environments or weather… but in the end, it's still a wheel. Well-architected data is what inflates the wheel, enabling your business to drive to new heights safely and with the fewest bumps in the road.
Data what? Data strategy!
That's the word you should be looking for. Well, in fact, those are two words. No matter how much noise is being made outside, you need to keep looking at your business requirements and build your data strategy around them. What data will bring the most value? Which sources already exist and could complement this new initiative? How can we prioritize them? What funding will be available to accomplish this? Can we guarantee a good ROI to secure more funding? If we build this first, can we start getting value faster? Once we answer those questions, we can apply architecture patterns to make the most intelligent use of the company's technological and monetary resources. Based on the value of the data and the characteristics of the required analytics, we can find a good fit among different architecture patterns. We may be able to reuse what has already proved useful. There's no need to stick to a single pattern; instead, embrace what is useful in each one.
Figure 4. Teradata Analytical Roadmap
It's not the other way around. Trying to fit the data, the analytics, and even the business decisions into an architecture pattern just because it sounds cool is a mistake. Tying the company's architecture strategy and decisions to "just a word" is quite dangerous.
Remember, in the end, a proper data strategy will deliver more business value than chasing a shiny new architecture buzzword. Oh, by the way, how many buzzwords have you spotted in this article?