The Road Ahead: Integrating Amazon S3 and Azure Blob into Teradata Vantage
Carrie Ballinger discusses the future state of Teradata's native object store and what it means for the Vantage platform
Do your closets at home ever get cluttered? Is your garage or storage locker stuffed with yesterday’s projects, previous years’ tax returns, books you own but haven’t read, and other belongings that you don’t use but just can’t bear to give away?
Just like cramming that exercise bike you’ve stopped pedaling into your garage, you may have turned to Amazon Simple Storage Service (S3) or Microsoft Azure Blob storage as a place to keep big stores of data that your business has collected but hasn’t yet figured out what to do with.
Unlike your garage, however, Amazon S3 and Azure Blob aren’t limited by having the equivalent of a fixed ceiling and four walls set in concrete. With Amazon S3 and Azure Blob, you can economically keep vast amounts of semi-structured or unstructured data, such as your web logs, sensor data, or call center transcripts, as they continue to accumulate.
However, getting at and making use of the objects you’ve parked in your Amazon S3 or Azure Blob data lake is typically much more cumbersome than just opening your garage door and looking around. Data lake end users often spend significant time in coordination activities, such as:
- Understanding data lineage
- Extracting data
- Provisioning temporary storage
- Transforming data
- Moving data between tools
- Transporting the data to one or more platforms where it can be analyzed effectively.
That can be a big headache – much more than just a Sunday afternoon diversion.
Consider this: it can easily take days, weeks, or even months from the time you want to explore data to the time you can do so. Because of this time lag, there’s a risk the data will be stale and semi-irrelevant by the time you finally get your hands on it, potentially killing most of the value proposition as well as any time-to-market advantage you may have had when you started.
Fret not, however: Teradata’s Native Object Store capability, which will be available in an upcoming release of Teradata Vantage in 2019, opens the door for data scientists and business analysts to read and process semi-structured or unstructured data from Amazon S3 or Azure Blob object stores using standard SQL.
And, amazingly, it involves minimal effort and no new platforms.
With Vantage, all you will need to do is define a “foreign table” structure in the Teradata SQL Engine and supply location information that points to one or more data stores. This table definition becomes a window through which you can reach out, grab, and filter object store data, bringing in just the pieces you want and leaving in place whatever you don’t want.
Once the foreign data has been accessed, Teradata software does the rest, including decompressing, decrypting, and transforming the payload data so that it ends up looking like a relational table.
Here’s one of the most exciting parts of Teradata’s Native Object Store story: You can join your Amazon S3 and Azure Blob data to other data that is already part of the enterprise and, if you choose, make a persistent copy of the part of the object store data that you want to spend a little more time scrutinizing.
Think about that for a moment: You get the cost, convenience, and space savings of keeping your huge volumes of data in Amazon S3 and Azure Blob data lakes, and at the same time you get to view the data lake contents alongside your established enterprise data by invoking traditional SQL.
The potential value of what you already have is immense. Those clickstream objects you’ve stored away might tell you where your customer has gone and what they have looked at recently – but the production, relational data inside the database could include customer transaction and profile information that will give you richer insights into who the customer is and the value of their account.
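To make the idea concrete, here is a minimal conceptual sketch of the pattern described above: filter objects where they sit, bring in only the rows you want, and join them to relational customer data with ordinary SQL. It is not Teradata syntax and does not use Native Object Store. It assumes Python's built-in sqlite3 module as a stand-in database and an in-memory dictionary (the hypothetical `OBJECT_STORE`) as a stand-in for an S3 or Azure Blob container; every key, table, and function name is invented for illustration.

```python
import json
import sqlite3

# Hypothetical stand-in for an object store: key -> newline-delimited JSON.
OBJECT_STORE = {
    "clickstream/2019/01/part-0.json": "\n".join([
        json.dumps({"customer_id": 1, "page": "/pricing"}),
        json.dumps({"customer_id": 2, "page": "/home"}),
    ]),
    "clickstream/2019/02/part-0.json": json.dumps(
        {"customer_id": 1, "page": "/checkout"}
    ),
    "sensor/2019/01/part-0.json": json.dumps({"device": "a", "temp": 21.5}),
}


def read_objects(prefix, wanted_pages):
    """Yield (customer_id, page) rows from objects under `prefix`,
    keeping only the pages of interest and leaving everything else in place."""
    for key, payload in sorted(OBJECT_STORE.items()):
        if not key.startswith(prefix):
            continue  # objects outside the requested location are never read
        for line in payload.splitlines():
            record = json.loads(line)
            if record["page"] in wanted_pages:
                yield record["customer_id"], record["page"]


def join_clicks_to_customers(conn, prefix, wanted_pages):
    """Land the filtered object data in a temporary table and join it
    to the relational customers table."""
    conn.execute("CREATE TEMP TABLE clicks (customer_id INTEGER, page TEXT)")
    conn.executemany(
        "INSERT INTO clicks VALUES (?, ?)", read_objects(prefix, wanted_pages)
    )
    return conn.execute(
        "SELECT c.name, c.segment, k.page "
        "FROM customers c JOIN clicks k ON k.customer_id = c.customer_id "
        "ORDER BY c.name, k.page"
    ).fetchall()


# Relational data that is "already part of the enterprise" (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, segment TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Avery", "gold"), (2, "Blake", "silver")],
)

rows = join_clicks_to_customers(conn, "clickstream/", {"/pricing", "/checkout"})
for row in rows:
    print(row)
```

In this sketch the temporary `clicks` table plays the role of the optional copy of the slice you want to study further, while the sensor objects and the unwanted page views never leave the stand-in store.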
Like musical fusions – think rhythm and melody, bass and treble – both types of data are far more valuable when they can be intermixed. The answers you glean from your data will be deeper and more complete when multiple types of detail can be viewed side by side.
Teradata Vantage with Native Object Store will give you that edge.
Pulling in and then processing data lake objects inside Vantage will provide huge benefits:
- No new platforms or unfamiliar tools
- All the parallelism of the database being brought to bear on the preparation and discovery process, putting your most scarce resource – the data scientist – in overdrive
- Teradata’s established workload management, security, and reliability making it easier and more secure to bring new algorithms into production.
But most important, you will be able to produce answers to business questions that really matter to the organization by means of the advanced analytic capabilities inherent in Vantage.
So, yes, go ahead and keep on throwing your objects into data lakes – and be confident that by embracing Teradata’s future Native Object Store functionality, all of your data will be as easy to draw value from as that old exercise bike that’s just waiting to be dusted off and used to help make you healthy and happy.