
Data Lake Products

Harness the Value of Exploding Data Volumes

Data Lakes have emerged in recent years as organizations look to economically harness and derive value from exploding data volumes. New data sources such as web, mobile, and connected devices, along with new forms of analytics such as text, graph, and path analysis, have necessitated a new Data Lake design pattern to augment traditional design patterns such as the Data Warehouse.

Companies are beginning to realize value from Data Lakes in the areas of:

  • New Insights from Data of Unknown or Under-Appreciated Value
  • New Forms of Analytics
  • Corporate Memory Retention
  • Data Integration Optimization

Yet in the absence of a large body of well-understood best practices, confusion abounds regarding the definition of a data lake. Drawing upon many sources as well as on-site experience with leading data-driven customers, we define a data lake as a collection of long-term data containers that capture, refine, and explore any form of raw data at scale, enabled by low-cost technologies, upon which multiple downstream facilities may draw.


Data Lake Design Pattern

 

A design pattern is an architecture and a set of corresponding requirements that have evolved to the point where there is agreement on best practices for implementation. How you implement it varies from workload to workload and organization to organization. While technologies are critical to the outcome, a successful data lake needs a plan. A Data Lake design pattern is that plan.

The data lake definition does not prescribe a technology, only requirements. While Data Lakes are typically discussed synonymously with Hadoop (an excellent choice for many Data Lake workloads), a Data Lake can be built on multiple technologies such as Hadoop, NoSQL, S3, RDBMS, or combinations thereof.

 

"Data lakes can be based on HDFS, but are not limited to that environment; for example, object stores such as Amazon Simple Storage Service (S3)/Microsoft Azure or NoSQL DBMSs like HBase or Cassandra can also be environments for data lakes." — Gartner, 2015

 


Data Lake Architecture

 

As the trusted advisor to the world’s leading data-driven organizations, Teradata can help with the design, implementation, and support of your Data Lake initiative, applying critical capabilities and design principles based on best practices so that your organization avoids the typical pitfalls and realizes maximum value.


 


Products & Services for Data Lakes

 

Think Big:  Big Data Consulting

Think Big provides expert implementation and customization services that help organizations successfully implement Data Lake initiatives and gain optimal business value. Its advanced implementation services and sophisticated integration of open-source technologies include:

  1. Big Data Strategy
  2. Data Lake Implementation
  3. Data Engineering
  4. Analytics and Data Science
  5. Managed Services
  6. Big Data Training

Teradata Appliance for Hadoop:  Delivered Ready to Run

The Teradata Appliance for Hadoop solves the performance hurdles, prolonged implementation periods, and reliability issues common to solutions that are not preconfigured. Teradata does the hardware and software integration plus extensive testing so you don’t have to. The Teradata appliance is delivered ready to run and optimized for enterprise-class big data storage and discovery.

Open Source Presto: Powerful SQL Engine for Hadoop and Beyond

Presto is an open source SQL-on-Hadoop query engine designed for running interactive analytic queries against data sources of all sizes. Through a single query, Presto allows you to access data where it lives, including in Apache Hive™, Apache Cassandra™, relational databases or even proprietary data stores.  Presto was created by Facebook for the analytics needs of extremely large data-driven organizations.
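Presto addresses every table through a catalog.schema.table name, which is how a single query can join data living in different systems. A minimal sketch of such a federated query, held as a Python string (the catalog, schema, and table names here are hypothetical, not from any real deployment):

```python
import re

# Hypothetical federated Presto query: joins a Hive table on HDFS with a
# table in an operational relational database, in one statement.
FEDERATED_QUERY = """
SELECT c.customer_id,
       count(*) AS page_views
FROM hive.weblogs.page_views AS v   -- table stored on HDFS, via the Hive catalog
JOIN mysql.crm.customers     AS c   -- table in a MySQL database
  ON v.customer_id = c.customer_id
GROUP BY c.customer_id
"""

def catalogs_used(sql: str) -> set:
    """Collect the catalog component of each catalog.schema.table reference."""
    return {m.group(1) for m in re.finditer(r"\b(\w+)\.\w+\.\w+\b", sql)}
```

Because each source is just another catalog, the engine (not the analyst) handles fetching and combining rows from the underlying systems.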

Aster Analytics on Hadoop: Drive More Value from Your Hadoop Data Lake

Aster Analytics on Hadoop provides easy-to-use, multi-genre advanced analytics at scale, enabling business analysts and data scientists to quickly discover insights in their Hadoop data lake. Aster delivers over 100 pre-built parallel analytic functions that run natively on Hadoop to analyze data directly on HDFS. Aster Analytics is also YARN-integrated, supporting multiple instances of Aster, from sandboxes to production use cases, in the same Hadoop cluster.

Teradata Integrated Big Data Platform:  The Lowest Cost per Terabyte in the Teradata Platform Family

Get deep strategic insights from massive amounts of data with Teradata Database software and utilities. Analyzing multi-structured data began with Teradata Database 14.0, when name-value-pair functions and regular expressions enabled Teradata sites to process web logs using popular business intelligence tools. The Teradata Integrated Big Data Platform supports workloads such as deep-history analytics, storage of massive amounts of multi-structured data, and a raw-data landing zone for transformations.
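The kind of regular-expression, name-value-pair extraction described above can be sketched outside the database in a few lines of Python. This is a generic illustration against the Apache Common Log Format, not Teradata's actual functions; the sample line and field names are illustrative:

```python
import re

# Illustrative Apache Common Log Format line (sample data, not a real log).
LOG_LINE = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

# Each named group becomes a name-value pair usable by downstream tools,
# the same idea as regex-based web-log processing inside the database.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_log_line(line: str) -> dict:
    """Turn one raw log line into a dict of name-value pairs ({} if unparseable)."""
    m = CLF_PATTERN.match(line)
    return m.groupdict() if m else {}
```

Once raw lines are reduced to name-value pairs like these, ordinary SQL and business intelligence tools can query them as structured fields.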

Download Open Source Presto

An open source distributed SQL query engine designed for running interactive analytic queries against data of all sizes. Via a single query, access data where it lives, including in Hadoop, Apache Cassandra™, MySQL, PostgreSQL, or even proprietary data stores.

Solution Showcase: Teradata’s Compelling Open Source Strategy

Open source software provides many opportunities for the tech industry, particularly around innovation and community building. In this paper written by Nik Rouda, senior analyst with Enterprise Strategy Group (ESG), learn how Teradata leverages open source technologies to support a commercial software strategy that benefits both the company and its customers.