Article

What is a data mart?

Explore the key uses of data marts, how they are structured, and how they differ from data warehouses and data lakes within an organization’s overall data storage strategy.

Overview

This article covers the key uses of data marts, how they're structured, and how they compare with data warehouses and data lakes in an organization's overall data storage strategy. It also weighs the advantages and disadvantages of the three types of data marts: dependent, independent, and hybrid.

Finally, it offers an overview of some of the challenges and limitations that data marts can pose for some business units.

What is a data mart?

A data mart is a subject-oriented slice of a data warehouse (or of a database's logical model) that serves a narrow group of users. It's created so that a particular business unit, such as sales, marketing, or customer service, can quickly access crucial data for faster analysis.

Many data marts contain only a subset of the data held in the full tables of the data warehouse. For example, a data mart for a specific department may draw on more than one source, such as sales transactions and inventory records, but it won't draw on many. Typical data marts range from 5 to 20 tables, as opposed to the 4,000 or more tables common in an enterprise data warehouse.

A small number of tables doesn't necessarily mean a small amount of data. A single data mart table could hold hundreds of terabytes. But that vast volume is typically just one type of data, like call logs at a wireless telecom company. Additionally, some data marts capture only summaries of data as result tables, potentially leaving behind worthwhile details.
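The trade-off between detail and summary can be sketched with a small example. This is a minimal illustration using Python's built-in `sqlite3` module; the table names `call_logs` and `daily_call_summary` and the sample rows are hypothetical, not taken from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Detail table: one row per call (hypothetical sample data).
conn.execute("CREATE TABLE call_logs (caller TEXT, call_date TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO call_logs VALUES (?, ?, ?)",
    [
        ("alice", "2024-01-01", 120),
        ("alice", "2024-01-01", 60),
        ("bob", "2024-01-01", 300),
        ("alice", "2024-01-02", 30),
    ],
)

# Summary ("result") table: one row per caller per day.
# Individual call durations are no longer recoverable from this table.
conn.execute(
    """
    CREATE TABLE daily_call_summary AS
    SELECT caller, call_date, COUNT(*) AS calls, SUM(seconds) AS total_seconds
    FROM call_logs
    GROUP BY caller, call_date
    """
)

summary = conn.execute(
    "SELECT caller, call_date, calls, total_seconds "
    "FROM daily_call_summary ORDER BY caller, call_date"
).fetchall()
print(summary)
```

A mart that keeps only `daily_call_summary` answers daily-volume questions quickly, but a question such as "how long was the longest single call?" would need the detail rows.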

Data mart vs. data warehouse vs. data lake

Data marts, data warehouses, and data lakes all support analytics, but they’re designed for different scopes, users, and data types. The table below summarizes the key differences.

Comparative Focus	Data Mart	Data Warehouse	Data Lake
Scope	Narrow, subject- or department-focused; “deep but not wide.”	Enterprise-wide repository that consolidates data from many sources to support BI and analytics.	Enterprise-accessible storehouse designed to hold enormous amounts of data in its native formats.
Typical Users	A narrow group of users focused on a specific function or subject area.	Business analysts, data engineers, data scientists, and other BI/analytics users.	Data scientists, developers, and engineers; teams running advanced analytics/ML and exploration.
Schema	Usually modeled for the domain (often star or snowflake schema).	Schema is defined before ingestion (“schema on write”); structured for consistent analytics.	“Schema on read”; data can be stored before being formatted/defined.
Processing Pattern	Curated and purpose-built for fast analysis on a subset; often sourced from a warehouse.	Data is typically cleaned/structured via ETL to optimize querying and reporting.	Ingests/stores raw data from many sources; supports batch and real-time processing at scale.
Cost (Relative)	Smaller scope can be lighter-weight, but multiple marts can increase duplication, complexity, and costs.	Can be difficult/expensive to scale and may require substantial ETL effort for diverse data types.	Designed for long-term storage at lower cost and cost-effective scaling for large volumes of raw data.
Common Use Cases	Department or subject-area analytics (e.g., sales, finance, marketing) using a focused subset of data.	BI, dashboards, reporting, and enterprise “single source of truth” analytics.	Data exploration, data science, ML/AI, and analytics that need varied structured + unstructured data.

In practice, organizations often use these together. For example, a data warehouse may feed one or more data marts, while a data lake supports broader ingestion and exploration.
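The "schema on write" and "schema on read" rows in the table can be illustrated with a toy sketch. The `SCHEMA` dictionary, the function names, and the sample records below are assumptions made for illustration only; real warehouses and lakes enforce and interpret schemas with far richer tooling.

```python
import json

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"order_id": int, "amount": int}

def write_to_warehouse(table, record):
    """Schema on write: validate the record before it is stored."""
    for field, field_type in SCHEMA.items():
        if not isinstance(record.get(field), field_type):
            raise ValueError(f"bad field: {field}")
    table.append(record)

def read_from_lake(raw_lines):
    """Schema on read: raw text is interpreted only when it is queried."""
    for line in raw_lines:
        record = json.loads(line)
        if isinstance(record.get("amount"), int):
            yield record["amount"]

warehouse_table = []
write_to_warehouse(warehouse_table, {"order_id": 1, "amount": 100})
try:
    write_to_warehouse(warehouse_table, {"order_id": 2, "amount": "n/a"})
    rejected = False
except ValueError:
    rejected = True  # the malformed record never reaches the warehouse

# The lake accepts both records as-is; filtering happens at read time.
lake = ['{"order_id": 1, "amount": 100}', '{"order_id": 2, "amount": "n/a"}']
lake_total = sum(read_from_lake(lake))
print(rejected, len(warehouse_table), lake_total)
```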

Data mart structure: Tables and schema

Tables are the primary components of a data mart, and they combine to form a schema. Data mart schemas come in two main categories:

Star schema

Imagine rows and columns of data in five spreadsheets. Four of the spreadsheets are connected via key fields to the largest sheet, called the fact table. A fact table with 50 million records won't fit in any conventional spreadsheet, so in practice the data lives in data mart tables instead. When one fact table and its related tables are arranged this way, usually 5 to 10 tables in total, the resulting design pattern is called a star schema. The fact table is the center, and the others, known as dimension tables, make up the points of the star.
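The five-spreadsheet picture above can be sketched as one fact table joined to four dimension tables. This is a minimal example using Python's built-in `sqlite3` module; every table name, column name, and row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Four dimension tables: the points of the star.
    CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_store    (store_id    INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT);

    -- One fact table at the center, keyed to every dimension.
    CREATE TABLE fact_sales (
        product_id  INTEGER REFERENCES dim_product(product_id),
        store_id    INTEGER REFERENCES dim_store(store_id),
        date_id     INTEGER REFERENCES dim_date(date_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        amount      INTEGER
    );

    INSERT INTO dim_product  VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_store    VALUES (1, 'East'), (2, 'West');
    INSERT INTO dim_date     VALUES (20240101, 2024, 1);
    INSERT INTO dim_customer VALUES (1, 'Retail');
    INSERT INTO fact_sales   VALUES
        (1, 1, 20240101, 1, 100),
        (2, 1, 20240101, 1, 250),
        (1, 2, 20240101, 1, 75);
    """
)

# A typical star-schema query: aggregate the facts, grouped by a dimension attribute.
revenue_by_region = conn.execute(
    """
    SELECT s.region, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.region
    ORDER BY s.region
    """
).fetchall()
print(revenue_by_region)
```

Each analytical question follows the same shape: join the fact table to whichever dimensions the question mentions, then group and aggregate.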

Snowflake schema

A data mart can still handle large data sets that require multiple fact tables, but the structure must be different. In a snowflake schema, dimension tables are normalized into further related tables, so a diagram of the tables and relationships branches outward like a snowflake, which gives the schema its name. Each fact table in the mart has four or five dimension tables, and those dimensions, together with the tables they branch into, form the snowflake.
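As commonly defined, a snowflake schema differs from a star schema in that dimension tables are themselves split into further related tables. The sketch below shows one such branch using Python's built-in `sqlite3` module; the product-to-category split and all names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The product dimension is normalized: category details live in their own table.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id  INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     INTEGER
    );

    INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO dim_product  VALUES (1, 'Widget', 1), (2, 'Gadget', 1), (3, 'App', 2);
    INSERT INTO fact_sales   VALUES (1, 100), (2, 250), (3, 40);
    """
)

# Reaching the category attribute takes two joins instead of one:
# fact -> dimension -> sub-dimension.
revenue_by_category = conn.execute(
    """
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()
print(revenue_by_category)
```

The extra join is the price of the normalization: category names are stored once rather than repeated on every product row.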

Data mart types: Advantages and disadvantages

Today's enterprises can use any of the following data mart types, and some use several concurrently.

Dependent data mart

Dependent data marts rely on information sourced from data warehouses to organize the data into particular subsets, as discussed earlier. Storage, engineering, and other key operations are handled by those running the warehouse.

Dependent data marts are especially useful for business units whose team members want to harness the power of analytics without having to be experts in data science or data warehouse architecture. They can focus on specific queries and analysis most relevant to their departments. That said, because these data marts are bound to a warehouse, any problems with the warehouse become problems with the marts.
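One simple way to picture a dependent data mart is as a view defined over warehouse tables. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a warehouse; the names `wh_orders` and `sales_mart` and the sample rows are hypothetical, and real dependent marts are often materialized tables rather than views.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.executescript(
    """
    CREATE TABLE wh_orders (order_id INTEGER PRIMARY KEY, department TEXT, amount INTEGER);
    INSERT INTO wh_orders VALUES (1, 'sales', 100), (2, 'marketing', 40), (3, 'sales', 60);

    -- The mart exposes only the sales subset; storage stays with the warehouse.
    CREATE VIEW sales_mart AS
    SELECT order_id, amount FROM wh_orders WHERE department = 'sales';
    """
)

total_before = warehouse.execute("SELECT SUM(amount) FROM sales_mart").fetchone()[0]

# A faulty load into the warehouse (a missing amount) surfaces in the mart immediately.
warehouse.execute("INSERT INTO wh_orders VALUES (4, 'sales', NULL)")
null_rows = warehouse.execute(
    "SELECT COUNT(*) FROM sales_mart WHERE amount IS NULL"
).fetchone()[0]
print(total_before, null_rows)
```

Sales analysts query `sales_mart` without needing to know how `wh_orders` is loaded, but they also inherit whatever quality problems the warehouse has.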

Independent data mart

An independent data mart doesn't need a data warehouse to exist. In fact, it effectively functions as its own warehouse: The independent data mart collects external and internal data from various sources and aggregates it into a miniature data warehouse.

This type of mart is most useful for short-term projects, small and medium enterprises that want to quickly create a data warehouse, and niche business units that want a highly customized data repository. The main drawback to an independent data mart is that it must put the data through extract, transform, and load (ETL) and cleansing processes, both of which require significant data engineering knowledge.
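The ETL and cleansing work an independent data mart takes on can be sketched in a few lines. This is a deliberately small illustration using only Python's standard library; the CSV content, the `mart_sales` table, and the cleansing rules (trim names, drop rows with non-numeric amounts) are all assumptions chosen for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system, with typical quality problems.
RAW_CSV = """customer,amount
 Alice ,100
bob,not_a_number
Carol,250
"""

def extract(text):
    """Extract: parse the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform and cleanse: normalize names, drop rows with invalid amounts."""
    clean = []
    for row in rows:
        name = row["customer"].strip().title()
        try:
            amount = int(row["amount"])
        except ValueError:
            continue  # skip rows that fail cleansing
        clean.append((name, amount))
    return clean

def load(conn, rows):
    """Load: write the cleansed rows into the mart table."""
    conn.execute("CREATE TABLE IF NOT EXISTS mart_sales (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO mart_sales VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT customer, amount FROM mart_sales ORDER BY customer"
).fetchall()
print(loaded)
```

In a dependent mart, the warehouse team would own the equivalent of `extract` and `transform`; here the mart's owners must write and maintain them.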

Hybrid data mart

With a hybrid data mart, analysts and other users source data from a data warehouse, but the warehouse isn't the only source the mart can access: The hybrid mart also collects data from independent databases unconnected to the enterprise data warehouse, cloud applications, and many other points of origin.

Hybrid data marts are ideal for organizations with multiple databases or even multiple warehouses. But like their independent counterparts, they demand near-expert technical aptitude to run most effectively.
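A hybrid mart's two kinds of input can be sketched as follows. The example uses Python's built-in `sqlite3` module as a stand-in warehouse and a plain list as a stand-in for a cloud application export; `wh_customers`, `crm_export`, and `support_mart` are hypothetical names.

```python
import sqlite3

# Source 1: the enterprise warehouse (simulated with an in-memory database).
warehouse = sqlite3.connect(":memory:")
warehouse.executescript(
    """
    CREATE TABLE wh_customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO wh_customers VALUES (1, 'Alice'), (2, 'Bob');
    """
)

# Source 2: an export from a cloud application that is not in the warehouse.
crm_export = [
    {"customer_id": 1, "open_tickets": 3},
    {"customer_id": 2, "open_tickets": 0},
]

# The hybrid mart combines both sources into one table.
mart = sqlite3.connect(":memory:")
mart.execute(
    "CREATE TABLE support_mart (customer_id INTEGER, name TEXT, open_tickets INTEGER)"
)
tickets = {row["customer_id"]: row["open_tickets"] for row in crm_export}
rows = [
    (customer_id, name, tickets.get(customer_id, 0))
    for customer_id, name in warehouse.execute(
        "SELECT customer_id, name FROM wh_customers ORDER BY customer_id"
    )
]
mart.executemany("INSERT INTO support_mart VALUES (?, ?, ?)", rows)

result = mart.execute(
    "SELECT name, open_tickets FROM support_mart ORDER BY customer_id"
).fetchall()
print(result)
```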

Challenges of data marts

Data marts are another tool in the data management toolbox and should be used when they can provide a specific benefit. For example, their structure and relative simplicity make them ideal for ad-hoc reporting in a hurry, or for maintaining reporting operations when resources are tight.

However, their limitations mean data marts aren't practical for every business unit. Redundant and duplicate data are common when data marts are overused, potentially leading to data drift that undermines data quality and can endanger key reporting processes. Additionally, data marts tend to create data silos. While the occasional data silo isn't disastrous (and may be necessary for business units with extraordinary security needs), many silos create a disjointed organization. Finally, data marts generally aren't built to handle the complex queries that warehouses can.
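Drift between duplicated data can be detected with a simple reconciliation check. The sketch below compares the same monthly metric as held by two marts; the function name `find_drift` and the figures are hypothetical, and a production check would more likely run as SQL against both marts.

```python
def find_drift(mart_a, mart_b):
    """Return the keys whose values differ, or that exist in only one mart."""
    keys = set(mart_a) | set(mart_b)
    return sorted(key for key in keys if mart_a.get(key) != mart_b.get(key))

# Monthly revenue as recorded by two marts that copy the same underlying data.
sales_mart = {"2024-01": 1000, "2024-02": 1200}
finance_mart = {"2024-01": 1000, "2024-02": 1150, "2024-03": 900}

drift = find_drift(sales_mart, finance_mart)
print(drift)
```

Here February disagrees between the two marts and March is missing from one of them, which is the kind of inconsistency that can surface when the same data is copied into several marts.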

Data teams and their leaders should use a balanced approach that leverages data marts, warehouses, and lakes in their appropriate contexts. Teradata VantageCloud, the complete cloud analytics and data platform, provides an ideal database management system (DBMS) for this multifaceted, disciplined data management approach.