The Power of Prioritization in Data Management
Teradata celebrated its 40th anniversary last month, July 2019. I made a good career move by joining the company just before its 10th anniversary. From time to time I pause and remember the original Teradata Database and how many of those early architectural decisions are still making a critical contribution to performance today.
One of the functionalities from the original Teradata Database that I value the most is prioritization. While today there are a number of sophisticated options available for managing work that enters the system, the original architects had the forethought to embed simple prioritization capabilities in the database from Day One.
If you've ever been to an emergency room on a Saturday night for something mild, say a sprain, you probably noticed that as soon as an ambulance delivers a guy with chest pains, he gets seen immediately while you continue to sit in the waiting room. Or when you're commuting to work and a fire truck comes up behind you with its sirens screaming, you pull over and give the firemen plenty of space to pass. This slows down your commute a bit, but it speeds up the fire truck's arrival at the emergency.
These are both examples of prioritization in action in our everyday lives. The critical things jump ahead and get serviced sooner; the less important things wait. You may not like spending a little more time in the waiting room or by the side of the road, but overall, prioritization adds value.
Prioritization in the Database
The original Teradata Database came with four different priority buckets. Everything it executed ran at one of those four levels. With those contrasting priorities, the database was able to control the level of resources being offered to different internal tasks and influence the speed at which they completed. By default, all user-submitted work, such as SQL queries or load jobs, ran at the medium priority.
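To make the idea of four priority levels concrete, here is a minimal Python sketch of how contrasting priorities can translate into different resource shares. It is only an illustration, not the database's actual implementation: the level names follow the ones mentioned in this post, while the numeric weights, the `submit` helper, and the `resource_shares` function are assumptions invented for the example.

```python
from enum import IntEnum


class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    RUSH = 4


# Hypothetical relative weights; real values are not given in this post.
WEIGHTS = {Priority.LOW: 1, Priority.MEDIUM: 2, Priority.HIGH: 4, Priority.RUSH: 8}


def submit(name, priority=Priority.MEDIUM):
    """User work defaults to MEDIUM unless a priority is given."""
    return (name, priority)


def resource_shares(active_tasks):
    """Split one unit of resource among active tasks in proportion to weight."""
    total = sum(WEIGHTS[priority] for _, priority in active_tasks)
    return {name: WEIGHTS[priority] / total for name, priority in active_tasks}


tasks = [
    submit("user_query"),                         # defaults to MEDIUM
    submit("failed_txn_cleanup", Priority.RUSH),  # internal, urgent
    submit("background_space_mgmt", Priority.LOW),
]
shares = resource_shares(tasks)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of available resources")
```

With these assumed weights, the urgent internal task gets the largest share and the background task the smallest, while the user query sits in between at its default level.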
Even though no customer back then gave a thought to applying those available priorities to their own queries, the database code itself took great advantage of prioritization.
Prioritization supports one of the common themes that was woven into the database: It is more important to complete work that is already under way than it is to start new work. This is similar to an attitude you might promote if you were running a restaurant that is frequently jammed with guests, and people are waiting to be seated. It makes sense to prioritize credit card processing for guests ready to leave ahead of taking new meal orders from guests who just sat down. That way you can free up space sooner for still-waiting guests to be seated.
One example of the value placed on completing already-active work has to do with query completion. Just like settling a diner's bill quickly, returning a query's answer set in the database is given an elevated priority. This allows an almost-complete query to exit the system sooner and free up resources for others.
If something goes wrong during a user request, for example an update transaction fails, the internal tasks that handle the clean-up needed to get things up and running again are given a very high priority. If a guest in your restaurant spills a glass of wine on the floor, it's important to clean that up quickly so others can move in and out without slipping. Anything the database does related to the health of the system or availability of the data runs at a high priority.
Many background tasks have been embedded into the foundation of the Teradata Database, and they are generally assigned a low priority. Most of us are acquainted with background tasks in our daily lives. Think about street sweeping or garbage collection, things you rely on that seem to run automatically and unobtrusively.
Many background tasks in Teradata aid in the placement and management of user data in the database. Disk space in Teradata is not pre-assigned, but rather is handled as a pool of available resources that is controlled entirely by the system. Rows are stored in variable-length data blocks that can grow or shrink over time, and these data blocks can be dynamically moved to different locations. Background tasks keep the flexible space management approach afloat and free the administrator from ever having to do costly off-line re-organizations of the data. Priorities play a major role in keeping these internal tasks below the surface and low impact.
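The pattern described in this section (clean-up first, finishing active work ahead of starting new work, background tasks last) can be sketched with a simple priority queue. This is a simplified illustration in Python, not how the database schedules work internally: the `Dispatcher` class, the task names, and the mapping of each task to a specific level are assumptions made for the example, and a real system shares resources among running tasks rather than running them strictly one after another.

```python
import heapq
from itertools import count

LOW, MEDIUM, HIGH, RUSH = 1, 2, 3, 4


class Dispatcher:
    """Hands out waiting tasks, highest priority first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within one level

    def add(self, task, priority=MEDIUM):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), task))

    def run_all(self):
        order = []
        while self._heap:
            _, _, task = heapq.heappop(self._heap)
            order.append(task)
        return order


d = Dispatcher()
d.add("move data block (background)", LOW)
d.add("start new user query")               # default: MEDIUM
d.add("return answer set", HIGH)            # finish work already under way
d.add("clean up failed transaction", RUSH)  # system health comes first
order = d.run_all()
for step, task in enumerate(order, start=1):
    print(step, task)
```

Even though the background task was queued first, it is dispatched last, and the almost-complete query returns its answer set before any new query starts.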
Customers Take Notice
Over time, customers began to take notice of these built-in priorities. I was a witness to one of the first production applications to use an elevated priority for simple direct-access queries. Cashiers submitted these queries when the price of an item needed validation during customer checkouts. Running this application at the "rush" priority allowed these queries to get resources immediately whenever they entered the system. Consequently, there was no delay in getting pricing validation information at the time it was needed, no matter how busy the system was.
Usage of priorities has snowballed from there. In response to this shift in customer applications, an expanded set of priorities, as well as other workload management options such as concurrency control and the ability to reject queries, have been implemented. Teradata's workload management is continuing to evolve and improve today.
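As a rough illustration of what concurrency control and query rejection mean in practice, the Python sketch below admits queries up to a limit, delays a few more, and rejects the rest. It is a toy model under stated assumptions, not Teradata's workload management: the `AdmissionController` class, its `max_active` and `max_queued` limits, and the returned status strings are all hypothetical.

```python
from collections import deque


class AdmissionController:
    """Toy throttle: run up to max_active, delay up to max_queued, reject the rest."""

    def __init__(self, max_active, max_queued):
        self.max_active = max_active
        self.max_queued = max_queued
        self.active = set()
        self.delayed = deque()

    def submit(self, query_id):
        if len(self.active) < self.max_active:
            self.active.add(query_id)
            return "running"
        if len(self.delayed) < self.max_queued:
            self.delayed.append(query_id)
            return "delayed"
        return "rejected"

    def complete(self, query_id):
        # Finishing a query frees a slot for the oldest delayed query, if any.
        self.active.discard(query_id)
        if self.delayed and len(self.active) < self.max_active:
            self.active.add(self.delayed.popleft())


ctl = AdmissionController(max_active=2, max_queued=1)
results = [ctl.submit(q) for q in ("q1", "q2", "q3", "q4")]
print(results)
ctl.complete("q1")
print(sorted(ctl.active))
```

Here the first two queries run, the third waits, and the fourth is turned away; once the first query completes, the waiting one is released into the freed slot.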
At the same time, the database itself continues to map its internal activities and background tasks to different priorities just as it always has. So, when a siren and flashing lights are needed to detect a deadlock condition or clean up a failed transaction, priorities in the database are there to ensure that happens, gracefully, effectively, automatically.