BIAnalytix Tip Summary: What is a surrogate key, and why should I care?

What is a surrogate key?
Merriam-Webster says that a surrogate is “one that serves as a substitute”. In the Business Intelligence and data warehousing realm, a surrogate key is a field in a table that does two things: it uniquely identifies a table record (as all keys should) and it takes the place of something (thus the name “surrogate”). The something that a surrogate key is replacing is a business, or natural key. A natural key also uniquely identifies a record, but it carries business meaning with it as well. Think of a table of customers, you could either uniquely identify each customer by their Social Security Number, a value which carries with it a lot of information (and risk in storage), or you could assign each customer a random integer at creation time. The SSN scheme is an example of a natural key, the random integer scheme would be a surrogate key. “Random” isn’t really fair, typically it is an ordered sequence number, i.e. the first customer inserted into the table is assigned the number 1. Each subsequent customer receives a number incremented and thus guaranteed to be unique, from the previous number. If there are 10,000 customers in the table, you could reasonably expect the 10,000th to have the integer 10000 in its surrogate key field.


This key field carries with it no business meaning. It is simply a counter that increments with each additional table record. As an integer data type, it takes up relatively little space on the disc, and therefore does not add much overhead to the overall database design.

Why should I care?
Using surrogate keys requires some extra programming work some would argue. The good news is that in most modern Relational Database Management Systems (RDBMS, Microsoft SQL Server for example) the system does the work of assigning the next number in sequence to a new record inserted into a table. The advantages of using surrogate keys are many. What if we are using the copy barcode UPC code for our Copy dimension and one day the business decision makers decide to change the format of the stored copy code? Reworking the key structure for a table with millions of records, related to tens or hundreds of other tables, can be quite a nightmare. No one ever decides to change the format or structure of a surrogate key, because the value of the key has no business meaning, it is an internal value only an integer that uniquely id’s a record, and ties one table to another. There is also a slight performance gain to be had by using integers (surrogate key) rather than strings (most business keys) in table joins. The RDBMS within a BI system has less work to do to join 547 to 547 than ‘111-222-3333’ to ‘111-222-3333’. When dealing with billions of records, any little performance gain is welcome.

BIAnalytix is a Media Business Intelligence system from Decentrix that uses surrogate keys throughout its structure for all the reasons described above.

© Copyright 2011, Decentrix Inc. All rights reserved.
Decentrix Inc. 1200 17th Street, Suite 200, Denver, Colorado, 80202 Phone: 303-899-4000  
 
Decentrix Website Solution & Collaborative Intranet Tools   |   DishOnTheWeb.com by Decentrix, Dish Network Retailer Solutions
Afftraq Affiliate Sales Tracking Solution   |  HotDesk, Superior Website & Intranets for Small to Medium Size Business

Home Products Website Solution Collaboration NetAnalytix e-Commerce Technology Overview Architecture Quickcentrix Quickcentrix Integration Kit Consulting Web Analytics Consulting e-Commerce Consulting Business Intelligence Analytics Decentrix Process Online Marketing Site Visitors Search Engines Register With Yahoo Register With Google Search Engine Ranking Becoming Visible Online Collecting Prospect Lists Checklist Target Market Tours Website Solution Tour Collaboration Tour NetAnalytix Tour BIAnalytix Tour Clients News Press Releases e-Newsletters Contact Support Sales Company Vision Management Team Careers