Data Integrity Pain Points for Marketers

The amount of data businesses collect has grown exponentially, but the resources to deal with it haven’t kept pace. Unfortunately, too many marketers still think that data is both easy to pull and either ready to use or easy to clean. Those of us who have worked in the trenches know otherwise.

Pain Point One: Extracting Data

Many companies have large data warehouses with an abundance of data, but often very little metadata (data that provides key information about the data) to ensure the data is used correctly. Often only a handful of analysts know the correct joins, which key conditions must be applied or what has changed in the data over time. As a result, many analysts end up providing misleading data sets.

A simple analogy may help explain that last statement. What is the answer to the following equation: 6 x 2 – 2 x 4? If you answered 40, you worked left to right and didn’t apply the order of operations. The answer is actually 4, because the equation should be read as (6 x 2) – (2 x 4). Without knowing the order of operations rule, 40 seems logical.

The same is true with data: answers are often obtained, but are they correct? Companies need skilled employees with access to powerful tools to get correct answers. In addition, the data needs to be centralized so it is readily available, and it must be trusted before it can be used to its potential.
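For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the analogy. The helper `left_to_right` is a hypothetical function written only for this illustration; it mimics the mistaken reading by applying each operation in the order it appears.

```python
def left_to_right(tokens):
    """Evaluate tokens strictly in order, ignoring the order of operations."""
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        op, value = tokens[i], tokens[i + 1]
        if op == "x":
            result = result * value
        else:  # the only other operator in this example is "-"
            result = result - value
    return result


# The mistaken reading: ((6 x 2) - 2) x 4
naive = left_to_right([6, "x", 2, "-", 2, "x", 4])

# The correct reading: (6 x 2) - (2 x 4)
correct = (6 * 2) - (2 * 4)

print(naive, correct)  # 40 4
```

Both calculations run without error, which is the point: a query with the wrong join or a missing condition also returns a number, just not the right one.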

Pain Point Two: Data Updates

Data can get old and need updating. For example, let’s introduce a fictional person named Mandy Smith. Mandy might get married and change her last name, she might decide to start going by Amanda, she might move, or she could change jobs. Such occurrences can result in changes to name, address, phone number and/or email field values.

Entities, such as consumers or companies, can exist multiple times within a data set. Is there one correct record, or are the valid data field values spread across multiple records? Entities can also exist across multiple data sets. You don’t want to pay for prospects from a list vendor if you already possess the data. You also don’t want to send a prospect campaign to an existing customer, to a known competitor or to someone who has requested to not be contacted, has died or is in prison.
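As a rough sketch of that kind of screening, the Python below removes prospects who are already customers, are on a do-not-contact list or appear twice in the purchased list. It assumes email address is a usable match key, which is often not enough in practice; the function names, field names and sample records are all invented for illustration.

```python
def normalize_email(email):
    """Trim whitespace and lowercase so trivially different emails match."""
    return email.strip().lower()


def filter_prospects(prospects, customers, do_not_contact):
    """Keep prospects that are new, contactable and not repeated in the list."""
    suppressed = {normalize_email(e) for e in customers}
    suppressed |= {normalize_email(e) for e in do_not_contact}
    seen = set()
    kept = []
    for prospect in prospects:
        key = normalize_email(prospect["email"])
        if key in suppressed or key in seen:
            continue  # existing customer, opted out, or duplicate record
        seen.add(key)
        kept.append(prospect)
    return kept


# Hypothetical sample data
prospects = [
    {"name": "Mandy Smith", "email": "Mandy.Smith@example.com"},
    {"name": "Amanda Smith", "email": "mandy.smith@example.com "},
    {"name": "Pat Jones", "email": "pat@example.com"},
    {"name": "Lee Wong", "email": "lee@example.com"},
    {"name": "Ana Ruiz", "email": "ana@example.com"},
]
customers = ["pat@example.com"]
do_not_contact = ["LEE@example.com"]

kept = filter_prospects(prospects, customers, do_not_contact)
print([p["name"] for p in kept])  # ['Mandy Smith', 'Ana Ruiz']
```

Note that the sketch keeps whichever duplicate it sees first. It does not answer the harder question of which record holds the valid field values.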

Additional Data Considerations

Handling of missing data:

  • Should it be left blank?
  • Can it get appended using an external service?
  • If a value is required for a metric field but can’t be obtained, should it be filled with the overall average or with the average of like entities?
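To make the last option concrete, here is a minimal Python sketch that fills a missing metric with the average of like entities and falls back to the overall average when an entity has no peers with known values. The field names (`segment`, `annual_spend`), the `imputed` flag and the sample records are assumptions made for this example, not a recommended schema.

```python
from statistics import mean


def fill_missing(records, field, group_field):
    """Fill None values in `field` with the group average, else the overall average."""
    known = [r[field] for r in records if r[field] is not None]
    overall = mean(known) if known else None

    by_group = {}
    for r in records:
        if r[field] is not None:
            by_group.setdefault(r[group_field], []).append(r[field])

    filled = []
    for r in records:
        value = r[field]
        if value is None:
            peers = by_group.get(r[group_field])
            value = mean(peers) if peers else overall
        # Flag filled-in values so they are never mistaken for observed data.
        filled.append({**r, field: value, "imputed": r[field] is None})
    return filled


# Hypothetical sample data
records = [
    {"id": 1, "segment": "retail", "annual_spend": 100},
    {"id": 2, "segment": "retail", "annual_spend": 200},
    {"id": 3, "segment": "retail", "annual_spend": None},
    {"id": 4, "segment": "enterprise", "annual_spend": 1000},
    {"id": 5, "segment": "enterprise", "annual_spend": None},
]

filled = fill_missing(records, "annual_spend", "segment")
```

In this sample, record 3 is filled with the retail average and record 5 with the enterprise average. Whether either choice is appropriate depends on how the field will be used.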

Same entity represented differently:

  • Proper name vs. nickname
  • Name components combined in one field vs. separated in multiple fields
  • Full spelling vs. abbreviation for business name
  • Misspellings
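A small Python sketch shows how some of these differences might be reconciled before records are compared. The nickname and abbreviation tables are tiny hypothetical lookups built for this example; real matching usually relies on much larger reference data, and this sketch does not attempt to handle misspellings or middle names.

```python
import re

# Hypothetical lookup tables, for illustration only
NICKNAMES = {"mandy": "amanda", "jim": "james", "bill": "william"}
BUSINESS_ABBREVIATIONS = {
    "corp": "corporation",
    "inc": "incorporated",
    "co": "company",
    "intl": "international",
}


def normalize_person(name):
    """Return (first, last) in lowercase from 'First Last' or 'Last, First'."""
    name = name.strip()
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
    else:
        first, _, last = name.partition(" ")
    first = first.lower()
    first = NICKNAMES.get(first, first)  # map nickname to proper name
    return (first, last.strip().lower())


def normalize_business(name):
    """Lowercase, drop punctuation and expand known abbreviations."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(BUSINESS_ABBREVIATIONS.get(w, w) for w in words)


print(normalize_person("Mandy Smith") == normalize_person("Smith, Amanda"))  # True
print(normalize_business("Acme Corp.") == normalize_business("ACME Corporation"))  # True
```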

Other issues:

  • Junk data entered because a field was required, because of fraud, etc.
  • Data entry mistakes
  • Notes in fields not meant for notes
  • Ranges in fields where discrete numbers were expected
  • Address or other data not standardized
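Some of these issues can at least be detected automatically. The Python sketch below classifies the raw contents of a field that is supposed to hold a single number, separating clean numbers from ranges and from anything that needs a human look. The function name, the labels and the accepted range formats ("50-100", "10 to 20") are assumptions chosen for this example.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"


def parse_numeric_field(raw):
    """Classify a raw value as a number, a range, or something needing review."""
    text = raw.strip().replace(",", "")  # tolerate thousands separators
    if re.fullmatch(NUMBER, text):
        return ("number", float(text))
    match = re.fullmatch(rf"({NUMBER})\s*(?:-|to)\s*({NUMBER})", text)
    if match:
        return ("range", (float(match.group(1)), float(match.group(2))))
    # Notes, junk and blanks all land here rather than being guessed at.
    return ("needs_review", raw)


print(parse_numeric_field("1,000"))           # ('number', 1000.0)
print(parse_numeric_field("50-100"))          # ('range', (50.0, 100.0))
print(parse_numeric_field("call after 5pm"))  # ('needs_review', 'call after 5pm')
```

Deciding what to do with a range (midpoint, lower bound or a follow-up with the source) is a business decision the code deliberately leaves open.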

Take the time to invest in the right resources to make your data strong. Too often, a misguided mentality that treats data work as a cost center rather than a source of revenue discourages such action. Good data can actually pay long-term dividends.

Jim Vilter • March 21, 2018
