The Data Masker

Monday, March 24, 2014

The Price for Non-Compliance

The homepage of the Catholic Archdiocese of Seattle is the last place one might expect to see the phrase “IRS TAX FRAUD SCAM” in a bright red box. The conspicuous red box, which links to a legal notice in four languages and a letter straight from the Archbishop himself, is just one of many painful steps the Archdiocese has had to take in the week following the discovery of a data breach. The breach exposed social security numbers and other sensitive personal information of some of the Archdiocese’s 90,000 employees and volunteers.

As the bright red notice demonstrates, the Archdiocese must now fulfil mandates from the scary side of legal compliance: what to do after sensitive data has already been compromised. In some circumstances, the notice and reporting requirements, invasive legal investigations, fines, penalties, and sanctions have driven organizations into bankruptcy.

State and Federal agencies, like the IRS, are on the lookout for signs of data breaches such as the fraudulent tax returns that resulted from the Archdiocese breach. However, organizations that control sensitive data like identity, financial, or health information are often subject to stringent requirements, like a duty to discover and report data breaches as early as possible. Some of these requirements vary by state, but they generally involve massive fines that increase with the scale of the breach and any delays in discovering and reporting the breach. The legal requirements also vary by industry. For example, financial institutions subject to GLBA must provide their customers with notice every time their privacy policy changes, while schools subject to FERPA stand to lose all Federal funding if they divulge confidential student records to the wrong person.

The costly effects of a data breach do not end there. Notification requirements can lead to negative media exposure and customer outrage. Expenses, both voluntary and mandatory, often include legal fees, public relations costs, extra security, and programs aimed at restoring customer goodwill (like Target’s credit monitoring program). If the Target breach is any example, blaming your organization’s contractors or service providers will do nothing to stem the tide of expenses after your sensitive data is compromised.

In data privacy compliance, an ounce of prevention is often worth a pound of cure. If your organization deals with sensitive personal data, make sure you are aware of the compliance requirements for your industry and jurisdiction. Having the right technological solutions in place to protect your data might just make the difference between profit and bankruptcy.

By Harris Buller, J.D.

This article contains a general discussion and does not constitute legal advice. If you encounter any of the issues discussed in this article, consult with an attorney.

Saturday, March 1, 2014

Masking? Encryption? Confusion of Sorts.

Masking is NOT encryption. Encryption is NOT enough.

You have probably heard these phrases before. What exactly do they tell us?

Data is sensitive. During its lifetime, data passes through many “hands” and is seen by many “eyes”. When we want to protect sensitive data from exposure, we need to understand who we are protecting against and what the points of exposure are.

We start with the following use case: we inserted data into a system and our data is saved on the disk. Who do we not want to see our data and why? First, the malicious outsiders. The very first data we usually protect are the logins and passwords, because these pieces provide the “keys to the locks”. We secure them with encryption. If we want to be more cautious, we encrypt all the sensitive data on the disk – to protect against the case of theft or loss of the storage device itself. This is especially relevant in light of the BYOD – bring your own device to work – trend. Even the most cautious among us often leave laptops or tablets in cars unattended. The majority of us are not intelligence agency operatives and are not trained to never leave a trace behind. The best protection against a malicious outsider is ENCRYPTION.

However, almost 40% of data fraud happens not with outsiders, but with insiders. These cases involve people accessing the data across the whole spectrum: from the CxO office, with insider trading cases, to unscrupulous or naïve developers. Few developers, of course, are unscrupulous or naïve. Yet, unfortunately, breaches do happen. All of us are well aware of the latest case of an “insider threat”: the case of Mr. Snowden. Regardless of his intentions, he demonstrated that a developer, bound only contractually, has unrestricted access to data and, as such, can present a threat. Encryption does not protect against insiders. A CxO sees the data naked because s/he works with it in production. The developer sees encrypted data, be it in production or non-production. Now guess who has the keys to the encryption when production data is recreated in development, as is necessary in many scenarios? Yes, you guessed it right: the developer does!

The only protection against an unscrupulous CxO is legal recourse. However, there is both legal recourse and technological protection against an unscrupulous or “naïve” developer: DATA MASKING. Masked data retains the look and functionality of real data. It fits the field size, passes unit tests, and gives real numbers in performance testing, as it would in the production environment.
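To make "retains the look" concrete, here is a minimal Python sketch of character-class substitution. It is one illustrative technique, not a description of any particular product; the function name `mask_value` and the sample value are assumptions made for this example. Digits become random digits, letters become random letters of the same case, and punctuation stays put, so the masked value still fits the original field size and format.

```python
import random
import string

def mask_value(value: str, rng: random.Random) -> str:
    """Replace each digit with a random digit and each letter with a random
    ASCII letter of the same case; leave all other characters unchanged."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep separators so the format survives
    return "".join(out)

# Hypothetical sample value in a social-security-number layout.
original = "123-45-6789"
masked = mask_value(original, random.Random())
print(masked)  # same length and dash positions, different digits
```

Because the replacement characters are drawn at random and no mapping is stored, there is no key that turns the masked value back into the original. A real masking tool would also need to handle referential consistency and check digits, which this sketch ignores.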

With data masking, the only data that has real value is data in the production environment. The environments outside of production have fake data that has significantly less value on the black market. Data masking is NOT encryption but rather a "one-way street" to removing sensitive information.

The bottom line: fewer people have access to real data when we use data masking. Developers do not have access to sensitive information, be it encrypted or not.

In the next blog posts, we will be talking about data at rest vs. data in transit, production data masking scenarios, as well as how we decide on data sensitivity.

Sunday, November 17, 2013

Database Design: Data Masking must be a criterion

I started reading the works of C. J. Date on RDBMS back in the late 1970s. While I have kept up with his later writings, I have found Scott Ambler's books to be very good practical references, namely:

Agile Database Techniques, Wiley, 2003

Refactoring Databases: Evolutionary Database Design, Addison-Wesley, 2006

These books have been invaluable in designing and implementing effective databases, but what is missing from all of them is any discussion of data masking.

I tend to be of the old rigorous school for green-field database development: create a fully normalized logical database model, ideally pushing the model to a full 6th normal form. After this is done, denormalize to a physical database model that addresses the customer's performance and ease-of-use criteria. I believe an additional criterion needs to be included in the logical-to-physical data model activity: data masking.

Two important criteria to consider are:

Strive to have masking possible at a column-atomic level. If there is column-correlation, use adjunct tables for the columns that are correlated in a table.

Complete avoidance of natural keys.

Column Level Masking and Correlation

If a column has no dependencies on other columns, you are probably in normal form and have no hard dependencies. There is also a soft dependency, which I term column-correlation and define as:

The value in column A results in a subset of values being valid for another column B.

In most cases, the columns will be category columns. For example: Gender and Method of Address are correlated:

F –> Ms., Mrs., Dr., Prof.

M –> Mr., Dr., Prof.

U –> Dr., Prof.

A randomizer that treats the two columns independently can produce invalid combinations such as M with Ms. or F with Mr. (ignoring issues with transgender).
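The effect can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names and the sample rows are made up, and the validity table simply restates the gender-to-title list above. Shuffling each column on its own may produce invalid pairs, while shuffling the (gender, title) pair as a unit cannot.

```python
import random

# Valid forms of address per gender code, restating the list above.
VALID_TITLES = {
    "F": {"Ms", "Mrs", "Dr", "Prof"},
    "M": {"Mr", "Dr", "Prof"},
    "U": {"Dr", "Prof"},
}

def is_consistent(row):
    """True when the title is allowed for the gender in the same row."""
    gender, title = row
    return title in VALID_TITLES[gender]

def shuffle_columns_independently(rows, rng):
    """Naive masking: shuffle each column separately, ignoring correlation."""
    genders = [g for g, _ in rows]
    titles = [t for _, t in rows]
    rng.shuffle(genders)
    rng.shuffle(titles)
    return list(zip(genders, titles))

def shuffle_pairs(rows, rng):
    """Correlation-aware masking: move (gender, title) together as one unit."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical sample rows.
rows = [("F", "Ms"), ("M", "Mr"), ("F", "Mrs"), ("U", "Dr"), ("M", "Prof")]
rng = random.Random()
independent = shuffle_columns_independently(rows, rng)
paired = shuffle_pairs(rows, rng)
print("invalid after independent shuffle:", [r for r in independent if not is_consistent(r)])
print("invalid after paired shuffle:", [r for r in paired if not is_consistent(r)])
```

The second list is always empty because every pair came from a valid source row. The first list may or may not be empty on a given run, which is exactly what makes this kind of defect hard to spot in testing.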

This situation and standard forms in data modeling have a painful co-existence. Traditionally an address will be decomposed into atomic components such as zip code, city, state, county, address line 1, address line 2, etc. Ideally, it would also have a column indicating whether this is a postal address or a delivery service address. When it is time to mask these columns, a host of issues arise because these columns are correlated.

A typical issue is sales tax calculation. Sales tax calculations use one or more addresses (destination, shipper, billing) to calculate the sales tax. If you mask only one column, the address may be deemed invalid and the call for tax may fail. For example, my home zip code covers two counties and three towns/cities. There are a few cases where a community crosses an international border: 123 Main Street may be in the US, while 223 Main Street may be in Canada.

My approach is to break out these correlated columns into molecules as separate adjunct tables using surrogate keys. This allows an easy shuffling of the keys, or the substitution of the rows with alternative valid records.
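A minimal Python sketch of the idea follows, using in-memory dictionaries in place of real tables. The table layout, column names, and placeholder values are all assumptions for illustration. Each row of the adjunct table is a complete, internally valid address "molecule", and masking only permutes the surrogate keys that the main table holds.

```python
import random

# Hypothetical adjunct table keyed by a surrogate key. Every row is a
# complete address whose columns are valid together (placeholder values).
addresses = {
    1: {"line1": "1 First St", "city": "City A", "state": "ST-A", "zip": "ZIP-A"},
    2: {"line1": "2 Second St", "city": "City B", "state": "ST-B", "zip": "ZIP-B"},
    3: {"line1": "3 Third St", "city": "City C", "state": "ST-C", "zip": "ZIP-C"},
}

# Hypothetical main table: it references the molecule only by surrogate key.
customers = [
    {"customer_id": 10, "address_id": 1},
    {"customer_id": 11, "address_id": 2},
    {"customer_id": 12, "address_id": 3},
]

def shuffle_address_keys(rows, rng):
    """Return copies of the rows with the address_id values permuted.

    The address rows themselves are never edited, so every customer still
    points at a complete, valid address after masking.
    """
    keys = [row["address_id"] for row in rows]
    rng.shuffle(keys)
    return [{**row, "address_id": key} for row, key in zip(rows, keys)]

masked_customers = shuffle_address_keys(customers, random.Random())
for row in masked_customers:
    print(row["customer_id"], addresses[row["address_id"]])
```

Note that a plain shuffle can leave a row pointing at its original address. If that matters, the permutation has to be constrained, or the rows substituted with alternative valid records instead.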

Avoidance of natural keys

Surrogate keys should be used for all referential integrity: foreign keys, primary keys, and alternative primary keys. A natural key, such as a two-character state code (WA, CA, etc.), may be tempting to use, but it then means that data masking may not be column-atomic. I have found that some shops are aggressive on this point, requiring all referential integrity to use GUID/Unique Identifiers. This slightly extreme approach has some advantages, because enumerations are often saved as integers, creating a quasi-natural key that causes unforeseen problems with data masking.
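The loss of column-atomicity can be shown with a small Python sketch. The two designs, the helper `orphans`, and the masking map are hypothetical and exist only for this illustration. When the state code is itself the join key, masking that one column in the parent table orphans every child row; when the join uses a surrogate key, the same column can be masked in isolation.

```python
# Hypothetical masking map for state codes (placeholder replacement values).
CODE_MASK = {"WA": "XX", "CA": "YY"}

def orphans(children, fk_field, parent_keys):
    """Return child rows whose foreign key no longer matches any parent key."""
    return [row for row in children if row[fk_field] not in parent_keys]

# Design 1: natural key. The state code is both the data and the join key.
states_natural = {"WA": {"name": "Washington"}, "CA": {"name": "California"}}
orders_natural = [{"order_id": 1, "state": "WA"}, {"order_id": 2, "state": "CA"}]

# Masking the parent column alone changes the key and breaks every reference.
masked_states_natural = {CODE_MASK[code]: row for code, row in states_natural.items()}
broken = orphans(orders_natural, "state", masked_states_natural)

# Design 2: surrogate key. The state code is an ordinary, maskable column.
states_surrogate = {1: {"code": "WA"}, 2: {"code": "CA"}}
orders_surrogate = [{"order_id": 1, "state_id": 1}, {"order_id": 2, "state_id": 2}]

masked_states_surrogate = {
    key: {**row, "code": CODE_MASK[row["code"]]}
    for key, row in states_surrogate.items()
}
intact = orphans(orders_surrogate, "state_id", masked_states_surrogate)

print(len(broken), len(intact))  # prints "2 0"
```

In the natural key design, the masking job would have to update the child table in the same pass, which is the coupling the surrogate key avoids.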

Other Data Modeling Criteria for Data Masking?

Readers may wish to suggest other criteria. There are several more that I know of, but they occur rarely, so I will not burden the reader with academic issues.

Tuesday, October 29, 2013

Data Masking: the problem and attitude to the problem

This is the first of a series of posts dealing with data-masking. There are many software providers in this specialized industry. Some of the providers are also database providers attempting to capture this additional add-on market. There is a traditional tendency for these big companies, when creating add-ons, to produce only a basic version lacking key features. These key features will often be provided by a company whose sole business is data-masking. These companies always strive to differentiate themselves by providing more at less cost.

The primary motivators for data-masking are government laws and regulations. A few examples are:

The core reasons for this protection are typically:

Protecting an individual's privacy
Protecting an organization's privacy. Corporations and most organizations are deemed "persons" in most of the western world.
Preventing information from being available that may assist insider trading of stock (or equivalent)
Preventing unfairness in the marketplace: for example, exposing a firm's customers, what they ordered, and the price actually charged for goods.

For several years I worked for Patchlink and Lumension Security. I was their representative to the Security Content Automation Protocol and other initiatives sponsored by the National Institute of Standards and Technology (NIST). NIST activities have yet to expand to data masking, but such action is expected in the next few years. NIST has produced only one related paper, "Guide to Protecting the Confidentiality of Personally Identifiable Information (PII)", SP 800-122 (Apr 2010), which is worth reading. If you are the data masking owner in your organization, this is not an optional read; the contents would have considerable legal weight as "normal or expected practices" because the source is NIST.

A Higher Level of Protecting Information

Often management in corporate America takes a minimalist approach: "if it follows the general advice of my data-masking provider, it is good enough", which translates to "if things go bad, I want to be absolved of responsibility and have some other party be guilty of not doing their job". With tenure in most IT jobs being short before a move to the next position at a different company, this approach is a safe bet for the manager (but may not be a good bet for the company). I am by temperament very pro-active and wish to prevent data exposure from ever happening, be it on my shift or after my shift.

Looking at best practices for Data Masking is one of the goals of this blog.