Data Masking Helps Keep Live Data From Leaking Out, Experts Say
Emerging technology may prevent shared and test database content from appearing where it shouldn't
December 4, 2009
Nearly every IT person has seen it at some point: a test bed populated with real, sometimes business-sensitive, data just waiting to fall into the wrong hands. "It's so easy to just make a copy -- and it's hard to come up with test data," notes Steven McCabe, associate director of information technologies for the Residential and Student Service Programs at University of California Berkeley.
That simple fact can lead lazy developers to pull live data into a test environment, where it could easily be plucked by curious users or even external hackers. So McCabe's department is in the process of deploying a set of data-masking tools from dataguise -- a data security software vendor -- to make copying development and test data quick and easy without compromising sensitive information.
According to Gartner, more than 80 percent of companies currently use live data for nonproduction purposes, such as development, testing, quality assurance, and business intelligence. This can pose a big problem for organizations, as the loosey-goosey nature of nonproduction databases has traditionally left them less secure than their production counterparts.
This is where masking solutions come into the picture. "Companies definitely need to mask information in their development and testing environments," says Slavik Markovich, CTO of Sentrigo, a database security tool vendor. "Many times those environments are accessed by many developers, [systems engineers], and [database administrators]. Whoever wants to access those environments gets access."
Masking products garble sensitive information while maintaining data integrity. "Data-masking technologies should satisfy a simple, yet strict rule: The application that runs against masked data performs as if masked data is real," explained Gartner fellow Joseph Feiman in a recent report. "In no way is data masking allowed to limit developers' capability to adequately test applications."
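To make Feiman's rule concrete, here is a minimal sketch of deterministic, format-preserving masking in Python. It illustrates the principle, not any vendor's implementation; the field and secret are hypothetical. Because the same real value always maps to the same well-formed fake value, joins, lookups, and format validations behave as they would against live data.

```python
import hashlib

# Minimal sketch: deterministic, format-preserving masking of a Social
# Security number. The same input always yields the same masked output,
# so the application "performs as if masked data is real."

def mask_ssn(ssn: str, secret: str = "per-project-secret") -> str:
    """Replace an SSN with a fake but well-formed one, deterministically."""
    digest = hashlib.sha256((secret + ssn).encode()).hexdigest()
    digits = str(int(digest, 16))[:9]  # nine pseudo-random decimal digits
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

print(mask_ssn("123-45-6789"))  # same well-formed masked value on every run
```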
According to Markovich, data masking helps fill the gaps left behind by database activity monitoring (DAM) tools and by the encryption of live production data. Allan Thompson, executive vice president of operations for dataguise, agrees, noting that encryption and other technologies can be ineffective once the databases are opened and copied to other resources.
"When they pull it from production and put it in a nonproduction environment, they usually encrypt the databases," Thompson says. "But then they send them to the other departments, they unencrypt them, and they make copies. That's why auditors are finding sensitive data on desktops, file servers, laptops, and more."
And once live data is transferred to nonproduction systems, it can easily be converted to flat files that will no longer be tracked by DAM solutions, even if these solutions monitor nonproduction environments, experts observe.
But implementing data masking isn't always as simple as buying a new product, experts say. "Even when you have good tools for data masking, it is still very easy to make mistakes and not use them correctly," Markovich says.
Observers say users make three common mistakes when implementing data masking: failing to understand where sensitive data resides within database tables, copying sensitive information to nonproduction systems before masking it, and taking a "mask everything" approach.
Knowing the database environment well enough to perform effective masking is sometimes easier said than done, experts say. Markovich recounts a story told at a recent U.K. Oracle Users Group meeting, in which a healthcare company used masking -- but penetration testing revealed 17 unmasked copies of the "masked" data in other tables that it didn't know about.
"You've got to understand your database and where the 'bad' data is, and find those elements," McCabe says. If you don't know where the data is, you can get automated tools that can help ease the pain, he noted.
In other cases, organizations fail to mask information before it hits nonproduction systems. They'll first copy a database over -- and then mask sensitive data.
"That's a very bad practice because this information is getting in its original form to the testing environment," Markovich says. "Even if it's for a very short period, you cannot make the testing environment unlearn what was. Even if you overwrite the data, traces of the original information will remain there."
Data masking can often hit system performance pretty hard, McCabe says. To minimize this problem, you need to know which information to mask -- and which masking techniques to use for specific sets of data.
"When we first started it, our inclination was to mask as much as possible, but that really slowed performance," he says. "I recommend people spend some time to figure out what are the minimum amount of data elements in your data set that need to be masked -- and what are the more lightweight masking techniques you can get away with."