Live Data In Test Environments Is Alive And Well -- And Dangerous
Some 83 percent of large financial firms surveyed use production data while developing and testing applications, so DBAs and database security teams need to coordinate better with developers
Those charged with the care and feeding of database information stores, beware: A new statistic tucked into a comprehensive study of financial services firms' data protection policies shows that even at the most security-aware organizations, application developers still use live data in their development and test environments.
The study, released earlier this month by the Ponemon Institute and commissioned by Compuware, showed that among 80 very large financial organizations, 83 percent use live data while developing and testing applications. That's a big risk to sensitive information, data security experts warn, and a sign that DBAs and database security experts need to work more closely with their development colleagues to protect the data those developers' applications draw on.
"Most organizations are still using live data in test and development environments because of a lack of awareness around data security, and they don't know they can easily mask or de-identify sensitive data using off-the-shelf technologies without changing applications or testing processes," says Phil Neray, vice president of security strategy at Guardium, an IBM company.
Even when the awareness is there, organizations still tend to rely on real data for its speed and ease of use, says Brian Contos, chief security strategist for Imperva.
"Using live, cloned data is generally regarded as a shortcut when there isn't enough time or resources to create test data, or a secure test data strategy isn't in place," he says.
But these are not excuses for a practice that can put customer data in great jeopardy. "It is true that in general these test systems are not Internet-accessible, but even if you have absolute trust in all your employees -- never a good starting point -- that doesn't remove the risk, as many organizations will outsource parts of development and hire contractors, consultants, and the like," Contos says. "And if the media has taught us anything over the last decade about carelessness, it's that people often store this type of data on laptops and removable media devices, and those assets can get lost or stolen."
Beyond the insider threat, there's also the very real possibility that malicious external hackers can eventually work their way deep enough into the network after a blended attack and get their hands on test applications and live data. Adam Muntner, managing partner of security consultancy QuietMove, regularly conducts penetration tests for his clients. Muntner says the use of production data from the database is an all-too-common weakness he finds when he's looking to poke holes in his client organizations' security.
"Use of production data can accelerate testing, but needs to be done with extreme caution," Muntner says. "During many a penetration test, we've hacked our way to the keys of the kingdom through test systems that were not maintained up to the standards of production systems, yet contained the same data."
Part of the difficulty with this issue is that preventative measures require the cooperation of both the database stakeholders and the application developers.
"Generally, developers ought to obfuscate and anonymize any data in a test system which could be considered personally identifiable information. This would include email addresses, phone numbers, IP addresses, mailing address, etc.," Muntner says. "If an organization's operations team could obscure this data before delivering it, the chance for exposure shrinks even more."
One of the first steps is to formalize a process within the organization requiring authorization from the data and application owners before production data is used in development or testing -- keeping everyone in the loop and preventing mistakes by untrained junior members of the team. And at the very least, Muntner says, personally identifiable information, such as Social Security numbers, e-mail addresses, phone numbers, IP addresses, mailing addresses, and so on should be anonymized, preferably where the data resides. If possible, the DBA can be brought in to leverage SQL commands, such as RAND, REPLICATE, and REPLACE, he says.
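As a rough illustration of that kind of masking, here is a minimal Python sketch that plays the same roles as the SQL functions Muntner mentions: random digits stand in for RAND, a repeated filler character for REPLICATE, and in-place substitution for REPLACE. The field names (`name`, `ssn`, `phone`, `email`) and the helper functions are hypothetical, chosen only for the example; a real schema and masking policy would differ.

```python
import random

def mask_digits(value, rng):
    """Swap every digit for a random one, keeping length and punctuation."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)

def mask_text(value, filler="X"):
    """Overwrite a text field with a repeated filler character of equal length."""
    return filler * len(value)

def mask_row(row, row_id, rng):
    """Return a copy of a record with its PII fields anonymized.

    The column names here are assumptions for illustration only.
    """
    masked = dict(row)
    masked["name"] = mask_text(row["name"])
    masked["ssn"] = mask_digits(row["ssn"], rng)
    masked["phone"] = mask_digits(row["phone"], rng)
    # .invalid is a reserved top-level domain, so these addresses cannot route.
    masked["email"] = f"user{row_id}@example.invalid"
    return masked

if __name__ == "__main__":
    rng = random.Random()
    sample = {
        "name": "Jane Doe",
        "ssn": "123-45-6789",
        "phone": "(212) 555-0147",
        "email": "jane.doe@example.com",
    }
    print(mask_row(sample, 1, rng))
```

Because the masked values keep the original lengths and separators, application code that validates formats should usually continue to work against the masked copy.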
An even safer option is artificial data generation, which uses regular expressions and predefined ranges of values to produce realistic but fake test data, Muntner says. He suggests two open-source tools, dbMonster and Generate Data, to help with the task.
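To give a sense of what such generation looks like, the sketch below builds fake records from predefined ranges and fixed formats, then checks each value against a regular expression. It is not how dbMonster or Generate Data work internally; the record layout, the value ranges, and the `fake_record` helper are all assumptions made for illustration.

```python
import random
import re
import string

# Patterns the generated values are expected to satisfy.
SSN_RE = re.compile(r"^000-\d{2}-\d{4}$")
PHONE_RE = re.compile(r"^555-01\d{2}$")
EMAIL_RE = re.compile(r"^[a-z]{8}@example\.invalid$")

def fake_record(rng):
    """Build one synthetic customer record from predefined ranges."""
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    return {
        "name": name,
        "email": f"{name}@example.invalid",
        # Area number 000 is never issued, so this cannot match a real SSN.
        "ssn": f"000-{rng.randint(0, 99):02d}-{rng.randint(0, 9999):04d}",
        # 555-0100 through 555-0199 is set aside for fictional use in North America.
        "phone": f"555-01{rng.randint(0, 99):02d}",
        # Hypothetical account balance range, in dollars.
        "balance": round(rng.uniform(0, 50000), 2),
    }

def generate(count, seed):
    """Generate a repeatable batch of records and validate their formats."""
    rng = random.Random(seed)
    records = [fake_record(rng) for _ in range(count)]
    for rec in records:
        assert SSN_RE.match(rec["ssn"])
        assert PHONE_RE.match(rec["phone"])
        assert EMAIL_RE.match(rec["email"])
    return records

if __name__ == "__main__":
    for rec in generate(3, seed=2024):
        print(rec)
```

Seeding the generator makes a test data set reproducible, which can help when a failing test needs to be rerun against the same records.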