Tech Insight: Finding And Securing Your Enterprise's Most Sensitive Data

The headlines are full of companies facing serious breaches. Here are some basic steps to protect your enterprise's critical data -- and stay out of the news

Dark Reading Staff, Dark Reading

May 20, 2011

No matter what your business, information is likely one of your most valuable assets -- both to you and to the attacker. With so many breaches in the news -- from Sony to Epsilon to Heartland Payment Systems -- we all must understand what sensitive data we have, the risk associated with the data, and how to protect it.

But before we can do anything, we must know what sensitive data we have and where it's stored. We can’t protect what we don’t know about. Finding the data could be very simple or hugely complex, depending on your organization.

The first step is to list all of the sensitive data types your organization handles: employee personal information, customers' personally identifiable information, cardholder data, medical records, and corporate intellectual property, such as source code and transaction information. These data types will hold different risks for different companies -- if you're a software company, for example, the loss of source code is more damaging than the loss of externally regulated data.

Once you have a list of what you believe is critical data, ask department heads to add any other data types they know of or believe should be included. Use this as an opportunity to also ask teams to identify the places where these data types are utilized. At this point, all data types should be added to the list -- later, you’ll filter and prioritize based on risk and which you can most easily protect.

Once you know what sensitive data you're looking for, you'll inevitably find it in places where it shouldn’t be stored. During a data audit years ago, I found credit card information in the /tmp directory on a server. A production support staff member had been debugging a data load that couldn’t be reproduced in staging, and he dumped the load into /tmp to compare its data structure against staging's. After finding the problem, he forgot to remove the dump, leaving it there for anyone who accessed the server.

After you've talked with your people about what types of data to look for and where it might be found, the best way to find sensitive data is to scan and monitor for it. Most data in your organization can be fingerprinted in a way that allows for searching. For instance, credit card numbers and Social Security numbers follow a predefined format that’s well-documented.
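As a minimal sketch of that kind of fingerprinting, the Python below pairs two regular expressions with the Luhn checksum that payment card numbers carry. The patterns are deliberately loose assumptions (13 to 16 digits with optional spaces or hyphens, and the common 3-2-4 Social Security layout), and the function names are illustrative only; a real scan would need tuning against your own data to keep false positives down.

```python
import re

# Candidate card numbers: 13-16 digits, optionally separated by a space or hyphen.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# U.S. Social Security numbers written in the common 3-2-4 layout.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return a list of (kind, matched_text) pairs found in text."""
    hits = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_ok(digits):  # the checksum weeds out most random digit runs
            hits.append(("card", m.group()))
    for m in SSN_RE.finditer(text):
        hits.append(("ssn", m.group()))
    return hits
```

The checksum step matters: a bare digit-count regex will flag order numbers, timestamps, and phone lists, while the Luhn test discards most of those before anyone has to review them.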

Plenty of pages online show you how to search for these data types, utilizing everything from OpenDLP to simple Perl, PHP, and Python scripts. Custom data types, such as source code, typically can be fingerprinted and searched using header information added to the code on check-in; standard comments that may apply to all files, such as copyright information; or by searching for common strings that appear in the code, such as variable names or custom include files.

If your source code doesn’t have something unique defined in every file checked in, then add a unique signature to every file so that it can be searched for in the future. This process can be automated through most source control software. Utilizing OpenDLP or a commercial alternative to perform searches -- rather than writing your own scripts -- is a great way to start. Of course, OpenDLP is Windows-centric, so other solutions might be required.
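Here is one way the signature idea might look in Python. The marker string, the comment prefix, and both function names are hypothetical; in practice you would pick your own marker and wire the stamping step into whatever hook mechanism your source control system offers.

```python
from pathlib import Path

# Hypothetical signature; choose a string unlikely to appear anywhere by accident.
MARKER = "ACME-INTERNAL-SOURCE"


def stamp(path, marker=MARKER, comment="#"):
    """Prepend a signature comment unless the file already carries the marker.

    Returns True if the file was modified.
    """
    text = path.read_text(encoding="utf-8")
    if marker in text:
        return False
    path.write_text(f"{comment} {marker}\n{text}", encoding="utf-8")
    return True


def find_stamped(root, marker=MARKER):
    """Return every file under root whose contents include the marker."""
    hits = []
    for p in sorted(Path(root).rglob("*")):
        if not p.is_file():
            continue
        try:
            if marker in p.read_text(encoding="utf-8", errors="ignore"):
                hits.append(p)
        except OSError:
            continue  # unreadable file; skip it rather than abort the scan
    return hits
```

This sketch always writes the marker on the first line, which would break scripts that start with a shebang; a production version would need to insert the comment after such a line and use the right comment syntax for each language.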

What if you need to search for data that is in an uncommon format? Easy. Find the pattern, break out your favorite regular expression helper, like Reggy for OS X, or call your resident regex guru and build your own regex to reduce false positives.

There are two types of data that you will run into when searching: structured and unstructured. Within these, there will be countless formats -- and these formats will be the bane of your search. Structured data is data stored in a known format -- such as in a database. Unstructured data is data that is stored in unpredictable or various formats, such as Word files, text files, Excel files, or any other random format.

Searching plain text files, XML files, and databases is pretty straightforward. For a database, you connect and run a query across each table. For flat files, you search the file system, iterate over each line of each file, and you’re pretty much done.
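Both searches can be sketched in a few lines of Python. This example assumes a SQLite database purely because the driver ships with the language; another database would need its own driver and catalog query. The Social Security pattern and the function names are placeholders for whatever you are actually hunting.

```python
import os
import re
import sqlite3

# Placeholder pattern: U.S. Social Security numbers in the 3-2-4 layout.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_files(root, pattern=SSN_RE):
    """Yield (path, line_number) for each line under root that matches."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if pattern.search(line):
                            yield path, lineno
            except OSError:
                continue  # unreadable file; keep scanning


def scan_sqlite(db_path, pattern=SSN_RE):
    """Yield (table, column) pairs holding at least one matching text value."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        for table in tables:
            cur = conn.execute(f'SELECT * FROM "{table}"')
            cols = [d[0] for d in cur.description]
            flagged = set()
            for row in cur:
                for col, val in zip(cols, row):
                    if isinstance(val, str) and pattern.search(val):
                        flagged.add(col)
            for col in cols:  # report in column order
                if col in flagged:
                    yield table, col
    finally:
        conn.close()
```

Reporting the table and column rather than the matched values is a deliberate choice: the scan results themselves should not become another copy of the sensitive data.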

Archives, Word files, Excel files, and data stored externally will cause more of a problem -- not to mention the data types that aren’t well-documented or predictable. If you’re trying to roll your own solution, then Perl, Python, Java, and just about every other language offer libraries of common file formats. All of the commercial data loss prevention (DLP) and search products can identify common file formats, too.
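For a sense of what rolling your own looks like, here is a small Python sketch that searches inside ZIP archives using the standard zipfile module. Because .docx and .xlsx files are themselves ZIP containers of XML, the same function can open them, with a caveat: text inside those XML parts can be split across tags, so a raw byte search like this one will miss some matches. The pattern and function name are assumptions for illustration.

```python
import re
import zipfile

# Placeholder byte pattern: U.S. Social Security numbers in the 3-2-4 layout.
SSN_RE = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")


def scan_zip(path, pattern=SSN_RE):
    """Return the names of archive members whose raw bytes match pattern."""
    hits = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Reads the whole member into memory; fine for a sketch,
            # but large archives would call for streaming via zf.open().
            if pattern.search(zf.read(info)):
                hits.append(info.filename)
    return hits
```

Nested archives, encrypted members, and legacy binary formats such as .doc and .xls are all out of reach for this approach, which is a large part of why the commercial products earn their keep.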

What if your sensitive data is stored in the cloud? Many enterprises don't have good visibility into data that is stored and shared externally -- whether it's a cloud services provider or a third-party service, such as the ones that might be managing your payroll or benefits.

To find data in the cloud there are a few approaches you can take -- each of them varies, depending on the service, so choose the method that works best for you and your service provider or partner.

First, as you did with storage locations, determine what services are being used. Circulate a list to department heads and key users. Check with legal, IT, and accounting. You’ll find some services being used that aren’t "official," so you’ll have to work with the end users and departments on those. The big ones, such as Salesforce and the like, you can probably find through accounting or legal, since they leave a trail of licenses and budget. Once you know which services your organization is using, determine the best approach for each.

A good way to start is to monitor the network for data patterns. Pull out your trusty regex again and add it to your IDS or IPS -- or install a DLP solution to monitor network traffic for sensitive data. If the service utilizes encryption, then you won’t see much, but it's a start.

Identifying unencrypted, approved transactions is a good first step. We need to know data is protected when we share or move it. During this process, you might discover unapproved or unknown services and data transfers. You’ll be surprised what you find floating over email, IM, FTP, and HTTP.

Now the real work starts. If you can’t find the data traversing the network, how can you find data in outsourced services? You have a few options, though none of them is great. First, if the services have APIs that can be used to search data, get out your favorite IDE (I’m hardcore, so I code in vi), and whip up a script to search the service, report on data found, or take some action.

If there is no API, or if you don't have the coding skills, then manual searching and auditing might be in your future. Logging in as an administrator and running queries against stored data can obtain the same results, but it takes more time and resources. Make this your last resort -- you don't want to spend all day searching cloud services unless you have no other choice.

As you’re searching all of these locations and finding sensitive data, two things will likely occur to you. First, how can you prevent data from being stored in so many places? Second, how should you protect the sensitive data that you've found? Let’s tackle these in reverse order.

Protecting the data will help control its flow. Each type of data poses a different risk to the organization if it is lost, destroyed, compromised, or exposed. Understand the risk each data type poses and align your protections to fit those risks. If the data is highly valuable, then encryption and/or restricted access could be required.

Start with access restrictions. Restrict access to the storage locations on a "need to know" basis. Limit both the level of rights granted and the set of users who hold them to what is actually needed to work with the data. The smaller the group and the lower the privileges, the less risk of compromise. Access controls can be applied at the file, directory, database, or application level. Use whatever makes sense.
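At the file level on a POSIX system, a quick audit might look like the Python sketch below. It flags anything under a directory that grants permissions to "other" and offers a helper that strips everything except the owner's bits. The function names are illustrative, and the sketch assumes plain Unix permission bits; ACLs, Windows permissions, and database or application controls need their own tooling.

```python
import os
import stat


def world_accessible(root):
    """Return paths under root whose mode grants any permission to 'other'."""
    flagged = []
    for dirpath, dirs, files in os.walk(root):
        for name in dirs + files:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # a symlink's own mode bits are not meaningful
            if mode & stat.S_IRWXO:
                flagged.append(path)
    return sorted(flagged)


def restrict_to_owner(path):
    """Drop group and other permissions, keeping only the owner's bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & stat.S_IRWXU)
```

Run the audit first and review the list before changing anything; tightening permissions blindly on a production share is a good way to find out which jobs depended on them.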

Once you have identified the proper storage location for your sensitive data and restricted access to it, encryption might be the next step. There are a slew of possibilities available, ranging from file-level encryption to full-drive encryption. Modern database management systems support encryption natively. Add in an external key management product, and you can build a solid solution.

If, for some reason, this doesn’t work for you, there are file system-level encryption products that allow you to drop your database -- and other files -- into an encrypted mount point. From there, you can control which processes and users will be allowed to access the decrypted data.

Never forget to provide your users with a method of encrypting data when stored locally or on the network. I know we all say never to store sensitive data locally, but let's face it: Sometimes there is a need for local storage. Provide a method and guidance for protecting that data.

Using a commercial full-disk encryption product might be the solution for local encryption; in some cases, open-source TrueCrypt does the job. I like full-disk encryption because then I don't have to worry about whether the user stored the file in the correct folder -- everything on the disk is encrypted. Some others argue that full-disk encryption is overkill because the user only really needs to encrypt a tiny subset of the total files stored on the system. To each his own.

With access controls and encryption in place, we now have a method of monitoring data for unauthorized access. If our network monitoring doesn’t catch someone transferring data, then access logs associated with the storage location should help us identify any access attempts.

After we have identified and protected the approved locations for storing sensitive data, it's time to tackle the rogue locations. Identify these locations, send notifications to their owners, and work with each to move the data to the approved locations -- or bring the existing locations into compliance.

If the owners of the rogue locations don’t listen or respond, then get buy-in from your management. Then comes the fun part. Find the location, restrict all access, and wait to see who complains. When they complain, inform them that the data violates policy and must be stored in the approved location or by an approved method.

Now we have all of our data identified, stored, and protected. The last thing to tackle is retention and documentation. Be sure that your organization documents the data types, the control types required for each, and where each data type can be stored. These guidelines should be high-level enough to allow flexibility while still keeping sensitive data types secure.

Data retention is a key element of protecting sensitive data. One of the easiest ways to reduce data risk is to collect and store only the data your organization really uses. Purge old data or back it up and store it offline. By reducing the amount of data that's easily accessible, organizations naturally decrease risk. Establish purge policies for customer data that’s no longer required. Purge old, unused files on "public" shared drives. Be sure to clean up data stored with external vendors and cloud services.
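A purge policy for shared drives can start with a simple report of what has gone stale. The Python sketch below lists files that have not been modified within a given number of days. The function name and the idea of using modification time as the staleness test are assumptions; your policy might key on access time, ownership, or a records schedule instead.

```python
import os
import time


def stale_files(root, max_age_days, now=None):
    """Return files under root not modified in the last max_age_days days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    stale.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(stale)
```

This only reports candidates and deletes nothing. Feeding the list to the data owners for review, or to an archive job that moves files offline, is a safer first step than automatic removal.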

Finding data across the organization and external services is not an easy task. There are many places to look and many technologies to work with. Start small, learn what you can about data usage, storage, and types, and gradually tackle the problem. If you can, use commercial products that continually expand their scope and capabilities. Open-source solutions, single-purpose tools, or custom scripts could also work -- but require more commitment.
