Big Data And Bad Security
The rush to collect and mine big data leaves data security in the dust
Following my previous post, I received some questions about data security and the adoption of NoSQL -- both as a platform and as a service -- because many cloud providers harness "big data." My goal was to explain to DBAs what they should be aware of, because NoSQL will be part of their lives and they need to understand how it affects security and operations. Based on the questions I received, there is some ambiguity about what cloud providers supply, what's built into any given NoSQL engine, and what the application developer who harnesses NoSQL is responsible for. So let's backtrack a bit and go through the trends as I see them.
Every couple of years we get a new programming language or platform that alters the IT landscape -- a better presentation of content, better use of resources, easier-to-use systems, easier-to-build code, and so forth. Examples might be the Internet, Ruby, Java, virtualization -- and now it's the cloud, big data, and mobile. Simple, lightweight, low cost, and powerful are typical hallmarks of these new technologies, and the very reasons they gain rapid adoption. Soon thousands of developers jump on the proverbial bandwagon and start interesting new projects.
In a short time span, these projects gain acceptance because they provide value and service like never before. And companies get hooked on the value provided. Shortly after the point when we accept the new technology as both an enabler and differentiator for the business, we realize the rush to embrace cool, new technologies has left management and infrastructure far behind. Specifically, that means security -- as always -- comes after the fact.
NoSQL -- which is defined as nonrelational, distributed, and horizontally scalable data stores -- falls squarely into this category. It abandons the constraints of schemas and transactional consistency in favor of simple usage and massive scalability in terms of data storage and processing capabilities. It's a technology that can handle big data, storing more and being able to analyze the aggregate -- at a scale beyond the reach of relational databases.
But from a security standpoint, you're starting from scratch. By and large it is not built in. We are only a couple of years removed from a time when just a few developers were prototyping applications on obscure open-source data repositories. Now we have major firms advancing capabilities and providing the infrastructures to go with these databases. And with millions of mobile devices all feeding data back to providers that want near real-time analysis, demand will continue to grow.
If Hadoop and its various flavors did not convince you that big data is more than a fad, then Amazon's SimpleDB and Dynamo, Google's BigTable, Cassandra, and 100 other platforms should have. Now that Oracle has jumped into the NoSQL market with both feet, you should start seriously looking at NoSQL installations as something you will be managing, either in-house or in a cloud that augments your infrastructure.
Yes, I've heard the criticism in the database community that Oracle offering NoSQL was purely a marketing effort to slow the erosion of the core relational database business, but I don't think that's true. Big data is built around a different data storage model and a slightly different method of processing data. The concept of what constitutes a data center is being redefined by big data and cloud computing, and these vendors are all vying for customers' attention in this expanding market. So for you DBAs out there, start learning NoSQL because it's going to be around for a while.
But this is where it comes back to security because there isn't much available. Developers and IT managers will, once again, scramble to bolt security on after the fact. The few vendors that claim to provide NoSQL security have nothing more than "cloud-washed" marketing collateral to go with exactly the same product they have been selling for years.
We see some federated identity systems implemented with SAML, and environment security measures embedded with the cloud infrastructure. But there is no such thing as vulnerability assessment or database activity monitoring for NoSQL today. Label security is based on schema, which does not exist in NoSQL. Encryption can be problematic because the data and indices need to be in clear text for analysis, requiring application designers to augment security with masking, tokenization, and select use of encryption in the application layer. Audit trails are whatever the application developer built in, so they are both application-specific and limited in scope. Today, security is sparse, and it likely will be for several years.
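To make the application-layer approach concrete, here is a minimal sketch of what masking and tokenizing a record might look like before it is handed to a NoSQL store. Everything in it is an assumption for illustration: the field names (`email`, `card`), the function names, and the use of a keyed hash (HMAC-SHA256) standing in for a real tokenization service, which would normally keep a protected mapping back to the original value. The key is generated in-process only to keep the example self-contained; in practice it would come from a key-management system.

```python
import hashlib
import hmac
import secrets

# Hypothetical key for illustration only; a real deployment would fetch
# this from a key-management service rather than generate it in-process.
TOKEN_KEY = secrets.token_bytes(32)


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Replace a value with a deterministic keyed token.

    Equal inputs yield equal tokens under the same key, so the field can
    still be indexed and grouped on without storing the clear text.
    This is a keyed hash, not a reversible vault-based token.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_card(number: str) -> str:
    """Keep only the last four digits of a card number; mask the rest."""
    digits = [c for c in number if c.isdigit()]
    visible = "".join(digits[-4:])
    return "*" * max(len(digits) - 4, 0) + visible


def protect(record: dict, key: bytes = TOKEN_KEY) -> dict:
    """Return a copy of the record that is safe to write to the data store.

    The sensitive field names here are assumptions for this sketch.
    """
    safe = dict(record)  # shallow copy; the caller's record is untouched
    safe["email"] = tokenize(record["email"], key)
    safe["card"] = mask_card(record["card"])
    return safe


if __name__ == "__main__":
    raw = {
        "user": "alice",
        "email": "alice@example.com",
        "card": "4111 1111 1111 1111",
    }
    print(protect(raw))
```

The design choice worth noting is that the token is deterministic, so analysis jobs can still count or join on the tokenized field, while the clear-text value never reaches the cluster. Fields that must be recovered later would need encryption or a token vault instead.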
Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading.