The InfoSec Barrier to AI
Information security challenges are proving to be a huge barrier for the artificial intelligence ecosystem. Conversely, AI is causing headaches for CISOs. Here's why.
Information security is all about keeping an organization's information secure and maintaining its integrity. Some time ago, all this meant was good locks on the doors, physical as well as virtual.
The first challenge to this philosophy started in the 2000s with software-as-a-service (SaaS), when vendors asked companies to hand over data to shared servers. The information security industry responded with stronger encryption and contracts. Enterprises added another layer by simply refusing to let the data outside their firewalls. They asked for on-premises deployments and private clouds instead. This was the origin of a debate that has gone unsettled for decades … and counting!
Over the past three to five years, artificial intelligence (AI) has emerged as a force that has complicated the arguments on both sides, adding completely new challenges to information security. Conversely, information security has become a big challenge to the growth of AI itself.
A Bird No Longer in Hand
CISOs have responded with agility to most aspects of AI's threat. For example, doing well with AI means relying on the construct of function-as-a-service (FaaS). Also called serverless computing, FaaS enables a user to start a program without worrying about hardware needs or distribution. While this is very useful for training AI models, provisioning for FaaS makes economic sense only at enormous scale. In other words, AI naturally pushes toward migration to public clouds, which is fine: data leaving the premises may cause a lot of headaches, but it does not necessarily mean that the data is insecure. Since there is really no way around this, CISOs have stepped up. However, a concern lingers about what happens to this data outside the firewall.
AI learns best on data from multiple customers in a similar situation. Many vendors have tried to get away with commingling at the model level. For example, IBM has assuring words about data privacy on its website, but it fails to mention that for many of Watson's products there is a single underlying AI model. Data from every customer, while individually secure, is used to train that single model.
The easier problem in this approach is to figure out exactly what these models are retaining that could be valuable. For example, consider a popular open source model from Google called Word2Vec. It converts each word in your corpus to a 400- to 1,000-dimension vector (an array of 400 to 1,000 numbers). The individual numbers in the vector don't mean much per se; however, Google itself popularized this trick to demonstrate the power of Word2Vec: If you take the vectors for King, Man, and Woman, and compute something like [King] – [Man] + [Woman], you get a vector very close to the one for [Queen]. While this sounds fantastic and innocuous for general English, relationships like this may expose competitive insights for enterprises. Maybe all we wanted to do was to hide the identity of the queen.
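To make the vector arithmetic concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are invented by hand for illustration and are not real Word2Vec output; the `EMBEDDINGS` table and the `analogy` helper are hypothetical names. The point is only to show how a relationship can be read back out of stored vectors.

```python
import math

# Hypothetical toy embeddings, hand-picked for illustration only.
# Real Word2Vec vectors are learned from a corpus and have hundreds of dimensions.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def analogy(positive, negative, embeddings):
    """Return the word nearest to sum(positive) - sum(negative).

    The query words themselves are excluded from the candidates.
    """
    dims = len(next(iter(embeddings.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + v for t, v in zip(target, embeddings[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, embeddings[word])]
    candidates = {
        w: v for w, v in embeddings.items()
        if w not in positive and w not in negative
    }
    return max(candidates, key=lambda w: cosine(target, candidates[w]))


# [King] - [Man] + [Woman]
result = analogy(["king", "woman"], ["man"], EMBEDDINGS)
print(result)
```

With these toy numbers the nearest remaining word is "queen". Anyone holding the vectors can run the same query, which is why relationships encoded in a shared model may matter to an enterprise even when the raw data never leaves its store.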
To be fair, it is hard to reverse engineer insights from a trained model in most cases. Still, it is not impossible. CISOs have continually asked for more transparency to counter this.
The Real Unmet Challenge: Deterministic Behavior
The harder problem is of a completely different nature. Let us conduct a thought experiment. Say a CISO is confident about a vendor's security protocols and the integrity of its models. The CISO allows the vendor to process the company's data, and results start to flow. Say the model reports 2 + 2 = 5, which is found to be generally acceptable. The company designs its systems around this; perhaps a downstream process offsets the result by -1. Most people would say AI has delivered.
This is when we come to the real problem: that of deterministic behavior. Based on the success at this company, the vendor markets the model to a competitor. The AI model gets fresh data. It is smart enough to spot the error of its ways. It learns. It changes the output to 2 + 2 = 4. On the surface, this sounds like another win for AI, but it is a big problem for enterprises. All the investments that our CISO's company made around the model are now useless. The company must recalibrate, reinvest in the downstream systems, or, in the worst case, live with an erroneous output of the process. The rise of AI has added a completely new dimension to CISOs' worries. An AI model's integrity and consistency, its deterministic behavior, is now equally important. Continuously evolving, shared AI models do not come with a guarantee of reliability. At least not yet.
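The thought experiment can be sketched as a small regression check. This is an illustration under stated assumptions, not a fix: `model_v1` and `model_v2` are hypothetical stand-ins for the shared vendor model before and after retraining, and `GOLDEN` is a hypothetical set of input/output pairs the company recorded when it built its downstream offset. Such a check only detects that the model has changed; it does not make the model deterministic.

```python
def model_v1(a, b):
    """Hypothetical original model: consistently off by one (2 + 2 = 5)."""
    return a + b + 1


def model_v2(a, b):
    """Hypothetical retrained model: now 'correct' (2 + 2 = 4)."""
    return a + b


def downstream(result):
    """The company's compensating step, built around model_v1's behavior."""
    return result - 1


# Hypothetical baseline: outputs recorded when the integration was built.
GOLDEN = [((2, 2), 5), ((1, 3), 5)]


def detect_drift(model, golden_cases):
    """Return (inputs, expected, actual) for every case whose output changed."""
    drifted = []
    for inputs, expected in golden_cases:
        actual = model(*inputs)
        if actual != expected:
            drifted.append((inputs, expected, actual))
    return drifted


stable = detect_drift(model_v1, GOLDEN)   # no drift against the baseline
drift = detect_drift(model_v2, GOLDEN)    # every recorded case has changed

print(downstream(model_v1(2, 2)))  # pipeline result before retraining
print(downstream(model_v2(2, 2)))  # pipeline result after retraining
```

Before retraining, the pipeline yields 4 for the input (2, 2); afterward it yields 3, even though the model itself became more accurate. The downstream offset, not the model, is now the source of the error.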
The Barrier for AI
The debate between proponents of multitenant solutions and those of bespoke implementations was raging well before AI added this new twist. In some quarters, it was popular to tag CISOs as conservative if they did not share data. Now it seems simply prudent. The bigger implication of this problem is for the AI industry. AI as a product is proving to be a myth. AI companies must develop very differently than their SaaS brethren. However, the funding and scaling models in venture capital communities are geared for SaaS. In short, information security challenges are proving to be a huge barrier for the AI ecosystem.
Maybe it is the information security industry that eventually solves this problem. Perhaps there are protocols and encodings ahead that impose deterministic behavior on AI. Until then, everyone is in an unfortunate pickle of determinism.