(1) What are the public policy implications of the collection, storage, analysis, and use of big data? For example, do the current U.S. policy framework and privacy proposals for protecting consumer privacy and government use of data adequately address issues raised by big data analytics?
Current policy is not sufficient to address big data issues. There are many good proposals, but they are taking shape so slowly that they are being lapped by the constantly shifting realities of the technology they are meant to shape. NSTIC (National Strategy for Trusted Identities in Cyberspace, http://is.gd/JOrjCw) has been an excellent example of this. That digital identity is core to properly addressing big data should be obvious. How can we hope to protect privacy if we cannot identify the proper steward of data? How can we identify data stewards if the people who ought to be identified have no consistent digital identity? The very founding notion of NSTIC, trusted identities, raises the question of whether we are prepared to empower people by assigning responsibility. If we do not have identities that can be trusted, then we don’t even have one of the basic building blocks that would be required to approach big data as a whole.
That said, the implications that big data has are too large to ignore. In “The Social, Cultural & Ethical Dimensions of Big Data” (http://is.gd/EGe7tD), Tim Hwang raised the notion that data is the basic element in (digital) understanding; and further that understanding can lead to influence. This is the big data formulation of the notion that knowledge leads to rights, and rights lead to power – the well tested idea of Michel Foucault. In the next century, the power of influence will go to those who have understanding culled from big data. This will be influence over elections, economies, social movements and the currency that will drive them all – attention. People create big data in what they do, but they also absorb huge amounts of data in doing so. The data that can win attention will win arguments. The data that gets seen will influence all choices. We see this on the internet today as people are most influenced not by what they read that is correct but rather by what they see that holds their attention. And gaining that influence seems to be playing out as a winner-takes-all game. With nothing short of the ethical functioning of every aspect of human life on the line, big data policy implications cannot be overstated.
(2) What types of uses of big data could measurably improve outcomes or productivity with further government action, funding, or research? What types of uses of big data raise the most public policy concerns? Are there specific sectors or types of uses that should receive more government and/or public attention?
The amount of data involved in some big data analysis and the startlingly complex statistical and mathematical methods used to power it give an air of fairness. After all, if it’s all real data powered by cold math, what influence could there be hiding in the conclusions? It is when big data is used to portray something as inherently fair, even just, that we need to be the most concerned. Any use of big data that is meant to make things “more fair” or “evenly balanced” should immediately provoke suspicion and incredulity. Just a small survey of current big data use shows this to be true. Corrine Yo from the Leadership Conference gave excellent examples of how surveillance is unevenly distributed in minority communities, driven by big data analysis of crime. Clay Shirky showed how even a small issue like assigning classes to students can be made to appear fair through statistics applied to big data when there are clearly human fingers tipping the scales. There are going to be human decisions and human prejudices built into every big data system for the foreseeable future. Policy needs to dictate what claims to fairness and justice can be made and outline how enforcement and transparency must be applied in order for a system to be worthy of those claims.
The best way for government to speed the nation to ethical big data will be to fund things that will give us the building blocks of that system. In no particular order a non-exhaustive list of these ethical building blocks will be trusted identity, well defined ownership criteria of data generated by an individual directly and indirectly, simple and universal terms for consent to allow use of data, strong legal frameworks that protect data on behalf of citizens (even and especially from the government itself), and principles to guide the maintenance of this data over time, including addressing issues of human lifecycles (e.g. what happens to data about a person once they are dead?). There are many current proposals that apply here, e.g. NSTIC as mentioned above. All of these efforts could use more funding and focus.
(3) What technological trends or key technologies will affect the collection, storage, analysis and use of big data? Are there particularly promising technologies or new practices for safeguarding privacy while enabling effective uses of big data?
Encryption is often introduced as an effective means to protect rights in the use of data. While encryption will doubtless be part of any complete solution, today’s actual encryption is used too little and when used often presents too small a barrier for the computing power available to the determined. Improvements in encryption, such as quantum approaches, will surely be a boon to enforcement of any complete policy. The continued adoption of multi-factor authentication, now used in many consumer services, will also be an enabler to the type of strong identity controls that will be needed for a cooperative control framework between citizens and the multitude of entities that will want to use data about the citizens. As machines become better at dealing with fuzzy logic and processing natural language, there will be more opportunities to automate interactions between big data analysis and the subjects of that analysis. When machines can decide when they need to ask for permission and know how to both formulate and read responses to those questions in ways that favor the human mode of communication, there will be both more chances for meaningful decisions on the part of citizens and more easily understood records of those choices for later analysis and forensics.
(4) How should the policy frameworks or regulations for handling big data differ between the government and the private sector? Please be specific as to the type of entity and type of use (e.g., law enforcement, government services, commercial, academic research, etc.).
Policies governing interaction of government and private sector are one area where much of what is defined today can be reused. Conversely, where the system is abused today big data will multiply opportunities for abuse. For example, law enforcement data, big or not, should always require checks and balances of the judicial process. However, there is likely room for novel approaches where large sets of anonymized data produced from law enforcement could be made available to the private and educational sectors en masse as long as that larger availability is subject to some judicial check on behalf of the people in place of any individual citizen. Of course, this assumes a clear understanding of things being “anonymized” – one of many technical concepts that will need to be embedded in the jurisprudence to be applied in these circumstances. There are cracks in the current framework, though. These can allow data normally protected by regulations like HIPAA to seep out via business partners and other clearing house services that are given data for legitimate purposes but are not themselves regulated. All instances of data use at any scale must be brought under a clear and consistent policy framework if there is any hope of forging an ethical use of big data.
I’m gearing up to go to the NSTIC convened steering group meeting in Chicago next week. Naturally, my inner nerd has me reviewing the founding documents, re-reading the NSTIC docs, and combing through the bylaws that have been proposed (all of which can be found here). I am also recalling all the conversations where NSTIC has come up. One trend emerges. Many people say they think the NSTIC identity provider responsibilities are too much risk for anyone to take on. With identity breaches so common now that only targets with star power make the news, there does seem to be some logic to that. If your firm were in the business of supplying government approved identities and you got hacked, then you would be in even hotter water, right?
The more it rolls around in my head, the more I think the answer is: not really. Let’s think about the types of organization that would get into this line of work. One that is often cited is a mobile phone provider. Another is a website with many members. One thing these two classes of organization – and most others I hear mentioned – have in common is that they are already taking on the risk of managing and owning identities for people. They already have the burden of the consequences in the case of a breach. Would having the government seal of approval make that any less or more risky? It’s hard to say at this stage, but I’m guessing not. It could lessen the impact in one sense because some of the “blame” would rub off on the certifying entity. “Yes, we got hacked – but we were totally up to the obviously flawed standard!” If people are using those credentials in many more places since NSTIC’s ID Ecosystem ushers in this era of interoperability (cue acoustic guitar playing kumbaya), then you could say the responsibility does increase because each breach is more damage. But the flipside of that is there will be more people watching, and part of what this should do is put in place better mechanisms for users to respond to that sort of thing. I hope this will not rely on users having to see some news about the breach and change a password as we see today.
This reminds me of conversations I have with clients and prospects about single sign on in the enterprise. An analogy a co-worker came up with, in the form of a question, is a good conversation piece: would you rather have a house with many poorly locked doors or one really strongly locked door? I like it because it does capture the spirit of the issues. Getting in one of the poorly locked doors may actually get you access to one of the more secure areas of the house behind one of the better locked doors because once you’re through one you may be able to more easily move around from the inside of the house. Some argue that with many doors there’s more work for the attacker. But the problem is that also means it’s more work for the user. They may end up just leaving all the doors unlocked rather than having to carry around that heavy keychain with all those keys and remember which is which. If they had only one door, they may even be willing to carry around two keys for that one door. And the user understands better that they are risking everything by not locking that one door versus having to train them that one of the ten doors they have to deal with is more important than the others. All of this is meant to say: having lots of passwords is really just a form of security through obscurity, and the one who you end up forcing to deal with that obscurity is the user. And we’ve seen how well they choose to deal with it. So it seems to me that less is more in this case. Fewer doors will mean more security. Mostly because the users will be more likely to participate.
I’ve had the privilege to witness many IT funerals. By my reckoning, Mainframes, CORBA, PKI, AS400, NIS+, and countless others are all dead according to the experts. Of course, that means nearly every customer I talk with is overrun with zombies. Because these technologies are still very much alive, or at least undead, in their infrastructures. They are spending tons of money on them. They are maintaining specialized staff to deal with them. And, most importantly of all, they are still running revenue generating platforms on them. Now some of the venerable folks speaking at CIS2012 want to count SAML among the undead. It’s a sign of the ever increasing pace of IT. SAML, if it’s dead, will be leaving a very handsome corpse. But I think it’s safe to say SAML will be with us for a very long time to come. This meme feels like another flashpoint in the tensions between thought leaders like the list of folks discussing this on Twitter (myself included) and the practitioners who have to answer to all the folks in suits who just want to see their needs met. I try to split the difference. It seems to me that the only thing that makes something dead is when people are actively trying to get away from it because they are losing money on it. SAML is nowhere near that. But if dead is defined as not being a destination but rather a landmark in a receding landscape, then maybe it has died. But it’s chasing after us hungry for our budgets and offering being impervious to pain as a trade for that funding, which sounds like some kind of zombie to me. Using SAML will make you impervious to the pain of being so far ahead of the curve that there is no good vendor support, impervious to the pain of having so few people with talent in your platform that you can’t get things done – or have to pay so much to get things done you may as well not do them, and impervious to the pain of being unable to get what you need done because there aren’t enough working examples of how to do it.
Based on what I hear from practitioners, they may like being impervious to all those pains. So the IT zombie legions grow…
I’m in the car listening to an NPR piece about LARPing while driving between meetings. Something they say catches my ear. It seems LARPers (is that even a word?) have an impulse to create immersive identities aside from their own because they want more degrees of freedom to experience the world. In case you’re in the dark about what LARPing is (like I was), it’s Live Action Role Playing – dressing up as characters and acting out stories in real world settings as opposed to scripted controlled settings. It’s clear how Maslow’s Hierarchy of Needs applies here. You won’t find a lot of LARPing in war torn areas, or communities suffering from rampant poverty. But does a group of people having enough energy to spare in their identity establishment to want to spawn new identities to live with imply an Identity Hierarchy of Needs? Could it be that when you have enough security in the identity you need, you seek out ways to make the identity that you want more real than just going to the gym to get better abs?
Maslow’s hierarchy of needs is one of my favorite conceptual frameworks. Not only is it extremely powerful in its home context of psychology, not only is it useful in framing the psychological impacts of many things from other contexts (political, philosophical, economic), it’s also useful as a general skeleton for understanding other relationships. My marketing team recently applied it to Quest’s IAM portfolio. They framed our solutions as layers of technology that could get your house in order to achieve the far out goals of total governance and policy based access management, which they identified as Maslow’s highest order. But I’m thinking about this more in terms of pure, individual identity. Of course the technology tracks alongside that in many ways. The LARPing is what got me thinking, but the other parallels become immediately clear. How many people have multiple social networking accounts? A page for business tied to a Twitter account, a Facebook presence as a personal playground, and a LinkedIn page for a resume are standard fare for many folks in the high tech biz, and beyond. Again, it’s not likely that a blue collar factory worker would have all these identities to express themselves. Like Maslow’s original idea, there is a notion of needing the energy to spare and the right incentives to take the time. There is also an interesting socio-political dimension to this I’ll leave as an exercise to the reader.
The first question is clear: what would an identity hierarchy of needs look like? If one googles “hierarchy of needs” AND “Identity management”, there are a dizzying number of hits. So it’s not like this hasn’t been explored before. Some good ones come from Dave Shackleford who applies the hierarchy to security and R “Ray” Wang who applies it more widely to making choices about technology decisions. But these only treat IAM as an element of their whole. I want to apply it to identity by itself.
One thing I’ll borrow from Dave’s structure is the four categories he uses (from the bottom up): fundamental, important, enhancing, holistic. I won’t pretend I’m going to get this right at this point. I would love to get feedback on how to make this better. But I’ll take a stab at making this work. The assumption here is that there is no identity without attributes. What does it mean to say “I am Jonathan” if it’s not to assert that this thing “I” has an attribute labeled “name” that is given the value “Jonathan”? And this is more than a technology thing. All notions of identity boil down to attributes and collections of attributes. The next layer deals with taking identities that are collections of attributes and giving them places in groupings. Call them roles, groups, social clubs, parishes, or whatever you like. Membership in collections helps define us. The next two layers were harder to work out, at first. But then I realized it was about the turn inward. Much like Maslow’s higher levels are where you work on your inner self, the higher levels of our identity hierarchy are about understanding and controlling our attributes and participation in collectives. First we need to realize what those are. Then we need to use this knowledge to gain the power to determine them.
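For the technically inclined, the four layers can be sketched as a tiny data structure. This is only a rough illustration of the idea, not a real IAM model: the `Identity` class, its method names, and the pairing of each layer with one of Dave’s four category names are all my own assumptions for the sake of the example.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    # Layer 1 (fundamental): an identity is nothing but its attributes.
    attributes: dict = field(default_factory=dict)
    # Layer 2 (important): membership in collectives (roles, groups, clubs).
    collectives: set = field(default_factory=set)

    def describe(self) -> dict:
        # Layer 3 (enhancing): awareness of which attributes and
        # memberships currently make up this identity.
        return {
            "attributes": dict(self.attributes),
            "collectives": sorted(self.collectives),
        }

    def redefine(self, name: str, value: str) -> None:
        # Layer 4 (holistic): self-determination over an attribute.
        self.attributes[name] = value

    def join(self, collective: str) -> None:
        # Layer 4 again: choosing which collectives to belong to.
        self.collectives.add(collective)

    def leave(self, collective: str) -> None:
        self.collectives.discard(collective)


# Hypothetical example values, for illustration only.
me = Identity({"name": "Jonathan"}, {"IAM practitioners"})
me.redefine("name", "Sir Jonathan the Bold")
me.leave("IAM practitioners")
me.join("LARP guild")
```

The point of the sketch is the ordering: `describe` is useless without attributes and collectives to report on, and `redefine`, `join`, and `leave` only make sense once you can see what you would be changing.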
Self determination is actually the perfect phrase to tie together all these thoughts. What was it about the LARPers that triggered all these thoughts? It was that they had decided to actively take control of their identities to the point of altering them, even bifurcating them. That may make it sound like I’m making them out to be the masters of the universe (and not just because some do dress up as He-Man characters). But just like some folks can live in a psychological state pretty high up on the Maslow hierarchy without putting in much effort to achieve the first few levels, the same can be true of folks in the identity hierarchy, I’d think. If you have your most important attributes defined for you by default, get assigned reasonable collectives to belong to, and even have a decent awareness of this without challenging it, then you may grow up to be the special kind of geek that likes to LARP. That pleasure derived from splitting your personality is likely something that’s largely implicit – you don’t need to understand it too deeply.
Of course, if this all feels too geeky to apply to regular folks, I can turn to what may be the oldest form of this identity splitting. The “liaisons” in the title came from a notion that maybe folks carrying out complicated affairs of the heart were trying to bifurcate their own identities in a bid to push self determination before there was any better outlet. No excuse for serial adultery, but it gives a new prism through which to view the characters in Dangerous Liaisons, perhaps. How many times in novels does the main motivation for these affairs come down to a desire for drama, romance, or a cure for bourgeois boredom? How many times on The People’s Court? The point is that just like people who have climbed to the top of Maslow’s Hierarchy may not have done so using morally good means and may not use their perch to better the world, people who are experimenting in self determination to the point of maintaining multiple identities in their lives may not be doing it for the most upstanding of reasons, either.
And how does this all relate back to the technology of IAM? Maybe it doesn’t very concretely. I’d be OK with that. It may if you consider that there are many people out there trying to hand their users self determination through IAM self service without first having a grip on what attributes make up an identity. How can you expect them to determine their fate if they have no idea what their basic makeup is? We expect users to take the reins of managing their access rights, certifying the rights of others, and performing complicated IAM tasks. But if they ask “Why is this person in this group?” we have no good answers. Then we’re surprised at the result. So maybe this applies very well. Finally, what does this have to do with the cloud? Clearly, cloud means more identities. Many times they are created by the business seeking agility and doing things with almost no touch by IT. If the cloud providers give them a better sense of identity than you do, then that’s where they will feel more able to determine their own fate. Some may say “But that’s not fair. That cloud provider only needs to deal with a small bit of that person’s identity and so it’s easier for them!” Life is not fair. But if you established a strong sense of what an identity is and how it belongs in collectives, gave users ways to understand that, and then enabled them to control it, you would be far ahead of any cloud provider. But it all starts with simply understanding how to ask the right questions.
I expect (and hope) to raise more questions with all of this than to answer them. This is all a very volatile bed of thoughts at the moment. I’m hoping others may have things to say to help me figure this all out. As always, I expect I’ll learn the most by talking to people about it.
There was an excellent article at Dark Reading the other day about data leaks focusing on insider threats. It did all the right things by pointing out “insiders have access to critical company information, and there are dozens of ways for them to steal it” and “these attacks can have significant impact” even though “insider threats represent only a fraction of all attacks–just 4%, according to Verizon’s 2012 Data Breach Investigations Report.” The article goes on to discuss how you can use gateways, DLP for at rest and in flight data, behavioral anomaly detection, and a few other technologies in a “layered approach using security controls at the network, host, and human levels.” I agree with every word.
Yet, there is one aspect of the controls that somehow escapes mention – letting a potentially powerful ally in this fight off the hook from any action. There is not one mention of proactive controls inside the applications and platforms that can be placed there by IAM. A great deal of insider access is inappropriate. Either it’s been accrued over time or granted as part of a lazy “make them look like that other person” approach to managing entitlements. And app-dev teams build their own version of security into each and every little application they pump out. They repeat mistakes, build silos, and fail to consume common data or correctly reflect corporate policies. If these problems with entitlement management and policy enforcement could be fixed at the application level, the threats any insider could pose would be proactively reduced by cutting off access to data they might try to steal in the first place. It’s even possible to design a system where the behavioral anomaly detection systems could be consulted before even handing data over to a user when some thresholds are breached during a transaction – in essence, catching the potential thief red handed.
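To make that last idea a little more concrete, here is a minimal sketch of what an application-level check might look like. Everything in it is hypothetical: the `ENTITLEMENTS` and `TYPICAL_VOLUME` tables stand in for a central IAM store and a behavioral baseline, `looks_anomalous` stands in for a call out to a real anomaly detection system, and the threshold numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user: str
    resource: str
    records_requested: int


# Hypothetical central entitlement store (in practice, fed by IAM).
ENTITLEMENTS = {"alice": {"customer_db"}, "bob": {"hr_db"}}

# Hypothetical per-user baseline of records pulled in a typical transaction.
TYPICAL_VOLUME = {"alice": 50, "bob": 20}

# Assumed threshold: only consult the detector for larger transactions.
CONSULT_THRESHOLD = 100


def looks_anomalous(request: Request) -> bool:
    """Stand-in for a behavioral anomaly detection system."""
    baseline = TYPICAL_VOLUME.get(request.user, 0)
    return request.records_requested > 5 * baseline


def authorize(request: Request) -> str:
    # Proactive control: no entitlement means no data, full stop.
    if request.resource not in ENTITLEMENTS.get(request.user, set()):
        return "deny"
    # Threshold breached during the transaction: ask the detector
    # before handing any data over.
    if request.records_requested > CONSULT_THRESHOLD and looks_anomalous(request):
        return "hold_for_review"
    return "allow"


decisions = [
    authorize(Request("alice", "customer_db", 10)),
    authorize(Request("alice", "hr_db", 10)),
    authorize(Request("alice", "customer_db", 5000)),
]
```

The design choice worth noticing is the order of the checks: the entitlement test removes inappropriate access before any data moves, and the detector is only consulted for the transactions that remain, which is where the "red handed" moment would happen.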
Why do they get let off the hook? Because it’s easier to build walls, post guards, and gather intelligence than it is to climb right inside of the applications and business processes to fix the root causes. It’s easier to move the levers you have direct control over in IT rather than sit with the business and have the value conversation to make them change things in the business. It’s cheaper now to do the perimeter changes, regardless of the payoff – or costs – later. Again, this is not to indict the content of the article. It was absolutely correct about how people can and very likely will choose to address these threats. But I think everyone knows there are other ways that don’t get discussed as much because they are harder. In his XKCD comic entitled “The General Problem,” Randall Munroe says it best: “I find that when someone’s taking time to do something right in the present, they’re a perfectionist with no ability to prioritize, whereas when someone took time to do something right in the past, they’re a master artisan of great foresight.” I think what we need right now are some master artisans who are willing to take the heat today for better security tomorrow.