(1) What are the public policy implications of the collection, storage, analysis, and use of big data? For example, do the current U.S. policy framework and privacy proposals for protecting consumer privacy and government use of data adequately address issues raised by big data analytics?
Current policy is not sufficient to address big data issues. There are many good proposals, but the slow pace at which they take shape means they are being lapped by the constantly shifting realities of the technology they are meant to govern. NSTIC (National Strategy for Trusted Identities in Cyberspace, http://is.gd/JOrjCw) has been an excellent example of this. That digital identity is core to properly addressing big data should be obvious. How can we hope to protect privacy if we cannot identify the proper steward of data? How can we identify data stewards if the people who ought to be identified have no consistent digital identity? The very founding notion of NSTIC, trusted identities, raises the question of whether we are prepared to empower people by assigning responsibility. If we do not have identities that can be trusted, then we lack even one of the basic building blocks required to approach big data as a whole.
That said, the implications of big data are too large to ignore. In “The Social, Cultural & Ethical Dimensions of Big Data” (http://is.gd/EGe7tD), Tim Hwang raised the notion that data is the basic element of (digital) understanding, and further that understanding can lead to influence. This is the big data formulation of the notion that knowledge leads to rights, and rights lead to power – the well-tested idea of Michel Foucault. In the next century, the power of influence will go to those who have understanding culled from big data. This will be influence over elections, economies, social movements and the currency that will drive them all – attention. People create big data in what they do, but they also absorb huge amounts of data in doing so. The data that can win attention will win arguments. The data that gets seen will influence all choices. We see this on the internet today, where people are most influenced not by what they read that is correct, but by what they see that holds their attention. And gaining that influence seems to be playing out as a winner-take-all game. With nothing short of the ethical functioning of every aspect of human life on the line, the policy implications of big data cannot be overstated.
(2) What types of uses of big data could measurably improve outcomes or productivity with further government action, funding, or research? What types of uses of big data raise the most public policy concerns? Are there specific sectors or types of uses that should receive more government and/or public attention?
The amount of data involved in some big data analysis and the startlingly complex statistical and mathematical methods used to power it give an air of fairness. After all, if it’s all real data powered by cold math, what influence could be hiding in the conclusions? It is when big data is used to portray something as inherently fair, even just, that we need to be most concerned. Any use of big data that is meant to make things “more fair” or “evenly balanced” should immediately provoke suspicion and incredulity. Even a small survey of current big data use shows this to be true. Corrine Yu from the Leadership Conference gave excellent examples of how surveillance, driven by big data analysis of crime, is unevenly concentrated in minority communities. Clay Shirky showed how even a small issue like assigning classes to students can be made to appear fair through statistics applied to big data while human fingers are clearly tipping the scales. There are going to be human decisions and human prejudices built into every big data system for the foreseeable future. Policy needs to dictate what claims to fairness and justice can be made, and outline how enforcement and transparency must be applied for a system to be worthy of those claims.
The best way for government to speed the nation toward ethical big data will be to fund the building blocks of that system. A non-exhaustive list of these ethical building blocks, in no particular order: trusted identity; well-defined ownership criteria for data generated by an individual, directly and indirectly; simple and universal terms of consent for the use of data; strong legal frameworks that protect data on behalf of citizens (even, and especially, from the government itself); and principles to guide the maintenance of this data over time, including issues of human lifecycles (e.g. what happens to data about a person once they are dead?). Many current proposals apply here, e.g. NSTIC as mentioned above. All of these efforts could use more funding and focus.
(3) What technological trends or key technologies will affect the collection, storage, analysis and use of big data? Are there particularly promising technologies or new practices for safeguarding privacy while enabling effective uses of big data?
Encryption is often introduced as an effective means to protect rights in the use of data. While encryption will doubtless be part of any complete solution, today’s encryption is used too little, and when used it often presents too small a barrier to the computing power available to the determined. Improvements in encryption, such as quantum approaches, will surely be a boon to the enforcement of any complete policy. The continued adoption of multi-factor authentication, now used in many consumer services, will also enable the kind of strong identity controls needed for a cooperative control framework between citizens and the multitude of entities that will want to use data about them. As machines become better at dealing with fuzzy logic and processing natural language, there will be more opportunities to automate interactions between big data analysis and the subjects of that analysis. When machines can decide when they need to ask for permission, and can both formulate and read responses to those questions in ways that favor the human mode of communication, there will be more chances for meaningful decisions on the part of citizens and more easily understood records of those choices for later analysis and forensics.
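As a thought experiment only – none of this reflects any existing standard, and all the names are hypothetical – a machine-mediated permission exchange of the kind described above might pair a machine-readable request with a plain-language rendering of it and an auditable record of the answer:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch: a machine-readable permission request that can be
# rendered as a plain-language question and logged for later forensics.

@dataclass
class ConsentRequest:
    requester: str       # entity asking to use the data
    data_category: str   # what kind of data is involved
    purpose: str         # what the data will be used for

    def to_question(self) -> str:
        # Formulate the request in the human mode of communication.
        return (f"May {self.requester} use your {self.data_category} "
                f"for {self.purpose}?")

@dataclass
class ConsentRecord:
    request: ConsentRequest
    granted: bool
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def record_decision(request: ConsentRequest, answer: str) -> ConsentRecord:
    # Accept a small set of natural-language answers; default to "no".
    granted = answer.strip().lower() in {"yes", "y", "ok", "sure"}
    return ConsentRecord(request=request, granted=granted)

req = ConsentRequest("Acme Analytics", "location history", "traffic research")
rec = record_decision(req, "yes")
print(req.to_question())
print(rec.granted)
```

The point of the sketch is the shape of the exchange, not the details: the question is generated from the same structured fields that get logged, so the record of the choice is as legible as the choice itself.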
(4) How should the policy frameworks or regulations for handling big data differ between the government and the private sector? Please be specific as to the type of entity and type of use (e.g., law enforcement, government services, commercial, academic research, etc.).
Policies governing the interaction of government and the private sector are one area where much of what is defined today can be reused. Conversely, where the system is abused today, big data will multiply the opportunities for abuse. For example, law enforcement data, big or not, should always require the checks and balances of the judicial process. However, there is likely room for novel approaches in which large sets of anonymized data produced by law enforcement could be made available to the private and educational sectors en masse, as long as that larger availability is subject to some judicial check on behalf of the people in place of any individual citizen. Of course, this assumes a clear understanding of what “anonymized” means – one of many technical concepts that will need to be embedded in the jurisprudence applied in these circumstances. There are cracks in the current framework, though, which can allow data normally protected by regulations like HIPAA to seep out via business partners and other clearinghouse services that are given data for legitimate purposes but are not themselves regulated. All instances of data use at any scale must be brought under a clear and consistent policy framework if there is any hope of forging an ethical use of big data.
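To illustrate why “anonymized” needs a precise technical definition rather than a hand wave, here is a minimal sketch of one well-known formal criterion, k-anonymity, which requires every combination of quasi-identifying attributes in a released data set to be shared by at least k records. The sample records are invented:

```python
from collections import Counter

# Illustrative sketch of one formal criterion for "anonymized" data:
# k-anonymity. A release is k-anonymous if every combination of
# quasi-identifying attributes appears in at least k records.

def is_k_anonymous(records, quasi_identifiers, k):
    # Count how many records share each quasi-identifier combination.
    combos = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in combos.values())

incidents = [
    {"zip": "60601", "age_band": "20-29", "offense": "theft"},
    {"zip": "60601", "age_band": "20-29", "offense": "fraud"},
    {"zip": "60602", "age_band": "30-39", "offense": "theft"},
]

# The lone 60602/30-39 record makes this release fail 2-anonymity:
# anyone who knows a subject's zip and age band can single them out.
print(is_k_anonymous(incidents, ["zip", "age_band"], 2))  # False
```

Embedding a testable criterion like this into policy is very different from simply promising that names have been removed – which is exactly the kind of distinction the jurisprudence would need to capture.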
I’m gearing up to go to the NSTIC-convened steering group meeting in Chicago next week. Naturally, my inner nerd has me reviewing the founding documents, re-reading the NSTIC docs, and combing through the bylaws that have been proposed (all of which can be found here). I am also recalling all the conversations where NSTIC has come up. One trend emerges. Many people say they think the NSTIC identity provider responsibilities are too much risk for anyone to take on. With identity breaches so common now that only targets with star power make the news, there does seem to be some logic to that. If your firm were in the business of supplying government-approved identities and you got hacked, then you would be in even hotter water, right?
The more it rolls around in my head, the more I think the answer is: not really. Let’s think about the types of organization that would get into this line of work. One that is often cited is a mobile phone provider. Another is a website with many members. One thing these two classes of organization – and most others I hear mentioned – have in common is that they are already taking on the risk of managing and owning identities for people. They already carry the burden of the consequences in the case of a breach. Would having the government seal of approval make that any less or more risky? It’s hard to say at this stage, but I’m guessing not. It could lessen the impact in one sense, because some of the “blame” would rub off on the certifying entity. “Yes, we got hacked – but we were totally up to the obviously flawed standard!” If people are using those credentials in many more places once NSTIC’s ID Ecosystem ushers in this era of interoperability (cue acoustic guitar playing kumbaya), then you could say the responsibility does increase, because each breach does more damage. But the flip side is that there will be more people watching, and part of what this should do is put in place better mechanisms for users to respond to that sort of thing. I hope this will not rely on users having to see some news about the breach and change a password, as we see today.
This reminds me of conversations I have with clients and prospects about single sign-on in the enterprise. An analogy a co-worker came up with, in the form of a question, makes a good conversation piece: would you rather have a house with many poorly locked doors or one really strongly locked door? I like it because it captures the spirit of the issues. Getting in through one of the poorly locked doors may actually get you access to one of the more secure areas of the house behind one of the better locked doors, because once you’re through one you may be able to move around more easily from the inside of the house. Some argue that with many doors there’s more work for the attacker. But the problem is that also means more work for the user. They may end up just leaving all the doors unlocked rather than carrying around that heavy keychain with all those keys and remembering which is which. If they had only one door, they might even be willing to carry around two keys for that one door. And the user understands better that they are risking everything by not locking that one door, versus having to be trained that one of the ten doors they deal with is more important than the others. All of this is meant to say: having lots of passwords is really just a form of security through obscurity, and the one you end up forcing to deal with that obscurity is the user. And we’ve seen how well they choose to deal with it. So it seems to me that less is more in this case. Fewer doors will mean more security – mostly because the users will be more likely to participate.
I’m in the car listening to an NPR piece about LARPing while driving between meetings. Something they say catches my ear. It seems LARPers (is that even a word?) have an impulse to create immersive identities aside from their own because they want more degrees of freedom to experience the world. In case you’re in the dark about what LARPing is (like I was), it’s Live Action Role Playing – dressing up as characters and acting out stories in real-world settings as opposed to scripted, controlled settings. It’s clear how Maslow’s Hierarchy of Needs applies here. You won’t find a lot of LARPing in war-torn areas, or in communities suffering from rampant poverty. But if a group of people has enough energy to spare in establishing the identities they need that they want to spawn new identities to live with, does that imply an Identity Hierarchy of Needs? Could it be that when you have enough security in the identity you need, you seek out ways to make the identity that you want more real than just going to the gym to get better abs?
Maslow’s hierarchy of needs is one of my favorite conceptual frameworks. Not only is it extremely powerful in its home context of psychology, and not only is it useful in framing the psychological impacts of many things from other contexts (political, philosophical, economic), it’s also useful as a general skeleton for understanding other relationships. My marketing team recently applied it to Quest’s IAM portfolio. They framed our solutions as layers of technology that could get your house in order to achieve the far-out goals of total governance and policy-based access management, which they placed at Maslow’s highest level. But I’m thinking about this more in terms of pure, individual identity. Of course the technology tracks alongside that in many ways. The LARPing is what got me thinking, but the other parallels become immediately clear. How many people have multiple social networking accounts? A page for business tied to a Twitter account, a Facebook presence as a personal playground, and a LinkedIn page for a resume are standard fare for many folks in the high-tech biz, and beyond. Again, it’s not likely that a blue-collar factory worker would have all these identities for self-expression. Like Maslow’s original idea, there is a notion of needing energy to spare and the right incentives to take the time. There is also an interesting socio-political dimension to this that I’ll leave as an exercise for the reader.
The first question is clear: what would an identity hierarchy of needs look like? If one googles “hierarchy of needs” AND “Identity management”, there are a dizzying number of hits. So it’s not like this hasn’t been explored before. Some good ones come from Dave Shackleford who applies the hierarchy to security and R “Ray” Wang who applies it more widely to making choices about technology decisions. But these only treat IAM as an element of their whole. I want to apply it to identity by itself.
One thing I’ll borrow from Dave’s structure is the four categories he uses (from the bottom up): fundamental, important, enhancing, holistic. I won’t pretend I’m going to get this right at this point. I would love to get feedback on how to make this better. But I’ll take a stab at making this work. The assumption here is that there is no identity without attributes. What does it mean to say “I am Jonathan” if not to assert that this thing “I” has an attribute labeled “name” that is given the value “Jonathan”? And this is more than a technology thing. All notions of identity boil down to attributes and collections of attributes. The next layer deals with taking identities that are collections of attributes and giving them places in groupings. Call them roles, groups, social clubs, parishes, or whatever you like. Membership in collectives helps define us. The next two layers were harder to work out, at first. But then I realized they are about the turn inward. Much like Maslow’s higher levels are where you work on your inner self, our identity hierarchy is about understanding and controlling our attributes and our participation in collectives. First we need to realize what those are. Then we need to use this knowledge to gain the power to determine them.
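As a toy illustration (the class and field names are mine, not any product’s), the two lower layers – attributes and membership in collectives – can be modeled directly, and the “turn inward” is then just the ability to read your own model back:

```python
# A toy model of the lower layers of the identity hierarchy sketched
# above: an identity is a bag of attributes, plus memberships in
# collectives (roles, groups, parishes). All names are illustrative.

class Identity:
    def __init__(self, **attributes):
        self.attributes = attributes   # fundamental layer
        self.collectives = set()       # important layer

    def join(self, collective):
        self.collectives.add(collective)

    def describe(self):
        # The "turn inward": realizing what your attributes
        # and memberships actually are.
        attrs = ", ".join(f"{k}={v}"
                          for k, v in sorted(self.attributes.items()))
        groups = ", ".join(sorted(self.collectives)) or "none"
        return f"attributes: {attrs}; collectives: {groups}"

me = Identity(name="Jonathan")
me.join("parish")
me.join("book club")
print(me.describe())
```

Self-determination, in this toy framing, would be the upper layers: being able not just to call `describe`, but to decide what goes into `attributes` and `collectives` in the first place.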
Self determination is actually the perfect phrase to tie together all these thoughts. What was it about the LARPers that triggered all these thoughts? It was that they had decided to actively take control of their identities to the point of altering them, even bifurcating them. That may make it sound like I’m making them out to be the masters of the universe (and not just because some do dress up as He-Man characters). But just like some folks can live in a psychological state pretty high up on the Maslow hierarchy without putting in much effort to achieve the first few levels, the same can be true of folks in the identity hierarchy, I’d think. If you have your most important attributes defined for you by default, get assigned reasonable collectives to belong to, and even have a decent awareness of this without challenging it, then you may grow up to be the special kind of geek that likes to LARP. That pleasure derived from splitting your personality is likely something that’s largely implicit – you don’t need to understand it too deeply.
Of course, if this all feels too geeky to apply to regular folks, I can turn to what may be the oldest form of this identity splitting. The “liaisons” in the title came from a notion that maybe folks carrying out complicated affairs of the heart were trying to bifurcate their own identities in a bid to push self determination before there was any better outlet. No excuse for serial adultery, but it gives a new prism through which to view the characters in Dangerous Liaisons, perhaps. How many times in novels does the main motivation for these affairs come down to a desire for drama, romance, or a cure for bourgeois boredom? How many times on The People’s Court? The point is that just like people who have climbed to the top of Maslow’s Hierarchy may not have done so using morally good means and may not use their perch to better the world, people who are experimenting in self determination to the point of maintaining multiple identities in their lives may not be doing it for the most upstanding of reasons, either.
And how does this all relate back to the technology of IAM? Maybe it doesn’t, very concretely. I’d be OK with that. It may, if you consider that there are many people out there trying to hand their users self-determination through IAM self-service without first having a grip on what attributes make up an identity. How can you expect users to determine their fate if they have no idea what their basic makeup is? We expect users to take the reins of managing their access rights, certifying the rights of others, and performing complicated IAM tasks. But if they ask “Why is this person in this group?” we have no good answers. Then we’re surprised at the result. So maybe this applies very well. Finally, what does this have to do with the cloud? Clearly, cloud means more identities. Many times they are created by the business seeking agility and doing things with almost no touch from IT. If the cloud providers give users a better sense of identity than you do, then that’s where users will feel more able to determine their own fate. Some may say, “But that’s not fair. That cloud provider only needs to deal with a small bit of that person’s identity, so it’s easier for them!” Life is not fair. But if you establish a strong sense of what an identity is and how it belongs in collectives, give users ways to understand that, and then enable them to control it, you will be far ahead of any cloud provider. But it all starts with simply understanding how to ask the right questions.
I expect (and hope) to raise more questions with all of this than to answer them. This is all a very volatile bed of thoughts at the moment. I’m hoping others may have things to say to help me figure this all out. As always, I expect I’ll learn the most by talking to people about it.
I sat down with a very smart group of folks, and they were saying how they think SSO is very, very hard. If your world is all Active Directory (AD), it’s easy. But that is true for only a tiny percentage of the world. Everywhere there is some oddball application, and in most places there are just as many applications not using AD as there are using it (even if they buy Quest solutions, sadly). The cloud, something everyone is forced to mention in every tech blog post, also complicates this. How do you do SSO when the identities aren’t under your control? Or, reversing that, how do you get SSO from your cloud vendor when your on-premise applications aren’t under their control? But every time I have the SSO conversation at length with people, the conclusion is always the same. If all you have are applications from the last 10 years and some cloud stuff, there are approaches, including Quest’s, that can fully solve that problem. You can integrate into your commodity AD authentication, put up SSO portals, or use widely adopted standards like SAML – or all of the above in a clever combination. Even thick-client GUI applications can be tamed with enterprise SSO (ESSO) solutions at the desktop. The things that always end up falling through the cracks are older applications. Things that are often the crown jewels of the business. Applications that are so old because they are so critical that no one can touch them without huge impact to the business. But the older technologies resist almost every attempt to bring them under control. Even ESSO, which is the catch-all for so many other laggards, can’t tame many of the odd green screens, complex multi-field authentications, or other odd things that some of these applications demand at the login event. When I’ve spoken to our SSO customers, they always seem happy with 70-80% adoption on their SSO projects. They know they will never get that last group until the applications change.
But there doesn’t seem to be any compelling event for those applications to be changed. So SSO continues to seem hard, but we all know that’s not exactly true.
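A rough sketch of that coverage gap makes the 70-80% ceiling concrete. The routing rule and the application portfolio below are invented for illustration, not a real product matrix: modern apps map to federation, AD integration, or ESSO, while the legacy green screens map to nothing.

```python
# Illustrative sketch of SSO coverage gaps: route each application to
# an SSO approach and see which ones fall through. The categories and
# rules here are assumptions, not a real product's decision matrix.

def sso_approach(app):
    if app.get("supports_saml"):
        return "federation (SAML)"
    if app.get("uses_ad"):
        return "integrated AD authentication"
    if app.get("client") == "thick_gui":
        return "desktop ESSO"
    return "uncovered"  # green screens, multi-field logins, etc.

portfolio = [
    {"name": "HR portal", "supports_saml": True},
    {"name": "file shares", "uses_ad": True},
    {"name": "claims client", "client": "thick_gui"},
    {"name": "mainframe green screen", "client": "terminal"},
]

covered = [a["name"] for a in portfolio if sso_approach(a) != "uncovered"]
print(f"{len(covered)}/{len(portfolio)} apps covered")  # 3/4 apps covered
```

No clever combination of the first three branches helps the fourth application; only changing that application does, which is exactly why there is no compelling event to close the gap.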
First of all, I’ll define what I mean by cloud in 10 words: cloud is outsourcing some layer of services from your infrastructure. This thought comes after meeting a large healthcare organization that’s putting its “back office” operations in an MSP. This is having a significant impact on how they are viewing the administration of IT. When you own operations and administration, you can easily blend the two. If you have an administrative issue that would be made easier by shifting something about the operations of your IT resources, you do it. But when operations is a black box, then your administration actually has to solve all your challenges. That is new for many.
This organization is putting most of its non-clinical systems in an MSP, or in the cloud if you prefer, and that means there are many IAM challenges. Where do accounts originate? Who controls the authoritative data about users? Because so many clinical and other applications require it, they are keeping much of the directory infrastructure in house. How do changes flow in both directions when there are automated processes and human admins and operators on both sides? How can all the changes from both ends be tracked? How can the state, the changes and the policies be kept in line with regulatory requirements? It’s a daunting set of challenges.
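As a hedged sketch of the bidirectional change problem (the names and rules here are illustrative assumptions, not this organization’s actual design), one common pattern is to designate an authoritative store, push its view to the other side on conflict, and log every change so state and policy can be audited later:

```python
import time

# Sketch: changes originate on both the in-house directory and the MSP
# side; reconciliation pushes the authoritative store's view across, and
# every applied change is logged for audit. All names are illustrative.

audit_log = []

def apply_change(target, change, source):
    # Record who changed what, where, and when, so state and changes
    # can later be checked against regulatory requirements.
    target[change["user"]] = change["value"]
    audit_log.append({
        "source": source,
        "user": change["user"],
        "value": change["value"],
        "at": time.time(),
    })

def reconcile(in_house, msp, authoritative="in_house"):
    # On conflict, the designated authoritative store wins.
    winner, loser = ((in_house, msp) if authoritative == "in_house"
                     else (msp, in_house))
    for user, value in winner.items():
        if loser.get(user) != value:
            apply_change(loser, {"user": user, "value": value},
                         source=f"reconcile:{authoritative}")

in_house = {"alice": "clinician"}
msp = {"alice": "contractor", "bob": "billing"}
reconcile(in_house, msp)
print(msp["alice"])  # now matches the authoritative in-house value
```

The hard part this glosses over is exactly what the paragraph above describes: deciding which store is authoritative for which attributes, and doing so when the other side is a black box you do not operate.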
Right now they have their hands full just making it all happen. And there are plenty of parties (each site, the central IT organization, various consulting organizations, all the vendors) involved in the project as it’s ongoing. When I sat down with them and many of these parties, it was hard enough just playing catch-up to see who was responsible for what. We were there to discuss many of the pains they are experiencing in the current phase and where Quest can help. What I immediately started to envision were the pains of the next phase. I think Quest can help with those, too, and I hope they were receptive to my suggestions about it all. My basic message was that they are going to have to arm their administrators with a new kind of toolset, and those administrators are going to need a new, leveled-up approach. They will have to think less like technologists and more like data architects. What will matter most going forward is having very sound and robust models for data, policies and processes. Otherwise they will fall back into old ways of thinking and likely find themselves without the ability to make that level of change to the MSP-hosted systems. Or, even worse, waste time fighting with the MSP to change operational details – a fight where they finance both sides of the battle and take both sides’ losses as well.