(1) What are the public policy implications of the collection, storage, analysis, and use of big data? For example, do the current U.S. policy framework and privacy proposals for protecting consumer privacy and government use of data adequately address issues raised by big data analytics?
Current policy is not sufficient to address big data issues. There are many good proposals, but the speed with which they are taking shape means they are being lapped by the constantly shifting realities of the technology they are meant to shape. NSTIC (National Strategy for Trusted Identities in Cyberspace http://is.gd/JOrjCw) has been an excellent example of this. That digital identity is core to properly addressing big data should be obvious. How can we hope to protect privacy if we cannot identify the proper steward of data? How can we identify data stewards if the people who ought to be identified have no consistent digital identity? The very founding notion of NSTIC, trusted identities, begs the question of if we are prepared to approach empowering people via assigning responsibility. If we do not have identities that can be trusted, then we don’t even have one of the basic building blocks that would be required to approach big data as a whole.
That said, the implications that big data has are too large to ignore. In “The Social, Cultural & Ethical Dimensions of Big Data” (http://is.gd/EGe7tD), Tim Hwang raised the notion that data is the basic element in (digital) understanding; and further that understanding can lead to influence. This is the big data formulation of the notion that knowledge leads to rights, and rights lead to power – the well tested idea of Michel Foucault. In the next century, the power of influence will go to those who have understanding culled from big data. This will be influence over elections, economies, social movements and the currency that will drive them all – attention. People create big data in what they do, but they also absorb huge amounts of data in doing so. The data that can win attention will win arguments. The data that gets seen will influence all choices. We see this on the internet today as people are most influenced not by what they read which is correct but rather what they see that holds their attention. And gaining that influence seems to be playing out as a winner takes all game. With nothing short of the ethical functioning of every aspect of human life on the line, big data policy implications cannot be understated.
(2) What types of uses of big data could measurably improve outcomes or productivity with further government action, funding, or research? What types of uses of big data raise the most public policy concerns? Are there specific sectors or types of uses that should receive more government and/or public attention?
The amount of data involved in some big data analysis and the startlingly complex statistical and mathematical methods used to power them give an air of fairness. After all, if it’s all real data powered by cold math, what influence could there be hiding in the conclusions? It is when big data is used to portray something as inherently fair, even just, that we need to be the most concerned. Any use of big data that is meant to make things “more fair” or “evenly balanced” should immediately provoke suspicion and incredulity. Just a small survey of current big data use shows this to be true. Corrine Yo from the Leadership Conference gave excellent examples of how surveillance is unevenly distributed in minority communities, driven by big data analysis of crime. Clay Shirky showed how even a small issue like assigning classes to students can be made to appear fair through statistics applied to big data when there are clearly human fingers tipping the scales. There are going to be human decisions and human prejudices built into every big data system for the foreseeable future. Policy needs to dictate what claims to fairness and justice can be made and outline how enforcement and transparency must be applied in order to be worthy of those claims.
The best way for government to speed the nation to ethical big data will be to fund things that will give us the building blocks of that system. In no particular order a non-exhaustive list of these ethical building blocks will be trusted identity, well defined ownership criteria of data generated by an individual directly and indirectly, simple and universal terms for consent to allow use of data, strong legal frameworks that protect data on behalf of citizens (even and especially from the government itself), and principles to guide the maintenance of this data over time, including addressing issues of human lifecycles (e.g. what happens to data about a person once they are dead?). There are many current proposals that apply here, e.g. NSTIC as mentioned above. All of these efforts could use more funding and focus.
(3) What technological trends or key technologies will affect the collection, storage, analysis and use of big data? Are there particularly promising technologies or new practices for safeguarding privacy while enabling effective uses of big data?
Encryption is often introduced as an effective means to protect rights in the use of data. While encryption will doubtless be part of any complete solution, today’s actual encryption is used too little and when used often presents too small a barrier for the computing power available to the determined. Improvements in encryption, such as quantum approaches, will surely be a boon to enforcement of any complete policy. The continued adoption of multi-factor authentication, now used in many consumer services, will also be an enabler to the type of strong identity controls that will be needed for a cooperative control framework between citizens and the multitude of entities that will want to use data about the citizens. As machines become better at dealing with fuzzy logic and processing natural language, there will be more opportunities to automate interactions between big data analysis and the subjects of that analysis. When machines can decide when they need to ask for permission and know how to both formulate and read responses to those questions in ways that favor the human mode of communication, there will be both more chances for meaningful decisions on the part of citizens and more easily understood records of those choices for later analysis and forensics.
(4) How should the policy frameworks or regulations for handling big data differ between the government and the private sector? Please be specific as to the type of entity and type of use (e.g., law enforcement, government services, commercial, academic research, etc.).
Policies governing interaction of government and private sector is one area where much of what is defined today can be reused. Conversely, where the system is abused today big data will multiply opportunities for abuse. For example, law enforcement data, big or not, should always require checks and balances of the judicial process. However, there is likely room for novel approaches where large sets of anonymized data produced from law enforcement could be made available to the private and educational sectors en masse as long as that larger availability is subject to some judicial check on behalf of the people in place of any individual citizen. Of course, this assumes a clear understanding of things being “anonymized” – one of many technical concepts that will need to be embedded in the jurisprudence to be applied in these circumstances. There are cracks in the current framework, though. This can allow data normally protected by regulations like HIPPA to seep out via business partners and other clearing house services that are given data for legitimate purposes but not regulated. All instances of data use at any scale must be brought under a clear and consistent policy framework if there is any hope to forge an ethical use of big data.
I’m gearing up to go to the NSTIC convened steering group meeting in Chicago next week. Naturally, my inner nerd has me reviewing the founding documents, re-reading the NSTIC docs, and combing through the by laws that have been proposed (all fo which can be found here). I am also recalling all the conversations where NSTIC has come up. One trend emerges. Many people say they think the NSTIC identity provider responsibilities are too much risk for anyone to take on. With identity breaches so common now that only targets with star power make the news, there does seem to be some logic to that. If your firm was in the business of supplying government approved identities and you got hacked then you are in even hotter water, right?
The more it rolls around in my head, the more I think the answer is: not really. Let’s think about the types of organization that would get into this line of work. One that is often cited is a mobile phone provider. Another is a website with many members. One thing these two classes of organization – and most others I hear mentioned – have in common is that they are already taking on the risk of managing and owning identities for people. They already have the burden of the consequences in the case of a breach. Would having the government seal of approval make that any less or more risky? It’s hard to say at this stage, but I’m guessing not. It could lessen the impact in one sense because some of the “blame” would rub off on the certifying entity. “Yes, we got hacked – but we were totally up to the obviously flawed standard!” If people are using those credentials in many more places since NSTIC’s ID Ecosystem ushers in this era of interoperability (cue acoustic guitar playing kumbaya), then you could say the responsibility does increase because each breach is more damage. But the flipside of that is there will be more people watching, and part of what this should do is put in place better mechanisms for users to respond to that sort of thing. I hope this will not rely on users having to see some news about the breach and change a password as we see today.
This reminds me of conversations I have with clients and prospects about single sign on in the enterprise. An analogy, in the form of a question, a co-worker came up with is a good conversation piece: would you rather have a house with many poorly locked doors or one really strongly locked door? I like it because it does capture the spirit of the issues. Getting in one of the poorly locked doors may actually get you access to one of the more secure areas of the house behind one of the better locked doors because once you’re through one you may be able to more easily move around from the inside of the house. Some argue that with many doors there’s more work for the attacker. But the problem is that also means it’s more work for the user. They may end up just leaving all the doors unlocked rather than having to carry around that heavy keychain with all those keys and remember which is which. If they had only one door, they may even be willing to carry around two keys for that one door. And the user understands better that they are risking everything by not locking that one door versus having to train them that one of the ten doors they have to deal with is more important than the others. All of this is meant to say: having lots of passwords is really just a form of security through obscurity, and the one who you end up forcing to deal with that obscurity is the user. And we’ve seen how well they choose to deal with it. So it seems to me that less is more in this case. Less doors will mean more security. Mostly because the users will be more likely to participate.