Security/identity / XML · 2006-06-12

What to do about promiscuity

I’m referring, of course, to users’ identity habits, which as many people have noted (including myself, in a W3C position paper) are far more promiscuous than we might wish. How can we work towards more robust privacy and security if people simply don’t care? What does it take to get people to shut up about themselves?

An article in New Scientist reports that the NSA is researching “mass harvesting of the information that people post about themselves on social networks.” One example given seems benign and even useful:

The research ARDA [the Advanced Research Development Activity] funded was designed to see if the semantic web could be easily used to connect people. The research team chose to address a subject close to their academic hearts: detecting conflicts of interest in scientific peer review. Friends cannot peer review each other’s research papers, nor can people who have previously co-authored work together.

So the team developed software that combined data from the RDF tags of online social network Friend of a Friend, where people simply outline who is in their circle of friends, and a semantically tagged commercial bibliographic database called DBLP, which lists the authors of computer science papers.

Joshi [one of the team’s leaders] says their system found conflicts between potential reviewers and authors pitching papers for an internet conference. “It certainly made relationship finding between people much easier,” Joshi says. “It picked up softer [non-obvious] conflicts we would not have seen before.”
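To make the mechanics concrete, here is a minimal sketch of that kind of join in Python. It is not the team's software: the data structures, names, and the `conflicts` function are all my own assumptions, and it only covers the two obvious rules quoted above (friends and prior co-authors), not the "softer" conflicts Joshi mentions.

```python
# Hypothetical toy data. Neither the names nor the shapes come from FOAF or DBLP.
friends = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob"},
}
# Each paper is represented only by its set of authors.
papers = [
    {"dave", "alice"},
    {"carol", "erin"},
]


def conflicts(reviewer, authors, friends, papers):
    """Return sorted (author, reason) pairs that disqualify the reviewer."""
    found = set()
    for author in authors:
        if author == reviewer:
            continue
        # Friendship is treated as a conflict whichever side declared it.
        if author in friends.get(reviewer, set()) or reviewer in friends.get(author, set()):
            found.add((author, "friend"))
        # Any paper listing both people counts as prior co-authorship.
        if any(reviewer in p and author in p for p in papers):
            found.add((author, "co-author"))
    return sorted(found)


result = conflicts("alice", {"bob", "dave", "erin"}, friends, papers)
print(result)
```

The point is how little code the join takes once both sources are machine-readable; the hard part is deciding that "alice" in one source is the same person as "alice" in the other.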

The article places more emphasis on RDF and the formal semantic web than I think is warranted; arbitrary (well-documented) XML formats, microformats, and even HTML used in a regularized manner can be harvested or at least screen-scraped. And it’s actually very hard to do precise equivalence mapping between RDF (or any!) schemas in practice, just because taxonomies in the real world are so messy (are “given names” and “first names” and “Christian names” the same thing?). So it’s likely that well-known attribute schemas of whatever type will be just as effective targets for harvesting as RDF schemas will be. But the point remains: greater data portability and more accessible semantics for personal information add up to easier harvesting by other parties, whether they wear black hats or white.
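A rough sketch of what I mean by harvesting across "well-known attribute schemas of whatever type", again in Python. The alias table, the canonical key names, and both functions are hypothetical; the table is exactly where the messy taxonomy judgment calls end up, and nothing here claims that the three labels really are equivalent in every culture or dataset.

```python
# Hypothetical alias table. Deciding what belongs in it is the hard, messy part.
ALIASES = {
    "given name": "given_name",
    "first name": "given_name",
    "christian name": "given_name",
    "surname": "family_name",
    "last name": "family_name",
}


def normalize(record):
    """Rename known attribute labels to one canonical key; keep unknown labels as-is."""
    out = {}
    for label, value in record.items():
        key = ALIASES.get(label.strip().lower(), label)
        # If two labels collapse to the same key, the first value seen wins.
        out.setdefault(key, value)
    return out


def same_person(a, b):
    """Naive match: both records have equal given and family names, ignoring case."""
    na, nb = normalize(a), normalize(b)
    keys = ("given_name", "family_name")
    return all(k in na and k in nb and na[k].lower() == nb[k].lower() for k in keys)


profile = {"First Name": "Ada", "Surname": "Lovelace"}
listing = {"given name": "ada", "last name": "LOVELACE"}
print(same_person(profile, listing))
```

A harvester does not need the mapping to be precise, only good enough, which is why a formal ontology is not a prerequisite.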

Even if users have the opportunity to give informed consent, in many cases they may choose not to spend time thinking hard about the consequences of allowing access, possibly a form of rational ignorance if they never pay those consequences. An example from a more general context appears in Bill Cheswick’s talk from the inaugural SOUPS conference:

To most attendees, it came as no surprise that Cheswick found his father’s Windows machine chock-full of adware and spyware. Also unsurprising was the fact that even after a full cleanup, the machine was infected again within weeks (when the speaker visited his father next). Here’s the punch-line: the father was adamant that none of the security “fixes” or “solutions” break his machine. After all, explicit and annoying pop-up ads notwithstanding, he was still getting his work done, wasn’t he? Why fix something that ain’t broke?

(SOUPS is the “Symposium on Usable Privacy and Security”; its 2006 program looks incredibly meaty — soup-to-nutsy? — and I sure wish I could go.)

For those who do want to exercise more care, or if the consequences begin to be felt (Tag Boy points me to this example of googling-before-hiring), applying strong human-computer interaction principles in identity UIs should help in reducing misunderstanding and fatigue. And we could allow users to set up policies for avoiding annoying interactions involving identity exchange — reserving synchronous interaction for garnering point-of-“sale” consent for areas with a large potential for loss (of privacy, money, or whatever). Identity Rights Agreements could be a useful tactic, if users can get to know the options and if the interfaces for managing them are value-add rather than value-subtract.
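The policy idea could look something like the following Python sketch. Everything in it is an assumption for illustration: the `Policy` class, the attribute names, and the three-way outcome are mine, not part of any Identity Rights Agreements proposal or existing identity system.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical user policy: what to share silently and what never to share."""
    auto_release: set = field(default_factory=set)
    never_release: set = field(default_factory=set)


def decide(policy, requested):
    """Split a request into attributes to release, ask the user about, or deny."""
    decision = {"release": [], "ask": [], "deny": []}
    for attr in sorted(requested):
        if attr in policy.never_release:
            decision["deny"].append(attr)
        elif attr in policy.auto_release:
            decision["release"].append(attr)
        else:
            # Anything the policy does not cover falls back to synchronous consent.
            decision["ask"].append(attr)
    return decision


policy = Policy(auto_release={"nickname", "timezone"}, never_release={"passport_number"})
outcome = decide(policy, {"nickname", "credit_card", "passport_number"})
print(outcome)
```

Only the "ask" bucket interrupts the user, which is the design goal: keep the annoying prompts for the cases where something real is at stake.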

The article concludes, in part:

… Tim Finin, a colleague of Joshi’s, thinks the spread of such technology is unstoppable. “Information is getting easier to merge, fuse and draw inferences from. There is money to be made and control to be gained in doing so. And I don’t see much that will stop it,” he says.

I’ve mentioned some forces that could potentially “stop it”, but people still have to want them. Let’s say that the perfect interfaces have been developed and people use them to set up policy-based bounds on identity sharing. But a really awesome new social networking program is all the rage and it requires fairly wide access in order to provide, say, genealogy linkages. A user has clicked all the right buttons to prove that they have given consent, or they’ve selected the desired identity “card” and sent it along. Their information gets used in some cool new way that was accounted for by the consent they gave, but embarrasses them or gets them into hot water. Has a confidence been breached? Are they just SOL?

Is it possible to come up with a “do what I mean” button for identity info exchange?