
People and online services: leaving value on the table

The recent Google-Facebook flap demonstrates that the hottest battleground for users’ control of the data they pump into these online services is the sites’ Terms of Service. Why? Because when you’re not a paying customer, you’re not in a hugely strong bargaining position. As I put it to ReadWriteWeb in their piece on data portability implications of the debate: Facebook’s end-users are not its customers; they’re the product. (Or as my Data Without Borders pal Steve Greenberg sometimes puts it, users are crops…getting harvested. Oh dear.)

For all “free” online services, it’s worthwhile to ask: What am I paying instead? If it’s not money, is it attention to ads? …behavioral cues about myself and my preferences? …personally identifiable data? …beta-testing time? …what, exactly? Payment for services rendered isn’t a bad thing. But it’s always something, and you might as well not be a chump.

That’s why I like Frank Catalano’s new TechFlash post viewing personal data sharing through an economic lens and discussing how to barter your data more equitably. Regarding his second point, “hide”: I’d actually be thrilled if more online services that were marketed to individuals offered a premium for-pay option; it would keep out the riff-raff and give people more meaningful control over their relationships with the companies offering the services.

It’s not just individuals who are leaving something on the table, though. I think there’s a big untapped market in selective sharing, which is like “privacy” (poor abused word), without the assumption that minimal disclosure is the be-all and end-all. What would you start sharing with a selective set of people and businesses, if you could have confidence that your expectations around context, control, choice, and respect would be met?

That’s why I think Dave McClure has it right with his notion of intimacy as a market opportunity Facebook currently has no idea how to address. (“maybe I only want to tell a few close buddies about that episode with the VERY BAD bean burrito” — yeah, thanks for keeping this sharing episode VERY selective. :-)

And that’s why I think Esther Dyson doesn’t quite have it right in saying privacy is a marketing problem. Her exhortation to “Know your customer, and talk to that person as an individual, not as someone in a bucket” has a natural barrier: Facebook and others are serving their actual customers very well indeed by, uh, making more product.

And that’s why I think User-Managed Access could help: Becoming paying customers of services that need our data is good. But becoming, in addition, producers of data products as peers in a selective data-sharing network, and dictating our own Terms of Access for getting to them, is even better.

Aiming for data usage control

Earlier this week, W3C held a workshop on privacy and data usage control. Among the submitted position papers are quite a few interesting thoughts, and though I couldn’t attend the workshop, it will be good to see the eventual report from it.

I did manage to submit a paper that explores the contributions of User-Managed Access (UMA) to letting people control the usage of their personal data. It was a chance to capture an important part of the philosophy we bring to our work, and the challenges that remain. From the paper’s introduction:

…UMA allows a user to make demands of the requesting side in order to test their suitability for receiving authorization. These demands can include requests for information (such as “Who are you?” or “Are you over 18?”) and promises (such as “Do you agree to these non-disclosure terms?” or “Can you confirm that your privacy and data portability policies match my requirements?”).

The implications of these demands quickly go beyond cryptography and web protocols and into the realm of agreements and liability. UMA values end-user convenience, development simplicity, and web-wide adoption, and therefore it eschews such techniques as DRM. Instead, it puts a premium on user visibility into and control over access criteria and the authorization lifecycle. UMA also seeks at least a minimum level of enforceability of authorization agreements, in order to make the act of granting resource access truly informed, uncoerced, and meaningful. Granting access to data is then no longer a matter of mere passive consent to terms of use. Rather, it becomes a valuable offer of access on user-specified terms, more fully empowering ordinary web users to act as peers in a network that enables selective sharing.
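To make that concrete, here’s a toy sketch in Python (invented names, emphatically not the actual UMA protocol messages) of an authorization manager testing a requesting party against the demands the user has configured:

```python
# Toy sketch only: invented names, not real UMA wire messages.

# Demands the user has configured at her authorization manager:
REQUIRED_CLAIMS = {
    "over_18": True,              # "Are you over 18?"
    "agrees_to_nda_terms": True,  # "Do you agree to these non-disclosure terms?"
}

def suitable_for_authorization(presented_claims: dict) -> bool:
    """Authorize only if every demanded claim is present and satisfied."""
    return all(
        presented_claims.get(name) == expected
        for name, expected in REQUIRED_CLAIMS.items()
    )

print(suitable_for_authorization({"over_18": True, "agrees_to_nda_terms": True}))  # True
print(suitable_for_authorization({"over_18": True}))  # False: no promise was made
```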

Some of the challenges are technical, some legal, and some related to business incentives. The paper approaches the discussion with what I hope is a sense of realism, along with some justified optimism about near-term possibilities.

(Speaking of which, I like the realism pervading Ben Laurie’s recent criticism of the EFF’s suggested bill of privacy rights for social network users. He cautions them to stay away from implicitly mandating mechanisms like DRM — and, in focusing on broader aims, to be careful what they wish for.)

If you’re so inclined, I hope you’ll check out the paper and the other workshop inputs and outputs.

Personal RFP Model and Information Sharing Report

The Kantara Information Sharing group, led by the intrepid Joe Andrieu and Iain Henderson, has been doing a ton of work to make the business justifications for Vendor Relationship Management scenarios concrete and the use cases actionable.

The group has two documents out for review, and seeks your input. (I’m really tardy blogging this; comments are due tomorrow, but I’m sure they’d be welcome even coming in a little late…) See Joe’s writeup for document links and descriptions.

Here’s a taste of the pRFP document:

Sally uses a Personal Request for Proposal (pRFP) to solicit offers for, negotiate, and purchase a new car through the MyPal pRFP Broker. She has previously researched her options and made up her mind about the kind of car she wants to buy. She has also secured financing and credentials asserting that fact. Sally’s information is maintained in a personal data store which provides it on demand for use by service providers and vendors. On the Vendor side, Frank at Chryota of London responds to Sally’s Personal RFP (pRFP), using a hands‐on approach that integrates CoL’s CRM system, MyPal, and Chryota Manufacturing’s CRM program HEARING AID, which is managed by Jimmy.
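To give a flavor of the moving parts, here’s a purely illustrative sketch of Sally’s pRFP as a structured record; the field names are invented for this post, not taken from the Kantara document:

```python
# Illustrative only: invented field names, not the Kantara pRFP schema.
sally_prfp = {
    "broker": "MyPal",
    "request": {
        "category": "new car",
        "preferences": {"make": "Chryota"},
        "max_price": 18000,
    },
    # Credential asserting that her financing is already secured:
    "credentials": [{"type": "financing-secured", "issuer": "Sally's lender"}],
    # Her personal data store supplies further details on demand, per her terms:
    "personal_data_store": {
        "endpoint": "https://pds.example/sally",
        "release_policy": "on-demand",
    },
}
```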

The Info Sharing Report is interesting too, but in a totally different way; it’s chock-full of statistics and trends around the cost of acquiring customers and the privacy pitfalls of the current ecosystem.

Check ‘em out, and send in your thoughts.

Identity tweetup at OASIS conference next week

Ian Glazer and I were planning a get-together next week at the OASIS Identity Management conference in D.C., and he suggested we make it a tweetup (bona fides established here). So if you’re in town because of the conference, or just…around, join us at Buffalo Billiards next Monday at 6ish.

The agenda looks solid, and since it’s arranged in a single track, should get some intensity going. I’m looking forward to participating in the privacy/identity/cloud computing session led by Jim Harper on Monday.

The conference hashtag is #oasisidm (RSS). If you can’t make it out, you can at least follow the fun from home.

(For all pool hustlers flying in, remember: cue sticks are prohibited items…)

A privacy fear factor Venn

The excellent Wall Street Journal online privacy series got me thinking of a new Venn of human-to-application interaction, sort of an evil twin of this one.

Intersection A ∩ C ∩ U might be a video that starts playing the moment you visit a site, with sound you can’t turn off … showing you a marketing message that seems eerily connected to your ongoing search for a new car … when you realize the video is of yourself at home looking at car reviews online.

(Cue dramatic music.)

Where web and enterprise meet on user-managed access

Phil Hunt shared some musings on OAuth and UMA recently. His perspective is valuable, as always. He even coined a neat phrase to capture a key value of UMA’s authorization manager (AM) role: it’s a user-centric consent server. Here are a couple of thoughts back.

In the enterprise, an externalized policy decision point represents classic access management architecture, but in today’s Web it’s foreign. UMA combines both worlds with the trick of letting Alice craft her own access authorization policies, at an AM she chooses. She’s the one likeliest to know which resources of hers are sensitive, which people and services she’d like to share access with, and what’s acceptable to do with that access. With a single hub for setting all this up, she can reuse policies across resource servers and get a global view of her entire access landscape. And with an always-on service executing her wishes, in many cases she can even be offline when an access requester comes knocking. In the process, as Phil observes, UMA “supports a federated (multi-domain) model for user authorization not possible with current enterprise policy systems.”
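Here’s that hub idea reduced to a toy sketch (all names invented): one set of policies, answered from one place, reusable by any number of resource servers, even while Alice is offline:

```python
# Toy sketch (invented names): Alice's AM as a single policy decision point.

alice_policies = {
    "calendar": {"share_with": {"bob"},          "scopes": {"read"}},
    "photos":   {"share_with": {"bob", "carol"}, "scopes": {"read", "download"}},
}

def am_decide(resource: str, requester: str, scope: str) -> bool:
    """Answer access questions for any resource server that has put
    Alice's resources under this AM's protection; Alice can be offline."""
    policy = alice_policies.get(resource)
    return (policy is not None
            and requester in policy["share_with"]
            and scope in policy["scopes"])

print(am_decide("photos", "carol", "download"))  # True
print(am_decide("calendar", "carol", "read"))    # False
```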

Phil wonders about privacy impacts of the AM role given its centrality. In earlier federated identity protocol work, such as Liberty’s Identity Web Services Framework, it was assumed that enterprise and consumer IdPs could never be the authoritative source of all interesting information about a user, and that we’d each have a variety of attribute authorities. This is the reality of today’s web, expanding “attribute” to include “content” like photos, calendars, and documents. So rather than having an über-IdP attempt to aggregate all Alice’s stuff into a single personal datastore — presenting a pretty bad panoptical identity problem in addition to other challenges — an AM can manage access relationships to all that stuff sight unseen. Add the fact that UMA lets Alice set conditions for access rather than just passively agree to others’ terms, and I believe an AM can materially enhance her privacy by giving her meaningful control.

Phil predicts that OAuth and UMA will be useful to the enterprise community, and I absolutely agree. Though the UMA group has taken on an explicitly non-enterprise scope for its initial work, large-enterprise and small-business use cases keep coming up, and cloud computing models keep, uh, fogging up all these distinctions. (Imagine Alice as a software developer who needs to hook up the OAuth-protected APIs of seven or eight SaaS offerings in a complex pattern…) Next week at the Cloud Identity Summit I’m looking forward to further exploring the consumer-enterprise nexus of federated access authorization.

Data portability and wagon-circling

One of the breakout tracks at EIC last week was Cloud Platforms and Data Portability. Dave Kearns had asked me to speak for a few minutes on the subject of social data portability before joining Drummond and Christian for a panel discussion.

I brainstormed a bit and suggested that I could comment on the notion of data statelessness, and the continuum of individuals’ data portability on the web. That somehow turned into a boldface uppercase talk called Data Statelessness and the Continuum of Individuals’ Data Portability on the Web. :-) (Hmm, maybe in German that boils down to a single long word…) I thought I’d share those thoughts here.

The Web is a teenager already

People have been pouring content onto it since Web 1.0. That’s enough time for major failures of data portability to have happened.

For example, Geocities started in 1994 (with an offer of 2 whole MB free!), and ended its life in 2009 with about 23 million individual pages — which were at risk of being abandoned.

[Image: Archive Team logo]

Archive Team is one of the groups that performed “data portability of last resort”; they’ve managed to resurrect more than a terabyte of all that content…at Geociti.es.

[Image: DataPortability.org logo]

DataPortability.org was formed in 2007, and it advocates being able to “take your data with you” to new services.

The Web 2.0 cocktail is even more potent

It’s a mix of some application’s features plus our own data contributions. The more “social” the application — that is, giving us human-to-human connection benefits — the more we drink.

But there’s always an application in the middle. It knows everything we share — and increasingly, selling access to that information is its business model.

Just a reminder…

Take a look at EFF’s compilation of Facebook privacy policies from 2005 to now.

Recall that a newspaper’s readers traditionally were not its real customers; that would, of course, be the advertisers.

Facebook’s end-users are not its customers.

They’re the product.

[Not that I'm picking on Facebook specifically. Though this news about a Facebook all-hands meeting tomorrow afternoon to "circle the wagons" is interesting...]

Solving the password anti-pattern began a new era of data portability

Was it accidental?

In 2008, Robert Scoble famously discovered that Facebook’s terms of service didn’t allow him to bulk-extract his own contact information, and they cut him off (at which point he got involved in the Data Portability effort!).

In the meantime, Facebook and Yahoo! and AOL and Google and many others have discovered how valuable it is to let third-party apps get access to fresh feeds of your data without your having to reveal your username and password.

They couldn’t exactly let these connections happen without your go-ahead, and so user delegation of authorized access was born — or at least standardized.

[Image: facespace]

BBAuth, OpenAuth, and other proprietary solutions led to OAuth (and its proprietary competitor Facebook Connect) — and now the draft OAuth 2.0, which Facebook already supports.

Third-party services getting access to your data with your okay is tantamount to you getting access through an “agent” — and not just one-time export when you leave, either, but regular fresh access for a variety of purposes. This has turned out to be a Good Thing overall for individuals’ chances at data portability.
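In code, the “agent” pattern is as simple as presenting the token you delegated; here’s a sketch (the URLs are placeholders, not any real service’s API):

```python
# Sketch of delegated access with an OAuth 2.0 bearer token.
# Placeholder URL, not a real API.
import requests

def fetch_contacts(access_token: str) -> list:
    """Get a fresh feed of the user's data with their go-ahead baked
    into the token; no username or password ever changes hands."""
    resp = requests.get(
        "https://social.example/me/contacts",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    resp.raise_for_status()
    return resp.json()
```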

What is data statelessness?

It’s the ability of a third-party service to think in terms of caching rather than replicating your data, because they can get it whenever they need it.

It’s the ability of a third-party service to add value without having to “own” your data.

It’s the ability for a single source of truth to arise — and for you to choose what it is.

Even weirder, it’s the ability for automatic syncing among a variety of sources of truth to arise — and for you to choose where to inject the first copy. (This is the effect when, say, you tell a bunch of your OAuth-enabled location services that they can all read from and write to each other.)
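In toy-code terms (invented names), statelessness means a freshness window instead of a permanent replica:

```python
# Toy sketch: cache, don't own. The authoritative copy lives wherever
# the user chose; we just re-fetch when our copy goes stale.
import time

CACHE_TTL_SECONDS = 300
_cache: dict = {}

def get_user_data(key: str, fetch_from_source):
    """Serve from cache if fresh; otherwise go back to the single
    source of truth (e.g., via an OAuth-protected API call)."""
    entry = _cache.get(key)
    if entry and time.time() - entry["at"] < CACHE_TTL_SECONDS:
        return entry["value"]
    value = fetch_from_source(key)
    _cache[key] = {"value": value, "at": time.time()}
    return value

# e.g., re-fetching Alice's profile from wherever she keeps the master copy:
profile = get_user_data("alice/profile", lambda key: {"name": "Alice"})
```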


Federated identity management in the enterprise has been striving for just-in-time delivery of user attributes from authoritative sources for a long time; it’s perhaps ironic that consumer-driven web companies seem to be getting there first.

Enter Data Portability Policy

Along with privacy policies, terms of service, and end-user license agreements, sites should have a (good) data portability policy — and the DataPortability.org folks are working on it.

The project is spearheaded by Steve Greenberg (of stevenwonders.com! that’s stevenwonders.com — that’s S, T, E, … sorry, inside joke among our little Data Without Borders podcast crew).

It addresses issues like:

  • Are your APIs and data formats documented?
  • Do people need to create a new identity for this site, or can they use an existing one?
  • Must people import things into this product, or can the product refer to things stored someplace else?
  • Does this product provide an open, DRM-free way for people to retrieve or access via third party all of the things they’ve created or provided?
  • Will this site delete an account and all associated data upon a user’s request?

Having standard templates for policy of this sort is immensely valuable. (And I can’t resist a mention of how UMA may be able to help us demand the kinds of policies we want our services to follow, in an automated fashion vs. ever having to read legalese.)
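Imagine the policy as a machine-readable template (this format is invented just for illustration); then your user agent, or a UMA-style service acting for you, could check a site against your requirements automatically:

```python
# Invented format, purely to illustrate machine-checkable policy.
site_policy = {
    "apis_documented": True,
    "accepts_external_identity": True,     # reuse an existing identity?
    "can_reference_external_data": False,  # or must everything be imported?
    "drm_free_export": True,
    "deletes_account_on_request": True,
}

my_requirements = {"drm_free_export": True, "deletes_account_on_request": True}

acceptable = all(site_policy.get(k) == v for k, v in my_requirements.items())
print(acceptable)  # True: this site clears my minimum portability bar
```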

End of rant

Exit questions:

Is Facebook’s new Open Graph Protocol, openly published and based on semantic web standards, a good thing for data portability? What relationship does that have to privacy?

And do individuals get more empowered, or less, when lots of newer, smaller social apps flood the market looking for user-delegated authorization to connect with their data?

The Economist and “ecto gammat”

Remember in The Fifth Element when Leeloo threatens to shoot Korben Dallas for stealing a kiss, saying “ecto gammat”? Turns out it means “never without my permission”. A good rallying cry for personal data sharing in today’s world!

The Economist has a thoughtful article called The Data Deluge on the benefits, and the privacy risks, of making better use of the torrent of data (it mostly focuses on, but doesn’t ever say, “personal” data) being generated in all kinds of business and marketplace endeavors. My favorite part, ’cause I share this assumption with the author:

The best way to deal with these drawbacks of the data deluge is, paradoxically, to make more data available in the right way, by requiring greater transparency in several areas. First, users should be given greater access to and control over the information held about them, including whom it is shared with.

This article makes a great companion to this meaty blog post by Iain Henderson laying out a serious vision for the notion of a personal datastore as a personal data warehouse. Iain knows whereof he speaks; he’s been in the CRM business a long time, and runs the Kantara InfoSharing work group (along with Joe Andrieu, another thoughtful guy who’s passionate about this stuff). I’m lucky to have both of them on my entirely complementary User-Managed Access group; UMA serves as a technological match for the InfoSharing use cases.

I tried to add a comment to the Economist article about an aspect it didn’t cover: the quality of the personal data that’s floating around. Either this commenting effort completely failed, or in the fullness of time three copies of the same comment will appear — sigh. In the spirit of using this blog as my pensieve, here’s the main bit:


Volatile data goes stale. Excessive data collected directly from people is often larded with, to put it bluntly, lies. (To acquire a comment account on this site, I was required to provide my given name, surname, email address, country of residence, gender, and year of birth. If everyone were totally honest when signing up, that’s a powerful set of facts with which to locate and track them pretty precisely. You can tell which fields are excessive by looking at which ones people lie about…) And data collected silently through our behavior is, at best, second-hand and can never know our true intent.

Privacy is not secrecy (says digital identity analyst Bob Blakley). It is context, control, choice, and respect. Ideal levels of personal data sharing may actually be higher in total than now — but more selective. And they won’t be interesting to people without offering convenience at the same time.


Wouldn’t it be great to get out of the defensive crouch of “never without my permission” and turn it into “with my permission, sure, why not, it’ll help me just as much as it will help you”?

(Any bets on whether I told the truth and nothing but the truth when I registered at the Economist site?)

Digital shadow cruft

Robin Wilton’s post on Google Buzz hits the nail(s) right on the head(s). The benefits of social networking center on human-to-human connectedness and collaboration, but the entire “social networking” construct obscures the fact that it’s really human-to-application-to-human. In revealing information that its users neither authorized nor expected to be revealed, Google has created digital shadow cruft.

How to rest assured

Everybody’s talking about identity assurance these days, meaning, generically, the confidence a relying party needs to have in the identity information it’s getting about someone so that it can manage its risk exposure.

A lot of the conversation to date has revolved around NIST Special Publication 800-63 (newer draft version here) and its global cousins, which boil down assurance into four levels — hence all the loose talk of LOA (for “level of assurance” or sometimes AL for “assurance level”), even when people aren’t focusing on specific levels or even systems of assurance numbering. NIST 800-63 is intended to answer the use cases defined in OMB Memo 04-04, which deals with making sure users of the U.S. Federal government’s online systems are who they purport to be. Here’s an example given in OMB M-04-04 for one particular need for level 3 assurance:

A First Responder accesses a disaster management reporting website to report an incident, share operational information, and coordinate response activities.

And here’s how NIST 800-63 defines assurance (I’m quoting the Dec 2008 draft here; strangely, the official Apr 2006 version doesn’t include a formal definition):

In the context of OMB M-04-04 and this document, assurance is defined as 1) the degree of confidence in the vetting process used to establish the identity of an individual to whom the credential was issued, and 2) the degree of confidence that the individual who uses the credential is the individual to whom the credential was issued.

So there’s an identity proofing component at registration time that nails down precisely which real-world human being is being referred to, and there’s a security/protocol soundness/authentication component at run time that establishes that the credential is being waved around legitimately. These get added up into four levels defined roughly like this (leaving aside the security and protocol soundness factors):

Level   Same unique user?   Verified name provided?
1       yes                 no (pseudonymous)
2       yes                 optional
3       yes                 yes
4       yes                 yes

(Here, “same unique user” means that the same user can be correlated by the RP across sessions. And “verified name provided” means that the user’s real-world name is exposed to the RP, versus some sort of pseudonym; level 1, where no proofing is done, is implicitly pseudonymous, while level 2 offers a choice.)

I don’t mean at all to criticize this rolled-up four-level approach. It seems to have met the needs set out in M-04-04, and it predated both the “user-centric” movement (Dale Olds has a nice rundown of its use cases here) and truly modern notions of online privacy.

But I think we need more clarity about assurance use cases and terminology, for two reasons: One is to help ensure that identity providers can give RPs what they need, rather than what might just be a poor approximation based on NIST 800-63’s fame. The other is to help ensure that IdPs give RPs only what they need, since more assurance is likely to involve more personal information exposure.


To that end, let me explain some assurance use case buckets I’m seeing in the wild, and their relationship to the NIST requirements and each other. First, here are some use case buckets hiding in plain sight in the NIST levels:

[Image: use case buckets hiding in plain sight in the NIST levels: simple cross-session correlation, identity proofability, and real-world identity mapping]

Simple cross-session correlation: While NIST 800-63 doesn’t formally include “same unique user” as a goal, it’s in there:

Level 1 – Although there is no identity proofing requirement at this level, the authentication mechanism provides some assurance that the same claimant is accessing the protected transaction or data.

Funnily enough, cross-session correlation (without the baggage of proofing) is a key requirement of many enterprise and Web federated identity interactions. Lots of sites don’t need or want to know you’re a dog; they just need to know you’re the same dog as last time. This way, they can authorize various kinds of ongoing access and give you something of a personalized experience across sessions. Though NIST treats this as an also-ran and couples it with weak authentication in level 1, other use cases may have reason to match up “mere correlation” with higher authentication.
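One well-known way to serve this bucket, sketched below as toy code rather than any particular spec, is a pairwise identifier: the IdP derives a different stable identifier per RP, so each RP can recognize the same dog across sessions while two RPs can’t compare notes.

```python
# Toy sketch of pairwise identifiers; not any particular specification.
import hashlib
import hmac

IDP_SECRET = b"known-only-to-the-identity-provider"

def pairwise_id(user_id: str, rp_id: str) -> str:
    """Stable for a given (user, RP) pair; unlinkable across RPs
    without the IdP's secret."""
    msg = f"{user_id}|{rp_id}".encode()
    return hmac.new(IDP_SECRET, msg, hashlib.sha256).hexdigest()

print(pairwise_id("alice", "news-site"))   # same value every visit to news-site
print(pairwise_id("alice", "photo-site"))  # an entirely different identifier
```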

Identity proofability: If an RP can trust that it’s dealing with a human being who has some level of serious representation in civil society, it’s a powerful kind of assurance for lots of purposes. More about this below.

Real-world identity mapping: When level 3 or 4, or verified-name level 2, is in play, the user’s real name goes into building the unique identifier that the RP sees, and this verified name leaks PII like crazy, even if it’s not itself unique. (As far as I know, I’m the only Eve Maler out there…) This is strong stuff, and in a modern federated identity environment, it is to be hoped that most RPs simply don’t need this information. (John Bradley — that is, the John Bradley who works with the U.S. government on its ICAM Open Identity Solutions program — tells me he believes pseudonyms should be an acceptable choice all up and down the four levels, indicating that this use case bucket is fairly rare.)


Now things get really interesting, because there are other use case buckets that you can sort of see in this matrix if you squint, but really they’re just different:

[Image: additional use case buckets: anonymous authorization/personalization and financial engagement]

Anonymous authorization/personalization: This is the flip side of cross-session correlation. OMB M-04-04 talks about “attribute authentication” and the potential for user attributes to serve as “anonymous credentials” (where an RP simply can’t know if this is the same unique user coming back but can still base its authorization decisions and personalization actions on the veracity of the attributes it’s getting). The attributes in question can range from “this user is over 18” to “this user is a student at University ABC” to “this user is of nationality XYZ”.

Ultimately M-04-04 puts the whole area of attribute authentication firmly out of scope, but lots of folks have been picking at the general problem of attribute assurance in the last several months — like Internet2 in its Tao of Attributes workshop, and the Concordia group in a forthcoming survey (stay tuned for more on that).

This bucket often requires being able to check who issued some assertion or claim, and considering whether they’re properly authoritative for that kind of info. The way I think about this is: Who has the least incentive to lie? That’s why you can be said to be truly authoritative for self-asserted preferences such as “aisle vs. window”. Any other way lies madness (“What is your favorite color?” “Blue. No yel– Auuuuuuuugh!”).
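“Who has the least incentive to lie?” amounts to a per-attribute lookup table in the RP’s head; here’s a toy sketch (names invented):

```python
# Toy sketch: for each kind of attribute, which issuers does this RP
# treat as authoritative? All names invented.
AUTHORITATIVE_FOR = {
    "over_18":         {"gov-id-service"},  # a civil authority, not the user
    "student_at_abc":  {"university-abc"},  # the university itself
    "seat_preference": {"self"},            # only you can say "aisle vs. window"
}

def accept_claim(attribute: str, issuer: str) -> bool:
    """Accept an asserted attribute only from an issuer with standing,
    and the least incentive to lie about it."""
    return issuer in AUTHORITATIVE_FOR.get(attribute, set())

print(accept_claim("over_18", "gov-id-service"))  # True
print(accept_claim("over_18", "self"))            # False: too easy to lie
print(accept_claim("seat_preference", "self"))    # True: you're authoritative
```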

Of course, there are cases where an RP really does need attribute assurance along with other kinds, like correlation or identity mapping. And don’t forget that it takes precious little in the way of personal information for an RP to figure out “who you really are” anyway. (Check out this cool Tao of Attributes diagram, which touches on all these points.)

Financial engagement: Sometimes an RP just wants some assurance they’re dealing with someone who has sufficient ties to the world’s legitimate financial systems not to screw them over entirely. It turns out that identity proofability can often be a serviceable proxy for this kind of confidence. (Financial account numbers are one kind of proofing documentation in NIST 800-63.) And the reverse is also true: financial engagement can sometimes give a modicum of confidence in identity proofability.

Interestingly, this bucket can be useful even without any of the other kinds, partly because the parties can lean on a mature parallel financial system instead of just lobbing identifiers and attributes all over the place. For example, users often “self-assert” credit card numbers (which RPs then validate out of band with the card issuer), or use third-party payment services like PayPal (where the service provider does a lot of the risk-calculation heavy lifting).


No doubt there are other assurance use cases. Understanding them more deeply can, I think, help us get better at sharing the truth and nothing but the truth about people online — without having to expose the whole truth.

(Thanks to John Bradley, Jeff Hodges, and Andrew Nash for comments on early drafts of this post. And check out Paul Madsen’s many excellent commentaries on assurance matters.)