Data portability and wagon-circling

One of the breakout tracks at EIC last week was Cloud Platforms and Data Portability. Dave Kearns had asked me to speak for a few minutes on the subject of social data portability before joining Drummond and Christian for a panel discussion.

I brainstormed a bit and suggested that I could comment on the notion of data statelessness, and the continuum of individuals’ data portability on the web. That somehow turned into a boldface uppercase talk called Data Statelessness and the Continuum of Individuals’ Data Portability on the Web. :-) (Hmm, maybe in German that boils down to a single long word…) I thought I’d share those thoughts here.

The Web is a teenager already

People have been pouring content onto it since Web 1.0. That’s long enough for there to have been major failures of data portability.

For example, Geocities started in 1994 (with an offer of 2 whole MB free!) and ended its life in 2009 with about 23 million individual pages, all of which were at risk of being abandoned.

Archive Team is one of the groups that performed “data portability of last resort”; they’ve managed to resurrect more than a terabyte of all that content…at Geociti.es.

DataPortability.org was formed in 2007, and it advocates being able to “take your data with you” to new services.

The Web 2.0 cocktail is even more potent

It’s a mix of some application’s features plus our own data contributions. The more “social” the application — that is, giving us human-to-human connection benefits — the more we drink.

But there’s always an application in the middle. It knows everything we share — and increasingly, selling access to that information is its business model.

Just a reminder…

Take a look at EFF’s compilation of Facebook privacy policies from 2005 to now.

Recall that a newspaper’s readers traditionally were not its real customers; that would, of course, be the advertisers.

Facebook’s end-users are not its customers.

They’re the product.

[Not that I’m picking on Facebook specifically. Though this news about a Facebook all-hands meeting tomorrow afternoon to “circle the wagons” is interesting…]

Solving the password anti-pattern began a new era of data portability

Was it accidental?

In 2008, Robert Scoble famously discovered that Facebook’s terms of service didn’t allow him to bulk-extract his own contact list, and the company cut him off (at which point he got involved in the Data Portability effort!).

In the meantime, Facebook and Yahoo! and AOL and Google and many others have discovered how valuable it is to let third-party apps get access to fresh feeds of your data without your having to reveal your username and password.

They couldn’t exactly let these connections happen without your go-ahead, and so user delegation of authorized access was born — or at least standardized.

BBAuth, OpenAuth, and other proprietary solutions led to OAuth (and its proprietary competitor Facebook Connect) — and now the draft OAuth 2.0, which Facebook already supports.

Third-party services getting access to your data with your okay is tantamount to your getting access through an “agent” (and not just a one-time export when you leave, either, but regular fresh access for a variety of purposes). This has turned out to be a Good Thing overall for individuals’ chances at data portability.
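
To make the delegation step concrete, here is a minimal sketch of the front half of an OAuth 2.0 authorization-code flow in Python. It assumes the parameter names as they were later finalized in RFC 6749 (the 2010 drafts differed in details), and the endpoint, client ID, redirect URI, and scope names are hypothetical placeholders, not any real provider’s values. The point is that the third-party app sends you to the provider to say yes, and gets back only a one-time code, never your password.

```python
import secrets
from typing import Sequence, Tuple
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(
    authorize_endpoint: str,
    client_id: str,
    redirect_uri: str,
    scopes: Sequence[str],
) -> Tuple[str, str]:
    """Build the URL that sends the user to the provider to approve access.

    Returns the URL plus the random `state` value the app must remember.
    """
    # Unguessable value tying the provider's response to this request (CSRF defense).
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",       # ask for a one-time authorization code
        "client_id": client_id,        # identifies the third-party app, not the user
        "redirect_uri": redirect_uri,  # where the provider sends the user back
        "scope": " ".join(scopes),     # the slices of data the user is asked to share
        "state": state,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state


def extract_code(callback_url: str, expected_state: str) -> str:
    """Read the authorization code from the provider's redirect back to the app."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch; response does not match our request")
    if "error" in query:
        # e.g. the user clicked "deny" on the consent screen
        raise PermissionError(query["error"][0])
    return query["code"][0]
```

A real client would then exchange the code for an access token at the provider’s token endpoint and use that token for the regular fresh access described above; that step is left out of this sketch.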

What is data statelessness?

It’s the ability of a third-party service to think in terms of caching rather than replicating your data, because it can fetch the data whenever it needs it.

It’s the ability of a third-party service to add value without having to “own” your data.

It’s the ability for a single source of truth to arise — and for you to choose what it is.

Even weirder, it’s the ability for automatic syncing among a variety of sources of truth to arise — and for you to choose where to inject the first copy. (This is the effect when, say, you tell a bunch of your OAuth-enabled location services that they can all read from and write to each other.)
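
Here is one way to picture caching rather than replicating: a minimal Python sketch, assuming a hypothetical `fetch` callable that stands in for a delegated-access call to whatever source of truth you chose. The service keeps a copy only for `ttl_seconds` (an arbitrary illustrative setting) and goes back to the source once it expires, so it can throw its copy away at any time without losing anything it owns.

```python
import time
from typing import Any, Callable, Dict, Tuple


class DelegatedDataCache:
    """Short-lived copies of user data whose source of truth lives elsewhere.

    `fetch` is a hypothetical stand-in for an authorized call (for example,
    an OAuth-protected API request) to the source the user chose.
    """

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # user_id -> (time fetched, value); a cache, not a replica
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, user_id: str) -> Any:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh enough to reuse
        # Missing or stale: ask the source of truth again instead of trusting our copy.
        value = self._fetch(user_id)
        self._entries[user_id] = (now, value)
        return value

    def forget(self, user_id: str) -> None:
        # Dropping the copy loses nothing, because the service never owned the data.
        self._entries.pop(user_id, None)
```

The injectable `clock` is only there so the expiry behavior can be exercised without waiting in real time.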

Federated identity management in the enterprise has been striving for just-in-time delivery of user attributes from authoritative sources for a long time; it’s perhaps ironic that consumer-driven web companies seem to be getting there first.

Enter Data Portability Policy

Along with privacy policies, terms of service, and end-user license agreements, sites should have a (good) data portability policy — and the DataPortability.org folks are working on it.

The project is spearheaded by Steve Greenberg (of stevenwonders.com! that’s stevenwonders.com — that’s S, T, E, … sorry, inside joke among our little Data Without Borders podcast crew).

It addresses issues like:

  • Are your APIs and data formats documented?
  • Do people need to create a new identity for this site, or can they use an existing one?
  • Must people import things into this product, or can the product refer to things stored someplace else?
  • Does this product provide an open, DRM-free way for people to retrieve or access via third party all of the things they’ve created or provided?
  • Will this site delete an account and all associated data upon a user’s request?

Having standard templates for policy of this sort is immensely valuable. (And I can’t resist a mention of how UMA may be able to help us demand the kinds of policies we want our services to follow, in an automated fashion vs. ever having to read legalese.)
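
As an illustration of the automated-policy idea, here is a minimal Python sketch that encodes the five questions above as a machine-readable record and checks a site’s declared answers against what a user demands. The field names and the `unmet_requirements` helper are hypothetical; this is not the DataPortability.org policy format or the UMA protocol, just the shape of the comparison a tool could make on your behalf.

```python
from dataclasses import dataclass, fields
from typing import AbstractSet, List


@dataclass(frozen=True)
class PortabilityPolicy:
    """One hypothetical boolean per question in the list above."""
    documented_apis_and_formats: bool
    accepts_existing_identity: bool
    can_reference_external_data: bool
    open_drm_free_access: bool
    deletes_account_on_request: bool


def unmet_requirements(site: PortabilityPolicy, required: AbstractSet[str]) -> List[str]:
    """Return, sorted, the user's required properties that the site does not promise."""
    known = {f.name for f in fields(PortabilityPolicy)}
    unknown = set(required) - known
    if unknown:
        raise ValueError(f"unknown requirement(s): {sorted(unknown)}")
    return sorted(name for name in required if not getattr(site, name))


if __name__ == "__main__":
    site = PortabilityPolicy(
        documented_apis_and_formats=True,
        accepts_existing_identity=True,
        can_reference_external_data=False,
        open_drm_free_access=True,
        deletes_account_on_request=False,
    )
    # A user who insists on open export and on real account deletion:
    print(unmet_requirements(site, {"open_drm_free_access", "deletes_account_on_request"}))
    # prints ['deletes_account_on_request']
```

Rejecting unknown requirement names, rather than silently ignoring them, keeps a typo in the user’s demands from reading as “everything is satisfied.”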

End of rant

Exit questions:

Is Facebook’s new Open Graph Protocol, openly published and based on semantic web standards, a good thing for data portability? What relationship does that have to privacy?

And do individuals get more empowered, or less, when lots of newer, smaller social apps flood the market looking for user-delegated authorization to connect with your data?