Tag Archives: policy

Data portability and wagon-circling

One of the breakout tracks at EIC last week was Cloud Platforms and Data Portability. Dave Kearns had asked me to speak for a few minutes on the subject of social data portability before joining Drummond and Christian for a panel discussion.

I brainstormed a bit and suggested that I could comment on the notion of data statelessness, and the continuum of individuals’ data portability on the web. That somehow turned into a boldface uppercase talk called Data Statelessness and the Continuum of Individuals’ Data Portability on the Web. :-) (Hmm, maybe in German that boils down to a single long word…) I thought I’d share those thoughts here.

The Web is a teenager already

People have been pouring content onto it since Web 1.0. It’s enough time for there to be major failures of data portability.

For example, Geocities started in 1994 (with an offer of 2 whole Mb free!), and ended its life in 2009 with about 23 million individual pages — which were at risk of being abandoned.

300px-Archiveteam

Archive Team is one of the groups that performed “data portability of last resort”; they’ve managed to resurrect more than a terabyte of all that content…at Geociti.es.

data-portability-logo

DataPortability.org was formed in 2007, and it advocates being able to “take your data with you” to new services.

The Web 2.0 cocktail is even more potent

It’s a mix of some application’s features plus our own data contributions. The more “social” the application — that is, giving us human-to-human connection benefits — the more we drink.

But there’s always an application in the middle. It knows everything we share — and increasingly, selling access to that information is its business model.

Just a reminder…

Take a look at EFF’s compilation of Facebook privacy policies from 2005 to now.

Recall that a newspaper’s readers traditionally were not its real customers; that would, of course, be the advertisers.

Facebook’s end-users are not its customers.

They’re the product.

[Not that I'm picking on Facebook specifically. Though this news about a Facebook all-hands meeting tomorrow afternoon to "circle the wagons" is interesting...]

Solving the password anti-pattern began a new era of data portability

Was it accidental?

In 2008, Robert Scoble famously discovered that Facebook’s terms of service didn’t allow him to bulk-extract his own contact information, and they cut him off (at which point he got involved in the Data Portability effort!).

In the meantime, Facebook and Yahoo! and AOL and Google and many others have discovered how valuable it is to let third-party apps get access to fresh feeds of your data without your having to reveal your username and password.

They couldn’t exactly let these connections happen without your go-ahead, and so user delegation of authorized access was born — or at least standardized.

facespace
(click to embiggen)

BBAuth, OpenAuth, and other proprietary solutions led to OAuth (and its proprietary competitor Facebook Connect) — and now the draft OAuth 2.0, which Facebook already supports.

Third-party services getting access to your data with your okay is tantamount to you getting access through an “agent” — and not just one-time export when you leave, either, but regular fresh access for a variety of purposes. This has turned out to be a Good Thing overall for individuals’ chances at data portability.

What is data statelessness?

It’s the ability of a third-party service to think in terms of caching rather than replicating your data, because they can get it whenever they need it.

It’s the ability of a third-party service to add value without having to “own” your data.

It’s the ability for a single source of truth to arise — and for you to choose what it is.

Even weirder, it’s the ability for automatic syncing among a variety of sources of truth to arise — and for you to choose where to inject the first copy. (This is the effect when, say, you tell a bunch of your OAuth-enabled location services that they can all read from and write to each other.)

treasure-chest

Federated identity management in the enterprise has been striving for just-in-time delivery of user attributes from authoritative sources for a long time; it’s perhaps ironic that consumer-driven web companies seem to be getting there first.

Enter Data Portability Policy

Along with privacy policies, terms of service, and end-user license agreements, sites should have a (good) data portability policy — and the DataPortability.org folks are working on it.

The project is spearheaded by Steve Greenberg (of stevenwonders.com! that’s stevenwonders.com — that’s S, T, E, … sorry, inside joke among our little Data Without Borders podcast crew).

It addresses issues like:

  • Are your APIs and data formats documented?
  • Do people need to create a new identity for this site, or can they use an existing one?
  • Must people import things into this product, or can the product refer to things stored someplace else?
  • Does this product provide an open, DRM-free way for people to retrieve or access via third party all of the things they’ve created or provided?
  • Will this site delete an account and all associated data upon a user’s request?

Having standard templates for policy of this sort is immensely valuable. (And I can’t resist a mention of how UMA may be able to help us demand the kinds of policies we want our services to follow, in an automated fashion vs. ever having to read legalese.)

End of rant

Exit questions:

Is Facebook’s new Open Graph Protocol, openly published and based on semantic web standards, a good thing for data portability? What relationship does that have to privacy?

And do individuals get more empowered, or less, when lots of newer, smaller social apps flood the market looking for user-delegated authorization to connect with your data?

Both a data borrower and a data lender be

Christian Scholz and his Data Portability Project pals have roped me into their Data Without Borders podcasts. On Friday, Christian and Trent Adams and Steve Greenberg and I had some fun relaunching the series by talking about the DPP Terms of Service and End-User License Agreement (TOS/EULA) task force.

Steve was passionate in describing this work. I think he’s right when he says that you first have to ensure that people are aware of a site’s terms of service; disclosing them in a form human beings can grok (à la Creative Commons or the nutrition label approach I wrote about here) can begin to empower humans to change things if they so desire, using a variety of means.

At one point we talked about the Archive Team project run by Jason Scott, which I think of as “data portability of last resort”. These folks are like digital historian ninjas who swoop in to save data that might otherwise be lost forever — like everything on GeoCities.

The thing is, website-sanctioned bulk import and export of data isn’t all that huge an improvement on this kind of rescue operation. True data portability wants granularity and timeliness. For example, if you choose to host (so to speak) your current location info at FireEagle, you might still want to reuse it in other places for other purposes, and luckily OAuth lets FireEagle, Dopplr etc. give you a nimble and safe way to “port” this data back and forth.

This is a kind of data statelessness, in that when you tell various sites they can set, read, and republish your location, they’re letting go of any pretense of exclusive hosting control so that they can offer you a different kind of value.

Now, in the IdM and VRM worlds, some of us have been talking about identity statelessness for a while, which is similar but looks more like straight data-sharing (reading) rather than arbitrary service access (setting). For some reason this is a tougher sell — even though CRM systems and user accounts are shot through with pale copies of stale data (and, in the enterprise case, even though syncing directories and replicating databases is brittle and no fun).

Even when one party — say, you yourself — is authoritative for some piece of personal data (like your home address), all the sites insist on making you provision a copy of this data into their profile pages by hand and by value, and insist on thinking they own something truly valuable even after you move and forget to tell them.

In short: To the extent data is volatile, copies of it leak value. If the chain of evidence between its authoritative source and a recipient of data is broken, it quickly becomes value-free. And if the chain of authorization breaks, you’ve got digital shadow cruft. Why oh why can’t we get to a place where, as Scott Cantor put it to me once, identity-aware apps think in terms of data caching rather than data replication?

The Data Portability TOS/EULA work is helping us raise our standards for what true data portability should look like: Open Arms – Ever Fresh – Graceful Exit. OAuth already helps us get a bit beyond disclosure of site terms, closer to a world where users have an active say in what sites do with our stuff. I’m hoping UMA (recent deep-dive Technometria podcast here) can help us go even further because of its notion of user-dictated terms that recipients must meet in order to have the privilege of fresh access.

We’re likely to discuss this topic in the DWB podcast sometime soon, so I hope you’ll give a listen.