Speculating on PRISM

PRISM is out.  And there’s been a lot written about it.  From denials at the very top of some of the world’s (that is, America’s) biggest internet and technology companies to lawmakers defending the approach.  It’s also emerged that the UK’s GCHQ has access to at least some of the data being collected.

There are a number of strands to this that are worth highlighting.  First we heard that Verizon, a large US cellular provider, was providing phone records to the NSA.  It appeared to be the originating number, destination number and call duration (presumably with the date and time).  Then it emerged that some of the world’s largest internet providers were in on the act.  Of particular worry is the claim that the NSA and FBI had unfettered direct access to these companies’ systems.  A claim that the Washington Post has since backtracked on (albeit unconvincingly).  However, the slides that the world has seen certainly suggest that this level of access exists.

Some have been speculating on how the FBI, NSA and GCHQ can interpret this data.  GigaOM have a post up discussing this topic.  It’s interesting, because this would certainly be one of the largest data stores in the world.

However, people seem to be missing a key point.  The way in which these systems usually work is that they identify entities.  A person, a location, an item etc.  Then, information is built up about each entity, adding various attributes (e.g. name, DOB for a person) as more and more information is ingested.  The system then identifies links between entities, plots changes over time, speculates on relationships and so on.  Information from other sources can be ingested into the system to flesh out the data and the connections.  These can be open data sources, closed data sources, sources that can be bought in or sources that can be built up.  And it doesn’t matter whether that data is structured or unstructured, clean or messy.  These systems can take it all, and make sense of it all.  Automatically and very quickly.
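To make the idea concrete, here is a minimal sketch of that entity-building step.  It is not how any particular agency or product does it; the record layout, the field names and the `build_entities` function are all made up for illustration, and matching is done on exact (normalised) phone numbers and email addresses only, where a real system would use far fuzzier matching.

```python
from collections import defaultdict


def normalise(value):
    """Lower-case and collapse whitespace so trivially different values match."""
    return " ".join(str(value).lower().split())


def build_entities(records, key_fields=("phone", "email")):
    """Merge records that share any identifier into one entity (union-find).

    Hypothetical illustration: `records` is a list of dicts from different
    sources, and `key_fields` names the fields treated as identifiers.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the group's root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (field, value) -> index of the first record carrying it
    for idx, rec in enumerate(records):
        for field in key_fields:
            if rec.get(field):
                key = (field, normalise(rec[field]))
                if key in first_seen:
                    union(idx, first_seen[key])
                else:
                    first_seen[key] = idx

    # Accumulate every attribute value seen for each merged entity.
    merged = defaultdict(lambda: defaultdict(set))
    for idx, rec in enumerate(records):
        root = find(idx)
        for field, value in rec.items():
            if value:
                merged[root][field].add(normalise(value))
    return {root: {f: sorted(v) for f, v in attrs.items()}
            for root, attrs in merged.items()}


# Made-up records from three unrelated sources.
records = [
    {"source": "billing", "name": "Alice Smith", "phone": "555-0100"},
    {"source": "webmail", "email": "alice@example.com", "phone": "555-0100"},
    {"source": "forum", "name": "A. Smith", "email": "Alice@Example.com"},
    {"source": "billing", "name": "Bob Jones", "phone": "555-0199"},
]
entities = build_entities(records)
```

The forum record shares nothing directly with the billing record, yet the two end up in the same entity because the webmail record links them.  That chaining is the point: each new source can join up things that previously looked unrelated.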

To these systems, the links are just as important as the content.  This makes Obama’s claim:

When it comes to telephone calls, nobody is listening to your telephone calls. That’s not what this program is about. As was indicated, what the intelligence community is doing is looking at phone numbers and durations of calls. They are not looking at people’s names and they’re not looking at content. But by sifting through this so-called ‘metadata,’ they may identify potential leads with respect to folks who might engage in terrorism. If the intelligence community then actually wants to listen to a phone call, they’ve got to go back to a federal judge, just like they would in a criminal investigation.

more than a bit disingenuous.  The content of a phone call can often be gleaned from other information.
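As a rough illustration of how much metadata alone can give away, consider the sketch below.  Everything in it is invented: the call records, the `DIRECTORY` lookup (standing in for any public or purchased listing of who owns a number) and the `profile` function.  It uses nothing but caller, callee, time and duration.

```python
from collections import Counter

# Hypothetical directory mapping numbers to what they are.
DIRECTORY = {
    "555-0300": "oncology clinic",
    "555-0400": "divorce lawyer",
}

# Made-up call metadata: (caller, callee, start time, duration in seconds).
calls = [
    ("555-0100", "555-0300", "2013-06-03T09:05", 600),
    ("555-0100", "555-0300", "2013-06-05T09:10", 420),
    ("555-0100", "555-0400", "2013-06-05T12:30", 1800),
    ("555-0199", "555-0100", "2013-06-06T19:00", 60),
    ("555-0100", "555-0300", "2013-06-07T09:02", 300),
]


def profile(number, calls, directory):
    """List the known organisations a number talks to, most frequent first."""
    contacts = Counter()
    for caller, callee, _when, _seconds in calls:
        if caller == number:
            contacts[callee] += 1
        elif callee == number:
            contacts[caller] += 1
    return [(directory[other], count)
            for other, count in contacts.most_common()
            if other in directory]


summary = profile("555-0100", calls, DIRECTORY)
```

Nobody listened to a single call here, but three calls to a clinic and a half-hour call to a lawyer in one week tell a story all the same.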

These systems aren’t secret.  IBM, for example, market their Identity Insight product (part of InfoSphere) to companies and governments around the world.

Identity Insight leverages advanced algorithms specifically optimized to recognize nefarious individuals and organizations in spite of their sophisticated attempts to mask their identity, their unscrupulous relationships, and their activities.

And that’s just IBM.  More to the point, that’s just what IBM advertises publicly.  I’d be far more worried about companies like BAE Systems, who bought Detica Information Intelligence, Norkom Technology and ETI all within the space of a few years.  People watching the industry will not be surprised by these revelations.

This leaves major concerns.  If the NSA, FBI and GCHQ are storing all this information, regardless of how well developed it is, can they store it securely?  Government agencies have repeatedly proved incompetent when it comes to secure data storage.  The other question is how much of this is developed in house.  Are the NSA, FBI or GCHQ using outside companies to work with this data?  If so, are those companies privy to information that would give them a competitive advantage?  And are they able to store and process it securely?

There are more questions than answers at this point (in particular, why just the US and UK?  Are other governments involved?).  But I’m not sure I want to know the answers.