Facebook’s Cambridge Analytica Scandal: How We Got Here

Like the proverbial straw that broke the camel’s back, some four-year-old data that wasn’t destroyed when it was supposed to be may be what finally starts to bring internet giant Facebook under the scrutiny of legislators and a previously complacent public. It’s helpful to look back at what happened, and at how it has blown up into an international headline story.

It Might Not Have Been a Breach, But It Was Clearly a Mistake

Until sometime in 2014, Facebook’s default privacy settings allowed you to access information about your friends’ friends. And you could do this not just by browsing; there was an API (programming interface) so it could be done automatically. This was a pretty amazing capability. When tools like Mathematica incorporated it into their development environments, you could look at an extended network of your friends and their friends with only a couple of lines of code. It was pretty cool to see who you wound up being only two steps removed from. Used that way, it was mostly harmless.
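To make the “two steps removed” idea concrete, here is a minimal sketch in Python. It does not call Facebook’s API or reproduce the Mathematica feature; the friends dictionary, the names in it, and the friends_of_friends helper are all invented for illustration.

```python
# Toy friendship graph. Every name here is made up for illustration;
# nothing is fetched from Facebook or any real API.
friends = {
    "you": {"ana", "ben"},
    "ana": {"you", "cho", "dev"},
    "ben": {"you", "dev", "eli"},
    "cho": {"ana"},
    "dev": {"ana", "ben"},
    "eli": {"ben"},
}


def friends_of_friends(graph, person):
    """Return the people exactly two steps away from `person`."""
    direct = graph.get(person, set())
    two_steps = set()
    for friend in direct:
        # Collect everyone each direct friend is connected to.
        two_steps |= graph.get(friend, set())
    # Drop the person themselves and anyone who is already a direct friend.
    return two_steps - direct - {person}


print(sorted(friends_of_friends(friends, "you")))  # ['cho', 'dev', 'eli']
```

Even in this tiny example, two direct friends expose three more people, none of whom took any action themselves.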

However, researchers realized they could supercharge this capability by enlisting people to use an application on Facebook; those users unknowingly gave the researchers access to Facebook’s information about their friends. There is an important distinction here that’s glossed over in many press reports. The users didn’t explicitly allow access, and they almost certainly didn’t realize that Facebook’s default privacy settings meant they had granted permission anyway. So the information wasn’t technically stolen, or breached (as Facebook is careful to point out). But no one was explicitly asked to provide it, either.

Facebook Took a Cavalier Attitude to User Data

Researchers used this feature to build large datasets, which then allowed them to build profiles of users and begin to characterize them (including skin color, sexual orientation, and political affiliation), all simply based on elements of their Facebook data. Facebook explicitly granted permission to researchers to do this, although with the proviso that the information was not supposed to be sold. In hindsight, this seems nuts. Even if you think it was okay for anyone to have access to that much private data without people’s active consent, there was no real system in place for controlling access or auditing how the information was used.

The Guardian has had some of the best coverage of the unfolding news, including this interview with a former Facebook insider who explains how common this kind of abuse was.

In the case of Cambridge Analytica (CA), according to The New York Times, CA paid $800,000 to a psychology professor, Aleksandr Kogan, to create an app to harvest exactly this kind of information. His personality quiz attracted 270,000 users. Thanks to the lax default permissions of Facebook, that meant he could collect information not just about those users, but about their approximately 50 million friends. One clear lesson here, by the way, is Not To Ever Take A Quiz You See On Facebook. Even with tighter privacy settings, you’re just giving away yet more personal data to people you’ve never met, with no idea what they’ll use it for.
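The amplification in those figures is easy to work out. This is a rough back-of-the-envelope sketch in Python that uses only the two numbers reported above. The helper name is hypothetical, and the ratio should not be read as an average friend count, since friend lists overlap.

```python
# Figures as reported above: quiz takers and approximate profiles reached.
quiz_takers = 270_000
profiles_reached = 50_000_000


def reach_per_user(takers, reached):
    """Profiles exposed per person who actually took the quiz."""
    return reached / takers


# Roughly 185 profiles were exposed for every person who opted in.
print(round(reach_per_user(quiz_takers, profiles_reached)))  # 185
```

Put another way, well under one percent of the people in the dataset ever interacted with the quiz.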

The next link in the chain is that despite Kogan having told Facebook he was only using the data for research (at least according to Facebook, which doesn’t seem to have done much to verify it), Kogan then shared the data with CA for use in ad targeting. According to The Guardian, CA used that data to help Ted Cruz’s presidential campaign. That disclosure was in 2015, but possibly because Cruz wound up losing, it didn’t generate a lot of national attention.

Facebook reacted by getting CA to promise it had subsequently deleted the data. This seems at least as naive as Facebook’s original trust that the data was only being used for research. It’s a little like asking a bank robber to promise they gave the money back. It was the recent disclosure that the data was not deleted that has lit a large fire under the scandal. In the interim, CA has gained notoriety for its work on the Trump campaign. Connecting the dots, the assumption is being made that the data on 50 million Facebook users was helpful — perhaps instrumental — in the targeted advertising and social media efforts launched by the Trump campaign itself and its supporters.

In fact, it isn’t really clear the data was all that effective. CA CEO Alexander Nix bragged about its use in the Trump campaign, but it is quite possible he was exaggerating in an effort to make the firm seem cutting-edge and attract new business. The New York Times provides plenty of reasons to be skeptical. Among them are statements from the campaign that more traditional micro-targeting methods were more effective, and concessions from CA’s executives that their psychographics technology didn’t get used in the campaign. We’ll certainly learn more over time, since UK regulators have now raided CA’s offices.

Twitter is still pretty generous with its privacy settings. One line of Mathematica code gives you a graph of Trump’s friends and their friends. The real version has names for each node.

Facebook Was Warned About Data Privacy Years Earlier

It’s not like Facebook didn’t know it needed to be more careful with user data. In 2011 it entered into a consent decree with the FTC in which it promised to better enforce user privacy settings, and which included substantial fines for future violations. It is unclear so far whether the CA incident violated that agreement. But it certainly meant that Facebook knew it needed to do better, years before the CA-related data debacle unfolded.

The big picture isn’t really about the CA data or how it was used. It’s about how much of the internet is built around a small number of increasingly large companies consolidating, marketing, using, and abusing our personal data without effective regulation or transparency. CA’s use of Facebook data is merely a tiny window into how that system can be abused and how it fails to safeguard our privacy.

Micro-targeting Isn’t New; We’re Under a Microscope Already

Another important takeaway from this story is that there are many ways to manipulate audiences. The ad industry has come a long way from obvious 30-second TV spots. CA itself, and firms that compete with it, like TargetPoint, already comb through massive troves of data to build profiles of individual voters and various groups of voters that can be directly influenced with customized messages. Tech giants like Facebook, Google, Amazon, and Netflix, among many others, do this on a massive scale. Historically, this data has been used primarily for marketing consumer products and services. But beginning with the 2008 Obama campaign, political organizations started to target their messages more precisely using social media and demographics. Over the ensuing decade, the efforts have become more advanced, and have expanded into additional realms of manipulation.

Early Facebook investor Roger McNamee was one of the first to explain that it isn’t necessary to hack Facebook to take this manipulation to new levels. As he points out, Facebook’s business model and platform are designed to allow advertisers to do exactly that. With a few clicks, an advertiser can micro-target by group membership, interests, age, region, income, and more. The only additional element needed was to aim those tools at the sphere of politics. McNamee is further quick to note that, unlike more traditional media, there are virtually no regulations on politically related advertising, posts, or groups on Facebook. In fact, Facebook won’t even admit it’s a media company or publisher. So far, it’s insisting it’s just a platform.

Where Do We Go From Here

First, there will be a much more serious set of hearings in the US and Europe where Facebook executives will get asked some hard questions. Whether that generates new laws or regulations, or what they will look like, is unclear. Personally, I expect the EU, which is already way ahead of the USA in privacy regulations, to be more aggressive than our current federal government.

UPDATE 3/21 1pm Pacific: Facebook CEO Mark Zuckerberg has posted his and Facebook’s response to the situation, including the following: “This was a breach of trust between Kogan, Cambridge Analytica and Facebook. But it was also a breach of trust between Facebook and the people who share their data with us and expect us to protect it. We need to fix that.” He then outlines a series of steps the company is (finally) taking to tighten up the way apps can use your data. This is a small and belated gesture, but is definitely a step in the right direction.

In the meantime, McNamee, former Google Design Ethicist Tristan Harris, and others have formed the Center for Humane Technology to help drive both policy reform and more user-centered software designs. In addition to responding to the specific issues like the misuse of Facebook data, they’re working on the larger problems inherent in the current internet business model of creating addictions to increase profits.

While McNamee still believes in the potential of the Facebook platform, WhatsApp co-founder and Facebook-made billionaire Brian Acton has gone further and launched a #deletefacebook campaign. Whether you want to go that far or not, we think you should make sure you understand your privacy settings, be careful what you share, be skeptical of what you read, and use best practices for protecting your privacy online.

