How does GCHQ's internet surveillance work?
Learn more about the system for collecting content and metadata, and what
GCHQ can do with it
Ewen MacAskill, Julian Borger, Nick Hopkins, Nick Davies and James Ball
The Guardian, Friday 21 June 2013 12.24 EDT
What is an internet buffer?
In essence, an internet buffer is a little like Sky+, but on an almost
unimaginably large scale. GCHQ, assisted by the NSA, intercepts and collects
a large fraction of internet traffic coming into and out of the UK. This is
then filtered to get rid of uninteresting content, and what remains is
stored for a period of time - three days for content and 30 days for metadata.
The result is that GCHQ and NSA analysts have a vast pool of material to
look back on if they are not watching a particular person in real time -
just as you can use TV catch-up services to watch a programme you missed
when it was broadcast.
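As a rough illustration of the "buffer" idea, the rolling-window retention described above - three days for content, 30 days for metadata - can be sketched as a pair of time-bounded queues. This is purely a conceptual sketch; the record layout and class names here are invented, not taken from any GCHQ system.

```python
from collections import deque
from dataclasses import dataclass

DAY = 86_400  # seconds in a day

@dataclass
class Record:
    timestamp: float  # capture time, seconds since some epoch
    payload: str

class RollingBuffer:
    """A buffer that discards anything older than its retention window."""

    def __init__(self, retention_seconds: int):
        self.retention = retention_seconds
        self.records: deque[Record] = deque()

    def add(self, record: Record) -> None:
        self.records.append(record)

    def expire(self, now: float) -> None:
        # Drop records that have aged out of the window.
        while self.records and now - self.records[0].timestamp > self.retention:
            self.records.popleft()

# Separate windows, per the retention periods reported above.
content_buffer = RollingBuffer(3 * DAY)    # content: three days
metadata_buffer = RollingBuffer(30 * DAY)  # metadata: 30 days
```

Analysts "looking back" would then amount to querying whatever is still inside the window, rather than watching the live stream.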
How is it done?
GCHQ appears to have intercepts placed on most of the fibre-optic
communications cables in and out of the country. This seems to involve some
degree of co-operation - voluntary or otherwise - from companies operating
either the cables or the stations at which they come into the country.
These agreements, and the exact identities of the companies that have signed
up, are regarded as extremely sensitive, and classified as top secret. Staff
are instructed to be very careful about sharing information that could
reveal which companies are "special source" providers, for fear of
"high-level political fallout". In one document, the companies are described
as "intercept partners".
How does it operate?
The system seems to operate by allowing GCHQ to survey internet traffic
flowing through different cables at regular intervals, and then
automatically detecting which are most interesting, and harvesting the
information from those.
The documents suggest GCHQ was able to survey about 1,500 of the 1,600 or so
high-capacity cables in and out of the UK at any one time, and aspired to
harvest information from 400 or so at once - a quarter of all traffic.
As of last year, the agency had gone halfway, attaching probes to 200
fibre-optic cables, each with a capacity of 10 gigabits per second. In
theory, that gave GCHQ access to a flow of 21.6 petabytes in a day,
equivalent to 192 times the British Library's entire book collection.
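The arithmetic behind the 21.6 petabyte figure can be checked directly from the numbers quoted: 200 cables, each at 10 gigabits per second, running for a full day (using decimal units, 1 PB = 10^15 bytes).

```python
cables = 200
bits_per_second_per_cable = 10e9   # 10 gigabits per second
seconds_per_day = 86_400

bits_per_day = cables * bits_per_second_per_cable * seconds_per_day
bytes_per_day = bits_per_day / 8
petabytes_per_day = bytes_per_day / 1e15  # decimal petabytes

print(petabytes_per_day)  # 21.6
```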
GCHQ documents say efforts are made to automatically filter out UK-to-UK
communications, but it is unclear how this would be defined, or whether it
would even be possible in many cases.
For example, an email sent using Gmail or Yahoo from one UK citizen to
another would be very likely to travel through servers outside the UK.
Distinguishing these from communications between people in the UK and
outside would be a difficult task.
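The difficulty can be made concrete with a toy example. A filter that classifies traffic by the endpoints of the intercepted hop sees only the link it taps - so a UK-to-UK email routed through a foreign webmail server looks like foreign traffic. The function and values below are invented for illustration.

```python
def naive_is_domestic(src_country: str, dst_country: str) -> bool:
    """Classify a hop as UK-to-UK based only on its visible endpoints."""
    return src_country == "UK" and dst_country == "UK"

# A UK user emails another UK user via a webmail provider whose
# servers sit abroad: the hop the intercept sees is UK -> US.
hop = ("UK", "US")
print(naive_is_domestic(*hop))  # False - a domestic email looks foreign
```

Filtering correctly would require knowing where both human correspondents are, not just where the packets travel - which is exactly the information the intercept point lacks.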
What does this let GCHQ do?
GCHQ and NSA analysts, who share direct access to the system, are repeatedly
told they need a justification to look for information on targets in the
system and can't simply go on fishing trips - under the Human Rights Act,
searches must be necessary and proportionate. However, when they do search
the data, they have lots of specialist tools that let them obtain a huge
amount of information from it: details of email addresses, IP addresses, who
people communicate with, and what search terms they use.
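A selector-based search of this kind - querying the stored pool on a justified identifier rather than browsing freely - might look something like the sketch below. The record fields and matching logic are hypothetical, invented purely to illustrate the idea.

```python
# Toy pool of stored records; fields are illustrative only.
records = [
    {"src_ip": "203.0.113.5", "email": "target@example.com", "query": "flight times"},
    {"src_ip": "198.51.100.7", "email": "other@example.com", "query": "weather"},
]

def search(pool, selector_field, selector_value):
    """Return records matching a single justified selector."""
    return [r for r in pool if r.get(selector_field) == selector_value]

hits = search(records, "email", "target@example.com")
```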
What's the difference between content and metadata?
The simple analogy for content and metadata is that content is a letter, and
metadata is the envelope. However, internet metadata can reveal much more
than that: where you are, what you are searching for, and who you are
messaging.
One of the documents seen by the Guardian sets out how GCHQ defines metadata
in detail, noting that "we lean on legal and policy interpretations that are
not always intuitive". It notes that in an email, the "to", "from" and "cc"
fields are metadata, but the subject line is content. The document also sets
out how, in some circumstances, even passwords can be regarded as metadata.
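The document's email definition can be expressed as a simple partition of a message's fields: "to", "from" and "cc" fall on the metadata side, the subject line and body on the content side. The field names below follow standard email headers; the split itself is the document's definition, not a general legal rule.

```python
# Per the GCHQ document's definition for email.
METADATA_FIELDS = {"to", "from", "cc"}
CONTENT_FIELDS = {"subject", "body"}

def split_email(message: dict) -> tuple[dict, dict]:
    """Partition an email's fields into (metadata, content)."""
    metadata = {k: v for k, v in message.items() if k in METADATA_FIELDS}
    content = {k: v for k, v in message.items() if k in CONTENT_FIELDS}
    return metadata, content

msg = {"from": "alice@example.com", "to": "bob@example.com",
       "cc": "carol@example.com", "subject": "Meeting", "body": "See you at 3."}
metadata, content = split_email(msg)
```

Under this split, who emailed whom is retained on the looser metadata rules, while the subject line - often just as revealing - falls under the stricter content regime.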
The distinction is a very important one to GCHQ with regard to the law, the
document explains: "There are extremely stringent legal and policy
constraints on what we can do with content, but we are much freer in how we
can store metadata. Moreover, there is obviously a much higher volume of
content than metadata.
"For these reasons, metadata feeds will usually be unselected - we pull
everything we see; on the other hand, we generally only process content that
we have a good reason to target."