Friday, June 21, 2013

How does GCHQ's internet surveillance work?


Learn more about the system for collecting content and metadata, and what

GCHQ can do with it


    Ewen MacAskill, Julian Borger, Nick Hopkins, Nick Davies and James Ball


    The Guardian, Friday 21 June 2013 12.24 EDT       

What is an internet buffer?


In essence, an internet buffer is a little like Sky+, but on an almost

unimaginably large scale. GCHQ, assisted by the NSA, intercepts and collects

a large fraction of internet traffic coming into and out of the UK. This is

then filtered to get rid of uninteresting content, and what remains is

stored for a period of time - three days for content and 30 days for

metadata.


The result is that GCHQ and NSA analysts have a vast pool of material to

look back on if they are not watching a particular person in real time -

just as you can use TV catch-up services to watch a programme you hadn't

heard about.
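
The look-back idea can be illustrated with a toy rolling buffer. This is only a sketch of the concept, not a description of GCHQ's actual system: the class and method names are hypothetical, and it assumes the three-day window applies to content and the 30-day window to metadata.

```python
from datetime import datetime, timedelta

# Retention windows: three days and 30 days are the article's figures;
# applying the 30-day window to metadata is an assumption of this sketch.
RETENTION = {
    "content": timedelta(days=3),
    "metadata": timedelta(days=30),
}


class RollingBuffer:
    """Hypothetical buffer that keeps each item only for its retention window."""

    def __init__(self):
        self._items = []  # list of (captured_at, kind, payload) tuples

    def store(self, captured_at, kind, payload):
        if kind not in RETENTION:
            raise ValueError(f"unknown kind: {kind!r}")
        self._items.append((captured_at, kind, payload))

    def expire(self, now):
        # Drop every item that is older than the window for its kind.
        self._items = [
            (captured_at, kind, payload)
            for captured_at, kind, payload in self._items
            if now - captured_at <= RETENTION[kind]
        ]

    def look_back(self, now, kind):
        # Return whatever of the given kind is still held at time `now`.
        self.expire(now)
        return [payload for _, k, payload in self._items if k == kind]


captured = datetime(2013, 6, 21)
buffer = RollingBuffer()
buffer.store(captured, "content", "body of an email")
buffer.store(captured, "metadata", "sender -> recipient")

four_days_later = captured + timedelta(days=4)
print(buffer.look_back(four_days_later, "content"))   # content has expired
print(buffer.look_back(four_days_later, "metadata"))  # metadata is still held
```

The point of the sketch is simply that anything captured stays searchable after the fact until its window closes, whether or not anyone was watching when it arrived.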


How is it done?


GCHQ appears to have intercepts placed on most of the fibre-optic

communications cables in and out of the country. This seems to involve some

degree of co-operation - voluntary or otherwise - from companies operating

either the cables or the stations at which they come into the country.


These agreements, and the exact identities of the companies that have signed

up, are regarded as extremely sensitive, and classified as top secret. Staff

are instructed to be very careful about sharing information that could

reveal which companies are "special source" providers, for fear of

"high-level political fallout". In one document, the companies are described

as "intercept partners".


How does it operate?


The system seems to operate by allowing GCHQ to survey internet traffic

flowing through different cables at regular intervals, and then

automatically detecting which are most interesting, and harvesting the

information from those.


The documents suggest GCHQ was able to survey about 1,500 of the 1,600 or so

high-capacity cables in and out of the UK at any one time, and aspired to

harvest information from 400 or so at once - a quarter of all traffic.


As of last year, the agency had gone halfway, attaching probes to 200

fibre-optic cables, each with a capacity of 10 gigabits per second. In

theory, that gave GCHQ access to a flow of 21.6 petabytes in a day,

equivalent to 192 times the British Library's entire book collection.
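
As a rough check on those figures, the arithmetic can be reproduced in a few lines. The sketch below assumes decimal units (1 petabyte = 10^15 bytes) and that every probed cable runs at its full 10 gigabits per second all day, which is why the article says "in theory". The function name is illustrative only, and the implied size of the British Library collection is simply what the article's "192 times" comparison works out to, not an independently sourced number.

```python
# Figures taken from the article.
CABLES_PROBED = 200
GBITS_PER_SECOND_PER_CABLE = 10
LIBRARY_MULTIPLE = 192

SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_petabytes(cables, gbits_per_second):
    """Bytes carried in one day at full capacity, in decimal petabytes."""
    bits_per_day = cables * gbits_per_second * 10**9 * SECONDS_PER_DAY
    bytes_per_day = bits_per_day // 8
    return bytes_per_day / 10**15


petabytes = daily_volume_petabytes(CABLES_PROBED, GBITS_PER_SECOND_PER_CABLE)
print(petabytes)  # 21.6

# Size of the book collection implied by the "192 times" comparison.
implied_library_terabytes = petabytes * 1000 / LIBRARY_MULTIPLE
print(round(implied_library_terabytes, 1))  # 112.5

# Share of cables targeted: 400 of roughly 1,600.
print(400 / 1600)  # 0.25
```

Note that 400 of 1,600 cables is a quarter of the cables; treating that as "a quarter of all traffic" assumes each cable carries a similar load.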


GCHQ documents say efforts are made to automatically filter out UK-to-UK

communications, but it is unclear how this would be defined, or whether it

would even be possible in many cases.


For example, an email sent using Gmail or Yahoo from one UK citizen to

another would be very likely to travel through servers outside the UK.

Distinguishing these from communications between people in the UK and

outside would be a difficult task.


What does this let GCHQ do?


GCHQ and NSA analysts, who share direct access to the system, are repeatedly

told they need a justification to look for information on targets in the

system and can't simply go on fishing trips - under the Human Rights Act,

searches must be necessary and proportionate. However, when they do search

the data, they have lots of specialist tools that let them obtain a huge

amount of information from it: details of email addresses, IP addresses, who

people communicate with, and what search terms they use.


What's the difference between content and metadata?


The simple analogy for content and metadata is that content is a letter, and

metadata is the envelope. However, internet metadata can reveal much more

than that: where you are, what you are searching for, who you are messaging

and more.


One of the documents seen by the Guardian sets out how GCHQ defines metadata

in detail, noting that "we lean on legal and policy interpretations that are

not always intuitive". It notes that in an email, the "to", "from" and "cc"

fields are metadata, but the subject line is content. The document also sets

out how, in some circumstances, even passwords can be regarded as metadata.
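
To make the split concrete, here is a minimal sketch that sorts the parts of an email the way the document is reported to: "to", "from" and "cc" as metadata, the subject line as content. Treating the message body as content follows the letter-and-envelope analogy and is an assumption of this sketch, as are the function name and the sample addresses. Headers the article does not mention are left unclassified rather than guessed at.

```python
from email import message_from_string

# Classification as reported in the article.
METADATA_HEADERS = {"to", "from", "cc"}
CONTENT_HEADERS = {"subject"}


def split_email(raw_message):
    """Return (metadata, content) dictionaries for a plain-text email."""
    message = message_from_string(raw_message)
    metadata, content = {}, {}
    for name, value in message.items():
        key = name.lower()
        if key in METADATA_HEADERS:
            metadata[key] = value
        elif key in CONTENT_HEADERS:
            content[key] = value
        # Any other header is skipped: the article does not classify it.
    # Assumption: the body is content (the "letter" in the analogy).
    content["body"] = message.get_payload()
    return metadata, content


sample = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Cc: carol@example.org\n"
    "Subject: Lunch on Friday?\n"
    "\n"
    "See you at noon.\n"
)

metadata, content = split_email(sample)
print(metadata)
print(content)
```

Even in this tiny example the metadata alone shows who is talking to whom, which is why the envelope analogy understates what it can reveal.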


The distinction is a very important one to GCHQ with regard to the law, the

document explains: "There are extremely stringent legal and policy

constraints on what we can do with content, but we are much freer in how we

can store metadata. Moreover, there is obviously a much higher volume of

content than metadata.


"For these reasons, metadata feeds will usually be unselected - we pull

everything we see; on the other hand, we generally only process content that

we have a good reason to target."


