Thursday, April 11, 2013

What can we learn from the last 200 million things that happened in the world?

 


Posted by Joshua Keating, Wednesday, April 10, 2013 - 1:43 PM

http://ideas.foreignpolicy.com/posts/2013/04/10/what_can_we_learn_from_the_last_200_million_things_that_happened_in_the_world#.UWXAH_m2tuc.twitter

 

 

 

If you've been following the political science geek Twitter/blogosphere for

the last few days, you've probably come across the mysterious acronym GDELT.

The excitement over Global Data on Events, Location, and Tone -- to give its

full name -- is understandable. The singularly ambitious project could have

a transformative effect on how we use data to understand and anticipate

political events.

 

Essentially, GDELT is a massive list of important political events that have

happened -- more than 200 million and counting -- identified by who did what

to whom, when and where, drawn from news accounts and assembled entirely by

software. Everything from a riot over food prices in Khartoum, to a suicide

bombing in Sri Lanka, to a speech by the president of Paraguay goes into the

system.

 

Similar event databases have been built for particular regions, and DARPA

has been working along similar lines for the Pentagon with a project known

as ICEWS, but for a publicly accessible program (you can download it here

though you'll need some programming skills to use it) GDELT is unprecedented

in its geographic and historical scale. The database updates with new events

every night following the day's news and while it currently goes back to

1979, its developers are working on adding events going back as far as 1800

according to lead author Kalev Leetaru, a fellow at the University of

Illinois Graduate School of Library and Information Science. (I've

previously written about his work here.)

 

"It's the sheer size," says Leetaru, when asked what makes the project

unique. "And the resolution. It's not just saying an event took place in

Syria. It's saying who did what to whom. It will tell us that it was the

military who attacked Christian civilians in this city on this day. If the

article says it was worshippers who were attacked in their church, that will

all be captured."

 

Events are classified by four different types: material conflict, material

cooperation, verbal conflict, and verbal cooperation. Within those

categories, events are classified using a 300-category taxonomy system

called CAMEO, developed by Penn State's Philip A. Schrodt, to provide detail

on the actors and the action that occurred.

 

For instance, an event like "Students and police fought in the Egyptian

capital" will be coded as "EDU fought COP," and also include the location

and time when the event took place.
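
As a rough illustration of that "who did what to whom, when and where" record, here is a minimal Python sketch. It is not GDELT's actual file format or the real CAMEO codebook: the field names, the "fought" action label, and the date are assumptions made up for the example. Only the EDU and COP actor codes and the four event classes come from the article.

```python
from dataclasses import dataclass
from datetime import date

# The four top-level event classes named in the article.
QUAD_CLASSES = (
    "verbal cooperation",
    "material cooperation",
    "verbal conflict",
    "material conflict",
)


@dataclass(frozen=True)
class Event:
    """One coded event: who did what to whom, when and where."""

    source_actor: str  # e.g. "EDU" (students), from the article's example
    action: str        # illustrative label, not a real CAMEO action code
    target_actor: str  # e.g. "COP" (police), from the article's example
    quad_class: str    # must be one of QUAD_CLASSES
    location: str
    day: date

    def __post_init__(self):
        # Reject records whose class is not one of the four known types.
        if self.quad_class not in QUAD_CLASSES:
            raise ValueError(f"unknown event class: {self.quad_class!r}")

    def summary(self) -> str:
        return f"{self.source_actor} {self.action} {self.target_actor}"


# Hypothetical record for "Students and police fought in the Egyptian capital".
# The date is a placeholder, not taken from any real dataset.
event = Event("EDU", "fought", "COP", "material conflict", "Cairo", date(2013, 4, 10))
print(event.summary())  # EDU fought COP
```

A flat record like this is what makes a database of this size searchable: every event can be filtered by actor, class, place, or day without rereading the original news story.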

 

"We're already hard at work on a new version that will expand this

dramatically, adding everything from disease to different classes of

political transitions, and things like cyberwarfare," says Leetaru, noting

that new types of events have increased in importance in the decade since

the CAMEO system was developed. Geopolitically important financial events

may also soon be included.

 

So what can we do with all this? Well for one thing it could be an extremely

powerful tool for researchers looking to track political events over time,

and even predict them in the future. One early paper by Penn State Ph.D.

candidate James Yonamine uses GDELT data to track patterns of violence in

different districts of Afghanistan.

 

GDELT could also be used to study political rhetoric, for instance the kind

of statements that politicians make in the run-up to war in order to prime

their citizens for conflict.

 

Leetaru sees even broader applications for researchers using a branch of

mathematics known as complexity theory (closely related to chaos theory) to

identify global patterns in seemingly random human events. "Most datasets

that measure human society, when you plot them out, don't follow these nice

beautiful curves," he says. "They're very noisy because they reflect reality.

So mathematical techniques now let us peer through that to say, what are the

underlying patterns we see in all this."

 

Of course, for all the high-tech software behind its creation and its

potentially far-out applications, GDELT is, at its core, a way of

summarizing news coverage, and old-fashioned legacy-media news coverage at

that. The sources used to identify events include world news coverage from

Agence France-Presse, the AP, BBC, Christian Science Monitor, New York Times,

UPI, and the Washington Post, as well as a few more specialized outlets and

Google News. Leetaru notes in his recent paper introducing the project that

the increasing availability of news on the web has led to a "dramatic

increase [of recorded events] since the beginning of the 21st century."

 

This leads to another potential problem: that the frequency of recorded

events in a given region will track how often the international media cover

them more closely than how often they actually occur.

A politically motivated shooting in Syria or the West Bank this month will

probably be recorded. In a rural region of the Congo or Central African

Republic? It's harder to say. 
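
A toy calculation makes the coverage problem concrete. This is only a sketch under stated assumptions: the `coverage_rate` parameter and all of the numbers are hypothetical, and nothing here reflects how GDELT itself processes or corrects its data.

```python
def recorded_events(true_events: int, coverage_rate: float) -> float:
    """Expected number of real events that end up in a news-based database.

    coverage_rate is an assumed share of real events that the international
    press reports; it is a made-up parameter for this illustration.
    """
    if not 0.0 <= coverage_rate <= 1.0:
        raise ValueError("coverage_rate must be between 0 and 1")
    return true_events * coverage_rate


# Hypothetical numbers: both regions have the same number of real events,
# but one gets far more press attention than the other.
heavily_covered = recorded_events(true_events=100, coverage_rate=0.9)
lightly_covered = recorded_events(true_events=100, coverage_rate=0.1)

print(heavily_covered, lightly_covered)
```

With identical underlying events, the heavily covered region appears far more eventful in the database, so a difference in recorded counts alone cannot say whether events or press attention differ.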

 

Leetaru says he's looking into supplementing some of the journalistic data

with information from social-media-driven projects like Ushahidi. "As

quality journalism is under attack from all sectors, whether that's

government stepping up efforts to squelch it or the collapsing economics of

it, we're starting to look at all the citizen journalism that's out there,"

he says. "One of the reasons we're focusing on mainstream journalism is that

social media is a relatively new phenomenon."

 

Leetaru notes that he's been cautious about integrating social media data because

of difficulties with quality control and verification. "Journalism's not

perfect either, but at least there's that professional code of ethic," he

says.

 

So whether or not data kills theory in the social sciences, someone still

needs to get the information in the first place.

 


 

 

 

 

 

 
