Big Data

Tracey P. Lauriault

@TraceyLauriault

[email protected]

http://www.maynoothuniversity.ie/progcity/

NIRSA, Maynooth University

GY610 Mapping, GIS and Critical Spatial Data

Week 10: Session 10: Thursday 2.00 – 4.00 pm

Geography Library; Physical Geography Lab

Plan

1. 4 readings

2. Brainstorm and discuss commonalities and outliers

3. Brainstorm & discuss each paper – definitions, concepts, ideas, conclusions, concerns, dislikes, new ideas...

4. Look at some maps & discuss

5. Do a big data assessment exercise based on Kitchin’s big data definition

6. Introduction to the Programmable City Project

Readings:

Mark Graham and Taylor Shelton, 2013, Geography and the future of big data, big data and the future of geography, Dialogues in Human Geography 3:255, available at http://dhg.sagepub.com/content/3/3/255 (5 pages)

Rob Kitchin and Tracey P. Lauriault, 2014, Small data in the era of big data, GeoJournal, available at http://link.springer.com/article/10.1007%2Fs10708-014-9601-7 (12 pages)

Harvey J. Miller and Michael F. Goodchild, 2014, Data-driven geography, GeoJournal, available at http://link.springer.com/article/10.1007%2Fs10708-014-9602-6 (12 pages)

Emma Uprichard, Roger Burrows and Simon Parker, 2009, Geodemographic code and the production of space, Environment and Planning A, Vol. 41:2823-2835, available at http://www.envplan.com/abstract.cgi?id=a41116 (11 pages)

Commonalities

• Black boxed algorithms

• Predictive governance / predictive categories / pre-crime / technological agency / data dictatorships / anticipatory governance / post-hegemonic power – algorithmic!

• Digital ghettoization or balkanization / data-rich areas / samples / sorting

• Control & Power & humans matter

3 of the 4 papers mentioned these documents:

• http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory (2008): “There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?”

• http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf (2001)

All 4 papers include one or the other of these:

• http://dhg.sagepub.com/content/3/3/262.abstract (2013)

• http://mitpress.mit.edu/books/codespace (2011)

1. Graham & Shelton, 2013, Geography and the future of big data, big data and the future of geography

Big Data Characteristics

• Volume

• Velocity

• Variety

• Transactional?

• Effects they engender?

• Computational paradigm

• Meme – establishment of truth

Big Data View of Authors

• Discourses, objects, practices

• Views of the world: measuring, models, algorithms, info systems...

• Scientistic, positivistic and quantitative turn

• Data as facts, validity and objective truth

• End of theory?

Actors

• Technologists

• Journalists

• Venture capitalists

• Private sector

• Geographers?

Concepts

• Data shadows

• Data and algorithmic governance

• Computational approaches

• Augmented space

• Behavioural profiles

• Privacy

• Metadata

• Predictive categories

• Triangulation

• Neutrality of databases and algorithms

• Black box algorithms

• Obfuscation and refraction

• Amplified socio-spatial unevenness

• Data as depoliticizing tool

• Digital ghettoization or balkanization

• Openness, trust, transparency

Graham & Shelton, 2013

Conclusion

• Exposed the promises and perils of big data

• Demonstrated the discursive power of big data as a meme

• Opportunity to use big data for social justice, inequality, and relationship with the environment

• But, unevenness of representation, limited opportunities for participation, barriers to research, opaqueness, governance issues and privacy are a concern

• Who is big data serving?

2. Kitchin & Lauriault, 2014, Small data in the era of big data

Growth

• Development of technology, infrastructure, techniques and processes

• Embedded into everyday business, social practices & spaces

• Embedded into mobile devices, objects, machines, and systems that are networked

• Social media, online interactions, transactions, data analytics

Objects

• Traffic systems & web cams

• BIMS

• Surveillance & policing systems, biometrics

• Government databases

• Customer, production & logistic chains

• Data enabled & data producing infrastructures

• Finance & payment systems

• Locative & social media

• Algorithmically controlled cameras, sensors, scanners,

• smart phones,

• clickstreams,

• By-products of networked systems

• Derived data products

Infrastructure

• Catalogues, portals, directories and repositories, archives

• Cyberinfrastructure – SDI

• standards, protocols and policies

• Assemblage

Concepts

• Small data

• Data rich areas

• Big data analytics

• Ontological characteristics

• Data brokers

• Dataveillance

• Social sorting

• Control creep

• Anticipatory governance

• Augmented

• Monitored

• Regulated

• Assemblage

• Socio-technical systems

• Volunteered or crowdsourced

• Oligoptic view of the world vs God's-eye view

• Openly expressed data – swipe cards, sensors

• Exhaust – by-products

• Ecological fallacies

• Gamed data

• Curated image of the self

• Streams of data, garden hose, spritzers, white list

• Data storage vs archiving

• Data brokers

• Abductive, deductive, inductive

• Geodemographic segmentation

• Black boxed algorithms

• Data determinism

Kitchin & Lauriault, 2014

Issues

• Big data become more important than small data

• “Small data mine gold from working on a narrow seam, whereas big data studies seek to extract nuggets through open pit mining”

• Data quality, fidelity, lineage, objective, authenticity, reliability – big data are so large that these no longer matter

• Inexactitude

• Open vs closed

• Replication & validation

• Combining big data with small data

• Data free from theory

• Lack of hypothesis

• Data driven science

• Weak surface analysis vs deep penetrating insight

• Stigmatization and redlining

• Informed consent

Big data are shaped by:

• Field of view / sampling, location of devices, settings/parameters, users

• Technology / platform used – produce variance and bias

• Context w/in which generated

• Data ontology

• Regulatory environment

• They capture what is easy to ensnare

Data Analytics

• Struggle with social & context

• Create bigger haystacks

• Do not address big issues well

• Favours memes over masterpieces

• Obscures values

Kitchin & Lauriault

• Conclusions

• Small data will continue to be vital, big and small data will be complementary, small data are the baseline

• Data infrastructures store and disseminate small data

• Scaling, linking, joining, combining big and small data

• Small data are exposed to epistemologies of data science (e.g., digital humanities)

• Small data combined with big data are influencing the growth of data brokers and profiling

• Pernicious effects of combining: dataveillance, social sorting, control creep and anticipatory governance impinge on privacy and social freedom, and have structural consequences for people's lives

Comparing Small & Big Data

Characteristics           | Small Data                      | Big Data          | Attributes of Big Data
Volume                    | Limited to large                | Very large        | Terabytes and petabytes
Exhaustivity              | Samples                         | Entire population | In scope striving toward entire population and systems (n = all)
Resolution & indexicality | Coarse & weak to tight & strong | Tight & strong    | As detailed as possible and uniquely indexical in identification
Relationality             | Weak to strong                  | Strong            | Common fields to enable co-joining of datasets
Velocity                  | Slow, freeze-framed             | Fast              | Real & near-real time
Variety                   | Limited to wide                 | Wide              | Diverse in type, structured and unstructured, maybe temporally and spatially referenced
Flexible & scalable       | Low to middling                 | High              | Can easily add to and extend, can expand in size

Table compiled by Kitchin from:

Boyd & Crawford 2012, Dodge & Kitchin 2005, Marz & Warren 2012, Mayer-Schonberger & Cukier 2013

3. Miller & Goodchild, 2014, Data-driven geography

Big Data Characteristics

• Volume

• Velocity

• Veracity

Data capturing technologies

• Ground-based sensors

• Software

• Location aware tech

• GPS

• Mobile phones

• Surveillance cameras

• In situ sensors – cars, phones, in infrastructure

• Remote sensors – airborne and satellite platforms

• Radiofrequency

• RFID

• Georeferenced social media & crowdsourcing

Definition

• Predictions are made by mining data for patterns, with correlation among new data sources and some accurate predictions

4 paradigms in science

1. Empirical science

2. Theoretical science

3. Computational science

4. Data driven science – big data

Tensions

1. Theory driven vs data driven

2. Prediction vs discovery

3. Law seeking vs description seeking

4. Evolution vs revolution

5. From question to sample – from sample to question

Issues:

1. Population not samples

2. Messy not clean

3. Correlation not causation
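
The "correlation not causation" issue, together with the haystack and spuriousness concerns raised elsewhere in these notes, can be illustrated with a small simulation. The sketch below is not taken from any of the readings; the function names, the seed, and the choice of 20 observations are assumptions made for illustration. It generates a random target series and a growing number of unrelated random series, then reports the strongest correlation found purely by chance.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_variables, n_observations, seed=0):
    """Largest |r| between a random target and n_variables unrelated series.

    Everything here is random noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    target = [rng.gauss(0, 1) for _ in range(n_observations)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_observations)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    for n_variables in (10, 100, 1000):
        r = strongest_chance_correlation(n_variables, n_observations=20)
        print(f"{n_variables:>5} unrelated variables: strongest |r| = {r:.2f}")
```

Because the seed is fixed, each larger run re-examines the same candidate series as the smaller one and then adds more, so the strongest chance correlation can only stay the same or grow as variables are added. This is one way to read the warning that data mining without theory creates bigger haystacks.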

4 capabilities of abductive reasoning

1. Ability to posit fragments of theory

2. Massive set of knowledge, from common sense to domain expertise

3. Means to search to find connections, patterns and potential explanations

4. Complex problem solving – analogy, approximation and guessing

5. Background knowledge and interesting measures, formalized knowledge

Miller & Goodchild, 2014

Big questions

• Are theory and explanation archaic?

• Does data velocity matter?

• Can lack of QC & rigorous sampling be overcome?

• Can we make valid generalizations from serendipitous data collection?

• Can big-data, data-driven methods lead to significant discoveries?

• Or will we continue to rely on scarce data (small data)?

Sections

1. Theory in data-driven geography

• Correlation supersedes causation; explanation but not laws; mid-range theories; general propositions; long term, big space vs short term, small space; nomothetic vs idiographic

2. Approaches to data-driven geography

• Knowledge discovery, data exploration and hypothesis generating; abductive, deductive and inductive reasoning

• Data-driven modelling – general to specific vs specific to general, predictive performance

• Theory may not be possible, data drive the form of the model, complexity, de-skilling

3. Caution with data-driven geography

• Formalizing geographic knowledge, spuriousness, truth and understanding, black boxed algorithms, privacy, pre-crime, pre-punishment, data-driven dictatorship

Benefits

• Spatial temporal dynamics vs snapshots at multiple scales

• Mundane & unplanned phenomena captured

• Probable and inconsequential

• Improbable but consequential

Miller & Goodchild

Conclusion

• Most fundamental changes are variety and velocity in data

• Old issues in new clothes – volume, n, messy data, idiographic vs nomothetic knowledge

• Big data can inform both geographic knowledge discovery and spatial modelling – but need to formalize geographic knowledge to clean data and ignore spurious patterns, and to build true and understandable models

• Blackbox of closed systems

• Caution on social implications – predictive governance, avoid data dictatorships and humans need to be part of the decision making process

4. Uprichard, Burrows & Parker, 2009, Geodemographic code and the production of space

Geodemographic classifications:

• The spaces people occupy say something about the sort of people that live there

• Classes are sets of practices

• Inscriptions

• Embedded in social action and power

• Socially produced

• Have some social meaning about the subjects, esp. name, useful

• Combines national censuses and other data, admin & commercial

• Data used are already pre-classed – contingent, historical, political and cognitive

• Use of statistical knowledge

• credibility

Tools:

• PRIZM

• Acorn

• Mosaic

Concepts

• Social spatial vectors / forms

• Code/space

• Geodemographics as code

• Coded space

• Technological agency

• Algorithmic power

• Technological unconscious

• Automatic production of space

• Software sorted geographies

• Ground truth

• Urban ecology – socio spatial structure

• Ecological determinants

• Clusters, types of spaces, sorting

• Complexity, contingency, contrivance & desirability

• Making hold and being held

• Coded classifications

• Mechanics of method

• Production of reality/space

• Ontological properties of the world

• Self-organizing, Fractal

• Dynamic interaction

• Post-hegemonic power – algorithmic!

• Translation and transduction of space

Uprichard, Burrows & Parker, 2009

Big Questions:

• How code is instantiated, materialised and constructed via code/space

• Reiterative, transformative or recursive practices of technology

• How is the code that constructs coded spaces constructed?

• Problematize the contingency in producing spaces on coded classifications

• Who is constructing the code for whom?

• Material outcomes of code

Issues

• Making coded space

• Which one becomes useful?

• Who decides what is and is not useful?

• Political, and ethical concerns

• Social shaping

• Entrenchment of categories – normalization

• Intrinsic or natural kinds?

• Circularity of measurement

Uprichard, Burrows & Parker

• Conclusion

• If post-hegemonic power is algorithmic, and if algorithms are fundamental to the transduction of space, then we need to rethink the analysis of the production of space so that the cultural, social, political and technical construction of code becomes a fundamental part of that process

Exercise

Characteristics           | Small Data                      | Big Data       | Census                           | Sensors / Remote Sensing / Social Media / Other
Volume                    | Limited to large                | Very large     | Very large                       | •, •
Exhaustivity              | Samples                         | Entire pop.    | All                              | •, crucial, •
Resolution & indexicality | Coarse & weak to tight & strong | Tight & strong | Individual ID                    | •, ?
Relationality             | Weak to strong                  | Strong         | Name, address                    | •, ?
Velocity                  | Slow, freeze-framed             | Fast           | Decennial, quinquennial          | X, crucial
Variety                   | Limited to wide                 | Wide           | Questions                        | X, one stream, X
Flexible & scalable       | Low to middling                 | High           | Hard to change, fields fixed time | X, ?
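
One way to record an assessment like the one in the exercise is sketched below. This is only an illustration, not Kitchin's method: the `Assessment` class, the True/False/None rating scheme, and the simple count are all hypothetical devices. The census ratings are one possible reading of the Census column above, with variety left undecided because the table entry ("Questions") does not settle it.

```python
from dataclasses import dataclass

# The seven characteristics used in the comparison and exercise tables.
CHARACTERISTICS = (
    "volume",
    "exhaustivity",
    "resolution_indexicality",
    "relationality",
    "velocity",
    "variety",
    "flexibility_scalability",
)


@dataclass
class Assessment:
    """Hypothetical record of how one dataset rates on each characteristic.

    Each rating is True (big-data-like), False (small-data-like),
    or None (undecided).
    """
    name: str
    ratings: dict

    def score(self):
        """Number of characteristics rated as big-data-like."""
        return sum(1 for c in CHARACTERISTICS if self.ratings.get(c) is True)

    def undecided(self):
        """Characteristics that have not been rated either way."""
        return [c for c in CHARACTERISTICS if self.ratings.get(c) is None]

    def summary(self):
        return (
            f"{self.name}: {self.score()} of {len(CHARACTERISTICS)} "
            f"characteristics big-data-like, {len(self.undecided())} undecided"
        )


# Assumed reading of the Census column, for illustration only.
census = Assessment(
    name="Census",
    ratings={
        "volume": True,                    # very large
        "exhaustivity": True,              # covers all of the population
        "resolution_indexicality": True,   # individual ID
        "relationality": True,             # name, address
        "velocity": False,                 # decennial or quinquennial
        "variety": None,                   # left open for discussion
        "flexibility_scalability": False,  # hard to change
    },
)

if __name__ == "__main__":
    print(census.summary())
```

The same structure could be filled in for sensors, remote sensing and social media during the discussion. A count like this only summarises the ratings; it does not decide whether a dataset "is" big data.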

The Programmable City

• Funded by the European Research Council (ERC) and Science Foundation Ireland (SFI)

• SH3: Environment and Society

• Led by Dr Rob Kitchin, the Principal Investigator

• Based at the National Institute for Regional and Spatial Analysis (NIRSA)

• At the National University of Ireland Maynooth (NUIM)

MIT Press 2011 Sage 2014

The aim of the ERC project is to build on and extend a decade of work that culminated in the Code/Space book (MIT Press), with a set of detailed empirical studies

Objective

• To provide “an interdisciplinary analysis of the two core inter-related aspects of the emerging programmable city:

• (a) Translation: how cities are translated into code, and

• (b) Transduction: how code reshapes city life” (Kitchin 2011).

Objectives

• How is the city translated into software and data?

• How do software and data reshape the city?

[Diagram: THE CITY and CODE & DATA, linked in both directions. Translation: city into code & data (discourses, practices, knowledge, models). Transduction: code & data reshape city (mediation, augmentation, facilitation, regulation).]

ProgCity Research Matrix

                                     | Translation: City into code | Transduction: Code reshapes city
Understanding the city (Knowledge)   | How are digital data materially and discursively supported and processed about cities and their citizens? | How does software drive public policy development and implementation?
Managing the city (Governance)       | How are discourses and practices of city governance translated into code? | How is software used to regulate and govern city life?
Working in the city (Production)     | How is the geography and political economy of software production organised? | How does software alter the form and nature of work?
Living in the city (Social Politics) | How is software discursively produced and legitimated by vested interests? | How does software transform the spatiality and spatial behaviour of individuals?

Kitchin’s Data Assemblage

Attributes                      | Elements
Systems of thought              | Modes of thinking, philosophies, theories, models, ideologies, rationalities, etc.
Forms of knowledge              | Research texts, manuals, magazines, websites, experience, word of mouth, chat forums, etc.
Finance                         | Business models, investment, venture capital, grants, philanthropy, profit, etc.
Political economy               | Policy, tax regimes, public and political opinion, ethical considerations, etc.
Governmentalities / Legalities  | Data standards, file formats, system requirements, protocols, regulations, laws, licensing, intellectual property regimes, etc.
Materialities & infrastructures | Paper/pens, computers, digital devices, sensors, scanners, databases, networks, servers, etc.
Practices                       | Techniques, ways of doing, learned behaviours, scientific conventions, etc.
Organisations & institutions    | Archives, corporations, consultants, manufacturers, retailers, government agencies, universities, conferences, clubs and societies, committees and boards, communities of practice, etc.
Subjectivities & communities    | Of data producers, curators, managers, analysts, scientists, politicians, users, citizens, etc.
Places                          | Labs, offices, field sites, data centres, server farms, business parks, etc., and their agglomerations
Marketplace                     | For data, its derivatives (e.g., text, tables, graphs, maps), analysts, analytic software, interpretations, etc.

Locations

• Dublin (Primary City)

• Boston (Secondary City)

• Ottawa/Montreal (Open Data Case Studies)

The Dublin Dashboard includes:

• real-time information

• time-series indicator data

• & interactive maps about all aspects of the city

Benefits:

• Detailed, up-to-date intelligence about the city that aids everyday decision making and fosters evidence-informed analysis

Freely available data sources:

• Dublin City Council

• Dublinked

• Central Statistics Office

• Eurostat

• government departments

• links to a variety of existing applications

Produced by:

• The Programmable City project

• All-Island Research Observatory (AIRO) at Maynooth University

• working with Dublin City Council

Funded by:

• the European Research Council (ERC)

• Science Foundation Ireland (SFI)
