Big Data
Tracey P. Lauriault
@TraceyLauriault
http://www.maynoothuniversity.ie/progcity/
NIRSA, Maynooth University
GY610 Mapping, GIS and Critical Spatial Data
Week 10: Session 10: Thursday 2.00 – 4.00 pm
Geography Library; Physical Geography Lab
Big Data
Plan
1. 4 readings
2. Brainstorm and discuss commonalities and outliers
3. Brainstorm & discuss each paper – definitions, concepts, ideas, conclusions, concerns, dislikes, new ideas...
4. Look at some maps & discuss
5. Do a big data assessment exercise based on Kitchin’s big data definition
6. Introduction to the Programmable City Project
Readings:
Mark Graham and Taylor Shelton, 2013, Geography and the future of big data, big data
and the future of geography, Dialogues in Human Geography 3:255, available at
http://dhg.sagepub.com/content/3/3/255 (5 pages)
Rob Kitchin and Tracey P. Lauriault, 2014, Small data in the era of big data, GeoJournal,
available at http://link.springer.com/article/10.1007%2Fs10708-014-9601-7 (12 pages)
Harvey J. Miller and Michael F. Goodchild, 2014, Data-driven geography, GeoJournal,
available at http://link.springer.com/article/10.1007%2Fs10708-014-9602-6 (12 pages)
Emma Uprichard, Roger Burrows and Simon Parker, 2009, Geodemographic code and the
production of space, Environment and Planning A, Vol. 41:2823-2835, available at
http://www.envplan.com/abstract.cgi?id=a41116 (11 pages)
• Black boxed algorithms
• Predictive governance / predictive categories / pre-crime / technological agency / data dictatorships / anticipatory governance / post-hegemonic power – algorithmic!
• Digital ghettoization or balkanization / data rich areas / samples / sorting
• Control & Power & humans matter
3 of the 4 papers mentioned these documents:
• http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
“There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?” (2008)
• http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf (2001)
All 4 papers include one or the other of these:
• http://dhg.sagepub.com/content/3/3/262.abstract (2013)
• http://mitpress.mit.edu/books/codespace (2011)
1. Graham & Shelton, 2013, Geography and the
future of big data, big data and the future of
geography
• Big Data Characteristics • Volume
• Velocity
• Variety
• Transactional?
• Effects they engender?
• Computational paradigm
• Meme – establishment of truth
• Big Data View of Authors • Discourses, objects, practices
• Views of the world • Measuring, models, algorithms, info systems...
• Scientistic, positivistic and quantitative turn
• Data as facts, validity and objective truth
• End of theory?
• Actors • Technologists
• Journalists
• Venture capitalists
• Private sector
• Geographers?
• Concepts • Data shadows
• Data and algorithmic governance
• Computational approaches
• Augmented space
• Behavioural profiles
• Privacy
• Metadata
• Predictive categories
• Triangulation
• Neutrality of databases and algorithms
• Black box algorithms
• Obfuscation and refraction
• Amplified socio-spatial unevenness
• Data as depoliticizing tool
• Digital ghettoization or balkanization
• Openness, trust, transparency
Graham & Shelton, 2013
• Conclusion • Exposed the promises and perils of big data
• demonstrated the discursive power of big data as a meme
• Opportunity to use big data for social justice, inequality, and relationship with the environment
• But, unevenness of representation, limited opportunities for participation, barriers to research, opaqueness, governance issues and privacy are a concern
• Who is big data serving?
2. Kitchin & Lauriault, 2014, Small data in the era
of big data
• Growth • Development of technology, infrastructure, techniques & processes,
• embedded into everyday business, social practices & spaces,
• embedded into mobile devices, objects, machines, and systems that are networked,
• social media, online interactions, transactions, data analytics
• Objects • Traffic systems & web cams
• BIMS
• Surveillance & policing systems, biometrics
• Gov. Dbases
• Customer, production & logistic chains
• Data enabled & data producing infrastructures
• Finance & payment systems
• Locative & social media
• Algorithmically controlled cameras, sensors, scanners,
• smart phones,
• clickstreams,
• by-product of networks systems
• Derived data products
• Infrastructure • Catalogues, portals, directories and repositories,
archives
• Cyberinfrastructure – SDI
• standards, protocols and policies
• Assemblage
• Concepts • Small data
• Data rich areas
• Big data analytics
• Ontological characteristics
• Data brokers
• Dataveillance
• Social sorting
• Control creep
• Anticipatory governance
• Augmented
• Monitored
• Regulated
• Assemblage
• Socio-technical systems
• Volunteered or crowdsourced
• Oligoptic view of the world vs god's-eye view
• Openly expressed data – swipe cards, sensors
• Exhaust – by products
• Ecological fallacies
• Gamed data
• Curated image of the self
• Streams of data, garden hose, spritzers, white list
• Data storage vs archiving
• Data brokers
• Abductive, deductive, inductive
• Geodemographic segmentation
• Black boxed algorithms
• Data determinism
Kitchin & Lauriault, 2014
• Issues • Big data become more important than small data
• “Small data mine gold from working on a narrow seam, whereas big data studies seek to extract nuggets through open pit mining”
• Data quality, fidelity, lineage, objective, authenticity, reliability – big data are so large that these no longer matter
• Inexactitude
• Open vs closed
• Replication & validation
• Combining big data with small data
• Data free from theory
• Lack of hypothesis
• Data driven science
• Weak surface analysis vs deep penetrating insight
• Stigmatization and redlining
• Informed consent
• Big data are shaped by: • Field of view / sampling, location of devices, settings/parameters, users
• Technology / platform used – produce variance and bias
• Context w/in which generated
• Data ontology
• Regulatory environment
• They capture what is easy to ensnare
• Data Analytics • Struggle with social & context
• Create bigger haystacks
• Do not address big issues well
• Favours memes over masterpieces
• Obscures values
Kitchin & Lauriault
• Conclusions
• Small data will continue to be vital, big and small data will be complementary, small data are the baseline
• Data infrastructures store and disseminate small data
• Scaling, linking, joining, combining big and small data
• Small data are exposed to epistemologies of data science (e.g., digital humanities)
• Small data combined with big data are influencing the growth of data brokers and profiling
• Pernicious effects of combining: dataveillance, social sorting, control creep and anticipatory governance impinge on privacy and social freedom and have structural consequences for people's lives
Comparing Small & Big Data
Characteristics | Small Data | Big Data | Attributes of Big Data
Volume | Limited to large | Very large | Terabytes and petabytes
Exhaustivity | Samples | Entire population | In scope, striving toward entire population and systems (n=all)
Resolution & indexicality | Coarse & weak to tight & strong | Tight & strong | As detailed as possible and uniquely indexical in identification
Relationality | Weak to strong | Strong | Common fields to enable conjoining of datasets
Velocity | Slow, freeze-framed | Fast | Real & near-real time
Variety | Limited to wide | Wide | Diverse in type, structured and unstructured, may be temporally and spatially referenced
Flexible & Scalable | Low to middling | High | Can easily add to and extend, can expand in size
Table compiled by Kitchin from:
Boyd & Crawford 2012, Dodge & Kitchin 2005, Marz & Warren 2012, Mayer-Schonberger & Cukier 2013
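Relationality in the comparison refers to common fields that allow datasets to be conjoined. A minimal Python sketch of that idea follows. The two record sets, the area_id field and all values are made up for illustration and do not come from the readings; join_on is a hypothetical helper.

```python
# Two hypothetical datasets that share a common field ("area_id").
census_rows = [
    {"area_id": "A1", "population": 1200},
    {"area_id": "A2", "population": 800},
]
sensor_rows = [
    {"area_id": "A1", "footfall": 5400},
    {"area_id": "A3", "footfall": 300},
]


def join_on(left, right, key):
    """Inner join: keep only records whose key value appears in both datasets."""
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            # Merge the two records into one combined record.
            joined.append({**row, **match})
    return joined


linked = join_on(census_rows, sensor_rows, "area_id")
print(linked)
```

Only area A1 appears in both sets, so only A1 is linked. Without a shared field (weak relationality) no such combination is possible, which is one reason the table treats strong relationality as a defining attribute of big data.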
3. Miller & Goodchild, 2014, Data-driven geography
• Big Data Characteristics • Volume
• Velocity
• Veracity
• Data capturing technologies • Sensors ground based
• Software
• Location aware tech
• GPS
• Mobile phones
• Surveillance cameras
• In situ sensors – cars, phones, in infrastructure
• Remote sensors – airborne and satellite platforms
• Radiofrequency
• RFID
• Georeference social media & crowdsourcing
• Def: • Predictions are made by mining data for patterns
w/correlation among new data sources and some accurate predictions
4 paradigms in science
1. Empirical science
2. Theoretical science
3. Computational science
4. Data driven science – big data
Tensions
1. Theory driven vs data driven
2. Prediction vs discovery
3. Law seeking vs description seeking
4. Evolution vs revolution
5. From question to sample – from sample to question
Issues:
1. Population not samples
2. Messy not clean
3. Correlation not causation
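The "correlation not causation" issue can be illustrated with a small Python sketch. All numbers are invented for illustration: two series that both depend on a third factor (temperature) correlate strongly with each other, although neither causes the other. pearson is a hand-written helper, not a library call.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical common driver.
temperature = [10, 14, 18, 22, 26, 30]

# Two made-up series, each driven by temperature plus a small fixed offset.
ice_cream_sales = [3 * t + n for t, n in zip(temperature, [1, -2, 0, 2, -1, 0])]
swimming_incidents = [0.5 * t + n for t, n in zip(temperature, [0.5, 0, -0.5, 0.5, 0, -0.5])]

r = pearson(ice_cream_sales, swimming_incidents)
print(f"correlation between the two series: {r:.3f}")
```

A pattern-mining approach would flag this pair as strongly related. Explaining why requires knowledge of the common driver, which is the kind of theory or domain knowledge the data alone do not supply.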
4 capabilities of abductive reasoning
1. Ability to posit fragments of theory
2. Massive set of knowledge, from common sense to domain expertise
3. Means to search for connections, patterns and potential explanations
4. Complex problem solving – analogy, approximation and guessing
5. Background knowledge and interesting measures, formalized knowledge
Miller & Goodchild, 2014
• Big questions • Are theory and explanation archaic?
• Does data velocity matter?
• Can lack of QC & rigorous sampling be overcome?
• Can we make valid generalizations from serendipitous data collection?
• Can big data data-driven methods lead to significant discoveries?
• Or will we continue to rely on scarce data (small data)?
Sections
1. Theory in data driven geo
• Correlation supersedes causation, explanation but not laws; mid-range theories, general propositions, long term big space vs short term small space, nomothetic vs idiographic
2. Approaches to data driven geo
• Knowledge discovery, data exploration and hypothesis generating, abductive, deductive and inductive reasoning
• Data-driven modelling – general to specific vs specific to general, predictive performance
• Theory may not be possible, data drive the form of the model, complexity, de-skilling
3. Caution with data driven
• Formalizing geo kn, spuriousness, truth and understanding, black boxed algorithms, privacy, pre-crime, pre-punishment, data-driven dictatorship
Benefits
• Spatial temporal dynamics vs snapshots at multiple scales
• Mundane & unplanned phenomena captured
• Probable and inconsequential
• Improbable but consequential
Miller & Goodchild
• Conclusion • Most fundamental changes are variety and velocity in data
• Old issues in new clothes – volume, n, messy data, idiographic vs nomothetic kn
• Big data can inform both geographic kn discovery and spatial modelling – but need to formalize geog kn to clean data and ignore spurious patterns, and to build true and understandable models
• Blackbox of closed systems
• Caution on social implications – predictive governance, avoid data dictatorships and humans need to be part of the decision making process
4. Uprichard, Burrows & Parker, 2009,
Geodemographic code and the production of space
• Geodemographic classifications: • The spaces people occupy say something about the sort of people that live there
• Classes are sets of practices
• Inscriptions
• Embedded in social action and power
• Socially produced
• Have some social meaning about the subjects, esp. name, useful
• Combines national censuses and other data, admin & commercial
• Data used are already pre-classed – contingent, historical, political and cognitive
• Use of statistical knowledge
• credibility
• Tools: • PRIZM
• Acorn
• Mosaic
• Concepts • Social spatial vectors / forms
• Code/space
• Geodemographics as code
• Coded space
• Technological agency
• Algorithmic power
• Technological unconscious
• Automatic production of space
• Software sorted geographies
• Ground truth
• Urban ecology – socio spatial structure
• Ecological determinants
• Clusters, types of spaces, sorting
• Complexity, contingency, contrivance & desirability
• Making hold and being held
• Coded classifications
• Mechanics of method
• Production of reality/space
• Ontological properties of the world
• Self-organizing, Fractal
• Dynamic interaction
• Post-hegemonic power – algorithmic!
• Translation and transduction of space
Uprichard, Burrows & Parker, 2009
• Big Questions: • How code is instantiated, materialised and constructed via code/space
• Reiterative, transformative or recursive practices of technology
• How are the codes that construct coded spaces constructed?
• Problematize the contingency in producing spaces on coded classifications
• Who is constructing the code for whom?
• Material outcomes of code
• Issues • Making coded space
• Which one becomes useful?
• Who decides what is and is not useful?
• Political, and ethical concerns
• Social shaping
• Entrenchment of categories – normalization
• Intrinsic or natural kinds?
• Circularity of measurement
Uprichard, Burrows & Parker
• Conclusion
• If posthegemonic power is algorithmic, and if algorithms are fundamental to the transduction of space, then we need to rethink the analysis of the production of space so that the cultural, social, political and technical construction of code becomes a fundamental part of that process
URLS
• http://www.ethanzuckerman.com/blog/2008/12/26/mapping-infrastructure-and-flow/
• http://atlas.gcrc.carleton.ca/homelessness/intro/intro.xml.html
• http://sikuatlas.ca/cape_dorset_terminology.html
• http://www.floatingsheep.org/
• http://maps.stamen.com/#terrain/12/37.7706/-122.3782
• http://www.dublindashboard.ie/pages/index
Exercise
Characteristics | Small Data | Big Data | Census | Sensors / Remote Sensing / Social Media / Other (marks in source order)
Volume | Limited to large | Very large | Very large | •, •
Exhaustivity | Samples | Entire pop. | All | •, Crucial, •
Resolution & indexicality | Coarse & weak to tight & strong | Tight & strong | Individual ID | •, ?
Relationality | Weak to strong | Strong | Name, address | •, ?
Velocity | Slow, freeze-framed | Fast | Decennial, quinquennial | X, Crucial, •
Variety | Limited to wide | Wide | Questions | X, One stream, X
Flexible & Scalable | Low to middling | High | Hard to change, fields fixed time | X, ?
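The exercise assesses a data source against the characteristics in Kitchin's big data definition. A minimal Python sketch of such an assessment follows. The Assessment class, the strict "all seven" rule and the True/False judgments for the census are illustrative assumptions (one possible reading of the Census column), not the exercise's answer key.

```python
from dataclasses import dataclass

# The seven characteristics used in the exercise table.
CHARACTERISTICS = (
    "volume",
    "exhaustivity",
    "resolution_indexicality",
    "relationality",
    "velocity",
    "variety",
    "flexibility_scalability",
)


@dataclass
class Assessment:
    name: str
    meets: dict  # characteristic -> True if the source shows the big data trait

    def missing(self):
        """Characteristics the source does not meet, in table order."""
        return [c for c in CHARACTERISTICS if not self.meets.get(c, False)]

    def score(self):
        """Number of characteristics met, out of seven."""
        return len(CHARACTERISTICS) - len(self.missing())

    def is_big_data(self):
        # Assumption: a strict reading that requires all seven characteristics.
        return not self.missing()


# Illustrative judgments only: a census is very large, covers everyone, and is
# detailed and linkable, but is slow (decennial or quinquennial), limited to
# its questions, and hard to change.
census = Assessment(
    name="census",
    meets={
        "volume": True,
        "exhaustivity": True,
        "resolution_indexicality": True,
        "relationality": True,
        "velocity": False,
        "variety": False,
        "flexibility_scalability": False,
    },
)

print(census.name, census.score(), census.missing(), census.is_big_data())
```

The same structure can be filled in for sensors, remote sensing or social media, so the discussion can focus on which judgments are contested rather than on the tallying.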
The Programmable City
• Funded by the European Research Council (ERC) and Science Foundation Ireland (SFI)
• SH3: Environment and Society
• Led by Dr Rob Kitchin, the Principal Investigator
• Based at the National Institute for Regional and Spatial Analysis (NIRSA)
• At the National University of Ireland Maynooth (NUIM)
MIT Press 2011 Sage 2014
The aim of the ERC project is to build on and extend a decade of work that culminated in the Code/Space book (MIT Press) with a set of detailed empirical studies
Objective
• to provide:
• “an interdisciplinary analysis of the two core inter-related aspects of the emerging programmable city:
• (a) Translation: how cities are translated into code, and
• (b) Transduction: how code reshapes city life” (Kitchin 2011).
Objectives
• How is the city translated into software and data?
• How do software and data reshape the city?
[Diagram: THE CITY and CODE & DATA]
• Translation: City into Code & Data (Discourses, Practices, Knowledge, Models)
• Transduction: Code & Data Reshape City (Mediation, Augmentation, Facilitation, Regulation)
ProgCity Research Matrix
Theme | Translation: City into code | Transduction: Code reshapes city
Understanding the city (Knowledge) | How are digital data materially and discursively supported and processed about cities and their citizens? | How does software drive public policy development and implementation?
Managing the city (Governance) | How are discourses and practices of city governance translated into code? | How is software used to regulate and govern city life?
Working in the city (Production) | How is the geography and political economy of software production organised? | How does software alter the form and nature of work?
Living in the city (Social Politics) | How is software discursively produced and legitimated by vested interests? | How does software transform the spatiality and spatial behaviour of individuals?
Kitchin’s Data Assemblage
Attributes | Elements
Systems of thought | Modes of thinking, philosophies, theories, models, ideologies, rationalities, etc.
Forms of knowledge | Research texts, manuals, magazines, websites, experience, word of mouth, chat forums, etc.
Finance | Business models, investment, venture capital, grants, philanthropy, profit, etc.
Political economy | Policy, tax regimes, public and political opinion, ethical considerations, etc.
Governmentalities / Legalities | Data standards, file formats, system requirements, protocols, regulations, laws, licensing, intellectual property regimes, etc.
Materialities & infrastructures | Paper/pens, computers, digital devices, sensors, scanners, databases, networks, servers, etc.
Practices | Techniques, ways of doing, learned behaviours, scientific conventions, etc.
Organisations & institutions | Archives, corporations, consultants, manufacturers, retailers, government agencies, universities, conferences, clubs and societies, committees and boards, communities of practice, etc.
Subjectivities & communities | Of data producers, curators, managers, analysts, scientists, politicians, users, citizens, etc.
Places | Labs, offices, field sites, data centres, server farms, business parks, etc., and their agglomerations
Marketplace | For data, its derivatives (e.g., text, tables, graphs, maps), analysts, analytic software, interpretations, etc.
Locations
• Dublin (Primary City)
• Boston (Secondary City)
• Ottawa/Montreal (Open Data Case Studies)
The Dublin Dashboard includes:
• real-time information
• time-series indicator data
• & interactive maps about all aspects of the city
Benefits: • detailed, up to date intelligence about
the city that aids everyday decision making and fosters evidence-informed analysis.
Freely available data sources:
• Dublin City Council
• Dublinked
• Central Statistics Office
• Eurostat
• government departments
• links to a variety of existing applications
Produced by:
• The Programmable City project
• All-Island Research Observatory (AIRO) at Maynooth University
• working with Dublin City Council
Funded by:
• the European Research Council (ERC)
• Science Foundation Ireland (SFI)