Data Journalism - Cleaning Data
bahareh-heravi
Transcript of Data Journalism - Cleaning Data
Data Profiling
Assess the current state of your data.
Data Cleaning
Correct the issues you found during data profiling.
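The profiling step can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the sample rows and column names are made up, and the two checks (blank cells per column, fully duplicated rows) are just two of many things a profile could look at.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the column names are illustrative only.
SAMPLE_TSV = (
    "Record ID\tObject Title\tCategories\n"
    "1\tPlough\tAgricultural equipment\n"
    "2\t\tCostume\n"
    "2\t\tCostume\n"
    "3\tDress\t\n"
)


def profile(rows):
    """Count blank cells per column and fully duplicated rows."""
    blanks = Counter()
    seen = Counter()
    for row in rows:
        for column, value in row.items():
            if not (value or "").strip():
                blanks[column] += 1
        seen[tuple(row.values())] += 1
    # Every repeat beyond the first occurrence counts as one duplicate.
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return dict(blanks), duplicates


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
blank_counts, duplicate_rows = profile(rows)
print(blank_counts)
print(duplicate_rows)
```

The counts tell you what to correct in the cleaning step: here, missing titles, a missing category and one repeated row.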
Exploring data
Checking data
Filtering data
Cleaning data
Reshaping data
Annotating data
Linking data
Dataset
Powerhouse Museum objects collection
Download from: http://data.freeyourmetadata.org/powerhouse-museum/phm-collection.tsv
Launch OpenRefine and load the dataset.
Faceting data
To select a subset of your data to work on.
To get useful insight into your data.
To apply a transformation to a subset of your data.
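The three uses of a facet listed above can be imitated outside OpenRefine. The sketch below is an assumption-laden illustration in Python: the records, the column names and the helper functions text_facet and select are hypothetical and are not OpenRefine features.

```python
from collections import Counter

# Hypothetical records standing in for rows of a collection.
records = [
    {"title": "Plough", "category": "Agricultural equipment"},
    {"title": "Harrow", "category": "Agricultural Equipment"},
    {"title": "Dress", "category": "Costume"},
    {"title": "Bonnet", "category": "Costume"},
]


def text_facet(rows, column):
    """Insight: count how often each distinct value occurs in a column."""
    return Counter(row[column] for row in rows)


def select(rows, column, value):
    """Subset: keep only the rows whose column equals the chosen value."""
    return [row for row in rows if row[column] == value]


facet = text_facet(records, "category")
subset = select(records, "category", "Costume")

# Transformation: applied to the selected rows only.
for row in subset:
    row["title"] = row["title"].upper()

print(facet.most_common())
print([row["title"] for row in records])
```

The facet counts also expose the two spellings of the agricultural category, which is the kind of inconsistency a facet makes visible.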
Types of Facets
Text facets for text
Numeric facets for numbers and dates
Predefined/customised facets
Numeric facets
Used for faceting numerical values and ranges.
Examples: Expenditure, crime rate
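A numeric facet is essentially a histogram over a column, with non-numeric cells reported separately. A minimal sketch in Python, assuming made-up expenditure values and a hypothetical helper numeric_facet with a fixed bin width:

```python
def numeric_facet(values, bin_width):
    """Group numeric values into fixed-width ranges; count the rest apart."""
    bins = {}
    non_numeric = 0
    for raw in values:
        try:
            number = float(raw)
        except (TypeError, ValueError):
            non_numeric += 1
            continue
        # Lower bound of the range this value falls into.
        lower = (number // bin_width) * bin_width
        bins[lower] = bins.get(lower, 0) + 1
    return dict(sorted(bins.items())), non_numeric


# Hypothetical expenditure figures, including a blank and a text cell.
expenditure = ["120", "85", "310", "", "95", "n/a", "305"]
bins, non_numeric = numeric_facet(expenditure, 100)
print(bins)
print(non_numeric)
```

Selecting one range then gives the subset of rows to inspect or transform, and the non-numeric count points at cells that need cleaning before any arithmetic.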
Advanced data operations
Clustering
Transformations
Multi-valued cells
Derived columns
Splitting data across columns
Regular Expressions
GREL (General Refine Expression Language)
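Two of these operations, splitting multi-valued cells and deriving a value with a regular expression, can be sketched in Python. This is not GREL and not how OpenRefine implements them. The pipe separator, the field names and both helper functions are assumptions made for illustration, and copying the other columns into each new row is a simplification.

```python
import re


def split_multi_valued(rows, column, separator="|"):
    """Turn one row with several values in a cell into one row per value."""
    result = []
    for row in rows:
        for value in row[column].split(separator):
            new_row = dict(row)  # copy the other columns unchanged
            new_row[column] = value.strip()
            result.append(new_row)
    return result


# A run of exactly four digits bounded by non-word characters.
YEAR = re.compile(r"\b(\d{4})\b")


def derive_year(text):
    """Derived value: the first four-digit number in the text, if any."""
    match = YEAR.search(text)
    return int(match.group(1)) if match else None


# Hypothetical rows with a multi-valued categories cell.
rows = [
    {"id": "1", "categories": "Costume|Clothing and dress"},
    {"id": "2", "categories": "Numismatics"},
]
split_rows = split_multi_valued(rows, "categories")
print([row["categories"] for row in split_rows])
print(derive_year("c. 1880-1900"))
```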
Clustering
To group syntactically similar items together.
Used to fix inconsistencies, typos, etc.
Examples in the dataset: Agricultural equipment & Agricultural Equipment
Costume & Costumes
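One common way to find such groups is key collision: reduce each value to a normalised key and group the values that share a key. The Python sketch below is loosely modelled on the "fingerprint" idea and is not OpenRefine's exact implementation. The function names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Normalise a string so that superficial variants get the same key."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation
    tokens = sorted(set(text.split()))  # ignore word order and repeats
    return " ".join(tokens)


def cluster(values):
    """Group distinct values that share a fingerprint."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


values = [
    "Agricultural equipment",
    "Agricultural Equipment",
    "Costume",
    "Costumes",
]
clusters = cluster(values)
print(clusters)
```

This key catches the capitalisation variant but not Costume versus Costumes, because the singular and the plural produce different keys. Pairs like that need a looser method, for example one based on edit distance.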
Resources
Using OpenRefine by Ruben Verborgh and Max De Wilde
http://freeyourmetadata.org/cleanup/
Cleaning Data with Refine, School of Data
The Bastard Book of Regular Expressions by Dan Nguyen
GREL: https://github.com/OpenRefine/OpenRefine/wiki/General-Refine-Expression-Language