Introduction to OpenRefine
|
OpenRefine is ‘a tool for working with messy data’
OpenRefine works best with data in a simple tabular format
OpenRefine can help you split data up into more granular parts
OpenRefine can help you match local data up to other data sets
OpenRefine can help you enhance a data set with data from other sources
|
Importing data into OpenRefine
|
|
Layout of OpenRefine, Rows vs Records
|
OpenRefine uses rows and columns to display data
Most options for working with data in OpenRefine are accessed through a drop-down menu at the top of a data column
When you select an option in a particular column (e.g. to make a change to the data), it will affect all the cells in that column
OpenRefine has a Records mode which links together multiple rows into a single record
Splitting and joining multi-valued cells lets you clean the individual values within them
When creating multi-valued cells in your data, choose a separator that will not appear in the data values
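
To make the separator advice concrete, here is a minimal Python sketch of the split, clean and join cycle. It is not how OpenRefine implements the feature; the function names and the `|` separator are assumptions chosen for illustration.

```python
# Illustrative sketch only: OpenRefine does this through its column menus.
SEPARATOR = "|"  # assumed separator; choose one that never occurs in the values


def split_cell(cell, sep=SEPARATOR):
    """Split a multi-valued cell and trim whitespace around each value."""
    return [value.strip() for value in cell.split(sep)]


def join_cell(values, sep=SEPARATOR):
    """Join values back into one cell, refusing values that contain the separator."""
    for value in values:
        if sep in value:
            raise ValueError(f"separator {sep!r} appears inside value {value!r}")
    return sep.join(values)


authors = "Jones, A. | Smith, B.|  Lee, C."
cleaned = join_cell(split_cell(authors))
print(cleaned)

# A comma would be a poor separator here, because it also occurs inside each name:
print("Jones, A.".split(","))
```

The last line shows the failure mode: with a comma as the separator, a single author name is broken into two pieces.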
|
Faceting and filtering
|
You can use facets and filters to explore your data
You can use facets and filters to work with a subset of data in OpenRefine
You can easily correct common data issues from a facet
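
As a rough illustration of what a text facet does, the Python sketch below counts the distinct values in a column, selects the matching subset of rows, and corrects one variant value. The sample rows and function names are hypothetical, and this is only an analogy for the behaviour, not OpenRefine's own code.

```python
from collections import Counter

# Hypothetical sample data standing in for an OpenRefine project.
rows = [
    {"title": "Paper A", "language": "English"},
    {"title": "Paper B", "language": "english"},
    {"title": "Paper C", "language": "French"},
    {"title": "Paper D", "language": "English"},
]


def text_facet(rows, column):
    """Count how often each distinct value occurs in a column."""
    return Counter(row[column] for row in rows)


def select(rows, column, value):
    """Return only the rows whose column matches the chosen facet value."""
    return [row for row in rows if row[column] == value]


def edit_facet_value(rows, column, old, new):
    """Return new rows with every occurrence of `old` in the column replaced by `new`."""
    return [
        {**row, column: new} if row[column] == old else dict(row)
        for row in rows
    ]


print(text_facet(rows, "language"))
fixed = edit_facet_value(rows, "language", "english", "English")
print(text_facet(fixed, "language"))
```

The first count exposes the inconsistent `english` value; editing it through the facet brings all three rows under one value.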
|
Transformations - Undo and Redo
|
|
Clustering
|
Clustering is a way of finding variant forms of the same piece of data within a dataset (e.g. different spellings of a name)
OpenRefine offers several clustering algorithms that work in different ways and produce different results
The best clustering algorithm to use will depend on the data
Using clustering, you can replace varying forms of the same data with a single consistent value
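
The idea behind key-collision clustering can be sketched in a few lines of Python. This is a simplified approximation of a fingerprint-style key (trim, lowercase, strip accents and punctuation, then sort and de-duplicate the tokens), not OpenRefine's exact implementation. Merging to the most frequent variant is likewise an assumption made for the example; in OpenRefine you choose the replacement value yourself.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Build a simplified fingerprint key for a string."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop combining marks so that accented and unaccented forms collide.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate whitespace-separated tokens.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a fingerprint; keep groups with more than one form."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


def merge(values):
    """Replace each clustered value with the most frequent variant in its cluster."""
    replacement = {}
    for group in cluster(values):
        chosen = Counter(group).most_common(1)[0][0]
        for value in group:
            replacement[value] = chosen
    return [replacement.get(value, value) for value in values]


names = ["John Smith", "Smith, John", "John Smith", "Jane Doe"]
print(cluster(names))
print(merge(names))
```

Other algorithms, such as those based on edit distance, would group different sets of values, which is why the best choice depends on the data.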
|
Working with columns and sorting
|
You can reorder, rename and remove columns in OpenRefine
Sorting in OpenRefine always sorts all rows
The original order of rows in OpenRefine is maintained during a sort until you use the option to Reorder Rows Permanently
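
The difference between a temporary sort and Reorder Rows Permanently can be pictured with a toy model in Python. The `Project` class and its method names are hypothetical and only mimic the behaviour described above; they say nothing about how OpenRefine stores rows internally.

```python
class Project:
    """Toy model: a sort changes the view, not the stored row order."""

    def __init__(self, rows):
        self.rows = list(rows)   # stored (original) order
        self.sort_key = None     # active sort, if any

    def sort(self, key):
        """Apply a sort to the view; the stored order is untouched."""
        self.sort_key = key

    def view(self):
        """Return all rows as currently displayed."""
        if self.sort_key is None:
            return list(self.rows)
        return sorted(self.rows, key=self.sort_key)

    def remove_sort(self):
        """Drop the sort and fall back to the stored order."""
        self.sort_key = None

    def reorder_rows_permanently(self):
        """Make the sorted order the new stored order."""
        self.rows = self.view()
        self.sort_key = None


project = Project([("b", 2), ("a", 3), ("c", 1)])
project.sort(key=lambda row: row[0])
print(project.view())   # sorted view
print(project.rows)     # stored order is still the original
```

Until `reorder_rows_permanently()` is called, removing the sort restores the original order.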
|
Introduction to Transformations
|
|
Writing Transformations
|
|
Transforming Strings, Numbers, Dates and Booleans
|
|
Transformations - Handling Arrays
|
|
Transformations - Exporting
|
|
Advanced OpenRefine functions
|
OpenRefine can look up custom URLs to fetch data based on what’s in an OpenRefine project
Such API calls can be custom built, or you can use existing reconciliation services to enrich data
OpenRefine can be further enhanced by installing extensions
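
The pattern behind fetching data by URL is to build one request URL per cell value and then pull the wanted field out of the response. The Python sketch below illustrates that pattern without making a network call. The endpoint `https://api.example.org/lookup`, the `issn` parameter and the shape of the JSON response are all invented for illustration and do not describe any real service.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; replace with a real API you have permission to use.
BASE_URL = "https://api.example.org/lookup"


def build_url(cell_value, param="issn"):
    """Build a lookup URL for one cell value, escaping it safely for a query string."""
    return f"{BASE_URL}?{urlencode({param: cell_value})}"


def extract_title(response_text):
    """Pull one field out of a JSON response (the response shape is assumed)."""
    return json.loads(response_text).get("title")


# One URL per cell in the column being looked up.
issns = ["0028-0836", "1234 5678"]
urls = [build_url(issn) for issn in issns]
print(urls)

# A made-up response body, standing in for what such a service might return.
sample_response = '{"issn": "0028-0836", "title": "Example Journal"}'
print(extract_title(sample_response))
```

Escaping the cell value matters because real data often contains spaces or punctuation that would otherwise produce an invalid URL.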
|