Setup for Lesson
Introduction to the Data for this Lesson
The data used in this lesson comes from a project observing a small mammal community in southern Arizona, US. This is part of a project studying the effects of rodents and ants on the plant community that has been running for almost 40 years. The rodents are sampled on a series of 24 plots, with different experimental manipulations controlling which rodents are allowed to access which plots. This is a real dataset that has been used in over 100 publications. It is published in Ecological Archives and can be found in the Portal Project Database. This data is open and free to use for research purposes.
For Interest Only: Portal Project Teaching Dataset
The Portal Project Teaching Database is a simplified version of the Portal Project Database designed for teaching. It provides a real world example of life-history, population, and ecological data, with sufficient complexity to teach many aspects of data analysis and management, but with many complexities removed to allow students to focus on the core ideas and skills being taught. The database is currently available in csv, json, and sqlite formats.
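Because the teaching database is distributed as CSV (among other formats), a downloaded table can be inspected with a few lines of Python before it is loaded anywhere else. The following is a minimal sketch, not part of the lesson itself: the function name `preview_csv` and the `path` argument are hypothetical, and the path should point at whichever CSV file you actually downloaded.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return (column_names, first n_rows rows) from a CSV file.

    `path` is a placeholder for a CSV file you have downloaded yourself.
    """
    # newline="" is the setting recommended by the csv module for file objects.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Read only the first few rows so large files stay cheap to inspect.
        rows = list(islice(reader, n_rows))
        return reader.fieldnames, rows
```

Calling `preview_csv("your_file.csv")` returns the header names and up to five rows as dictionaries, which is usually enough to confirm that the download is intact.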
The Portal Project Teaching Database’s GitHub repository can be found at: https://github.com/weecology/portal-teachingdb, where suggested changes or additions to this dataset can be requested or contributed. This database is not designed for research as it intentionally removes some of the real-world complexities. The Python code used for converting the original database to this teaching version can be found in create_portal_teach_dataset.py.
CITATION: Ernest, Morgan; Brown, James; Valone, Thomas; White, Ethan P. (2017): Portal Project Teaching Database. Figshare. https://doi.org/10.6084/m9.figshare.1314459.v6
Download Data for OpenRefine Lesson
The Portal Project Teaching Dataset is a real dataset that has been used in over 100 publications. We have simplified it for the purposes of this lesson, but you can download the full dataset (see above for details) and work with it using exactly the same tools we will learn here.
For this lesson, you will need to download the following file (remember where you downloaded the file!):
Data in some of the columns of the above file (e.g. geolocation, locality, county, country, JSON) are contrived for the purpose of the lessons and are in no way related to the original dataset.
Install OpenRefine
For this lesson you will need OpenRefine (formerly Google Refine) and a web browser. Download the most recent version of OpenRefine for your operating system, then follow the instructions below.
OpenRefine is a Java program that runs locally on your machine (i.e. you are not accessing a remote service on the Internet). OpenRefine for Mac comes with embedded Java. On Windows, please select the Windows kit with embedded Java. On Linux, you will need to install Java separately.
Once it is running on your machine, you access it via your browser at the address http://localhost:3333. No Internet connection is needed for this, as the program is running locally.
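If the browser does not open by itself, it can help to confirm that something is actually answering at that local address. The sketch below uses only the Python standard library. The function name `openrefine_is_running` is hypothetical, and the check only shows that an HTTP server responds on the given port, not that the server is OpenRefine.

```python
import urllib.request


def openrefine_is_running(url="http://localhost:3333", timeout=2.0):
    """Return True if an HTTP server answers successfully at `url`."""
    # Bypass any system-wide proxy: the address is on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, refused connections, and timeouts are all OSError subclasses.
        return False
```

A result of `False` from `openrefine_is_running()` usually means OpenRefine has not finished starting or is not running at all.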
Windows
- If you have Internet Explorer (or Edge) set as your default web browser, check that you have Firefox or Chrome installed and set either of them as your default browser. OpenRefine runs in your default browser, but may not run correctly in Internet Explorer. You can check how to set your browser as default for Google Chrome or Firefox.
- Unzip the downloaded file into a directory by right-clicking it and selecting Extract... from the menu. Name that directory something like OpenRefine.
- Locate openrefine.exe in the extracted folder and launch OpenRefine by double-clicking on it. This will launch a command prompt window first.
- Wait for OpenRefine to launch in your default Web browser, which is where you will interact with the program. If this does not happen, head to http://localhost:3333 in your Web browser of choice.
Mac
- Check that you have Firefox or Chrome browser installed and set as your default browser. You can check how to set your browser as default for Google Chrome or Firefox.
- Locate the downloaded .dmg file and Ctrl-click it. You may get the warning “macOS cannot verify the developer of “OpenRefine.app”. Are you sure you want to open it?” Click ‘Yes’/‘Open’ to this.
- Drag OpenRefine.app into your Applications folder, and Ctrl-click to open it. You may get the warning “macOS cannot verify the developer of “OpenRefine.app”. Are you sure you want to open it?” Click ‘Yes’/‘Open’ to this.
- Wait for OpenRefine to launch in your default Web browser, which is where you will interact with the program. If this does not happen, head to http://localhost:3333 in your Web browser of choice.
Linux
- This requires Java to be installed on your computer. If you do not already have it, download OpenJDK Java.
- Check that you have Firefox or Chrome browser installed and set as your default browser. You can check how to set your browser as default for Google Chrome or Firefox.
- Unzip the downloaded file into a directory. Go to this directory in a terminal and type ./refine to start OpenRefine.
- Wait for OpenRefine to launch in your default Web browser, which is where you will interact with the program. If this does not happen, head to http://localhost:3333 in your Web browser of choice.
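The two Linux prerequisites above (Java being installed, and a runnable refine script in the unzipped directory) can be checked before launching. This is a minimal sketch under stated assumptions: the function name `check_linux_prerequisites` is hypothetical, Java is assumed to be reachable as `java` on your PATH, and `openrefine_dir` is wherever you unzipped the download.

```python
import os
import shutil


def check_linux_prerequisites(openrefine_dir, java_command="java"):
    """Report whether Java is on the PATH and the refine script is executable."""
    refine_script = os.path.join(openrefine_dir, "refine")
    return {
        # Full path to the Java executable, or None if it is not on the PATH.
        "java_path": shutil.which(java_command),
        # True only if the script exists and the current user may execute it.
        "refine_is_executable": os.path.isfile(refine_script)
        and os.access(refine_script, os.X_OK),
    }
```

If `java_path` comes back as `None`, install Java first. If `refine_is_executable` is `False`, check that you are pointing at the unzipped directory and that the script has execute permission.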
Text Editor
A text editor is the piece of software you use to view and write code. If you have a preferred text editor, please use it. Suggested text editors include Notepad++ (Windows), TextEdit (macOS), Gedit (GNU/Linux), GNU Nano, and Vim. Alternatively, there are IDEs (integrated development environments), such as VS Code, that have more features specifically for coding; IDEs specific to particular languages will be listed in the appropriate section(s) below.