#DataRefuge Project "We need to remember that data collection is a social practice" - Bethany Wiggin, founding director of the PPEH

BACKGROUND: The Start of #DataRefuge, Dec 17, 2016

...at 1:36 the introduction to the DataRefuge project starts

HEADLINE: Universities host pre-Trump archiving events to save US climate data

Researchers at the University of Toronto and the University of Pennsylvania have organised events to preserve climate and environmental data before Donald Trump becomes US president.

The “guerrilla archiving event” at the Canadian institution took place on 17 December, to coincide with the Internet Archive and California Digital Library’s End of Term 2016, a project that aims to preserve federal government information found on the internet at the end of each presidential term. (Bothwell, 2016) https://www.timeshighereducation.com/news/universities-host-pre-trump-archiving-events-save-us-climate-data

HEADLINE: Canadian "guerrilla" archivists will be assisting a rushed effort to preserve US government climate data.

Environmentalists, climate scientists and academics are collaborating to protect what they view as fragile digital federal records and research.

They want the data saved before Donald Trump takes office.

Eric Holthaus, a meteorologist, journalist and self-described "climate hawk", helped launch the emergency archival effort by tweeting out a Google spreadsheet that collects databases people want preserved from federal websites.

Mr Holthaus does not believe the incoming Trump administration will intentionally delete government data.

But he said the "biggest concern is that, either through budget cuts or neglect or through changing priorities, some data is lost or at least access to it is lost." (n.a., 2016) http://www.bbc.com/news/world-us-canada-38324045

That’s where the #datarefuge project comes in. The project, which brings together partners to address climate data’s vulnerability, aims to empower everyone to explore, use and ensure the trustworthiness of data. (Rachel, 2016) http://technical.ly/philly/2016/12/28/creating-a-datarefuge/

Data rescuing hackathons: The volunteer programmers at each event have been writing custom scripts to harvest the bigger, more complicated federal data sets, too. And they’re sharing the scripts with each other.

Large data sets are being organized and uploaded to datarefuge.org, a website based on a version of the open-source data portal software CKAN, customized by Allen. All the various data-rescue hackathons are using the site for data storage, and hope it will act as an alternative repository for pre-Trump federal information during the new administration. (Schlanger, 2017) https://qz.com/891201/hackers-were-downloading-government-climate-data-and-storing-it-on-european-servers-as-trump-was-being-inaugurated/

What is data literacy: “the simplest definition of data literacy is the ability to interpret, evaluate, and communicate statistical information.” (databrarians, 2015)


Public BYOD events:

Training is provided to volunteers for Seeding, and preference is given to volunteers with experience for Data Scraping (scraping and downloading datasets that cannot be crawled by EoT). For this activity, participants should know a scripting language, such as R or Python, or be able to detect patterns in HTML. An overview of scraping techniques will be provided during the first hour. (n.a., 2016) http://guides.lib.ucdavis.edu/aiap_events
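To give a feel for what "detecting patterns in HTML" can mean in practice, here is a minimal Python sketch. It is not one of the scripts written at the events. The sample page, the example.gov addresses and the list of file extensions are all made up for illustration, and it uses only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: these extensions stand in for "dataset" files in this illustration.
DATA_EXTENSIONS = (".csv", ".zip", ".json", ".xls", ".xlsx")


class DatasetLinkParser(HTMLParser):
    """Collect the targets of <a> tags that look like downloadable data files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(DATA_EXTENSIONS):
            # Turn relative links into full URLs using the page address.
            self.links.append(urljoin(self.base_url, href))


def find_dataset_links(html, base_url):
    """Return every data-like link found in the given HTML text."""
    parser = DatasetLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Made-up page content standing in for a real data listing page.
SAMPLE_HTML = """
<html><body>
  <a href="/data/temps_2015.csv">Temperatures 2015</a>
  <a href="about.html">About</a>
  <a href="https://example.gov/archive/sea_level.zip">Sea level</a>
</body></html>
"""

if __name__ == "__main__":
    for link in find_dataset_links(SAMPLE_HTML, "https://example.gov/climate/"):
        print(link)
```

The pattern here is simply "links that end in a data file extension". A real page may need a different pattern, such as links inside a particular table or links that share a common path.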

The event...will begin with a presentation that covers which data sets on federal websites should be copied and how to access them, among other things. Volunteers with coding, programming and data scraping backgrounds will offer guidance throughout the day.

The collected data is kept in multiple locations, according to DataRefuge. (Bothwell, 2016) http://www.sacbee.com/news/local/education/article130196614.html#storylink=cpy
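Keeping data in multiple locations only helps if the copies are identical. One common way to check this, shown here as a hedged sketch and not as DataRefuge's actual tooling, is to compute a checksum for every file and compare the results between two folders. The function names below are hypothetical and the demo folders are temporary ones created just for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 checksum of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's relative path to its checksum."""
    folder = Path(folder)
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def copies_match(original_folder, copy_folder):
    """True when both folders hold the same files with the same contents."""
    return build_manifest(original_folder) == build_manifest(copy_folder)


if __name__ == "__main__":
    # Demo with two temporary folders holding a made-up data file.
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        Path(first, "temps.csv").write_text("year,temp\n2015,14.8\n")
        Path(second, "temps.csv").write_text("year,temp\n2015,14.8\n")
        print("identical copies:", copies_match(first, second))

        # Change one copy and compare again.
        Path(second, "temps.csv").write_text("year,temp\n2015,99.9\n")
        print("after a change:", copies_match(first, second))
```

A checksum comparison like this only shows that two copies agree with each other. It says nothing about whether the first download matched what the agency originally published.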

1) End of Term Presidential Harvest 2016

…a collaborative project to preserve public United States Government web sites at the end of the current presidential administration ending January 20, 2017. This harvest is intended to document federal agencies' presence on the World Wide Web during the transition of Presidential administrations and to enhance the existing collections of the partner institutions.

In this collaboration, the partners will structure and execute a comprehensive harvest of the Federal Government .gov (n.a, 2016) http://digital2.library.unt.edu/nomination/eth2016/about/

2) Philadelphia Data Refuge project

The #datarefuge project, led by the PPEH and Penn libraries, has a three-fold mission:

education to promote data literacy.

partnering with End of Term Harvest, a collaborative project to archive internet data from federal sites during this presidential transition, focusing on content that is likely to change during the transfer of power. (PPEHLab, n.a.)

data rescue and creating a set of protocols for upcoming events. http://www.ppehlab.org

3) #DataRescueDavis - where we will work together to back up scientific data related to climate change and the environment that was produced with public funding and is available from federal websites. (#datarescuedavis, 2017) http://guides.lib.ucdavis.edu/aiap_events

Vocabulary Terms and Roles:

Education resources:

  • STORYTELLERS - Simplified option for students & teachers who may not want to engage in the technical aspect of seeding, harvesting, bagging. - The Data Refuge project isn't just about making trustworthy copies of data. It's also a hub for storytelling about how that data is used to keep people and places safe and healthy on our changing planet. Click here for options http://www.ppehlab.org/storytelling
  • Google docs - example or submission for citizen science non-tech volunteers - Please note: You don't have to be a "computer geek" to participate - We're looking for local business owners, parents, teachers, students, citizens, media - anyone who is interested in participating - we'll find a spot for you! https://docs.google.com/forms/d/e/1FAIpQLSdR8M6b5jisFfx4GCkHEb6oAgLzNg-pgz89jPqW3z84yJ5jvw/viewform?c=0&w=1
  • Survey Example - for modification and activating schemata in the classroom with students: identify the 10 sets of federal environmental and climate data most important to your research. https://upenn.co1.qualtrics.com/jfe/form/SV_5nVihP9u2JZt7eZ
  • Data Rescue Philly Toolkit - a plethora of tools/resources available to help you host your own Data Rescue event. http://www.ppehlab.org/datarescue-philly-toolkit
  • The Data Scientist’s Toolbox - a short, free MOOC with continuous monthly intake. About this course: You will get an introduction to the main tools and ideas in the data scientist's toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program like version control, markdown, git, GitHub, R, and RStudio. https://www.coursera.org/learn/data-scientists-tools
  • School of Data Modules - The key skills to understand, manage and work with data. The School of Data aims to make your learning experience as tailored as possible through independent learning modules. Learning modules are all stand-alone and can be taken in any order. To make your learning experience easier, we curated modules into a series of courses - with a focus on data basics as well as specific skills. https://schoolofdata.org/courses/#DataFundamentals
  • (Data Fundamentals - The Data Fundamental modules provide a solid overview over the workflow with data guiding you from what data is, to how to make your data tell a story. The courses listed below should be seen as a whole, a quick overview of the elements involved in working with data. ** Including 5 modules in the category ‘Collecting Data on your Smartphones’) https://schoolofdata.org/courses/#DataFundamentals
  • Page Freezer - PageFreezer is a SaaS (Software-as-a-Service) application that enables organizations and corporations of all sizes to permanently preserve their website and social media content in evidentiary quality and then access those archives and replay them as if they were still live.
  • a) Archive any Website, Blog or Social Media account
  • b) Automatic Archiving - PageFreezer uses crawling technology, similar to that of Google, to take snapshots of your website. Archiving is an automated process, saving you time. With dynamic monitoring, new web pages and changes to web pages are noted, so your archive is always up to date. https://www.pagefreezer.com/website-archiving/
  • Git - Git is a technology that “records changes to a file or set of files over time so that you can recall specific versions later” (Chacon, 2014)
  • a) Git Documentation - https://git-scm.com/doc
  • b) Git Beginner’s Guide for Dummies - https://backlogtool.com/git-guide/en/
  • c) Git Tutorial Links - https://git-scm.com/doc/ext
  • d) Data Librarians’ blog on learning Git - http://dainabouquin.com/learning-git_webdev/

Media Resources:

1) Data Refuge Updates - http://www.ppehlab.org/blogposts/2017/1/19/data-refuge-last-day-pre-inauguration-update

2) Podcast Interview with Data Refuge Librarian Laurie Allen http://www.slate.com/articles/podcasts/working/2017/02/working_how_does_librarian_laurie_allen_work.html

3) Official Twitter Account for DataRefuge: https://twitter.com/DataRefuge?lang=en&lang=en

4) Official Twitter Account for Bethany Wiggin - Founding Director @PPEHLab: https://twitter.com/bwiggerson

5) Facebook Page: Penn Program in the Environmental Humanities @ppeh seeks to generate scholarship and engagement in the environmental humanities through innovative projects and partnerships on campus and off.

6) #DataRefuge Hashtags: #datarefuge #datarescue #savethedata #climatedata #sciencenotsilence

Whether you decide to go full-on TECH and connect with community partners to host your own #DataRefuge,


take a piece of #DataRescue and modify it to bring inspiration and collaboration to your classroom's Citizen Scientists on a much smaller scale,


or take a personal interest in following the ‘Long Trail’ of Data Refuge,

I hope this provided you with a modicum of ‘mind-blow’ and something to ponder as we see Mobile Technology, Citizen Science and Mobile Crowd Sensing exponentially consume our social spheres. I must leave you now to seed and bag the pieces of my mind that were blown and scattered about during this project. May the force of #Data be with you.



