Documenting the Now
Reviewed by: Emily Pagano
Review started: March 24, 2021
Review finished: April 29, 2021
Site Link
Data and Sources
- Social media content, primarily from Twitter
- Input from communities identifying their research and archival needs
Processes
- Team creates open-source tools for collecting, archiving, and analyzing social media content ethically
Presentation
- Website presenting the project aims, project news, the tools the team has built, and workshop information. The workshops let communities share their needs with the project team and let the team share and promote its tools
Digital Tools Used to Build It
- Code for tools is shared on GitHub, and tech help with their tools is provided via Slack
- DocNow uses React, Node, PostgreSQL, and Redis; Hydrator is an Electron-based desktop application; Twarc is written in Python; Diff Engine works with RSS feeds
- Wappalyzer analysis of https://www.docnow.io/: Bootstrap user interface framework; Fastly content delivery network; jQuery JavaScript library; GitHub Pages platform as a service; Ruby on Rails web framework
Languages
- English
Review
Documenting the Now describes itself as both a tool and a community that works to create ways to collect and preserve significant social media content, as well as ethical standards for the use of that content. The idea emerged in the aftermath of the police killing of Michael Brown in Ferguson, Missouri, on August 9, 2014. The creators recognized that social media platforms like Twitter offer an important look into social movements, but that archiving tweets for preservation poses both ethical and practical challenges: how to sort through and collect social media content, and how to do so in a way that respects the content owner’s privacy, consent, and control.
The team behind this project created DocNow, a cloud-based and open-source web application that allows researchers and archivists to easily collect, analyze and preserve tweets as well as any web resources the tweets reference. Through the app, users are able to search Twitter and retrieve a sample of tweets and break them down by their users, hashtags, links, and media, offering a number of entry points for analyzing Twitter conversations. Importantly, DocNow allows users to save searches and download their data in a way that honors content owners’ rights to opt out and to delete their tweets. The app saves only unique tweet identification numbers, not the text of the tweets themselves, and if a content owner deletes a tweet, it is deleted from the central database, and the tweet will not be accessible in the archive.
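The ID-only storage model described above can be illustrated with a minimal sketch. This is not DocNow's actual code: the class name `IdOnlyArchive` and its methods are hypothetical, and the sketch only assumes that each collected tweet carries an `id` field.

```python
class IdOnlyArchive:
    """Hypothetical archive that keeps tweet IDs but never tweet text."""

    def __init__(self):
        self._ids = set()

    def add(self, tweet):
        # Keep only the identifier; the text and other fields are discarded.
        self._ids.add(str(tweet["id"]))

    def handle_deletion(self, tweet_id):
        # When the content owner deletes a tweet, drop its ID as well.
        self._ids.discard(str(tweet_id))

    def ids(self):
        return sorted(self._ids)


archive = IdOnlyArchive()
archive.add({"id": 101, "text": "first example tweet"})
archive.add({"id": 102, "text": "second example tweet"})
archive.handle_deletion(101)
print(archive.ids())
```

Because the archive never holds the text, a deleted tweet leaves nothing behind once its ID is removed.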
The DocNow application consists of the following:
- a client side application (React)
- a server side REST API (Node)
- a database (PostgreSQL)
- a messaging queue database (Redis)
Users will need to install Git and Docker to set up a DocNow workstation.
Other tools they’ve developed include Twarc, a tool for archiving Twitter JSON, and Hydrator, an application for “hydrating” Twitter ID datasets by converting them to JSON or CSV.
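To make "hydrating" concrete, the sketch below turns a list of tweet IDs back into full records and writes them out as JSON lines or CSV. It is an illustration under stated assumptions, not Hydrator's or Twarc's implementation: `FAKE_PLATFORM` is a made-up stand-in for the Twitter API, and the function names and field names are hypothetical.

```python
import csv
import io
import json

# Hypothetical stand-in for the platform; a real tool would call the Twitter API.
FAKE_PLATFORM = {
    "102": {"id": "102", "text": "second example tweet", "user": "example_user"},
}


def hydrate(tweet_ids, lookup=FAKE_PLATFORM.get):
    """Return full records for the IDs that still resolve."""
    records = []
    for tweet_id in tweet_ids:
        record = lookup(tweet_id)
        if record is not None:  # deleted tweets no longer resolve
            records.append(record)
    return records


def to_json_lines(records):
    # One JSON object per line.
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def to_csv(records, fields=("id", "user", "text")):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = hydrate(["101", "102"])
print(to_json_lines(records))
print(to_csv(records))
```

In this sketch, ID 101 does not resolve, so only one record is returned. That mirrors how a dataset shared as IDs stops exposing tweets their owners have since deleted.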
In addition to these valuable digital tools, Documenting the Now also works to develop standards that are valuable to any project aiming to use social media content as archival material. One of the ways they’ve done this is by establishing Social Humans labels. There are two formats: SH-A, for archivists and academics, and SH-C, for content creators. SH-C labels allow content creators to indicate if and how they want their data to be used, and do so with more precision than the legal terms of use described in the policies of the social media companies themselves. These labels allow content creators to consent to use of the material, request anonymization or credit, set an expiry for content, and more. SH-A labels allow archivists and academics to share contextual information determined in data analysis that may be useful for researchers. These labels include notes on whether a social media account is likely a bot and whether a post is an advertisement, as well as content warnings for violent material or hate speech.