#metoo Digital Media Collection
Reviewed by: Asma N., Faihaa Khan, and Brianna Caszatt
Review started: April 3, 2021
Review finished: May 5, 2021
- Project: https://www.schlesinger-metooproject-radcliffe.org/
- Twitter dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/2SRSKJ
- Web archives collection: https://archive-it.org/collections/10866
Data and Sources
- Social media and hashtags related to #metoo, news articles, statements of denial and/or apology, web forum conversations, legislation, lawsuits, statistical studies, Fortune 500 companies’ employment manuals
- All data are based in the United States
- “collection dates: 2017 - ” (per Collection Goal and Scope Statement page)
- 32,071,469 tweets and tweet IDs collected from October 15, 2017, to March 31, 2020, based on 71 hashtags associated with both advocacy and criticism (e.g., #metoo, #timesup, #believewomen, #MeTooLiars, and #metoohucksters), acquired via licensing, Twitter API, and Social Feed Manager; only the IDs are publicly available, not the full text of the tweets, per Twitter Terms of Service
- 1,118 archived website pages collected from 341 different sources between July 10, 2017, and May 6, 2020; pages collected by individuals and by Media Cloud
- Bibliography of sources that shaped the project’s approach and influenced their ethics statement
- Tweet IDs have been collected into a dataset, including metadescription, and made accessible via 33 different files
- Website pages pertaining to the project have been archived with Archive-It and categorized and made searchable by capture date, subject (178 different subject tags defined by the project), publisher, source, language, type (26 different type tags defined by the project), and coverage (state or country if outside of the United States)
- Metadata for bibliography sources were collected and shared via Zotero library
- Public website to share collections, including information on how it can be accessed and under what conditions it can be used, and share the project’s goals and methodology
- Tweet IDs stored and shared via the Harvard Dataverse
- Web pages stored and shared via Archive-It
Digital Tools Used to Build It
- Twitter API, Social Feed Manager, Media Cloud, Archive-It, Zotero
- Websites in English
- Tweets collected primarily in English but also in other languages when the point of origin is the United States
- Archived websites predominantly in English, though a few are in Spanish
The #metoo Digital Media Collection, also known as the #metoo Project, is a digital project created by the Schlesinger Library on the History of Women in America, Radcliffe Institute for Advanced Study, Harvard University. The project’s goal is to chronicle the digital footprint of the movement in the United States and document accompanying political, legal, and social battles pertaining to #metoo. As the team states, the collection aims to:
provide enduring scholarly access to…resources which are now pervasive in our collective consciousness and social media feeds, yet will prove acutely vulnerable in the long-term, as proprietary platforms, individual user-accounts, and the ever-changing landscape of the Web continually transform.
The project began in 2017, during the digital peak of the movement that activist and feminist Tarana Burke started in 2007. Its end date is specified as until “activity subsides,” and the website invites users to contact the project team with ideas on new content. However, the latest content on the archival pages dates to May 2020. Whether this means the activity has subsided or the collection process has been disrupted by COVID-19 is unclear.
At the time of our review, the project’s homepage featured the image of a person with a sign (“#MeToo is so not a fad! It’s a revolution of truth + consequences!”), placed ahead of its message on the project’s purpose to collect media relevant to the movement. Although this stance isn’t stated overtly, we believe the image carries a political undertone or signal that is worth noting in a time of born-digital information.
The collection’s data are presented in two ways: a dataset of 32,071,469 tweets and tweet IDs collected based on 71 different hashtags, and 1,118 website pages archived with Archive-It. Among its collection items are “denial and/or apology, web-forum conversations” and “employment manuals,” plus hashtags. Most of the hashtags are explicitly relevant to the movement and show markers of identity across race (#MeTooHBCU, Historically Black Colleges and Universities), gender, location, profession, and the overall American climate, for example #IStandWithChristineBlaseyFord. The project does not mention tweet authors by name, only identifying collectees as “supporters” and “critics.” Though we understand the type of items, the reasoning behind how they chose each of the hashtags, and which ones they have left out, is not clear. Also, they acknowledge that the movement pre-dates the beginning of their collection period by attributing the origins of the hashtag to Burke in 2007, yet they do not explain their choice to leave out data from this period. This is somewhat frustrating given their ethics statement, in which they say they are “committed to being transparent about [their] practices.”
The collection has stored all of the tweets, but only the tweet IDs are publicly available to researchers because of Twitter’s Terms of Service. Researchers can download the tweet IDs and use rehydrating software to request the full tweet content from the Twitter API. (They suggest the Hydrator tool created by Documenting the Now; see the Documenting the Now review in this publication.) However, this depends on the tweet still being up. Whether researchers may be given access to the full tweet from the archive if it is no longer available from Twitter (if it has been removed by the user or Twitter, or should Twitter cease to exist) is not clear. Nor is it clear how many of the tweet IDs may already lead to content that is not available outside of the archive.
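For readers unfamiliar with rehydration, the bookkeeping around it can be sketched in a few lines of Python. This is a minimal illustration, not the project's or Hydrator's actual workflow: the function names are hypothetical, the batch size of 100 is an assumption that should be checked against the lookup endpoint's current limit, and the API request itself is deliberately left out. It shows how a researcher might read a file of tweet IDs, group them into batches for lookup, and measure what share of the requested IDs did not come back.

```python
from typing import Iterable, Iterator, List, Set


def read_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Keep only lines that are plain numeric tweet IDs (hypothetical helper)."""
    ids = []
    for line in lines:
        candidate = line.strip()
        if candidate.isascii() and candidate.isdigit():
            ids.append(candidate)
    return ids


def batches(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` IDs.

    The default of 100 is an assumption about the lookup limit.
    """
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def missing_share(requested: List[str], returned: Set[str]) -> float:
    """Fraction of requested IDs that the lookup did not return."""
    if not requested:
        return 0.0
    missing = [tweet_id for tweet_id in requested if tweet_id not in returned]
    return len(missing) / len(requested)
```

Running `missing_share` over the IDs a rehydration pass actually returned would give one rough estimate of how much of the dataset is no longer retrievable from Twitter, which is the question the collection's documentation leaves open.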
To preserve web pages the team has deemed relevant to the project’s scope, they’ve used Archive-It, a web-based hosting tool with straightforward features on the user-facing side. Although the wireframes presented feel dated, the end-to-end hosting tool allows for uniformity in the directory system for items and the preservation of web content and resources relevant to a collection for an adjustable period of time. It does require consultation to receive a quote for pricing. Despite the expired copyright, Archive-It protects the #metoo collection site by capturing relevant material and indicating redirections for secondary information featured on a site that may be unaffiliated with the collection and/or its mission. These pages were initially curated manually by the team, before they began using the Media Cloud tool to search and archive sites on a larger scale. What parameters they set with the tool and in their searches, as well as how they’ve determined and organized their tags for each site, are not clear. The sites archived are overwhelmingly in English (1,023 of 1,118 pages), and there is seemingly an over-saturation of Harvard resources: The Harvard Crimson is the largest source with 98 pages, and the Harvard Business Review has 13 pages, while Reddit is the second-largest source with a mere 27.
Overall, web and digital content is central to the project’s collection methods, but this approach is predicated on who can access the internet to share their narrative. This can reproduce some of the obstacles the #metoo movement attempts to dismantle, which emphasizes the contradistinction of this project from what Burke organized in 2007 and well into its height in 2017 (and present). Could this be a reason for the project’s expiration, in addition to the team’s earlier comments on the temporality of the collection? We believe the viewer should take this aspect and the organization of the collection’s data into account to determine their use of the project’s websites and archive.