Open Social Scholarship Annotated Bibliography/Crowdsourcing

Category Overview
Crowdsourcing projects, typically built on information gathered by large groups of unrelated individuals through digital means, are becoming more common in academia. In this category, authors define crowdsourcing and explore its most common trends and essential practices (Carletti et al. 2013; Holley 2010; McKinley 2012). Crowdsourcing projects in the digital humanities typically invite participants to contribute by adding to existing resources or creating new ones, especially by charting, locating, sharing, revising, documenting, and enriching materials (Carletti et al. 2013). Some exemplary projects are included, such as Transcribe Bentham, which successfully brings crowdsourcing and public engagement into a scholarly framework (Causer and Terras 2014), and Prism, a textual markup tool that supports multiple interpretations of a text through close reading (Walsh et al. 2014). Authors also propose ways to moderate input from users of unknown reliability (Ghosh, Kale, and McAfee 2011). The category provides a rich sample of existing crowdsourcing practices and offers suggestions for optimal implementation.

Annotations
** Bradley, Jean-Claude, Robert J. Lancashire, Andrew SID Lang, and Anthony J. Williams. 2009. “The Spectral Game: Leveraging Open Data and Crowd-Sourcing for Education.” Journal of Cheminformatics 1 (9): 1–10. http://link.springer.com/article/10.1186/1758-2946-1-9.
 * Bradley et al. use The Spectral Game to frame their discussion of leveraging open data and crowdsourcing techniques in education. The Spectral Game is a game designed to make the teaching of spectroscopy more entertaining. It was created by combining open-source spectral data, a spectrum-viewing tool, and appropriate workflows, and it delivers these resources through the game medium. The authors evaluate the game in an undergraduate organic chemistry class and argue that The Spectral Game demonstrates the importance of open data for remixing educational curricula.

+ Carletti, Laura, Derek McAuley, Dominic Price, Gabriella Giannachi, and Steve Benford. 2013. “Digital Humanities and Crowdsourcing: An Exploration.” Museums and the Web 2013 Conference. Portland: Museums and the Web. http://mw2013.museumsandtheweb.com/paper/digital-humanities-and-crowdsourcing-an-exploration-4/.
 * Carletti, McAuley, Price, Giannachi, and Benford survey and identify emerging practices in current crowdsourcing projects in the digital humanities. Carletti et al. base their understanding of crowdsourcing on an earlier definition of crowdsourcing as an online, voluntary activity that connects individuals to an initiative via an open call (Estellés-Arolas and González-Ladrón-de-Guevara 2012). This definition was used to select the case studies for the research. The researchers found two major trends in the 36 initiatives included in the study: crowdsourcing projects use the crowd either to (a) integrate, enrich, or configure existing resources or (b) create or contribute new resources. Generally, crowdsourcing projects asked volunteers to contribute by curating, revising, locating, sharing, documenting, or enriching materials. The 36 initiatives surveyed were divided into three categories according to project aims: public engagement, enriching resources, and building resources.

+ Causer, Tim, Justin Tonra, and Valerie Wallace. 2012. “Transcription Maximized; Expense Minimized? Crowdsourcing and Editing The Collected Works of Jeremy Bentham.” Digital Scholarship in the Humanities (formerly Literary and Linguistic Computing) 27 (2): 119–37. http://dx.doi.org/10.1093/llc/fqs004.
 * Causer, Tonra, and Wallace discuss the advantages and disadvantages of user-generated manuscript transcription, using the Transcribe Bentham project as a case study. The project aims to engage the public with the thought and works of Jeremy Bentham by creating a digital, searchable repository of his manuscript writings. Causer, Tonra, and Wallace preface the article by setting out five key factors the team hoped to assess regarding the potential benefits of crowdsourcing: cost-effectiveness, exploitation, quality control, sustainability, and success. Evidence from the project demonstrates the considerable potential of open-access TEI-XML transcriptions for creating a long-term, sustainable archive. Additionally, users reported that they were motivated by a sense of contributing to the greater good, by recognition, or by both. In the experience of Transcribe Bentham, crowdsourcing transcription may not have been the cheapest, quickest, or easiest route; the authors argue, however, that projects with a longer time-scale may find this method both self-sufficient and cost-effective.

+ Causer, Tim, and Melissa Terras. 2014. “Crowdsourcing Bentham: Beyond the Traditional Boundaries of Academic History.” International Journal of Humanities and Arts Computing 8 (1): 46–64. http://dx.doi.org/10.3366/ijhac.2014.0119.
 * Causer and Terras reflect on some of the key discoveries made through the Transcribe Bentham crowdsourced initiative. Transcribe Bentham was launched to demonstrate that crowdsourcing can be used successfully for both scholarly work and public engagement by allowing all types of participants to access and explore cultural material. Causer and Terras note that the majority of the work on Transcribe Bentham was undertaken by a small percentage of users, or “super transcribers.” Only 15% of users have completed any transcription, and approximately 66% of those users have transcribed only a single document, leaving a very small number of individuals responsible for the core of the project’s output. The authors illustrate how user transcriptions have contributed to our understanding of some of Jeremy Bentham’s central concerns: animal rights, politics, and prison conditions. Overall, Causer and Terras demonstrate how scholarly transcription undertaken by a wide online audience can uncover essential material.

Estellés-Arolas, Enrique, and Fernando González-Ladrón-de-Guevara. 2012. “Towards an Integrated Crowdsourcing Definition.” Journal of Information Science 38 (2): 189–200. http://dx.doi.org/10.1177/0165551512437638.
 * Estellés-Arolas and González-Ladrón-de-Guevara present an encompassing definition of crowdsourcing, arguing that the flexibility of crowdsourcing is what makes it challenging to define. They demonstrate that, depending on perspective, researchers can hold vastly divergent understandings of crowdsourcing. By conducting a detailed study of current understandings of the practice, Estellés-Arolas and González-Ladrón-de-Guevara form a global definition that makes it easier to distinguish and formalize crowdsourcing activities. Using textual analysis, the authors identify crowdsourcing’s three key elements: the crowd, the initiator, and the process. They advance a comprehensive definition that highlights the individuals, tasks, roles, and returns associated with crowdsourcing. They also present a verification table, with nine categories, that can be used to determine whether an initiative qualifies as crowdsourcing. Estellés-Arolas and González-Ladrón-de-Guevara suggest that further research should examine the relationship between crowdsourcing and associated concepts such as outsourcing.

+ Franklin, Michael, Donald Kossmann, Tim Kraska, Sukriti Ramesh, and Reynold Xin. 2011. “CrowdDB: Answering Queries with Crowdsourcing.” In Proceedings of the 2011 Association for Computing Machinery (ACM) SIGMOD International Conference on Management of Data, 61–72. New York: Association for Computing Machinery.
 * Franklin, Kossmann, Kraska, Ramesh, and Xin argue that query processing systems need human input because they are limited in handling certain tasks, such as subjective comparisons, and often return inaccurate results. The authors propose CrowdDB, a system that draws on crowdsourced input when dealing with incomplete data and subjective comparisons. The authors discuss the benefits and limitations of combining human effort with machine processing and offer a number of suggestions for optimizing the workflow. Franklin et al. envision the combination of human input and machine processing as a rich area of research, since it can improve existing models and enable new ones.

** Gahran, Amy. 2012. “SeeClickFix: Crowdsourced Local Problem Reporting as Community News.” Knight Digital Media Center. September 19, 2012. http://www.knightdigitalmediacenter.org/blogs/agahran/2012/09/seeclickfix-crowdsourced-local-problem-reporting-community-news.html.
 * Gahran details the benefits of using SeeClickFix, an open-access web widget for highlighting local issues, spurring community discourse, and sparking story ideas. Users can also use it to file public reports on local issues and vote for individual reports when they would like to see a specific issue resolved. The widget allows users to plot locations on a Google Maps interface so that users within a geographic area can view a list of individual reports in that area. Having this widget on a site makes it easier to stay aware of community-reported issues and to maintain greater engagement with the broader geographic area to which the individual or group belongs.

Ghosh, Arpita, Satyen Kale, and Preston McAfee. 2011. “Who Moderates the Moderators? Crowdsourcing Abuse Detection in User-Generated Content.” In EC ’11: Proceedings of the 12th ACM Conference on Electronic Commerce, 167–76.
 * Ghosh, Kale, and McAfee address the problem of how to moderate ratings from users of unknown reliability. They propose an algorithm that can detect abusive content and spam, starting with approximately 50% accuracy on the basis of one example of good content and reaching complete accuracy after a number of entries through machine-learning techniques. They believe that rating each individual contribution is a better approach than rating the users themselves based on their past behaviour, as most platforms do. According to Ghosh, Kale, and McAfee, this algorithm may be a stepping stone toward determining more complex ratings by users of unknown reliability.

+ Holley, Rose. 2010. “Crowdsourcing: How and Why Should Libraries Do It?” D-Lib Magazine 16 (3/4): n.p. http://www.dlib.org/dlib/march10/holley/03holley.html.
 * Holley defines crowdsourcing and makes a number of practical suggestions to assist with launching a crowdsourcing project. She asserts that crowdsourcing uses social engagement techniques to help a group of people work together on a shared, usually significant initiative. The fundamental principle of a crowdsourcing project is that it entails greater effort, time, and intellectual input than is available from a single individual, thereby requiring broader social engagement. Holley’s argument is that libraries are already proficient at public engagement but need to improve how they work toward shared group goals. Holley suggests ten basic practices to assist libraries in successfully implementing crowdsourcing. Many of these recommendations centre on project transparency and motivating users.

** Lampe, Cliff, Robert LaRose, Charles Steinfield, and Kurt DeMaagd. 2011. “Inherent Barriers to the Use of Social Media for Public Policy Informatics.” The Innovation Journal 16 (1): 1–17.
 * Lampe, LaRose, Steinfield, and DeMaagd address the barriers to social media use for public policy informatics. For the authors, social media has the potential to foster interactions between policy makers, government officials, and their constituencies. The authors refer to this framework as Governance 2.0 and use AdvanceMichigan as a case study. AdvanceMichigan is a social media implementation designed to crowdsource feedback from stakeholders of Michigan State University Cooperative Extension, an organization that approaches education so that students can apply their knowledge to a range of critical issues, needs, and opportunities. Because of the challenges of crowdsourcing data, the organization is planning to return to traditional methods for collecting data from stakeholders. The authors conclude by discussing how to create compelling technologies, tailored to appropriately scaled tasks, for audiences who are likely to use social media sites.

+ Manzo, Christina, Geoff Kaufman, Sukdith Punjasthitkul, and Mary Flanagan. 2015. “‘By the People, For the People’: Assessing the Value of Crowdsourced, User-Generated Metadata.” Digital Humanities Quarterly 9 (1): n.p. http://www.digitalhumanities.org/dhq/vol/9/1/000204/000204.html.
 * Manzo, Kaufman, Punjasthitkul, and Flanagan make a case for the usefulness of folksonomy tagging when combined with categorical tagging in crowdsourced projects. The authors open with a defense of categorization, arguing that classification systems reflect the qualities of a collection while allowing for efficient retrieval of materials. However, they acknowledge that these benefits are often diminished by folksonomy tagging, which promotes self-referential, personal, task-organizing labels. The authors suggest combining folksonomic and controlled vocabularies in order to maximize the benefits of both approaches while minimizing their drawbacks. They test this proposal through an empirical experiment in labeling images from the Leslie Jones Collection of the Boston Public Library, followed by an evaluation of the helpfulness of the resulting tags.

+ McKinley, Donelle. 2012. “Practical Management Strategies for Crowdsourcing in Libraries, Archives and Museums.” Report for Victoria University of Wellington: n.p. http://nonprofitcrowd.org/wp-content/uploads/2014/11/McKinley-2012-Crowdsourcing-management-strategies.pdf.
 * The purpose of McKinley’s report is to review the literature and theory on crowdsourcing and to consider how they relate to the research initiatives of libraries, archives, and museums. McKinley begins by claiming that burgeoning digital technologies have contributed to an increase in participatory culture, and she points to the growing number of libraries, archives, and museums that use crowdsourcing as evidence of this trend. McKinley identifies five categories of crowdsourcing: collective intelligence, crowd creation, crowd voting, crowdfunding, and games. By way of conclusion, McKinley makes the following recommendations for crowdsourcing projects: (a) understand the context and convey the project’s benefits; (b) choose an approach with clearly defined objectives; (c) identify the crowd and understand their motivations; (d) support participation; and (e) evaluate implementation.

+ Ridge, Mia. 2013. “From Tagging to Theorizing: Deepening Engagement with Cultural Heritage through Crowdsourcing.” Curator 56 (4): 435–50. http://dx.doi.org/10.1111/cura.12046.
 * Ridge examines how crowdsourcing projects have the potential to assist museums, libraries, and archives with the resource-intensive tasks of creating or improving content about collections. Ridge argues that a well-designed crowdsourcing project aligns with the core values and missions of museums by helping to connect people with culture and history through meaningful activities. Ridge synthesizes several definitions of crowdsourcing to present an understanding of the term as a form of engagement where individuals contribute toward a shared and significant goal through completing a series of small, manageable tasks. Ridge points towards several examples of such projects to illustrate her definition. She argues that scaffolding the project by setting up boundaries and clearly defining activities helps to increase user engagement by making participants feel comfortable completing the given tasks. Ridge sees scaffolding as a key component of mounting a successful crowdsourcing project that offers truly deep and valuable engagement with cultural heritage.

+ Rockwell, Geoffrey. 2012. “Crowdsourcing the Humanities: Social Research and Collaboration.” In Collaborative Research in the Digital Humanities, edited by Marilyn Deegan and Willard McCarty, 135–55. Surrey, England: Ashgate Publishing.
 * Rockwell demonstrates how crowdsourcing can facilitate collaboration by examining two humanities computing initiatives. He exposes the paradox of collaborative work in the humanities by summarizing the “lone ranger” past of the humanist scholar. He asserts that the digital humanities, by contrast, are characterized by collaboration because they require a diverse range of skills. Rockwell views collaboration as an achievable value of the digital humanities rather than a transcendent one. He presents case studies of two projects, Dictionary and Day in the Life of Digital Humanities, to illustrate the limitations and promises of crowdsourcing in the humanities. Rockwell argues that the main challenge of collaboration lies in the organization of professional scholarship. Crowdsourcing projects provide structured ways to implement a social, countercultural research model that involves a larger community of individuals.

+ Ross, Stephen, Alex Christie, and Jentery Sayers. 2014. “Expert/Crowd-Sourcing for the Linked Modernisms Project.” Scholarly and Research Communication 5 (4): n.p. http://src-online.ca/src/index.php/src/article/view/186/370.
 * Ross, Christie, and Sayers discuss the creation and evolution of the Linked Modernisms Project. The authors demonstrate how the project balances the study of individual works with the study of the larger field of cultural modernism through digital, visual, and networked methods. Linked Modernisms employs a four-tier information matrix to accumulate user-generated survey data about modernist materials. Ross, Christie, and Sayers argue that the resulting information allows serendipitous encounters with data and emphasizes discoverability. Linked Modernisms is focused on developing modes of scholarly publication that suit the dynamic nature of the data and comply with the principles of open access.

+ Walsh, Brandon, Claire Maiers, Gwen Nally, Jeremy Boggs, and Praxis Program Team. 2014. “Crowdsourcing Individual Interpretations: Between Microtasking and Macrotasking.” Digital Scholarship in the Humanities (formerly Literary and Linguistic Computing) 29 (3): 379–86. http://dx.doi.org/10.1093/llc/fqu030.
 * Walsh, Maiers, Nally, Boggs, et al. trace the creation of Prism, a tool for individual textual markup developed by the Praxis Program at the University of Virginia. Prism was conceived in response to Jerome McGann’s call for textual markup tools that foreground subjectivity; the tool illustrates how different groups of readers engage with a text. Prism is designed to assist with projects that blend two approaches to crowdsourcing: microtasking and macrotasking. The tool balances the constraint necessary for generating productive metadata with the flexibility necessary for facilitating social, negotiable interactions with the textual object. In this way, Prism is poised to redefine crowdsourcing in the digital humanities.

** Wiggins, Andrea, and Kevin Crowston. 2011. “From Conservation to Crowdsourcing: A Typology of Citizen Science.” In 2011 44th Hawaii International Conference on System Sciences (HICSS), 1–10. https://doi.org/10.1109/HICSS.2011.207.
 * Wiggins and Crowston discuss the attributes that many citizen science projects share and attempt to provide a theoretical sampling on which future citizen science projects may rely. The authors argue that most scholarship on citizen science focuses on describing how volunteers are integrated into the various levels of scientific research, without accounting for macrostructural and sociotechnical factors. They believe that this focus comes at the expense of crucial issues of design and process management. Wiggins and Crowston identify and discuss five distinct types of citizen science projects: action, conservation, investigation, virtual, and education. The authors classify these types by their major goals and the extent to which they are virtual. One of the main motivations for developing this typology is to describe the existing state of citizen science and to make clear the conditions necessary for successful citizen science projects.