Open Social Scholarship Annotated Bibliography/Data Management

Category Overview
Data management concerns effective, systematic methods for organizing data and documents. The works included here address metadata, database management, and data visualization (Fear 2011; Hedges, Hasan, and Blanke 2007). Some articles investigate ethical uses of data obtained from research, as well as accountability mechanisms and guidelines to ensure that collected data is properly managed, stored, and preserved (Krier and Strasser 2014; Lewis 2010; Romary 2012; Surkis and Read 2015; Wilson et al. 2011). The resources in this category address what can be done with data collected from research projects and how research can be conducted more efficiently, with specific attention paid to data preservation and curation strategies (Krier and Strasser 2014; Yakel 2007). Overall, the resources address the lifecycle of data management and the infrastructural mechanisms necessary for effective governance of digital information.

Annotations
Akers, Katherine G., and Jennifer Doty. 2013. “Disciplinary Differences in Faculty Research Data Management Practices and Perspectives.” International Journal of Digital Curation 8 (2): 5–26. http://ijdc.net/index.php/ijdc/article/view/263.
 * Akers and Doty conduct a survey on disciplinary differences in faculty research data management practices and perspectives. The authors divide faculty members into four broad research domains: arts and humanities, social sciences, medical sciences, and basic sciences. The percentages of faculty per area are considered, as well as attitudes toward open access data and familiarity with basic terms of data management. The survey also seeks to understand faculty attitudes toward digital documentation and preservation. Both authors worked to give Emory University researchers Shibboleth-authenticated access to the DMPTool, which walks researchers through creating data management plans for grant proposals. The authors also point out that OpenEmory, the current institutional repository, does not support further research data development, and that more effort could be focused on facilitating the deposit of data in disciplinary repositories or on setting up instances of the Dataverse Network. Serious consideration of both similarities and dissimilarities among disciplines can guide academic librarians in developing a range of data management related services.

Corrall, Sheila, Mary Anne Kennan, and Wasseem Afzal. 2013. “Bibliometrics and Research Data Management Services: Emerging Trends in Library Support for Research.” Library Trends 61 (3): 636–74.
 * Corrall, Kennan, and Afzal analyze current trends in library support for research. Funding bodies are increasingly viewing libraries as “bottomless pits” rather than self-evident positive support for researchers, especially as the web becomes more accessible and user friendly (qtd. in Wood, Miller, and Knapp 3). According to the authors, e-research should provide libraries with the impetus to extend their services beyond the material archive. Libraries in the US, such as MIT’s libraries, are quicker to adapt to digital services, and the Association of Research Libraries in 2009 found 21 libraries that already provide infrastructure or support for e-science and another 23 that intend to do so. The authors conducted a questionnaire that asked respondents questions about their organizations, bibliometrics, research data management, and future plans. Corrall, Kennan, and Afzal suggest that academic librarians involved in research support need to understand governmental and institutional research agendas so that they can support strategy and policy development and implementation.

Crompton, Constance, Cole Mash, and Raymond G. Siemens. 2015. “Playing Well with Others: The Social Edition and Computational Collaboration.” Scholarly and Research Communication 6 (3). http://src-online.ca/src/index.php/src/article/view/111/431.
 * Crompton, Mash, and Siemens study the use of microdata formats to include larger groups of researchers and editors working on a digital social edition. They also provide readily parsable data about the content of A Social Edition of the Devonshire Manuscript, the main object of their study. The authors argue that adopting linked data standards allows for interconnection between texts and virtual collaboration across projects and scholars. Crompton, Mash, and Siemens explain how Resource Description Framework in Attributes (RDFa) is well suited for academic projects and elaborate on the idea of encoding for the Semantic Web. They discuss technical decisions that would shift the encoder’s focus to data entry instead of the technical details of encoding. In their conclusion, the authors suggest that with the RDFa enhancement, A Social Edition of the Devonshire Manuscript will provoke new research questions around the culture and contexts of the Tudor court.
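The linked-data idea the authors describe can be illustrated with a minimal sketch: entities in an edition become URI-identified subjects of triples, so statements from independent projects that use the same identifier merge cleanly. All names and prefixes below are hypothetical placeholders, not drawn from the edition itself.

```python
# Minimal illustration of linked data: statements are stored as
# (subject, predicate, object) triples keyed by shared identifiers.
# All identifiers here are invented placeholders.

edition_triples = {
    ("ex:DevonshireMS", "dc:title", "A Social Edition of the Devonshire Manuscript"),
    ("ex:poem42", "dc:contributor", "ex:MargaretDouglas"),
}

# A second, independently produced dataset about the same person.
other_project_triples = {
    ("ex:MargaretDouglas", "foaf:name", "Margaret Douglas"),
}

# Because both datasets use the same identifier for the person,
# a plain set union links the poem to the biographical data.
merged = edition_triples | other_project_triples

def objects_of(triples, subject, predicate):
    """Return all objects matching a (subject, predicate, ?) pattern."""
    return {o for s, p, o in triples if s == subject and p == predicate}

contributor = objects_of(merged, "ex:poem42", "dc:contributor").pop()
print(objects_of(merged, contributor, "foaf:name"))  # {'Margaret Douglas'}
```

The same cross-project querying is what RDFa markup enables at web scale: the attributes embed such triples directly in the edition's HTML.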

Fear, Kathleen. 2011. “‘You Made It, You Take Care of It’: Data Management as Personal Information Management.” International Journal of Digital Curation 6 (2): 53–77. http://www.ijdc.net/index.php/ijdc/article/view/183.
 * Fear’s article explores data management at the University of Michigan, investigates the factors that have shaped the practices of researchers, and seeks to understand the motives for extending or inhibiting changes in data management practices. She argues that institutions should have an interest in protecting the data of their researchers. For Fear, improving infrastructure for data sharing and accessibility is one way of improving data management standards. She conducts a survey with questions such as whether the researcher believes data to be personal information, how researchers manage their data over the short term, what kind of data management plans are provided when researchers apply for funding, what methods they use to preserve data over the long term, and how familiar they are with the basics of data management. The study concludes with the observation that data management is part of a continuum of processes that tend to blur together as researchers move from document to document. According to Fear, researchers regard separating data management from other research activities as confusing and counterproductive.

Harth, Andreas, Katja Hose, and Ralf Schenkel, editors. 2014. Linked Data Management. Boca Raton, FL: CRC Press.
 * Harth, Hose, and Schenkel edit an anthology that covers the concept of linked data management. The anthology begins by describing how modern computers still struggle with the idiosyncratic structure and semantics of natural language due to ambiguity. The authors outline many of the key concepts in emerging linked data management systems, including RDF vocabularies and foundational terms such as the Semantic Web. A list of SPARQL and OWL queries is given, and the authors state that the novel Web of Data requires new techniques and ways of thinking about databases, distributed computing, and information retrieval. Topics range from the digital architecture of linked data applications, to the Bigdata RDF Graph database, to different methods of query processing.
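The core retrieval operation such systems perform can be sketched with a toy triple store: SPARQL queries match basic graph patterns in which variables stand in for unknown terms. The following is a stdlib-only sketch of that idea with invented data, not a real RDF store or SPARQL engine.

```python
# Toy illustration of the pattern matching that underlies SPARQL basic
# graph patterns: None acts as a variable that matches any term.
# Data and identifiers are invented for the example.

triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:age", "42"),
]

def match(pattern, store):
    """Return all triples matching an (s, p, o) pattern; None is a wildcard."""
    return [
        t for t in store
        if all(term is None or term == value for term, value in zip(pattern, t))
    ]

# Analogous to: SELECT ?o WHERE { ex:alice ex:knows ?o }
print(match(("ex:alice", "ex:knows", None), triples))
```

A production store indexes triples by subject, predicate, and object rather than scanning linearly, but the matching semantics are the same.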

Hedges, Mark, Adil Hasan, and Tobias Blanke. 2007. “Management and Preservation of Research Data with iRODS.” In Proceedings of the ACM First Workshop on CyberInfrastructure: Information Management in eScience, 17–22. http://dl.acm.org/citation.cfm?id=1317358.
 * Hedges, Hasan, and Blanke provide recommendations for the management and preservation of research data using the integrated Rule-Oriented Data System (iRODS). iRODS is a recently developed automated, scalable digital preservation tool, equipped with a Rule Engine, which allows the system to actively react to events. The Rule Engine allows iRODS data grids to exceed previous limitations through a flexible mechanism for implementing application-specific processing. The article provides information on driver requirements for managing large amounts of data, curation and preservation, automation, and transparency, as well as a list of rules used to implement preservation and data management. iRODS is capable of executing rules conditionally and can define multiple rules to implement alternative means towards the same goal simultaneously. The authors conclude that they will continue with an analysis of different preservation strategies and procedures currently followed by the Arts and Humanities Data Service archive in order to increase the automation and reliability of the preservation process.

Henty, Margaret, Belinda Weaver, Simon Bradbury, and Simon Porter. 2008. “Investigating Data Management Practices in Australian Universities.” APSR. http://eprints.qut.edu.au/14549/1/14549.pdf.
 * Henty, Weaver, Bradbury, and Porter conduct a survey on changing expectations for the provision of data management infrastructure in Australian universities. Most of the respondents are academic staff, with significant postgraduate student participation and a low response rate from emeritus or adjunct professors. The questions asked of respondents are oriented towards researcher awareness of digital data, the types of digital data collected, the sizes of the data collections, the software used for analysis and manipulation of digital assets, and research data management plans. The questions also concern institutional responsibility and structure for data management, such as whether researchers outside the team are allowed to access shared research data, and how the data is accessed and used. Henty et al. compile data from the Queensland University of Technology, the University of Melbourne, and the University of Queensland.

Jackson, Michael, Mario Antonioletti, Bartosz Dobrzelecki, and Neil Chue Hong. 2011. “Distributed Data Management with OGSA-DAI.” In Grid and Cloud Database Management, 63–86.
 * Jackson, Antonioletti, Dobrzelecki, and Chue Hong outline the OGSA-DAI framework for sharing and managing distributed data. The system can manage and share relational data, XML files, and RDF triples. The chapter provides basic definitions of workflows and how they are executed, list markers and how they are used to group outputs, concurrent execution, client requests, and how to access the OGSA-DAI framework. Several graphs and taxonomies are provided to illustrate workflows and workflow execution. The authors suggest that data delivery is slower through web services than through direct methods such as FTP and GridFTP, and outline OGSA-DAI’s approach to security, distributed query processing, relational views, interoperability, and performance requirements, along with a list of related programs. Complete data abstraction is not possible with the program; however, it can be used to build higher-level capabilities and enhance distributed data management.

Jackson, Mike, Mario Antonioletti, Alastair Hume, Tobias Blanke, Gabriel Bodard, Mark Hedges, and Shrija Rajbhandari. 2009. “Building Bridges Between Islands of Data – an Investigation into Distributed Data Management in the Humanities.” In 2009 Fifth IEEE International Conference on E-Science, 33–39.
 * Jackson, Antonioletti, Hume, Blanke, Bodard, Hedges, and Rajbhandari contribute their research to conference proceedings on digital data management of ancient and classical materials. The islands of data are resources that are separate from larger repositories and often largely inaccessible. The authors discuss the LaQuAT project: an initiative that attempts to link and query ancient texts through the cooperation of a group of diverse experts from different institutions. The article describes the databases constructed under the LaQuAT initiative, including Project Volterra, a database of Roman legal texts and associated metadata, and the HGV, a database of papyrological metadata in relational and TEI XML formats. A problem shared by initiatives such as these is the contamination of data by control characters, which can invalidate XML documents. Database drivers and lack of funds can also pose considerable roadblocks.
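The control-character problem the authors mention is easy to reproduce: XML 1.0 forbids most characters below U+0020, so a single stray byte makes an otherwise valid document unparsable. A stdlib sketch of detecting and stripping such characters follows (the sample text is invented for illustration):

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 permits only tab (0x09), newline (0x0A), and carriage return
# (0x0D) among the C0 control characters; anything else is invalid.
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

dirty = "<text>Imperial rescript\x0b of Diocletian</text>"  # stray VT byte

try:
    ET.fromstring(dirty)
except ET.ParseError as err:
    print("rejected:", err)  # parser refuses the whole document

# Stripping the illegal characters makes the document parsable again.
clean = ILLEGAL_XML_CHARS.sub("", dirty)
root = ET.fromstring(clean)
print(root.text)
```

In a migration pipeline, a pass like this would typically log the removed characters rather than discard them silently, since they may mark legacy-encoding damage worth auditing.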

Johnston, Lisa, Meghan Lafferty, and Beth Petsan. 2012. “Training Researchers on Data Management: A Scalable, Cross-Disciplinary Approach.” Journal of eScience Librarianship 1 (2): 79–87. http://escholarship.umassmed.edu/jeslib/vol1/iss2/2/.
 * Johnston, Lafferty, and Petsan offer advice on how to train researchers on data management through a scalable, cross-disciplinary approach. The authors describe the curriculum, implementation, and results of research data management training offered by the University of Minnesota Libraries. Johnston, Lafferty, and Petsan provide a description of Minnesota’s “Creating a Data Management Plan” workshop, which trains university faculty and researchers on the basics of file management, metadata standards, and data accessibility. The research team conducts a survey to understand workshop attendee roles, college affiliations, and the most useful parts of the workshop. The workshop leaders introduced a team-teaching approach that has had an overwhelmingly positive impact on the libraries’ ability to respond to research data management needs.

Jones, Sarah, Alexander Ball, and Çuna Ekmekcioglu. 2008. “The Data Audit Framework: A First Step in the Data Management Challenge.” International Journal of Digital Curation 3 (2): 112–20. http://ijdc.net/index.php/ijdc/article/view/91.
 * Jones, Ball, and Ekmekcioglu provide a summary of their tool, the Data Audit Framework, which provides organizations with the means to identify, locate, and assess the current management of their research assets. The framework was designed to be applied without dedicated or specialist staff, making librarians suitable auditors for the program. Common issues plaguing data management at the institutional level are storage, metadata, lack of awareness of data policy, and a lack of mechanisms for handling long-term legacy data. The authors argue that institutional data policies with guidance on best practices in data creation, management, and long-term preservation would greatly assist departments in maintaining digital assets. They then provide a list of organizations from which departments can receive advice on best practice, along with services that can equip postgraduates and department members with the support needed to produce sound data management plans. The Data Audit Framework identifies main data issues, including areas where data is at risk, and helps to develop solutions.

Krier, Laura, and Carly A. Strasser. 2014. Data Management for Libraries: A LITA Guide. Chicago: ALA TechSource.
 * Krier and Strasser’s guide to data management for libraries is intended for libraries that are in the early stages of initializing data management programs at their institutions. The opening chapters provide definitions of data management and different types of research data, as well as of curation and the data lifecycle. The guide contains advice on how to start a new service and point-form questions to help the reader decide what kind of plan works best for their institution. The authors suggest identifying researchers who are receptive to working with the library and who request assistance with data management plans or curation services. An overview of descriptive, administrative, and structural metadata is provided, along with an explanation of its role in data management. The differences between storage, preservation, and archiving are discussed, along with definitions of domain and institutional repositories. The authors then briefly describe the preservation process. The final chapters loosely cover access and data governance issues that have caused problems with data management in the past.

Lewis, M.J. 2010. “Libraries and the Management of Research Data.” In Envisioning Future Academic Library Services, edited by S. McKnight, 145–68. London: Facet Publishing.
 * Lewis begins his chapter by asking the rhetorical question of whether managing data is a job for university libraries. He argues that helping to manage data as part of the global research knowledge base is part of the university library’s role; however, the scale of the challenge requires concerted action by a range of stakeholders, not all of whom are necessarily library employees. Lewis advises that institutions develop several policies for research data management: developing library workforce data confidence, providing research data advice, developing research data awareness, teaching data literacy to postgraduate students, bringing data into undergraduate research-based learning, developing local data curation capacities, identifying required data skills with LIS schools, leading local data policy development, and influencing national data policy. Non-trivial research funding is needed for these initiatives and should be funneled through a primary “pathfinder” phase of two years from major research councils. Lewis concludes with the observation that developing such training requires award-bearing programs (Masters-level training for data managers and for data scientists looking to pursue career-track positions in data centres), short-course accredited provision, and training for data librarians.

Research Data Canada. 2013. “Research Data Canada Response to Capitalizing on Big Data: Towards a Policy Framework for Advancing Digital Scholarship in Canada.” http://www.rdc-drc.ca/wp-content/uploads/Research-Data-Canada-Response-to-the-Tri-Council-Consultation-on-Digital-Scholarship.pdf.
 * Research Data Canada looks at foundational elements for scholarship in Canada: stewardship, coordination of stakeholder engagement, and development of capacity and future funding parameters. The document emphasizes the importance of coordination, and the need for clear guidelines and policies to achieve exemplary digital scholarship in Canada. The authors suggest that addressing the following four areas would strengthen the paper: long-term data curation, development of data professionals, data generated by government-based research and private research data, and engagement with the international data community. The authors conclude by committing to full engagement in the on-going discussion on behalf of Research Data Canada.

Romary, Laurent. 2012. “Data Management in the Humanities.” ERCIM News 89. April 3, 2012.
 * Romary describes several data management tools in the humanities. The first tool Romary describes is HAL, a multi-disciplinary open access archive for the deposit and circulation of scientific research documents, regardless of publication status. The author then shifts focus to the Digital Research Infrastructure for the Arts and Humanities (DARIAH) project, which aims to create a solid infrastructure to ensure the long-term stability of digital assets and the development of a wide range of associated services. DARIAH depends on the notion of digital surrogates, which can be metadata records, scanned images, digital photographs, or any kind of extract or transformation of existing data. A unified data landscape for humanities research would stabilize the experience of researchers in circulating their data. Romary suggests that an adequate licensing policy must be defined to assert the legal conditions under which data assets can be disseminated, and that researchers involved with projects such as DARIAH need to converse with data providers on how to create a seamless data landscape.

Sakr, Sherif, and Eric Pardede. 2012. Graph Data Management: Techniques and Applications. Hershey, PA: IGI Global.
 * Sakr and Pardede’s anthology of essays on techniques and applications of graph data management covers the use of graphs in the semantic web, social networks, biological networks, protein networks, chemical compounds, and business process models. The mechanisms for the main types of graph queries are prioritized throughout the collection, and authors consider both algorithmic and applied perspectives. The book covers data storage, labeling schemes, data mining, matrix decomposition, and clustering vertices in weighted graphs. The editors claim that the anthology provides a comprehensive perspective on how graph databases can be effectively utilized in different situations.

Schmidt, Albrecht, Florian Waas, Martin Kersten, Michael J. Carey, Ioana Manolescu, and Ralph Busse. 2002. “XMark: A Benchmark for XML Data Management.” In Proceedings of the 28th International Conference on Very Large Data Bases, 974–85.
 * Schmidt, Waas, Kersten, Carey, Manolescu, and Busse discuss emerging XMark technology and its role in XML data management. The authors argue that XML is currently in great need of new benchmarks to provide coverage for XML processing. The XMark benchmark features a toolkit for evaluating the retrieval performance of XML stores and query processors. It contains a workload specification, a scalable benchmark document, and a comprehensive set of queries designed to feature natural and intuitive semantics. The authors then provide an outline of XML query processing and related work, database description, hierarchical element structure in XML, as well as benchmark queries. Schmidt et al. conduct an experiment that uses six different systems to measure size/bulk load time ratios operating with XMark. The experiment concludes with an analysis of the essential primitives of XML processing in data management systems. The authors suggest that a W3C standard still needs to be defined and specifications should be updated.
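XMark's queries run against a synthetic auction-site document; the flavour of that kind of retrieval can be sketched with Python's stdlib XML tools. The miniature document below is invented for illustration and is not the actual benchmark data or query set.

```python
import xml.etree.ElementTree as ET

# A made-up miniature in the spirit of XMark's auction document.
doc = """
<site>
  <open_auctions>
    <open_auction id="a1"><current>35.00</current></open_auction>
    <open_auction id="a2"><current>120.50</current></open_auction>
  </open_auctions>
</site>
"""

root = ET.fromstring(doc)

# Benchmark-style query: the current price of every open auction.
prices = [float(a.findtext("current")) for a in root.iter("open_auction")]
print(prices)  # [35.0, 120.5]
```

The benchmark's real queries stress far more than this (joins, full-text conditions, document order), which is why it measures query processors rather than simple tree walks.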

Surkis, Alisa, and Kevin Read. 2015. “Research Data Management.” Journal of the Medical Library Association 103 (3): 154–56.
 * Surkis and Read provide an introductory resource for librarians who have had little or no experience with research data management. Basic concepts are defined, such as the fluidity of data in process and analysis as well as the data lifecycle. The authors suggest that the line between publications and data is blurry, and that data management is essential in making data and publications discoverable. This, they argue, is a central task of the librarian. The authors then recommend the online course, MANTRA: Research Data Management Training, to introduce librarians and researchers to the topic.

Venugopal, Srikumar, Buyya Rajkumar, and Kotagiri Ramamohanarao. 2006. “A Taxonomy of Data Grids for Distributed Data Sharing, Management, and Processing.” ACM Computing Surveys 38 (1): 1–53. http://dl.acm.org/citation.cfm?id=1132955.
 * Venugopal, Buyya, and Ramamohanarao provide a taxonomy of data grids for distributed data sharing, management, and processing. The authors present grid computing as a paradigm that aggregates geographically distributed storage and network resources for unified, secure, and pervasive access to their combined capabilities. The study contains a comprehensive discussion on data replication, resource allocation, and scheduling. Venugopal, Buyya, and Ramamohanarao focus on the architecture of data grids, as well as the fundamental requirements of data transport mechanisms, data replication systems, and resource allocation and job scheduling.

Ward, C., L. Freiman, L. Molloy, S. Jones, and K. Snow. 2011. “Making Sense: Talking Data Management with Researchers.” International Journal of Digital Curation 6 (2): 9–17. http://eprints.gla.ac.uk/49201/.
 * Ward, Freiman, Molloy, Jones, and Snow cover the goals and methods of Incremental, a program that identifies institutional requirements for digital research data management and pilots relevant infrastructure projects. The majority of projects piloted are soft infrastructure designed to break down the barriers that information professionals have unintentionally built with the use of specialist terminology. The authors note that researchers organize their data in an ad hoc fashion and that a lack of clear file naming practices and version control leads to difficulties when retrieving legacy data later. Language barriers and late starts to digital preservation are both substantial barriers in accessing legacy works and new research. Researchers indicated that they desire diverse, web-based modes of training (online tutorials, videos, and interactive learning resources). The authors argue that collating and repurposing existing guidance, training, and support will be effective in the long run.

Wilson, James A.J., Luis Martinez-Uribe, Michael A. Frazer, and Paul Jeffreys. 2011. “An Institutional Approach to Developing Research Data Management Infrastructure.” International Journal of Digital Curation 6 (2): 274–87. http://ijdc.net/index.php/ijdc/article/view/198.
 * Wilson, Martinez-Uribe, Frazer, and Jeffreys suggest that the University of Oxford needs to develop a centralized institutional platform for managing data through all stages of its life cycle that mirrors the framework of the institution in its highly federated structure. The Bodleian Libraries is currently developing a data repository system (Databank) that promises metadata management and resource discovery services. Researchers are given the role of guiding and validating each strand of data development as projects progress. Institutional data management is favoured over the establishment of national repositories. The authors conclude with the suggestion that data management might be better placed in, or integrated with, cloud-based services that are implemented in institutions but do not belong to them.

Yakel, Elizabeth. 2007. “Digital Curation.” OCLC Systems and Services: International Digital Library Perspectives 23 (4): 335–40. http://www.emeraldinsight.com/doi/full/10.1108/10650750710831466.
 * Yakel’s article on digital curation provides an overview of the basic aspects necessary to ensure that digital objects will be maintained, preserved, and available for future use. Yakel remarks that digital curation is becoming an umbrella concept that includes digital preservation, data curation, records management, and digital asset management. She briefly traces the history of the term “digital curation” from its first use in the National Science Foundation’s 2003 report to a later article by Liz Lyon. The Digital Curation Centre in the UK defines digital curation as the maintenance and adding of value to a trusted body of digital information for current and future use. Yakel provides definitions of the term from various official organizations. The article concludes by suggesting that the range of diverse definitions of digital curation has brought the scientific, educational, and professional communities together with governmental and private sector organizations.