Making Data Accessible for All: A Conference Report

In recent decades, the status of data as a source of biological knowledge has become a defining characteristic of modern research. Thus, what constitutes data, how data are shared, and what expectations can be placed on open access data are vital questions in modern science. To examine these questions in more detail, a workshop entitled ‘Making Data Accessible to All’ was hosted by the University of Exeter on 12 and 13 July 2012. The workshop was the result of a collaboration between the GARNet and Egenis research networks and was organised by Drs Ruth Bastow and Sabina Leonelli from those centres respectively. Its main theme was to discuss issues surrounding data donation, use and publication, with the aim of producing a series of recommendations about the problems involved in data dissemination in plant sciences.

The 30 participants represented a number of the key stakeholder communities involved in data dissemination, and included researchers and academics, database curators and developers, journal and book publishers, funding bodies, and for-profit data management companies. The workshop was divided into thematic discussion sessions, each prompted by presentations from representatives of these communities.

The workshop opened with an introductory address by Drs Leonelli and Bastow in which they contextualised the need for the workshop and outlined its desired outcomes. These included discussing issues surrounding data donation and publication; identifying challenges unique to plant science with respect to biological and biomedical research; clarifying the extent of data reuse and the need for curation; and developing recommendations which could be used to help researchers and inform the policies of funders and publishers. In particular, the presenters emphasised the general challenges of data sharing: how to share data, what to share, and how best to maintain the shared data. These questions, it was noted, should include all types of data (as evidenced by the ‘data pyramid’ model presented by the Royal Society, 2012), and not be limited to curated databases. This presentation succinctly captured the goals of the workshop and positioned it within the current discussions in the plant science community.

The first session included presentations on the theme ‘Data donation, analysis and use’. Andrew Millar from the University of Edinburgh discussed issues relating to creating, leveraging and sustaining public data when confronted with uncertain funding. Second, Nick Smirnoff from the University of Exeter discussed the intricacies and problems associated with accessing and using metabolomics data. He emphasised the key role that adequate metadata play in minimising these problems, while also highlighting the critical need for initiatives aimed at standardising how data created within these emerging fields of research are interpreted. Jay Moore from the University of Warwick followed with a presentation detailing issues associated with the “bench to web” flow of data in research groups. He discussed various data sharing options, such as wikis and data warehouses. He also elaborated on the SysMO-DB data sharing solution, a web-based platform for finding, sharing and exchanging data, models and processes in systems biology, which was designed to support the SysMO consortium. The final presentation in this session was by Jacob Newman from the University of East Anglia, who discussed data sharing using OMERO, an image repository which facilitates the storage, management, editing and visualisation of images.

The first session was followed by a discussion slot, which focused on the question ‘how are publicly accessible data being used?’ This lively session raised some pertinent points, most of which concerned the downstream curation of databases and data deposits. A considerable amount of the discussion centred on the possible roles journals play in the curation of data, and the limits of their roles as overseers of the data produced by a community. There was also extensive discussion of the role, uses, storage and curation of supplementary data, as well as of who has the responsibility to review, store and curate these data. This, in turn, led to some probing questions pertaining to the definition of an academic paper in light of the emergence of “data journals” and innovative models for credit in data generation. The discussion closed with unanimous recognition of the cost of adequate curation for databases, and of the need for funding bodies to proactively support the long-term management of the data that are generated through grants.

The second session was themed ‘Curating and publishing data’. Presentations were given by four representatives of major general science and plant science journals: Mary Traynor, editor of the Journal of Experimental Botany; Gilles Jonker, Executive Publisher of Agronomy at Elsevier; Ruth Wilson, from the Nature Publishing Group; and Claire Bird, a senior publisher in the life sciences from Oxford Journals. The session showed many shared concerns and ideas about the role of journals in facilitating easy access to the data associated with journal articles. Presenters recognised the importance of making primary data accessible, both to increase the transparency of research articles and for use in further research. At the same time, all the publishers emphasised that journals should not perform the role of major data storage centres. Instead, they suggested that journals should collaborate closely with independent data repositories to provide access to primary data. The role of journal publication requirements in ensuring that researchers’ data were publicly available was also discussed. Whilst many journals required authors’ data to be openly accessible, the presenters also highlighted their sensitivity to the situation of particular research communities. Ruth Wilson brought up the role that journals could play in ensuring that data were citable, and discussed the recent development of data-only journals. Presenters debated new challenges that the current emphasis on data accessibility raised for publishers, such as establishing standards for the peer review of data.

Session three was themed ‘Data curation and management’, and presentations were given by a diverse panel. Mark Hahnel, the founder of Figshare, showed how this data sharing platform provides an immediate and easy way for researchers to make their data ‘citable, sharable, and discoverable’. In order to encourage researchers to share data, Figshare has worked on developing a simple uploading process and provides researchers with metrics about the use of their data. Sean May from NASC, the European Arabidopsis Stock Centre, also discussed the reasons why researchers do and don’t share their data, and how to encourage researchers to be altruistic with data sharing. He pointed out that many researchers are still reluctant to share their data: some do not fulfil promises for data publication, and others bend the rules by establishing short-term web pages on which to publish their data. Peter Burlinson, from the BBSRC, outlined the data sharing policies of the Biotechnology and Biological Sciences Research Council. He emphasised that the BBSRC regarded itself as playing a facilitative role, rather than a prescriptive one, in the sharing of biological data.

The final discussion of the workshop was entitled ‘The impact of data dissemination on plant science research’. Participants continued to discuss important issues from the workshop, including supplementary data and the reluctance of researchers to share data. The discussion focused on the responsibilities of various stakeholders, from governments to individual researchers, in ensuring effective data sharing. Several new issues arose during the conversation, including the importance of ensuring that early career researchers receive adequate training in data management from their institutions.

References

The Royal Society. (2012). Science as an Open Enterprise. London: The Royal Society.

 
