Research Data Management
General information on research data management at the University of Calgary
What do you mean by "share"?
We are referring to depositing your data in a publicly accessible data repository, making it available through Open Access.
Data repositories ingest both the raw data as well as metadata providing information describing the data, how it was gathered, how it was organized, who gathered it, and so forth. In turn, these repositories usually assign a DOI (or other unique identifier) and ensure that your data is securely stored, can be found by search engines, and is downloadable by interested parties. Some repositories also provide private working space that you can use to securely store your data, remotely access it, and collaborate with colleagues without making this data publicly available.
Why share your data?
Sharing data is useful both to you and to everyone else:
- Making data accessible allows others to verify your research
- Available data encourages others to cite your research
- Deposited data is a primary research object and can be cited just like a publication
- Sharing a dataset can lead to new contacts from potential collaborators, funders, and other interested parties
- It provides a securely-stored, authoritative copy of your data that will be easy to find in 10 years (as opposed to data on a USB drive in the bottom of your desk drawer or that your former student has somewhere).
- You never know who might find the data you have collected useful, or how it might be used in other research. By making your data accessible, you allow scholars from fields you never would have expected to do great things with it. Similarly, you may benefit from their data.
- Ultimately, sharing your data serves as a public good to academia and future research.
Sharing research data after the project is complete is an avenue that more and more scholars are taking. Publishers are also increasingly requiring authors to make available the data that support the results published in accepted articles.
When sharing is difficult
There are several factors that may inhibit sharing your data:
- Ethics considerations concerning human research subjects or sensitive information, such as breeding grounds of endangered species.
- Data licensing issues, where you might have integrated data that you are not licensed to distribute.
- Confidentiality agreements where collaborating scholars or organizations do not want data publicly distributed.
Even in these circumstances it may be possible to share parts of the data that are not sensitive.
Also be aware that if you are worried about others publishing articles based on your data before you are able to, many repositories offer embargo periods during which the data is kept private until you are ready for its release.
Sharing @ UCalgary
We now have a data repository for University of Calgary scholars available at https://dataverse.scholarsportal.info/dataverse/calgary. This Dataverse repository provides a variety of data-specific features: a data-specific metadata template, digital object identifiers (DOIs) for every dataset, built-in linking to associated publications, data previews, file versioning, data-specific licenses, and the potential to create custom data use agreements/restrictions.
There is also a wide variety of third-party repositories that provide free or inexpensive data repository services to scholars. See the list of data repositories below.
There exist a wide variety of data repositories that will provide open access to your data. Many repositories are for specific types of data (e.g., Astronomy, Genetics, Proteins) while others will accept any sort of data.
University of Calgary
Data can be deposited in the University of Calgary's Dataverse. This repository is hosted and maintained by Scholars Portal, but datasets within the University of Calgary Dataverse are managed by UCalgary Libraries and Cultural Resources.
Re3data.org provides a comprehensive listing of disciplinary and institutional repositories that host and share research data.
The following is a list of a few popular discipline-specific and general-purpose data repositories:
- Dryad - frequently used for scientific and medical publications
- ICPSR - a repository commonly used for social sciences data
- Figshare - a general-purpose repository often used in partnership with PLOS publications.
- NCBI Gene - repository for genetic information, managed by the United States' National Center for Biotechnology Information.
- PANGAEA - data from earth and life sciences.
- RCSB Protein Data Bank - data repository for the 3D structures of large biological molecules, including proteins and nucleic acids.
- SIMBAD (Set of Identifications, Measurements, and Bibliography for Astronomical Data) - observational data on astronomical objects that are located outside our solar system.
- SAO/NASA Astrophysics Data System (ADS) - covers data and publications in Astronomy, Astrophysics, and Physics.
- Zenodo - a repository associated with CERN that is open to research outputs from all fields of science.
Your subject librarian can also advise you on available repositories in your field.
Licensing research data
A license determines how others may (and may not) use your data. There are many possibilities, from broad licenses that allow any use to narrower licenses that restrict activities and require attribution to the data creator whenever and wherever the data is used.
The two predominant families of licenses used for research data are Creative Commons and Open Data Commons. Both are widely used, but Open Data Commons licenses are designed specifically for data. That is, the Open Data Commons licenses account for the following differences between data(bases) and creative content:
- In licensing data(bases), one may need to distinguish between the data(base) and its contents. For example, if a data set contains images, the images (i.e., the contents) may need to be licensed separately from the overall data set.
- The distinction between the data(base) and material (content) generated from it (known as “produced works”) is one that is not relevant when licensing “content”. For example, consider using a geospatial database to generate a map (an image). The map is distinct from the database and, as an image, is a classic piece of “content”, but it has been generated from that database. This relationship is different from that between the database and a derivative database (e.g., a database created by adding the locations of post offices to the original database).
- The relationship and prominence of derivative works. Data(bases) are unlike content (but similar to code) in having a high level of reuse (as opposed to simple use or redistribution). For example, “mash-ups” are all about recombining and reusing data. This fact needs to be borne in mind when designing the license, with particular attention paid to the issue of reuse and derivative data(bases): for example, how derivative material must be made available when applying share-alike provisions.
The examples above were selected and paraphrased from http://opendatacommons.org/faq/licenses/.
Open Data Commons (ODC) provides three licensing options:
- ODC Public Domain Dedication and License (PDDL): imposes no restrictions on the use of your data. Others are free to copy, distribute, and use your work as well as produce works from your data and to modify, transform, and build upon your data.
- ODC Attribution License: allows others to copy, distribute, and adapt the data so long as they properly attribute their use of your data.
- ODC Open Database License: allows others to copy, distribute, and use your work; produce works from your data/database; and to modify, transform, and build upon the data/database, as long as they provide you proper attribution and any new works created are made available under this same license.
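The choice among the three ODC options described above can be sketched as a small decision helper. This is only an illustration of the distinctions in the list, not licensing advice: the function name and its two parameters are hypothetical, and it assumes that the only questions being asked are whether attribution and share-alike are required.

```python
def suggest_odc_license(require_attribution: bool, require_share_alike: bool) -> str:
    """Return the name of the ODC license matching the stated requirements.

    Hypothetical helper for illustration only; it encodes just the three
    descriptions given in this guide.
    """
    if require_share_alike:
        # The Open Database License requires attribution AND that new works
        # be released under the same license, so share-alike implies attribution.
        return "ODC Open Database License"
    if require_attribution:
        # Reuse is allowed as long as the data is properly attributed.
        return "ODC Attribution License"
    # No restrictions on use at all.
    return "ODC Public Domain Dedication and License"


if __name__ == "__main__":
    print(suggest_odc_license(require_attribution=True, require_share_alike=False))
```

If your data includes material from others, or is subject to ethics or confidentiality constraints, a simple helper like this will not capture those considerations.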
What is a DOI?
A digital object identifier (DOI) is a unique identifier associated with an object. It is registered together with metadata that makes the object much easier to find and makes it possible to track how the object is being cited. You are likely familiar with seeing DOIs as part of a citation for research articles published in recent years.
Which electronic objects benefit from a DOI?
A DOI can be assigned to pretty much any object, for example:
- book chapters
- data and data sets
- research reports
- theses and dissertations
What are the benefits of DOIs?
- Aids discovery: The metadata registered with a DOI is picked up by search tools including Google, ORCID, and many others, making the object much more visible online.
- Ensures a persistent home: A DOI is a persistent identifier that is available and managed over time. It will stay the same even if the object is renamed, edited, and/or moved.
- Tracks scholarly impact: A DOI allows you to track the number and location of both citing and cited references in the scholarly record. DOIs also help to track altmetrics, for example the number of times an article is tweeted or blogged.
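To make the idea of a persistent identifier concrete, here is a minimal sketch of turning a DOI written in different ways into a single resolvable link. It assumes the public resolver at https://doi.org/, and the regular expression is only a rough heuristic for modern DOIs (a "10." prefix, a numeric registrant code, a slash, and a suffix), not a full validation. The function names and the sample DOI `10.1234/example` are hypothetical.

```python
import re

# Heuristic pattern for modern DOIs: "10.", 4-9 digits, "/", then a suffix.
# This is an assumption for illustration, not an official validation rule.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written in citations and on web pages.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip a leading resolver URL or 'doi:' label and return the bare DOI."""
    value = raw.strip()
    lowered = value.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    value = value.strip()
    if not DOI_PATTERN.match(value):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return value


def doi_to_url(raw: str) -> str:
    """Build a link through the doi.org resolver for the given DOI."""
    return "https://doi.org/" + normalize_doi(raw)


if __name__ == "__main__":
    # Hypothetical DOI used purely as sample input.
    print(doi_to_url("doi:10.1234/example"))
```

Because the link goes through the resolver rather than pointing at a specific web server, it can stay the same even if the object itself is moved.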
How is the University of Calgary assigning DOIs?
Libraries and Cultural Resources is providing a free service for the University of Calgary community to obtain DOIs for their objects. We have partnered with DataCite Canada, a central registration centre for Canadian DOIs, for this service. DOIs can be assigned to:
- Articles published in our Open Journal System journals
- Digitized images published in ContentDM
- Research data deposited into Dataverse, which is assigned a DOI and persistent URL through Scholars Portal
It is also possible for us to issue DOIs for University of Calgary research objects not included in a Library repository. Please contact us for more information.
How can I get started?
To get more information about DOIs, or to request identifiers for your items, please e-mail email@example.com
Datasets require citations just like articles, books, proceedings, etc. This provides credit to the author/producer, verifies the integrity of the content, and helps others find the resource.
A dataset citation includes the same components as any other citation:
- Author(s) / Creator(s)
- Year of publication
- Title
- Publisher (for data sets, this is often the archive/repository where it is housed.)
- Edition or version
- Access information (a URL or other persistent identifier).
While standards for data citation are still in flux, DataCite provides a brief guide, and recommends choosing one of the following citation formats:
- Creator (Publication Year): Title. Publisher. Identifier.
- Creator (Publication Year): Title. Version. Publisher. Resource Type. Identifier.
The UK's Digital Curation Centre also provides a guide on citing datasets and linking to publications.
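The two DataCite citation formats listed above can be sketched as a small formatter. This is an illustrative sketch rather than an official DataCite tool: the class name, field names, and the sample values (including the DOI `10.1234/example`) are hypothetical, and it assumes the longer format is wanted only when both a version and a resource type are supplied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Components of a dataset citation (hypothetical structure for illustration)."""

    creator: str
    year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None
    resource_type: Optional[str] = None

    def render(self) -> str:
        """Render the short format, or the long one if version and resource type are both set."""
        if self.version and self.resource_type:
            # Creator (Publication Year): Title. Version. Publisher. Resource Type. Identifier.
            parts = [self.title, self.version, self.publisher,
                     self.resource_type, self.identifier]
        else:
            # Creator (Publication Year): Title. Publisher. Identifier.
            parts = [self.title, self.publisher, self.identifier]
        return f"{self.creator} ({self.year}): " + ". ".join(parts) + "."


if __name__ == "__main__":
    # All values below are made-up sample data.
    citation = DatasetCitation(
        creator="Doe, Jane",
        year=2017,
        title="Example Survey Data",
        publisher="Example Repository",
        identifier="https://doi.org/10.1234/example",
    )
    print(citation.render())
```

Keeping the components as separate fields makes it easy to re-render the same dataset in whatever style a particular journal or repository asks for.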
- Last Updated: Nov 8, 2017 12:53 PM
- URL: https://library.ucalgary.ca/guides/researchdatamanagement