Finding and reusing research datasets: Using and citing data

Guide on finding secondary data for research and identifying suitable data archives for research dataset deposits.

Using and citing third party research data

When using the ideas and words of other authors in your research publications, you are required to observe limitations placed on you by copyright law, and provide the proper attribution to avoid charges of plagiarism. The same is true when you reuse data shared by other researchers and other organisations. 

When using third party research data, you have a responsibility to respect the rights that may be held by other people or organisations, including copyright, sui generis database rights, and moral rights. 

Normally data are provided under licence. The contents of such licences vary but the terms used in the 'commonly used licences' tab are the most common types. If data are supplied without licence information, terms of use or usage agreement, you should interpret this as meaning that the originator has reserved all rights to the data. 

Using data in the public domain

If you use third party data that have been dedicated to the public domain, you do not have to fulfil any particular legal responsibilities in respect of those data. Nevertheless, you are still expected to act honestly regarding them, meaning you should acknowledge the source of the data in your documentation. You should also acknowledge your use of the data in any research outputs arising from them, preferably in the form of a data citation. 

These are the most commonly used licences for research data. If you have a question about the terms of a licence associated with data that you are using please contact us (research-data@bath.ac.uk). 

 

No derivatives

Under this licence you can use and share the data as they stand but you are not allowed to alter or transform them in any way. As with the case of 'all rights reserved', you are unlikely to be able to do much more than verify results that have already been derived from the data. 

Examples: Creative Commons Attribution-NoDerivs (CC BY-ND), Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

 

Non-commercial 

Some licences forbid data being used for commercial purposes. This is not usually an issue in academic research, but you might come across circumstances where you would not be able to use the data. Examples include undertaking consultancy for an external organisation, applying for a patent, or commercialising your research in some other way. 

Examples: Creative Commons Attribution-NonCommercial (CC BY-NC), Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

 

Share-Alike or Copyleft

A licence with a Share-Alike or Copyleft requirement allows you to make adaptations to the data, and combine them with data from different sources, but if you share the resulting dataset you must apply the same licence to it. 

Some licences are stricter than others about their Share-Alike or Copyleft conditions. Most allow you to use a later version of the same licence. Some allow you to use a functionally equivalent licence, or have explicit compatibility clauses. If you have any questions or concerns about licensing data that have been provided with a Share-Alike or Copyleft licence, please contact us.

Examples: Creative Commons Attribution-ShareAlike (CC BY-SA), Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA).

 

Attribution 

Most licences require you to acknowledge that you have used the resource in question, and many require an explicit acknowledgement of the originator or rights holder.  In addition to the licence requirements, you are also expected to acknowledge in your research outputs any third party data underlying your results, preferably in the form of a data citation.

Examples: Creative Commons Attribution (CC BY) and all other examples given above that include 'BY'.

 

Openness

Some licences have terms that explicitly prevent you from locking down the copies or derivations of the data that you share with others. 

Examples: Public Domain Mark (PD), Public Domain Dedication (CC0)

 

Data licence resources

 

The approach you take regarding archiving third party data depends on how you have used the data and what permissions you have been granted. 

If you have used third party data without altering them in a significant way, and they are already available from a third party archive, you do not need to archive them again. Simply cite the original dataset when you publish your results. 

If they are not available from an archive, check with the data originator to see if they plan to archive the data themselves. If not, and you have permission to do so, you should archive your copy. Be sure that you credit the correct creators and rights holders in the archive record, and apply the licence under which you received them. 

If you used a subset of a third party dataset or database, and it would take some effort to extract the same subset again, you should consider archiving your subset. Ensure you have permission to retain and share your copy, credit the original creators and rights holders, and apply the licence under which you received the data. 

If you have integrated a third party dataset with other data, check the licence you received it under. If you have permission to share the resulting dataset, archive it, remembering to fulfil all relevant licence terms such as those relating to onward licensing, acknowledgement and preservation of notices. If you do not have permission to share the resulting dataset, archive those components of the dataset you do have the rights or permissions to share, and in the documentation provide full instructions for how to obtain the remaining components and derive the final dataset. 

When you use third party data in your research, you must acknowledge this in the resulting research outputs. Ideally, you would cite the data directly, just as you would an academic paper. If this is not possible there are indirect methods you can use instead. 

 

Formatting a reference to a dataset

University of Bath Harvard style

Smith, M., and Jones, G.R., 2015. Title of dataset. Version 1. University of Bath. Available from: http://doi.org/10.15125/12345 [Accessed 1 March 2018]. 

You can omit the version number if this is not provided. 

APA

Smith, M., & Jones, G.R. (2015). Title of dataset. [Data set]. [insert DOI here].

Chicago

Footnote: Melville Smith and G.R. Jones. Title of dataset (accessed March 1, 2018). [insert DOI here].

Reference list: Smith, Melville, and G.R. Jones. Title of dataset (accessed March 1, 2018). [insert DOI here].

MLA

Smith, Melville, and G.R. Jones. "Title of dataset". University of Bath, 2015. Web. 1 March 2018, [insert DOI here].

No publisher or style manual example for referencing datasets

  • Put the names of the data creators where the authors would go.
  • Put the title of the dataset where the report title would go. 
  • Put the year that the version of the dataset you used was released in the place of publication year. 
  • Put the name of the institution hosting the data, or the name of the data archive, as the publisher. 
  • Include an identifier for the dataset at the end of the reference. Use the DOI if the dataset has one. If the dataset has another identifier, give both the scheme and the identifier, for example Accession E-MTAB-01234. If the dataset has no identifier at all, provide the URL of the archive page that describes it, in the usual way for printing URLs. 
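
The rules above can be sketched as a small Python helper. This is only an illustration: the names `DatasetRecord`, `pick_identifier` and `format_reference`, the punctuation of the output, and the choice to print a DOI as a `https://doi.org/` link are all assumptions made for the example, not part of any style manual. Adapt the layout to whatever report format your publisher uses.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class DatasetRecord:
    """Hypothetical container for the details a dataset reference needs."""
    creators: Sequence[str]   # data creators, placed where authors would go
    title: str                # dataset title, placed where the report title would go
    year: int                 # release year of the version of the dataset you used
    publisher: str            # hosting institution or data archive
    doi: Optional[str] = None
    other_id: Optional[Tuple[str, str]] = None  # (scheme, identifier)
    url: Optional[str] = None                   # archive page describing the dataset


def pick_identifier(rec: DatasetRecord) -> str:
    """Prefer the DOI, then another scheme plus identifier, then the archive URL."""
    if rec.doi:
        # Assumption: the DOI is printed as a resolvable link.
        return f"https://doi.org/{rec.doi}"
    if rec.other_id:
        scheme, identifier = rec.other_id
        return f"{scheme} {identifier}"
    if rec.url:
        return rec.url
    raise ValueError("a DOI, another identifier, or a URL is required")


def format_reference(rec: DatasetRecord) -> str:
    """Assemble the reference in an illustrative author-date layout."""
    creators = " and ".join(rec.creators)
    return (f"{creators}, {rec.year}. {rec.title}. "
            f"{rec.publisher}. {pick_identifier(rec)}")


example = DatasetRecord(
    creators=["Smith, M.", "Jones, G.R."],
    title="Title of dataset",
    year=2015,
    publisher="University of Bath",
    doi="10.15125/12345",
)
print(format_reference(example))
```

If the record has no DOI but does have, say, `other_id=("Accession", "E-MTAB-01234")`, the helper falls back to printing the scheme and identifier together, and it uses the archive URL only when neither is available.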

 

Citing datasets indirectly

Some journals are hesitant to include direct citations of datasets and may ask you to remove them. If this happens:

  • Check whether the data originators have published a data paper that you can cite instead; or
  • Include acknowledgment of the dataset as part of your Data Access Statement. Here you should include the name of the data archive, a link to the archive's record of the dataset, and the dataset's identifier if this is not obvious from the link. You should also mention if the dataset has any access restrictions. If the dataset is dynamic, you should include the date and time you accessed it, or the version number if appropriate.  
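
As a rough sketch of the second option, the elements of a Data Access Statement listed above could be assembled like this. The function name `data_access_statement`, its parameters, and the exact wording of the sentences are hypothetical and chosen only for illustration. Check your journal's own guidance for the wording it expects.

```python
from typing import Optional


def data_access_statement(
    archive: str,
    link: str,
    identifier: Optional[str] = None,    # only needed if not obvious from the link
    restrictions: Optional[str] = None,  # any access restrictions on the dataset
    version: Optional[str] = None,       # version number, for versioned datasets
    accessed: Optional[str] = None,      # date and time of access, for dynamic datasets
) -> str:
    """Build an illustrative statement acknowledging a third party dataset."""
    first = f"This work used third party data held by {archive}, available at {link}"
    if identifier:
        first += f" ({identifier})"
    sentences = [first + "."]
    if restrictions:
        sentences.append(f"Access restrictions: {restrictions}.")
    # Prefer the version number where one exists; otherwise record the access time.
    if version:
        sentences.append(f"Version used: {version}.")
    elif accessed:
        sentences.append(f"Accessed: {accessed}.")
    return " ".join(sentences)


statement = data_access_statement(
    archive="Example Archive",
    link="https://example.org/record/1",
    identifier="Accession E-MTAB-01234",
    accessed="1 March 2018, 09:00 UTC",
)
print(statement)
```

The archive name, link and access time in the usage example are placeholders, not a real record.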

 

Citing a subset of data

There are two ways that you can approach the citation of a subset of a larger dataset or database: 

  • Cite the whole database or dataset, then provide the information the reader would need to extract the same subset. If there were multiple or complex steps involved, you may need to include this information in the supplementary data section instead. 
  • If you have permission to do so, archive a snapshot of the subset that you actually used. 

 

Further information on data citation