Working with data: Organising data

Guide on working with data

On this page you will find information on structuring folders and files, naming folders and files, version control and links to resources for organising your files. 

Overview of organising your data

Good file and folder organisation will help you to locate, identify and retrieve your data quickly and accurately, therefore making it easier to manage your data. To do this you need to: 

  • use folders to sort out your files into a series of meaningful and useful groups
  • use naming conventions to give your files and folders meaningful names according to a consistent pattern

You should establish a file organisation scheme at the start of each project to avoid having to sort out your files retrospectively:

  • if you are new to a group, or are working in a research facility, check whether there is an established procedure to follow
  • if you are working within a research group, it is essential that the whole group agrees on a file organisation structure so that everyone can find data within the group's shared storage area
  • if you are working alone, it is still important for you to set up a scheme for yourself. 

Document your file organisation scheme in a 'readme' file, preferably in plain text, and store it in the top-level folder for your project where you (or anyone in your group) will be able to access it easily. 

Although these principles are aimed at digital files and folders, it is just as important to organise physical files, folders and other materials in a meaningful, consistent and documented manner. 

Folder structures, file naming conventions and version control

There are many ways of organising your files, so think about what makes sense for your research. If you are doing qualitative work you might want to organise your folders by topic, participant group or data collection method; if you are doing experimental work you might want to organise the results into folders by the date that you did the experiment, or by key experimental condition. 

You can use the following suggestions to help you organise your data:

  • use folders to group files with common properties: think about how you might want to browse your files in the future. 
  • apply meaningful folder names: ensure that you use clear and appropriate folder names that concisely convey the contents of the folder.
  • keep group numbers manageable: too many sub-folders make a structure difficult to navigate, but too few mean that you have to look through numerous files within a folder to find the one that you want to use. 
  • structure folders hierarchically: design a folder structure with broad topics at the highest level and then use sub-folders within these. 
  • separate current and completed work: You may find it helpful to move drafts and completed work into separate folders. This will also make it easier to review what you need to keep as you go along. 
  • keep your raw data separately from the data you are working on: keep a 'raw data' file so that you have a copy of the file before any processing has taken place just in case you need to go back to it. 
  • control access at the highest level: It is easier to set access permissions near the top of the folder structure than to control permissions on sub-folders. You will need to ask Computing Services to set access permissions for folders. 

Example of a folder structure. The arrows indicate the contents of that folder organised into subfolders.
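
As an illustration of the suggestions above, the following Python sketch creates a small hierarchical folder structure with a plain-text readme at the top level. The folder names, the readme wording and the function names are assumptions made for this example, not a structure prescribed by this guide; adapt them to whatever scheme you or your group agree on.

```python
from pathlib import Path

# Hypothetical layout: folder names are illustrative assumptions only.
# Each key is a folder name and each value describes its sub-folders.
LAYOUT = {
    "raw-data": {},
    "working-data": {},
    "analysis": {"drafts": {}, "completed": {}},
    "documentation": {},
}

# Hypothetical readme text documenting the scheme in plain text.
README_TEXT = (
    "Project file organisation\n"
    "raw-data: copies of the data before any processing\n"
    "working-data: the data currently being worked on\n"
    "analysis/drafts: work in progress\n"
    "analysis/completed: finished work\n"
    "documentation: plans, protocols and notes\n"
)


def create_structure(root, layout):
    """Create nested folders under root following the layout dictionary."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for name, children in layout.items():
        create_structure(root / name, children)


def create_project(root):
    """Create the folder skeleton and a readme in the top-level folder."""
    root = Path(root)
    create_structure(root, LAYOUT)
    readme = root / "readme.txt"
    if not readme.exists():  # never overwrite an existing readme
        readme.write_text(README_TEXT, encoding="utf-8")
    return root
```

Calling `create_project("my-project")` would create the folders relative to the current working directory. Existing folders and an existing readme are left untouched, so the function can be run again safely.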

Choosing a file or folder naming convention

Naming conventions are rules that allow electronic and physical records to be named in a consistent and logical way.  Use of consistent and meaningful names will enable you to identify and distinguish between similar records, making it easier to find your data.

You can use the following suggestions to decide how to name your files:

  • keep names short but meaningful. If you use abbreviations keep a record of these abbreviations in a 'readme' file so that others can understand them and use them.
  • include dates in the YYYY-MM-DD format. This makes alphabetical order coincide with chronological order. It also avoids the ambiguity between dates written as MM-DD-YYYY and dates written as DD-MM-YYYY. 
  • avoid using spaces. Use punctuation such as hyphens or underscores to separate words, particularly for files that will be available online.
  • avoid using dots (other than the one before the file extension) and special characters such as / \ : * ? " < > | as these may be reserved for the operating system. 
  • capture relevant information in file names rather than just relying on the date that they were created. 
  • if you are re-using the same file name repeatedly consider using this name for a folder instead. 
  • if you use personal names be sure that these are not the names of study participants (unless the folders are encrypted) and use family name followed by initial. 
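
Some of the suggestions above can be checked automatically. The following Python sketch reports a few common problems in a file name: spaces, the special characters listed above, extra dots, and dates that are not in YYYY-MM-DD order. The function name `check_name` and the wording of the messages are assumptions made for this example, and the check covers only these four rules.

```python
import re

# Special characters listed above as possibly reserved by the operating system.
RESERVED = set('/\\:*?"<>|')

# Something that looks like a date with the year last, e.g. 22-03-2018.
YEAR_LAST_DATE = re.compile(r"(?<!\d)\d{2}-\d{2}-\d{4}(?!\d)")


def check_name(filename):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    # Split off the extension at the last dot, if there is one.
    stem, dot, _extension = filename.rpartition(".")
    if not dot:
        stem = filename
    if " " in filename:
        problems.append("contains spaces")
    if RESERVED & set(filename):
        problems.append("contains reserved characters")
    if "." in stem:
        problems.append("contains dots other than the extension separator")
    if YEAR_LAST_DATE.search(stem):
        problems.append("date is not in YYYY-MM-DD format")
    return problems
```

For example, `check_name("2018-03-22_Subject-A_Audio.mp3")` returns an empty list, while `check_name("22-03-2018 Subject A.mp3")` reports both the spaces and the date order.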

 

Examples of file naming conventions

Here is an example of a file naming convention using the date of file creation, information about the contents of the file and data type:

  • 2018-03-22_Subject-A_Audio.mp3
  • 2018-03-22_Subject-A_Transcript-anonymised.docx
  • 2018-03-22_Subject-A_Transcript-raw.docx
  • 2018-04-08_Subject-B_Audio.mp3
  • 2018-04-08_Subject-B_Transcript-anonymised.docx
  • Interview-plan.docx
  • Readme.rtf
  • Summary.docx
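
A convention like the one above can also be applied programmatically. The following Python sketch builds names of the form date, subject, content and file type, and shows that plain alphabetical sorting then matches chronological order. The function `build_name` and its parameters are hypothetical names chosen for this example.

```python
from datetime import date


def build_name(created, subject, content, extension):
    """Build a name of the form YYYY-MM-DD_Subject_Content.ext."""
    # Hyphens replace spaces inside each part; underscores separate the parts.
    parts = [created.isoformat(), subject, content]
    stem = "_".join(part.strip().replace(" ", "-") for part in parts)
    return f"{stem}.{extension.lstrip('.')}"


names = [
    build_name(date(2018, 4, 8), "Subject B", "Audio", "mp3"),
    build_name(date(2018, 3, 22), "Subject A", "Transcript raw", "docx"),
    build_name(date(2018, 3, 22), "Subject A", "Audio", "mp3"),
]

# Alphabetical sorting puts the files in chronological order
# because the date comes first and is written year-month-day.
for name in sorted(names):
    print(name)
```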

Version control

It is important to be able to distinguish between different versions or drafts of your files. Version control can help you to easily identify the current version of your data or document so that you avoid working on older or outdated copies. If you are working with others it can also help to link each version of the data or document to the time and author of the change.

There are a number of ways that version control can be managed: 

 

File naming

A simple method of version control is to create a copy and then update the version information to create a unique file or folder name. This method is appropriate for data or documents where you are not expecting to have numerous versions or to need to keep track of exactly what has changed between one version and another. 

  • if you expect to generate a small number of versions, a single integer at the end of the file name is likely to be sufficient. For example, 2018-08-09_Asthma-rates_v1.xlsx, 2018-08-09_Asthma-rates_v2.xlsx.
  • if you expect to generate a moderate number of versions, a two-integer scheme will be more useful (for example, 1-0, 1-1, 1-2). The first number is used for major revisions and the second for minor revisions. 
  • for more complex files, a three-integer scheme might be needed (for example 1-0-0, 1-0-1, 1-1-0, 2-0-0)
  • if you are working in a group it is useful to keep track of who has made the change to the data, for example: 2018-08-09_Asthma-rates_v1_AN.xlsx and  2018-08-09_Asthma-rates_v1_AB.xlsx
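
The numbering schemes above can be incremented with a short helper. The following Python sketch finds a version tag such as _v1, _v1-2 or _v1-0-3 at the end of a file name and returns the name of the next version. The function `next_version`, its parameters and the exact pattern it accepts are assumptions made for this example; it only computes a new name and does not copy or rename any file.

```python
import re

# A version tag such as _v1, _v1-2 or _v1-0-3, optionally followed by
# author initials (for example _AN), just before the file extension.
VERSION = re.compile(r"_v(\d+(?:-\d+)*)(?:_[A-Z]+)?(\.[^.]+)$")


def next_version(filename, level=-1, initials=None):
    """Return the file name with its version number incremented.

    level selects which integer to increase: 0 is the first (major)
    number and -1, the default, is the last one. Any numbers after it
    are reset to 0. Existing initials are replaced by the new ones,
    or dropped if none are given.
    """
    match = VERSION.search(filename)
    if match is None:
        raise ValueError(f"no version tag found in {filename!r}")
    numbers = [int(n) for n in match.group(1).split("-")]
    index = level % len(numbers)
    numbers[index] += 1
    numbers[index + 1:] = [0] * (len(numbers) - index - 1)
    tag = "_v" + "-".join(str(n) for n in numbers)
    if initials:
        tag += f"_{initials}"
    return filename[:match.start()] + tag + match.group(2)
```

For example, `next_version("2018-08-09_Asthma-rates_v1.xlsx")` gives `2018-08-09_Asthma-rates_v2.xlsx`, and passing `level=0` to a two-integer name such as `report_v1-2.docx` (a made-up name) gives `report_v2-0.docx`.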

 

Version control tables

These are included within documents and can capture more information than file naming conventions can. Version control tables typically include the new version number, date of the change, person who made the change, and the nature and purpose of the change. 

Version | Date | Name | Summary of change
Version 1.1 | 2018-08-09 | AN | Amended body mass index (BMI) categories to include a separate category for a BMI >40
Version 2.0 | 2018-09-05 | AB | Added geographical location data for each data subject and updated participant identifiers with agreed new identifier structure.
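
If the table is maintained alongside data files rather than inside a word-processed document, it can be generated from structured records. The following Python sketch stores each change with the four fields described above and renders them as plain-text rows. The class `Change`, the function `render_table` and the pipe-separated layout are assumptions made for this example, and the summaries are shortened from the table above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Change:
    """One row of a version control table."""
    version: str
    changed_on: date
    name: str
    summary: str


def render_table(changes):
    """Render the changes as pipe-separated plain-text rows."""
    rows = ["Version | Date | Name | Summary of change"]
    for change in changes:
        rows.append(
            f"{change.version} | {change.changed_on.isoformat()} | "
            f"{change.name} | {change.summary}"
        )
    return "\n".join(rows)


log = [
    Change("Version 1.1", date(2018, 8, 9), "AN", "Amended BMI categories"),
    Change("Version 2.0", date(2018, 9, 5), "AB", "Added geographical location data"),
]
print(render_table(log))
```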

 

Version control systems

These are automated systems that store a repository of files and monitor access to them, logging who has made what change and when. They are essential for the development of software or complex code where updates may be released to users. They are also particularly useful for collaborative work on data or on code. Computing Services provide an institutional GitHub service, and there is online guidance on using Git for version control and on using GitHub. Please contact Computing Services for more information on this service. 

 
