Library: Working with data: Transcription

What are Transcription Services?

Transcription services convert speech into a written or electronic text document. They can work on either live speech or a recording. Transcription is an important part of qualitative research, enabling interview or focus group recordings to be converted into a transcript that can then be analysed.

Conversion of an interview recording to text also enables de-identification (removal of explicit identifiers) or anonymisation (processing the data so that identification is impossible). Anonymous transcripts are easier to work with, as they are no longer covered by data protection legislation and so can be stored without encryption. De-identified transcripts may still contain indirect identifiers and should be treated more carefully depending on the potential for harm. Transcripts can also be archived and published as an output of the research project, enabling other researchers to verify, re-use, and build on research.
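As an illustration of de-identification, the following minimal Python sketch replaces a list of known explicit identifiers with placeholders. The function name, the sample names, and the placeholder labels are all hypothetical and are not part of any University tool. It only handles identifiers you list yourself, so a manual read-through is still needed to catch indirect identifiers.

```python
import re

def deidentify(text, replacements):
    """Replace each known identifier with its placeholder.

    Matching is case-insensitive and limited to whole words.
    """
    # Process longer identifiers first so that "Jane Smith" is
    # replaced before the shorter "Jane".
    for identifier in sorted(replacements, key=len, reverse=True):
        placeholder = replacements[identifier]
        pattern = re.compile(r"\b" + re.escape(identifier) + r"\b", re.IGNORECASE)
        # A function is used as the replacement so the placeholder
        # is inserted literally.
        text = pattern.sub(lambda match, p=placeholder: p, text)
    return text

# Hypothetical identifiers and placeholders, for illustration only.
replacements = {
    "Jane Smith": "[PARTICIPANT_1]",
    "Jane": "[PARTICIPANT_1]",
    "Riverside School": "[SCHOOL_A]",
}

sample = "Interviewer: Thanks, Jane. Jane Smith: I taught at Riverside School for ten years."
print(deidentify(sample, replacements))
```

Using the same placeholder for every form of a participant's name keeps the transcript readable for analysis while removing the explicit identifier.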

Transcription can either be carried out manually by a person listening to the recording and typing out the dialogue, or automated using speech recognition software.

Before starting transcription

In order to create a high-quality, accurate transcript that is suitable for analysis, you should make decisions on the following:

What audio should be transcribed: the entire recording or only a subset?
Is a full transcription required, or only a summary of the key points?
How much time is available to allocate to transcription?
How will the accuracy of the transcripts be established?

The Finnish Social Science Data Archive describes four levels of data transcription to help you determine which is appropriate for your project.

Each level is described below, first in terms of what it involves and then in terms of how the transcript can be used.

Summary transcription

Only key points and topics raised are noted, with a few selected quotations recorded verbatim.

Interpretation is important as the transcriber decides what is transcribed.

Insufficient for in-depth analysis or reuse of the data, but can be useful for prioritising full transcription.

Basic level transcription

An accurate transcript of the participants' words and any significant expressions of emotion.

Statements or sounds not relevant to the discussion, repeated or cut-off words, fillers ("you know"), and non-lexical sounds ("uh", "ah") are left out.

Can be used when the focus is on analysing the content of speech.

This is the minimum level required for data sharing and archiving.

Exact transcription

All speech is transcribed verbatim with nothing left out.

Fillers, repeated or cut-off words, and non-lexical sounds are included in the transcription, as are expressions of emotion (laughter, sighs, upset), emphasis, or stress.

Pauses are timed (in seconds) and background noises or disturbances are noted.

Often used when the focus of analysis is expressions and interaction.

Allows for varied and rich reuse of data.

Conversation analysis transcription

Full verbal transcription using standardised notation symbols.

Careful reproduction of colloquial speech patterns.

Includes all words, timed pauses (in seconds), intonation, volume, word stress, and non-lexical actions (sneezes, breaths, sighs, facial expressions).

The most detailed level of transcription, which represents the conversation event in as much detail as possible, in a textual format.

Allows for varied and rich reuse of data.

Finnish Social Science Data Archive
Qualitative and quantitative social science datasets available in English and Finnish. To search the catalogue click 'Download data from Aila'.

Types of transcription

Human Transcription

Human transcription can mean either conducting the transcription yourself, or outsourcing the work to an external transcription service.

If you conduct the transcription yourself, there are tools available that can help to make this process easier, such as foot pedals to control the recording as you type. There are also online services that provide browser-based tools to assist with your own transcription.

If you outsource the work, it typically goes to an external transcription service that specialises in converting audio recordings to structured text.

The length of time necessary should not be underestimated - it can take 4-7 hours to transcribe an hour of audio.
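To make this concrete when planning a project, the arithmetic can be sketched in a few lines of Python. The function name and the example project (12 interviews of 45 minutes each) are hypothetical. The 4-7 ratio is the range quoted above and is a rough guide rather than a guarantee.

```python
def estimate_transcription_hours(audio_minutes, low_ratio=4.0, high_ratio=7.0):
    """Return a (low, high) estimate, in hours, of manual transcription time.

    The ratios are hours of work per hour of audio.
    """
    if audio_minutes < 0:
        raise ValueError("audio_minutes must not be negative")
    audio_hours = audio_minutes / 60
    return audio_hours * low_ratio, audio_hours * high_ratio

# Hypothetical project: 12 interviews of 45 minutes each (9 hours of audio).
low, high = estimate_transcription_hours(12 * 45)
print(f"Estimated transcription time: {low:.0f}-{high:.0f} hours")
# prints "Estimated transcription time: 36-63 hours"
```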

Advantages and disadvantages of each human transcription method:

Self-transcription

Advantages:
No costs involved.
Equipment available to help within the University.
Fully compliant with data protection legislation.
Can anonymise as you transcribe.

Disadvantages:
Time consuming.

External transcription services

Advantages:
Quicker to use than doing it yourself.
More accurate transcripts than automated versions.

Disadvantages:
Data sharing agreement required prior to data transfer.
Terms and conditions may not be compatible with data protection legislation.
Requires security checks for storage and data transfer.

Automated Transcription

Automated transcription uses speech recognition software to create a transcript. The software may be an embedded feature of an existing application or a stand-alone online application.

Advantages:
Very quick to use.
Lower costs than human-based services.

Disadvantages:
Often based in the US, and so not compatible with UK data protection legislation.
Lower accuracy when dealing with fast speech, accents, slang, or audio distortion.
Cannot provide custom transcript formats.

Transcribing your own interviews

There are tools available to help you transcribe your own interviews. These typically help you control playback of the recording, so that you can concentrate on typing your transcription.

The AV (Audio Visual) Unit provide transcription pedals to help you control playback of the recording as you transcribe your interview. There is a booking form for equipment.

Alternatively, researchers associated with Psychology can request the loan of a foot pedal and access to a limited number of copies of the Express Scribe software. These are managed by the Department of Psychology. Please contact Susie Martin or Nathan Taylor for further information.

oTranscribe.com is a browser-based tool that allows you to play back your recording within your web browser and capture time stamps within your transcription.

Both the recording and the transcription remain in the browser on your local computer, as long as external browser services aren't used. There is therefore no requirement for a data sharing agreement or consent from participants for use of external services.

The following settings must be used when using oTranscribe:

Use private browsing mode on your browser.
Ensure no auto sign-in to services.
Clear your browser cache when you have completed the transcription.
Disable services such as Grammarly, as these would provide external access to the content of your transcription.
Avoid exporting files to non-enterprise-grade cloud storage services (please see the 'Storage of sensitive data' section on the Sensitive data LibGuide).

To extract your transcription, you should avoid using the export function, but instead copy and paste the text and time stamps from within your browser window into a separate text document stored on secure storage.

You can also use the transcription service available through Teams. The resulting transcript will not be entirely accurate, as some words can be misrecognised, but it is editable. Only use an institutional Teams account, not a personal account: the University's provision of Teams is enterprise-grade and therefore compliant with the University's data policies and UK data protection legislation. For further information, please see the 'Storage of sensitive data' section on the Sensitive data LibGuide.

As Teams can also record video, you may need to disable this feature if it is not needed.

Finally, if you intend to use Teams to record a group session, participants may wish to sign in with a pseudonym rather than their institutional account, to avoid their names being revealed to other participants unnecessarily.

Please seek further advice from the Research Data Service, your Departmental Research Ethics Officer or your supervisor before proceeding with the use of Teams.

Outsourcing transcription to colleagues within the University

It is also possible to ask colleagues within the University to transcribe your recordings using the tools described above. These can include staff such as research assistants, or postgraduate students.

If you choose to do this, you do not need a data sharing agreement in place, as the recordings will not leave the University. However, you must have obtained consent from participants for their data to be shared with others outside the research group.

Using external transcription services

There are many external companies that provide these services. Whichever one you choose to use, you must have a data sharing agreement in place before you transfer any personal data to third parties outside the University.

When choosing a transcription service there are a number of things that you need to check before you transfer any identifiable or confidential data:

Location of the service: The service should be based in the UK, and data must not be transferred to any servers outside the EU. Many online services are based in the US, or store their data on servers outside the European Economic Area, and so would not be compliant with data protection legislation.
That the upload and download file transfers are secure. Encrypted protocols should be used, and links for both file upload and file download must be password protected.
That data will be held by the organisation securely.
That the terms and conditions for use are compatible with the University's data sharing agreement.
That a data sharing agreement is in place, before transfer of any identifiable data.

Data sharing agreements

A data sharing agreement requires the transcription service to secure the data while it holds it, and makes the service liable for any data protection breach or loss of data.

The legal team have prepared a Data Sharing Agreement template specifically for transcription services, which can be found in the Help/Templates section of Ethics@Bath (Applications).

If you have any questions about this agreement, please contact the University’s legal team.

Externally available services

The University does not recommend or endorse any transcription service, and we are aware that there are many different services in use by researchers across the University.

Ethics@Bath (Applications)
The portal (Infonetica) for submitting research ethics applications at Bath.
Ethics@Bath (Sharepoint)
Sharepoint site for guidance, news and announcements.
