Non-Speech Notation: Transcribing things other than speech

Updated: 04/02/25

Use these if something other than direct transcription of speech is required to convey meaning, as conversations are not always purely verbal. There are also technical reasons, such as recording quality, where these may come in handy. DO NOT overuse tags!

Most Commonly Used Notation

<unintelligible>

This is used for speech that cannot be discerned because of bad recording quality, someone speaking too softly or too fast (although you can slow down the recording), or other reasons that cause confusion. Try not to overuse it: if there is an address or name that you can’t quite hear, try to look it up if you know the area where the recording takes place. Clients won’t enjoy a transcription filled with <unintelligible>, but sometimes using it is unavoidable. It is better to use this tag than to make up something that you think was said, as your goal should always be accuracy. Still, use this notation as sparingly as possible!

<inaudible>

When transcribing audio, use the <inaudible> tag to indicate that part of the recording cannot be heard at all. This could be due to poor audio quality, Zoom cutting out, or the microphone not picking up any sound.

NOTE: This is quite similar to <unintelligible>, but there’s a key difference. <inaudible> means that the speech cannot be heard at all, whereas <unintelligible> refers to speech that can be heard but is unclear or difficult to understand. In other words, <inaudible> indicates complete silence, such as when someone’s audio cuts out but you can still see their mouth moving, while <unintelligible> means you can hear the sound but cannot decipher what is being said.

<crosstalk>

When speakers talk at the same time, it can be difficult to decide what order to place the transcribed sentences in. Use your judgment and place them in the order that makes the most sense. If parts of the speech are jumbled or cut off, indicate that crosstalk occurred using <crosstalk>. This helps the client make sense of interruptions and other conversational interactions that might make sense in real life, but not on the page. This notation should go at the beginning of the speaker block of the speaker who interrupted.

NOTE: <crosstalk> should ONLY be used if the speakers are talking over each other and some part of the interruption can't be heard in the audio file.

NOTE: <crosstalk> should always go at the beginning of the speaker block of the speaker that is interrupting. Also, it should NEVER go in the middle or end of a speaker block.
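For anyone who checks transcripts with a script, the placement rule above could be tested along these lines. This is only a sketch: the function name crosstalk_misplaced is hypothetical, and it assumes each speaker block is passed in as plain text with the speaker label already removed.

```python
CROSSTALK = "<crosstalk>"

def crosstalk_misplaced(block_text: str) -> bool:
    """Return True if <crosstalk> appears anywhere other than the start of the block.

    Assumption: block_text is the text of one speaker block, without the
    speaker label.
    """
    text = block_text.strip()
    first = text.find(CROSSTALK)
    if first == -1:
        # No crosstalk tag in this block, so nothing can be misplaced.
        return False
    # The tag must sit at position 0, and there must be no second occurrence
    # later in the same block.
    return first != 0 or text.find(CROSSTALK, len(CROSSTALK)) != -1
```

A block such as "<crosstalk> I never said that." passes the check, while "I never <crosstalk> said that." is flagged.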

Least Commonly Used Notation

IMPORTANT Rule of Thumb: Use the non-speech tags only to fill in where the speaker gives no verbal response, for example when they only shake or nod their head. Don't use tags if everything is already clear from the speech.

<background-noise>

The <background-noise> tag denotes external sounds that affect the conversation, such as a phone ringing, a door slamming, or other people talking in the room. However, this tag should only be used when a speaker explicitly acknowledges the noise. If no one references it, it should not be included in the transcript. This ensures we capture only relevant disruptions without adding unnecessary clutter.

✅ Example: "<background-noise> Sorry, my dog’s barking in the other room."

❌ Incorrect: "<background-noise>" (without a speaker acknowledging it).

<video-playing>

This tag indicates that a video is playing in the background or during a recording. Use it when the audio of the video is not fully discernible; the video's audio is NOT meant to be transcribed word-for-word. The tag provides context that a video is present, which may influence the transcription or understanding of the surrounding conversation.

<laughing>

This is not necessary at every slight giggle, but if a speaker’s laugh has meaning, then it can be useful. For example, laughter can replace an affirmative response, or show the speaker’s feelings toward a statement, idea, or person. This can be important for clients to know.

<redacted>

In legal proceedings, it's not uncommon for both attorneys to agree to exclude certain sensitive information from the official record. This could include personal details such as birth dates, names (especially if the witness is a minor), Social Security Numbers (SSNs), or any other private information.

To maintain the integrity of the record while respecting the confidentiality of these details, use the <redacted> notation. This notation indicates that specific information has been intentionally omitted from the transcript by mutual agreement of the involved parties.

<pointing>

Often, speakers may say nothing at all when answering a question, but rather point to something in the room, on themselves, or in a picture instead. Note that this should only be used if a video was supplied by the client with the recording, and you can clearly see the speaker pointing. You could add specificity here if necessary, e.g. <pointing-at-photograph>.

<gesture>

Conversations are not always verbal. Sometimes, body language and gestures carry a lot of meaning. If a video is supplied, you can indicate that the speaker used a gesture in response rather than actually speaking. You might also want to add /affirmative or /negative, if this gesture symbolizes “yes” or “no.”

<nodding/affirmative>

This is the same concept as <gesture>, only more specific. Affirmative answers often take the form of a nod rather than “yeah/yes/okay” etc. Only use when a video is supplied and the speaker is clearly nodding and there are no verbal cues from the speaker.

<shaking-head/negative>

This is the same concept as <gesture>, only more specific. Negative answers often take the form of a head shake rather than “no” etc. Only use when a video is supplied and the speaker is clearly shaking their head and there are no verbal cues from the speaker.
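The video requirement for the pointing, gesture, nodding, and head-shaking tags above could be checked automatically when a job has audio only. The following is a rough sketch under assumptions: the function name video_only_tags and the prefix list are illustrative, not part of any existing tool, and the check only looks at how a tag begins.

```python
import re

# Tag prefixes that, per the sections above, require a client-supplied video.
# The prefix match also covers specific variants such as <pointing-at-photograph>.
VIDEO_ONLY_PREFIXES = ("pointing", "gesture", "nodding", "shaking-head")

# Matches any tag written as <...> with no spaces inside.
TAG_RE = re.compile(r"<([^<>\s]+)>")

def video_only_tags(transcript: str) -> list[str]:
    """Return every tag in the transcript that should appear only when video exists."""
    return [
        match.group(0)
        for match in TAG_RE.finditer(transcript)
        if match.group(1).startswith(VIDEO_ONLY_PREFIXES)
    ]
```

If the job has no video and this returns anything, those tags are worth a second look.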

<shrugs>

This is the same concept as <gesture>, only more specific. Shrugging refers to moving the shoulders up and down and may indicate that the speaker doesn’t know something, is unsure, etc.

<vocalization>

Although everything we transcribe is in English, speakers sometimes use nonverbal sounds to convey meaning. If the sound can be accurately represented with letters (e.g., "fa-la-la-la-la"), transcribe it as such. However, if no letters will do the sound justice, use <vocalization>. This applies to meaningful non-speech sounds such as sighs, groans, and coughs—but only when their presence affects the clarity of the transcript or when omitting them would result in a loss of context.

NOTE: Avoid overusing <vocalization>. It should only be included when it contributes to the transcript’s accuracy.

NOTE: This is different from <unintelligible>, which is used when actual speech can be heard but cannot be made out.

<leaves-room> and <returns>

Occasionally one or more speakers will leave the room or the area where the recording takes place. In this instance, it can be useful to add this contextual information so that the transcription is less confusing for the client, as there will obviously be a gap in conversation.

<audio-playback-start>, <audio-playback-paused>, and <audio-playback-end>

These tags denote that audio is being played back during the event. The audio may be a prior line of questioning from the same event, or a recording (like a voicemail or an interview) that is associated with evidence or an Exhibit. You SHOULD transcribe the contents of the recording, placing <audio-playback-start> at the beginning and <audio-playback-end> at the end. If the playback is paused in the middle for conversation or questions and answers, use <audio-playback-paused>. Once the audio begins again, make sure to use <audio-playback-start>.
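The start, paused, and end sequence described above can be checked mechanically. The sketch below is one possible approach, not an existing tool: the function name playback_tags_consistent is hypothetical, and it assumes that a playback must be resumed with <audio-playback-start> before it can be closed with <audio-playback-end>, which the guideline does not state explicitly.

```python
import re

TAG_RE = re.compile(r"<audio-playback-(start|paused|end)>")

# Allowed transitions: (current state, tag) -> next state.
# Assumption: <audio-playback-end> is only valid while audio is playing.
TRANSITIONS = {
    ("idle", "start"): "playing",
    ("playing", "paused"): "paused",
    ("paused", "start"): "playing",
    ("playing", "end"): "idle",
}

def playback_tags_consistent(transcript: str) -> bool:
    """Return True if the playback tags appear in a valid order and are all closed."""
    state = "idle"
    for match in TAG_RE.finditer(transcript):
        state = TRANSITIONS.get((state, match.group(1)))
        if state is None:
            # The tag is not allowed in the current state.
            return False
    # Every playback that was started must have been ended.
    return state == "idle"
```

A transcript that opens a playback and never closes it, or that pauses before any playback has started, fails the check.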

<foreign-language>

For various reasons, attendees may speak a few words in a foreign language even though the whole deposition is in English. For example, a non-native English speaker may use filler phrases in their native language, or someone may swear in another language. In such situations, use the <foreign-language> tag instead of transcribing the speech. Read more about foreign languages and interpreters in Multiple Languages: Transcribing with Interpreters.