Visualizing Non-Verbal Interactions in Transcription Design

Transcription is central to documenting and analyzing all kinds of human-centered research. Yet transcripts that focus exclusively on spoken language often erase salient elements of the interactions they are meant to represent, such as body language, gestures, prosody or camera actions. As part of an undergraduate linguistic anthropology course on nonverbal communication, I developed a transcription approach for fixed-camera video that preserves many features of nonverbal interaction that are difficult to document with classic transcription styles.

A transcript is always selective, because it directs attention to the interesting aspects of an interaction, but in some cases those aspects are not the speech or language of the participants. The best transcript might not always read like a play or a novel. Transcription is always an experiment with ways of transforming, reducing and presenting information.[1]


Paper sketch of different transcription options for representing the same conversation. Transcript 1 presents communication serially and cohesively, while Transcript 2 presents speakers in parallel, showing pauses, interruptions and dominant speakers.

Though multimedia technologies can offer increasingly granular analyses of interactions via video or interactive web-based formats, paper (and static document formats) remains widely used in many stages of research and publication. Despite the trend toward digital and online media, the question of how best to represent fluid interaction on a paper(-like) medium persists.


Column-style transcripts for HCI show the time-course and dynamics of screen interactions more clearly

Through this experimentation, I found that a simple columns-style transcription worked best to emphasize the role of silence and bodily gestures in multi-person interactions. For fixed-camera video, columns better represent the spatial and visual organization of the scene. Images can also be easily included alongside speech, and it is trivial to illustrate reactions to other speakers.

I prepared a prototype transcript using simple layout tools in MS Word. This transcript focused on gendered power dynamics, nonverbal communication and logical structure in scientists’ informal research discussions and was submitted as part of a final semester-long project in linguistic anthropology.


My prototype transcript of a conversation between several ornithologists included facial expressions, gestures and emphasis alongside speech. Speech was transcribed using the International Phonetic Alphabet.

This project resulted in a multi-purpose columns-style transcription template that can be used in any research concerned with the aspects of interaction that other kinds of transcripts render invisible. Serial transcripts clearly show the content of speech when there are multiple speakers, but parallel transcripts appear to reveal the social dynamics between speakers more effectively.

I am currently refining a more modular version of the template via ShareLaTeX that can perform conversions between serial and columnar transcripts.
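
To make the idea of converting between the two layouts concrete, here is a minimal Python sketch. It is not the LaTeX template itself: the Turn record, the function names (to_columns, to_serial, render) and the sample data are all hypothetical, chosen only for illustration. It assumes each turn carries a start time in seconds, and it places turns that begin at the same time on the same row so that overlap and silence become visible.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One entry of a serial transcript (hypothetical structure)."""
    start: float   # seconds from the start of the recording
    speaker: str
    text: str      # speech, or a bracketed note such as "[nods]"


def to_columns(turns, speakers=None):
    """Serial -> columnar: one column per speaker, one row per start time."""
    ordered = sorted(turns, key=lambda t: t.start)
    if speakers is None:
        # Default column order: order of first appearance.
        speakers = []
        for t in ordered:
            if t.speaker not in speakers:
                speakers.append(t.speaker)
    header = ["time"] + list(speakers)
    rows = []
    for t in ordered:
        stamp = f"{t.start:.1f}"
        if rows and rows[-1][0] == stamp:
            row = rows[-1]               # simultaneous turn: share the row
        else:
            row = [stamp] + [""] * len(speakers)
            rows.append(row)
        col = 1 + speakers.index(t.speaker)
        row[col] = (row[col] + " " + t.text).strip()
    return header, rows


def to_serial(header, rows):
    """Columnar -> serial: read each row left to right, skipping empty cells."""
    turns = []
    for row in rows:
        for speaker, cell in zip(header[1:], row[1:]):
            if cell:
                turns.append(Turn(float(row[0]), speaker, cell))
    return turns


def render(header, rows):
    """Plain-text table with padded columns."""
    table = [header] + rows
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]
    return "\n".join(
        "  ".join(c.ljust(w) for c, w in zip(r, widths)).rstrip() for r in table
    )


# Invented sample data, not taken from the prototype transcript.
sample = [
    Turn(0.0, "A", "so the nest was empty"),
    Turn(1.5, "B", "[nods]"),
    Turn(1.5, "A", "completely empty"),
    Turn(4.0, "B", "huh"),
]
header, rows = to_columns(sample)
print(render(header, rows))
```

In this sketch the empty cells do the work that the columns layout is meant to do: a speaker's silence appears as a gap in their column, and a nonverbal reaction sits beside the speech it responds to. Converting back with to_serial discards that alignment and returns a simple list of turns.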


  1. Ochs, E. (1999). Transcription as theory. In A. Jaworski & N. Coupland (Eds.), The discourse reader (pp. 167-182). London; New York: Routledge.