Lights, Camera, Action! (The voice comes in …)

The body of the avatar

What a weird and fascinating story: the award that the 2013 Rome Film Festival recently gave Scarlett Johansson for her vocal performance in Spike Jonze’s latest movie, “Her”. The actress plays the voice of a computer operating system with which the protagonist ends up falling in love. Given the Italian practice of dubbing every foreign movie, there has been much irony about the fact that such a vocal performance will never be appreciated by local audiences. Moreover, both Johansson and many other winners attended the awards ceremony through video messages. Indeed, it is by now certain: we are used to meeting each other in a mix of online and offline presence, in this case much more online. Thanks to its fundamental sensitivity to phantasmatic forms, by which “the maximum of reality” can be produced (Alberto Abruzzese), the cinema industry takes responsibility for certifying it.
However, the actress paid an advance visit to the festival, intended as a crafty operation to revive the ambitions of a festival declining in the wake of a double crisis: that of the Italian cinema industry and that of the city budget, since the capital plays the major role in organizing the event. (Many people complained that, in the end, the visit was paid for with ordinary citizens’ taxes.)
We know the movie industry can orchestrate plots to feed events but, really, there is a lot going on in this story.
First of all, the 2014 Golden Globes jury has just declared Johansson ineligible because the role, being only a vocal performance, does not meet the requirements. Yet the Golden Globes had already given Robin Williams an award in 1993 for his performance in the Disney movie Aladdin.
There have been two decisions. The first was ready to recognize the value of the vocal animation of a computer operating system, already intimately connected and connectable to the (trans)human territory/sensibility. The second rejected the candidate, although the American jury had been more favorable when the case involved traditional cartoon characters, entities more readily accepted and recognized as human. It seems to be a question of greater or lesser awareness/backwardness.

But for people who love media archaeology, the most interesting aspect is the power of our ancestral medium. In this story of dematerialization and new forms of man-machine hybridization, the voice brings corporeal density back to the new relations because it (normally) implies and gives back the body; besides, since the modern media revolution (think of recording), it manages the zones between the human and the machinic.

Media philosopher John Durham Peters gives us many cues:

“Each person’s voice is a creature of the shape of one’s skull, sinuses, vocal tract, lungs, and
general physique. Age, geography, gender, education, health, ethnicity, class, and mood all
resound in our voices. Anatomically, the voice is set on the most bottlenecked (literally) part of
the body, along the passageways of the spinal cord, esophagus, and windpipe – it is a busy
highway there, as my colleague Ingo Titze says. This setting makes the voice reflective, in some
deep ways, of the body’s being, and hence its preeminent status as the organ of emotion (it is
connected to the limbic system, the fight-flight response that dwells in all of us at the most
animal level)…

Modern media leave the voice in a curious limbo between body and machine, text and
performance, animal and angel, singular event and endless repetition. When Thom Yorke of
Radiohead sings, to take one example, it is unclear whether he is to be heard as a man, angel,
machine, demon, or animal, or all variously” (2007).

We can also take up a small part of the reflections that the literary critic and media historian Steven Connor offers us in an incredible book about the power of the dissociated voice, that is, voices we hear without seeing the bodies that produce them or voices that come from “other” bodies, typical conditions of use in our media environments:

“The principle of the vocalic body is simple. Voices are produced by bodies: but can also themselves produce bodies. The vocalic body is the idea – which can take the form of dream, fantasy, ideal, theological doctrine, or hallucination – of a surrogate or secondary body, a projection of a new way of having or being a body, formed and sustained out of the autonomous operations of the voice. The history of ventriloquism is to be understood partly in terms of the repertoire of imagings or incarnations it provides for these autonomous voice-bodies. It shows us clearly that human beings in many different cultural settings find the experience of a sourceless sound uncomfortable, and the experience of a sourceless voice intolerable. The ‘sound hermeneutic’ identified by Rick Altman determines that a disembodied voice must be habited in a plausible body. It may then appear that the voice is subordinate to the body, when in fact the opposite is experientially the case; it is the voice which seems to colour and model its container. When animated by the ventriloquist’s voice, the dummy, like the cartoon character given voice, appears to have a much wider range of gestures, facial expressions, and tonalities than it does when it is silent. The same is true of any object given a voice; the doll, the glove puppet, the sock draped over the hand, change from being immobile and inert objects to animated speaking bodies. Our assumption that the object is speaking allows its voice to assume that body, in the theatrical or even theological sense, as an actor assumes a role, or as the divinity assumes incarnate form; not just to enter and suffuse it, but to produce it. In bald actuality, it is we who assign voices to objects; phenomenologically, the fact that an unassigned voice must always imply a body means that it will always partly supply it as well.
In fact, so strong is the embodying power of the voice, that this process occurs not only in the case of voices that seem separated from their obvious or natural sources, but also in voices, or patterned vocal inflections, or postures, that have a clearly identifiable source, but seem in various ways excessive to that source. This voice then conjures for itself a different kind of body; an imaginary body which may contradict, compete with, replace, or even reshape the actual, visible body of the speaker.” (2007).

Vocal interfaces and speaking avatars are no longer a novelty, and the major players of the internet ecosystem (Google, Yahoo, Microsoft, Amazon, IBM, Nuance, etc.) are working on them intensively. Smartphones offer personal assistants ready to act on vocal commands or to read messages to us. Among vendors, there is fierce competition to add functionality.
The latest version of the Google Now application (search functionality, but also personal assistant and organizer), downloadable on smartphones and tablets, is impressive for its ability to respond meaningfully and fluently to our vocal commands, at least judging by the video posted by Phonebuff, a company specializing in mobile reviews. Incidentally, some technology observers suggest thinking differently about browsing through web pages, but this vision could show some limits in terms of diffusion and business models. Such a development needs both high hardware/software performance and network availability, while its uses may not be so wide, being limited to particular functions, above all those applicable to mobility conditions, when we suffer the greatest space-time limitations. For the rest, eyeballs are still what pays for technological advancement!
Keeping these balancing exercises in mind, it is simpler to avoid the risk of wrong predictions by framing vocal applications with more commercial pragmatism. The recent acquisition of Skyphrase by Yahoo could be an example. Founded by cognitive scientist Nick Cassimatis, Skyphrase specializes in NLP (Natural Language Processing). More specifically, its software makes it possible to build structured commands using the voice. In short, we can ask Yahoo to provide some kind of service (news, video, etc.) when some condition occurs, such as a specific performance or result in a sporting context. The example is not accidental: Yahoo has a very popular service used by football fans. The Fantasy Sports application is used to chat with other participants, but also to choose players, run searches or keep the team under control.

Finally, this vocal activism has attracted a fresh follower in Wikipedia, which wants to add voices 🙂 to its biographical entries (in Italian, “voci”, literally “voices”).
The initiative has been explained as follows by one of its leading contributors. Asked what he learns from hearing someone’s voice, the journalist Andy Mabbett answers:

It’s a very personal thing. If you think about the people in your own life, you know their voice the moment you hear it, as much as or sometimes even more than a photograph … With a voice, you know instantly. And, I don’t know about you personally but if I hear a voice from the dim and distant past from the days of wax cylinder recordings, somebody like the nurse in Florence Nightingale, it’s so exciting to have that connection back to them. So we’re doing the same for people today.



Connor, S., 2000, Dumbstruck. A Cultural History of Ventriloquism, Oxford University Press, Oxford.

Peters, J. D., 2005, “La voce e i media moderni”, in Petullà, L., Borrelli, D., Il videofonino. Genesi e orizzonti del telefono con le immagini, Meltemi, Roma.

Jarvis, J., “Past the Page”, 30/11/2013.

“Yahoo Acquires SkyPhrase – Pie In The Sky Or Future Cloud Offering?”, 02/12/2013.

“Wikipedia Archiving Voices So You’ll Always Know How Celebs Sound”, 3/2/2014.