An investigation into the artificial voice and the socio-cultural aspects of the sound event

A cultural history of talking machines

Despite its important role in social life, sound as an event per se is rarely problematized in the social sciences.

And yet we practically float in a universe of different sounds, that is, events produced by pressure on the air (what can be defined as the raw phenomenon) that we then interpret either as more or less defined noise or as articulated signals full of meaning.

The book La voce artificiale. Un’indagine media-archeologica sul computer parlante (The artificial voice. A media-archaeological investigation on the talking computer) takes the current success of computerized voice assistants (Siri, Alexa, Google Home, etc.) as a way into the constructive and auditory dynamics that allow a sound event to be interpreted as a cultural and social artifact.

Having experience in the phenomenology of sound, a sound that exists because it resonates differently in the bodies it encounters, means being trained to face peculiar issues, such as the strange effects of its unpredictability and transience, or its grip on human involvement at the cognitive, emotional and physical levels.

Such know-how has clearly helped the author, Domenico Napolitano, a young social researcher who is also an organizer of sound performances and a sound artist himself. He proves ready to face, with tenacity and wit, a wide range of issues, examining them from different but profitably interrelated perspectives: philosophical, sociological, technical, literary, linguistic, computational, economic, aesthetic, and also ethnographic, through interviews with professionals who develop the strategies and uses of artificial intelligence in the vocal field.

From this point of view, the research on the computer that has learned to answer and talk through a synthetic voice (by now configurable at will, even with the characteristics and tics of a particular human voice, as in cloning) goes beyond being a precise and clear description of the different facets of the phenomenon: it offers itself as a generous and instructive guide to how to analyze, in general, a technological medium in its construction and public reception as a social and cultural artifact.

Research as a work of hacking and demystification

The research is thus always conducted on a double level, addressing both the specificity of the new vocal technology and the more general implications of new assemblages between human and machine, because, as the author affirms in commenting on posthuman theses, «artificial voice can be seen as the expression of a condition that happens between human beings and technology, a modulation of mobile border putting them in relation» (p. 425).

The continuous back-and-forth between the particular and the general is very useful in a context as complex as that created by the spread of networked digital technologies and the overwhelming development of informational products that, by datafying and filtering every type of activity and behavior, shape every kind of environment and relationship.

This kind of analytic approach works as a sort of hacking and demystification, capable of opening up and exposing the algorithmic black boxes to which we are all now subject, describing both how they work and the potential risks related to each new mix of human and machine.

Among the many issues investigated is, for example, the examination of the new software (artificial intelligence). We learn that behind the good performance of AI there are statistical calculations that have no need to comprehend the meaning of signs, but only to identify the probability that specific textual data are related to specific sound data. To do this, algorithms must access an enormous amount of real, factual samples, so that speaking turns out to be a data-driven simulation of vocal effects previously acquired from the real world: powerful operations that bypass the internal abilities (semantic and hermeneutic) that human beings have to use.

At the same time, we have clear evidence that the ability to design and outsource to machinic systems such sophisticated human functions, which hybridize intimately with our lives and bear on people’s decision-making, is in the hands of companies whose interests tend to escape, despite the effectiveness of their power, the principles of responsibility (ethical and social) that normally regulate community activities.

Resurfacing contexts, knowledges, interests and imaginaries

These first considerations help us understand why the study of a communication technology can, or must, embrace such a broad range of issues.

First of all, the imaginary of disembodied voices is as ancient as humankind, as are the attempts to build talking machines and to use the voice to anthropomorphize them into our domain: in Stanley Kubrick’s film 2001: A Space Odyssey (1968), the voice of the computer HAL 9000 plays continuously between this sense of familiarity and otherness.

On the other hand, we must deal with the fact that the communicative ability tied to language has a central role for human beings, and that the voice, also as a unique bodily marker, is its medium par excellence.
When computers begin to understand and produce spoken language through computational processes starting from acoustic data, we enter a new existential scenario in which the assumption that every voice presupposes a body is no longer true.

But other assumptions are also failing: speaking and answering have always been understood as a principle of human individuation, even as a sign of a full and responsible presence. This idea is much debated in philosophy under the heading of the “metaphysics of presence”, which criticizes the privilege attributed to the phonetic/performative function over other signs of presence.

For the author, then, the artificial voice is a «socio-technical phenomenon moving transversely among technologies, knowledges, stories, desires, interests and imaginaries, regarding at the same time anthropological, social and epistemological structures» (p. 19).

The instructiveness of good research

It has already been suggested that the work can serve as an instructive guide to investigating media technologies. Several factors support this thesis: the methodological approach; the broad survey of theoretical and critical references, including the international scholars most attentive to the problems addressed; and the clarity of the exposition.

The effort to integrate the theories taken as reference is also an important added value: media archaeology, materialism and constructivism (Science and Technology Studies, Actor-Network Theory, Critical Algorithm Studies, Sound Studies). This integration shows how the process of «voice-making» is historical, social and evolutionary, and how technologies, analyzed in the ups and downs of a construction in which economic, social and technical inputs have to be mediated, should also be explored in their materiality as well as in their contexts of use, being themselves real «ways of thinking», since technologies actualize and perform them «embodying epistemology, psychology and operating capacity» (p. 70).

The fruitful work of artistic practices in the sound field

Finally, it is worth mentioning the usefulness of drawing on sound art experiences for their contribution to research issues.

Indeed, the voice and its effects, including its intersection with technological artifacts («the voice in the machine» and «the voice of the machine»), are at the center of artistic reflections and practices. Thanks to their experimental spirit, these practices are able both to represent new types of events and to anticipate creative innovations that may even become commercial products.

In practice, at events organized by sound artists, audiences can often grasp what is probably happening, or will happen, in the private laboratories of some company as long as the current technical and socio-cultural scenarios persist; the book reports many examples.

In these artistic works, supported by the skilled use of newer software (machine learning and neural networks), it is possible, for example, to set up challenges by improvising, in real time on stage, duets between one’s own embodied voice and the same voice cloned, but generated and sung by a computer in vocal expressions impossible for a human being.

In this case the theme is the border between bodily identity and machine: the voice is that of the artist, but it is also that of the machine.

But we can also find artistic practices that reveal both the power and the design unpredictability of the more general datafication processes to which we are all subjected nowadays, and that is a very clear warning.

The datafication of every phenomenon and the use of software able to relate them, through the mere fact of their having become pure numerical data, can generate hybrid objects and phenomena that combine, at will, some of their individual internal characteristics.

Thus we learn that there are speech synthesis systems that can hybridize the timbre of one person’s voice with someone else’s prosodic style, and that this applies to any other paralinguistic feature (voice conversion); several companies already offer so-called voice skins as a commercial product.

Reference

Napolitano, D., 202, La voce artificiale. Un’indagine media-archeologica sul computer parlante, Napoli, Editoriale Scientifica.

Luciano Petullà