On the internet, everybody knows you can be a dog
Much time has passed, socially and technologically speaking, since the 1993 New Yorker cartoon drawn by Peter Steiner cleverly summarized both the power and the risks of online interaction, becoming a meme about the condition of internet anonymity. The caption, spoken by one dog to its colleague ("On the Internet, nobody knows you're a dog"), can be read in at least two ways.
On one side, as a relational and democratic liberation from material markers such as class, gender, bodily appearance, age and, here, even species: an idealization of purely discursive relationships that, in its inclusion of animals, seemed less radical than Alan Turing's dream of artificially built speaking entities (AI). On the other, the cartoon was a warning to check the "quality" of online contacts, because everyone has a chance to cheat.
The current landscape, fueled by social media accounts and the ostentatious display of personal images (granted, after preventive aesthetic adjustments), seems to have absorbed part of that anxiety. At the same time, the "fake" has become more insidious, because it now concerns content itself; and even though it is by now a well-known and rejected category, our alternating successes against it will not spare us from exercising vigilant, active criticism in the future.
Meanwhile, a multitude of talking bots runs free; when we are aware of them, they speedily serve our informational needs or act as customer care, sparing us from speaking with human beings, entities that online are perhaps rare and too empathic.
The internet, however, is not only social networks: it offers humanly infinite paths of navigation through which we would like to wander like a flâneur in the crowd, carefree and anonymous, without worrying about continuous surveillance; nowadays, truly a dream. Not so much out of a paranoid sense of being controlled as because of the real danger of ending up in the hands of people ready to steal sensitive data or to compile detailed reports on our choices, behaviors, ideas, and so on.
We understand very well how difficult it is to master all the dynamics at work in this open-ended forge, but we find it sad that so many people dedicated to devising and managing digital marketing strategies and actions cannot anticipate, and thereby stop, the bad uses of techniques and tools originally intended, we hope, to improve services or mitigate specific failures.
This obstinacy in poisoning the wells from which we all drink, breeding distrust of the whole system, seems incredible: tools are deployed without first analyzing the distortions they create, sometimes even neutralizing the efforts made to secure the data exchanged between users and websites.
The latest surprise arrived in November 2017, when the Center for Information Technology Policy at Princeton University published a study ("No boundaries: Exfiltration of personal data by session-replay scripts") on a technique used to record and replay all the interactions and information exchanged between a user and a website during a web session. A session replay is a complete recording of a web session, as if a video camera stood behind the user's shoulders; at any moment, the publisher can retrieve and play it back to take a deep look at every action and piece of information.
To understand how this level of intrusion and sophistication became possible, we need to recall the technical efforts made over many years to deliver, through browsers, ever more interactive and powerful web pages.
The requirements were many: the use of the browser as the single functional interface for every need around presenting, entering, modifying, and exchanging data; advertising's need both to track user behavior and to synchronize with external sources in order to convey the right promotional content; and the famous, much-celebrated passage to Web 2.0, which demanded more symmetrical functions between periphery and center. User-generated content is possible only if the client (web page/browser), through the scripts (software code) attached to web pages, can invoke the functions that modify, create, or send information and content in concert with the server-side domains (sites/cloud).
So when we download web pages, alongside the HTML and CSS instructions that render content and styles, we also import pieces of code (scripts) that interactively shape web services. Today we can build very sophisticated graphical interfaces with a high degree of creative flexibility.
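This page-embedded script mechanism is exactly what makes session replay possible. A minimal, illustrative sketch (the names createRecorder, recordEvent, flush, and the collector URL are hypothetical, not any vendor's real API): a recorder accumulates timestamped interaction events into a log that a real script would periodically ship to a third-party endpoint for later playback.

```javascript
// Hypothetical collector endpoint; a real replay script would POST
// the log here with fetch() or navigator.sendBeacon().
const COLLECT_URL = "https://collector.example.com/ingest";

function createRecorder() {
  const log = [];
  return {
    // Called from event handlers for every captured interaction
    // (click, keystroke, scroll, DOM mutation...).
    recordEvent(type, detail) {
      log.push({ t: Date.now(), type, detail });
    },
    // Serialize the whole session for shipping to COLLECT_URL.
    flush() {
      return JSON.stringify(log);
    },
  };
}

// Simulated session: the same calls a page script would make from
// addEventListener handlers attached across the page.
const rec = createRecorder();
rec.recordEvent("click", { selector: "#buy-button" });
rec.recordEvent("input", { selector: "#email", value: "user@example.com" });
console.log(JSON.parse(rec.flush()).length); // 2 events captured
```

Note how nothing in the mechanism distinguishes a harmless click from a keystroke carrying personal data: everything the handlers see ends up in the log.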
Session replay is developed by third-party software companies to offer internet publishers the ability to improve the user experience in terms of usability and failure analysis. Being able to view what happens during a user's web session is also useful for customer care, to reconstruct actions and context.
The value proposition seems fair enough but, examining the implementation and its use, we find many risks; at the same time, users are never warned that they are being recorded. In the end, the technique turns out to be widely used, yet only thanks to the Center for Information Technology Policy of Princeton University do we know of its existence.
The research on session replay examined only a handful of software companies (Yandex, FullStory, Hotjar, UserReplay, Smartlook, Clicktale, and SessionCam) and, focusing on the top 50,000 websites in the Alexa popularity ranking, found it in use on 482 of them. But that number keeps evolving, because actual use is hard to detect: publishers can choose, as an option, to block all or part of the functionality from a central dashboard, so the use of session replay must also be inferred from other elements. The updated list, with a growing number of sites, is here.
For a better understanding of the topic, we can read an extract of the report.
What can go wrong? In short, a lot.
Collection of page content by third-party replay scripts may cause sensitive information such as medical conditions, credit card details and other personal information displayed on a page to leak to the third-party as part of the recording. This may expose users to identity theft, online scams, and other unwanted behavior. The same is true for the collection of user inputs during checkout and registration processes.
The replay services offer a combination of manual and automatic redaction tools that allow publishers to exclude sensitive information from recordings. However, in order for leaks to be avoided, publishers would need to diligently check and scrub all pages which display or accept user information. For dynamically generated sites, this process would involve inspecting the underlying web application’s server-side code. Further, this process would need to be repeated every time a site is updated or the web application that powers the site is changed….
1. Passwords are included in session recordings. All of the services studied attempt to prevent password leaks by automatically excluding password input fields from recordings. However, mobile-friendly login boxes that use text inputs to store unmasked passwords are not redacted by this rule, unless the publisher manually adds redaction tags to exclude them. We found at least one website where the password entered into a registration form leaked to SessionCam, even if the form is never submitted.
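The report's first point can be sketched in a few lines, under the assumption (ours, not the report's) that automatic redaction amounts to matching the field's type; the "[redacted]" marker and the field shapes are illustrative only. A mobile-friendly login form that keeps the password in a plain text input slips straight past the rule.

```javascript
// Sketch of a type-based automatic redaction rule: only fields whose
// type is "password" are masked before the event enters the recording.
function autoRedact(event) {
  if (event.fieldType === "password") {
    return { ...event, value: "[redacted]" };
  }
  return event; // everything else is recorded verbatim
}

// A proper password input is masked...
const masked = autoRedact({ fieldType: "password", value: "hunter2" });
// ...but an unmasked, mobile-friendly text input holding the same
// password is not matched by the rule and leaks into the recording.
const leaked = autoRedact({ fieldType: "text", value: "hunter2" });

console.log(masked.value); // "[redacted]"
console.log(leaked.value); // "hunter2"
```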
2. Sensitive user inputs are redacted in a partial and imperfect way. As users interact with a site they will provide sensitive data during account creation, while making a purchase, or while searching the site. Session recording scripts can use keystroke or input element loggers to collect this data.
All of the companies studied offer some mitigation through automated redaction, but the coverage offered varies greatly by provider.
3. Manual redaction of personally identifying information displayed on a page is a fundamentally insecure model. In addition to collecting user inputs, the session recording companies also collect rendered page content. Unlike user input recording, none of the companies appear to provide automated redaction of displayed content by default; all displayed content in our tests ended up leaking.
Instead, session recording companies expect sites to manually label all personally identifying information included in a rendered page. Sensitive user data has a number of avenues to end up in recordings, and small leaks over several pages can lead to a large accumulation of personal data in a single session recording.
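Why the report calls this model "fundamentally insecure" can be sketched as follows, assuming (our simplification) that the publisher labels sensitive nodes with a hypothetical redact flag: anything the publisher forgets to label is recorded by default, so safety depends on perfect, perpetual manual coverage.

```javascript
// Sketch of manual redaction of rendered content: only nodes the
// publisher has explicitly labeled are scrubbed from the snapshot;
// every unlabeled node leaks into the recording by default.
function snapshot(nodes) {
  return nodes.map((n) => (n.redact ? { ...n, text: "[redacted]" } : n));
}

const page = [
  { id: "greeting", text: "Hello, Jane Doe", redact: true },        // labeled
  { id: "diagnosis", text: "Condition: diabetes", redact: false },  // forgotten
];

const recorded = snapshot(page);
console.log(recorded[0].text); // "[redacted]"
console.log(recorded[1].text); // "Condition: diabetes" leaks
```

The default is the problem: one forgotten label on one page is enough for sensitive displayed content to accumulate in the recording.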
4. Recording services may fail to protect user data. Recording services increase the exposure to data breaches, as personal data will inevitably end up in recordings. These services must handle recording data with the same security practices with which a publisher would be expected to handle user data.
We provide a specific example of how recording services can fail to do so. Once a session recording is complete, publishers can review it using a dashboard provided by the recording service. The publisher dashboards for Yandex, Hotjar, and Smartlook all deliver playbacks within an HTTP page, even for recordings which take place on HTTPS pages. This allows an active man-in-the-middle to inject a script into the playback page and extract all of the recording data. Worse yet, Yandex and Hotjar deliver the publisher page content over HTTP — data that was previously protected by HTTPS is now vulnerable to passive network surveillance.
“No boundaries: Exfiltration of personal data by session-replay scripts”, 11/15/2017, freedom-to-tinker.com.
Wikipedia, Session replay.