Gaming data is personal data. What you play, how you play, what you achieve: all of it is often collected, stored and analysed by gaming companies. This tracking helps game developers know what works and what does not. But how are our data privacy rights respected when we are driving over NPCs in Grand Theft Auto or chatting with friends in the Fortnite lobby?

The metaverse is being discussed by many industry players as well as policy makers. They see this new buzzword as a way to avoid looking outdated. On the contrary, I believe that the metaverse has nothing new to offer: the combination of virtual reality, blockchain and social media is not especially novel. The issues we already face with these technologies will simply be the same for what we call the metaverse. Despite John Perry Barlow’s Declaration of the Independence of Cyberspace, laws apply on the internet as they apply in the real world. (And the internet is the real world, actually.)

Level 1: right to access

One of the first rights brought by the GDPR is the right to know which data is collected and to get access to it. Your personal data is always yours, even if it sits on servers hundreds of kilometres away from you. Bringing it back to you is called the right to access. You can write to any of your favourite web services and ask them for a copy of the personal data they hold on you.

Unfortunately, the gaming industry is ages away from fully complying with these basic rules. I tried to get a full copy of my personal data from four gaming companies: Ubisoft, Electronic Arts, Take-Two/Rockstar and Nintendo. I could not get a full copy from three of them. Only Electronic Arts gave me a fairly large amount of data, after multiple requests to support.

Across the four companies, practices differed quite a lot when it came to the right to access. Ubisoft and EA provide a way to automatically download an archive of your personal data (in your account settings). Funnily enough, both Ubisoft and EA gave me another export of data when I asked them. I could determine that the data sent by Ubisoft was incomplete because the export did not include data about achievements or statistics that I could find on the Uplay website (Ubisoft's online shop). For Take-Two and Nintendo, I had to contact support directly, as they did not provide a personal data download button. It was also interesting to note that Take-Two seems to have forwarded my request to their subcontractors, as I received several responses from companies I had not contacted (saying they did not hold any data on me).

For Electronic Arts, after many emails with support, I finally received an archive of personal data that included game data! The archive is composed of several files, some of which contain game data like the one shown below. I played Star Wars Battlefront 1 quite a lot, so the file is 4000 lines long. Let’s analyse the first 30 lines.

Screenshot of Battlefront 1 game data export

The file has an unclear structure. A « Description » line actually means that a new group of data begins. In the file, after the platform and my pseudonym, we have « User Small Storage ». Here again, it is not very clear, but there is another subgroup of data introduced by the line « Key ». We have « Customization » and then « StarCardHands ». In order, they stand for the customization of my in-game avatar and my weapon hands (in the game, favourite weapons are grouped into hands). Next comes another « Description » line, which means we move to another group of data, here called « Player Information », but let’s jump directly to « Player alltime player_core Stats ». Here we have my personal ranking information, which measures how I performed in the game. The next group of data is « Player alltime player_stats Stats » and seems to be about statistics, but for this group and for the rest of the 4000-line file, the data is incomprehensible: numbers preceded by line prefixes that do not make sense (like « C Payre Oxc Gatt »).
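To make the grouping logic above concrete, here is a minimal sketch of how one might split such an export into groups. The exact line syntax of the EA file is not reproduced here: the `Description:` and `Key:` prefixes with a colon, the sample text and the function name `group_export` are all assumptions made for illustration, not the real format.

```python
# Hypothetical sample: the real export's exact syntax is an assumption here.
SAMPLE_EXPORT = """\
Description: User Small Storage
Key: Customization
Key: StarCardHands
Description: Player Information
Description: Player alltime player_core Stats
"""


def group_export(text: str) -> dict[str, list[str]]:
    """Group lines under the most recent « Description » line."""
    groups: dict[str, list[str]] = {}
    current = "(preamble)"  # lines seen before any Description line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Description:"):
            # A Description line opens a new group of data.
            current = line.split(":", 1)[1].strip()
            groups.setdefault(current, [])
        else:
            groups.setdefault(current, []).append(line)
    return groups


groups = group_export(SAMPLE_EXPORT)
for name, lines in groups.items():
    print(name, "->", lines)
```

Even with a helper like this, the lines inside the statistics groups would remain as opaque as they are in the raw file; the sketch only recovers the outline.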

Game over for privacy

It was very difficult, if not impossible, to get the personal data collected by games. The data provided by Electronic Arts also shows a limitation of the right to access. Some pieces of data are understandable, but many are incomprehensible. Yet Article 12 of the GDPR states that information should be delivered « in a concise, transparent, intelligible and easily accessible form ». Furthermore, some content was redacted on the grounds of trade secrets, which makes the data even harder to understand.

Another issue with the right to access is that you can only exercise it if the data is processed on external servers. If a game saves data on your local computer or smartphone, it is still personal data, but the data is on your side. You already have the data somewhere on your device, so the question becomes how to retrieve it. You can browse files on your PC, but it is quite difficult to do so on consoles or smartphones.

Games exfiltrate not only game saves (your progression, your inventory…), but also analytics. As I showed in my 2020 documentary « Niveaux suivants » (in French), many tools can be used by games to gather data and statistics. For instance, Google provides ready-to-use tools for in-game tracking with Firebase Analytics.

In their 2021 paper « Surveilling the Gamers: Privacy Impacts of the Video Game Industry », Kröger et al. detailed what kind of data can be collected. From user inputs to hardware information, with everything related to gameplay in between, the authors warn about threats to privacy from « illegitimate surveillance and user profiling ».

Diagram representing a gamer playing on a home console.
List of data types:
- User/Environmental Input
- Hardware & Software
- Playtime
- Game Selection
- Gameplay Data
A classification of data types commonly collected by video games (Kröger et al., 2021)

Gamers are being tracked, that’s a fact. But what are all these data and statistics used for? Of course, privacy policies should detail what data is collected. Unfortunately, this is not often the case. One way to find out how game companies use data is to look at the « Data analyst » job offers they publish. For instance, below is a job offer I found on the Ubisoft website. It mentions the monitoring of performance but also « business behaviors, such as player engagement, retention and monetization », and the goal of their data analytics seems to be to drive business decisions. More interestingly, they wrote that the candidate will have to build cohort analyses but also « forecast acquisition, retention and monetization at player level ». So we are really talking about tracking and analysis at the level of the individual player, and not only within groups of players (i.e. cohorts).

JOB DESCRIPTION
Ubisoft, a global leader in the video games and entertainment software industry, is currently seeking a full-time Data Analyst to join the Player Analytics team. She/He is responsible for developing data mining and statistical modeling solutions to understand game performance and key business behaviors, such as player engagement, retention, and monetization for a portfolio of games.
The ideal candidate is a high-performing individual with an analytical mind that thrives in delivering solutions to complex problems, has an affinity for data and analytics in support of driving business decisions. She/He has a strong aptitude for problem solving and creative solutions.
Responsibilities:
- Address advance analytics questions regarding player behavior and in-game monetization needs for our Paid & Free to Play offerings and effectively communicate findings to both technical and non-technical audiences
- Explore player data proactively and suggest new opportunities to measure and assess the performance of games. Identify meaningful relationships, patterns, or trends from complex data sets
- Build cohort analyses to identify trends in player behavior and measure impact of content release on activity and monetization. Build and evolve in-game measurement and experimentation methodologies
- Develop models to forecast acquisition, retention, and monetization at player level
- Collaborate with the other members of the Player and Finance Analytics teams to identify strategic business questions, key metrics, and actionable insights
- Work in collaboration with Publishing Data Intelligence Team on ad hoc analysis
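The difference between cohort-level and player-level analysis can be illustrated with a short sketch. Everything here is hypothetical: the session records are invented toy data, and the function names `cohort_retention` and `player_activity` are mine, not anything taken from Ubisoft's tooling.

```python
from collections import defaultdict

# Invented toy data: (player_id, install_week, week_of_activity)
SESSIONS = [
    ("p1", 0, 0), ("p1", 0, 1), ("p1", 0, 2),
    ("p2", 0, 0),
    ("p3", 1, 1), ("p3", 1, 2),
    ("p4", 1, 1),
]


def cohort_retention(sessions, weeks_after):
    """Share of each install-week cohort still active `weeks_after` weeks later."""
    cohort_players = defaultdict(set)
    retained = defaultdict(set)
    for player, install_week, active_week in sessions:
        cohort_players[install_week].add(player)
        if active_week - install_week == weeks_after:
            retained[install_week].add(player)
    return {
        cohort: len(retained[cohort]) / len(players)
        for cohort, players in cohort_players.items()
    }


def player_activity(sessions):
    """Per-player view: the weeks in which each individual was active."""
    weeks = defaultdict(set)
    for player, _install_week, active_week in sessions:
        weeks[player].add(active_week)
    return dict(weeks)


print(cohort_retention(SESSIONS, weeks_after=1))  # one number per group
print(player_activity(SESSIONS))                  # one record per individual
```

The cohort view only yields an aggregate figure per group, whereas the player-level view keeps a record for each individual, which is what a forecast « at player level » needs as input.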

For Android games, we can also cite the tool Exodus Privacy, which lets you know which tracking libraries are included in Android apps and games. Studies have been conducted using it. In their paper « The Price to Play: a Privacy Analysis of Free and Paid Games in the Android Ecosystem » (Laperdrix et al., 2022), researchers have shown that data collection can occur even in paid games. This is in line with the Ubisoft job offer mentioned above, which refers to « monetization needs for our Paid & Free to Play offerings ».

Data in games: the Just Dance use-case

Looking into scientific research, privacy policies or job offers can give us some hints about how data can be used in gaming, but let’s get practical with a use case. At a 2022 conference, a data scientist from Ubisoft presented how they use personal data and machine learning in their game Just Dance. Thanks to this talk, we can discover how advanced the industry is in its use of player data, as the data scientist describes Just Dance as cutting-edge in this field.

Just Dance is a game where the players have to dance with a motion-sensing controller that detects whether they perform the right choreography. Surprisingly, we learnt in the talk that Just Dance uses machine learning only for level recommendation. With their latest game Just Dance Unlimited, more than 700 songs are available. It could be painful to search for the song you want to dance to in a list as long as this.

Just Dance uses a Netflix-like algorithm to recommend songs. It is interesting to learn how this was deployed. The Ubisoft data scientist clearly stated that the machine learning algorithm was not run on the player’s device, but on Ubisoft’s servers. This data is therefore subject to the right to access.

The console tracks data (Player, Session, Map) that is sent to the Sirius server (Hadoop), then to data processing and model computation (PySpark), and finally the models are sent back to the console.
Freeze frame of the Just Dance talk showing a data flow chart
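The talk only describes the algorithm as « Netflix-like », so the sketch below is not Ubisoft's model. It is a generic item co-occurrence recommender, one simple form of collaborative filtering, written in plain Python rather than PySpark. The play histories and song names are invented, and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations

# Invented play histories: player -> set of songs already danced to
HISTORIES = {
    "p1": {"song_a", "song_b", "song_c"},
    "p2": {"song_a", "song_b"},
    "p3": {"song_b", "song_d"},
}


def co_occurrence(histories):
    """Count how often two songs appear in the same player's history."""
    counts = Counter()
    for songs in histories.values():
        for x, y in combinations(sorted(songs), 2):
            counts[(x, y)] += 1
            counts[(y, x)] += 1
    return counts


def recommend(played, histories, top_n=3):
    """Rank unplayed songs by how often they co-occur with played ones."""
    scores = Counter()
    for (x, y), n in co_occurrence(histories).items():
        if x in played and y not in played:
            scores[y] += n
    # Highest score first; ties broken by name so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked[:top_n]]


print(recommend({"song_a"}, HISTORIES))
```

Note that even this toy version needs every player's history in one place to compute the counts, which is consistent with the computation happening on servers rather than on the console.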

Although this use case only uses data for level recommendation, the data scientist admitted that Ubisoft and the Just Dance team believe that gameplay customisation is powerful and could be used directly while playing, in a more diegetic way.

Frontiers of privacy

With the Just Dance use case, we saw that the gaming industry is not yet using data extensively. We should nevertheless keep an eye on it, as tracking is unfortunately trendy nowadays.

While we cannot predict what will come next from the industry, we can take a few steps back and sketch a desirable future for data in gaming. I want to emphasise one utopian wish: real-time access to personal data. This feature would allow users to access their personal data at the moment it is collected. Third-party tools could be interfaced with the service, allowing innovative uses. Some services, like social networks, have already made APIs (application programming interfaces) available to developers. In the gaming field, modding allows gamers to modify some parts of the games they play.

During one year of my PhD, I explored how game data could be used for accessibility purposes. For instance, I made a prototype with Microsoft’s tracking library PlayFab, which exfiltrates data that was then used to automatically create audio description for visually impaired gamers. This is a perfect example of how privacy investigation and data rights can overlap with other consumer rights, and of how both could be improved!

There are many interesting questions to tackle in terms of privacy in gaming. Is the use of data by game publishers really a legitimate interest? Does it really respect the data minimisation principle? In multiplayer online games, there is also, by design, a transfer of data between players. What happens, with regard to cross-border data transfers, if a European gamer is playing with a US player? What happens if we allow players to process data with mods? In games like Minecraft, you can also connect to servers hosted not by the game publisher but by third parties. Who is the data controller? We should definitely focus on existing questions in the area of gaming before trying to find new ones for the metaverse.