There is More to Data than what Meets the Eye

I have been studying the subject for years. I have implemented my own data analysis algorithms and analytics service.

When raising concerns about user privacy online and data collection, one of the biggest challenges is replying to a comment such as:

But... what's the big deal? The "data" you're referring to is simply that you clicked on a link, that they sent you, because you signed up..... I don't get it.

I believe that these reactions stem from poor communication, fueled especially by the bigger players around, because they strive to create misconceptions.

When presented with the misery their platforms are creating (as well as other moderation-adjacent problems, like perceived bias), companies often say more technology is the solution. During his hearings in front of Congress last year, for example, Zuckerberg cited artificial intelligence more than 30 times as the answer to this and other issues.

The issue with data collection is not necessarily the data itself, which may be innocent (what browser I use, what fonts I have installed, when I clicked on a newsletter article), but what questions can be answered with that data. That is why the most important thing is to understand what a machine learning algorithm is and roughly how it works. Maybe there is a correlation between the seemingly innocent data and my sexual orientation or political views.

From an algorithm's perspective, it does not matter whether I explicitly state my political position. The correlation can be found using other people's data and then extrapolated to the innocent information that was collected from me. Sexual orientation and political inclination are already a problem in today's world: there are countries in which homosexuality is punished or strongly disapproved of by society, and in some places political dissent is forbidden.
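
To make the idea concrete, here is a minimal sketch of how a label that some people disclosed can be extrapolated to someone who never disclosed it. Everything in it is an assumption for illustration: the records are invented, the features (browser, time of newsletter clicks) and the labels "A" and "B" are placeholders, and the counting scheme is a toy stand-in for what a real machine learning algorithm would do.

```python
from collections import defaultdict

# Invented records from users who DID disclose a view ("A" or "B").
# Each record: (browser, time of newsletter click, disclosed view).
disclosed_users = [
    ("firefox", "night", "A"),
    ("firefox", "night", "A"),
    ("firefox", "morning", "A"),
    ("chrome", "morning", "B"),
    ("chrome", "morning", "B"),
    ("chrome", "night", "B"),
    ("firefox", "morning", "B"),
    ("chrome", "night", "A"),
]


def train(records):
    """Count how often each label co-occurs with each feature value."""
    counts = defaultdict(lambda: defaultdict(int))
    for *features, label in records:
        for position, value in enumerate(features):
            counts[(position, value)][label] += 1
    return counts


def predict(counts, features):
    """Return the label that co-occurred most often with the given features."""
    scores = defaultdict(int)
    for position, value in enumerate(features):
        for label, n in counts[(position, value)].items():
            scores[label] += n
    # None means the features were never seen, so nothing can be inferred.
    return max(scores, key=scores.get) if scores else None


model = train(disclosed_users)

# A user who never stated any view, only left "innocent" traces.
print(predict(model, ("firefox", "night")))
```

The last user never stated anything, yet the sketch still assigns them a label, purely from patterns found in other people's data.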

But let's not focus solely on "bad actors" such as repressive governments. Imagine we can predict whether a woman is pregnant by looking at her chocolate consumption (see note 1). It is reasonable to think that there will then be a correlation between chocolate consumption and extended absence from work. If we were a recruiting agency optimizing for employee engagement, we wouldn't even need to know the gender of the candidates: we would check their chocolate consumption and end up biasing the candidates we present towards men, who can't get pregnant.
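
The recruiting scenario can be sketched in a few lines. This is a toy illustration under loud assumptions: the candidates, the numbers, and the threshold are all invented, and the `gender` field is stored only so that the outcome can be audited afterwards. The filter itself never reads it.

```python
# Invented candidate pool; "gender" is kept only to audit the result.
candidates = [
    {"name": "c1", "gender": "F", "chocolate_per_week": 6},
    {"name": "c2", "gender": "F", "chocolate_per_week": 5},
    {"name": "c3", "gender": "F", "chocolate_per_week": 2},
    {"name": "c4", "gender": "F", "chocolate_per_week": 7},
    {"name": "c5", "gender": "M", "chocolate_per_week": 1},
    {"name": "c6", "gender": "M", "chocolate_per_week": 2},
    {"name": "c7", "gender": "M", "chocolate_per_week": 3},
    {"name": "c8", "gender": "M", "chocolate_per_week": 6},
]


def shortlist(pool, max_chocolate):
    """Select candidates using the proxy only; gender is never consulted."""
    return [c for c in pool if c["chocolate_per_week"] <= max_chocolate]


def share_of_women(pool):
    """Audit helper: fraction of the pool that is female."""
    return sum(c["gender"] == "F" for c in pool) / len(pool)


presented = shortlist(candidates, max_chocolate=3)

print("women in full pool:", share_of_women(candidates))
print("women in shortlist:", share_of_women(presented))
```

In this made-up pool, the share of women drops from one half to one quarter, even though the selection rule looks neutral on paper.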

The example above is highly reductionist, but it illustrates what also happens within the most technologically advanced platforms. Twitter had its racism scandal, and so did Google when it identified black people as gorillas. And we have no way of knowing how many other examples exist on non-public platforms. And these are only the problems we see today.

Data sources are many, and they are for sale (see: Commercially available private data and security services). User data, whether publicly shared (such as this article) or passively collected (such as clicks on a newsletter), is not going anywhere; it is being accumulated and aggregated without interruption. The same data that seems meaningless today can tomorrow be a predictor of behavior that the political power of the time does not consider correct.

Data that is not aggregated today because it belongs to different companies can be merged easily. Facebook acquired WhatsApp, and after some time both databases were merged. Just by checking Substack's privacy policy you can see that they contemplate the idea of selling user data as "an asset", and we have no way of knowing whether they are already doing it, or whether our clicks, the times we check e-mail, our locations, etc. are already in the hands of other aggregators.
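
How little effort such a merge takes can be shown with a short sketch. It assumes two hypothetical services that happen to share one identifier (here an e-mail address); the records, field names, and the `merge_profiles` helper are invented for illustration and do not describe how any real company stores or combines its data.

```python
# Invented records from two hypothetical services, keyed by the same e-mail.
newsletter_clicks = {
    "ana@example.org": {"opens_mail_at": "07:10", "clicked_topics": ["privacy", "health"]},
    "bob@example.org": {"opens_mail_at": "23:45", "clicked_topics": ["politics"]},
}
chat_metadata = {
    "ana@example.org": {"last_location": "city center", "contacts": 212},
}


def merge_profiles(*sources):
    """Combine every source into one profile per shared identifier."""
    merged = {}
    for source in sources:
        for identifier, fields in source.items():
            merged.setdefault(identifier, {}).update(fields)
    return merged


profiles = merge_profiles(newsletter_clicks, chat_metadata)

for identifier, profile in profiles.items():
    print(identifier, sorted(profile))
```

Once a shared identifier exists, the combined profile holds more than either service knew on its own, and the same join works for any number of additional sources.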

Therefore, when I talk about user privacy online, I am not only thinking about the consequences massive data collection already has today. I am also concerned about the consequences it will have in the future. We can't guarantee that the freedoms we have today will be the same tomorrow. And we can't know what it will be possible to extract from the data we have already generated, even if we go completely off-grid from now on.

There are some individual actions that can create a healthier internet, such as acknowledging the power of web developers. However, the only path to change is to generate a collective consciousness of the problems that are arising, and to understand that the people who are part of the problem cannot be the solution.

The internet needs to know less about us, not more. Just because it’s possible to track someone doesn’t mean we should.

  1. https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/

  2. https://www.theverge.com/2019/2/27/18242724/facebook-moderation-ai-artificial-intelligence-platforms

Aquiles Carattino
This note you are reading is part of my digital garden. Follow the links to learn more, and remember that these notes evolve over time. After all, this website is not a blog.

© 2020 Aquiles Carattino
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.