

Published on
minor cyber body of knowledge · 3 min read
Duration: 1 day

Ethics - Data Science

To begin with, people tend to view data as objective by nature. We tend to forget that data is only as accurate and objective as the people and processes used to generate and collect it.

Second, modern machine learning techniques are so complex that they are difficult for people to understand. This makes it difficult to determine what the right inputs are and what the ethical implications of the results are. It is almost as if the answer comes to us from a magic box, which we do not fully understand but blindly trust.

Personal data such as passwords, photos and location data can fall into the wrong hands. Predictive models used for policing and sentencing can reinforce stereotypes and have negative racial or socioeconomic consequences. One example is the Dutch benefits affair (toeslagenaffaire), in which models labeled a group of people as fraudulent. A statistical analysis is run on the data, and the algorithms pick up correlations in it that can lead to inequality and/or prejudice. In this case, people of a certain origin were flagged because, according to the statistical models, they were more likely to commit fraud.

Quality of research and finding false correlations

The scientist makes the data readable, and in doing so carries a great responsibility for interpreting it correctly.

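A good example of this is the correlation between ice cream consumption and shark attacks. The correlation is measurable, but concluding that one causes the other is not correct: both simply rise in warm weather, when more people eat ice cream and more people swim in the sea.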
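The ice cream and shark attack example can be reproduced with simulated data. The sketch below is only an illustration under stated assumptions: temperature is assumed to be the hidden common cause, and every number (temperature range, sales per degree, noise levels) is invented. It shows a strong raw correlation that largely disappears once the effect of temperature is removed from both variables.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures in degrees Celsius (the assumed common cause).
temperature = [rng.uniform(5, 35) for _ in range(1000)]

# Both quantities depend on temperature plus independent noise, never on each other.
ice_cream_sales = [20 * t + rng.gauss(0, 50) for t in temperature]
shark_attacks = [0.1 * t + rng.gauss(0, 0.5) for t in temperature]

# Correlation as it would appear in the raw data.
raw_corr = pearson(ice_cream_sales, shark_attacks)

# Correlation after removing what temperature explains in each variable.
partial_corr = pearson(
    residuals(ice_cream_sales, temperature),
    residuals(shark_attacks, temperature),
)

print(f"raw correlation:            {raw_corr:.2f}")
print(f"controlled for temperature: {partial_corr:.2f}")
```

In real data the common cause is rarely known in advance, which is exactly why a measured correlation needs careful interpretation before any conclusion is drawn from it.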

There is also such a thing as bias, where the data set is systematically skewed towards a particular group. Suppose one wants to study how well students perform across the country. If one collects data only from Fontys and then analyzes it, the conclusions are unreliable: the sample (Fontys students) is not representative of the population (students in the whole of the Netherlands). A better way to investigate the school performance of students is to take a random sample of students throughout the Netherlands and ask them about their performance at school. This is just a small example, but this kind of mistake is very common in the world of science.
Such a survey is often also combined with other studies, because questionnaires are not always reliable (again, different biases can occur).
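The difference between a sample from one school and a random sample can be made concrete with a small simulation. This is a sketch under invented assumptions: the three institutions "A", "B" and "C", their average grades on a 1-10 scale, and the sample size of 500 are all hypothetical and do not describe any real school.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical average grades (1-10 scale) per institution; the numbers are invented.
institution_means = {"A": 6.2, "B": 6.9, "C": 7.4}

# Simulated population: 5000 students per institution, each with one grade.
population = [
    (name, rng.gauss(mean, 0.8))
    for name, mean in institution_means.items()
    for _ in range(5000)
]

true_mean = statistics.mean(grade for _, grade in population)

# Biased sample: 500 students, all taken from institution A.
only_a = [grade for name, grade in population if name == "A"]
biased_sample = rng.sample(only_a, 500)

# Random sample: 500 students drawn from the whole population.
random_sample = [grade for _, grade in rng.sample(population, 500)]

biased_error = abs(statistics.mean(biased_sample) - true_mean)
random_error = abs(statistics.mean(random_sample) - true_mean)

print(f"population mean:           {true_mean:.2f}")
print(f"error, one institution:    {biased_error:.2f}")
print(f"error, random sample:      {random_error:.2f}")
```

Both samples have the same size, so the larger error of the single-institution sample comes from who was asked, not from how many were asked.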

Predictive models only "see" the world through the data used for training. In fact, they do not "know" any other reality.
When that data is biased, the accuracy and fidelity of the model are compromised.
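A minimal sketch of how a model inherits the bias of its training data. Everything here is an assumption for illustration: two hypothetical groups "A" and "B" commit fraud at exactly the same rate, but group B was inspected five times as often in the past, so more of its fraud ended up in the records. The "model" is deliberately trivial: it just learns the recorded fraud frequency per group.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

TRUE_FRAUD_RATE = 0.05                     # identical for both groups by construction
INSPECTION_RATE = {"A": 0.10, "B": 0.50}   # hypothetical: group B was checked far more often


def recorded_labels(group, n):
    """Return 1 only for fraud that was inspected and therefore recorded, else 0."""
    labels = []
    for _ in range(n):
        is_fraud = rng.random() < TRUE_FRAUD_RATE
        was_inspected = rng.random() < INSPECTION_RATE[group]
        labels.append(1 if (is_fraud and was_inspected) else 0)
    return labels


# Historical records per group: this is all the model ever gets to "see".
training_data = {group: recorded_labels(group, 20000) for group in INSPECTION_RATE}

# The trivial model: estimated fraud risk equals the recorded fraud frequency.
learned_risk = {
    group: sum(labels) / len(labels) for group, labels in training_data.items()
}

for group, risk in learned_risk.items():
    print(f"group {group}: learned risk {risk:.3f} (true rate {TRUE_FRAUD_RATE})")
```

The learned risk for group B comes out several times higher than for group A even though the real fraud rate is the same. The model has learned the inspection history, not the behavior of the people in the data.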

An awful lot can go wrong in data analysis, and in this the scientist has a great ethical responsibility.


Resource: The Good, The Bad, and The Creepy: Why Data Scientists Need to Understand Ethics

Thank you for reading this topic about ethics. I hope it was interesting; any feedback is always welcome. Hope to see you in the next topic,
Byee! 👋

TL;DR: Which standards and values are involved in analyzing data, and how to avoid prejudice.