
How do social media platforms detect forbidden content?

Recently the news came out that images of Black Pete are no longer allowed on Facebook and Instagram. This is not the first type of content to be banned from social media platforms: according to Facebook's guidelines, violent content, nude images and terrorist propaganda, for example, are also prohibited. But how are the posts of more than three billion monthly users checked for banned content? Romain Huet and Willem van der Geest, data scientists at TMC, explain how this works.

Reporting vs. using algorithms

In the case of Black Pete, users can report the content, which is then reviewed and deleted by Facebook employees. It is also possible to develop algorithms that recognize (a large part of) the content and - in case of a violation of the guidelines - automatically delete it. This already happens for categories such as child pornography, violence and terrorist propaganda. We will explain two categories of models that can be used for this purpose:

Object detection

With object detection, a model is trained to recognize certain objects in an image. In the case of Black Pete, a model is trained to recognize characteristics that Black Pete typically has - such as a dark skin tone, red lips, golden earrings, curly hair, a hat with a feather and possibly even a birch rod (the 'roe'). An employee has to manually mark the location of Black Pete in images; this is called image labeling. After a sufficient number of images have been labeled, the model is trained on this information.

When a new image is posted that contains many of Black Pete's characteristics, the image will be classified as suspicious and may then be removed.
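To make this concrete, below is a minimal sketch in Python of what that inference step could look like, using a torchvision Faster R-CNN that is assumed to have been fine-tuned on the labeled images. The class index, the weights file ("detector.pt") and the confidence threshold are illustrative assumptions, not details of Facebook's actual system.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Hypothetical label index for the target object class the model was
# fine-tuned on (index 0 is reserved for the background class).
TARGET_CLASS = 1
SCORE_THRESHOLD = 0.8  # assumed confidence cut-off

# Two classes: background + target object; "detector.pt" stands in for
# the weights produced by fine-tuning on the manually labeled images.
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=2)
model.load_state_dict(torch.load("detector.pt"))
model.eval()

def is_suspicious(path: str) -> bool:
    """Return True if the detector finds the target object with high confidence."""
    image = to_tensor(Image.open(path).convert("RGB"))
    with torch.no_grad():
        prediction = model([image])[0]  # dict with "boxes", "labels", "scores"
    for label, score in zip(prediction["labels"], prediction["scores"]):
        if label.item() == TARGET_CLASS and score.item() >= SCORE_THRESHOLD:
            return True
    return False
```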

Binary classification

This is a second category that can be applied; let's take Black Pete as an example again. A large number of images of a certain subject is collected, and for each image we manually record whether it shows Black Pete: yes or no. This is called labeling. Just as with object detection, the model is trained on the labeled images. Once this is done, it will eventually recognize when an image contains Black Pete, so the image can be removed. A sketch of such a classifier follows below.
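As a sketch, such a binary classifier could be built in PyTorch by fine-tuning a pretrained backbone on the labeled yes/no images. Everything below (the backbone choice, learning rate and decision threshold) is an illustrative assumption.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Reuse a pretrained backbone and replace the final layer with a single
# logit for the yes/no decision.
model = resnet18(weights=ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 1)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a batch of images with 0.0/1.0 labels."""
    model.train()
    optimizer.zero_grad()
    logits = model(images).squeeze(1)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()

def contains_target(image: torch.Tensor, threshold: float = 0.5) -> bool:
    """Final yes/no judgement for a single preprocessed image tensor."""
    model.eval()
    with torch.no_grad():
        probability = torch.sigmoid(model(image.unsqueeze(0)))[0, 0]
    return probability.item() >= threshold
```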

Object detection and binary classification are sometimes combined. In that case, the object detection model does a first scan of all images for 'suspicious' objects (such as Black Pete), and the binary classification model is specialized in the final judgement (is there a Black Pete or not?).
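Reusing the two functions sketched above, such a combined pipeline could look like this; `preprocess` stands in for a hypothetical resize-and-normalize step.

```python
def moderate(path: str) -> str:
    """Two-stage sketch: a cheap detector scan first, the classifier's
    final judgement only for images that were flagged."""
    if not is_suspicious(path):      # stage 1: object detection
        return "allow"
    image = preprocess(path)         # hypothetical resize/normalize helper
    if contains_target(image):       # stage 2: binary classification
        return "remove"
    return "allow"                   # flagged, but cleared on closer look
```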

Once such an algorithm is implemented, it saves a lot of time. It also replaces the sometimes terrible work of moderators, who can be exposed to shocking content. There can be several reasons why reporting is used for Black Pete instead of an algorithm.

When to report anyway?

In the past, (painful) mistakes have been made that can lead to reputational damage. In 2015, for example, Google removed the 'gorilla' label from its image-labeling technology because photos of Black people were being classified as gorillas. These kinds of mistakes are very sensitive and can even be considered racist. Choosing to manually classify and remove images may therefore be a way to prevent reputational damage.

As explained above, an algorithm requires a lot of data that needs to be labeled manually. It is possible that Facebook wants to develop an algorithm for Black Pete content as well, but simply does not have enough labeled data yet. In addition, there is never a guarantee that an algorithm will work perfectly.
