Learn with AI: is it news or an ad?

An interview with Peter van der Putten from Leiden University

05.01.2022  |  by Cato Verhoeven  |  Reverb Channel

Distinguishing advertorials from regular articles is hard. This study shows that only 7% of the participants could accurately tell them apart. Fortunately, AI can tell the difference in up to 90% of cases. Timo Kats, Peter van der Putten and Jasper Schelling created models for the Reverb Channel research program that can accurately predict whether an article is advertorial or editorial content.

Last November, the project and paper were presented at BNAIC/BENELEARN in Luxembourg. To give some insight into how the paper came together, we interviewed Peter van der Putten.

Peter is an assistant professor at LIACS, the Leiden Institute of Advanced Computer Science at Leiden University. He is both a researcher and a teacher, and supervises bachelor's, master's and PhD students in their thesis research. His role in this particular project was to coach his student Timo Kats, who graduated with this project.

“Timo and I wanted to examine the role of AI in the media landscape. It is often said that AI locks people into echo chambers and filter bubbles. But can we reverse the role of AI to use it in a more positive way? We could use it to uncover everyday issues in our media landscape and ask questions about these.

In the case of Timo, I am quite fortunate that he is very creative and takes a lot of initiative. That is great for a supervisor. You can get more into the actual research topic, and less into the ‘you have to do this now and that next’. We could really work on the research together.

We trained the AI models on examples of advertorials and regular news articles. The predictive models can distinguish advertorials from editorials up to 90% of the time. To better understand what the models have learned, Timo also made a lexicon of words that are typically used in advertorials and in editorial content, and a word co-occurrence graph to visualize which combinations of words are often used together in this content.

This is interesting, because these keywords aren’t restricted to the topics an article discusses. It’s also about the way these topics are being written about. For example, ‘free’ and ‘inspiring’ are both very popular adjectives in advertorials. And yes, words like ‘cabinet’ and ‘minister’ are indeed often found in news articles. But so are the four W’s: what, when, where, why. That is really interesting, as journalists are trained to answer these key questions specifically for news pieces.

It’s also interesting to see that for some news sites it is a lot harder to distinguish advertorials from regular content. Commercial and editorial content in the business-to-business publication De Ondernemer (the Entrepreneur) is hardest to differentiate. I don’t think that is an indication of any bad intent on the part of the publisher; it says more about the nature of the beast of this medium. In a business-to-business magazine, this content gets intermingled more easily, as editorial content tends to adopt more commercial language.

With this AI technology, you can make a lot of things visible in the news. For example, sponsored content that is not marked as such. Or in regular news, when you read a few newspapers, you can often find the same article. That’s not just because it all comes from press agencies like ANP, but because companies, public organizations and other interest groups send lots of press releases to newspapers. That’s not bad, per se. But it is good to make it visible to the readers.

The use of AI and machine learning for these purposes is not limited to a small group of academic experts. I think lots of people can and should do an investigation like this. The idea of Reverb Channel is to create a community of designers, researchers and journalists. All these people in our community could use their different views to create even better versions of this project. We want to create more media literacy. And this is a step in the right direction.”
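
As a rough illustration of the lexicon idea described in the interview, the sketch below scores words by how strongly they lean towards advertorial or editorial text and uses those scores to label a new sentence. It is not the method from the paper: the example sentences are invented, and the smoothed log-odds scoring and the names `build_lexicon` and `classify` are assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter

# Toy, invented examples. These are NOT the data used in the study.
ADVERTORIALS = [
    "Discover our free and inspiring workshop for your business",
    "Get a free trial and enjoy inspiring results today",
]
EDITORIALS = [
    "The minister told the cabinet why the budget vote was delayed",
    "What the cabinet decided and when the minister will respond",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(docs):
    """Count how often each word occurs across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def build_lexicon(ad_docs, ed_docs):
    """Return {word: log-odds}; positive values lean advertorial.

    Assumption: add-one smoothing so unseen words never give log(0).
    """
    ad, ed = word_counts(ad_docs), word_counts(ed_docs)
    vocab = set(ad) | set(ed)
    ad_total = sum(ad.values()) + len(vocab)
    ed_total = sum(ed.values()) + len(vocab)
    return {
        w: math.log((ad[w] + 1) / ad_total) - math.log((ed[w] + 1) / ed_total)
        for w in vocab
    }


def classify(text, lexicon):
    """Sum the log-odds of known words; a positive total means 'advertorial'."""
    score = sum(lexicon.get(tok, 0.0) for tok in tokenize(text))
    return "advertorial" if score > 0 else "editorial"


lexicon = build_lexicon(ADVERTORIALS, EDITORIALS)
ranked = sorted(lexicon, key=lexicon.get)
print("most editorial words:", ranked[:3])
print("most advertorial words:", ranked[-3:])
print(classify("An inspiring free offer", lexicon))
print(classify("Why the minister resigned", lexicon))
```

With this toy data, the first sample sentence is labeled as advertorial and the second as editorial. A real investigation would need a much larger labeled corpus and a held-out test set before any accuracy figure could be trusted.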

For a full writeup of Timo and Peter's research, have a look at the paper that was published at the 33rd Benelux Conference on Artificial Intelligence and the 30th Belgian-Dutch Conference on Machine Learning (BNAIC/BENELEARN 2021), Luxembourg, November 10-12, 2021.

About the Author

Cato Verhoeven

Intern Journalism

Jasper Schelling

Programme Lead

Peter van der Putten

Research Supervisor

Timo Kats