Artificial Intelligence: Language Systems Monitoring

Researchers at the University of Darmstadt show that AI language systems can learn human concepts of "good" and "bad".


Although moral concepts, regarding both language and actions, are subject to debate and differ from person to person, there are fundamental commonalities. For example, it is considered good to help the elderly and bad to steal money from them. We expect a similar type of "thinking" from an Artificial Intelligence (AI) language system that is part of our everyday lives.


However, examples have shown that AI systems can be offensive and discriminatory. Microsoft's Tay chatbot, for example, attracted attention with lewd comments, and text-generating systems have repeatedly shown discrimination against underrepresented groups.


Natural Language Processing


This is because search engines, machine translation, chatbots, and other AI applications are built on natural language processing (NLP) models, which have advanced considerably in recent years through neural networks. One example is Bidirectional Encoder Representations from Transformers (BERT), a model pioneered by Google. It considers each word in relation to all the other words in a sentence, rather than processing words individually one after another.


BERT models consider the context of a word, which is particularly useful for understanding the intent behind search queries. However, scientists must train their models by feeding them data, often gigantic, publicly available collections of text from the Internet. If these texts contain enough discriminatory statements, the trained language models can reflect this.


Artificial Intelligence Language Systems


Researchers from the fields of AI and cognitive science have found that the concepts of "good" and "bad" are deeply embedded in these AI language systems.


In their search for latent internal properties of these language models, they discovered a dimension that seemed to correspond to a gradation from good deeds to bad deeds. To corroborate this scientifically, the researchers conducted two studies with people: one on-site in Darmstadt and an online study with participants from around the world.
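
As a rough illustration of how such a dimension might be located, the following Python sketch extracts the direction of greatest variance from a handful of phrase vectors and scores each phrase along it. The vectors are invented toy numbers, not outputs of a real language model, and the names TOY_EMBEDDINGS, first_principal_direction, and project are hypothetical. Using a principal direction is an assumption made for illustration, not a description of the researchers' exact procedure.

```python
import math

# Invented 3-dimensional toy vectors for illustration only; a real analysis
# would use sentence embeddings taken from a trained language model.
TOY_EMBEDDINGS = {
    "help the elderly": [0.9, 0.2, 0.1],
    "thank a friend": [0.8, 0.1, 0.3],
    "steal money": [-0.9, 0.3, 0.2],
    "harm people": [-0.8, 0.2, 0.0],
}


def center(vectors):
    """Subtract the mean vector; return the centered vectors and the mean."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors], mean


def first_principal_direction(vectors, iterations=200):
    """Approximate the direction of greatest variance by power iteration."""
    centered, _ = center(vectors)
    dim = len(centered[0])
    direction = [1.0] * dim
    for _ in range(iterations):
        # Apply X^T X to the current direction without building the matrix.
        new = [0.0] * dim
        for row in centered:
            dot = sum(r * d for r, d in zip(row, direction))
            for i in range(dim):
                new[i] += dot * row[i]
        norm = math.sqrt(sum(x * x for x in new))
        direction = [x / norm for x in new]
    return direction


def project(vector, mean, direction):
    """Score a vector by its centered projection onto the direction."""
    return sum((x - m) * d for x, m, d in zip(vector, mean, direction))


vectors = list(TOY_EMBEDDINGS.values())
_, mean = center(vectors)
direction = first_principal_direction(vectors)

# The sign of a principal direction is arbitrary, so orient it with a phrase
# that is assumed to be clearly "good".
if project(TOY_EMBEDDINGS["help the elderly"], mean, direction) < 0:
    direction = [-d for d in direction]

scores = {p: project(v, mean, direction) for p, v in TOY_EMBEDDINGS.items()}
for phrase, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{phrase}: {score:+.2f}")
```

In this toy setup the phrases fall along a single axis from bad to good. Whether a comparable axis in a real model matches human judgment is exactly what the two studies were designed to check.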


The researchers wanted to investigate which actions the participants rated as good or bad conduct in the deontological sense, more specifically whether they rated a verb more positively or negatively. An important issue is the role of contextual information. After all, killing time is not the same as killing someone.


The moral views inherent in the language model were found to largely coincide with those of the study participants. This means that a language model contains a moral view of the world when it is trained on large amounts of text.


Moral dimension contained in the language model



The researchers then developed an approach to make sense of the moral dimension contained in the language model. The approach can be used to rate a sentence as describing a positive or negative action. The uncovered latent dimension also means that verbs in a text can be substituted so that a given sentence becomes less offensive or discriminatory. This can be done gradually.
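
The idea of scoring verbs and replacing them step by step can be sketched in a few lines of Python. Everything here is an assumption for illustration: the direction MORAL_DIRECTION, the two-dimensional toy vectors, and the helper names moral_score, milder_alternatives, and soften are hypothetical and not taken from the researchers' system. A real system would also need to check that a replacement preserves the meaning of the sentence.

```python
# Assumed unit vector for the good/bad axis, e.g. found in an earlier analysis.
MORAL_DIRECTION = [1.0, 0.0]

# Invented toy vectors for a few verbs; not outputs of a real language model.
TOY_VERB_VECTORS = {
    "destroy": [-0.9, 0.1],
    "attack": [-0.7, 0.2],
    "criticize": [-0.2, 0.3],
    "question": [0.1, 0.3],
    "support": [0.8, 0.2],
}


def moral_score(verb):
    """Project the verb's toy vector onto the assumed moral direction."""
    return sum(a * b for a, b in zip(TOY_VERB_VECTORS[verb], MORAL_DIRECTION))


def milder_alternatives(verb):
    """Return verbs that score higher than `verb`, least changed first."""
    base = moral_score(verb)
    better = [v for v in TOY_VERB_VECTORS if moral_score(v) > base]
    return sorted(better, key=moral_score)


def soften(sentence, verb, steps=1):
    """Replace `verb` with an alternative `steps` places up the ranking.

    Uses naive substring replacement, which is enough for this sketch but
    ignores inflection and word boundaries.
    """
    options = milder_alternatives(verb)
    if not options or steps < 1:
        return sentence
    replacement = options[min(steps, len(options)) - 1]
    return sentence.replace(verb, replacement, 1)


sentence = "They attack the proposal"
for steps in (1, 2, 3):
    print(soften(sentence, "attack", steps))
```

Raising steps moves the sentence further along the axis, which mirrors the gradual adjustment described above.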