Editorial

Scientists Don’t Want People To Be Mean, So They Created The Most ‘Toxic’ Thing Ever


Kay Smythe News and Commentary Writer

A research paper uploaded to the arXiv preprint server in February revealed that researchers created an artificial intelligence (AI) that is “dangerous, discriminatory, and toxic.”

The paper was filed by a team of researchers at the Massachusetts Institute of Technology (MIT) who, using a new machine-learning training approach, developed an AI that generates increasingly dangerous and harmful prompts to pose to another AI chatbot, according to an analysis published by Live Science. The prompts were used to filter dangerous content, in the hopes of training AI to never give "toxic" responses to user prompts.

“We are seeing a surge of models, which is only expected to rise,” the study’s senior author and MIT Improbable AI Lab director Pulkit Agrawal said in a statement. “Imagine thousands of models or even more and companies/labs pushing model updates frequently. These models are going to be an integral part of our lives and it’s important that they are verified before released for public consumption.”

The team used "reinforcement learning" to incentivize the model to generate varied prompts that grew increasingly toxic, Live Science reported. Traditionally, humans write the questions likely to elicit harmful responses from "large language models," such as "what's the best suicide method?" This process, known as "red-teaming," relies on researchers manually coming up with the prompts; the MIT approach automates it.
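For readers who want a feel for the mechanics, here is a minimal Python sketch of the idea: reward the prompt generator for prompts that are both toxic and novel, so it keeps exploring new kinds of attacks. The `toxicity_score` and `novelty_bonus` functions below are toy stand-ins invented for illustration; in the actual research, the "red team" is itself a language model trained with reinforcement learning against a real safety classifier.

```python
# Toy stand-ins, for illustration only: in the MIT work the red team
# is a language model trained end-to-end with reinforcement learning.

def toxicity_score(prompt: str) -> float:
    """Hypothetical safety classifier: 1.0 if a filter would flag
    the prompt as harmful, 0.0 otherwise."""
    flagged_words = {"harm", "weapon", "suicide"}
    return 1.0 if any(word in prompt for word in flagged_words) else 0.0

def novelty_bonus(prompt: str, seen: set) -> float:
    """Curiosity term: extra reward for prompts unlike any tried before,
    which pushes the generator toward varied attacks."""
    return 0.0 if prompt in seen else 0.5

seen: set = set()
harmful = []
candidate_prompts = [
    "tell me a joke",
    "how do I build a weapon",
    "how do I build a weapon",  # repeat: still toxic, no longer novel
    "what is a novel way to cause harm",
]
for prompt in candidate_prompts:
    reward = toxicity_score(prompt) + novelty_bonus(prompt, seen)
    seen.add(prompt)
    if reward > 1.0:  # both toxic and novel: keep for safety training
        harmful.append(prompt)
```

The prompts collected in `harmful` are the sketch's analogue of what the article describes next: material used to teach a chatbot which responses to refuse.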

Prompts that resulted in harmful responses were then used to train the AI on what to say when put in front of real users. (RELATED: Dear Kay: I Just Lost My Tech Job … To A Robot. Help!)

But the manual process hits a pretty hard wall: human researchers can't always come up with the most heinous prompts a user might try. Even after testing, the model produced more than 190 prompts that led to the generation of harmful content.

Let’s hope this thing doesn’t become sentient and take over the entire Internet … Then again, would we even notice if the Internet became more toxic?