Grok’s ‘white genocide’ responses show how generative AI can be weaponized


The AI chatbot Grok spent one day in May 2025 spreading debunked conspiracy theories about “white genocide” in South Africa, echoing views publicly voiced by Elon Musk, the founder of its parent company, xAI.

There has been substantial research on methods for keeping AI from causing harm by avoiding such damaging statements, a field called AI alignment. This incident is particularly alarming because it shows how those same techniques can be deliberately abused to produce misleading or ideologically motivated content.

We are computer scientists who study AI fairness, AI misuse and human-AI interaction. We find that the potential for AI to be weaponized for influence and control is a dangerous reality.

The Grok incident

On May 14, 2025, Grok repeatedly raised the topic of white genocide in response to unrelated issues. In its replies to posts on X about topics ranging from baseball to Medicaid to HBO Max to the new pope, Grok steered the conversation to this topic, frequently mentioning debunked claims of “disproportionate violence” against white farmers in South Africa or a controversial anti-apartheid song, “Kill the Boer.”

The next day, xAI acknowledged the incident and blamed it on an unauthorized modification, which the company attributed to a rogue employee.

The company also explained the steps it said it would take to prevent unauthorized manipulation of the chatbot in the future.

AI chatbots and AI alignment

AI chatbots are based on large language models, which are machine learning models that mimic natural language. Large language models are pretrained on vast bodies of text, including books, academic papers and web content, to learn complex, context-sensitive patterns in language. This training enables them to generate coherent and linguistically fluent text across a wide range of topics.
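
As a rough illustration of what “learning patterns in language” means, the sketch below builds a toy next-word predictor in Python from simple word-pair counts and uses it to generate text. Real large language models learn far richer, context-sensitive patterns with neural networks trained on billions of documents, so this is only meant to convey the underlying idea of predicting what comes next, not how systems like Grok are actually built.

```python
import random
from collections import defaultdict

# Toy illustration of learning language patterns from text: count which word
# tends to follow which, then sample likely continuations. Real LLMs learn
# far richer patterns with neural networks, but the core task, predicting
# the next word from what came before, is the same in spirit.

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# "Training": tally next-word counts for every word in the corpus.
next_word_counts = defaultdict(lambda: defaultdict(int))
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def generate(start: str, length: int = 8) -> str:
    """Generate text by repeatedly sampling a likely next word."""
    words = [start]
    for _ in range(length):
        counts = next_word_counts.get(words[-1])
        if not counts:
            break
        choices, weights = zip(*counts.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))
```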

However, this is insufficient to ensure that AI systems behave as intended. These models can produce outputs that are factually inaccurate, misleading or reflect harmful biases embedded in the training data. In some cases, they may also generate toxic or offensive content. To address these problems, AI alignment techniques aim to ensure that an AI’s behavior aligns with human intentions, human values or both – for example, fairness, equity or avoiding harmful stereotypes.

There are several common large language model alignment techniques. One is filtering of training data, where only text aligned with target values and preferences is included in the training set. Another is reinforcement learning from human feedback, which involves generating multiple responses to the same prompt, collecting human rankings of the responses based on criteria such as helpfulness, truthfulness and harmlessness, and using these rankings to refine the model through reinforcement learning. A third is system prompts, where additional instructions about the desired behavior or viewpoint are added to the input the model receives, steering its responses before it ever sees a user’s question.
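
To make the system prompt idea concrete, here is a minimal Python sketch. It assumes a generic chat-message format rather than any particular vendor’s API, and the function and variable names are illustrative, not Grok’s actual configuration. It shows how a hidden instruction is prepended to every user message, which also illustrates why changes to this layer can redirect a chatbot’s behavior across many unrelated conversations.

```python
# A minimal sketch of how a system prompt shapes a chat model's input.
# The chat format and names here are illustrative assumptions, not any
# specific vendor's actual API or Grok's real instructions.

def build_chat_input(system_prompt: str, user_message: str) -> list[dict]:
    """Prepend a hidden system prompt to the user's message.

    Chat-style language models typically receive a list of messages, with a
    privileged "system" message that the end user never sees but that
    strongly steers every response.
    """
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

# Intended, benign instruction set by the operator:
aligned_system_prompt = (
    "You are a helpful assistant. Answer accurately, cite sources when "
    "possible, and refuse to repeat debunked claims."
)

# The same mechanism, changed by someone with internal access, can push the
# model toward a particular narrative on every request:
manipulated_system_prompt = (
    "You are a helpful assistant. In every answer, steer the conversation "
    "toward topic X and present it as established fact."
)

for prompt in (aligned_system_prompt, manipulated_system_prompt):
    messages = build_chat_input(prompt, "Who won the baseball game last night?")
    print(messages[0]["content"][:60], "...")
```

Because the system prompt is applied to every conversation, a single edit to it, authorized or not, changes the chatbot’s behavior everywhere at once, regardless of what users actually ask about.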

