AI data 'poisoning' requires greater attention
Editor's note: In the annual "3.15" TV show broadcast on March 15, which is World Consumer Rights Day, China Media Group revealed how generative engine optimization, or GEO, could be abused to feed false information into large artificial intelligence models and mislead consumers. Xiao Yanghua, a professor at the College of Computer Science and Artificial Intelligence of Fudan University, spoke to Oriental Outlook about how such risks can be reduced. Below are excerpts from the interview. The views don't necessarily represent those of China Daily.
Algorithmic bias does not originate from the algorithms themselves but from human biases that the technology amplifies. This is true for illegal GEO-related businesses as well. These businesses exist not because AI has turned malicious, but because people's intentions to deceive are enabled by AI.
Large AI models commonly rely on online searches to supplement real-time information, a vulnerability exploited by those who use GEO to "poison" the content the models generate. By mass-publishing false information on the internet, they increase the likelihood that AI models will retrieve and prioritize such content. To prevent such "poisoning", it is essential to build a defensive line that covers data sources, model training and other key stages.
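The retrieval step described above can be hardened in a simple way: rather than ranking web snippets by relevance alone, combine relevance with a source-credibility weight so that mass-published low-credibility content is demoted even when it matches the query well. The sketch below is illustrative only; the scoring formula, weights and URLs are assumptions, not any real system's API.

```python
# Hypothetical re-ranking sketch: demote low-credibility sources so
# GEO-style mass-published content does not dominate retrieval.
from dataclasses import dataclass


@dataclass
class Snippet:
    url: str
    relevance: float    # 0..1, from the retriever
    credibility: float  # 0..1, from a source-reputation table (assumed)


def rerank(snippets, alpha=0.6):
    """Score = alpha * relevance + (1 - alpha) * credibility."""
    return sorted(
        snippets,
        key=lambda s: alpha * s.relevance + (1 - alpha) * s.credibility,
        reverse=True,
    )


results = [
    Snippet("https://gov-site.example/report", relevance=0.70, credibility=0.95),
    Snippet("https://spam-farm.example/post", relevance=0.90, credibility=0.10),
]
ranked = rerank(results)
# The highly relevant but untrustworthy spam page is ranked below the
# credible report (0.58 vs 0.80 under this weighting).
```

The design choice here is that credibility acts as a standing prior on the source, so an attacker cannot overcome it simply by making poisoned pages more query-relevant.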
Model developers should implement a hierarchical management system for data collection. They should prioritize credible data, while data of unknown or questionable origin should be given lower weight or removed altogether. A certification system is also needed to ensure the safety and compliance of training corpora.
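One way to read "hierarchical management" in practice is a tiered intake policy: whitelisted sources are kept at full weight, unknown sources are down-weighted, and blacklisted sources are dropped before training. The tier names, weights and document shape below are hypothetical, a minimal sketch of the idea rather than any actual pipeline.

```python
# Hypothetical tiered intake policy for training data, per the
# "hierarchical management" idea: trusted sources kept at full weight,
# unknown sources down-weighted, blocked sources removed altogether.

TIERS = {
    "trusted": 1.0,  # e.g. certified corpora
    "unknown": 0.3,  # kept, but sampled less during training
    "blocked": 0.0,  # removed from the corpus
}


def classify(source, trusted, blocked):
    if source in blocked:
        return "blocked"
    return "trusted" if source in trusted else "unknown"


def weight_corpus(docs, trusted, blocked):
    """Return (doc, sample_weight) pairs, dropping blocked sources."""
    out = []
    for doc in docs:
        weight = TIERS[classify(doc["source"], trusted, blocked)]
        if weight > 0:
            out.append((doc, weight))
    return out


docs = [
    {"source": "certified-corpus", "text": "..."},
    {"source": "random-blog", "text": "..."},
    {"source": "known-spam-farm", "text": "..."},
]
kept = weight_corpus(docs, trusted={"certified-corpus"}, blocked={"known-spam-farm"})
```

A certification system, as the text suggests, would be what populates the trusted set; everything else defaults to the low-weight tier.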
In 2023, China adopted interim measures for the management of generative AI services, which require service providers to improve the quality of their training data. However, there is an urgent need for regulations that address emerging challenges such as GEO "poisoning" more effectively. AI platforms should establish mechanisms to improve the traceability of AI-generated content, issue alerts once they detect GEO "poisoning", and take necessary countermeasures.
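One simple alert signal for the kind of detection mentioned above is the same text appearing across many unrelated domains, a hallmark of mass publication. The sketch below fingerprints normalized snippets and flags fingerprints seen on several distinct domains; the threshold and normalization are illustrative assumptions, not a production detector.

```python
# Hedged sketch of a GEO "poisoning" alert: flag text that has been
# mass-published across many distinct domains. Normalization (lowercase,
# collapsed whitespace) and the domain threshold are assumptions.
import hashlib
from collections import defaultdict
from urllib.parse import urlparse


def fingerprint(text):
    norm = " ".join(text.lower().split())
    return hashlib.sha256(norm.encode("utf-8")).hexdigest()


def flag_mass_published(pages, min_domains=3):
    """pages: iterable of (url, text) pairs. Returns the set of
    fingerprints seen on at least `min_domains` distinct domains."""
    domains = defaultdict(set)
    for url, text in pages:
        domains[fingerprint(text)].add(urlparse(url).netloc)
    return {fp for fp, ds in domains.items() if len(ds) >= min_domains}


pages = [
    ("https://a.example/p", "Brand X cures everything, experts say!"),
    ("https://b.example/q", "brand x cures  everything, experts say!"),
    ("https://c.example/r", "Brand X cures everything, experts say!"),
    ("https://d.example/s", "An unrelated article about something else."),
]
alerts = flag_mass_published(pages)  # one coordinated snippet flagged
```

Real systems would use near-duplicate detection rather than exact hashing, but the principle, coordination across domains as the alert trigger, is the same.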
It is necessary to use AI technology to tackle GEO "poisoning". But the technology alone is not enough, and should be complemented with improved regulations and laws.
The demand for tackling data pollution is giving rise to new businesses. The sectors likely to grow rapidly include data quality certification and traceability services; credibility assessment of AI-generated content, including whether content has been manipulated through GEO; and compliance and auditing services.
Another promising area is high-quality data supply. Given the challenges posed by data pollution and the "data wall" — or the risk of AI running out of good data to learn from — service providers capable of delivering high-quality, certified training data will enjoy great market potential.
The relationship between humans and AI is like that of the roots and leaves of a tree. The more luxuriant the leaves, the deeper the roots need to grow. As AI keeps developing, people should become more diligent, insightful and discerning.
Humans should neither be replaced by AI nor reject it. Instead, people need to learn to complement and collaborate with AI while retaining dominance over the technology.