The IT community has recently been panicking about AI data poisoning. For some, it is a sneaky mechanism that could serve as a backdoor into corporate systems by surreptitiously infecting the data that large language models (LLMs) are trained on before they are integrated into business systems. For others, it is a way to fight back against LLMs that try to circumvent trademark and copyright protections.
Simply put, these two fears boil down to data poisoning being either 1) an attack tool for cyberthieves and cyberterrorists, or 2) a defense tool that lets artists and companies protect their intellectual property.
In reality, AI data poisoning isn’t a big threat in either case, but IT people really like to panic.
It is the defense tactic that is getting a lot of attention these days, with people downloading a pair of free apps from the University of Chicago called Nightshade and Glaze.
These defensive data-poisoning applications work by manipulating the targeted file to fool the LLM training process. Nightshade typically manipulates the coding around an image. The image may be a desert scene with cactuses (or cacti, if you want to get all Latin about it), but the labeling is changed to say it is an ocean with waves. The idea is that when someone asks the LLM for ocean images, the altered image will come up, but because it is clearly a desert scene, it will be rejected.
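To make the idea concrete, here is a minimal, purely illustrative sketch of that caption-swapping concept. It is not Nightshade's actual code; it assumes a hypothetical sidecar JSON metadata file next to each image and simply rewrites the caption, so a scraper that trusts captions learns the wrong association.

```python
# Toy illustration only: NOT Nightshade's actual mechanism. Assumes each image
# has a hypothetical sidecar JSON metadata file; rewriting its caption means a
# scraper that trusts captions pairs the image with the wrong description.
import json
from pathlib import Path


def poison_caption(metadata_path: str, decoy_caption: str) -> None:
    """Swap the caption a scraper will see, keeping a private copy of the original."""
    path = Path(metadata_path)
    meta = json.loads(path.read_text())
    meta["original_caption"] = meta.get("caption")  # private record of the true label
    meta["caption"] = decoy_caption                 # what a training scraper will ingest
    path.write_text(json.dumps(meta, indent=2))


# A desert photo whose metadata now claims it shows the ocean (file name is hypothetical).
# poison_caption("desert_cacti.json", "ocean waves rolling onto a beach")
```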
Glaze acts more directly on the image itself, essentially blurring it to make it less desirable. Either way, the goal is to reduce the likelihood that the protected image will be used by an LLM.
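In the same spirit, here is a toy sketch of the pixel-level approach. The real Glaze computes carefully optimized, near-invisible perturbations rather than a plain blur; this version, which assumes the Pillow imaging library and hypothetical file names, just degrades the pixels slightly so the file is less useful as clean training data.

```python
# Toy illustration only: the real Glaze does far subtler perturbation than a blur.
# Requires the Pillow library (pip install pillow); file names are hypothetical.
from PIL import Image, ImageFilter


def cloak_image(src: str, dst: str, radius: float = 1.5) -> None:
    """Save a lightly blurred copy of an image so it is less attractive as training data."""
    img = Image.open(src)
    img.filter(ImageFilter.GaussianBlur(radius)).save(dst)


# cloak_image("artwork.png", "artwork_cloaked.png")
```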
This technique, although imaginative, probably won't work for long. It won't be long before LLM makers learn to detect and work around these defensive tricks.
“To protect your works, you have to deface them,” said George Chedzhemov, a cybersecurity strategist at data firm BigID. “I’m going to bet that companies with billion-dollar systems and workloads will have a better chance of coming out on top in this cat-and-mouse game. In the long run, I just don’t think it’s effective.”
The offensive technique is potentially the more worrying of the two, but it, too, is unlikely to be effective, even in the short term.
The offensive technique works in one of two ways. The first attempts to target a specific company by making educated guesses about the kinds of sites and materials it would likely use to train its LLM. The attackers then go after not the company itself, but the many places it is likely to pull training data from. If the target is, say, Nike or Adidas, attackers could try to poison the databases of various university athletic departments that house high-profile sports teams. If the target were Citi or Chase, the bad guys could go after the databases of key Federal Reserve sites.
The problem is that this plan of attack could easily be thwarted at either end. University sites could detect and block the manipulation efforts. And for the attack to work, the inserted data would likely need to include malware executables, which are relatively easy to detect.
Even if the threat actors' goal were simply to slip incorrect data into target systems (which, in theory, could cause them to generate erroneous results), most LLM training runs absorb such a massive number of data sets that the attack is unlikely to have much effect.
“The planted code would end up being extremely diluted. Only a tiny amount of malicious code would likely survive,” Chedzhemov said.
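A back-of-the-envelope calculation shows why. The numbers below are invented purely for illustration, but they capture the scale mismatch: even an aggressive poisoning campaign is a rounding error against a typical LLM training corpus.

```python
# Back-of-the-envelope math with invented numbers: the poisoned share of a
# large training corpus is vanishingly small, which is Chedzhemov's dilution point.
poisoned_documents = 50_000          # hypothetical: an aggressive poisoning campaign
training_documents = 2_000_000_000   # hypothetical: documents in a large LLM training corpus

fraction = poisoned_documents / training_documents
print(f"Poisoned share of the training data: {fraction:.6%}")  # prints 0.002500%
```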
The other malicious AI data poisoning tactic amounts to a spray-and-pray mechanism. Instead of targeting a specific company, malicious actors would attempt to infect a massive number of sites hoping that the malware would somehow end up at a company with valuable data to steal.
“They would have to contaminate tens of thousands of sites all over the place,” Chedzhemov said. “And then they have to hope that the LLM model will somehow focus on one of them.”
Chedzhemov argued that the only viable approach would be to “choose an extremely esoteric area for which there is not much, something very specialized.”
The tech industry is familiar with these cat-and-mouse countermeasures, and they rarely work for long, if ever. Think of antivirus programs that released definitions, only for the bad guys to change their techniques; the antivirus vendors then looked for patterns rather than specific definitions, and so on. Or think of search engine crawlers and their battles with robots.txt files telling them to go away. Or YouTube versus ad blockers.
LLM data poisoning is a problem that IT needs to be aware of and guard against. But in this fight, I think IT has almost all of the advantages. How refreshing and rare.
Copyright © 2024 IDG Communications, Inc.