The International Conference on Machine Learning (ICML) just caught researchers red-handed using AI to write their peer reviews - and punished them by rejecting their own papers.
ICML 2026, scheduled for Seoul in July, desk-rejected 497 papers after detecting that their authors had violated the conference’s policy against using large language models to review other researchers’ work. That’s roughly 2% of all submissions thrown out for what amounts to academic dishonesty.
The detection method was clever: organizers hid instructions inside PDF watermarks that would only be triggered if the document was fed to an LLM.
How the Trap Worked
The watermarking technique used a dictionary of 170,000 phrases. For each paper, the system selected two random phrases - a pairing so unlikely to repeat that the probability of two papers being assigned the same combination was less than 1 in 10 billion.
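That 1-in-10-billion figure checks out under a simple assumption: with 170,000 phrases and two drawn per paper, there are roughly 14.4 billion distinct unordered pairs. A minimal sketch of the arithmetic (the dictionary size is the one reported by the organizers; everything else here is assumed):

```python
from math import comb

DICTIONARY_SIZE = 170_000  # size of the phrase dictionary reported by the organizers

# Number of distinct unordered phrase pairs a paper could be assigned
pairs = comb(DICTIONARY_SIZE, 2)

# Chance that a second paper randomly draws the exact same pair
collision_probability = 1 / pairs

print(f"{pairs:,} possible pairs")                              # 14,449,915,000 possible pairs
print(f"collision probability ≈ {collision_probability:.1e}")   # ≈ 6.9e-11, under 1 in 10 billion
```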
These phrases were embedded as invisible instructions in the PDFs distributed to reviewers. If a reviewer copied the paper into an LLM to generate their review, the model would dutifully follow the hidden instructions and include the telltale phrases in its output.
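ICML hasn't published its implementation, but the shape of the scheme is easy to sketch: assign each paper a deterministic pair of phrases, then scan submitted reviews for either phrase. The function names, seeding scheme, and substring matching below are illustrative assumptions, not the conference's actual code:

```python
import random

def assign_watermark(paper_id: str, dictionary: list[str], seed: str = "icml2026") -> tuple[str, str]:
    """Deterministically pick two distinct phrases for a paper (hypothetical scheme)."""
    rng = random.Random(f"{seed}:{paper_id}")  # same paper ID always yields the same pair
    first, second = rng.sample(dictionary, 2)
    return first, second

def review_is_flagged(review_text: str, watermark: tuple[str, str]) -> bool:
    """Flag a review that echoes either hidden phrase - the signature of a paper
    having been pasted into an LLM that followed the embedded instructions."""
    lowered = review_text.lower()
    return any(phrase.lower() in lowered for phrase in watermark)
```

A flagged review would then go to a human for confirmation, which matches the verification step the organizers describe.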
The success rate for detection exceeded 80%. Every flagged review underwent human verification before any action was taken.
The Numbers
The final count was damning:
- 795 reviews (~1% of all reviews) violated the no-LLM policy
- 506 unique reviewers were caught despite explicitly agreeing not to use AI
- 51 reviewers (10% of violators) had more than half their reviews flagged
- 497 papers, submitted by 398 of the rule-breaking reviewers, were desk-rejected
ICML operates on a reciprocal review system - if you submit a paper, you’re expected to review others. When organizers caught authors using LLMs to write reviews they’d agreed to write themselves, they responded by rejecting the authors’ own submissions.
The Two-Policy System
ICML 2026 actually gave reviewers a choice. During registration, they could select between two policies:
Policy A (Conservative): No LLM use permitted at any stage of reviewing. Period.
Policy B (Permissive): LLMs allowed to help understand papers and polish review text.
That choice is what makes the violations clear-cut. These weren’t researchers who accidentally ran afoul of an ambiguous rule. They checked a box explicitly agreeing not to use AI, then used AI anyway.
Why This Matters
The irony of an AI conference punishing AI use isn’t lost on anyone. But the issue isn’t whether AI tools are useful - it’s about trust.
Peer review depends on human experts actually engaging with the work they’re evaluating. When a reviewer feeds a paper into ChatGPT and submits whatever comes out, they’re not reviewing - they’re laundering a decision through an automated system that doesn’t understand scientific context, can’t verify claims, and has no stake in the field’s integrity.
The ICML organizers put it plainly: “Uniform enforcement of LLM policies is essential to preserving the integrity and fairness of the peer-review process for all participants.”
The Reaction
Response to ICML’s enforcement has been mixed. Many researchers on X applauded the move, with some suggesting other conferences adopt similar measures. A few even called for banning the caught reviewers from future submissions entirely.
Not everyone agreed. Zhengzhong Tu, a computer scientist at Texas A&M University, told Nature the policy would backfire: “It will only demotivate all the reviewers. They will avoid routes banning AI use and will use LLMs to generate meaningless reviews.”
What This Signals
ICML’s enforcement is the most aggressive action any major conference has taken against AI-assisted peer review. It signals that academic institutions are willing to impose real consequences - lost publication opportunities, public embarrassment - on researchers who cut corners.
The technique is also replicable. Other conferences could implement similar watermarking systems without much difficulty. If ICML’s approach spreads, researchers gaming peer review with AI tools will face an increasingly hostile detection environment.
For now, 398 researchers learned the hard way that agreeing to rules and actually following them are different things - and that the AI community is watching itself more carefully than you might expect.