EU AI Act & Website Compliance: What Publishers Need to Know

Rob, CEO & Founder · 5 min read

The EU AI Act Is Here — And It Affects Your Website

The EU AI Act entered into force on 1 August 2024, and most of its provisions apply from 2 August 2026. While most coverage focuses on high-risk AI systems, the Act also contains obligations that directly affect publishers, website owners, and anyone whose content is consumed by AI crawlers.

If you run a website in the EU — or serve EU audiences — these rules apply to you, both as a potential deployer of AI tools and as a content creator whose work feeds AI models.

Key Deadlines That Matter

The AI Act rolls out in phases:

  • 2 February 2025: Prohibited AI practices banned
  • 2 August 2025: GPAI model transparency rules active
  • 2 August 2026: General applicability, including transparency obligations under Article 50, conformity assessments, and the Code of Practice on AI-generated content

The Code of Practice on Transparency of AI-Generated Content is expected in its final form by June 2026, providing practical guidance for implementation.

What the AI Act Says About Crawlers

Article 53(1)(c) requires providers of general-purpose AI (GPAI) models to put in place a policy to comply with EU copyright law, including identifying and honoring rights reservations that publishers express in machine-readable form. In practice, this means AI companies must:

  • Respect robots.txt: GPAI providers must use crawlers that follow the Robots Exclusion Protocol
  • Recognize opt-out signals: Beyond robots.txt, crawlers must recognize metadata and other machine-readable signals indicating rights reservations
  • Publish crawler information: AI companies must be transparent about which crawlers they operate and notify rightsholders of updates
  • Provide training data summaries: Under Article 53(1)(d), providers must publish a sufficiently detailed summary of the content used for training

This is a significant shift. For the first time, a major regulatory framework gives legal weight to robots.txt and similar technical standards that publishers use to control crawler access.

What This Means for Website Owners

Before the AI Act, robots.txt was a gentleman's agreement — crawlers could ignore it without legal consequence. Under the new framework, GPAI providers who ignore your robots.txt directives face potential enforcement by the EU AI Office.

This means your robots.txt configuration is no longer just a technical preference — it's a legal declaration of your content access policy.

You Need Visibility Into AI Crawler Traffic

You can't exercise your rights if you don't know who's crawling your site. The AI Act creates a framework where publishers can hold AI companies accountable, but only if you can demonstrate:

  • Which AI crawlers are accessing your content
  • How frequently they visit
  • What content they prioritize
  • Whether they respect your access rules

Tools like HumanKey provide real-time dashboards showing exactly which AI bots crawl your pages, giving you the data you need to enforce your rights under the AI Act.
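
As a rough illustration of the last point, the sketch below compares logged requests against a robots.txt policy using Python's standard urllib.robotparser. The policy, the bot name ExampleAIBot, and the request list are all hypothetical, and it assumes you have already extracted each crawler's product token from its full User-Agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: ExampleAIBot is not a real crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /premium/
"""

# Hypothetical, already-parsed log entries: (bot token, requested path).
REQUESTS = [
    ("ExampleAIBot", "/blog/eu-ai-act"),
    ("ExampleAIBot", "/premium/report"),
    ("OtherBot", "/premium/report"),
]


def find_violations(robots_txt, requests):
    """Return the requests that the given robots.txt policy disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (bot, path)
        for bot, path in requests
        if not parser.can_fetch(bot, path)
    ]


violations = find_violations(ROBOTS_TXT, REQUESTS)
for bot, path in violations:
    print(f"{bot} fetched disallowed path {path}")
```

Only the ExampleAIBot request to /premium/ is flagged, because OtherBot has no rule in this policy and is therefore allowed by default. A real audit would also need to confirm that a request genuinely came from the crawler it claims to be, which this sketch does not attempt.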

Content Marking Becomes Important

Article 50 introduces transparency obligations for AI-generated content. Providers must ensure AI-generated outputs are marked in machine-readable formats. For publishers, this creates both an obligation (if you use AI in content creation) and an opportunity (to distinguish your human-created content as premium).

Penalties Are Serious

The AI Act's enforcement framework includes substantial fines:

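  • Up to 35 million EUR or 7% of worldwide annual turnover, whichever is higher, for prohibited practices
  • Up to 15 million EUR or 3% of worldwide annual turnover for violations of most other obligations, including transparency requirements
  • Up to 7.5 million EUR or 1% of worldwide annual turnover for supplying incorrect, incomplete, or misleading information to authorities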

These penalties apply to AI providers who fail to respect copyright and transparency obligations — giving publishers real leverage in enforcement discussions.

How to Prepare Your Website

1. Audit Your robots.txt

Review your robots.txt to ensure it accurately reflects your content access policy for AI crawlers. Consider separate directives for different AI bots:

# Allow search indexing
User-agent: Googlebot
Allow: /

# Let GPTBot crawl the blog but keep it out of premium content
User-agent: GPTBot
Allow: /blog/
Disallow: /premium/

# Block specific AI crawlers
User-agent: CCBot
Disallow: /
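
Before relying on a policy like the one above, it can help to check that it behaves the way you intend. The following is a minimal sketch using Python's standard urllib.robotparser; the test paths (/blog/post, /premium/report) are made up for illustration, and the parser's matching rules may differ in edge cases from those of any particular crawler.

```python
from urllib.robotparser import RobotFileParser

# The example policy from this section, inlined so the check is self-contained.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Allow: /blog/
Disallow: /premium/

User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical (crawler, path) pairs to verify against the policy.
CHECKS = [
    ("Googlebot", "/premium/report"),
    ("GPTBot", "/blog/post"),
    ("GPTBot", "/premium/report"),
    ("CCBot", "/blog/post"),
]

for agent, path in CHECKS:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(f"{agent:10} {path:18} {verdict}")
```

Note that a crawler with no matching User-agent group and no wildcard (*) group is allowed everywhere, so any bot you have not listed explicitly can still crawl the whole site under this policy.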

2. Implement AI Traffic Monitoring

Deploy analytics that specifically track AI crawler activity. You need historical data showing crawler patterns to support any compliance claims or licensing negotiations.
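
If you want a starting point before adopting a dedicated tool, the sketch below counts AI crawler hits per bot per day from web server access logs. It assumes the common "combined" log format and an English locale for month names; the sample lines are invented, and the token list covers only the two AI crawlers named in this post, so a real deployment would need a longer, maintained list.

```python
import re
from collections import Counter
from datetime import datetime

# Illustrative list only: the two AI crawlers mentioned in this post.
AI_BOT_TOKENS = ("GPTBot", "CCBot")

# Matches the timestamp, request line, and User-Agent of a combined-format entry.
LOG_PATTERN = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_ai_hits(lines):
    """Count requests per (bot token, ISO date) for known AI crawlers."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in agent:
                stamp = datetime.strptime(
                    match.group("time"), "%d/%b/%Y:%H:%M:%S %z"
                )
                counts[(token, stamp.date().isoformat())] += 1
                break
    return counts


# Invented sample entries for demonstration.
SAMPLE_LINES = [
    '203.0.113.5 - - [10/Mar/2025:08:00:01 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.5 - - [10/Mar/2025:09:30:00 +0000] "GET /premium/report HTTP/1.1" 200 812 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [11/Mar/2025:01:12:45 +0000] "GET / HTTP/1.1" 200 2048 "-" "CCBot/2.0 (https://commoncrawl.org/faq/)"',
    '192.0.2.9 - - [11/Mar/2025:02:00:00 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

hits = count_ai_hits(SAMPLE_LINES)
for (bot, day), total in sorted(hits.items()):
    print(day, bot, total)
```

Matching on the User-Agent string alone is easy to spoof, so treat these counts as a first approximation rather than evidence of who actually made the requests.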

3. Add Machine-Readable Rights Signals

Beyond robots.txt, consider adding structured data and metadata that clearly communicates your content licensing terms. The AI Act requires crawlers to recognize these signals.
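
One candidate for such a signal is the TDM Reservation Protocol (TDMRep), a W3C Community Group specification. This post does not name a specific protocol, so choosing TDMRep here is an assumption, and whether a given crawler honors it is not guaranteed. The sketch below generates the JSON you would serve at /.well-known/tdmrep.json; the policy URL is a placeholder.

```python
import json

# Hypothetical rules: (path, rights reserved?, optional policy URL).
# The policy URL is a placeholder, not a real document.
RULES = [
    ("/premium/", True, "https://example.com/ai-policy.json"),
    ("/blog/", False, None),
]


def build_tdmrep(rules):
    """Build the JSON body for a TDMRep well-known file from simple rules."""
    entries = []
    for location, reserved, policy_url in rules:
        entry = {
            "location": location,
            # 1 means text-and-data-mining rights are reserved, 0 means not.
            "tdm-reservation": 1 if reserved else 0,
        }
        if policy_url:
            entry["tdm-policy"] = policy_url
        entries.append(entry)
    return json.dumps(entries, indent=2)


tdmrep_json = build_tdmrep(RULES)
print(tdmrep_json)
```

TDMRep also defines equivalent HTTP response headers (tdm-reservation and tdm-policy) and HTML meta tags, which may be easier to deploy if you cannot add files under /.well-known/.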

4. Document Your Content Policies

Create a clear, public-facing AI content usage policy. This helps AI companies understand your terms and provides documentation if compliance disputes arise.

The Bigger Picture

The EU AI Act is the first comprehensive AI regulation globally. Other jurisdictions are watching closely — similar frameworks are being discussed in the UK, Canada, and other markets.

For publishers, this creates a window of opportunity. The legal framework now supports your right to control how AI systems use your content. But exercising that right requires data — specifically, detailed knowledge of which AI crawlers visit your site and what they access.

The publishers who build this monitoring infrastructure now will be best positioned when enforcement begins in August 2026.


Start monitoring AI crawler activity on your site today. Create a free HumanKey account — track 50+ AI bots with real-time dashboards.