AI Law · KW21 · English

EU AI Act & Copyright: 5 Insights Every AI Builder Must Know in 2026 (Before the Lawyers Do)

The GPAI rules have been live since August 2025. The TDM myth is dead. The memorization trap is real. And the NYT vs. OpenAI shock has changed the game. Here is your no-fluff compliance roadmap.

Published: May 20, 2026 · Location: Houston, TX · Read time: 9 minutes · Topics: EU AI Act, GPAI, Copyright, TDM, GEMA, OpenAI, Compliance

Let's be honest: how many of you thought the EU AI Act was just another Brussels paper tiger? The GPAI rules have been live since August 2, 2025. If you're not acting now, you're falling behind. In HR and IT strategy, ignorance is no longer just a risk — it's a career stopper. It's time to be a doer before compliance becomes your nightmare.

Insight #1: The Code of Practice — Your Bridge Until 2027

The EU AI Act is law, but the technical details (the harmonized standards) are not expected before August 2027. To avoid operating in a legal grey zone until then, we have the Code of Practice (CoP) under Article 56. This is your roadmap for the transition period.

T Transparency: Documentation is mandatory. Who trained what, how, and on what data?

C Copyright: Compliance with EU law must be demonstrable, not just assumed.

S Safety & Security: Only for the heavyweights with systemic risk (cumulative training compute above 10^25 FLOP).
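Not sure which side of the 10^25 FLOP line you are on? Here is a rough back-of-the-envelope sketch. It assumes the common rule of thumb that training compute is about 6 × parameters × training tokens. That heuristic is my assumption for illustration, not a method prescribed by the AI Act, and the model sizes below are hypothetical.

```python
# Rough estimate of training compute against the 10^25 FLOP threshold.
# ASSUMPTION: compute ≈ 6 * parameters * tokens (a common rule of thumb,
# not the AI Act's own calculation method).

SYSTEMIC_RISK_THRESHOLD_FLOP = 1e25  # threshold named in the article


def estimate_training_flop(n_params: float, n_tokens: float) -> float:
    """Approximate cumulative training compute in FLOP."""
    return 6.0 * n_params * n_tokens


def above_threshold(n_params: float, n_tokens: float) -> bool:
    """True if the rough estimate exceeds the 10^25 FLOP threshold."""
    return estimate_training_flop(n_params, n_tokens) > SYSTEMIC_RISK_THRESHOLD_FLOP


if __name__ == "__main__":
    # Hypothetical models, for illustration only.
    candidates = [("small-model", 7e9, 2e12), ("large-model", 4e11, 1.5e13)]
    for name, params, tokens in candidates:
        flop = estimate_training_flop(params, tokens)
        print(f"{name}: ~{flop:.1e} FLOP, above threshold: {above_threshold(params, tokens)}")
```

Treat the result as a first triage, not a legal finding. If you land anywhere near the threshold, document how you measured compute.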

Doer's Tip: Models released under a free and open-source license are exempt from some obligations (Art. 53(1)(a) and (b)), as long as they don't pose systemic risk. That cuts a lot of admin overhead, but the copyright policy and the training-data summary under Art. 53(1)(c) and (d) still apply.

Insight #2: The TDM Myth is Dead — Training is NOT "Data Mining"

[Figure: TDM analysis vs. GenAI synthesis, the critical legal distinction]

TDM analyzes patterns. GenAI imitates expressive quality. That is the critical legal distinction most builders still haven't grasped.

I keep hearing: "Relax, it falls under the TDM exception!" — Wrong. The Stober/Dornis study draws a clear line here.

TDM (Text & Data Mining)	Generative AI Training
Seeks patterns and correlations (analysis)	Seeks to imitate output (synthesis)
Extracts insights from data	Uses the expressive quality of a work
Can be covered by TDM exception	Can substitute the original on the market
No copyright violation if applied correctly	TDM exception does not apply, according to the study

"Simply scraping the web without permission is legally dangerous. Confusing synthesis with analysis means you haven't grasped the legal implications. Training GenAI without a license is building on sand."

— Stober/Dornis, Tandem Study 2025

Insight #3: The Memorization Trap — When AI "Regurgitates"

AI models don't have memory in the human sense, but they "memorize" training data in their parameters. When they spit this data back out nearly verbatim (regurgitation), you have a legal fire on your hands.
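How do you spot regurgitation in practice? One simple approach is to measure how many word sequences in a model output also appear verbatim in a protected reference text. The sketch below is a minimal illustration of that idea. The function names and the window size `n` are my own choices, and a real audit would need proper tokenization and a much larger reference corpus.

```python
# Minimal regurgitation check: share of an output's word n-grams that
# also occur verbatim in a reference text.
# ASSUMPTION: whitespace tokenization and the window size n are
# illustrative choices, not an established legal or technical standard.
from typing import Set, Tuple


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """All lowercase word n-grams of a text, as a set."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output: str, reference: str, n: int = 8) -> float:
    """Fraction of the output's n-grams found verbatim in the reference."""
    output_grams = word_ngrams(output, n)
    if not output_grams:
        return 0.0  # output shorter than n words: nothing to compare
    shared = output_grams & word_ngrams(reference, n)
    return len(shared) / len(output_grams)
```

A score close to 1.0 means the output is largely copied from the reference. Where you set the alarm threshold is a policy decision for you and your legal team.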

2-5x more memorization in larger models

400M user logs preserved in NYT vs. OpenAI

Nov 11, 2025: GEMA ruling against OpenAI (LG Munich I)

Neural audio codecs are especially critical. Technically, they compress audio using learned codebooks. If such a codebook was trained on copyrighted music, the model itself contains the protected information. That's a "copy on steroids" and a massive compliance problem when sharing such models.

Insight #4: The NYT vs. OpenAI Shock — Data is Never Gone

[Figure: GEMA vs. OpenAI, copyright liability for AI models]

The Regional Court of Munich I (LG Munich I) has made it clear: storing song lyrics in model parameters constitutes copyright infringement. "Fair use" is a US doctrine, and that excuse doesn't fly in Europe.

The Preservation Order of May 13, 2025 was an earthquake. OpenAI was ordered to preserve log data from 400 million users. The argument "the user deleted it" doesn't hold in court when it comes to evidence preservation in copyright cases.

Your Doer's Checklist:

Z Zero Data Retention (ZDR) Agreements: Your most important tool. Negotiate with API providers to ensure prompts are never stored in the first place.

A Account Check: ChatGPT Enterprise and Edu customers are currently not affected by the order. Standard API users without ZDR agreements? Your logs are being preserved right now.

P Proactive Data Tagging: Tag your data internally so you immediately know what went where in case of an audit.
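Proactive tagging does not need a big platform to get started. Here is a minimal sketch of an internal transfer log that records what went where and whether a ZDR agreement covered it. The `TransferTag` fields and the provider names are hypothetical, so adapt them to your own data model.

```python
# Minimal internal tagging log: which data went to which destination,
# and was it covered by a Zero Data Retention (ZDR) agreement?
# ASSUMPTION: field names and provider names are hypothetical examples.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class TransferTag:
    data_id: str         # internal ID of the document or dataset
    destination: str     # e.g. the API provider that received it
    zdr_agreement: bool  # True if a ZDR agreement covers this destination
    sent_at: datetime    # when the data left your systems (UTC)


def destinations_for(tags: List[TransferTag], data_id: str) -> List[str]:
    """Audit question 1: where did this piece of data go?"""
    return sorted({t.destination for t in tags if t.data_id == data_id})


def exposed_transfers(tags: List[TransferTag]) -> List[TransferTag]:
    """Audit question 2: which transfers had no ZDR agreement?"""
    return [t for t in tags if not t.zdr_agreement]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    log = [
        TransferTag("hr-report-2026-05", "provider-a", True, now),
        TransferTag("hr-report-2026-05", "provider-b", False, now),
    ]
    print(destinations_for(log, "hr-report-2026-05"))
    print([t.destination for t in exposed_transfers(log)])
```

The point is the habit, not the schema: every outbound transfer gets a tag at the moment it happens, so an audit becomes a query instead of a forensic project.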

Insight #5: Proactive Protection — TDMRep & Tiered Documentation

[Figure: Tiered compliance documentation: Basic, Intermediate, Advanced]

The tiered approach to compliance documentation: Basic, Intermediate, and Advanced. Build the infrastructure now and save yourself in two years.

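Stop waiting for the crisis to hit. If you have PDFs or reports online, use the TDM Reservation Protocol (TDMRep). Add it directly to the XMP metadata of your PDF (entry: tdm-reservation: 1). This tells any crawler that honors the protocol: "Hands off, my rights are reserved!" It is a machine-readable signal, not a technical lock, so it only works against crawlers that actually check for it.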
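If you run a crawler yourself, the same signal is something you should check before ingesting anything. The sketch below shows one way to read a reservation from HTTP response headers. It assumes the header names `tdm-reservation` and `tdm-policy` as I understand them from the TDMRep specification. It does not cover the other places a reservation can live, such as a site-wide file or embedded document metadata.

```python
# Crawler-side check: does an HTTP response carry a TDM reservation?
# ASSUMPTION: header names "tdm-reservation" and "tdm-policy" follow my
# reading of the TDMRep specification; verify against the spec before use.
from typing import Mapping, Optional, Tuple


def reservation_from_headers(headers: Mapping[str, str]) -> Tuple[bool, Optional[str]]:
    """Return (rights_reserved, policy_url) for a set of response headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    reserved = normalized.get("tdm-reservation") == "1"
    policy = normalized.get("tdm-policy") if reserved else None
    return reserved, policy


def may_mine(headers: Mapping[str, str]) -> bool:
    """True only if no reservation is declared in the headers."""
    reserved, _policy = reservation_from_headers(headers)
    return not reserved
```

A missing header is not a license. `may_mine` only tells you that this one channel carries no reservation, so log the result next to the source URL and timestamp.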

01 Basic (Traceability): Source URLs and timestamps. The absolute minimum.

02 Intermediate (Identification): License status and technical specs. For audio: ISRC or ISWC codes.

03 Advanced (Attribution): Deploy music information retrieval (MIR) tools and active content matching via AcoustID / MusicBrainz. Copyrighted DNA without a license? Delete it.
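To see where your own records stand, the three tiers above can be expressed as a simple check on each source record. This is a sketch under my own assumptions: the `SourceRecord` fields are hypothetical, and what counts as sufficient "technical specs" or content matching is simplified to a filled-in field.

```python
# Classify a training-data record into the documentation tiers above.
# ASSUMPTION: SourceRecord and its fields are hypothetical; each tier's
# requirements are simplified to "field is filled in".
from dataclasses import dataclass


@dataclass
class SourceRecord:
    source_url: str = ""                 # Basic: where the data came from
    retrieved_at: str = ""               # Basic: timestamp of retrieval
    license_status: str = ""             # Intermediate: e.g. "licensed"
    technical_specs: str = ""            # Intermediate: e.g. an ISRC or ISWC code
    content_match_checked: bool = False  # Advanced: content matching was run


def documentation_tier(record: SourceRecord) -> str:
    """Return the highest tier whose requirements the record meets."""
    basic = bool(record.source_url and record.retrieved_at)
    intermediate = basic and bool(record.license_status and record.technical_specs)
    advanced = intermediate and record.content_match_checked
    if advanced:
        return "advanced"
    if intermediate:
        return "intermediate"
    if basic:
        return "basic"
    return "undocumented"
```

Run it across your dataset inventory and count the "undocumented" results. That number is your real backlog.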

Conclusion: Get Off the Bench

The legal landscape is complex, yes. But those who set the course now will have the advantage. The LG Munich I ruling of November 11, 2025 (GEMA vs. OpenAI) showed that courts are serious: storing song lyrics in model parameters was ruled a copyright infringement. The "Fair Use" excuse doesn't fly in Europe.

Are you ready to make your data workflows audit-proof? Or are you blindly trusting that the tech giants will sort it out while your own logs are being legally sealed?

What's your take — are you ready for the audit check, or are you still hoping for the best?

