EU AI Act & Copyright: 5 Insights Every AI Builder Must Know in 2026 (Before the Lawyers Do)
The GPAI rules have been live since August 2025. The TDM myth is dead. The memorization trap is real. And the NYT vs. OpenAI shock has changed the game. Here is your no-fluff compliance roadmap.
Let's be honest: how many of you thought the EU AI Act was just another Brussels paper tiger? The GPAI rules have been live since August 2, 2025. If you're not acting now, you're falling behind. In HR and IT strategy, ignorance is no longer just a risk — it's a career stopper. It's time to be a doer before compliance becomes your nightmare.
Insight #1: The Code of Practice — Your Bridge Until 2027
The EU AI Act is law, but the technical details — the harmonized standards — won't arrive until August 2027. To avoid operating in a legal grey zone until then, we have the Code of Practice (CoP) under Article 56. This is your roadmap for the transition period.
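Doer's Tip: Models released under a free and open-source license are exempt from some obligations (the documentation duties in Art. 53(1)(a) and (b)), provided their weights, architecture, and usage information are publicly available and they don't pose systemic risk. The copyright policy and the training-data summary under Art. 53(1)(c) and (d) still apply. This saves you massive admin overhead, but it does not switch off your copyright duties.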
Insight #2: The TDM Myth is Dead — Training is NOT "Data Mining"
I keep hearing: "Relax, it falls under the TDM exception!" — Wrong. The Stober/Dornis study draws a clear line here.
| TDM (Text & Data Mining) | Generative AI Training |
|---|---|
| Seeks patterns and correlations (analysis) | Seeks to imitate output (synthesis) |
| Extracts insights from data | Uses the expressive quality of a work |
| Can be covered by TDM exception | Can substitute the original on the market |
| No copyright violation if applied correctly | TDM exception does not legally apply |
"Simply scraping the web without permission is legally dangerous. Confusing synthesis with analysis means you haven't grasped the legal implications. Training GenAI without a license is building on sand."
— Stober/Dornis, Tandem Study 2025
Insight #3: The Memorization Trap — When AI "Regurgitates"
AI models don't have memory in the human sense, but they "memorize" training data in their parameters. When they spit this data back out nearly verbatim (regurgitation), you have a legal fire on your hands.
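To make "nearly verbatim" tangible, here is a minimal sketch of one way you could screen model outputs against a protected reference text: count how many of the output's word n-grams also appear in the reference. The function names, the window size of 8 words, and any threshold you put on the score are my own assumptions for illustration. This is not a legal test and not a method taken from the study or from any court ruling.

```python
from __future__ import annotations


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output: str, reference: str, n: int = 8) -> float:
    """Share of the output's word n-grams that also occur in the reference.

    0.0 means no shared n-grams, 1.0 means every n-gram of the output
    appears in the reference. The window size n is an illustrative choice.
    """
    out_grams = word_ngrams(output, n)
    if not out_grams:
        # Output is shorter than n words, so there is nothing to compare.
        return 0.0
    return len(out_grams & word_ngrams(reference, n)) / len(out_grams)
```

A high score on a long window is a red flag worth escalating to a human reviewer. A low score proves nothing, because paraphrases and reordered passages slip straight through a check this simple.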
Especially critical are Neural Audio Codecs. Technically, these are learned codebooks. If that codebook was trained on copyrighted music, the model itself contains the protected information. That's a "copy on steroids" — and a massive compliance problem when sharing such models.
Insight #4: The NYT vs. OpenAI Shock — Data is Never Gone
The Preservation Order of May 13, 2025 was an earthquake. OpenAI was ordered to preserve log data from 400 million users. The argument "the user deleted it" doesn't hold in court when it comes to evidence preservation in copyright cases.
Your Doer's Checklist:
Insight #5: Proactive Protection — TDMRep & Tiered Documentation
Stop waiting for the crisis to hit. If you have PDFs or reports online, use the TDM Reservation Protocol (TDMRep). Add it directly to the XMP metadata of your PDF (entry: tdm-reservation: 1). This tells every crawler that honors the protocol: "Hands off, my rights are reserved!"
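If you are on the other side and run the crawler, the same signal is something you should read before you ingest anything. Besides file metadata, TDMRep can also be sent as an HTTP response header named tdm-reservation (value 1 or 0), optionally with a tdm-policy header pointing to licensing terms. The sketch below only interprets headers you have already fetched. The function names and the three-way return value are my own assumptions, not part of the protocol.

```python
from typing import Mapping, Optional


def tdm_reservation(headers: Mapping[str, str]) -> Optional[bool]:
    """Interpret the TDMRep 'tdm-reservation' HTTP response header.

    Returns True if rights are reserved ("1"), False if the header is
    present with any other value (e.g. "0"), and None if no header was sent.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    value = normalized.get("tdm-reservation")
    if value is None:
        return None
    return value == "1"


def tdm_policy_url(headers: Mapping[str, str]) -> Optional[str]:
    """Return the URL from the optional 'tdm-policy' header, if present."""
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("tdm-policy")
```

For example, tdm_reservation({"TDM-Reservation": "1"}) returns True, and a response without the header returns None. Log the result next to every fetched URL, so you can later show which reservation signal you saw at crawl time.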
Conclusion: Get Off the Bench
The legal landscape is complex, yes. But those who set the course now have the advantage. The LG Munich I ruling of November 11, 2025 (GEMA vs. OpenAI) showed that courts are serious: storing song lyrics in model parameters was ruled a copyright infringement. The US-style "Fair Use" excuse doesn't fly in Europe.
Are you ready to make your data workflows audit-proof? Or are you blindly trusting that the tech giants will sort it out while your own logs are being legally sealed?
What's your take — are you ready for the audit check, or are you still hoping for the best?