Navigating Japan's Copyright Labyrinth: Is Your Use of Generative AI Compliant?

The rapid proliferation of generative Artificial Intelligence (AI) has opened up unprecedented opportunities for businesses worldwide. From content creation and software development to research and customer engagement, AI tools promise transformative efficiencies. However, this technological surge also brings complex legal questions, particularly in the realm of copyright law. For US companies leveraging generative AI in connection with their Japanese operations, or using AI tools developed or trained in Japan, understanding the nuances of Japanese copyright law is crucial. This article delves into the key considerations under Japanese law concerning AI model training and the use of AI-generated outputs.

Copyright issues related to generative AI in Japan primarily arise in two distinct phases:

  1. The AI Development and Learning Phase: This involves the collection and use of vast amounts of data, often including copyrighted works (text, images, code, etc.), to train AI models. The act of copying these works into a database for machine learning purposes can constitute copyright infringement unless specific exceptions apply.
  2. The AI Generation and Utilization Phase: This concerns the outputs produced by the AI (e.g., an AI-generated image, article, or piece of software code) and whether these outputs infringe existing copyrighted works.

Navigating these two phases requires a careful understanding of Japan's Copyright Act.

The development of sophisticated generative AI models hinges on their ability to learn from massive datasets. When these datasets include copyrighted materials, the act of reproduction for training purposes directly implicates copyright.

The General Principle: Reproduction Requires Permission

Under Japanese copyright law, as in most jurisdictions, reproducing a copyrighted work (e.g., copying it into a database for AI training) generally requires authorization from the copyright holder, and unauthorized reproduction constitutes copyright infringement. However, the Japanese Copyright Act contains certain limitations on copyright that can permit such uses under specific conditions.

The "Non-Enjoyment" Exception: Article 30-4

A key provision in Japan that has garnered significant attention in the context of AI is Article 30-4 of the Copyright Act. Enacted in 2018 as part of broader amendments to accommodate data analysis and technological advancements, Article 30-4 allows for the use of copyrighted works without the copyright holder's permission if the use is not for the purpose of enjoying the thoughts or sentiments expressed in the work (this is often referred to as "non-enjoyment purpose" or 非享受目的 - hi-kyōju mokuteki).

This provision generally permits the exploitation of copyrighted works, by any means, to the extent considered necessary, in cases such as:

  • Use in information analysis (情報解析 - jōhō kaiseki), which includes AI model training.
  • Use for technological development or practical application tests.
  • Other uses not aimed at the user personally enjoying, or allowing others to enjoy, the creative expression of the work.

The rationale is that such uses do not typically harm the primary markets for which the copyrighted works were created (e.g., reading a novel, viewing a film, listening to music). The Agency for Cultural Affairs (文化庁 - Bunka-chō) clarified in its March 2024 "Regarding a Viewpoint on AI and Copyright" (AIと著作権に関する考え方について) that "enjoyment" refers to acts aimed at satisfying intellectual or emotional needs through viewing, reading, or listening to the work. Therefore, merely collecting and analyzing copyrighted works as data to train an AI model is generally considered a non-enjoyment purpose and thus permissible under Article 30-4, even if the ultimate goal of the AI development is commercial.

The Critical Proviso to Article 30-4

However, Article 30-4 contains a crucial limitation: the exception does not apply if the use "would unreasonably prejudice the interests of the copyright owner" (当該著作物の種類及び用途並びに当該利用の態様に照らし著作権者の利益を不当に害することとなる場合は、この限りでない). This proviso is the focal point of much current debate in Japan.

What constitutes "unreasonably prejudicing the interests of the copyright owner" is not explicitly defined and is subject to interpretation based on the type and use of the work, and the manner of its utilization. Legal experts and official bodies are grappling with its application to AI.

  • Narrow Interpretation: Some legal scholars argue this proviso should be interpreted narrowly. For instance, it might apply if AI training activities lead to the training data itself (containing original works) being made widely available for enjoyment, or if the training is specifically designed to create an AI that generates outputs virtually identical to, and directly competing with, a very specific, limited set of copyrighted works. The March 2024 Viewpoint addresses a related scenario through the "enjoyment purpose" lens rather than the proviso. It treats additional training (such as fine-tuning) on a small number of works by a specific creator as potentially having a coexisting enjoyment purpose if the aim is to make the AI output the creative expression common to those works. Such training would then fall outside Article 30-4 altogether. A creator's style as such, by contrast, is treated as an unprotected idea.
  • Broader Concerns: Other discussions revolve around whether the proviso could be triggered by:
    • The use of pirated or illegally sourced works as training data. While Article 30-4 itself doesn't explicitly differentiate based on the legality of the source, using illicit data could be argued as unreasonably prejudicial.
    • Situations where a well-established licensing market for AI training data for a particular type of work already exists or is emerging. The argument here is that free use under Article 30-4 could undermine such markets. The Agency for Cultural Affairs has indicated that if a database is compiled and sold specifically for information analysis purposes, then using that database for AI training without a license could fall under the proviso.
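    • Training AI models with the intent to generate content that closely mimics and saturates the market for specific styles or types of works, thereby diminishing the value of human-created works in those niches. The March 2024 Viewpoint, however, takes the position that a potential market clash caused by mass generation of outputs that merely share ideas, such as style, does not by itself trigger the proviso.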

The Agency for Cultural Affairs' March 2024 Viewpoint also notes that if the generation of infringing outputs (i.e., dead copies or close imitations of training data) occurs frequently and systematically, it might be a factor suggesting that the AI training itself was intended to facilitate such infringing outputs, potentially implicating the proviso or suggesting the initial "non-enjoyment" purpose was not genuine or was mixed with an "enjoyment" purpose.

Coexistence of "Enjoyment Purpose"

A crucial point highlighted by the Agency for Cultural Affairs is that if the AI training activity has a coexisting "enjoyment purpose," Article 30-4 does not apply. For example, if an AI developer uses copyrighted images not only to train the AI's pattern recognition but also displays these images within the AI service for users to appreciate, this could be seen as having a dual purpose, one of which is "enjoyment." In such cases, the activity would fall outside the scope of Article 30-4.

The More Limited Scope of Article 47-5

If Article 30-4 is deemed inapplicable (for instance, due to a coexisting enjoyment purpose or the proviso being triggered), another provision, Article 47-5, might be considered. This article allows for "minor uses" of copyrighted works incidental to information analysis services, provided the works are publicly available. However, its requirements are generally considered too restrictive for the broad-scale data ingestion typical of foundational generative AI model training. The use must be "minor," and the output of the original work must be "incidental" to the provision of information analysis results. Generative AI, where the generated content is the primary output, usually doesn't fit these criteria.

Implications for Businesses Training AI Models

For businesses involved in training AI models using data that might be subject to Japanese copyright law:

  • Reliance on Article 30-4 is plausible for "non-enjoyment" training activities.
  • Careful consideration must be given to the proviso:
    • Assess the nature of the data being used – is it from legitimate sources?
    • Is there an established market for licensing this type of data for AI training in Japan?
    • Is the AI being trained in a way that is likely to lead to outputs that directly and unfairly compete with the training data in its original market?
  • Documenting the purpose and methodology of data collection and training can be important.
  • If there's a significant risk of the proviso applying, or if an "enjoyment" purpose is present, licensing the data is the safest approach.

Once an AI model is trained, the content it generates can also raise copyright infringement issues if it is substantially similar to existing copyrighted works and was "based on" those works. Japanese law, like US law, looks at two main elements: "reliance" (依拠性 - ikyosei) and "similarity" (類似性 - ruijisei).

The "Reliance" (依拠性 - Ikyosei) Requirement

For an AI-generated work to infringe an existing copyrighted work, it must have been created in "reliance" on that existing work. This means the creator of the allegedly infringing work must have had access to and actually drawn upon the original copyrighted work.

  • Traditional Context: In cases of human creation, reliance is often proven by showing the defendant had access to the plaintiff's work and that similarities exist which are unlikely to be coincidental.
  • AI Context: Proving reliance with AI-generated content presents unique challenges.
    • If a copyrighted work was part of the AI's training data, does this automatically satisfy the reliance requirement if the AI generates something similar? This is a point of ongoing discussion. Some argue that if the AI "learned" from the work, an objective form of reliance might be established.
    • Others focus on the AI user's intent. If the user specifically prompted the AI to create something in the style of, or based on, a particular copyrighted work they knew, reliance might be easier to establish against the user.
    • The March 2024 Viewpoint from the Agency for Cultural Affairs suggests that if the AI user was aware of an existing work and used prompts intended to reproduce or imitate it, reliance on the part of the user could be found. Where the user was unaware of the work but it was included in the AI's training data, the Viewpoint indicates that reliance would ordinarily be presumed if the output is similar, because the work was objectively accessed during training. That presumption may be rebutted if it can be shown that the AI is technically constrained so that the creative expression of its training data is not reproduced in outputs.
    • Simply because a work was in the training data does not automatically mean every output is "reliant" on every piece of training data in a legally significant way for infringement of a specific work.

The "Similarity" (類似性 - Ruijisei) Requirement

Even if reliance is established, copyright infringement only occurs if the AI-generated output is "similar" to the existing copyrighted work.

  • Creative Expression, Not Ideas: Japanese copyright law protects the creative expression of thoughts or sentiments, not the underlying ideas, facts, styles, or concepts (this is the idea-expression dichotomy, similar to US law). For infringement, the AI output must be similar to the original, creative parts of the existing work. Generating a work in the same "style" as a famous artist, for example, is generally not copyright infringement if the specific creative expression of the original works is not copied.
  • Substantial Similarity: The similarity must be substantial; minor or trivial similarities are not enough. Japanese courts ask whether the essential features of the original work's creative expression can be directly perceived in the allegedly infringing work. The Supreme Court of Japan articulated this standard in its 2001 Esashi Oiwake decision. This is a qualitative assessment made on a case-by-case basis.
  • Application to AI Outputs: Applying this to AI is fact-intensive.
    • How much of an existing work's expression needs to be present in an AI output to be deemed "similar"?
    • If an AI generates an image that incorporates distinct, creative elements from a copyrighted photograph or illustration, it could be infringing.
    • The March 2024 Viewpoint reiterates this traditional approach, stating that the judgment of similarity and reliance for AI-generated works is made in the same way as for human-created works.

Authorship and AI-Generated Works

A related question is whether AI-generated content can itself be copyrighted. Under Japanese law (Article 2(1)(i) of the Copyright Act), a "work" is a production in which "thoughts or sentiments are expressed in a creative way." The prevailing view, supported by the Agency for Cultural Affairs, is that if an AI generates content autonomously without creative human intervention in the generation process, the output is not a "work" under copyright law and thus has no author and no copyright protection.

However, if a human provides specific, creative instructions or makes significant creative modifications to the AI's output, the resulting work (or the human contribution to it) may be eligible for copyright protection, with the human as the author. Protection and infringement are separate questions: even if an AI output is not itself protected by copyright, it can still infringe a pre-existing work.

Who is Liable for Infringement by AI Output?

If an AI-generated output is found to infringe copyright, the question of who is liable arises. Potential parties include:

  • The AI Developer/Provider: Liability could attach if they designed the AI in such a way that it is inherently likely to produce infringing content, or if they had specific knowledge and control over the generation of infringing outputs.
  • The AI User: The user who prompted the AI to generate the infringing content and then used it (e.g., published, sold, or distributed it) is often seen as a direct infringer. Their knowledge or intent regarding the original work can be key, especially for the "reliance" element.

Japanese law will likely assess liability based on who committed the infringing act (e.g., reproduction, public transmission) and their degree of fault or control over that act.

Recent Developments and Official Views in Japan

The legal landscape for AI and copyright in Japan is actively evolving.

  • Agency for Cultural Affairs' Viewpoint (March 2024): This is a significant document providing the government's current interpretation of existing copyright law in relation to AI. It emphasizes that Article 30-4 is the primary provision for the AI learning phase, focusing on the "non-enjoyment purpose." For the output phase, traditional tests of "similarity" and "reliance" apply. It also touches upon the importance of considering the user's intent and instructions when assessing infringement by AI outputs.
  • Ongoing Discussions: The Japanese government, through various councils and task forces, continues to study the implications of AI. The "AI Strategy Council" (AI戦略会議) and discussions within the Intellectual Property Strategy Headquarters (知的財産戦略本部), which is established within the Cabinet, are shaping future policy. There is an ongoing effort to balance innovation in AI with the protection of creators' rights.
  • Litigation: Japan has not yet seen the volume of AI-copyright lawsuits emerging in the US, but the potential for such cases is recognized. The AI-related disputes reported so far have often involved international parties or AI misuse outside copyright. As a result, there is little Japanese case law applying these principles to generative AI. Businesses should monitor any emerging Japanese case law.

Practical Risk Mitigation Strategies for Businesses

Given the complexities, businesses using or developing generative AI with a nexus to Japan should consider the following:

For AI Development (Training Models):

  1. Data Provenance and Legality: Strive to use training data from legitimate sources. While Article 30-4 may permit use, relying on clearly pirated material could increase the risk of the "unreasonably prejudicial" proviso being invoked.
  2. Understand Article 30-4 and its Proviso: Carefully assess whether the training activities genuinely lack an "enjoyment purpose" and whether they might "unreasonably prejudice" copyright holders' interests, considering the nature of the works and the training methodology.
  3. Consider Licensing Where Appropriate: For datasets known to be sensitive, or where a licensing market exists (e.g., specialized databases created for analysis), or if the AI is being fine-tuned to replicate specific styles very closely, explore licensing options.
  4. Internal Record-Keeping: Maintain records regarding the purpose of AI development, data sources, and training methodologies.

For Using AI-Generated Content:

  1. Assess Similarity and Reliance: Before commercially using AI-generated content, evaluate its similarity to existing works, particularly if specific prompts were used that might point to reliance on known sources.
  2. Human Review and Modification: Implement human review of AI outputs, especially for high-risk uses. Significant creative modification by humans can reduce the risk of similarity and may also create new copyright in the modified work.
  3. Understand AI Tool Terms of Service: Review the terms of service of the AI tools being used regarding ownership of outputs and any indemnification offered (or disclaimed) for copyright infringement.
  4. Be Cautious with Prompts: Avoid prompting AI systems with instructions that are clearly designed to replicate specific copyrighted works. The Agency for Cultural Affairs suggests keeping records of prompts used.
  5. Indemnification and Insurance: Consider contractual indemnification from AI service providers and explore cyber/IP insurance options that may cover AI-related infringement risks.

Conclusion

Japan's approach to copyright and generative AI, particularly its Article 30-4, currently offers a relatively flexible environment for AI model training compared to some other jurisdictions. However, the scope of this flexibility, especially the interpretation of the "unreasonably prejudicial" proviso, is still being fleshed out. For AI-generated outputs, traditional copyright infringement principles of reliance and similarity apply, though their application in the AI context will continue to be tested and clarified.

The legal framework is dynamic, with ongoing discussions and potential for new guidelines or judicial interpretations. Businesses operating in this space must remain vigilant, conduct thorough due diligence, and adopt proactive risk management strategies to navigate Japan's evolving copyright labyrinth successfully.