Using Copyrighted Works for AI and Data Analysis in Japan: What Does the Law Allow?

The rapid advancements in Artificial Intelligence (AI), machine learning, and big data analytics have opened up unprecedented opportunities across industries. However, these technologies often rely on the ingestion and processing of vast amounts of data, much of which may include copyrighted works such as text, images, audio, and video. This raises a critical question: how can businesses legally utilize such copyrighted materials for purposes like training AI models or conducting large-scale data analysis, especially in jurisdictions with specific copyright frameworks like Japan?

In 2018, Japan significantly amended its Copyright Act (著作権法 - Chosakuken-hō) to introduce several "flexible limitations" (柔軟な権利制限規定 - jūnan na kenri seigen kitei). These reforms were, in part, a response to the growing need to accommodate new technological uses of copyrighted works that don't fit neatly into traditional copyright exceptions, particularly in the fields of AI and data analysis. Two key provisions in this regard are Article 30-4, concerning the exploitation of works not for the primary purpose of "enjoying" their expressive content, and Article 47-5, which permits minor, incidental uses of works in connection with certain computer-based information processing services.

The Need for "Flexible Limitations"

Traditional copyright limitations in Japan are typically specific and narrowly defined (e.g., private reproduction, quotation for criticism). While these serve important purposes, they were often ill-suited for the novel ways in which data-driven technologies interact with copyrighted content. For instance, using thousands of images to train an image recognition AI is not about "enjoying" each image in the conventional sense, nor is it a quotation or private use. The 2018 amendments aimed to provide clearer legal ground for such uses, fostering innovation while still considering the legitimate interests of copyright holders.

Article 30-4: Exploitation of Works Not for Enjoyment (Non-Enjoyment Use)

Article 30-4 of the Japanese Copyright Act is a groundbreaking provision that permits the exploitation of copyrighted works, by any means, provided the purpose is not to "enjoy or cause others to enjoy the thoughts or sentiments expressed in such works" (著作物に表現された思想又は感情を自ら享受し又は他人に享受させることを目的としない場合 - chosakubutsu ni hyōgen sareta shisō mata wa kanjō o mizukara kyōju shi mata wa tanin ni kyōju saseru koto o mokuteki to shinai baai).

Core Principle and Rationale:

The underlying rationale is that copyright primarily protects the economic and moral interests associated with the expressive content of a work as it is perceived and appreciated by humans. If a use does not engage with this "enjoyment" aspect, it is less likely to harm the core interests copyright law seeks to protect. This provision, therefore, allows for uses that are often purely technical or analytical.

Scope of Application:

  • "Not for Enjoyment" Purpose: This is the cornerstone. "Enjoyment" (享受 - kyōju) in this context refers to the intellectual or spiritual satisfaction derived from perceiving the work's creative expression. If the primary purpose of the use is, for example, to extract statistical data, train an AI model on patterns, or test a technology, rather than to read, view, or listen to the work for its artistic or literary merit, this condition may be met.
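    • The presence of an ancillary or incidental enjoyment purpose alongside a non-enjoyment purpose complicates the assessment. Some legal commentaries suggest that the provision may still apply if the main and substantial purpose is non-enjoyment. Guidance published by Japan's Agency for Cultural Affairs, however, takes the stricter view that Article 30-4 does not apply where an enjoyment purpose coexists with a non-enjoyment one, so mixed-purpose uses call for a careful factual assessment.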
  • Types of Works: Article 30-4 applies broadly to all types of copyrighted works, whether text, images, audio, video, or software. It also applies irrespective of whether the work has been made public.
  • Permitted Acts: The provision permits exploitation "by any means," which can include reproduction, adaptation, public transmission, etc., as long as the non-enjoyment purpose is maintained and other conditions are met.
  • Who Can Use It: Any person or entity can potentially rely on this limitation, provided the purpose criterion and other conditions are satisfied.

Enumerated Examples and the Catch-All Clause:

Article 30-4 lists three specific examples of non-enjoyment uses, and also includes a general catch-all for other similar situations:

  1. For use in tests for developing or putting into practical use technology related to the recording, reproduction or other exploitation of works (Article 30-4(i)). This could cover, for example, using music tracks to test the efficacy of a new audio compression algorithm.
  2. For use in information analysis (情報解析 - jōhō kaiseki) (Article 30-4(ii)). This is a particularly vital clause for AI and big data. "Information analysis" is broadly defined as extracting information concerning languages, sounds, images, or other constituent elements from a multiplicity of works or a large volume of other such information, and comparing, classifying, or otherwise analyzing that information. This clearly covers many Text and Data Mining (TDM) activities, including using a corpus of texts to analyze linguistic patterns or a dataset of images to train an object recognition model.
  3. For use by a computer in a manner that does not involve human perception of the creative expression of the work (excluding uses falling under the preceding two items) (Article 30-4(iii)). This covers backend processing or other computer uses where the work is processed algorithmically without the expressive content being made available to human senses.
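As a rough illustration of what "information analysis" can look like in practice, the following Python sketch counts word frequencies across a small corpus and keeps only the aggregate counts. The function name `term_frequencies` and the sample corpus are hypothetical and chosen purely for illustration; whether any real activity falls under Article 30-4(ii) depends on the facts, not on the shape of the code.

```python
import re
from collections import Counter


def term_frequencies(corpus: list[str], top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n most frequent lowercase word tokens across the corpus.

    Illustrative only: the works are read to extract and compare elements
    (word tokens); only aggregate counts are produced as output.
    """
    counts: Counter[str] = Counter()
    for text in corpus:
        # Extract constituent elements (word tokens) from each work.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    # Compare and rank the extracted elements across the whole corpus.
    return counts.most_common(top_n)


# Hypothetical sample corpus standing in for a large collection of works.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "A bird sang.",
]
print(term_frequencies(corpus, top_n=2))  # [('the', 4), ('cat', 2)]
```

The output here is a list of statistics about the corpus rather than any reproduction of the works' expression, which is the kind of pattern-extraction use the item describes.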

The provision also includes a catch-all for "other cases" where the purpose is not enjoyment, providing a degree of flexibility for future technological developments. Examples given in official commentaries for uses potentially falling under Article 30-4 include reverse engineering of computer programs (for analytical, non-enjoyment purposes) or using artworks to test the quality of cameras or printers.

Conditions and Limitations:

Even if a use is for a non-enjoyment purpose, two critical conditions apply:

  • "To the extent deemed necessary" (必要と認められる限度において - hitsuyō to mitomerareru gendo ni oite): The exploitation must be proportionate to the non-enjoyment purpose. For example, if the analysis concerns only a small feature of many works, reproducing the works in their entirety might be deemed unnecessary where access to the relevant portions would suffice. For many AI training purposes, however, access to complete works is often technically necessary.
  • The Proviso (ただし書 - tadashigaki): The limitation does not apply if the act "unreasonably prejudices the interests of the copyright owner in light of the type and purpose of the work and the manner of its exploitation." This is a crucial safeguard. For instance, if a database was specifically created and marketed for information analysis purposes (e.g., a commercial, curated dataset for AI training), then using that specific database for information analysis under Article 30-4 without a license might be deemed to unreasonably prejudice the copyright owner's interests, as it directly competes with their intended market.
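The purpose requirement and the two conditions above can be sketched as a coarse internal screening step that flags a proposed use for legal review. This is only a sketch under stated assumptions: the `ProposedUse` fields and the `screen_article_30_4` function are hypothetical, the yes/no inputs hide what are in reality nuanced factual judgments, and an empty result does not establish that Article 30-4 applies.

```python
from dataclasses import dataclass


@dataclass
class ProposedUse:
    """Hypothetical description of a planned use of copyrighted works."""
    non_enjoyment_purpose: bool         # e.g. testing, information analysis
    limited_to_necessary_extent: bool   # proportionate to that purpose
    source_marketed_for_analysis: bool  # e.g. a dataset sold for AI training
    licensed: bool = False


def screen_article_30_4(use: ProposedUse) -> list[str]:
    """Return issues to raise with counsel; an empty list means no flag
    was raised by this coarse screen, not that the use is lawful."""
    flags: list[str] = []
    if not use.non_enjoyment_purpose:
        flags.append("purpose requirement: enjoyment purpose may be present")
    if not use.limited_to_necessary_extent:
        flags.append("extent: use may exceed what is necessary")
    if use.source_marketed_for_analysis and not use.licensed:
        flags.append("proviso: unlicensed use of a database sold for analysis")
    return flags


# Hypothetical example: unlicensed use of a commercial training dataset.
print(screen_article_30_4(ProposedUse(True, True, True)))
```

Returning a list of flags rather than a single yes/no keeps the three questions separate, which mirrors how the article itself frames them.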

Article 47-5: Minor Exploitation Incidental to Computer Data Processing and Provision of its Results

While Article 30-4 deals with the input side (using works for analysis or processing without enjoyment), Article 47-5 addresses certain aspects of the output side, specifically for services that use computers to process information and then provide the results of that processing to the public. This provision allows for minor, incidental uses of copyrighted works when presenting such results.

Core Principle and Rationale:

Many valuable information services (e.g., search engines, plagiarism detection services) function by processing large volumes of data (which may include copyrighted works) and then presenting useful results to users. These results often need to include small portions of the original works to be meaningful (e.g., a search snippet or a thumbnail image). Article 47-5 aims to facilitate such socially beneficial services by permitting these minor, incidental uses without requiring individual licenses for each snippet or thumbnail, provided the use does not unfairly compete with the original works.

Scope of Application:

  • Eligible Service Providers: The limitation applies to persons who conduct certain types of services (listed below) involving the creation of new knowledge or information through computer-based information processing, and who do so in accordance with standards set by Cabinet Order (政令 - seirei).
  • Types of Works: The works being incidentally used must be "works already offered or presented to the public" (公衆提供提示著作物 - kōshū teikyō teiji chosakubutsu), and further limited to those that have been "published" or "made transmittable" (e.g., available online).
  • Permitted Acts: The provision allows for "minor exploitation" (軽微な利用 - keibi na riyō) that is "incidental to" the act of providing the results of the information processing.

Enumerated Services (Article 47-5(1)):

  1. Location Search Services (Article 47-5(1)(i)): This covers services that, in response to a user's request, provide information on the location (e.g., URL) of a work, along with a small portion of the work (like a text snippet or thumbnail image) to help the user identify or assess its relevance. This is primarily aimed at internet search engines. This provision replaced and broadened the former Article 47-6.
  2. Information Analysis Result Provision Services (Article 47-5(1)(ii)): This covers services that analyze information and provide the results, and allows them to present minor portions of the analyzed works to illustrate or support those results. Examples could include plagiarism detection services showing matching text segments, or sentiment analysis tools quoting phrases.
  3. Other Services Designated by Cabinet Order (Article 47-5(1)(iii)): This allows for future expansion to other similar types of services. (Currently, no services are designated under this item).

Key Conditions and Limitations:

  • "Incidental" and "Minor" Use:
    • Incidental (付随して - fuzui shite): The minor use of the copyrighted work (e.g., snippet) must be subordinate to the main act of providing the information processing result (e.g., providing a search link or an analytical summary). Presenting only snippets without the primary information service would likely not qualify.
    • Minor (軽微な - keibi na): The extent of the exploitation must be minor. This is assessed based on objective factors such as "the proportion of the part of said publicly offered or presented copyrighted work exploited... its quantity, the precision of its display when exploited..." A small image thumbnail or a brief text excerpt would typically be considered minor.
  • "To the extent deemed necessary for the purpose of said acts": The minor use must be proportionate to the purpose of the information service.
  • Provisos:
    1. Known Infringing Source: The limitation does not apply if the service provider, at the time of the minor exploitation, knows that the offering or presentation of the work to the public infringed copyright (or, where the offering or presentation took place outside Japan, that it would have constituted infringement had it taken place in Japan).
    2. Unreasonable Prejudice: As with Article 30-4, this limitation does not apply if the minor exploitation would "unreasonably prejudice the interests of the copyright owner in light of the type and purpose of said publicly offered or presented copyrighted work and the manner of said minor exploitation." Examples of uses that might cause unreasonable prejudice include displaying snippets from a dictionary that effectively replace the need for the dictionary, or showing key plot-revealing scenes from a movie as thumbnails.
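To make the "incidental" and "minor" requirements more concrete, here is a minimal Python sketch of how a search service might build a result so that the excerpt always accompanies a location (URL) and is capped in both absolute length and share of the work. The caps `MAX_SNIPPET_CHARS` and `MAX_SHARE_OF_WORK`, the `SearchResult` type, and the `make_result` function are all hypothetical. The statute sets no numeric thresholds, and minor use is assessed on factors such as proportion, quantity, and display precision.

```python
from dataclasses import dataclass

MAX_SNIPPET_CHARS = 120   # hypothetical cap, not a statutory figure
MAX_SHARE_OF_WORK = 0.05  # hypothetical cap, not a statutory figure


@dataclass
class SearchResult:
    url: str      # the primary result: where the work can be found
    snippet: str  # the subordinate excerpt shown alongside it


def make_result(url: str, full_text: str, query: str) -> SearchResult:
    """Build a result whose excerpt is bounded by both caps above."""
    limit = min(MAX_SNIPPET_CHARS, int(len(full_text) * MAX_SHARE_OF_WORK))
    pos = full_text.lower().find(query.lower())
    # Start the excerpt shortly before the match, or at the top if no match.
    start = max(pos - limit // 2, 0) if pos >= 0 else 0
    return SearchResult(url=url, snippet=full_text[start:start + limit])


# Hypothetical page text and URL for illustration.
page = "lorem ipsum " * 500
result = make_result("https://example.com/page", page, "ipsum")
print(len(result.snippet), result.url)
```

Pairing the snippet with the URL in a single result type reflects the idea that the excerpt is subordinate to the search result itself. Fixed caps like these cannot account for the unreasonable-prejudice proviso, which turns on the nature of the work (for example, a dictionary entry or a plot-revealing film frame).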

Preparatory Acts (Article 47-5(2)):

Recognizing that to provide the services under Article 47-5(1), service providers often need to first collect and process large amounts of data, Article 47-5(2) permits the reproduction, public transmission (including making transmittable), or distribution of copies of publicly offered/presented works "to the extent deemed necessary for preparing for the acts listed in the items of the preceding paragraph." This is crucial as it allows, for example, a search engine to make full copies of web pages for indexing purposes, before then providing minor snippets in search results. This preparatory use is also subject to an "unreasonable prejudice" proviso.

Comparing and Contrasting Articles 30-4 and 47-5

While both Articles 30-4 and 47-5 facilitate data-driven uses of copyrighted works, they address different stages and types of activities:

  • Purpose:
    • Article 30-4 focuses on uses where the expressive content is not being "enjoyed" by humans (e.g., input for AI training, technical analysis).
    • Article 47-5 focuses on the output of computer information processing services, permitting minor, incidental presentations of works to humans as part of delivering a result (e.g., search results).
  • Scope of Permitted Use:
    • Article 30-4 allows for potentially extensive exploitation (e.g., full reproduction for analysis) as long as it's non-enjoyment and necessary.
    • Article 47-5(1) is explicitly limited to "minor" exploitation. However, its preparatory clause (Article 47-5(2)) allows for more extensive initial copying.
  • Type of User/Activity:
    • Article 30-4 is broader and can apply to anyone undertaking non-enjoyment uses.
    • Article 47-5 is more targeted towards specific types of information service providers.

Implications for AI Development and Data Analysis

These flexible limitations provide significant legal avenues for businesses engaged in AI and data analysis in Japan:

  • Training AI Models: Article 30-4(ii) (information analysis) is key for using copyrighted texts, images, and other data to train machine learning models, as this is typically a non-enjoyment use aimed at extracting patterns and knowledge. The copying and storage of data in preparation for such training would likely fall under the same provision.
  • Developing AI-Powered Tools: Developing AI for tasks like translation, summarization, or image recognition often involves processing copyrighted works for analytical purposes, potentially covered by Article 30-4.
  • Providing AI-Driven Information Services: If an AI service provides results that include minor, incidental displays of copyrighted works (e.g., an AI-powered search engine or a tool that identifies objects in images and shows a small thumbnail), Article 47-5 might apply.
  • Risk Assessment: Despite these provisions, the "unreasonable prejudice" proviso in both articles requires careful risk assessment. Businesses must consider whether their use, even if for non-enjoyment or as a minor part of a service, could unduly harm the primary market for the copyrighted works.

Brief International Context

Other jurisdictions are also grappling with copyright issues in AI and TDM:

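  • United States: Such uses are generally analyzed under the fair use doctrine, with courts weighing factors such as the purpose and character of the use (including whether it is transformative) and the effect on the market for the original. How these factors apply to AI training remains the subject of ongoing litigation.
  • European Union: The Directive on Copyright in the Digital Single Market (DSM Directive) includes two Text and Data Mining exceptions: Article 3, for scientific research by research organisations and cultural heritage institutions, and Article 4, for TDM generally. Both require lawful access to the works, and the Article 4 exception applies only where rightsholders have not expressly reserved their rights.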

Japan's approach, with specific yet somewhat flexible provisions like Articles 30-4 and 47-5, represents its own distinct path to balancing innovation with copyright protection.

Conclusion

The 2018 amendments to the Japanese Copyright Act, particularly the introduction of Articles 30-4 and 47-5, have provided a much-needed framework for the use of copyrighted works in AI development, data analysis, and related information services. These "flexible limitations" allow for a range of activities essential to these fields, such as using works for machine learning or providing information search results with illustrative snippets. However, they are not a carte blanche. The conditions, especially the overarching proviso against "unreasonably prejudicing the interests of the copyright owner," require careful consideration and ongoing interpretation as these technologies continue to evolve. Businesses operating in this space in Japan must stay abreast of these provisions and their application to ensure their innovative activities remain on solid legal footing.