Artificial intelligence (AI) and copyright will certainly occupy us for the next few years. The question is not only whether works created with AI can themselves be protected by copyright. The use of copyrighted data for AI training must also be examined, as must the works that result from it.
According to The-Decoder, Keiko Nagaoka, Japan's Minister of Education, Culture, Sports, Science and Technology, confirmed in an April hearing with Japanese politician Takashi Kii that existing Japanese legislation, which permits the use of data collected on the Internet for both non-commercial and commercial purposes, also applies in the context of generative AI.
Thus, current law permits the use of virtually any data "regardless of whether it is used for nonprofit or commercial purposes, whether it is an act other than reproduction, or whether it is content obtained from illegal sites or otherwise."
In short, in Japan, data sets subject to copyright protection may be used to train generative AI models. This is one of the most extreme "pro-AI positions" recently announced by a public institution.
An almost equally liberal AI position on copyright is currently held by Israel:
A 2022 position paper published by the Israeli Ministry of Justice says that AI training is "typically" subject to the fair use doctrine, allowing "incidental use of copyrighted material" if the copyrighted works are deleted at the end of the training process.
Excluded are data sets compiled specifically from the works of individual authors in order to train models that subsequently compete with those authors, that is, with the possibility or intention of directly copying an author's style. In addition, the output of the systems could infringe copyrights regardless of the training process.
After training, the underlying data has been absorbed into the model's weights (loosely speaking, its latent space), so individual works can no longer be specifically deleted from the model.
As long as individual countries legally allow copyrighted material to be incorporated into the latent space of a model, other countries would have to control the cross-border use of these models within their own borders, which seems difficult to do in practice.
In addition, it must be fundamentally clarified whether the output of AI systems can be considered plagiarism at all.
Our personal view: there is no getting around a new copyright law that takes into account the specifics of AI-trained models. And that will still take quite a while...