A couple of months ago, Microsoft added generative AI features to Windows 11 in the form of a taskbar-mounted version of the Bing chatbot. Starting this summer, the company will be going even further, adding a new ChatGPT-driven Copilot feature that can be used alongside your other Windows apps. The company announced the change at its Build developer conference alongside another new batch of Windows 11 updates due later this year. Windows Copilot will be available to Windows Insiders starting in June.
Like the Microsoft 365 Copilot, Windows Copilot is a separate window that opens up along the right side of your screen and assists with various tasks based on what you ask it to do. A Microsoft demo video shows Copilot changing Windows settings, rearranging windows with Snap Layouts, summarizing and rewriting documents that were dragged into it, and opening apps like Spotify, Adobe Express, and Teams. Copilot is launched with a dedicated button on the taskbar.
Windows is getting an upgraded Clippy, one that shares its name with the biggest copyright infringement and open source license violation in history. In fact, some of the Windows Copilot features are built atop GitHub Copilot, such as the new “AI” features coming to Windows Terminal. Now you can get other people’s code straight into your terminal, without their permission, and without respecting their licenses. Neat!
I wonder how long it’ll take for someone in the EU to sue Microsoft for anti-competitive practices again. Hopefully this time it’ll happen before all the damage is done.
Thom Holwerda,
It’s unclear to me how a copyright judge would rule on this and I have to confess I’m not sure how I feel about it myself either.
AI is very transformative and I’m hesitant to support a blanket ban on AIs learning from public content and works available online. After all, we don’t apply that standard to humans. As a human developer, I can read copyrighted books and open source code and learn how to technically accomplish something as a result of reading that work, and I can and will use that knowledge in my work. Am I wrong for doing so? Outside of things that are trade secrets, it’s traditionally implied that readers have the right to learn and use their newfound knowledge without any permission at all. So to dictate that AI may not learn anything from existing works without the author’s permission seems excessive. It creates one standard for humans and another for AI. I’m not comfortable with this. Obviously if an AI implementation is actually violating copyrights, then it should be addressed. But it seems wrong to me that we should cordon off the internet and other public works with “human-only” yellow tape. Even school textbooks, medical journals, science journals, newspapers, and OSNews comments would be off limits since they’re copyrighted without explicit permission given to AI.
Such restrictions would leave AI training heavily skewed towards public domain sources from a century ago, with huge gaps in modern knowledge and expertise. I’m not enamored of Microsoft, but just in terms of AI I don’t know that this would be the best outcome for society.
There is a huge difference between learning from someone else’s code and copying it line for line while violating its license. The former is one of the strengths of open source and Free software, the latter is what Copilot is doing right now.
As such, Copilot (whether it is capable or not) simply does not respect FOSS licenses. You, as a human, can make a decision to respect a license if you do decide to copy code verbatim, and you, as a human, can be sued for violation of a license’s terms if you break them. That’s cut and dry, there is case law. What it comes down to is the fact that Copilot is not learning from open source code (which would be fine), it is literally copying and pasting the code when you invoke it, and is ignoring the license attached to that code. Personally I think it shows the true limitations of AI at this point; while it makes for a “wow cool!” demonstration, it’s literally not learning anything but which code to copy and where to apply it. It’s not generating original code based on concepts it learns from reading existing code (which is what we humans are capable of), it’s just a fancy rubber stamp.
Morgan,
Sure, and I agree up to this point. Infringing cases need to be addressed. However, where do we draw the line: should ALL training without permission be unlawful? From what I gather, I think that is Thom’s position. Or is it only unlawful when AI implementations end up creating carbon copies?
We can agree that cases of actual infringement should be prohibited, but prohibiting AI from acquiring and using public knowledge by default like in Scenario B really rubs me the wrong way.
I’ve seen some of the ChatGPT coding sessions and I don’t think that’s a totally fair assessment. Yes, it is able to reproduce the original source in some cases when prompted with rather specific queries, and I’d agree that’s an implementation flaw that needs to get addressed. However, ChatGPT is able to write and revise code much as a human developer can, and it’s quite fascinating to see it in action.
I honestly don’t think expressions are a limitation of AI even at this point. As sukru was doing a few weeks ago, you can feed it paragraphs of your expressions, and it will take your ideas and rewrite those ideas endlessly using new expressions.
Obviously there is room for improvement and I don’t want to negate that, but in general it’s actually doing what we want the AI to do: taking ideas and creating new ways of expressing them. Infringing edge cases aside, what the AI is doing is normally permissible under copyright when done by humans.
BTW I’d be curious to hear what everyone’s opinions are: Is scenario A better? Is scenario B better?
Copyright law isn’t just for exact copies. If the original is transformed “as if by a mechanical process” then it can be extremely different to the original and still violate the original copyright.
For a simple example; if you grab Stephen King’s latest novel, shove it through OCR to get raw text, shuffle paragraphs randomly, send that to a speech synthesizer, then add a drum beat to it; the original copyright applies to the resulting audio (even though it may be nothing like the original).
This is what the courts will have to figure out – not whether the output is too similar to copyrighted work, but whether Copilot’s output is “copyrighted work transformed by a mechanical process”.
Of course humans aren’t machines, so a real person learning from copyrighted work (and then creating something from what they learned) is not a mechanical process and therefore not a copyright problem.
Brendan,
I disagree with you on the “extremely different” part – if it’s not similar, the case for infringement would become exceptionally weak IMHO.
IMHO your example is very much like the original and a judge/jury would immediately spot the similarity. Take an audio book, the fact that it takes a lot of work to create the audio book doesn’t mean it isn’t similar and derivative.
Ignoring the point of similarity though, I feel you have a valid point about a “mechanical process”. One rebuttal for that is that all of physics is technically a mechanical process and if that’s our litmus test for banning AI, then we have to ban humans as well. Another rebuttal is that as mechanical processes become more and more complex, as the case might be when training deep neural nets, we may find emergent intelligence that behaves less and less like a trivial mechanical process and increasingly develops characteristics of intelligent thought.
Of course we are machines.
Our DNA contains the instructions for our cells to build us from the ground up. All of that is a mechanical process leading up to the point where we can read copyrighted works in order to add others’ ideas and knowledge to our own.
“The source wasn’t a trade secret, it was openly published for the entire public to access.”
I’m sorry, but the license still applies; you can’t just take code and incorporate it in your own (corporate) work.
The Windows source code has been leaked multiple times and still you can’t just take code from it and incorporate it in your own (corporate) work. If you are found out, Microsoft or even Open Source/Free Software projects will come after you.
What Open Source/Free Software licenses often do allow is taking the code, as long as you keep the original author references; often you also need to publish your changes again (well, strictly speaking, you need to distribute the code to the people you distribute your program to).
Lennie,
Please understand that I wasn’t suggesting otherwise. Maybe I wasn’t clear, but I’m not saying that copyright doesn’t apply; I am, however, strongly suggesting that in the end it should apply consistently to both humans and AI.
If we end up with copyright policy such that a human is allowed to do X, but AI is NOT allowed to do X, then we end up with a hypocritical copyright policy that exposes our biases against AI. We cannot have justice when the rules being applied change based on who’s being judged.
Yes, although I don’t know why you are telling me this.
Practically without exception, every human developer will study/learn/reference sources that are not public domain, be it in print form or online. As a legal matter, humans don’t need a license to learn and recreate code or even fully memorize code and use the acquired knowledge in their own work, but what’s being suggested here is that AI should be judged by a different standard.
This is the philosophical meaning behind those blindfolded female figures holding up scales. They represent blind justice.
https://tgb.com.au/wp-content/uploads/Blind-Justice.jpg
Morgan,
We should not confuse bugs with design decisions.
ChatGPT/Copilot/Whatever reproducing code line by line is generally a bug, especially if the code comes under a license that is not extremely free (as public domain is).
Them learning from large amounts of codebases, and reproducing established “patterns”, on the other hand, is actually useful, and in my opinion would be considered “transformative”. (Not a lawyer, so take it with a grain of salt).
Today, there might be issues. In a short time, these will be ironed out, and all that discussion will be moot.
Interesting, I thought the true learning/applying knowledge phase was still years away based on my limited testing. I have given ChatGPT prompts for simple problems in Python and JavaScript, and every time I can copy/paste its “generated” code into a search engine and find almost line-for-line examples on StackOverflow and Github that are months to years older than ChatGPT itself. There have been a few instances where it technically isn’t line-for-line copying, but all it has done is reorder functions or rename variables (i.e. the variable “number” becomes “n”).
Or maybe since I’m not a programmer, I’m overlooking the fact that there are only a few ways to solve simple problems in code, and I would need to step up to more complex problems to see true learning and application of knowledge acquired?
Beyond that, I still don’t think Copilot in particular takes licenses into consideration, or if it does I have not been able to find documentation to support that assertion. Microsoft skirted around the issue of licenses entirely when it issued a statement about the pending lawsuit[1].
1. https://www.courthousenews.com/microsoft-and-github-ask-court-to-scrap-lawsuit-over-ai-powered-copilot/
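The renaming case described above (the variable “number” becoming “n”) can be made concrete with a small Python sketch. It is only an illustration of why a rename alone does not make code “new”, not a description of how ChatGPT, Copilot, or any search engine works. The function names `normalized_tokens` and `similarity` and both sample snippets are hypothetical, made up for this example.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Token types that carry no code content for this comparison (layout, comments).
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, replacing each identifier with a placeholder.

    The first distinct identifier becomes "id0", the second "id1", and so on,
    so two snippets that differ only in naming yield the same token list.
    """
    names: dict[str, str] = {}
    tokens: list[str] = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append(names.setdefault(tok.string, f"id{len(names)}"))
        else:
            tokens.append(tok.string)
    return tokens


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two snippets, ignoring naming."""
    return SequenceMatcher(None, normalized_tokens(a), normalized_tokens(b)).ratio()


# Hypothetical snippets: the second only renames the function and its argument.
original = "def is_even(number):\n    return number % 2 == 0\n"
renamed = "def check(n):\n    return n % 2 == 0\n"
unrelated = "x = [i for i in range(3)]\n"

print(similarity(original, renamed))    # identical once names are normalized
print(similarity(original, unrelated))  # noticeably lower
```

This sketch ignores indentation and does not account for reordered functions, and for very simple problems two independently written solutions may well normalize to the same tokens, which is the caveat raised in the comment above.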
> I have given ChatGPT prompts for simple problems in Python and JavaScript, and every time I can copy/paste its “generated” code into a search engine and find almost line-for-line examples on StackOverflow and Github that are months to years older than ChatGPT itself.
Expected, since it’s not an intelligence but a much better search engine for particular problems. It understands very well what you are looking for.
Now, there are 2 options for a response:
1) paste the copied code. Nothing wrong with that, because any human programmer would just do the same.
2) offer slightly modified code. Nothing wrong with that, because any human programmer caring about licenses will do the same.
In a nutshell, the licensing and copyright situation has not changed. Only the automation is new.
Further, I have doubts that ChatGPT or Copilot will really break any license, because all the results so far have worked only after heavy (interactive) correction and modification.
Example: https://manticore-projects.com/JSQLParser/javadoc_snapshot.html
The Floating TOC Sidebar has been written by ChatGPT, but it took 20-25 iterations until it worked. Almost 2 hours. If I had written it from scratch, it would have taken me 2 days (since I don’t write HTML/JS/CSS). So, this thing certainly started as a template under a copyright, but the final result is as genuine as if I had written it myself.
Morgan,
Yes, without knowing how exactly the training data was generated, it would be difficult to make assertions about the licenses.
However there are two possible ways to look at this:
1) Any private use of open source (in many licenses) is explicitly allowed. And as long as you don’t copy or translate the code verbatim, it should be okay. That means using large amounts of public code (with select licenses) is kosher.
(Again as long as you take precautions not to generate code verbatim)
2) “Learning” is no different than copying, even when you transform all the source code and algorithms, hence it is illegal to do so without explicit attribution.
(It might still be possible to say: “this snippet is based on main.cc from openldap codebase with Apache license”, etc.)
The final decision could be somewhere in between. Personally, I am in camp (1), but as I am not a lawyer, this is just an opinion.
“Now you can get other people’s code straight into your terminal, without their permission, and without respecting their licenses. Neat!”
You clearly have no idea how this actually works. Install it, use it, and then write about it, is my recommendation.