Speaking of The Verge, its parent company Vox Media, along with The Atlantic, has signed a deal with OpenAI.
Two more media companies have signed licensing agreements with OpenAI, allowing their content to be used to train its AI models and be shared inside of ChatGPT. The Atlantic and Vox Media — The Verge’s parent company — both announced deals with OpenAI on Wednesday.
Emilia David at The Verge
In the case of Vox Media, the deal was made and announced without informing its staff, which obviously doesn’t sit well with Vox’s writers in particular. By making deals like this, upper management gets to double-dip on the fruits of their workers’ labour – first, the published content generates ad revenue, and second, OpenAI pays them to use said content for training and other purposes.
And once the “AI” gets good enough, more and more of the writers will be fired, leaving only a skeleton crew of lower-paid workers to clean up the “AI” output. With this deal, the writing is on the wall for every journalist at Vox Media – you’re currently contributing to your own obsolescence, and your bosses are getting paid for it.
As far as I know, OSAlert’s owner, David, has not yet been contacted by OpenAI. Regardless, I’ll sell the past 20-odd years of my terrible takes for 69 million euros, after deducting Swedish taxes. And since OpenAI is run by billionaires: taxes are this thing where normal people pay a portion of their income to the government in return for various government services.
It’s wild, I know.
Thom Holwerda,
I know I’ve been disagreeing with you quite a lot over the semantics, and I really don’t think copyright is up to the task of regulating AI generation of human-like expressions. But I do share your high-level concerns.
IMHO copyright isn’t the reason AI is dangerous for society. In cases like these Vox Media and The Atlantic deals (and I’m sure there are many more), the copyright issue is already being solved by the AI firms. Yet I’m confident this does nothing to appease anyone opposed to AI; copyright infringement was never the main threat. It’s really about the potential to disrupt so many human lives – even in instances where copyright isn’t being violated.
The business case for AI to replace humans in the coming years is growing, but so many people are still in denial. Many will only realize this the hard way, when the people around them start being personally affected. Everyone will look to their employers to keep their jobs, but with few exceptions, employers will not be the heroes; they will let greed pave the way forward. Some jobs will remain, good jobs even, but the AI productivity boost means there will be fewer positions. Historically, people dealt with displacement through further education and intellectual skills, since these were hard to automate; this time around, however, AI is competing for intellectual and creative jobs. I am still predicting this will be a rude awakening for the masses. We’d better have our social safety nets in place before this happens.
>”Historically people dealt with displacement through further education and intellectual skills”
Actually, I’m pretty certain that historically most people dealt with displacement by ultimately dying penniless and hungry.
andyprough,
Ok, I was referring more to automation displacing jobs in modern times. More education has generally been a good bet in the past century, but AI changes the assumptions about what remains safe from mechanization.
>” I’ll sell the past 20-odd years of my terrible takes for 69 million euros”
Nice!
It’s extremely scary, as a scenario. I’ve been on OSAlert (or BeOSAlert) for quite a while, and I have a triad of trusted IT sites: OSAlert, TheReg and Ars (which is becoming more and more like a PR outlet – long gone are the days of Hannibal and Caesar).
I respect your work and I try to support it to the best of my abilities (e.g. yesterday I had a very stupid car accident; thankfully nobody was hurt, me included, but I still lost EUR 28k because of that half a second…)
I’ll try to contribute more. Your independent journalism and opinions are invaluable even when I don’t agree with them (remember Voltaire?), but they must continue to be independent. Those LLMs are a ridiculous bunch of crap, and nowhere near true “AI”.
Keep up the great work!
Mat
fulgheri,
LLMs are true “AI”. What they are not is true AGI (artificial general intelligence), which is the computer science term for what most people actually mean.
I stand corrected.
Thanks!
I apologize, it’s the hour of the dead here.
Apart from that LLM/AI mistake, my point was: original content, such as Thom’s articles or my own novels, should be protected, and unavailable for training LLMs unless explicitly approved by the author. I’m thinking of something like the GPLv3 here, where knowledge (or content) can be shared by the author(s) if all of them agree to those terms.

Writing is incredibly difficult; figuring out the story itself is the easiest part. And unless your name begins with “Stephen” and ends with “King”, you’ll print it all out, correct most mistakes *on paper*, re-edit your ODT file, print it all out *again*, and maybe, if you’re very good or very lucky, edit that same file for the last time – before printing it all once more and sending the file to the publisher. Then you wait a year to be paid less than a standard taxi fare. That content, that text, is mine and mine alone. The simple fact that it is published doesn’t allow *anyone* to use it to train *any* bloody LLM, period.
fulgheri,
We’ve all been there
I understand your point and it isn’t unreasonable. However, following that train of thought always leads me to a contradiction: copyright law allows us to use protected works to train our meaty neural nets, but artificial neural nets are off limits? What aspect of copyright law allows us to make infringement depend on this? It kind of feels like we’re trying to create a new form of discrimination, with different standards imposed for humans and AI.
This one really confuses me because the GPL authors already gave permission for downstream derivative works. Adding new restrictions is expressly prohibited. In GNU’s own words…
https://www.gnu.org/licenses/gpl-faq.html#GPLCommercially
While I do understand why an author may not like their code being used to train AI, the GPL does not permit said author to restrict how the code can be used. This means software authors who do not want their code used to train AI should not license their code under the GPL (or most current FOSS licenses). The only constraint the GPL imposes on derivative code is that derivative code must also abide by the GPL license. I don’t know whether Copilot specifically complies with these GPL terms, but in principle it seems that GPL code can be used to train AI without any additional permission; the GPL already permits it.
Edit:
On rereading, it occurs to me that what you were saying may have been different from what I thought you were saying. If so, then I apologize.
I believe you are calling for more explicit permissions for AI training. That does make sense, although copyright would still have to come up with an answer to whether generalized representations (without literal copies) constitute infringement.
What I write isn’t code; it’s copyrighted material. I was referring to the GPLv3 as a mere example.
In theory, existing copyright laws are already there, in place, perfectly established and somewhat correctly enforced. But *all* LLMs (OpenAI, Copilot, etc.) literally scrape up whatever’s on the Net, regardless of the original license. I mean, I can write a short story and publish it on my blog (don’t worry, I don’t have one) under one of many open / free-to-use licenses. It is *still* copyrighted work.
“Hey Siri, tell me a story about Sardinia or Liguria or Milan.”
“Hello Jack, I’ve found the following: 1, 2, 3, 4… etc. Would you like me to read you the stories that are publicly available, or would you like to purchase an audio-book?”
“The most recent one. Is it publicly available or do I have to buy it on Audible?”
That would be perfectly acceptable.
But this isn’t the direction taken ATM by the whole “AI” biz…
Yep, you’re right – my apologies, I replied straight away before reading your second comment.
I think we both agree that:
1) my ref to GPLv3 was only an example
2) there’s already an established (by law) difference between code and copyrighted text
3) cheers, a good day to you!
> It kind of feels like we’re trying to create a new form of discrimination with different standards imposed for humans and AI.
This might be naive, but why couldn’t we? It’s not like it’s going to hurt anyone’s feelings at this stage (apart maybe from the investors who hoped to put a good chunk of the population out of jobs). And I really don’t feel like we’re collectively gonna get something worth the misery it’s going to cause… Even though I’m pretty sure I’m not in the group that’s going to lose the most from it, I’m also one of those people who believe society needs some balance to not spiral down into hatred and violence, and I don’t see how AI will not put the fragile thing we have these days into even more peril…
worsehappens,
I think you can make that case, but nobody is doing it. It just seems that everybody is eager to jump to the conclusion that AI needs to be banned while skipping the intermediate logical steps needed to get there.
I’ll share my own thoughts, but I acknowledge this is such a tricky debate. AI isn’t sentient, and AI does not exist for AI’s sake; rather, AI has the awesome potential to do good for society at large. It is the key to a society based on leisure instead of work… however, the problem is that our entire economy is centered around work. In other words, we are dependent on work. Given our current social structures, making workers redundant could realistically put millions more into poverty. This is not AI’s fault, it’s capitalism’s fault.
Regardless, given that capitalism is the reality for most of us, the question becomes whether AI is a net benefit or not, and I honestly don’t know the answer. If a less job-focused utopia is to exist, I suspect it would probably have to start outside of the US, where Wall Street strongholds will predictably pursue AI as a means of continuing their own greedy ways while hurting everyone else.
I don’t disagree with your points. I personally believe that AI will keep evolving whether the public wants it to or not because those protesting aren’t really the ones pulling the levers in government and boardrooms. And this factors into some of my views, including my opinion that we should be focusing on what we want AI to become rather than banning it.
What is AI, Thom?
Artificial Illegality.
When you read something from me or Thom or Everyman-Jack, it might inspire your own built-in Neural Network, which is usually called a “brain”. Which is perfectly natural. A quote would be welcome, but legally you don’t have to give one.
Original content should be protected from the likes of M$, Google, Apple, OpenAI etc.
Otherwise it’ll be a corporate-controlled Far-friggin-West.
fulgheri,
So, just to be clear, are you in favor of new copyright laws that allow natural NN training and prohibit artificial NN training? Out of curiosity, what would these laws look like? I think there needs to be a debate, so how exactly do you propose copyright be amended to differentiate the two?
Fairly simple. You and I have a brain, by definition a Natural Neural Network. Should you read a story written by me, and should that story inspire you to write something else, I welcome it! Really, I do!
A Meta (or whatever)-owned and operated supercomputer does NOT have a brain. GPUs *can’t* be inspired, just coded to steal (badly) from your or my works (endeavors) of intellect – “put glue on a pizza”.
Dickens and Collins were the first writers to fight against stolen/copied original stories. Almost 150 years ago. And Collins was a lawyer.
I might be a crappy writer, but since my work has been published with an ISBN code, you can’t just feed it into an Nvidia GPU, period. The law is on the authors’ side; otherwise I could steal your Ducati 999 simply because I bought the same model before you did.
fulgheri,
I wrote an admittedly simple AI years ago; I literally called it “brain”. In a sense computers have brains too, and LLMs are a kind of static brain, arguably even more intelligent than real insect brains. So I’d suggest that just having a brain is insufficient; you probably want to consider full-blown sentience.
We can agree that LLMs are not sentient. However, the issue I have with the sentience argument is that if an LLM is not sentient, then it is just a tool. But a tool does not commit copyright infringement; it’s the sentient users who would have to commit the infringement. And since they are sentient, copyright should not bar them from using the tools for non-infringing uses. We’re back to square one. Do we accuse a camera or computer of committing infringement? No, these things are allowed. The responsibility lies with sentient users to use the tools in ways that do not infringe, which they can do with LLMs. Banning the tools themselves when there are non-infringing use cases seems excessive to me.
That’s fair, but I think we need to look at the difference between literal copying and generalizing. If we generalize the contents of a book without copying the original expressions, that is different from copying the contents of the book using the author’s original expressions. Traditionally, copyright only considers one of these to infringe, at least as applied to humans. It seems like you might want generalizations to become infringement too, but can you agree that would be really controversial?
Technically, transient copies are made all the time: in packet-switching networks, hardware buffers, RAM, bounce buffers, swap, caches, etc. Copyright does not consider these copies to infringe, even though the copies are real and took place without permission, because they are temporary and ultimately serve a permissible use. It’s not the transient copies that copyright is concerned with, but the end result. Honestly, I think copyright gets this right; it is the only reasonable way to handle it.
I want to be clear, I don’t find any of your ideas unreasonable, but there’s a lot of nuance, and that’s keeping me from agreeing with everything flat out. The subject is complex and I’m trying to find a balance that makes sense for me. It’s not been trivial.
LOL. Man am I bad at editing
I’m not worried about the “fruit of their labour” being “stolen”, but about the fact that the Vox brainrot will influence the answers the “AI” will give, especially on political topics.
Training AIs on the archives of Vox, Reddit, Quora – what could possibly go wrong? I mean, other than AI encouraging people to put glue on their pizza and gasoline in their spaghetti recipes.
The fact that most “journalists” out there (aka glorified bloggers pretending to be journalists) can be replaced by LLMs says a lot about the quality of journalism: it’s all puff pieces and opinionated garbage.
It’s the reason most political articles today are heavily biased opinion pieces, with the aforementioned “journalists” proudly displaying their bias as if it’s some kind of badge of honor.
It’s the same thing with tech journalism. There’s a reason the quality of most products today is terrible: screens can have massive backlight leakage, components can have massive coil whine and gamepad analog sticks can exhibit massive drift, but the aforementioned “journalists” will “ooh” and “aah” over the latest shiny without mentioning the issues (or mention them in passing and not let them affect the final score) and give the companies that make this junk their glowing reviews (puff pieces) anyway.
It’s the same thing with automotive reviews, with car “journalists” flown to exotic locations and wined and dined in five-star hotels so they will write good reviews for the car no matter what (and if they don’t, there is someone else to do the job). You’ll read glowing reviews of cars that are known unreliable pieces of junk, made by brands with awful reliability stats on Consumer Reports (Land Rover, Alfa Romeo, VW, Dodge, to name the worst).
So, I say: Replace them all with AI, it’s not like AI is going to do a worse job. For the few good journalists out there, there is Patreon and Google ads.
Put yourself in their shoes: you get fired because an algorithm is cheaper and faster – but way way worse in terms of quality. Or do you really want to put glue on your pizza because the mozzarella won’t stick?
The problem that modern “journalists” face is not that they are slower or more expensive than LLMs; it’s that they are not any better. Here is an example from the tech sector (which is something we all know here): did you know that iPads sold as “creator’s tablets” had massive backlight leakage spanning generations? Of course not; none of those “tech journalists” told you about it in their reviews. They’ll gush over the latest shiny with no research or thorough review to uncover any defects. Similarly, did you know that Alienware 17 R1 laptops equipped with high-end Kepler GPUs (GTX 780M and GTX 880M) had overheating issues because Dell underspec’ed the cooling solution? (Dell even told people not to use FurMark because it would damage the laptop and void the warranty.) Of course not.

The review cycle goes like this: gush, make some measurements that are irrelevant to daily use (nobody cares about a 5% difference in FPS), give a glowing review to keep those free devices coming, repeat. In the 90s and 2000s, the above issues would have been worthy of extensive coverage similar to that given to the capacitor plague, especially considering the brand names they were attached to. So, why should this kind of “journalism” be preserved? Why should anyone be entitled to such a job, which is the textbook description of a professional fluffer?
So, I say: replace them with AI. At least LLMs make an attempt to present information that represents the sum of their training. For example, the LLM you mention told people to add non-toxic glue, which is a real thing used by chefs, and if you want your mozzarella to stick to the base, it is in fact the proper solution (though the wording was ambiguous I admit).
Totally agree. Gruber could easily be replaced that way (unless he already is an Apple-controlled bot).
But I do read (and support) The Guardian every day, and there you can see the difference between good writers and cheap ChatGPT…
Sad but true.
About the glue on the pizza: be careful. 1) It’s blasphemy, pure and simple; it means you don’t know how to properly prepare and cook one. 2) In Rome or Naples you might *actually* get shot for saying something like that! No kidding!
Just having a bit of fun here… Unless you’re one of those guys who put cream on Spaghetti alla Carbonara (I’ve been a Chef as well)… Ciao!
fulgheri,
I really sympathize with those who are displaced. However, I feel that AI may be used as an easy scapegoat for issues that are actually more systemic. Even prior to AI, journalism faced wave after wave of consolidation and elimination. I’d argue that real journalism died years ago, and IMHO corporations deserve the blame.
https://www.youtube.com/watch?v=_fHfgU8oMSo
I agree with you: Large Language Models are not the problem, but the people and the Big Companies behind them are.
AI or LLM, they’re just tools. They can be useful or devastating.
For example, I personally hate weapons, but in every country on Earth, the Beretta is never guilty. The man or woman pulling the trigger is.
I understand this is a bit extreme; I was just trying to make a point. I’d rather lose AI than good journalism and writers…
At first I thought it was just a fad, now I’m not sure… Cheers fellas!
/me :ironically sad face:
The Guardian is still a source of great (and gratis, if you can’t afford it) journalism. The NYT lets you read half a dozen articles before you smash your head against the paywall, IIRC.