Reddit signs $60M content licensing deal with AI company
74 points by tonystubblebine 1 year ago | 79 comments
- leobg 1 year agoHow can they license something that they didn’t author? Yes, they have TOS. But training generative AI wasn’t something that existed when ~99% of Reddit’s content was created, hence users could not possibly have consented to it. Besides, at least in Germany, TOS cannot contain regulations that are “surprising” or “unexpected”. Using my content to serve ads is one thing I might expect. But licensing it out for a fee to third parties? I don’t think so.
- SAI_Peregrinus 1 year ago> When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.
From the oldest version of their ToS[1]. This is unchanged in the newest versions even for the EEA[2]. It seems pretty clear that whatever AI training is doing is covered by "use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display" in "media formats and channels now known or later developed anywhere in the world" (emphasis mine).
[1] https://www.redditinc.com/policies/user-agreement-october-15...
[2] https://www.redditinc.com/policies/user-agreement-february-1...
- ySteeK 1 year agoAt least in Germany, such agreements are afaik invalid and without a severability clause, possibly all others too. Simply because something like copyright cannot be assigned in Germany. Secondly, there are ways to use Reddit without ever having agreed to the ToS.
- SAI_Peregrinus 1 year agoThat clause does not assign copyright. You explicitly keep your own copyright (in the previous clause, I didn't reproduce it above). You just grant them a license to use your content in the ways they listed.
- fragmede 1 year agohow do you make a comment without creating an account which requires you to agree to the tos?
- miohtama 1 year agoThis is (hopefully) the major difference between web 2.0 and Web3. In the latter, the goal is to build services where you actually own your content.
Remains to see if this actually can happen.
- neom 1 year agoPeople have actually been doing stuff like this since way before the LLM thing, I've bought books containing collections of stories from websites.
Craigslist Confessional: A Collection of Secrets from Anonymous Strangers https://www.amazon.com/Craigslist-Confessional-Collection-Se...
PostSecret: Extraordinary Confessions from Ordinary Lives https://www.amazon.com/PostSecret-Extraordinary-Confessions-...
Stoned, Naked, and Looking in My Neighbor's Window: The Best Confessions from GroupHug.us - https://www.amazon.com/Stoned-Naked-Looking-Neighbors-Window... (actually a great book)
People can and will profit from things you do in life for free, I feel like we accepted that a very long time ago?
- nroets 1 year agoPerhaps it isn't legally a license deal, but rather unlimited scraping i.e. database access.
The AI company just trains their models on that and aren't creating derivative work in the legal sense.
- FireBeyond 1 year agoThat's not going to fly in any court.
"We didn't license it to them for the express purpose of training your model on this data, we only gave them database access for the express purpose of training their model on this data."
- AznHisoka 1 year agoFWIW, they already have been licensing their data for years to social media management platforms (ie SproutSocial, Sprinklr)
- PM_me_your_math 1 year agoBecause it is reddit, one of the most vile companies on the planet.
- andrewinardeer 1 year agoSays the guy with the Reddit meme username on Hacker News.
- rakoo 1 year ago> How can they license something that they didn’t author?
Capitalism in a nutshell
- mtillman 1 year agoIt’s an American company. Bullets didn’t exist when the 2nd amendment was created but they’re still protected as arms. Also, basic internet concept, if you do something on someone else’s property, it’s theirs. This site is also a source of training material for ML and has been for a very long time.
- bewaretheirs 1 year ago> Bullets didn’t exist when the 2nd amendment was created
Uh, what? Wikipedia dates the first bullet to the 13th century:
> Fire lance barrels made of metal appeared by 1276.[30] Earlier in 1259 a pellet wad that filled the barrel was recorded to have been used as a fire lance projectile, making it the first recorded bullet in history
- api_or_ipa 1 year agoI think they’re referring to modern cartridge bullets, as opposed to little lead balls stored separately from the powder and whatever else is rammed down the barrel in preparation for the next shot.
Indeed the first integrated cartridges were developed around 1808 https://en.m.wikipedia.org/wiki/Cartridge_(firearms)
- jelder 1 year agoSeems like a good time to stop posting original content on Reddit for free.
- pavel_lishin 1 year agoYeah - I was going to keep my existing stuff up, but I think I'll clear out my account's previous posts and comments.
- cobertos 1 year agoI deleted all the content on my main account. Most of it wasn't useful, but a couple of posts and threads ranked well on Google and I like to think that hurts the site just a little bit.
- jtriangle 1 year agoI used a rust port of shreddit to delete all my posts across all my accounts, such that when logged in, there are no posts on my profile, however, I spent about a month on and off googling my username and finding posts that didn't stay deleted but wouldn't show up on my profile.
So I'd guess that reddit somehow restored some number of posts, and it seems to occasionally continue to do so.
In terms of hurting the site, it absolutely does. Reddit is a ghost town, and I find deleted posts constantly that google thinks are relevant. It's a shame it had to be this way.
- hugi 1 year agoI've been working on deleting my reddit posts over the past year. The site now feels like it's almost 100% bots, which I find more than a little sad.
- al_borland 1 year agoI wrote myself a little python script to do this a while back. I’m not sure that it will still work due to their API changes.
A longer while back I wrote a little JS bookmarklet to do it. It could just do a page at a time, which was annoying, but not too bad. However, when they would change the site, it would stop working and need to be fixed.
Remember to edit the comment before you delete it. From what I read, deleting a comment just sets a flag on the comment as deleted, so it’s still in the DB for them to sell. Making it garbage text will kill the value of the comment in the DB as well, and probably really screw with the AI trying to train from it.
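The edit-then-delete order described above can be sketched roughly as follows. This is only an illustration: `InMemoryClient`, `edit`, `delete`, `garbage_text`, and `scrub_and_delete` are hypothetical names, not Reddit's actual API, and the soft-delete behavior is modeled on the commenter's reading rather than on anything confirmed.

```python
import secrets
import string


class InMemoryClient:
    """Hypothetical stand-in for a real API client; models a soft delete."""

    def __init__(self, comments):
        # comment_id -> {"body": str, "deleted": bool}
        self.store = {
            cid: {"body": body, "deleted": False}
            for cid, body in comments.items()
        }

    def edit(self, comment_id, text):
        self.store[comment_id]["body"] = text

    def delete(self, comment_id):
        # Soft delete: only a flag is set, the stored body is kept.
        self.store[comment_id]["deleted"] = True


def garbage_text(length=40):
    """Return random lowercase letters and spaces of the given length."""
    alphabet = string.ascii_lowercase + " "
    return "".join(secrets.choice(alphabet) for _ in range(length))


def scrub_and_delete(client, comment_ids):
    """Overwrite each comment first, then delete it."""
    for cid in comment_ids:
        client.edit(cid, garbage_text())  # the stored body is now garbage
        client.delete(cid)                # then the deleted flag is set


client = InMemoryClient({"c1": "my original comment", "c2": "another one"})
scrub_and_delete(client, ["c1", "c2"])
```

With a soft delete like the one modeled here, the order matters: whatever body was stored last is what remains after the flag is set.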
- nickthegreek 1 year agoThere are scripts that can do this all for you in an instant.
- aldarisbm 1 year agogood luck on actually "deleting" your data
- FireBeyond 1 year agoThe best that can be done, where possible, is to edit each comment down to whitespace, save that, and then delete it. But yeah, probably still not technically good enough.
- wetpaws 1 year ago[dead]
- goles 1 year agoReuters article cites Bloomberg article here: https://www.bloomberg.com/news/articles/2024-02-16/reddit-is...
https://news.ycombinator.com/item?id=39404051 (15 hours ago, 29 points)
- m0guz 1 year agoI am glad I trashed my reddit posts/comments before deleting accounts with shreddit [0].
- jtriangle 1 year agoI did the same, but, it seems reddit has some capability to restore posts anyway, as I keep finding original posts of mine via google while my logged in profile remains blank...
It's to the point where I search for them every few weeks and take time to edit and delete them manually, after which they seem to stay gone.
My best guess is that they can detect mass deletion and have some sort of automation that restores posts at (seemingly) random. Either that or their platform is broken enough that editing or deleting posts isn't reliably committed to disk.
- m0guz 1 year agoI had tried to manually delete posts/comments before using shreddit; half of the deleted posts/comments returned after refreshing the page. Checking the requests, many of them returned a 500 status code.
Later I moved to shreddit, created a cronjob to delete the entries, and kept shreddit running for a week or so. As you suggested, you will hit Reddit's rate limit soon after you start mass deleting, or your account will be shadow-banned.
Just checked two of my deleted accounts and can't see any posts or comments. I wish I hadn't deleted them and had just overwritten the posts with random sentences from a local AI.
- GRISELDA 1 year agoHacker News users love to act like they are the most intelligent people on the internet but in reality they have no idea what they are talking about if it isn't about some obscure programming language.
There is no conspiracy to restore deleted comments lol. You can only retrieve 1000 items with Reddit's API so when you use Shreddit to delete your posts only 1000 are deleted. Everything else before that remains untouched. Use PullPush to really delete everything.
- jtriangle 1 year agoYou misunderstand, likely because you were too excited to be snarky.
My reddit profile was empty, I'm aware of the api limitations, I read the readme.md, shreddit ran on a loop for 24 hours, each time pulling 1000 posts, editing, and deleting them. My profile is still empty, and currently my google results are empty, but in another few weeks some will pop up again, but still won't show in my reddit profile.
Clear enough for you?
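The loop described above (list up to 1000 items, edit and delete them, repeat until nothing is left) can be sketched like this. `FakeAccount`, `list_items`, `overwrite_and_delete`, and `purge` are hypothetical names used only for illustration, and the assumption that older items become visible once newer ones are removed is taken from the comment above, not verified against Reddit.

```python
LISTING_CAP = 1000  # per-listing cap mentioned in the thread


class FakeAccount:
    """Hypothetical in-memory account used only to exercise the loop."""

    def __init__(self, n_items):
        self.items = [f"item-{i}" for i in range(n_items)]

    def list_items(self):
        # Only the newest LISTING_CAP items are visible at any time.
        return self.items[-LISTING_CAP:]

    def overwrite_and_delete(self, item_id):
        # Stand-in for "edit to garbage, then delete".
        self.items.remove(item_id)


def purge(account, max_passes=100):
    """Repeat list-and-delete passes until the listing comes back empty."""
    passes = 0
    while passes < max_passes:
        batch = account.list_items()
        if not batch:
            break
        for item_id in batch:
            account.overwrite_and_delete(item_id)
        passes += 1
    return passes


account = FakeAccount(2500)
passes = purge(account)
print(passes, len(account.items))  # prints "3 0"
```

A single pass would leave 1500 of the 2500 items untouched, which is the situation the earlier reply warns about; the `max_passes` guard is there so a listing that never empties cannot loop forever.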
- CamelCaseName 1 year ago> worth $60 million on an annualized basis
Reddit is pulling out all the stops for their upcoming IPO and it still amounts to nearly nothing.
Bringing back r/place to juice user count, killing the API, destroying their mobile site, and that's just the start.
That's why they are only planning a "very small float", there's simply no interest.
It seems even at just $5 billion, the valuation is too rich.
- data-ottawa 1 year agoI'm not surprised they made this deal, but $60M feels low for the cost of killing their 3P APIs.
I can only assume the price is because the license is non-exclusive and they think they can get other big fish to bite.
They can offer higher quality and more timely data than scrapers can, so there is a value proposition there.
- ametrau 1 year agoKilling the API was a massive mistake. They should have worked with Apollo et al to broker a deal. They had the best of both worlds: other people tackling the app development for them, and they just needed to take a reasonable rent.
- antisthenes 1 year agoI'm actually curious if Reddit's going to be the first successful example of monetizing what is essentially a community into an IPO fast enough before normies get on it and start flooding it with their low-effort takes and celebrity gossip drivel.
I know this is possible to do by other communities like paid forums (SomethingAwful), or grow into an adjacent e-commerce supported community (e.g. bodybuilding forums), but is it possible to do it on Reddit's scale? We'll see.
I have no horse in the race either way, most of my posts on there are from the Reddit vs Digg era, so I'm not really invested.
- 34679 1 year agoReddit's MO is snark and sarcasm. It's hard for me to imagine a scenario where an LLM trained on reddit would be useful for anything serious. How do they propose to separate fact from fiction?
- pier25 1 year agoThat's just the tip of the iceberg.
There's a reason plenty of people append "reddit" to their google searches.
- data-ottawa 1 year agoThe main reasons are to get local content, or to avoid SEO spam sites which provide overly verbose listicles or "review" sites that link out to Amazon affiliate links.
Reddit is still one of the few sites left that provide user content openly. FB+Instagram+Twitter are entirely inaccessible if you don't have an account, and a lot of forums do things like only show images to logged in users.
I've found the Reddit experience much worse with recent changes. When you land on Reddit from a Google search, comment threads are only 1 level deep with a max of only a few replies shown, so you have to load a new page for each response you want to read. It's one of the worst UXs I've seen, considering that landing from search is probably one of the most financially lucrative use patterns Reddit has.
- fragmede 1 year agowhich is really fascinating in the face of TikTok being Googleable. like, it's still not there, but being able to Google for a TikTok but not an Instagram reel is something.
- anotherhue 1 year agoYou needn't wonder, though it's a little behind the times now.
- AznHisoka 1 year agoI don’t know if it’s just me but Reddit is probably the most toxic, unfriendly community I’ve been in. Anytime I make a comment, it’s immediately downvoted to oblivion if it doesn’t agree with the hive mind, or is unpopular. And if I ask a simple question, it gets downvoted if it’s perceived as “too stupid” or obvious. Maybe it’s just me though
- bjelkeman-again 1 year agoThere are a number of smaller subreddits with good moderation and good discussions. Many of the bigger ones are not good though. IMHO.
- CuriouslyC 1 year agoNo, that's pretty much most reddit subs.
- isthatafact 1 year agoI am really hoping someone makes an AI version of reddit where each user can easily control and adjust the type of posts and interactions they experience.
Just imagine how great reddit would be if not for the other users, the moderators, the admins, and the CEO.
- pavel_lishin 1 year agoThere are a lot of subreddits, and they vary wildly. Something trained on r/askhistorians could spew out a lot of very plausible sounding bullshit.
- tonystubblebine 1 year agoI’m not surprised. There had been some talks among various platforms to form a coalition and the reason we (Medium) thought they broke down was because people were trying to cut individual deals. I think this is overall bad for the internet because it cuts creators out of the decision and compensation.
- cuckatoo 1 year agoThey could've saved themselves $60M by just downloading the torrent of the data.
- bryan0 1 year agoNot if you include expected future legal fees. This also seems to imply if you train on Reddit data without a license Reddit will sue you.
- timeon 1 year agoBut there is a lot of content on Reddit which is not original. Often it is just screenshotted/reposted licensed content. Will this be 'reddit-washing' of original licenses?
- selivanovp 1 year agoReddit became a weird place in recent months. Rampant propaganda wars in most subreddits, and the recommendation system pushing ridiculous stories to the front page feed from subreddits I've never visited before. Even the niche technical subreddits I used to enjoy are regularly becoming a battlefield.
- SleepilyLimping 1 year agoIt's a US election year /tinfoil
- fallingknife 1 year agoAm I the only one who couldn't care less if something I write is used to train an LLM? Much better than the current tracking that is the norm.
- pier25 1 year agoGood time to create a community for artists that doesn't allow AI scraping.
- bjelkeman-again 1 year agoMaybe something like https://www.royalroad.com
- vinni2 1 year agoTime to stop posting stuff on Reddit.
- blindriver 1 year agoThis is exactly my point for many years now. Reddit exists because of the free work of the various subreddit communities and their moderators. And in return they get nothing. But the company, the investors and employees will all become rich off of it. It’s weird how no one cares that Reddit isn’t trickling down anything to the people that make the site a success.
- chollida1 1 year agoEvery year, I pay to play hockey. I pay for new equipment, jerseys, referees and ice time.
And the arena gets paid by selling food to people who come to my games, advertising on the boards, and beer to fans.
What do I get and why is none of this money trickling down to me?
Clearly I get the enjoyment of being Canadian and playing hockey.
Redditors are no different. None of them were tricked, they participate on the site by writing comments or submitting articles because they get enjoyment out of it.
I don't expect to see a dime from the arena when I play hockey and they make money from it, and no redditor expects to see a dime when they participate on the site and the site makes money from it.
If you think it's weird that a for-profit company is making money and no one is complaining, it's because everyone went into the deal knowing exactly what's what. No one was tricked or deceived.
- mandmandam 1 year ago> None of them were tricked
They were tricked from day one, when the founders pretended to be different people to make the site look busier than it really was.
They are tricked every day by bots, troll farms, spammers, astroturfers, bought-out moderators, corrupt admins, etc.
Go look at the founders page, where Aaron Swartz used to be.
Look into who maxwellhill probably was (first Redditor to a million karma and mod of some huge and deranged subs like worldnews).
Look into how certain keywords get shadowbanned.
Look into the mod and admin cabals with their private agendas.
Look into the way many national subs were taken over in quiet coups.
There are nice things about Reddit, even today, but the idea that users know what they're getting into is deeply naive.
- chollida1 1 year agoI mean, those are all things that happened on reddit but that has nothing to do with what I said.
I was very specifically talking about users not being tricked into thinking they were going to be paid for posting content on reddit.
Your comment, as far as I can tell, has nothing at all to do with that.
Did you mean to reply to a different comment?
- FireBeyond 1 year agoAgreeing to write content for free to participate in a community is one thing.
How many Reddit users knew they were agreeing in the future for Reddit to sell the content they wrote to make money for Reddit, not them?
> no redditor expects to see a dime when they participate on the site and the site makes money from it.
Plenty of Redditors would disagree with you, and I'm not sure why you're acting like this is obvious. If I hadn't already deleted all my content and left because of the last debacle, I would be doing so for this.
- yanderekko 1 year ago>How many Reddit users knew they were agreeing in the future for Reddit to sell the content they wrote to make money for Reddit, not them?
Despite my low opinion of Redditors, I believe that on some level they are aware of the principle that if the product is free, then you are the product.
If you presented the regular users with the choice between "pay a subscription fee and opt out or let us use your data in these ways", the vast majority will end up choosing the latter and we all know it.
- chollida1 1 year ago>> no redditor expects to see a dime when they participate on the site and the site makes money from it.
> Plenty of Redditors would disagree with you, and I'm not sure why you're acting like this is obvious. If I hadn't already deleted all my content and left because of the last debacle, I would be doing so for this.
Really?
Ok, let's say reddit only sold ads for revenue.
What percentage of redditors do you think would feel entitled to a percentage of reddit's ad revenue? Because the only reason the ads have value is because of the redditors themselves?
- rglullis 1 year ago> If you think it's weird that a for-profit company is making money and no one is complaining, it's because everyone went into the deal knowing exactly what's what. No one was tricked or deceived.
The deal never included them appropriating the community content and selling it as their own. The deal was that they could try to make money from user-generated content, and in turn they would keep it as open as possible: an easy to use API, third-party clients, RSS feeds for every subreddit and even posts and comments, etc.
They changed the deal. People are right to be upset with the new terms.
- stainablesteel 1 year agothe social ostracization is particularly strong on reddit, any revenue sharing would immediately come into conflict with the intelligence agencies that run their psyops there
and it's even worse because it's basically the best formatted social media with the worst demographics now, aka most potential and worst execution, so it's a dataset of decreasing size compared to other social media now
when it comes to training it's hypothetically a particularly great dataset because you can choose to include or exclude text topics as input based on subreddit or thread, it's so well organized
- nutate 1 year agoAnd to celebrate the free for free internet of the past, google is finally finishing its acquisition of dejanews by shutting down their usenet indexing. https://support.google.com/groups/answer/11036538?visit_id=6...
- awb 1 year agoMods/users get access to a massive user base and a world-class platform at no acquisition or operational cost to them.
- tuwtuwtuwtuw 1 year agoI don't see how that is weird at all. Were you paid by ycombinator for your message? People often do things they enjoy doing without receiving a payment for it.
- collingreen 1 year agoI think this analogy is flawed - the hn equivalent of posting is the same as Reddit and then it changes and gets weird if hn starts selling your writing.
I'm not sure why folks are trying to say that going to Reddit and doing activity where the price of admission is ads is the same as doing activity where the price of admission is they own your writing and sell it. You may be fine with it but they seem clearly distinct to me -- enough to be worth talking about instead of dismissing.
- tuwtuwtuwtuw 1 year agoWhat is stopping Hackernews from selling the content people post here?
- s1k3s 1 year agoYou're saying it as if someone from Reddit came into your house and forced you to become a moderator or user who submits content. Is that true? Of course not, people are willingly creating communities and willingly submit content to the site.
Reddit exists because millions of people like it. Reddit also exists because hundreds of developers created it while other people are paying for its infrastructure.