Reddit signs $60M content licensing deal with AI company

74 points by tonystubblebine 1 year ago | 79 comments
  • leobg 1 year ago
    How can they license something that they didn’t author? Yes, they have TOS. But training generative AI wasn’t something that existed when ~99% of Reddit’s content was created, hence users could not possibly have consented to it. Besides, at least in Germany, TOS cannot contain regulations that are “surprising” or “unexpected”. Using my content to serve ads is one thing I might expect. But licensing it out for a fee to third parties? I don’t think so.
    • SAI_Peregrinus 1 year ago
      > When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

      From the oldest version of their ToS[1]. This is unchanged in the newest versions even for the EEA[2]. It seems pretty clear that whatever AI training does is covered by "use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display" in "media formats and channels now known or later developed anywhere in the world" (emphasis mine).

      [1] https://www.redditinc.com/policies/user-agreement-october-15...

      [2] https://www.redditinc.com/policies/user-agreement-february-1...

      • ySteeK 1 year ago
        At least in Germany, such agreements are afaik invalid and without a severability clause, possibly all others too. Simply because something like copyright cannot be assigned in Germany. Secondly, there are ways to use Reddit without ever having agreed to the ToS.
        • SAI_Peregrinus 1 year ago
          That clause does not assign copyright. You explicitly keep your own copyright (per the previous clause, which I didn't reproduce above). You just grant them a license to use your content in the ways they listed.
          • fragmede 1 year ago
            how do you make a comment without creating an account which requires you to agree to the tos?
            • miohtama 1 year ago
              This is (hopefully) the major difference between web 2.0 and Web3. In the latter, the goal is to build services where you actually own your content.

              It remains to be seen whether this can actually happen.

            • neom 1 year ago
              People have actually been doing stuff like this since way before the LLM thing. I've bought books containing collections of stories from websites.

              Craigslist Confessional: A Collection of Secrets from Anonymous Strangers https://www.amazon.com/Craigslist-Confessional-Collection-Se...

              PostSecret: Extraordinary Confessions from Ordinary Lives https://www.amazon.com/PostSecret-Extraordinary-Confessions-...

              Stoned, Naked, and Looking in My Neighbor's Window: The Best Confessions from GroupHug.us - https://www.amazon.com/Stoned-Naked-Looking-Neighbors-Window... (actually a great book)

              People can and will profit from things you do in life for free, I feel like we accepted that a very long time ago?

              • nroets 1 year ago
                Perhaps it isn't legally a license deal, but rather unlimited scraping i.e. database access.

                The AI company just trains its models on that and isn't creating a derivative work in the legal sense.

                • FireBeyond 1 year ago
                  That's not going to fly in any court.

                  "We didn't license it to them for the express purpose of training their model on this data, we only gave them database access for the express purpose of training their model on this data."

                • AznHisoka 1 year ago
                  FWIW, they have already been licensing their data for years to social media management platforms (e.g., SproutSocial, Sprinklr).
                  • PM_me_your_math 1 year ago
                    Because it is reddit, one of the most vile companies on the planet.
                    • andrewinardeer 1 year ago
                      Says the guy with the Reddit meme username on Hacker News.
                    • rakoo 1 year ago
                      > How can they license something that they didn’t author?

                      Capitalism in a nutshell

                      • mtillman 1 year ago
                        It’s an American company. Bullets didn’t exist when the 2nd amendment was created but they’re still protected as arms. Also, basic internet concept, if you do something on someone else’s property, it’s theirs. This site is also a source of training material for ML and has been for a very long time.
                        • bewaretheirs 1 year ago
                          > Bullets didn’t exist when the 2nd amendment was created

                          Uh, what? Wikipedia dates the first bullet to the 13th century:

                          > Fire lance barrels made of metal appeared by 1276.[30] Earlier in 1259 a pellet wad that filled the barrel was recorded to have been used as a fire lance projectile, making it the first recorded bullet in history

                          https://en.wikipedia.org/wiki/Gun#Transition_to_true_guns

                          • api_or_ipa 1 year ago
                            I think they’re referring to modern cartridge bullets, as opposed to little lead balls stored separately from the powder and whatever else is rammed down the barrel in preparation for the next shot.

                            Indeed the first integrated cartridges were developed around 1808 https://en.m.wikipedia.org/wiki/Cartridge_(firearms)

                      • jelder 1 year ago
                        Seems like a good time to stop posting original content on Reddit for free.
                        • pavel_lishin 1 year ago
                          Yeah - I was going to keep my existing stuff up, but I think I'll clear out my account's previous posts and comments.
                          • cobertos 1 year ago
                            I deleted all the content on my main account. Most of it wasn't useful, but a couple of posts and threads ranked well on Google and I like to think that hurts the site just a little bit.
                            • jtriangle 1 year ago
                              I used a Rust port of shreddit to delete all my posts across all my accounts, so that when I'm logged in there are no posts on my profile. However, I spent about a month, on and off, googling my username and finding posts that didn't stay deleted but wouldn't show up on my profile.

                              So I'd guess that reddit somehow restored some number of posts, and it seems to occasionally continue to do so.

                              In terms of hurting the site, it absolutely does. Reddit is a ghost town, and I find deleted posts constantly that google thinks are relevant. It's a shame it had to be this way.

                            • hugi 1 year ago
                              I've been working on deleting my reddit posts over the past year. The site now feels like it's almost 100% bots, which I find more than a little sad.
                              • al_borland 1 year ago
                                I wrote myself a little Python script to do this a while back. I’m not sure that it will still work due to their API changes.

                                A longer while back I wrote a little JS bookmarklet to do it. It could just do a page at a time, which was annoying, but not too bad. However, when they would change the site, it would stop working and need to be fixed.

                                Remember to edit the comment before you delete it. From what I read, deleting a comment just flags it as deleted, so it’s still in the DB for them to sell. Making it garbage text will kill the value of the comment in the DB as well, and probably really screw with the AI trying to train from it.
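
A minimal sketch of the edit-then-delete order described above. Everything here is hypothetical and for illustration only: `FakeCommentStore` is an in-memory stand-in for a database that soft-deletes rows, not Reddit's actual API or storage, and `scrub_and_delete` and `garbage_text` are made-up helper names.

```python
import random
import string


class FakeCommentStore:
    """Hypothetical in-memory stand-in for a database that soft-deletes rows."""

    def __init__(self):
        self.rows = {}  # comment_id -> {"body": str, "deleted": bool}

    def add(self, comment_id, body):
        self.rows[comment_id] = {"body": body, "deleted": False}

    def edit(self, comment_id, new_body):
        self.rows[comment_id]["body"] = new_body

    def delete(self, comment_id):
        # Soft delete: the row stays in the store, only a flag changes.
        self.rows[comment_id]["deleted"] = True


def garbage_text(length=40):
    """Return random lowercase letters and spaces to overwrite a comment with."""
    alphabet = string.ascii_lowercase + " "
    return "".join(random.choice(alphabet) for _ in range(length))


def scrub_and_delete(store, comment_id):
    """Overwrite the body first, then delete, so the retained row holds garbage."""
    store.edit(comment_id, garbage_text())
    store.delete(comment_id)


store = FakeCommentStore()
store.add("c1", "my original comment")
scrub_and_delete(store, "c1")
```

With a plain delete the original text would still sit in `store.rows`; with the edit first, only the garbage text remains behind the deleted flag. Whether a real service also keeps edit history is a separate question this sketch does not address.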

                                • nickthegreek 1 year ago
                                  There are scripts that can do this all for you in an instant.
                                • aldarisbm 1 year ago
                                  good luck on actually "deleting" your data
                                  • FireBeyond 1 year ago
                                    The best that can be done, where possible, is to edit each comment down to whitespace, save that, and then delete it. But yeah, probably still not technically good enough.
                                    • m0guz 1 year ago
                                      I am glad I trashed my reddit posts/comments before deleting accounts with shreddit [0].

                                      [0] https://github.com/andrewbanchich/shreddit

                                      • jtriangle 1 year ago
                                        I did the same, but, it seems reddit has some capability to restore posts anyway, as I keep finding original posts of mine via google while my logged in profile remains blank...

                                        It's to the point where I search for them every few weeks and take time to edit and delete them manually, after which they seem to stay gone.

                                        My best guess is that they can detect mass deletion and have some sort of automation that restores posts at (seemingly) random. Either that or their platform is broken enough that editing or deleting posts isn't reliably committed to disk.

                                        • m0guz 1 year ago
                                          I had tried manually deleting posts/comments before shreddit; half of the deleted posts/comments returned after refreshing the page. Checking the requests, many of them returned a 500 status code.

                                          Later I moved to shreddit, created a cron job to delete the entries, and kept shreddit running for a week or so. As you suggested, you will hit Reddit's rate limit soon after you start mass deleting, or your account will get shadow-banned.

                                          I just checked two of my deleted accounts and can't see any posts or comments. I wish I hadn't deleted them and had just overwritten the content with random sentences from a local AI.

                                            • GRISELDA 1 year ago
                                              Hacker News users love to act like they are the most intelligent people on the internet but in reality they have no idea what they are talking about if it isn't about some obscure programming language.

                                              There is no conspiracy to restore deleted comments lol. You can only retrieve 1000 items with Reddit's API so when you use Shreddit to delete your posts only 1000 are deleted. Everything else before that remains untouched. Use PullPush to really delete everything.

                                              • jtriangle 1 year ago
                                                You misunderstand, likely because you were too excited to be snarky.

                                                My Reddit profile was empty. I'm aware of the API limitations; I read the readme.md. shreddit ran on a loop for 24 hours, each time pulling 1000 posts, editing them, and deleting them. My profile is still empty, and currently my Google results are empty, but in another few weeks some will pop up again and still won't show in my Reddit profile.

                                                Clear enough for you?
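
The loop described in this comment (pull up to 1000 items, delete them, repeat until nothing is left) can be sketched roughly as follows. This is an illustration under assumptions: `fetch_visible` is a hypothetical stand-in for a capped listing call, the data is an in-memory dict, and the cap of 1000 is simply the figure mentioned in the thread. It is not how shreddit is actually implemented.

```python
LISTING_CAP = 1000  # per-listing limit as mentioned in the thread (assumption)


def fetch_visible(items, cap=LISTING_CAP):
    """Hypothetical listing call: return at most `cap` ids not yet deleted."""
    return [item_id for item_id, deleted in items.items() if not deleted][:cap]


def delete_all(items, cap=LISTING_CAP):
    """Repeat fetch-and-delete passes until a listing comes back empty.

    Returns the number of passes that deleted at least one item.
    """
    passes = 0
    while True:
        batch = fetch_visible(items, cap)
        if not batch:
            return passes
        for item_id in batch:
            items[item_id] = True  # mark as deleted
        passes += 1


# 2500 fake posts need three passes: 1000 + 1000 + 500.
posts = {f"post{i}": False for i in range(2500)}
passes = delete_all(posts)
```

A single pass would leave 1500 of these posts untouched, which is the failure mode the parent comment describes; looping until the listing is empty avoids it.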

                                            • CamelCaseName 1 year ago
                                              > worth $60 million on an annualized basis

                                              Reddit is pulling out all the stops for their upcoming IPO and it still amounts to nearly nothing.

                                              Bringing back r/place to juice user count, killing the API, destroying their mobile site, and that's just the start.

                                              That's why they are only planning a "very small float", there's simply no interest.

                                              It seems even at just $5 billion, the valuation is too rich.

                                              • data-ottawa 1 year ago
                                                I'm not surprised they made this deal, but $60M feels low for the cost of killing their 3P APIs.

                                                I can only assume the price is because the license is non-exclusive and they think they can get other big fish to bite.

                                                They can offer higher quality and more timely data than scrapers can, so there is a value proposition there.

                                                • ametrau 1 year ago
                                                  Killing the API was a massive mistake. They should have worked with Apollo et al. to broker a deal. They had the best of both worlds: other people tackling the app development for them, and they just needed to take a reasonable rent.
                                                  • antisthenes 1 year ago
                                                    I'm actually curious if Reddit's going to be the first successful example of monetizing what is essentially a community into an IPO fast enough before normies get on it and start flooding it with their low-effort takes and celebrity gossip drivel.

                                                    I know this is possible to do by other communities like paid forums (SomethingAwful), or grow into an adjacent e-commerce supported community (e.g. bodybuilding forums), but is it possible to do it on Reddit's scale? We'll see.

                                                    I have no horse in the race either way, most of my posts on there are from the Reddit vs Digg era, so I'm not really invested.

                                                  • 34679 1 year ago
                                                    Reddit's MO is snark and sarcasm. It's hard for me to imagine a scenario where an LLM trained on Reddit would be useful for anything serious. How do they propose to separate fact from fiction?
                                                    • pier25 1 year ago
                                                      That's just the tip of the iceberg.

                                                      There's a reason plenty of people append "reddit" to their google searches.

                                                      • data-ottawa 1 year ago
                                                        The main reasons are to get local content, or to avoid SEO spam sites which provide overly verbose listicles or "review" sites that link out to Amazon affiliate links.

                                                        Reddit is still one of the few sites left that provide user content openly. FB+Instagram+Twitter are entirely inaccessible if you don't have an account, and a lot of forums do things like only show images to logged in users.

                                                        I've found the Reddit experience much worse with recent changes. When you land on Reddit from a Google search, comment threads are only one level deep with a max of only a few replies shown, so you have to load a new page for each response you want to read. It's one of the worst UX decisions I've seen, considering that landing from search is probably the most financially lucrative usage pattern Reddit has.

                                                        • fragmede 1 year ago
                                                          which is really fascinating in the face of TikTok being Googleable. like, it's still not there, but being able to Google for a TikTok but not an Instagram reel is something.
                                                      • anotherhue 1 year ago
                                                        You needn't wonder, though it's a little behind the times now.

                                                        https://www.reddit.com/r/SubSimulatorGPT2/

                                                        • AznHisoka 1 year ago
                                                          I don’t know if it’s just me but Reddit is probably the most toxic, unfriendly community I’ve been in. Anytime I make a comment, it’s immediately downvoted to oblivion if it doesn’t agree with the hive mind, or is unpopular. And if I ask a simple question, it gets downvoted if it’s perceived as “too stupid” or obvious. Maybe it’s just me though
                                                          • bjelkeman-again 1 year ago
                                                            There are a number of smaller subreddits with good moderation and good discussions. Many of the bigger ones are not good, though. IMHO.
                                                            • CuriouslyC 1 year ago
                                                              No, that's pretty much most reddit subs.
                                                            • isthatafact 1 year ago
                                                              I am really hoping someone makes an AI version of reddit where each user can easily control and adjust the type of posts and interactions they experience.

                                                              Just imagine how great reddit would be if not for the other users, the moderators, the admins, and the CEO.

                                                              • pavel_lishin 1 year ago
                                                                There are a lot of subreddits, and they vary wildly. Something trained on r/askhistorians could spew out a lot of very plausible sounding bullshit.
                                                              • tonystubblebine 1 year ago
                                                                I’m not surprised. There had been some talks among various platforms to form a coalition and the reason we (Medium) thought they broke down was because people were trying to cut individual deals. I think this is overall bad for the internet because it cuts creators out of the decision and compensation.
                                                                • cuckatoo 1 year ago
                                                                  They could've saved themselves $60M by just downloading the torrent of the data.
                                                                  • bryan0 1 year ago
                                                                    Not if you include expected future legal fees. This also seems to imply if you train on Reddit data without a license Reddit will sue you.
                                                                    • timeon 1 year ago
                                                                      But there is a lot of content on Reddit which is not original, often just screenshotted or reposted licensed content. Will this be 'reddit-washing' of the original licenses?
                                                                  • selivanovp 1 year ago
                                                                    Reddit became a weird place in recent months: rampant propaganda wars in most subreddits, and a recommendation system pushing ridiculous stories to my front page feed from subreddits I've never visited before. Even the niche technical subreddits I used to enjoy are regularly becoming battlefields.
                                                                    • fallingknife 1 year ago
                                                                      Am I the only one who couldn't care less if something I write is used to train an LLM? Much better than the current tracking that is the norm.
                                                                      • pier25 1 year ago
                                                                        Good time to create a community for artists that doesn't allow AI scraping.
                                                                      • vinni2 1 year ago
                                                                        Time to stop posting stuff on Reddit.
                                                                        • blindriver 1 year ago
                                                                          This is exactly my point for many years now. Reddit exists because of the free work of the various subreddit communities and their moderators. And in return they get nothing. But the company, the investors and employees will all become rich off of it. It’s weird how no one cares that Reddit isn’t trickling down anything to the people that make the site a success.
                                                                          • chollida1 1 year ago
                                                                            Every year, I pay to play hockey. I pay for new equipment, jerseys, referees and ice time.

                                                                            And the arena gets paid by selling food to people who come to my games, advertising on the boards, and beer to fans.

                                                                            What do I get and why is none of this money trickling down to me?

                                                                            Clearly I get the enjoyment of being Canadian and playing hockey.

                                                                            Redditors are no different. None of them were tricked; they participate on the site by writing comments or submitting articles because they get enjoyment out of it.

                                                                            I don't expect to see a dime from the arena when I play hockey and they make money from it, and no redditor expects to see a dime when they participate on the site and the site makes money from it.

                                                                            If you think it's weird that a for-profit company is making money and no one is complaining, it's because everyone went into the deal knowing exactly what's what. No one was tricked or deceived.

                                                                            • mandmandam 1 year ago
                                                                              > None of them were tricked

                                                                              They were tricked from day one, when the founders pretended to be different people to make the site look busier than it really was.

                                                                              They are tricked every day by bots, troll farms, spammers, astroturfers, bought-out moderators, corrupt admins, etc.

                                                                              Go look at the founders page, where Aaron Swartz used to be.

                                                                              Look into who maxwellhill probably was (first Redditor to a million karma and mod of some huge and deranged subs like worldnews).

                                                                              Look into how certain keywords get shadowbanned.

                                                                              Look into the mod and admin cabals with their private agendas.

                                                                              Look into the way many national subs were taken over in quiet coups.

                                                                              There are nice things about Reddit, even today, but the idea that users know what they're getting into is deeply naive.

                                                                              • chollida1 1 year ago
                                                                                I mean, those are all things that happened on reddit but that has nothing to do with what I said.

                                                                                I was very specifically talking about users not being tricked into thinking they were going to be paid for posting content on reddit.

                                                                                Your comment, as far as I can tell, has nothing at all to do with that.

                                                                                Did you mean to reply to a different comment?

                                                                              • FireBeyond 1 year ago
                                                                                Agreeing to write content for free to participate in a community is one thing.

                                                                                How many Reddit users knew they were agreeing in the future for Reddit to sell the content they wrote to make money for Reddit, not them?

                                                                                > no redditor expects to see a dime when they participate on the site and the site makes money from it.

                                                                                Plenty of Redditors would disagree with you, and I'm not sure why you're acting like this is obvious. If I hadn't already deleted all my content and left because of the last debacle, I would be doing so for this.

                                                                                • yanderekko 1 year ago
                                                                                  >How many Reddit users knew they were agreeing in the future for Reddit to sell the content they wrote to make money for Reddit, not them?

                                                                                  Despite my low opinion of Redditors, I believe that on some level they are aware of the principle that if the product is free, then you are the product.

                                                                                  If you presented the regular users with the choice between "pay a subscription fee and opt out or let us use your data in these ways", the vast majority will end up choosing the latter and we all know it.

                                                                                  • chollida1 1 year ago
                                                                                    >> no redditor expects to see a dime when they participate on the site and the site makes money from it.

                                                                                    > Plenty of Redditors would disagree with you, and I'm not sure why you're acting like this is obvious. If I hadn't already deleted all my content and left because of the last debacle, I would be doing so for this.

                                                                                    Really?

                                                                                    Ok, let's say reddit only sold ads for revenue.

                                                                                    What percentage of redditors do you think would feel entitled to a percentage of reddit's ad revenue, given that the only reason the ads have value is the redditors themselves?

                                                                                  • rglullis 1 year ago
                                                                                    > If you think it's weird that a for-profit company is making money and no one is complaining, it's because everyone went into the deal knowing exactly what's what. No one was tricked or deceived.

                                                                                    The deal never included them appropriating the community content and selling it as their own. The deal was that they could try to make money from user-generated content, and in turn they would keep it as open as possible: an easy-to-use API, third-party clients, RSS feeds for every subreddit and even posts and comments, etc.

                                                                                    They changed the deal. People are right to be upset with the new terms.

                                                                                  • stainablesteel 1 year ago
                                                                                    the social ostracization is particularly strong on reddit, any revenue sharing would immediately come into a conflict with the intelligence agencies that run their psyops there

                                                                                    and it's even worse because it's basically the best formatted social media with the worst demographics now, aka most potential and worst execution, so it's a dataset of decreasing size compared to other social media now

                                                                                    when it comes to training it's hypothetically a particularly great dataset because you can choose to include or exclude text topics as input based on subreddit or thread, it's so well organized

                                                                                    • nutate 1 year ago
                                                                                      And to celebrate the free for free internet of the past, google is finally finishing its acquisition of dejanews by shutting down their usenet indexing. https://support.google.com/groups/answer/11036538?visit_id=6...
                                                                                      • awb 1 year ago
                                                                                        Mods/users get access to a massive user base and a world-class platform at no acquisition or operational cost to them.
                                                                                          • tuwtuwtuwtuw 1 year ago
                                                                                            I don't see how that is weird at all. Were you paid by ycombinator for your message? People often do things they enjoy doing without receiving a payment for it.
                                                                                            • collingreen 1 year ago
                                                                                              I think this analogy is flawed - the hn equivalent of posting is the same as Reddit and then it changes and gets weird if hn starts selling your writing.

                                                                                              I'm not sure why folks are trying to say that going to Reddit and doing activity where the price of admission is ads is the same as doing activity where the price of admission is they own your writing and sell it. You may be fine with it but they seem clearly distinct to me -- enough to be worth talking about instead of dismissing.

                                                                                              • tuwtuwtuwtuw 1 year ago
                                                                                                What is stopping Hackernews from selling the content people post here?
                                                                                              • s1k3s 1 year ago
                                                                                                You're saying it as if someone from Reddit came into your house and forced you to become a moderator or user who submits content. Is that true? Of course not, people are willingly creating communities and willingly submit content to the site.

                                                                                                Reddit exists because millions of people like it. Reddit also exists because hundreds of developers created it while other people are paying for its infrastructure.