SQL co-creator embraces NoSQL

39 points by damethos 1 year ago | 53 comments
  • hn_throwaway_99 1 year ago
    There was an article from over a decade ago arguing that, for many developers, "NoSQL" was really about "No DBA": https://martinfowler.com/bliki/NoDBA.html.

    That is, in many companies, DB schema changes require a painful, slow, "multiple approvals required" process. But devs found out that DB admins didn't really care about NoSQL data stores, for all the reasons Martin Fowler talks about in that article. So they'd bring in NoSQL data stores specifically to hack around slow internal processes.

    I definitely found that to be the case at at least one previous company I worked at. These days, I can certainly understand the rationale for using a transient caching layer like Redis, but Postgres with JSON columns is going to be a better choice 95% of the time vs. what people used to use, say, MongoDB for.
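    As a hedged sketch of that "Postgres with JSON columns" pattern (using Python's stdlib sqlite3 and SQLite's built-in JSON functions as a stand-in for Postgres jsonb; the table and field names are invented):

```python
import sqlite3

# Store semi-structured documents in an ordinary relational table and
# query inside the JSON. SQLite's json_extract stands in for the
# Postgres jsonb ->> operator here; names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [
        ('{"user": "alice", "action": "login"}',),
        ('{"user": "bob", "action": "logout"}',),
    ],
)

# Query a field inside the document; no up-front schema for the payload.
rows = conn.execute(
    "SELECT json_extract(payload, '$.user') FROM events "
    "WHERE json_extract(payload, '$.action') = 'login'"
).fetchall()
```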

    • chaps 1 year ago
      Indeed. Having worked in change management for a time and having had to use change management software at larger companies... fuck. It's all so bad. Slow and buggy, because each time you fill out a field, it has to validate it before you can even move on to the next step (don't hit tab before the validation!). And made worse by single-page-webapp tabs that wipe your changes from the previous tab if you don't speak the correct incantation. It's all like that!

      DBA changes were always the worst, because they required basically every team to fill out one of those forms. So DBAs basically turn into de facto change managers. Not sure I blame them for trying to find ways to get around all that nonsense.

      • chuckadams 1 year ago
        Vaguely apropos: I used to have a T-shirt that listed all 9 OSI layers: Physical, Data Link, Network, Transport, Session, Presentation, Application, Financial, and Political.
      • pull_my_finger 1 year ago
        Some important context:

        > A long-time IBMer, Chamberlin is now semi-retired, but finds time to fulfill a role as a technical advisor for NoSQL company Couchbase.

        • yodon 1 year ago
          I suspect your comment is meant to imply his role at a NoSQL company invalidates his opinion.

          With a resume and history like his, he can get a role as a technical advisor pretty much anywhere he wants at any time.

          The fact that he chose a NoSQL firm is an extremely strong vote of confidence in NoSQL.

          • candiddevmike 1 year ago
            Vote of confidence in NoSQL to do what? There will never be a "winner" between NoSQL, SQL, NewSQL, etc, because they each fulfill a different purpose and have different trade offs.
            • alex_lav 1 year ago
              "Invalidates" certainly seems like a stretch, but with all opinion-posts I definitely think it's important to understand a person's biases and incentives.
              • hu3 1 year ago
                It's important to disclose potential bias.

                Life's full of nuances.

                • tanelpoder 1 year ago
                  Or the fact that Couchbase wants to get even deeper into building a SQL layer (SQL++) on top of their datastore.
              • sverhagen 1 year ago
                While the startup I worked at a few years back was decidedly dabbling in the NoSQL space, using Elasticsearch, OrientDB, and Neo4j, the rockstar DBA we hired to help our crowd of mostly Java folks was very convincing that everything those newcomers could do, SQL could do perfectly well too. There are many optimizations those Java devs couldn't dream of, and then there are recursive SQL queries, or JSON columns, to address the specific features that NoSQL or graph databases are supposed to be good at. I think there's a place for both, but it certainly has made me more critical of blunt assertions that SQL's time is over. Meanwhile, investors have stopped asking for NoSQL (and for blockchains, too), and NoSQL databases probably aren't much of a "resume builder" anymore either. So, can we go back to using the best database for the job? (In most of my cases... PostgreSQL.)
                • dghlsakjg 1 year ago
                  > Donald Chamberlin, who first proposed the language with IBM colleague Raymond Boyce in a 1974 paper [PDF], explains that NoSQL databases and their query languages could help perform the tasks relational systems were never designed for.

                  So the news is that a database expert, in the employ of a noSQL database company, is saying not to use SQL databases for the things they aren't made for?

                  • andoando 1 year ago
                    I still don't understand the advantage of NoSQL over just defining a JSON column and additionally defining columns for the fields you want to query. Postgres lets you do queries over json documents too iirc.
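                    Concretely, that hybrid looks something like the sketch below. It uses Python's stdlib sqlite3, and all names are invented; in Postgres you could let a generated column or an expression index on the jsonb column do the promotion for you.

```python
import json
import sqlite3

# Keep the whole document in a JSON column, and additionally maintain a
# plain column for the field you want to index and query. The schema and
# field names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs ("
    " id INTEGER PRIMARY KEY,"
    " body TEXT,"    # the full JSON document
    " author TEXT)"  # promoted copy of body's author field, kept for indexing
)
conn.execute("CREATE INDEX docs_author_idx ON docs(author)")

doc = {"author": "boyce", "title": "SEQUEL paper"}
conn.execute(
    "INSERT INTO docs (body, author) VALUES (?, ?)",
    (json.dumps(doc), doc["author"]),
)

# filter on the indexed column, read anything else out of the document
row = conn.execute("SELECT body FROM docs WHERE author = 'boyce'").fetchone()
title = json.loads(row[0])["title"]
```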
                    • jchanimal 1 year ago
                      In the same way SQL encourages you to normalize your data for a DRY representation, NoSQL encourages you to denormalize for fast retrieval. I wrote about this more than a decade ago, and the basics haven't changed: https://web.archive.org/web/20160312130429/http://blog.couch...

                      [Note] I'm a cofounder of Couchbase, my new project is a realtime embedded JavaScript database with local-first sync: https://fireproof.storage

                      • SJC_Hacker 1 year ago
                        I guess materialized views are the middle ground here.

                        I had a DB with like 30 joins and it was dog slow. Granted, it was SQLite. While SQLite doesn't have MATERIALIZED VIEW, what I did was CREATE TABLE AS, in effect denormalizing my heavily normalized DB.
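                        A minimal sketch of that CREATE TABLE AS trick, using Python's bundled SQLite (the schema and numbers are invented): the expensive join is computed once and snapshotted into a plain table, which is what a materialized view would maintain for you.

```python
import sqlite3

# Denormalize a normalized schema by snapshotting a join into a table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'alice');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 75.0);

    -- SQLite has no MATERIALIZED VIEW, so snapshot the join result
    CREATE TABLE order_report AS
        SELECT c.name, SUM(o.total) AS total_spent
        FROM orders o JOIN customers c ON c.id = o.customer_id
        GROUP BY c.name;
""")
report = conn.execute("SELECT name, total_spent FROM order_report").fetchall()
```

                        Unlike a real materialized view, the snapshot has to be rebuilt by hand whenever the base tables change.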

                        • zepolen 1 year ago
                          30 joins, immediately I'm thinking your schema was garbage.

                          But assuming it wasn't, and you actually needed to join across thirty(!) tables (did I mention that's insane, btw), it still shouldn't be a problem, assuming you had the correct indexes and were querying in a sane manner.

                        • daelon 1 year ago
                          Just a heads-up, your website title/headers are way too big on mobile and it's kinda breaking the layout.
                          • jchanimal 1 year ago
                            Thanks! We recently started working with a designer; I can't wait to share the new homepage soon.
                          • andoando 1 year ago
                            Nothing stops you from denormalizing in SQL, though; you have the power to do both. The NoSQL paradigm has its merits, it just seems like it can be applied in SQL with far fewer limitations.
                          • Thaxll 1 year ago
                            JSON in PG is vastly inferior to MongoDB, both in performance and in query friendliness from code.

                            Something that most people don't know, any time you add a new field in a jsonb doc, PG has to rewrite the entire document.

                            • wild_egg 1 year ago
                              Does MongoDB still throw safety out the window to win benchmarks?

                              I'll give you that the postgres query operators for JSON can be cumbersome, depending on your library support, but JSONB performance has been rock solid for a long time now and I've never had reason to fear my data suddenly vanishing

                              Maybe mongo reliability has improved in the last few years but they did enough reputational damage in the decade prior that I'll never have trust in systems built on top of it

                              • simonw 1 year ago
                                I would have thought that the fact that the entire jsonb value is rewritten if you add a new field to it would be clear to anyone who gave it a few seconds of thought. How else could that work?

                                If you don't want to pay the cost of updating the full JSON value, that's what columns are for.

                                • vlovich123 1 year ago
                                  There are obviously ways you could accomplish it. RocksDB’s Merge operator uses this as one of the canonical examples: your update records just the new value through a merge record. When you go to retrieve, you have to keep going through delta updates to reconstruct what the value is now, and you only rewrite the value as part of a large compaction process. This way the cost of the write is the cost of the delta, the read cost is impacted (although in practice negligibly if you use a row cache), and the cost of rewriting the document is amortized into a much larger operation.
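                                  A toy in-memory sketch of that merge-operator idea (the function names are illustrative, not RocksDB's actual API): writes append small deltas, reads fold the deltas over the last full value, and a compaction pass rewrites the document once, amortizing the cost.

```python
# Append-only log of (key, ("base" | "delta", payload)) records.
log = []

def put_base(key, value):
    """Write a full copy of the document."""
    log.append((key, ("base", dict(value))))

def merge(key, delta):
    """Cheap write: record only the changed fields."""
    log.append((key, ("delta", dict(delta))))

def get(key):
    """Read path: replay base + deltas to reconstruct the current value."""
    value = {}
    for k, (kind, payload) in log:
        if k != key:
            continue
        if kind == "base":
            value = dict(payload)
        else:
            value.update(payload)
    return value

def compact(key):
    """Collapse a key's delta chain into a single base record."""
    current = get(key)
    log[:] = [entry for entry in log if entry[0] != key]
    put_base(key, current)
```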
                                  • koolba 1 year ago
                                    > If you don't want to pay the cost of updating the full JSON value, that's what columns are for.

                                    You pay the full cost of writing the entire new row in PostgreSQL if only a single column changes. There’s some savings for indexes, but the row data is always written in full for the new row, even if just one column changes.

                                    Now that’s no reason not to use actual columns. The value of a well defined schema is knowing the types and constraints of your data.

                                    • brigadier132 1 year ago
                                      > I would have thought that the fact that the entire jsonb value is rewritten if you add a new field to it would be clear to anyone who gave it a few seconds of thought. How else could that work?

                                      We are talking about Postgres; unless someone went into the manual to look up the performance characteristics of jsonb columns, why wouldn't they assume that it works in the most optimal way?

                                    • mbreese 1 year ago
                                      In my experience, you’re going to pay for that complexity at some point. You either pay it at the write end or the read/query end.

                                      I don’t see a way around that, so if PG explicitly makes it so that you pay it once on the write, I’m good with that.

                                      The alternative is a more complicated data storage layout that makes queries more complicated. That’s okay too, but it just depends on your use case.

                                      I prefer the stability and “correctness” of PG, but if more flexible queries are important to you, have at it. It’s nice to have more options for JSON storage.

                                      • srcreigh 1 year ago
                                        Wdym rewrite the entire document?

                                        AFAICT MongoDB’s storage layer uses 4kb pages, so every modification has to write at least 4kb.

                                    • richbell 1 year ago
                                      This feels like a submarine article for Couchbase.

                                      https://paulgraham.com/submarine.html

                                      • SJC_Hacker 1 year ago
                                        Maybe I never got deep enough into "NoSQL" DBs such as Mongo, but they always struck me as a bit lazy, with a "design as you go" philosophy. The functionality seems to focus on fetching individual records, not the deeper analytics/queries you find frequently with an RDBMS. The query languages I've seen are imperative, which encourages processing after the fetch, on the client.

                                        And since it encourages "winging" it, the DBs tend to be poorly structured/documented. Which is OK for some applications I guess.

                                        • brigadier132 1 year ago
                                          I don't think NoSQL vs SQL is a real debate. SQL is great and should be used wherever performance allows for it. Whenever performance does not allow for it, use NoSQL.

                                          I have some highly normalized data, and every time I do a get I have to do a bunch of joins. I have to do a ton of reads on this data. So what I do is have a NoSQL DB in conjunction with my SQL DB, and the user can hit a "publish" button that stores a denormalized version of the data in the NoSQL DB, which is automatically replicated all over the world, and gets are super cheap.
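                                          That publish pattern can be sketched in a few lines (the schema, names, and the dict standing in for the replicated NoSQL store are all invented): the normalized SQL data stays the source of truth, and publishing writes a denormalized snapshot keyed for cheap gets.

```python
import json
import sqlite3

# Normalized source of truth in SQL; denormalized snapshots in a KV store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'alice');
    INSERT INTO posts VALUES (100, 1, 'hello world');
""")
kv_store = {}  # stand-in for the globally replicated NoSQL database

def publish(post_id):
    # do the joins once, at publish time...
    row = conn.execute(
        "SELECT p.id, p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id WHERE p.id = ?",
        (post_id,),
    ).fetchone()
    # ...so each read is a single cheap get, with no joins
    kv_store[f"post:{row[0]}"] = json.dumps({"title": row[1], "author": row[2]})

publish(100)
```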

                                          • fngjdflmdflg 1 year ago
                                            >SQL is great and should be used wherever performance allows for it

                                            I feel like this is worded poorly. SQL has better performance than most NoSQL databases (e.g. document databases) for many types of queries. The issue with SQL has historically been cost and scaling (both of which have been at least partially solved in recent years).

                                            • zepolen 1 year ago
                                              Caches are good, but calling them a database gets on my nerves.
                                              • 5Qn8mNbc2FNCiVV 1 year ago
                                                Yeah KV stores really are amazing at storing computation results from the same input
                                              • vlovich123 1 year ago
                                                I think what people actually want is a SQL layer that runs on an engine with NoSQL performance characteristics and support for a flexible data schema. As S3 has shown, a non-trivial number of data applications are very basic get/put. Yes, yes, now that it’s so dominant, that’s how people are designing access patterns, which is skewing things, but its success should tell you something about what people want.
                                                • lIIllIIllIIllII 1 year ago
                                                  NoSQL is way harder than SQL to get right if you're trying to write something performant and you don't want to refactor everything down the line. I pick SQL because I'm lazy, not the other way round.

                                                  The only upside of NoSQL is performance. Almost everything else is harder and worse than SQL. Sure you can get your denormalised schema and never have to join anything to get all your data. But, you need to know all of your access patterns at design-time and when you need to refactor the data model in some way you're fucked and need to, I dunno, rewrite an entire table, sometimes on the fly while the system is running, which is like changing a tyre on a moving car.

                                                    • morkalork 1 year ago
                                                      Developers love it. The poor data analysts that come along later, not so much.
                                                      • FridgeSeal 1 year ago
                                                        Lazy devs love it.

                                                        Everyone who has to come along after them despises them for it.

                                                        KV stores and noSQL stuff have their place, being a dumping ground does them no favours.

                                                        • kbolino 1 year ago
                                                          I've returned to SQL after years of "lazily" enjoying NoSQL and I gotta say, SQL would do well to stop relying on a thesaurus. It feels like I'm writing COBOL. I don't understand why I need to feed English to a Zork parser when I know exactly what operation to perform. An opcode with some arguments would be preferable to writing a poem and hoping it gets interpreted how I meant.
                                                    • gregw2 1 year ago
                                                        The technical arguments in this article aren't that coherent, in my view. I'm not sure if this is the reporter's fault or some Couchbase marketing flim-flam (err, which would still end up being the reporter's fault). It could be me, I suppose; judge for yourself:

                                                        The first half suggests Chamberlin ("SQL co-creator") supports NoSQL, which avoids the relational model and in many cases is oriented around more of a key-value store with eventual consistency, for better handling certain scalability use cases. Slightly eyebrow-raising that the SQL co-creator agrees, but I don't think there's a huge technical dispute that you can scale better if you allow eventual consistency. Whether it's wise is debatable, but of course there will always be a use case where it is arguably wise.

                                                        The second half talks about SQL++ (which Chamberlin apparently worked on), which AWS calls PartiQL, whose value proposition is better/easier parsing of JSON.

                                                      I am not clear how relaxing ACID properties to scale is related to JSON other than that a bunch of early NoSQL databases (MongoDB? Couchbase?) tried to do both.

                                                        I am no PartiQL/SQL++ expert, but I have used PartiQL and it is pretty much SELECT... FROM... etc. SQL with some fancy features for JSON handling, with standard transactional semantics. You can do similar JSON-type handling in Postgres and Snowflake and other tooling from what I can tell, although I'd be a bit surprised if some of the SQL++/PartiQL dot notation and array subscripting for working with JSON data is ANSI SQL standardized. I don't keep up with this, but SQL:2016 JSON support can be seen a bit in places like https://modern-sql.com/blog/2017-06/whats-new-in-sql-2016#js... and SQL:2023 JSON support (esp. T801 through T882... oh, it looks like the "simplified JSON accessor" that I liked from PartiQL seems to be there) is covered a bit at http://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-fini...

                                                      • endisneigh 1 year ago
                                                          NoSQL is great. What’s not great are people who think it’s the best at everything all the time. Same with SQL, for that matter.

                                                        Turing complete isn’t the proper phrase, but I assert any data model can be modeled performantly both ways (normalized or not).

                                                        • burcs 1 year ago
                                                            Yeah, I agree on not being overly opinionated on this. SQL, NoSQL: for most applications, as long as the data is stored somewhere reliably, it doesn't matter.

                                                          Although, he's kind of losing me on this SQL/C++ hybrid, not sure I can get behind that one.

                                                          • sverhagen 1 year ago
                                                              Yeah, reliably... I think that ACID SQL databases are still silently saving many people's butts on a daily basis, with safeguards that NoSQL databases are a lot more selective about.