PostgreSQL is eating the database world

108 points by janpio 1 year ago | 53 comments
  • kstrauser 1 year ago
    Always has been.

    I've been using PostgreSQL for decades, and I feel so spoiled. It always Just Works. Not to say there've never been bugs, but compared to anything else with that much surface area, it's a brilliant piece of engineering.

    It's astonishing how often it's a perfectly fine stand-in for the "right" solution. Need a K-V store to hold a bunch of JSON docs indexed by UUID? Fine. Want to make an append-only log DB? Why not. Should you do those things? Probably not, but unless you specifically need to architect for global-scale concurrent usage, it's likely to work out just fine.
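    A minimal sketch of that K-V-store-for-JSON-docs pattern. Python's sqlite3 stands in here so the example is self-contained and runnable; in Postgres you'd use a `uuid` primary key and a `jsonb` column instead of TEXT, and the table/column names are just illustrative:

```python
import json
import sqlite3
import uuid

# Sketch of "Postgres as a K-V store for JSON docs indexed by UUID".
# sqlite3 stands in so this runs anywhere; the Postgres table would be
# roughly: CREATE TABLE docs (id uuid PRIMARY KEY, doc jsonb);
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, doc TEXT NOT NULL)")

doc_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO docs (id, doc) VALUES (?, ?)",
    (doc_id, json.dumps({"name": "widget", "qty": 3})),
)

# Point lookup by UUID -- the key-value access pattern.
row = conn.execute("SELECT doc FROM docs WHERE id = ?", (doc_id,)).fetchone()
print(json.loads(row[0])["name"])  # -> widget
```

    In Postgres, a GIN index on the jsonb column additionally lets you query inside the documents, which is where a plain K-V store stops being comparable.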

    For me, it's the default place to stick data unless I have a specific requirement that only something else can meet. I've never once regretted using it to launch a production system, and only a couple of times have I needed to migrate off of it due to performance demands.

    Thanks, PostgreSQL team! You rock.

    • CharlesW 1 year ago
      > Always has been.

      It definitely has not always been.

      https://www.quora.com/Why-is-MySQL-more-popular-than-Postgre... (2010)

      • tomnipotent 1 year ago
        Postgres wasn't even on my radar until the 9.1 release (2011) rounded out their replication story. MySQL replication had been baking for a decade at that point, despite its other glaring flaws.
        • SOLAR_FIELDS 1 year ago
          Yeah, in my personal anecdotal experience Postgres wasn’t really known as the “good enough for anything, rarely the best at anything but pretty good at it” workhorse until the mid 2010’s.

          It is great though, amazing piece of technology. Not 100% sure you need some specialized data store for your use case? (Most people aren’t). Pick Postgres. You can always switch to SpecializedDB later when pg falls over.

          I cannot overstate how hard it is to be 90% good at everything. You can pick Postgres for any of these, not get fired for it at first glance, and it will carry you to medium-large scale:

          - time series

          - GIS

          - Graph

          - CRUD/Jamstack apps (e.g. Supabase)

          - Queues

          - and on and on and on

          If it even tangentially involves data, you can trust that Postgres has some solution that will mostly work (with some edge cases) unless you are at disgusting levels of scale. Love it. A true jack of all trades.
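          On the queue case specifically: the usual Postgres pattern is `SELECT ... FOR UPDATE SKIP LOCKED`, so competing workers never block on each other's claimed rows. Below is a simplified, single-worker sketch of the claim step; sqlite3 stands in so it runs anywhere, it has no SKIP LOCKED, so this version is not concurrency-safe, and the `claim_job` helper and table names are illustrative:

```python
import sqlite3

# Simplified sketch of "Postgres as a job queue". In Postgres the claim
# would be one atomic statement using FOR UPDATE SKIP LOCKED so parallel
# workers skip already-claimed rows; this shows only the single-worker shape.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, "
    "status TEXT NOT NULL DEFAULT 'queued')"
)
conn.executemany(
    "INSERT INTO jobs (payload) VALUES (?)",
    [("send-email",), ("resize-image",)],
)

def claim_job(conn):
    # Pick the oldest queued job and mark it running.
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # queue drained
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    return row

print(claim_job(conn))  # -> (1, 'send-email')
print(claim_job(conn))  # -> (2, 'resize-image')
print(claim_job(conn))  # -> None
```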

        • ttfkam 1 year ago
          Even today, Postgres on Windows isn't a viable target for performance reasons. Postgres is one process per connection. Spawning processes on Windows is slow compared to Unix.

          Building/running Postgres in Windows' Linux compatibility layer or within Docker is typically the better option, especially considering that every cloud vendor offers Postgres running on a Linux OS, and it's best to have your dev environment match your deployment environment as closely as possible.

          With Linux as your starting point, getting up and running with apt-get install mysql vs. apt-get install postgresql was trivially similar in 2010.

          • BrainInAJar 1 year ago
            That's an argument about Windows being production ready though, not Postgres...
        • ijidak 1 year ago
          I hear this often with regard to Postgres.

          Can't all of the above be said about Microsoft SQL Server as well?

          What prevents SQL Server from being used in the same cases you mention above?

          • FridgeSeal 1 year ago
            - cost

            - only recently runs on Linux

            - cost

            - all sorts of MSSQL specific features and syntax (@@ is unhinged and you can’t convince me otherwise)

            - Postgres docs are better

            - Postgres has a massive ecosystem of extensions (see PostGIS alone!)

            - did I mention the cost?

            - Postgres has a wider range of language support: basically every language I've ever used has a PG library. Not the case for MSSQL. Additionally, some of the MSSQL libs are real bad; the Python one is basically "use this ancient ODBC lib lol". It's great from .NET and awful from everywhere else.

            - licensing and running costs, because these cannot be overstated.

            - features like CDC locked behind _expensive_ licenses, that you get out of the box with Postgres.

            Need I say more?

            • MissTake 1 year ago
              Need I say more?

              Yeah - cost. :)

              Cost is the reason I'm moving everything I can off our existing MSSQL Servers before our next renewal comes up in a year's time.

              We did our due diligence and tested all manner of existing expensive queries, and whilst there are a handful we just can't get to run on Postgres as fast as they run on MSSQL Server, they only run slightly longer on average (as in 9 or 10 minutes as opposed to 6 or 7) and only at night.

              However, given the cost differential ($50k+ vs $0), we can live with this.

              • braza 1 year ago
                Sorry if I am nitpicking here, but cost is just a single dimension in the greater scheme of technology adoption.

                I worked with MSSQL Server 05/08/12 on some projects, and for its time the whole suite was one of the best in terms of ease of deployment. Thanks to the integration with Integration/Analysis Services, you could deploy cubes and serve them directly in Excel, and if you had some people creative in VBA, you could build a very sophisticated and effective way to provide reports.

                I am a huge fan of PostgreSQL, but back in 2005/2012, in all the organisations I worked for, it would have been very hard to justify 3+ extra FTEs because someone was idealistic about Open Source and costs.

                • ijidak 1 year ago
                  Got it. Appreciate that. That makes sense.

                  Was genuinely curious.

                  • somat 1 year ago
                    It is cost, but it is not as simple as just cost: the ability to spin up a new database server without the hassle and delay of getting it licensed is extremely valuable. This is a large part of the value proposition of Linux as well.

                    I mean, I am sure there are companies with site licenses where it is easy to stand up a new Microsoft stack, but I have never worked at any of them; for us it was always a fight with management and the purchasing department to get the licenses we needed. As none of us were really people persons, this was exhausting. It was so much easier to stand up a service on Linux in comparison: no bureaucracy, no delays, just get it and go. So we pushed very hard into the Linux-and-friends stack.

                    • VeejayRampay 1 year ago
                      since when do you have CDC out of the box with PG though?

                      I mean I'm a big fan of the tech but this seems like a stretch

                    • marcosdumay 1 year ago
                      > Can't all of the above be said about Microsoft SQL Server as well?

                      No.

                      MS SQL doesn't just keep working, needs real hardware to run, doesn't handle NoSQL work anywhere near as well, and has many small bugs that pop up here or there. Besides, it's a quite visible expense - that would be OK if it gained you anything.

                      And just as impactful, but on a different dimension: it requires much more query optimization, and its language is just awful compared to Postgres (even though it's probably the next best thing out there).

                      • briHass 1 year ago
                        Pure nonsense, except for the licensing cost comment.

                        SQL Server has arguably the best query planner/optimizer in the RDBMS space, and it also has tons of management tools and simple integrations in a Windows environment (tight AD integration) that make administration very easy. If cost were no object and you needed a DB that requires essentially zero babysitting after the initial setup, SQL Server would be a better choice than PG.

                        TSQL takes some getting used to, but it allows for powerful/flexible stored procedures that can replace the vast majority of external business logic (for better or worse, depending on the application architecture.)

                        That said, PG is definitely more flexible with things like custom types and all the extensions you can add, and SQL Server's better query/clustered index performance isn't likely to make a huge difference in most workloads. The cost of SQL Server is also outrageous, especially run on large (CPU) instances. PG is usually a better choice for cost alone, but SQL Server is not an inferior DB by any means.

                      • datavirtue 1 year ago
                          Eye-watering licensing costs. A small company I was at increased the number of cores on the SQL Server VMs, the CFO got pissed off and fired a few DBAs, a Microsoft audit ensued, and we had to find an extra $1MM in the budget for SQL Server licensing.
                        • leosanchez 1 year ago
                          Licensing cost ?
                          • kstrauser 1 year ago
                            Yep. I can use PostgreSQL anywhere I want, for any purpose I want, in any configuration I want, for free. That means I can use one stack from the tiniest of projects up through giant production systems, and also that I never have to wait for budget approval before scaling or launching new staging environments or spinning up a thousand test instances.

                              For me, the onus is on any other DB to convince me that I should use it instead of PostgreSQL. The few times that's been the case, it's been because we needed something other than a relational database for various specific reasons. At this point I can't think of many reasons I'd use anything other than Postgres in the same category.

                            Like, I can imagine requirements that would send me to Snowflake or Redis or DynamoDB much more easily than things that would nudge me to SQL Server or even MariaDB.

                          • wiredfool 1 year ago
                            Licensing per core makes it difficult to really take over the world.
                          • BeefWellington 1 year ago
                            I would caveat the "Just Works" thing for your specific use-cases. I've seen plenty where PG's query planner falls apart and requires copious coaxing beyond what Oracle, DB2, and MSSQL require in order to do the performant thing.

                            It's a fantastic product and I use it everywhere.

                            My major complaint is that there's no easy way to follow defect statuses because the team has long stubbornly refused to implement a bug tracker[1] and that means I can't just subscribe to a defect to get status updates for an issue I want to watch. Instead I'm expected to follow the entire mailing list or watch changelogs like a hawk. It's incredibly dumb.

                            [1]: https://www.postgresql.org/docs/current/bug-reporting.html#B...

                          • whartung 1 year ago
                            The magic of Postgres (and, quite arguably, MySQL, SQLite, etc.) is simply the idea that a sophisticated RDBMS is ubiquitous.

                            I came from the Old Days when we had to chisel BASIC code into cooling silicon. Having something like a SQL RDBMS just sitting there, busy, or not, maybe just wasting away, ready for any weird nonsense you throw at it, is just a treasure.

                            I have postgres on my mac. I've had postgres on my mac since I've had a Mac, so, what, 2006? I still have DBs on there that are now pushing 17 years old (after several PG version upgrades). I have the space, no reason to delete them. Just there. Old projects, strange experiments, idle.

                            That I have this much capability languishing is amazing.

                              SQL databases used to be a Big Deal. They were a large step up from hand-coding B-Tree indexes. I remember once we got a call from a client complaining about performance on a system we installed. We popped in, took a look around, and, yeah, we dropped the ball. Not a single index had been created on their system. It was just the tables. No wonder it was slowing down. 10 minutes of mad index creation later, all was well.

                            If you weren't there in those days, it's remarkable that we had a system where indexes were (mostly) a performance thing, rather than a core thing the entire system was designed around. A paradigm shift in development.

                            SQL DBs were amazing. They were also rare, and expensive. Custom libraries to access them, etc. But also, generic query tools, no code to write to beat on the data, or dump out quick queries, just the SQL front end. Powerful. Capable. So, yea, I held them on a bit of a pedestal.

                            And I can now just let one of those things, with untold modern capability and range, just sit idle on my machine. Just like I can leave a Calculator window open. Waiting for whenever I deign I need to work with it some.

                            Extraordinary.

                            • CharlesW 1 year ago
                              > I have postgres on my mac.

                              For any Mac users who don't know about it yet, Postgres.app is amazing: https://postgresapp.com/

                              • prmph 1 year ago
                                  Can second this. I found it's simply the most straightforward, reliable, low-maintenance way to run Postgres on a Mac.

                                The only small hiccup I've encountered is that sometimes you get this error on starting up the Postgres server: "Failed to prune sessions". This error is fixed by running a command like below and then restarting:

                                  rm /Users/<user>/Library/Application\ Support/Postgres/var-14/postmaster.pid

                            • irrational 1 year ago
                              We were on Oracle for 15 years. Then the license costs became too burdensome, so we moved to Postgres. Though, it took us two years to make the move. Postgres is amazing compared to Oracle. Faster. More standards compliant. Better error messages. Far simpler to do backups and replication. Etc. Etc. It is quite astonishing how much better Postgres is than Oracle.
                              • finnh 1 year ago
                                The author should remove - or at the very least _credit_, come on - the image used in "The Pendulum of Database Realm" section. It's from Martin Kleppmann's "Designing Data-Intensive Applications", a particularly good O'Reilly book with illustrations by Rebecca Demarest.
                                • xnx 1 year ago
                                  Dupe:

                                  Postgres is eating the database world

                                  https://news.ycombinator.com/item?id=39711863

                                  5 days ago 138 comments

                                  Edit: included the wrong link

                                • eivanov89 1 year ago
                                  That's indeed amusing! I just published a post with title "When Postgres is not enough". It seems that the true distributed databases we have evaluated aren't quite Postgres's cup of tea: https://blog.ydb.tech/when-postgres-is-not-enough-performanc...
                                  • da39a3ee 1 year ago
                                      Small thing, but Postgres should deprecate the names it uses for CLI commands like "adduser", "createuser", etc.; they should have a prefix like "pg-adduser" or, more modern, be subcommands of a single "pg" or "pgctl" command. Has there been a move toward that yet?
                                    • edhelas 1 year ago
                                      Good
                                      • roynasser 1 year ago
                                          I feel like I've seen this more than I've seen my parents this year!
                                          • ralusek 1 year ago
                                            Love Postgres, but here's what I wish was different:

                                             RDS Proxy/PgBouncer should be the default connection behavior. Ideally no persistent connection at all; something more akin to HTTP would be great.

                                             Vacuuming is ridiculous. It doesn't make sense to me what could possibly take so long. It also doesn't make sense to me that it needs to be blocking (I understand that it's now parallelizable-ish). Using a comparatively slow interpreted language, I can iterate through millions of items on disk and do any number of things within a few seconds at most. I have had databases with, like, a few thousand items somehow take hours upon hours to vacuum/analyze.

                                            Nested transactions would be great. I know there are savepoints but it doesn't work well when dealing with anything in parallel.

                                            And finally, my #1 complaint: Please let ME decide when to roll back/invalidate a transaction. If I want to write something like an upsert, maybe my code says "insert this record, and if I catch an unique constraint error, update the record." In Postgres, at the initial insert, because there's an error, it will just invalidate my transaction! I could have done 100 other things in this transaction so far, all invalidated because of a DB error. An error that I was expecting to catch and handle myself at the application level, and now the entire transaction needs to be rolled back. WHY?

                                            • kstrauser 1 year ago
                                              I agree about PGBouncer. You absolutely want persistent connections, though: establishing a TLS connection is comparatively costly and you don't want to pay it more than you need to.

                                              It's been maybe 15 years since I've waited for a vacuum to finish outside of me doing a `VACUUM FULL` on an offline copy as an experiment.

                                              It's had subtransactions for years.

                                              It has an exception clause so you can catch errors and roll back. In the absence of explicit exception handling, it must roll back a transaction instead of committing who-knows-what to disk. That's the whole point of transactions.

                                              • mdavidn 1 year ago
                                                You should read the documentation for INSERT ... ON CONFLICT.

                                                https://www.postgresql.org/docs/current/sql-insert.html#SQL-...

                                                I'm not sure what's happening with your VACUUM. It does not lock the table without the FULL parameter. Or perhaps your tables have too many indexes?
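                                                 The upsert reads roughly like this; the ON CONFLICT syntax below happens to be shared by Postgres (9.5+) and SQLite (3.24+), so sqlite3 is used to keep the sketch runnable (table and column names are just illustrative):

```python
import sqlite3

# INSERT ... ON CONFLICT upsert: one statement, no expected-error dance.
# This particular syntax is common to Postgres and SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER NOT NULL)")

for _ in range(3):
    conn.execute(
        "INSERT INTO counters (name, hits) VALUES ('home', 1) "
        "ON CONFLICT (name) DO UPDATE SET hits = hits + 1"
    )

print(conn.execute("SELECT hits FROM counters WHERE name = 'home'").fetchone()[0])  # -> 3
```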

                                                • koolba 1 year ago
                                                  > RDS Proxy/PG Bouncer should be default connection behavior. Ideally no persistent connection at all, more akin to https would be great.

                                                  That doesn't make sense. A database connection is inherently stateful as you run multiple commands in a transaction.

                                                  > Vacuuming is ridiculous. It doesn't make sense to me what could possibly take so long. It also doesn't make sense to me that it needs to be blocking (I understand that it's now parallelizable-ish). Using a comparatively slow interpreted language, I can iterate through millions of items, on disk, and do any number of things, within a few seconds at most. I have had databases with like, a few thousand items, somehow take hours upon hours to vacuum/analyze.

                                                   Routine vacuuming is not blocking (only VACUUM FULL, which reclaims space, is blocking). The entire storage approach has its warts, but it works well for 99.99% of use cases. I'd argue that write amplification is a much larger problem.

                                                  > Nested transactions would be great. I know there are savepoints but it doesn't work well when dealing with anything in parallel.

                                                   What does it mean to work with a transaction in parallel? The A and I in ACID are for "Atomicity" and "Isolation".

                                                  > And finally, my #1 complaint: Please let ME decide when to roll back/invalidate a transaction. If I want to write something like an upsert, maybe my code says "insert this record, and if I catch an unique constraint error, update the record." In Postgres, at the initial insert, because there's an error, it will just invalidate my transaction! I could have done 100 other things in this transaction so far, all invalidated because of a DB error. An error that I was expecting to catch and handle myself at the application level, and now the entire transaction needs to be rolled back. WHY?

                                                   That's exactly what a SAVEPOINT does. Failing and trashing the connection state (until a ROLLBACK) is a sensible default. It also allows for command pipelining: you can send multiple commands without worrying about partial execution due to an intermediate failure.
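                                                   A sketch of that SAVEPOINT approach, catching an expected unique-constraint error without losing the rest of the transaction. sqlite3 is used so it runs anywhere; the SAVEPOINT/ROLLBACK TO syntax is the same in Postgres, and the table and savepoint names are illustrative:

```python
import sqlite3

# Catch an expected duplicate-key error inside a savepoint, roll back only
# to the savepoint, and keep the earlier work in the transaction.
conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, visits INTEGER NOT NULL)")

conn.execute("BEGIN")
conn.execute("INSERT INTO users VALUES ('a@example.com', 1)")  # earlier work

conn.execute("SAVEPOINT upsert")
try:
    conn.execute("INSERT INTO users VALUES ('a@example.com', 1)")  # duplicate key
except sqlite3.IntegrityError:
    # Roll back only to the savepoint; the rest of the transaction survives.
    conn.execute("ROLLBACK TO SAVEPOINT upsert")
    conn.execute("UPDATE users SET visits = visits + 1 WHERE email = 'a@example.com'")
conn.execute("RELEASE SAVEPOINT upsert")
conn.execute("COMMIT")

print(conn.execute("SELECT visits FROM users").fetchone()[0])  # -> 2
```

                                                   In Postgres the ROLLBACK TO is mandatory after the error, since the failed statement poisons everything up to the savepoint; in application code, drivers typically expose this as a nested/sub-transaction helper.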

                                                  If your application code is repeatedly failing then you should be fixing your application. There are many ways to perform consistent INSERT-or-UPDATE in PostgreSQL: https://www.postgresql.org/docs/current/sql-insert.html#SQL-...