Building a Django driver for Psycopg 3

171 points by Spiritus 3 years ago | 24 comments
  • parhamn 3 years ago
    I was wondering what the major changes between psycopg2 and psycopg3 were. Found a post from the maintainer here: https://www.varrazzo.com/blog/2020/03/06/thinking-psycopg3/

    Main takeaways:

    - Asyncio from the ground up

    - Uses PQexecParams to do database-side escaping and interpolation

    - ContextManager and transaction api improvements for

    - Python only fallback if the C extension fails to build

    • Waterluvian 3 years ago
      That last one is great if it’s hidden behind a flag and some loud messages saying “failed to build, consider using —-python-only with performance limitation.”

      Otherwise it’s going to be a source of mysterious performance issues with a single error message hidden somewhere in your CI or other logs.

      • acdha 3 years ago
        I'm a bit mixed on a flag because C library dependencies are a fair support burden and an awful lot of people do not have workloads where a pure Python implementation is a significant performance issue because it's lost in the actual work done by the database.

        I'd agree with something like a warning on first load or maybe something like an optional package so you could pin `psycopg3[cext]` if you want to guarantee the extension installed.

        • dvarrazzo 3 years ago
          There is a mechanism to ensure that there is no unexpected regression. Exposing the PSYCOPG_IMPL env var a program can make sure to obtain a specific implementation and import fails if it's not available. https://www.psycopg.org/psycopg3/docs/api/pq.html
        • odiroot 3 years ago
          > - Asyncio from the ground up

          > - Python only fallback if the C extension fails to build

          These two are really big! The former, for first-party async support, making it much easier to use with async frameworks.

          The latter would make it much easier to ship in containers. I assume it's still better use the C version, for performance reasons.

          • whoknowswhat11 3 years ago
            YES to this last!! Musl cross deploy can be annoying otherwise - any tips on building Alpine Linux ?
          • trulyme 3 years ago
            Just a huge thank you to Daniele for developing and maintaining psycopg! The article shows the level of professionalism and care that goes into this project. Using Django to test implementation is a genius idea. I can't wait to try the new version in a few of my projects, the changes (especially server-side parameters binding) sound great.
            • perlgeek 3 years ago
              A somebody who first started using DB code from Perl, and later learned Python, I always wondered by Python doesn't have a general database interface like DBI in Perl.

              In Perl, all the database specific modules have a DBI backend, and all the higher-level modules (django-like frameworks, for example) rely on DBI.

              In Python, SQLAlchemy has its own psycopg and cmysql integrations, and does django, and likely several other frameworks.

              (Java has a similar standard, with JDBC, I believe; though I have never used it, so I might be misunderstanding something here).

              • zzzeek 3 years ago
                Creator of sqlalchemy here, I came from the perl DBI and JDBC worlds prior to starting python. We have pep 249, but it's generally a very loose spec, projects that implement it basically choose how much they want to follow or not follow it, and then with the introduction of asyncio people are walking away from the whole thing as the spec has not evolved pretty much at all for many years. At the top is that there is no actual library in which these drivers all need to bind towards as is the case with DBI and JDBC.

                All of that said the reality is that databases are so different in how they define client interactions you're going to have these problems with either approach. The transparency of Python allows it to be ultimately easier to work through these issues, though I've always wished there was a better pep249 story.

                • prpl 3 years ago
                  I had the chance to complain to Guido once about wanting a better DBAPI, especially on the subject of result set metadata/typing (looking at you, pysqlite with your worthless cursor.description)

                  This was after a mypy/typing talk, so highly tangentially relevant I suppose.

                  While it's not perfect, I think JDBC really is the gold standard, and I wish there was a DBAPI spec that was a bit closer to that, especially something with proper prepared statements. I haven't used database/sql in Go but it seems okay too.

                  Thanks for sqlalchemy - it has become the de facto DBAPI in many projects I've worked on.

                • anewhnaccount2 3 years ago
                  There is https://www.python.org/dev/peps/pep-0249/ . I suppose part of the issue is that there is probably deviation from the standard and lots of extensions. I imagine that the standard means there is a lot of shared code between the different connectors though.
                  • nerdponx 3 years ago
                    As far as I know, there is zero or close to zero shared code.

                    PEP 249 is great to have, and it's nice to at least have a starting point for learning a new database connector library.

                    But not all database connector libraries implement the specification. One well-known library [0] explicitly and deliberately violates the spec with no spec-compliant "escape hatch". Also the style of passing parameters as a single sequence (e.g. list or tuple) tends to be a newbie trap, and having 4 different placeholder styles can be annoying.

                    That said, there are ODBC connector libraries [1] if you really do need a uniform interface. But at that point you might be better off with the SQLAlchemy "Core" query builder [2].

                    [0]: Asyncpg, https://magicstack.github.io/asyncpg/

                    [1]: https://wiki.python.org/moin/ODBC

                    [2]: https://docs.sqlalchemy.org/en/14/core/

                  • lmm 3 years ago
                    JDBC exists but I honestly think it's a mistake. Higher-level frameworks still end up with a bunch of special cases for handling different lower-level implementations, and meanwhile a lot of functionality gets stuck behind the lowest common denominator interface that JDBC is (e.g. it took forever to get any kind of async support even when both the things above and the things below JDBC were doing great at it).
                    • merb 3 years ago
                      > e.g. it took forever to get any kind of async support even when both the things above and the things below JDBC were doing great at it

                      is there even support for async jdbc? I tought they dropped the idea ? (https://mail.openjdk.java.net/pipermail/jdbc-spec-discuss/20...). The reasoning is stupid because async drivers is not only about threading and more about i/o, but w/e.

                  • jsmeaton 3 years ago
                    Casually building a django db backend as an acceptance test. Nice work and great write up!

                    I had déjà vu reading the section on bind parameters in aggregate queries. The oracle backend had a similar issue that was fixed/hacked by comparing values and grouping them into named parameters. https://code.djangoproject.com/ticket/27632

                    • dvarrazzo 3 years ago
                      This is interesting, thank you for pointing it out. Worth looking if the problems can be fixed the same way in both the adapter (and probably everything else using server-side binding).
                    • slownews45 3 years ago
                      The example syntax for connection handling is perfect. The footgun of not releasing the connection (especially from pools at the end of the with) is real. And the concepts are different (block of transactions vs normal with file xxx. )

                        with connect(DSN) as conn:
                          with conn.transaction():
                              do_something()
                              with conn.transaction():
                                  do_something_nested()
                      
                          with conn.transaction() as tx:
                              do_something_else()
                              # we were just testing and we don't really want to do this
                              tx.rollback()
                      
                        # and here the connection is closed
                      
                      Any chance of pushing some changes up into the postgresql side? I'm thinking of a block of trx with params separate? Didn't look at it at all.
                      • nerdponx 3 years ago
                        For what it's worth, using context managers for this kind of thing is pretty much standard nowadays, and not supporting this behavior feels jarring and archaic.
                        • slownews45 3 years ago
                          Agreed - I spent so much time trouble shooting this exact issue because I honestly couldn't even imagine that I wasn't freeing the connection.

                          I ended just giving up and doing a close() after searching for every use. But even then didn't immediately understand what was going on.

                          With XXX do Y

                          is totally standard - but here we open but do not close.

                        • anarazel 3 years ago
                          Could you expand on the "pushing into pg" bit?

                          Do you want to specify parameters that have a scope larger than a single statement and send the whole parameterized batch in one go?

                          • gigatexal 3 years ago
                            Ruh roh I have some code to fix. :sweat_smile:
                          • geenat 3 years ago
                            Really great to see more than one viable async database library aside from asyncpg.

                            Thank you!!

                            • nerdponx 3 years ago
                              What isn't viable about Aiopg?

                              (Assuming you are only talking about Postgres libraries.)

                              Maybe it's somewhat obsolete now that Psycopg itself supports async. I think most people used Asyncpg because it's "easier" and supports useful stuff like connection pools.

                              • 3 years ago
                            • getty 3 years ago
                              Thanks for your hard work in making the transition from psycopg2 to psycopg3 as simple and painless as possible, Daniele!