How to Port from Python 2 to Python 3 (2019)

54 points by noego 5 years ago | 48 comments
  • ryandvm 5 years ago
    Python 3 and IPv6 are the poster-children of how _not_ to do a major upgrade. I'm not sure what the right way is, but if the short-term advantages of the upgrade do not outweigh the immediate pain, prepare for the matter to drag out for _decades_.
    • badsectoracula 5 years ago
      > I'm not sure what the right way is

      The right way is to make sure that stuff that used to work in the previous version still works in the current version. Breaking people's work, especially work that spans multiple years, projects, knowledge, etc and expecting them to be happy about it is naive. Being condescending when they turn out to not be happy and try to avoid the unnecessary busywork forced on them does not help either.

      This isn't just about Python, many libraries and languages (and some OSes - see iOS, macOS and to a slightly lesser extent Android) are terrible about this. The proliferation of semver with its normalization of breaking stuff (the fact that a dependency - be it a library or language or whatever - uses semver communicates that they have already decided that they will break backwards compatibility at some point) shows that most people are fine with breaking others' code.

      • rtpg 5 years ago
        The most painful breaking change was the string treatment. Breakage was necessary if you wanted to make it possible to have more confidence in the basic building blocks of python.

        If you make a mistake when making a tool, you can either leave it, permanently causing pain for users, or you can try to find a path to fix it.

        That being said, a Python 3.0 which was _just_ “can’t call decode on a string, encode on bytes”, with subsequent releases fixing up other stuff over time, would have been much nicer.

        Like the “everything is iterators now” release could have happened later.
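
For readers who have not hit these two changes, here is a minimal Python 3 sketch of what the string split and the iterator change look like in practice. The variable names are illustrative only and are not taken from the thread.

```python
# Python 3: text (str) and binary data (bytes) are distinct types.
text = "plain text"
data = text.encode("utf-8")        # str -> bytes
roundtrip = data.decode("utf-8")   # bytes -> str

# The reverse-direction methods that Python 2 offered are gone.
str_has_decode = hasattr(text, "decode")
bytes_has_encode = hasattr(data, "encode")

# Mixing the two types raises instead of guessing an encoding.
try:
    text + data
    mixing_error = None
except TypeError as exc:
    mixing_error = exc

# "Everything is iterators now": map() yields lazily and is consumed once.
squares = map(lambda n: n * n, [1, 2, 3])
first_pass = list(squares)
second_pass = list(squares)        # already exhausted
```

Python 2 code that treated the result of `map()` as a list, or that freely mixed byte strings and text, is the kind of code that needs attention during a port.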

        • dTal 5 years ago
          The constant stream of breaking changes in Python - that is, in the "standard library" which may as well be part of the language - is the most frustrating thing about Python. There are perfectly good projects that can no longer be run without major work, just because they were left unmaintained for a few years. This is a silly state of affairs, and depressingly common when the fix is really simple: version declarations. Feel free to move fast and break things, but always provide the old behaviour if the user puts a "version=3.2" flag in their code. There's no reason this mechanism couldn't have extended to every change in Python since its release.

          If POV-Ray can do it, Python could have done it.

          • badsectoracula 5 years ago
            Free Pascal changed the string type some time ago to make it encoding aware (i'm not really a fan of the idea, but it was done for Delphi compatibility which is considered important by the FPC developers). I only had to change a handful of lines in my 10+ year old code (at the time, now it is older) to make things work (all were about treating the string as a byte array and manipulating the memory directly - i just changed the type to RawByteString which provides exactly that functionality).

            AFAIK that was the biggest "breaking" change they introduced by far. In general i have code from 2007 that compiles out of the box and this sort of stability is why i stick with FPC (and C) despite it being messy sometimes.

            If you make a mistake when making a language or API you should make sure whatever fix you come up with will keep the existing code working, most common way being that the old API is implemented in terms of the new (even if slower, things will keep working) or in the case of languages, new stuff that can conflict with existing code can be opt-in (Free Pascal often uses compiler submodes for this).

            Yes, this makes implementing the library/language harder but it is going to be a bit of extra work for the implementors in exchange for avoiding A LOT of work for the users (especially when you consider all the combined time wasted in porting Python 2 to Python 3).
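
The "old API implemented in terms of the new" approach described above could be sketched in Python roughly as follows. Both function names are hypothetical and stand in for whatever a library's old and new entry points would be; this is one possible shape, not how any particular project does it.

```python
import warnings


def format_name(first, last, *, upper=False):
    """New API (hypothetical): options are keyword-only."""
    full = f"{first} {last}"
    return full.upper() if upper else full


def full_name(first, last):
    """Old API (hypothetical): same result as before, now a thin wrapper."""
    warnings.warn(
        "full_name() is deprecated; use format_name()",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return format_name(first, last)
```

Existing callers of `full_name()` keep working unchanged and only see a warning, while new code can move to `format_name()` at its own pace.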

            • magicalhippo 5 years ago
              The language we use at work (Delphi) changed its string type from ANSI to Unicode, and it took us less than a day to fix our ~500kloc code base, which does a _lot_ of string manipulations all over.

              This was due to the hard work the people behind Delphi had put down to make the transition as smooth as possible.

            • MaxBarraclough 5 years ago
              > The right way is to make sure that stuff that used to work in the previous version still works in the current version.

              But that too brings considerable downsides.

              For all its merits, C++ is an extremely bloated language, getting even more complex every release, due in no small part to its commitment to backward compatibility.

              There's no perfect answer. Python3's decision wasn't stupid, they just chose one downside over another.

              • badsectoracula 5 years ago
                C++ is bloated largely because they decided to make it bloated - they didn't have to, they just decided to shove in whatever new idea sounded good without much concern about the language's size.

                But despite that i 100% guarantee you that people who actually use the language and have large codebases are really glad that C++ is backwards compatible and they do not have to waste time refactoring code that works.

            • cletus 5 years ago
              IPv6 is a perfect example of the second-system effect [1] as in it added a bunch of things no one needed (but might need someday) and didn't solve roaming. All IPv4 really needed was a bigger address space.

              But as soon as you cross the mental threshold of making a breaking change (which expanding the IPv4 address space obviously was) then it's easier to convince yourself to make a bunch more breaking changes. And this is where Python3 really lost its way (IMHO).

              One of the silliest design decisions in Python3 was (initially) removing the u string prefix. Now obviously Python2 defaulted to ASCII and Python3 defaulted to Unicode but this decision just made making libraries compatible with both harder, so much so that they added it back (around 3.2-3.3 IIRC).
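
As a small illustration of the prefix point: in Python 3.3 and later (PEP 414) the `u` prefix is accepted again and is simply a no-op, which lets one source file run on both 2.7 and 3.x. The variable names below are illustrative.

```python
plain = "hello"
prefixed = u"hello"    # accepted again since Python 3.3; same type as plain
raw_bytes = b"hello"   # the b prefix marks bytes on both 2.7 and 3.x

same_type = type(plain) is type(prefixed)
```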

              There are also always decisions you make that in hindsight you wish you'd done differently (eg the mutable Date class in Java) but just because you're making breaking changes doesn't mean you should "fix" all of those. You still have to look at each one and ask yourself "does this really matter enough to justify changing it now?". The default answer is "no" and the bar for "yes" should be really high.

              I feel like Python3 failed here too.

              And look where we are. Python3 out in 2008 and we're still writing migration guides in 2019.

              [1]: https://en.wikipedia.org/wiki/Second-system_effect

              • mixmastamyk 5 years ago
                It's important to realize however that not everyone feels that way about Python3. I'm glad they fixed it and I wish they'd fixed more.
              • segmondy 5 years ago
                Upgrades are just hard.

                See perl5 to perl6, GWbasic to Qbasic to VB to VB.net.

                You either make a clean break or keep all the warts. Either way folks are going to be unhappy.

                Keep the warts: COBOL, Fortran, C, C++, PHP, Excel.

                • Asmod4n 5 years ago
                  Ruby did a great job back in the day with their 1.8 release which changed the language to be Unicode friendly.
                  • mixmastamyk 5 years ago
                    What did they do differently? I guess they benefited from hindsight.
                  • zojirushibottle 5 years ago
                    > GWbasic to Qbasic

                    wait wuh? i thought these were just two of the dozen variants of the BASIC dialect... interesting!

                    • lizmat 5 years ago
                      Rename Perl 6 to Raku. People happier.
                      • badsectoracula 5 years ago
                        It makes sense, if a language is so different from and backwards incompatible with the previous version, to just rename it to something else.

                        Hence why a lot of people back in the day felt that Visual Basic .NET should be called Visual Fred instead :-P

                    • rb808 5 years ago
                      At least Python3 brings an improvement. I still can't think of a good reason I should spend any time trying to figure out IPv6. Maybe my external router has to think about it, but that is about it.

                      Edit: If you down vote me please say why I should care about IPv6

                      • Fr0styMatt88 5 years ago
                        The main mental stumbling block I have about IPv6 (and I know it's kind of a silly one, but it's honestly what I feel on introspection) is that I can't remember an IPv6 address off by heart. IPv4 addresses just _feel_ so much more human-consumable than IPv6 addresses. I can't imagine myself using IPv6 addresses on the command line like I do with IPv4 now.

                        There's also the issue that honestly, I have no idea what is using IPv6 and what's using IPv4 right now. On my internal network I only ever deal with IPv4, but I have no intuition as to what is using IPv6; I couldn't tell you off the top of my head if my ISP supports it.

                        • deathanatos 5 years ago
                          > is that I can't remember an IPv6 address off by heart. IPv4 addresses just _feel_ so much more human-consumable than IPv6 addresses.

                          Stop remembering numbers meant for a machine, and use DNS. It will make your life so much easier. I spend ~$15/yr for my personal DNS name & hosting, and I never want to memorize an IPv4 or v6 address ever again.

                          > I have no idea what is using IPv6 and what's using IPv4 right now.

                          Any "end" machine (laptop, tablet, phone, server) is going to be dual stack, supporting both IPv4 and IPv6. If it is able to auto-configure an IPv6 address, then it will use it. Otherwise, it won't. If a domain is IPv4-only, it'll use the IPv4 address. All of this is automatic.

                          The big issue, for home consumers, is that a lot of ISPs are dragging their feet. They don't need anything from the customers — they just need to get it deployed & turned on. Generally, typing "what is my IP address?" into Google will tell you if you have working IPv6; it will display an IPv6 address if you do.

                          In the cloud… some cloud vendors have been dragging their feet about rolling support out. You need to do some things, like associate an AAAA record with your domain (s.t. it resolves to an IPv6 address), and make sure things like logging can handle the new addresses, or if you implement IP blocking, that you can block those addresses/networks. If you're writing network code, you need to check that you're not making assumptions about the socket type. You can also do things like HTTP proxy from an IPv6 connection to an IPv4 VLAN, e.g., I think w/ an ELB. That is,

                            client <-- HTTP/IPv6 --> ELB <-- HTTP/IPv4 --> backend server
                          
                          which allows a partial upgrade. None of it is terribly hard, but typical project management puts upgrading to future tech in the perpetual backlog.
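
To make the IP-blocking point above concrete, here is a minimal sketch using Python's standard `ipaddress` module, which parses both address families. The blocklist contents and the `is_blocked` helper are hypothetical; the networks are the reserved documentation ranges, not real offenders.

```python
import ipaddress

# Hypothetical blocklist mixing IPv4 and IPv6 networks (documentation ranges).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(address):
    """Return True if the textual address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)  # accepts IPv4 or IPv6 text
    return any(
        ip.version == net.version and ip in net
        for net in BLOCKED_NETWORKS
    )
```

For example, `is_blocked("2001:db8::1")` and `is_blocked("192.0.2.17")` both go through the same code path, so nothing in the caller has to assume a particular address family.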
                          • downerending 5 years ago
                            It sounds like you don't need to, and honestly, after all of these years I still don't know much about it either, because I haven't needed to.

                            One could argue that this is actually a significant benefit of IPv6, in practice.

                          • downerending 5 years ago
                            cf Itanium vs amd64
                          • LoreleiPenn 5 years ago
                            Hopefully people will stop saying "learn Python 2 because 3 has almost no packages".

                            It is so easy for people to just repeat what they heard even if that idea originated a decade ago and was valid a decade ago.

                            And that way we got into a mess of not migrating until past the time it is no longer supported...

                            • zdw 5 years ago
                              I've been doing a lot of Python 2 -> 3 lately, and found this to be one of the best actionable guides: https://portingguide.readthedocs.io/en/latest

                              Also, use tox on the project to run tests against both Python 2.7 and multiple versions of 3, and the work goes pretty quickly.

                              • mixmastamyk 5 years ago
                                Looks like a lot of the guide assumes you want to run on Py2 and 3 concurrently. The time for that has passed to be honest. A clean port is easier to do.
                                • zdw 5 years ago
                                  For larger code bases, making a direct jump to Py3 that breaks Py2 compatibility without a transitional period can be problematic, especially if it's a library.

                                  The time may have passed, but lots of code is still out there that needs to be updated.

                                  • mixmastamyk 5 years ago
                                    Maintaining two branches shouldn’t be a problem. If you waited this long to port, velocity on the legacy branch can’t be especially high.

                                    Now that I think of it the legacy branch should be eol soon.

                              • pjc50 5 years ago
                                We built a product with an embedded Jython interpreter. Jython is stuck on Python2 and somewhat abandoned. So that's nice.

                                Re: packages, one of the huge advantages of the C ecosystem has been that compiled packages are usually fine across language transitions, not only between major compiler version numbers but even from C to C++ which are much more different languages than Python2 to 3. How different would the Python transition have been if it were possible to load Python2 packages in a Python3 program?

                                • o_x 5 years ago
                                  Isn't it ironic that Sentry is one of the tools mentioned in py2->py3 migration? (Sentry is on py2 and as far as I remember they were not very optimistic about migrating)
                                  • zojirushibottle 5 years ago
                                    that's correct. sentry itself is on python 2.7, not the python client.

                                    not to pick on sentry here, but you know, my experience is that people are having a hard time migrating due to them using obscure tricks and features of python 2.7. so their code is breaking because the language evolved.

                                    the saying goes write dumb code or something because debugging is twice as hard. if there is anything to learn from all this, it's to write dumb code because maintenance is twice as hard too.

                                    that's all that is happening really!

                                    • ggregoire 5 years ago
                                      Why would you even build a logs collector in python? Especially if it’s your core business and you know you will need scale and reliability. You kinda shoot yourself in the foot.
                                      • baq 5 years ago
                                        they're using python 2.7. perhaps started on an earlier one. maybe there was nothing except java 1.4 when they started.
                                  • zitterbewegung 5 years ago
                                    The big issue of ports like these is not the tutorial but to justify that to your boss.

                                    From enterprise to a self-run startup you have to see if it’s worth it.

                                    • GrayTzar 5 years ago
                                      Hi, I'm one of the STX Next content crew. We actually have a companion piece to this that goes over the reasons why you should migrate: https://stxnext.com/blog/2019/07/30/why-migrate-from-python-...

                                      Maybe that would be useful for a conversation with one's boss.

                                      • BeetleB 5 years ago
                                        > The big issue of ports like these is not the tutorial but to justify that to your boss.

                                        I think that's a red herring. Convincing your boss is an issue even for minor upgrades. In a previous job, we couldn't convince him to move from Python 2.5 to 2.7.

                                      • swalsh 5 years ago
                                        I literally just got on a phone call to discuss our migration away from 2.7, very timely post.
                                        • classified 5 years ago
                                          IIRC, the Python used in the macOS vim(1) is still 2.x. So at least on a Mac it won't be possible to just move on to Py3 and forget / uninstall Py2 for the foreseeable future.
                                        • mixmastamyk 5 years ago
                                          Porting is a non-event for most non-large projects. In short:

                                          - First cut a new major version

                                          - Write a few tests if needed, they go a long way here.

                                          - Update to 2.7 best practices and logging

                                          - Run tests, commit

                                          - Add a few future statements, commit

                                          - Run pyflakes3 on it, fix, commit

                                          - Run under 3.x/fix until clean, commit

                                          However, if your project is huge and/or does a lot of string and bit twiddling it's excruciating. Hence the controversy between factions.
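
The "add a few future statements" step in the list above might look like the sketch below. It runs unchanged on Python 2.7 and 3.x; the variable names are illustrative, and which future imports a given project needs is a judgment call.

```python
# Future statements must come before any other code in the module.
from __future__ import absolute_import, division, print_function, unicode_literals

half = 1 / 2            # true division on both 2.7 and 3.x
floored = 1 // 2        # floor division has to be requested explicitly
label = "text literal"  # a text (unicode) string on both versions
print("half =", half)   # print is a function, not a statement
```

With these in place, the Python 2.7 run already behaves like Python 3 for division, print, and string literals, so the later "run under 3.x" step surfaces fewer surprises.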

                                          • grifball 5 years ago
                                            sed -i 's/print \("[^"]*"\)/print(\1)/' *.py
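
The same substitution can be sketched in Python with `re.sub`, which also makes its limits easier to see. The `convert_print` helper is hypothetical and only handles the one pattern the sed expression targets: a print statement whose sole argument is a single double-quoted string.

```python
import re

# Matches a Python 2 print statement followed by one double-quoted
# string literal that contains no embedded double quotes.
PRINT_STATEMENT = re.compile(r'print ("[^"]*")')


def convert_print(line):
    """Rewrite `print "text"` as `print("text")`; leave other lines alone."""
    return PRINT_STATEMENT.sub(r"print(\1)", line)
```

Anything beyond that simple shape is either left untouched (`print x`, single-quoted strings) or rewritten incorrectly (`print "a", x` becomes `print("a"), x`), which is why a regex alone is not a porting strategy.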
                                            • blebleble 5 years ago
                                              Here's how you can run your python2 code using python3 in one easy command:

                                              mv python2 python3