Paperless-Ngx v2.0.0

182 points by rhim 1 year ago | 72 comments
  • ydant 1 year ago
    There was a pretty big discussion about paperless-ngx a couple of months ago:

    https://news.ycombinator.com/item?id=37800951 (183 comments)

    I tested it out then and am considering migrating from my current system (Google Drive) to a self-hosted approach. Paperless seems to do a good job of minimizing the mental overhead of ingesting and categorizing new documents - the overhead that ultimately leads me to stack documents up for months before processing them. My initial pilot run was promising, but I haven't gotten around to switching yet.

    From the changelog, it's not really clear to me what's notable about this release, especially as a new/potential user.

    This page is a better introduction to the product, although it doesn't mention the v2 release yet:

    https://docs.paperless-ngx.com/

    • andrew_eu 1 year ago
      I've been using Paperless for several years now very happily and can recommend it over my previous system, also Google Drive. During the transition I found it helpful to set up a cron which (A) made an export of Paperless and (B) uploaded that export to a Google Drive folder.

      One feature which seems to be quite a nice improvement (speculating as I haven't upgraded yet) is consumption templates [0]. My workflow involves an ADF scanner with an Android application, sharing the scanned PDF with Paperless Share [1] and then it's uploaded to the server via API. It seems that consumption templates will enable adjusting tags/sharing settings/permissions of a document at ingestion time based on where it's ingested from.

      [0] https://github.com/paperless-ngx/paperless-ngx/pull/4196

      [1] https://github.com/qcasey/paperless_share

      • itchynosedev 1 year ago
        I use Syncthing to sync from the paperless data folder; paperless itself runs on Kubernetes (k3s).

        It's a one-way sync. Paperless is the authoritative location. The only reason I back up to Google Drive is so that my phone has easy access to the documents I may need on the go.

      • benhurmarcel 1 year ago
        Could you specify how it improves over using Google Drive or similar? Is that "just" because you control the hosting, or is the experience better?
        • andrew_eu 1 year ago
          Personally, I think it isn't really an improvement over Google Drive. Drive offers so many more features, an office suite, integration with many other Google services, etc.

          That said, I don't think Paperless is supposed to fill all those gaps. For me, its sole job is to make scanned documents searchable (from anywhere with Tailscale) and durable (with encrypted off-site backups). Having this isolated from a Google account with already too-far-reaching access is a benefit in my opinion.

          Edit: rereading the context, I think you were referring to how I used Google Drive before Paperless. In that case I just stored scans in Google Drive. I struggled to organize them consistently and search was lackluster. Paperless improves on both, and it is also much more hackable. It's easy to set up post-ingestion scripts, backups, email ingestion, etc.
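For the post-ingestion scripts mentioned above, paperless-ngx can run a script configured via `PAPERLESS_POST_CONSUME_SCRIPT` after each document is consumed, passing document details as environment variables. Below is a minimal Python sketch, assuming the variable names `DOCUMENT_ID`, `DOCUMENT_FILE_NAME`, `DOCUMENT_SOURCE_PATH`, and a comma-separated `DOCUMENT_TAGS` (verify these against the docs for your release). What the script actually does with the document is left as a placeholder print.

```python
import os
from dataclasses import dataclass


@dataclass
class ConsumedDocument:
    doc_id: int
    file_name: str
    source_path: str
    tags: list[str]


def from_env(env: dict[str, str]) -> ConsumedDocument:
    """Collect the details paperless is assumed to pass via environment variables."""
    raw_tags = env.get("DOCUMENT_TAGS", "")
    return ConsumedDocument(
        doc_id=int(env["DOCUMENT_ID"]),
        file_name=env.get("DOCUMENT_FILE_NAME", ""),
        source_path=env.get("DOCUMENT_SOURCE_PATH", ""),
        # Tags arrive as one comma-separated string; split and drop blanks.
        tags=[t.strip() for t in raw_tags.split(",") if t.strip()],
    )


def main() -> None:
    if "DOCUMENT_ID" not in os.environ:
        print("DOCUMENT_ID not set; not invoked as a post-consume script?")
        return
    doc = from_env(dict(os.environ))
    # Placeholder: replace with a backup, notification, or other post-ingestion step.
    print(f"consumed #{doc.doc_id} {doc.file_name} tags={doc.tags}")


if __name__ == "__main__":
    main()
```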

      • ydant 1 year ago
        One feature I had been looking for isn't mentioned in this release's notes, but it was actually added in RC1 for 2.0.0:

        https://github.com/paperless-ngx/paperless-ngx/releases/tag/...

            * Feature: Implement custom fields for documents @stumpylog (#4502)
      • jdoss 1 year ago
        If anyone is looking to kick the tires of Paperless NGX quickly, check out my little pet project [1] for running it with Podman. I use it every week to scan papers from my Brother ADS-2800W, which SFTPs the PDFs into a directory for Paperless NGX to consume.

        I just updated my install to v2.0.0 with a simple podman pull and a systemctl restart of my paperless pod, and everything looks great. Hats off to the contributors of the project. Every update, even major ones like this, has been really smooth.

        1: https://github.com/jdoss/ppngx

        • cwiggs 1 year ago
          How do you like this setup?

          I've been thinking of moving from docker-compose to podman, specifically using the [podman-play-kube](https://docs.podman.io/en/v4.2/markdown/podman-play-kube.1.h...) but haven't gotten around to it.

          I feel like Podman has a lot to offer self-hosters, but it isn't popular (yet?).

          • jdoss 1 year ago
            I like it a lot. I moved totally away from docker-compose and Docker a couple of years ago to using only Podman, and I haven't looked back. Using Podman pods lets me isolate my workloads in their own namespaces, and I can prototype a multi-service workload very quickly.

            If you check out the bash script in my ppngx project, you can get an idea of how you could write your own script for your workloads. I can run ./start.sh over and over again and it will replace the running containers with my changes, which makes for a very fast DX.

            The README on ppngx talks about using the podman generate systemd command to create units from the pod so you can run them via systemd, but this command is being deprecated in favor of using Quadlet [1] (a systemd generator) to create the units on the fly. I haven't gotten around to using it since I like to have more control over my systemd units. I could see Quadlet being very good for users who don't know the inner workings of systemd and Podman.

            1: https://docs.podman.io/en/latest/markdown/podman-systemd.uni...

        • edward 1 year ago
          I love paperless-ngx but I wish it had a rotate button. Some of my document scans are upside down.
          • diarrhea 1 year ago
            I don't think I'd be comfortable with it having elaborate editing functionality. PDF editing in a browser is finicky, and an enormous bug fest.

            I do PDF editing offline, on the desktop, then re-upload to paperless. Not the most integrated flow, but much more bulletproof. I want the PDFs themselves to be immutable once on paperless. Only metadata should be editable.

            • ttyprintk 1 year ago
              It keeps an “original” PDF and presents a working copy for modifications like OCR and metadata. Rotation is important for OCR, so rotate-and-redo is a worthwhile feature.
            • prometheanfire 1 year ago
              There is an issue about this, basically it's not going to happen because it is editing functionality. They suggest using another solution before import (build a pipeline).
              • ndsipa_pomu 1 year ago
                It does have rotate clockwise/anticlockwise
                • maweki 1 year ago
                  Where? I'm pretty sure my instance doesn't.
              • el_sinchi 1 year ago
                You can use an open-source scanning tool like NAPS2, which lets you rotate pages before you mail them to paperless-ngx.
              • CommanderData 1 year ago
                I wish paperless-ngx natively advertised itself to printers for the "Sent to PC" feature.

                Last I checked, it doesn't, and I had to run a separate service to advertise the paperless endpoint to the printer.

              • matrss 1 year ago
                I haven't been using it too much yet but I am really impressed by paperless-ngx so far. It just works(TM) and the auto-tagging functionality is surprisingly good, even with just a few documents in it.

                Does anyone have a good scanner recommendation, though? I am eyeing the Brother ADS-1700W since it seems to be recommended often, but I would really like to use a "scan to webhook" feature (it's 2023, after all) instead of SMTP or whatever other options the Brother offers.

                • draugadrotten 1 year ago
                  Recommendation: https://www.quickscanapp.com/

                  I am using my iPhone as a scanner, and the app automatically scans, OCRs, uploads, and ingests documents into my paperless-ngx instance, even remotely using Tailscale.

                  The iPhone camera is more than good enough for scanning documents.

                  • matrss 1 year ago
                    I don't have an iPhone, but on Android there is the "Paperless Mobile" app (https://github.com/astubenbord/paperless-mobile), which can be used to scan as well. There are just some documents that I would prefer to have in proper and consistent "document scanner"-quality; I am always having a hard time with lighting using those phone scanners (although Paperless Mobile is one of the better ones I have used).
                    • westurner 1 year ago
                      Would a document capture camera with a [ring] light also work?
                    • bretto13 1 year ago
                      I had used this app prior to the addition of the Paperless-ngx integration and it worked well, but with that functionality added it's just so easy to scan and be done. I have a Brother scanner as well that I'll still use to import longer documents or anything I want in the best quality, but for 95% of things importing from this app works perfectly.
                      • deanc 1 year ago
                        Thanks for the tip on this one too. It took a bit of work to get it going with my setup, but now it works flawlessly.
                        • bovermyer 1 year ago
                          Thank you for this! This reduces the friction for scanning documents a _ton_.

                          I love that it integrates with Paperless so well!

                        • pintxo 1 year ago
                          I am scanning from my Brother multi-function device to an SMB share, which paperless monitors for changes. Works like a charm. You can even bulk move files there using your local file manager.
                          • rubenbe 1 year ago
                            Which type of brother printer do you use? And do you use it under Linux?
                            • senectus1 1 year ago
                              I'm using a Brother MFC-L3770CDW (Colour laser, with a duplexing scanner). Very reasonable price and super capable device, works fine with linux.
                              • moontear 1 year ago
                                There is a long list of supported scanners directly from Paperless: https://github.com/paperless-ngx/paperless-ngx/wiki/Scanner-...

                                Personally I go with Brother ADS-1700W. I don't use it under any operating system since it is Scanner > SMB share.

                                • alexdoesstuff 1 year ago
                                  Brother DCP-L2550DW here. One of the cheapest b/w multifunction devices with an automatic document feeder and reasonable print and scan performance. Works like a charm on Linux, Windows, Android, and iOS.

                                  I am using it with [NAPS2](https://www.naps2.com/), which is brilliantly simple, multi-platform, free, and open-source.

                                  • pintxo 1 year ago
                                    Just one of their Color-Laser scanner/printer combos. Works like a charm for iOS, Linux, MacOS, Windows.
                                    • lakomen 1 year ago
                                      I wish I could selectively subscribe to comments on HN, but I have to comment to do that. So this is my subscription comment. #metoo
                                    • senectus1 1 year ago
                                      Exactly the same setup here, but I also have paperless pointing to a mailbox that I use exclusively for sending documents to.

                                      All works perfectly.

                                      • pintxo 1 year ago
                                        I am using a paperless@<domain> address for this as well. Handy to archive stuff coming in via email.
                                    • tecleandor 1 year ago
                                      I'll start with Paperless NGX soon. After looking around at lots of document scanners with automatic feeders (which are quite expensive), I found that my office was getting rid of a big multifunction HP printer that had been sitting unused since COVID and remote work, and I got it for free.

                                      I'll clean all the rollers and stuff next week and test it :P

                                      • andrew_eu 1 year ago
                                        I've had great luck with an Epson WorkForce scanner. Originally I got it to scan ~10k family photos; it took about an hour and went entirely smoothly.

                                        In that case I scanned to a USB drive attached to the scanner (since each photo was a separate file). For Paperless I use the Epson Smart app, scan the document with whatever settings, remove/rotate pages as needed, and then share it to Paperless with Paperless Share [0].

                                        Many network-attached scanners can scan to SMB, no device needed, but I kind of like the human-in-the-loop aspect. Since my Paperless server runs on an HDD next to the scanner, I can actually hear when the file lands, which is quite satisfying.

                                        [0] https://github.com/qcasey/paperless_share

                                      • daveguy 1 year ago
                                        Paperless-ngx + ScanSnap iX1600. Works with a samba share that is very easy to set up in Linux these days. Fast, easy, and you can have different scan profiles to set the destination folder. Push a button for the type and a button to scan. Paperless-ngx automatically files and tags reliably. It is saving me hours per week in filing. Can't recommend it enough. This is a personal system -- not sure how it would scale to 100k - 1M+.
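One way paperless-ngx can pick up on a per-profile folder layout like this is its consumer option for turning subdirectory names into tags (`PAPERLESS_CONSUMER_RECURSIVE` plus `PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS`); the comment above doesn't say whether it is used here. A minimal sketch of that mapping, for illustration only (this is not paperless's own code, and the consume path is just an example):

```python
from pathlib import PurePosixPath


def tags_from_path(consume_root: str, file_path: str) -> list[str]:
    """Turn the folders between the consume root and the file into tag names."""
    relative = PurePosixPath(file_path).relative_to(consume_root)
    # Every path component except the final file name becomes a tag.
    return list(relative.parts[:-1])


# A scan dropped into consume/Taxes/2023/ would be tagged "Taxes" and "2023".
print(tags_from_path("/usr/src/paperless/consume",
                     "/usr/src/paperless/consume/Taxes/2023/scan.pdf"))
```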
                                        • WXLCKNO 1 year ago
                                          Almost 600 Canadian dollars for that scanner. Is that mainly because it's incredibly fast and can go through a stack of pages?
                                          • kstrauser 1 year ago
                                            I've had an iX500 for a few years. You're right, and it's also a complete tank. I deployed several of them to a doctor's office that had to scan lots of paperwork every day, and they all worked perfectly all the time, every time.

                                            They're the Brother laser printer of scanners.

                                            • daveguy 1 year ago
                                              Fast, stack of pages, but also compatibility with different destination types, ease of setup, and no cloud account required (looking at you Raven) and your PC doesn't have to be on for network scans (direct, not passthrough).

                                              But yes, the scanner is pricey. It was definitely an investment.

                                            • xattt 1 year ago
                                              I’ve got an iX500 and I’m suffering from its lack of SMB support.

                                              The only options that come to mind are either a convoluted ScanSnap Online -> Google Drive -> rclone -> Paperless pipeline or biting the bullet and figuring out how to scan directly into the local box via USB.

                                          • somehnguy 1 year ago
                                            Paperless is one of my favorite pieces of software. A few years ago I got fed up with my filing cabinet full of folders & tons of documents that didn't quite fit into any of the categories.

                                            I installed Paperless on my home server & spent a night digitizing everything. After being comfortable with it for a few months, I went back & shredded all my paper copies. Today my process is similar - when I get a document I would normally toss in that filing cabinet, I just scan it, upload it to Paperless, and shred it. It's also really nice for storing large purchase receipts - I've previously had the writing on thermal paper receipts fade to invisible after a while; that's no longer an issue.

                                            Searching for something specific is so easy now! Huge QOL improvement. Just make sure you have a solid backup strategy, losing my Paperless database & filestore would be devastating.

                                            • itslennysfault 1 year ago
                                              Just curiosity... What does "ngx" mean in this context?

                                              To me it means Angular (the web framework). So, I was surprised to learn this wasn't an Angular plugin. Angular is often referred to as ng for short and as such their plugins tend to have ngx as a prefix. For example, the angular wrapper for ChartJS is ngx-chartjs.

                                              • georgehotelling 1 year ago
                                                Paperless started as "paperless" but the dev stopped work so another dev forked it to "paperless-ng" (for "next generation" I think). That dev, too, stopped work, so "paperless-ngx" was created.

                                                The paperless-ngx core team focused on gathering a group of people to support it, to avoid burnout problems and keep the project sustainable.

                                                • luoc 1 year ago
                                                  The x marked the transition from a single maintainer to an organization. IIRC that maintainer still sticks around.
                                                • ydant 1 year ago
                                                  I don't know if it has a specific meaning. There have been multiple forks:

                                                  paperless (https://github.com/the-paperless-project/paperless) -> paperless-ng (https://github.com/jonaswinkler/paperless-ng/) -> paperless-ngx (https://github.com/paperless-ngx/paperless-ngx/)

                                                    • __jonas 1 year ago
                                                      As others said I'm not sure if the name relates to Angular but it's worth saying that the frontend is in fact Angular

                                                      https://github.com/jonaswinkler/paperless-ng/tree/master/src...

                                                      • jdoss 1 year ago
                                                        Paperless was a project and then it died, so it got forked to Paperless NG (Next Generation). Paperless NG died off and it got forked again to Paperless NGX.

                                                        At least that is my understanding following the Paperless project over the years.

                                                    • lhl 1 year ago
                                                      I set up paperless-ngx w/ a scanner attached to my nas and a bit of scripting to get the scan button working a while back, but then forgot about it.

                                                      For me, as someone who wants my docs on my own server but doesn't care enough to constantly keep up with forks/changes/migrations/updates, I've been looking for something stable I can use for years (or maybe decades?). For example, part of the appeal of something like Obsidian is that it just falls back to .md text files.

                                                      Curious if there are any long-term active users of this (or other systems) for handling all their paper and what they think about maintainability/longevity?

                                                      • nitsua2 1 year ago
                                                        I had the same concern as you when I started, and after roughly two years of use I’ve been impressed with how minimal the maintenance overhead has been.

                                                        So far I’ve probably updated the software ~5 times across various releases, and each time it was because there was a new feature I wanted rather than because I needed fixes (the software has been bug-free for me). The update process is well documented and very straightforward if you are using their docker compose setup to run the application.

                                                        • Wool2662 1 year ago
                                                          I have been using paperless for years now. There was the one issue a while back when the original maintainer stopped and the project had to be forked, but otherwise it's super stable. They keep to semver religiously, and all your documents are neatly organised in their original format on disk if you ever need them.
                                                        • sigwinch28 1 year ago
                                                          I am in the process of getting this running on a Kubernetes cluster in my home. That’s where I throw all self-hosted containerised applications these days. But there’s a bit of friction.

                                                          Their entrypoint script makes a lot of assumptions and in their docker-compose example they use a single container running supervisord instead of multiple containers, each with a dedicated purpose (ingestion, consuming, web server). The setup is almost insistent on logging to a file instead of stdout. It also checks and tries to modify permissions of some folders(!!). This requires quite a bit of unpicking.

                                                          This is doable, but it isn't frictionless to get it to follow what I consider “best practices”. I understand that it’s probably a mix of “easy for someone whose day job is not infrastructure engineering” and “we were using supervisord for bare metal anyway”. Maybe a lot of it is personal preference, but I do feel like the project is not taking containerisation fully to heart. Maybe being more user-friendly in their eyes is more important than being a containerisation purist.

                                                          Either way, I’ve got it nearly working with my Brother ADS-1700W, which has shortcuts for me, my wife, and “joint”, which uploads documents to different directories via SFTP which then automatically have their paperless-ngx owner set appropriately.

                                                          • ornornor 1 year ago
                                                            I finally switched from my ancient Mayan EDMS running an outdated version on an Ubuntu 16.04 VM that I couldn’t upgrade because the Mayan docs for that version are not available anymore. I’m not a huge user but I shred everything I can and have around 1000 documents.

                                                            I have zero regrets so far. Paperless ngx is so much more user-friendly; the automatic date extraction from OCR, the auto-tagging and document type classification, and the ease of backup and restore sold me. I highly recommend it.

                                                            • justsomehnguy 1 year ago
                                                              > running an outdated version on an Ubuntu 16.04 VM that I couldn’t upgrade because the Mayan docs for that version are not available anymore

                                                              For years I was eyeing Mayan as one of the options I could use. Not anymore.

                                                              • ornornor 1 year ago
                                                                Mayan is also doing a good job but I think geared more towards businesses/enterprises and at least for the older versions long term support is an issue. It’s also just one guy, or at least it was when I last checked.

                                                                For home I’d go with paperless-ngx no contest, especially if you can run it in a docker container.

                                                                • jeauxlb 1 year ago
                                                                  Does paperless have the same support for workflows and indexes as Mayan? I use these two features heavily to automatically place documents into a hierarchy (e.g. payslips into their financial year). That, and the ability to add arbitrary fields like 'parent', which means I can create a linked-list-style association between documents, for example a series of correspondence. It's been a while since I looked at paperless and its forks, but I understand it's not quite built for that kind of extensibility/flexibility?
                                                              • rmu09 1 year ago
                                                                I recently migrated from another (more "enterprisey") open-source EDMS system that shall remain unnamed to paperless-ngx. Can't praise this high enough. Where the other system needed multiple clicks for the easiest things and had a bunch of UI antifeatures, paperless has a very intuitive and well thought-out UI and handles ~30k documents without issues.
                                                                • tobi1449 1 year ago
                                                                  Has any paperless user found a good way to "deskew" scanned pages? Sometimes, when scanning from my Brother printer through the ADF, the pages are skewed/rotated and it can be pretty jarring.
                                                                • cgeier 1 year ago
                                                                    I'd love for this to be able to use something like S3 as a backend, and to offer (tax-)audit-proof archiving.
                                                                  • trallnag 1 year ago
                                                                    There are various FUSE-based file systems that use S3 under the hood.