Should you split that file?

126 points by vinipolicena 1 year ago | 96 comments
  • jameshart 1 year ago
    Underappreciated aspects of the C# compilation model:

    1: file agnostic. While files provide a scope for imports, in general the compiler can’t tell the difference between the same set of classes split across multiple files or crammed into the same file.

    2: unordered - files don’t import one another, or get processed in particular sequence. The compiler takes a set of files and turns that into a set of type definitions.

    3: partial types, which let you split even a single class definition across multiple files.

    4: regions, which let you bracket off sections within a file and organize things however you want.

    No ‘one class per file’ rules. No ‘references at the top of the file to every other file you need’. You can reorganize code freely, adding or combining files at will.
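TypeScript has no direct equivalent of C# partial classes. The closest analog is interface declaration merging, and it applies to types only: two declarations of the same interface combine into one. Here is a minimal sketch using the hypothetical names `ServerConfig`, `serverConfig` and `formatAddress`:

```typescript
// Two separate declarations of the same interface merge into a single type.
interface ServerConfig {
  host: string;
}

interface ServerConfig {
  port: number;
}

// The merged type requires both members; leaving either out is a compile error.
const serverConfig: ServerConfig = { host: "localhost", port: 8080 };

function formatAddress(c: ServerConfig): string {
  return `${c.host}:${c.port}`;
}
```

Only type declarations merge this way. Unlike a C# partial class, you can't spread method bodies across the declarations. Merging across files works for global (non-module) declarations, while ES modules need module augmentation instead.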

    • aragonite 1 year ago
      That sounds like something I really wish were available for JS development.

      I used to have a setup where, during development, I wrote all my .js files as "script files" rather than modules, all placed inside one folder, and the "build step" was simply concatenating all the .js files in that folder. All the functions in each script file were globally visible from all the other script files, and there was no need to import/export anything. No one should publish code in that form, but during development it was incredibly freeing and helped me concentrate on the actual programming tasks at hand rather than getting sidetracked trying to figure out the right way to organize the code into modules, something (imo) I'm in a much better position to do once the programming tasks have largely been solved.

      It's like writing a book. The natural process is to first create substantive content in fragments and then let a chapter or section structure gradually emerge from those fragments. Focusing too much on 'modules' during early development is imo akin to being preoccupied with deciding what chapters and sections your book should have before you even have a substantial base of content.

      Unfortunately the current tooling for JS doesn't really support this workflow as well as I'd like.
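A development-only concatenation step like the one described above might look roughly like this in TypeScript on Node. The function names `bundleScripts` and `bundleDir` and the per-file header comment are hypothetical, not a description of aragonite's actual setup:

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Pure part: join script sources in a stable (alphabetical) order, tagging
// each chunk with its file name so errors are easier to trace back.
function bundleScripts(files: Map<string, string>): string {
  const names = [...files.keys()].sort();
  return names.map((name) => `// ---- ${name} ----\n${files.get(name)}`).join("\n");
}

// I/O part: read every .js file directly inside one folder (non-recursive).
function bundleDir(dir: string): string {
  const files = new Map<string, string>();
  for (const name of readdirSync(dir)) {
    if (name.endsWith(".js")) {
      files.set(name, readFileSync(join(dir, name), "utf8"));
    }
  }
  return bundleScripts(files);
}
```

Because everything ends up in one script, function declarations are hoisted across the former file boundaries, so functions in one file can call functions defined in another regardless of order. Top-level statements still run in file-name order.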

      • bbkane 1 year ago
        I think, by using a directory as "the module unit", Go gets similar benefits while being simpler.
        • pipe01 1 year ago
          If only cyclical references were possible
        • jcparkyn 1 year ago
          C# is also much better than most languages about being able to split code into separately compilable projects with references between them, and multiple applications referencing the same shared library. Trying to do the same in TS or Python (without extra tooling) is an order of magnitude harder.
          • Too 1 year ago
            Are you saying that editing .sln and .csproj files can be done without tooling?

            It’s only convenient as long as you stay in Visual Studio. Not that I would write C# any other way, but the comparison isn’t exactly fair. It’s very easy to create reusable libs in both TS and Python.

            • jcparkyn 1 year ago
              I really meant "without third-party monorepo tooling", but SDK-style csproj files can absolutely be edited by hand (I never use the GUI for them, even in VS). Solution files, while an abomination, are mostly optional if you're not using VS.

              > It’s very easy to create reusable libs in both TS and Python.

              I've obviously been doing something wrong then, because I've spent many cumulative hours trying to make these work and there's always compromises. Ever tried making an app with create-react-app that uses a shared lib, with reasonable support for HMR (or even just a working live reload), type hints, breakpoints, de-duplicated dependencies, and linting?

              As for Python, is it even possible to do with a standard Python install? I can't remember all the issues I ran into, but the conclusion I came to is that doing it without something like Poetry (at the very least) is so much effort that it's not worth it (and let's not even think about making it cross-platform). It doesn't help that every single source on the internet seems to recommend a different approach.

              Compare this to C#, where there's exactly one way to add a reference to a shared library, it takes one line in a csproj file, and it works with just about every type of application out of the box.

              Edit: To be clear, I'm talking about locally reusable libs, not published and versioned packages.

              • ygra 1 year ago
                .csproj files are trivial to edit without tooling (the newer SDK-style projects at least; the others are merely easy). It helps to understand how MSBuild works, though. .sln files are mostly needed for Visual Studio and they are not required. Fun fact: solution files can be built by MSBuild and are converted into a normal temporary MSBuild project first for that.
                • neonsunset 1 year ago
                  csproj is fairly similar to Cargo.toml, except in XML; it is easily edited by hand or with CLI tooling. For sln files, you just use dotnet new sln, plus the dotnet sln add and dotnet sln remove commands.
            • inChargeOfIT 1 year ago
              I find if you subscribe to the "Write libraries not software" mantra, where your libraries/packages/classes are focused and adhere to the single responsibility principle, code organization kinda sorts itself out. But I agree with the author that organizing and commenting long files like this makes it easier to grok what's going on.

              But if you are clicking around a project and scrolling with your mouse, instead of taking advantage of your IDE's ability to use keystrokes to find and open files, jump to references/definitions, find and replace text, etc., you are handicapping yourself. It's worth the investment to learn.

              • nucleardog 1 year ago
                > But if you are clicking around a project and scrolling with your mouse, instead of taking advantage of your IDE's ability to use keystrokes to find and open files, jump to references/definitions, find and replace text, etc., you are handicapping yourself. It's worth the investment to learn.

                Really? I've observed the opposite.

                The developers I work with who make heavy use of things like fuzzy file matching, go to definition, etc. tend to have a _much_ harder time understanding the codebase.

                My theory as to why is that they're never building or taking advantage of any higher-level context. Having to browse through the directory structure (assuming it roughly mirrors the code structure), having to look through classes, etc. forces you to actually build some awareness of the greater context of the method you're looking at.

                Using "go to definition", you may find yourself in a method, then later you hit "go to definition" and find yourself in another method, and unless you've made a conscious effort to observe so, you may very likely have never noticed they're even in the same class and presumably related.

                Using "go to definition" you might end up in a "process" method. Knowing that's in the "Image" class in the "ImageResizer" module would probably add a lot of clarity to what that's doing before you even get started looking at code.

                When I need to onboard in a new codebase, I very much _don't_ use those tools until I have a solid grasp on the big picture.

                • PH95VuimJjqBqy 1 year ago
                  I just can't imagine there's any real correlation between a developers ability and their use of specific IDE features.
                  • radiator 1 year ago
                    Yes. I would have liked to add something more, but you have said everything there is to say about this subject.
                  • tikhonj 1 year ago
                    I try to organize most of my code as libraries too, but I still find having sections and subsections super useful. I sort of took that for granted because most of my code was in Haskell, and the Haskell documentation tool supports sections/subsections/etc natively, so I just used those to organize my larger modules.
                    • inChargeOfIT 1 year ago
                      I totally agree. And some long/complex files are simply unavoidable (shell/build scripts, config files, etc).
                  • polyrand 1 year ago
                    I kind of agree with the post. For me, this technique almost starts overlapping with literate programming[0].

                    One trick I use[1] is adding certain characters before the section name, e.g.:

                      ¡¡ settings
                    
                    This makes searching and jumping between sections a lot easier.

                    [0]: https://en.wikipedia.org/wiki/Literate_programming

                    [1]: https://ricardoanderegg.com/posts/write-apps-in-single-file/
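One way to get the "jump between sections" benefit without editor support is a tiny script that lists every marked line. This sketch assumes the `¡¡` marker from the comment above appears somewhere on the line (typically inside a comment); `listSections` and `Section` are hypothetical names:

```typescript
interface Section {
  line: number; // 1-based line number, as editors display it
  name: string; // text after the marker, trimmed
}

// Collect every line containing the marker; the rest of that line is the section name.
function listSections(source: string, marker = "¡¡"): Section[] {
  const sections: Section[] = [];
  source.split("\n").forEach((text, i) => {
    const at = text.indexOf(marker);
    if (at !== -1) {
      sections.push({ line: i + 1, name: text.slice(at + marker.length).trim() });
    }
  });
  return sections;
}
```

Since this is plain substring search, a string literal that happens to contain the marker would also be listed; an unusual marker keeps such false hits rare.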

                    • tikhonj 1 year ago
                      The Emacs version of this is ^L (the ASCII form feed character), which lets you jump around using Emacs's built-in page navigation commands.

                      I've used this occasionally in Emacs Lisp code and it's pretty nice, but I've never done it in other languages because I suspect that non-Emacs-users won't be able to handle it :P

                      The Emacs Wiki has a whole page on this: https://www.emacswiki.org/emacs/PageBreaks
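Tools outside Emacs can honor the same convention by treating the form feed character (U+000C, written `\f` in most languages) as a page delimiter. A minimal sketch that splits a file into pages and records the line each page starts on; `splitPages` and `Page` are hypothetical names:

```typescript
// The form feed character (^L in Emacs notation, U+000C).
const FORM_FEED = "\f";

interface Page {
  startLine: number; // 1-based line on which this page's text begins
  text: string;
}

// Split a source file into "pages" at each form feed, tracking line numbers.
function splitPages(source: string): Page[] {
  const pages: Page[] = [];
  let line = 1;
  for (const text of source.split(FORM_FEED)) {
    pages.push({ startLine: line, text });
    line += text.split("\n").length - 1; // advance by the newlines inside this page
  }
  return pages;
}
```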

                    • vemv 1 year ago
                      In practice, the chances that N programmers will follow the pretty comment convention of a given file approach zero as time passes.

                      Splitting into files is an approach that is easier to lint against (e.g. max 500 lines - a best-effort mechanism to foster SRP).

                      And it's not really problematic after mastering a few key aspects: jump to definition, and more interestingly Peek Definition.

                      It's also important to not traverse files top-to-bottom (which yes, is very tempting).

                      Optimal code traversal is tree-like, you jump from definition to definition, skipping over the irrelevant, and gathering necessary info on demand.

                      The file/line location of a function becomes irrelevant.
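The "max 500 lines" rule mentioned above is easy to enforce mechanically. Here is a minimal sketch of such a check. It works on an in-memory map from path to contents so it stays independent of any particular linter. `oversizedFiles` and `countLines` are hypothetical names, and the 500 default is just the figure from the comment:

```typescript
// Count lines the way editors do: a single trailing newline does not start a new line.
function countLines(source: string): number {
  if (source === "") return 0;
  const parts = source.split("\n").length;
  return source.endsWith("\n") ? parts - 1 : parts;
}

// Report every file longer than maxLines, as "path: N lines (max M)".
function oversizedFiles(files: Map<string, string>, maxLines = 500): string[] {
  const offenders: string[] = [];
  for (const [path, source] of files) {
    const lines = countLines(source);
    if (lines > maxLines) {
      offenders.push(`${path}: ${lines} lines (max ${maxLines})`);
    }
  }
  return offenders;
}
```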

                      • solarkraft 1 year ago
                        > It's also important to not traverse files top-to-bottom (which yes, is very tempting). Optimal code traversal is tree-like, you jump from definition to definition, skipping over the irrelevant, and gathering necessary info on demand.

                        A (successful!) ex-colleague of mine read thousands of lines long files top to bottom.

                        I don't know how, but somehow this worked for them.

                        • vemv 1 year ago
                          Yeah, I mean, it can be done, and it can be helpful, but is it efficient?

                          You could just as well try memorizing the entire $framework API, which most people would agree isn't the best use of one's energy.

                      • chrismorgan 1 year ago
                        I’ve been playing around with using extra indentation for these sorts of purposes in the last couple of years, mostly at the sub-function level but also in the larger file situation described in the article. I’ve found it to work so well that it has even been significantly influencing the design of the lightweight markup language I’ve been using for my own purposes for the last year or two and further designing as I go.

                        The minor rough point is that text editors probably don’t support mixing line folding techniques, so you probably won’t be able to fold sections and functions, which it would sometimes be nice to be able to do. Maybe at some point I’ll get round to making a hybrid fold expression for Vim to mix multiple fold methods, though honestly I find Vim’s folding pretty janky (my Rust foldexpr works fine when you load code, but start editing it and the fold tree rapidly falls apart in painful ways and it’s absolutely Vim’s fault) and would probably prefer to shift the entire thing to FFI for more flexibility.

                        Some languages/IDE ecosystems support other ways of doing this sectioning, like Visual Studio’s “regions”, which gets around the folding problem since it’s all in the language rather than being a mixture of language syntax and independently-chosen convention.
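For TypeScript and JavaScript, VS Code folds regions marked with `// #region` and `// #endregion` comments. This is an editor convention rather than part of the language (unlike C#'s `#region` directive), so other editors may ignore it. A small sketch that groups the getters and setters of a hypothetical `Rect` class this way:

```typescript
class Rect {
  #width = 0;
  #height = 0;

  // #region Getters
  get width(): number {
    return this.#width;
  }
  get height(): number {
    return this.#height;
  }
  get area(): number {
    return this.#width * this.#height;
  }
  // #endregion

  // #region Setters
  set width(value: number) {
    if (value < 0) throw new RangeError("width must be non-negative");
    this.#width = value;
  }
  set height(value: number) {
    if (value < 0) throw new RangeError("height must be non-negative");
    this.#height = value;
  }
  // #endregion
}
```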

                        • emmanueloga_ 1 year ago
                          Things humans are bad at: organizing files, topological sorting, ontologies.

                          Things computers are good at: organizing files, topological sorting, ontologies.

                          The point is I wish we could just ask the computer to sort things out for us, show different views, etc. This would require a software system able to work with more granular code units, maybe something like [1].

                          I still have some PTSD from a huge refactoring where a big part of the hassle was manually creating directories, moving files around, and fixing imports, build files, and cyclic dependency errors, sigh…

                          I suspect a good solution would be a language-agnostic DSL to describe a program’s data flow, used to generate the boilerplate for the functions involved. Then you could ask the system to show you only the pieces of code strictly related to a certain feature, and hopefully never again have to deal with file structure manually.

                          1: https://www.unison-lang.org/
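To illustrate the "topological sorting" point, here is Kahn's algorithm in TypeScript. Given which modules depend on which, it produces an order in which each module comes after its dependencies, and it reports a cycle instead of looping forever. The module names in the usage below are hypothetical:

```typescript
// deps maps each module to the modules it depends on (which must come first).
function topoSort(deps: Map<string, string[]>): string[] {
  const nodes = new Set<string>();
  for (const [mod, ds] of deps) {
    nodes.add(mod);
    for (const d of ds) nodes.add(d);
  }

  // inDegree: number of dependencies not yet emitted; dependents: reverse edges.
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const n of nodes) {
    inDegree.set(n, 0);
    dependents.set(n, []);
  }
  for (const [mod, ds] of deps) {
    for (const d of ds) {
      inDegree.set(mod, inDegree.get(mod)! + 1);
      dependents.get(d)!.push(mod);
    }
  }

  // Repeatedly emit a module whose dependencies have all been emitted.
  const ready = [...nodes].filter((n) => inDegree.get(n) === 0).sort();
  const order: string[] = [];
  while (ready.length > 0) {
    const n = ready.shift()!;
    order.push(n);
    for (const m of dependents.get(n)!) {
      const left = inDegree.get(m)! - 1;
      inDegree.set(m, left);
      if (left === 0) ready.push(m);
    }
  }

  // Anything never emitted is part of (or blocked by) a cycle.
  if (order.length !== nodes.size) {
    const stuck = [...nodes].filter((n) => !order.includes(n));
    throw new Error(`dependency cycle among: ${stuck.join(", ")}`);
  }
  return order;
}
```

For example, `topoSort(new Map([["UI", ["Database"]], ["Database", ["Logging"]]]))` returns `["Logging", "Database", "UI"]`, while two modules that depend on each other make it throw.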

                          • jerf 1 year ago
                            "The point is I wish we could just ask the computer to sort things out for us, show different views, etc. this would require a software system able to work with more granular code units, maybe something like [1]."

                            The problem I've seen with this idea is that for it to work, you have to be willing to add more metadata to your program, because programs themselves don't contain the necessary data. You can make it mandatory in your language, but that just raises the friction for people trying to use that language.

                            But, if you're willing to add the metadata to a hypothetical new system for slicing and dicing code, why aren't you using the existing mechanisms in your system for organizing and documenting things?

                            If you are, then the delta between "it may not be perfect, but here's some documented and organized code" and the perfect vision is small enough that it isn't worth a lot of effort to chase.

                            If you aren't, then you aren't going to give the perfect all-singing slicer and dicer what it needs either.

                            Replace "you" with "coworker(s)" for the same dilemma, only sadder. And your generic strawman coworkers are going to be very upset when you try to get them to use this language... "it's so hard, it's always complaining about perfectly good code, I'm missing my deadlines because it's making me document things I'll never use, I'm just putting 'a' as the documentation for these things anyhow, oh look here's our boss ordering us to go back to our previous language because we're all spending too much time documenting rather than building new features".

                            Some things can't really be fixed by process.

                            • trealira 1 year ago
                              > The problem I've seen with this idea is that in order for it to work, you have to be willing to add more metadata to your program, because programs themselves don't have the necessary data in them. You can make it mandatory to your language, but than just raises the friction for people trying to use that language.

                              I'm confused by what you mean. Can you be more specific about what sort of metadata would need to be embedded in the code?

                              I've never used it, but watching this Smalltalk-80 IDE demonstration seems like it could be a workable idea.

                              https://youtu.be/JLPiMl8XUKU?si=Hly_8m4GiaDSQmpn&t=269

                              Edit: A more modern example using Pharo. You can see in the video that he's not accessing files, rather everything is sorted into packages and classes.

                              https://youtu.be/HOuZyOKa91o?si=qibMeMOkKy7ao5Hb

                              • jerf 1 year ago
                                I mean that if you are asking for something we don't currently have, it pretty much by construction has to involve more data and metadata. IDEs already do a lot of this sort of thing for all languages. Slicing and dicing by functions and such is not that hard, and I've seen any number of variations on it, but no amount of that produces a coherent view of a sizable code base. If you want a coherent view of the code base, you're going to need to provide more. One example is literate programming: a lot of people look at its end results and pine for them, but aren't actually willing to put in the irreducible effort needed to get there.
                              • solarkraft 1 year ago
                                > order for it to work, you have to be willing to add more metadata to your program, because programs themselves don't have the necessary data in them

                                I'm kind of stumped by this assertion. What data is missing?

                                If it is required or necessary, doesn't it make it just data, as in an integral part of the code?

                                • emmanueloga_ 1 year ago
                                  I agree on the human friction part... I like this idea and yet I struggle to implement it in my own projects. I think the reason is that there's no good language yet to write that metadata.

                                  Even advanced type systems mainly focus on guiding how components connect, but they don't explain _why_ a program behaves a certain way or how data is supposed to flow. Documentation is often unreliable because it goes stale easily. As developers we spend an order of magnitude more time building a graph of data flow in our heads than actually writing code. Could we reify that graph?

                                  IMO that crucial metadata would answer the questions we would ask a senior developer on our team when we need to work in the codebase:

                                  * When dealing with feature X, what are the involved software components Y and Z, how do we create instances of those?

                                  * What are the actual features the system implements? Is this the correct behavior? Is this what the system was intended to do?

                                  * If I point to a specific file and line in the project, what features does that code support? Why is that file there?

                                  Expressing the answers in a machine-readable format, in a language agnostic way, could look something like:

                                      System {
                                        Dependencies
                                          - Database depends on Logging
                                          - UI depends on Database
                                          - ...
                                        Feature A
                                          - Subtopic 1
                                          - Subtopic 2
                                        Feature B
                                          - ...
                                      }
                                  
                                      Feature A {
                                        requires database
                                        queries X
                                        if (database has data)
                                          produces Y, Z
                                        else
                                          doTheThing(X, "blah")
                                      }
                                  
                                  This is a broad idea, and the devil is in the details... This would not replace type systems or traditional languages. I'm thinking this spec-lang would have the minimum number of flowchart-like constructs, and anything even slightly more complicated would rely on an interface. The system would implement all the boilerplate and lay out the project entirely. It would be amazing if we could also express state machines or even statecharts.

                                  Teams could use off-the-shelf generators, or create a custom one, to produce the interfaces and boilerplate implementation. Since the system has a lot of information, we could query it to answer questions like "give me all code of feature A", or even ask it to create symlinks so we can focus our attention on a single folder of files. Different generators could also create whatever structure is most appropriate for different programming languages. It could be a good tool to assist rewrites, if needed for whatever reason. Tests/specs could be generated automatically, to a certain extent.

                                  I'm sure some people here will have PTSD about 4th gen languages and BPEL/UML nightmares :-). But even though BPEL is horrible I think the core of the idea is good and worth revisiting.
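To make the "give me all code of feature A" query concrete, here is a minimal sketch that encodes a spec like the one above as plain TypeScript data and answers the query by following dependencies transitively. The `Spec` shape, `componentsForFeature`, and Feature B's components are hypothetical. Only "Feature A requires database" and the Database/Logging/UI dependencies come from the sketch above:

```typescript
// Hypothetical machine-readable spec.
interface Spec {
  dependencies: Record<string, string[]>; // component -> components it depends on
  features: Record<string, string[]>; // feature -> components it directly requires
}

// "Give me all code of feature X": the feature's own components plus
// everything they transitively depend on, sorted for stable output.
function componentsForFeature(spec: Spec, feature: string): string[] {
  const direct = spec.features[feature];
  if (direct === undefined) throw new Error(`unknown feature: ${feature}`);
  const seen = new Set<string>();
  const stack = [...direct];
  while (stack.length > 0) {
    const c = stack.pop()!;
    if (seen.has(c)) continue;
    seen.add(c);
    stack.push(...(spec.dependencies[c] ?? []));
  }
  return [...seen].sort();
}

const spec: Spec = {
  dependencies: { Database: ["Logging"], UI: ["Database"] },
  features: { "Feature A": ["Database"], "Feature B": ["UI"] },
};
```

Here Feature A resolves to Database and Logging, because Database depends on Logging. A real system would also map each component to its files, which this sketch leaves out.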

                                • crabbone 1 year ago
                                  > Things humans are bad at: organizing files

                                  Based on what do you make this statement?

                                  I don't feel like this is a problem. At least not for me :| Of course, size can make everything a problem, but for whatever projects I had to work with, organizing files was usually an entry-level problem that would normally sort itself out after a month or so.

                                  In other words, I don't feel like this is a problem worth solving.

                                  • MetaWhirledPeas 1 year ago
                                    > The point is I wish we could just ask the computer to sort things out for us, show different views, etc.

                                    I agree. I'm usually opinionated about a related subject: code formatting. But I'm not too opinionated in one case, where we're using Prettier. The project is autoformatted and expects everyone else to do the same. It's so refreshing to let the computer take over sometimes, even if the end result isn't exactly what I would have typed out myself. If a piece of software could do something similar with folders, files, and code, that would be pretty great. Maybe. It seems like a much harder problem to solve than simple formatting, without introducing a bunch of bugs.

                                    But you seem to be suggesting a different way of viewing the code, which I guess might be the way. But still underneath you'd have the spaghetti code, just sitting there not being dealt with, like a complicated Microsoft Word document.

                                    • solarkraft 1 year ago
                                      I do believe that text in files isn't the ideal way of expressing and structuring program instructions.

                                      But the friction to get something else established is so huge - all tooling would need to change ... unless perhaps one develops a way to translate between the internal format and text.

                                      • kmoser 1 year ago
                                        You've hit the nail on the head: any code written as a series of text files will lack the overall structure needed to easily navigate, refactor, and perhaps most important of all, render as different "views." For example, if the development environment automatically ingested the code and understood which modules were related to each other, it should be trivial to ask it to sort the functions by name, and/or group them by module, or anything else you can think of.

                                        When moving a function to a different module (e.g. moving bar() from foo::bar to baz::bar), the development environment should automatically change all references to foo::bar() into baz::bar(). And this is just the tip of the iceberg of what the development environment should do for you.

                                        The problem is, developers have come up with all sorts of clever hacks to do this searching and refactoring, including IDEs which do much of the heavy lifting, but at the end of the day they still fall short of what should be a simple task for the developer.

                                        As for additional metadata that would be required to make this work: objections that programmers would be reticent to provide this info are similar to objections to writing good documentation, and my response is: you reap what you sow. If you really want good, structured code, you have to write it, and you have to do so in an environment that accepts such structured data. You can't get blood from a stone.

                                        In short, for large, complex projects, our current method of writing code as text files is unwieldy. Unfortunately, developers' vision is clouded by their hard-earned ability to parse those text files, so they don't see this as a problem.
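The move-refactoring described above becomes trivial if references are stored as stable ids instead of text. Here is a minimal sketch of that idea, loosely in the spirit of Unison (mentioned elsewhere in this thread), which refers to definitions by hash. The `Definition`/`Store` shapes and the function names are hypothetical, and the ids here are arbitrary strings rather than hashes:

```typescript
// Definitions are keyed by a stable id; calls hold ids, never "module::name" text.
interface Definition {
  module: string;
  name: string;
  calls: string[]; // ids of the definitions this one calls
}

type Store = Map<string, Definition>;

function lookup(store: Store, id: string): Definition {
  const def = store.get(id);
  if (!def) throw new Error(`no definition with id ${id}`);
  return def;
}

// Qualified names are derived on demand instead of being stored in callers.
function qualifiedName(store: Store, id: string): string {
  const def = lookup(store, id);
  return `${def.module}::${def.name}`;
}

// Moving a definition rewrites exactly one record; every caller's rendered
// reference changes automatically because callers only hold the id.
function moveDefinition(store: Store, id: string, newModule: string): void {
  store.set(id, { ...lookup(store, id), module: newModule });
}

// Render a definition's outgoing calls the way a text view would show them.
function renderCalls(store: Store, id: string): string[] {
  return lookup(store, id).calls.map((callee) => `${qualifiedName(store, callee)}()`);
}
```

With a store where `main` calls `foo::bar`, moving `bar` to `baz` makes `renderCalls` show `baz::bar()` for `main` without touching `main` at all.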

                                        • norman784 1 year ago
                                          I was also about to comment that Unison is an interesting approach, but we need more tooling support before it becomes practical. Ideally there would be a protocol that new kinds of editors could be built on top of, similar to how LSP enabled editors to support different programming languages without a proprietary implementation.

                                          I'm wondering if the Unison devs already have plans to work on a protocol like that.

                                          • nightpool 1 year ago
                                            The overall advice here (add comments to your code to delineate different "sections" of a file) is ultimately a bad hack to deal with the fact that your codebase is not well structured into functions/modules/classes with a single responsibility and the right level of composition. Refactor your codebase to put the related code together, and this "problem" will disappear and your code will be much easier to change in the future.
                                            • tikhonj 1 year ago
                                              The problem is that real-world domains—or even just relatively complex software domains—do not decompose perfectly into 100% self-contained units. It's not even a limitation of any given programming language (although those limitations matter too!); we can't even have a purely conceptual ontology where everything neatly fits into exactly one slot in a neat hierarchy. That's just not how, well, anything works.

                                              It's definitely worth trying to have your code neatly organized in a way that maps to some clean conceptual model, but doing that well is going to require using every tool at your disposal—which includes ways of grouping code that don't need the sort of crisp definition and conceptual coherence of an in-language abstraction.

                                              Sections and sub-sections seem like a reasonable way to do that with our modern, painfully limited tools. If we weren't limited by needing everything to live in plain text, I'd also reach for something like a system of tags.
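A tag system like the one described needs little more than an inverted index: each code unit carries any number of tags, and every tag becomes a view. Here is a minimal sketch with hypothetical unit and tag names:

```typescript
// Each code unit can carry any number of tags; there is no single hierarchy.
interface CodeUnit {
  name: string;
  tags: string[];
}

// Invert unit -> tags into tag -> units, so any tag can serve as a "view".
function indexByTag(units: CodeUnit[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const unit of units) {
    for (const tag of unit.tags) {
      const names = index.get(tag) ?? [];
      names.push(unit.name);
      index.set(tag, names);
    }
  }
  return index;
}
```

With units `parseInvoice` (tags `billing`, `parsing`), `parseUser` (`accounts`, `parsing`) and `chargeCard` (`billing`), `parseInvoice` appears under both `billing` and `parsing`. A single directory tree can't express that without choosing one home for it.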

                                              • jbenoit 1 year ago
                                                1. The author links to this file as an example: https://github.com/Semantic-Org/Semantic-UI/blob/49b9cbf47c1... . How would you structure it better than it currently is without using sections?

                                                2. So you have a class that has a bunch of getters and setters. Let's just assume that "generate them automatically" is not an option. You want to make it really easy to see the part of the class which is getters, and the part of the class which is setters, and then skim past that. How do you do it?

                                                3. So you have a file that defines 3 data structures. Each data structure has a definition, a bunch of functions for parsing it, and a bunch of functions for serializing it. The author suggests that you split the file into 3 sections for the types, with subsections each for the definition, parsing, and serializing. How would you do it? Let's say the language is Rust or Typescript.
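For reference, here is how the single-file layout from question 3 might look in TypeScript for one of the three types. The `Point` type and its "x,y" text format are hypothetical:

```typescript
// ===== Point =====

// ----- Point: definition -----
interface Point {
  x: number;
  y: number;
}

// ----- Point: parsing -----
// Parses "x,y", e.g. "3,4"; throws on malformed input.
function parsePoint(text: string): Point {
  const parts = text.split(",");
  if (parts.length !== 2) {
    throw new Error(`expected "x,y", got ${JSON.stringify(text)}`);
  }
  // Number("") is 0, so treat blank parts as invalid explicitly.
  const [x, y] = parts.map((part) => (part.trim() === "" ? NaN : Number(part)));
  if (Number.isNaN(x) || Number.isNaN(y)) {
    throw new Error(`non-numeric point: ${JSON.stringify(text)}`);
  }
  return { x, y };
}

// ----- Point: serializing -----
function serializePoint(p: Point): string {
  return `${p.x},${p.y}`;
}

// ===== (next data structure, with the same three subsections) =====
```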

                                                • jaggederest 1 year ago
                                                  > 1. The author links to this file as an example: https://github.com/Semantic-Org/Semantic-UI/blob/49b9cbf47c1... . How would you structure it better than it currently is without using sections?

                                                  Dear God it's a 3300 line file. Any way but one long 3300 line file is a significant improvement. I'm being hyperbolic, but seriously, instead of button.less it should be deduplicated (Less can be much less verbose, pun intended) and become a button directory with several subcomponents in it, like each top-level heading. Less is a serious language; you should use its semantic features to organize your code instead of just comments.

                                                  > So you have a class that has a bunch of getters and setters. Let's just assume that "generate them automatically" is not an option. You want to make it really easy to see the part of the class which is getters, and the part of the class which is setters, and then skim past that. How do you do it?

                                                  You put it into a getters file and import it into the parent class. Almost every language has features for this. Or you put each attribute into its own file, if any of the getters and setters has smarter logic than just x = parameters[x]. Ideally you build classes that don't have so many attributes that it's difficult to scroll past the getters/setters in the first place - N > 8 is a significant warning sign the code needs to be split unless it's a configuration class or equivalent.

                                                  > So you have a file that defines 3 data structures. Each data structure has a definition, a bunch of functions for parsing it, and a bunch of functions for serializing it. The author suggests that you split the file into 3 sections for the types, with subsections each for the definition, parsing, and serializing. How would you do it? Let's say the language is Rust or Typescript.

                                                  It should definitely be in 3 files, possibly 3 folders. In TypeScript: /thing/index.ts, /thing/parsing.ts, /thing/serializing/json.ts. It's so marvelously easy to import things that you should be using modules amply to split up your code. Obviously 10 lines of imports for 3 lines of code is too much, but seriously, imports are easy and cheap.
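In TypeScript, one reading of "put the getters in their own file and import them into the class" is the mixin pattern described in the TypeScript handbook. Here is a sketch with hypothetical names, shown as one block; `withCoordinateGetters` would normally live in its own module (say, a getters.ts) and be imported by the file that defines the class:

```typescript
// Any constructor producing instances of T (the handbook's mixin helper type).
type Constructor<T = {}> = new (...args: any[]) => T;

// Mixin: adds read-only getters to any class whose instances have _x and _y.
function withCoordinateGetters<TBase extends Constructor<{ _x: number; _y: number }>>(Base: TBase) {
  return class extends Base {
    get x(): number {
      return this._x;
    }
    get y(): number {
      return this._y;
    }
  };
}

// Plain data class holding the state the getters read.
class PointData {
  _x: number;
  _y: number;
  constructor(x: number, y: number) {
    this._x = x;
    this._y = y;
  }
}

// The final class is composed from the data class plus the getters mixin.
class Point extends withCoordinateGetters(PointData) {}
```

The trade-off is an extra layer of class composition. Whether that beats a region comment or a longer file is the judgment call this thread is arguing about.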

                                                  • norir 1 year ago
                                                    > Dear God it's a 3300 line file. Any way but one long 3300 line file is a significant improvement.

                                                    I happily hack every day on a project that is self contained in a single 12K line (and growing) file. For me, splitting into multiple files would have negative utility. Everything is essentially in one place. I can find anything I need for my project with '/' search very quickly in vim.

                                                    My style is obviously not for everyone but it works great for me. I programmed for decades with traditional file splits and only in the last few years have I switched to single file. I have little interest in going back. For me, it is liberating to stop thinking about directory layout entirely. It also helps me to use simpler tools (I only use vim with no plugins) in part because I don't need help managing multiple files.

                                                    • jbenoit 1 year ago
                                                      > You put it into a getters file and import it into the parent class. Almost every language has features for this

                                                      Wait, what?

                                                      So you can do this in C/C++ for sure. #include is not just for imports. But Java? C#? How!

                                                    • chrismorgan 1 year ago
                                                      At least some of the sectioning in that file would probably be better handled by leaning into the language more, doing things like nesting selectors.
                                                    • cellularmitosis 1 year ago
                                                      Ah, yes, my favorite sort of HN reply. The author puts in a lot of effort to precisely articulate an issue which is difficult to articulate, and a commenter just dismisses the entire thing with a vague "You're holding it wrong".
                                                      • nightpool 1 year ago
                                                        The author didn't put any effort into actually trying to articulate a concrete example of a system that was already well-architected but also needed section headers. All of the examples they brought up would be better served by more sensible refactors that didn't go to one extreme (4-line files) or the other (2000-line files). I'm sure there are rare files where section header comments really are helpful, but personally I've never found them a useful way to organize my code, even in >500k LoC projects.
                                                        • slingnow 1 year ago
                                                          Sometimes, you really are just holding it wrong.

                                                          Should the OP have written a manuscript to make you feel better? In your mind, should a long, drawn out article only be refuted by something of similar length?

                                                          • Bjartr 1 year ago
                                                            > Should the OP have written a manuscript to make you feel better?

                                                            Nope

                                                            > only be refuted by something of similar length?

                                                            No, but if you don't address the points that have been made, you're not refuting. The comment says "refactor your codebase to put the related code together", but the article already addresses some downsides to this approach.

                                                            Does the commenter believe that those downsides are more avoidable than the article states? Or maybe they believe the downsides are dwarfed by the upsides? Or maybe something else. We don't know, so we can't evaluate the position effectively.

                                                            • TeMPOraL 1 year ago
                                                              > Sometimes, you really are just holding it wrong.

                                                                Yup. We all are. This is only a problem because editing raw plaintext code as the single source of truth is a bad idea, and we're reaching its limits. "Split or don't split" is one of many holy wars that can't be settled, because they happen on the Pareto frontier. The only way forward is to accept that different coding tasks need different representations, that the computer should synthesize them for us, and that we generally should not touch the underlying single source of truth any more than we manually poke at the assembly files generated by our compilers.

                                                              • cjaybo 1 year ago
                                                                  They should at least address the actual points raised by the article (which already touches on the trade-offs involved in doing what the commenter suggests, none of which their comment addressed). As it stands, it sounds like they're responding to nothing more than the title of the post.

                                                                It’s a shallow criticism by HN standards.

                                                                • ziddoap 1 year ago
                                                                  >Should the OP have written a manuscript to make you feel better? In your mind, should a long, drawn out article only be refuted by something of similar length?

                                                                    It's not about "feeling better". It's that the comment is very dismissive and boils down to "just refactor". Advice like "just refactor" (or "just do x") is lazy and ignores all context.

                                                                  • rustyminnow 1 year ago
                                                                      That would make me feel better. Without actionable advice, OP is just grumbling. It's not helpful.

                                                                    > Refactor your codebase to put the related code together, and then this "problem" will disappear

                                                                    But how do I do that exactly? What does this even mean? I could literally put the whole codebase into one file since it is all related somehow or another.

                                                                • leghifla 1 year ago
                                                                  "Refactor your codebase to put the related code together"

                                                                  Of course it is a goal, but not always possible. You sometimes have two (or more) conflicting "relations" in the codebase. E.g.: you are dealing with taxes in various countries. Do you group by country or by kind of taxes (on sales, profits, earnings, energy, real estate...) ?

                                                                  The right answer really depends on how your team is organized and how you are making changes.

                                                                  • froggit 1 year ago
                                                                    > E.g.: you are dealing with taxes in various countries. Do you group by country or by kind of taxes (on sales, profits, earnings, energy, real estate...) ?

                                                                    Seems like you would be dealing with one of 2 reasonably well defined problems in this case. How to group would follow logically. It's either:

                                                                      1) "If I will be dealing with one specific type of tax at a time and how it is applied in different countries (e.g. What are the sales taxes in the UK, France, and Germany?)": group by tax.

                                                                      OR

                                                                      2) "If I will be looking at one country at a time and its assorted taxes (e.g. What are the UK's taxes on income, sales, and value added?)": group by country.

                                                                      Otherwise, "I have failed to properly define the problem" is gonna be the problem.

                                                                    • Bjartr 1 year ago
                                                                      How about when supporting both use cases? A multinational company might need the former for people making decisions across the whole company, and a country specific department doesn't need anything but what's relevant to that country.

                                                                        You could make two tools, but now you're duplicating implementation. Factor out a shared library/service/modularization-approach-du-jour? Good idea, but we're back to the question of how to structure things.

                                                                      You don't always get to enforce that "OR"
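One way to avoid committing to a single "OR" is to keep the rules as flat data and derive whichever grouping a given view needs. A minimal TypeScript sketch of that idea; the `TaxRule` shape, the country codes `AA`/`BB` and the rates are made-up placeholders, not real tax data, and this does not settle how the surrounding code should be laid out on disk.

```typescript
type TaxKind = "sales" | "income" | "property";

interface TaxRule {
  country: string; // hypothetical country codes
  kind: TaxKind;
  rate: number; // placeholder numbers, not real tax rates
}

// One flat list; neither grouping is baked into the file layout.
const rules: TaxRule[] = [
  { country: "AA", kind: "sales", rate: 0.1 },
  { country: "AA", kind: "income", rate: 0.25 },
  { country: "BB", kind: "sales", rate: 0.15 },
  { country: "BB", kind: "property", rate: 0.01 },
];

// Generic helper: map each key to the items that share it, preserving order.
function groupBy<T, K>(items: readonly T[], key: (item: T) => K): Map<K, T[]> {
  const groups = new Map<K, T[]>();
  for (const item of items) {
    const k = key(item);
    const bucket = groups.get(k);
    if (bucket) bucket.push(item);
    else groups.set(k, [item]);
  }
  return groups;
}

// Company-wide view: one kind of tax across all countries.
const byKind = groupBy(rules, (r) => r.kind);
// Country-department view: every tax for one country.
const byCountry = groupBy(rules, (r) => r.country);

console.log(byKind.get("sales")?.map((r) => r.country)); // [ 'AA', 'BB' ]
console.log(byCountry.get("BB")?.map((r) => r.kind)); // [ 'sales', 'property' ]
```

The catch, as the comment above notes, is that real tax logic is rarely just a rate, so the per-rule behaviour still has to live somewhere.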

                                                                  • inChargeOfIT 1 year ago
                                                                    This is not always possible and a little short-sighted, especially when you are on a large, complex, established project with many contributors, where "just refactor 1.5 million lines of code, spanning 1500 files that have evolved over a 10 year period" is just something not open for debate.

                                                                    In those cases, and in the cases where a single large file just makes more sense or is unavoidable (IaC/build scripts, shell/sql/migration scripts, config files, etc.), the author's methods are absolutely valuable.

                                                                    • nightpool 1 year ago
                                                                      Try working on open source Minecraft projects. You will make these types of refactors monthly. The Bukkit/Spigot API has a 13 year pedigree, 10s of thousands of developers building against it, easily encompasses 1M+ lines of code across the whole codebase and needs to remain nimble in the face of unpredictable upstream changes. Watching the work some of these developers are able to pull off will definitely give you a healthy appreciation of what sorts of refactors are possible when you commit to them.

                                                                      > or is unavoidable (IaC/build scripts, shell/sql/migration scripts, config files, etc.

                                                                        This is a failure of the tooling in question. We shouldn't accept shoddily built tooling as a justification for unreadable codebases. Obviously there are tradeoffs, and sometimes you just need to get the tool out the door, but when a particular tool (like a professional IaC project) becomes some people's primary programming language, treating it as a second-class citizen that is "unavoidably" going to be awful to work with just limits the growth of the tool as a whole. Terraform modules are a great example of this: all that ceremony and hassle just for what is effectively a single function call! If Terraform had single-file modules and a better import system than "relative file paths", I'd expect we'd see a lot more Terraform codebases with smaller root module files.

                                                                  • kraftman 1 year ago
                                                                      This is kind of related, but I've always wondered what a 2D or 3D IDE would look like if we could drop the limitations of files and of working top to bottom, and instead group code more spatially (and I guess still have it output to files for storage). Does anything like this exist that isn't a GUI drag-and-drop editor?
                                                                    • bbkane 1 year ago
                                                                      https://www.unison-lang.org/ stores code in SQLite (last I checked) and looks super interesting.

                                                                      In a related idea, I wonder what organizing code imports by one or more "tags" instead of import paths would be like. Probably too confusing after a while (tag systems tend to get that way), but fun to daydream about

                                                                      • swizzler 1 year ago
                                                                          There is Code Bubbles. Early demo: https://www.youtube.com/watch?v=PsPX0nElJ0k Current site (an Eclipse plugin for Java projects?): https://cs.brown.edu/~spr/codebubbles/index.html. It may also have inspired Visual Studio's debugger exploration features. I haven't used it (but have wanted to).
                                                                        • kraftman 1 year ago
                                                                          This is exactly what I meant, thanks!
                                                                        • SnooSux 1 year ago
                                                                          I like tools like vim and tmux for being able to quickly and easily split windows and show multiple terminals. If I could extend things out further than just my monitor that would be really cool. Or stack windows on top of each other so I could scroll through windows in a specific spot.
                                                                          • marwis 1 year ago
                                                                            JetBrains got you covered: https://www.jetbrains.com/mps/
                                                                            • mattxxx 1 year ago
                                                                              Hm this is a good idea. Or similarly, be able to make a "meta file" that maybe contains every function that was used in a chain... Cool thought
                                                                            • earthboundkid 1 year ago
                                                                              Are y’all reading files top to bottom? This is weird to me. I pretty much just follow execution using jump to definition, and I rarely read something top to bottom in a way that having sections would be helpful.
                                                                              • zogrodea 1 year ago
                                                                                  The ML (metalanguage) family, which includes Standard ML, F# and OCaml, intentionally enforces a one-way flow where a function at the top cannot call anything below it (although there are ways to get around it).

                                                                                  I like this approach to organisation because it helps avoid dependency cycles. https://fsharpforfunandprofit.com/posts/cyclic-dependencies/

                                                                                  I once had to try to understand the entirety of a subsection by hopping between files that mutually depended on each other, which was quite confusing. The UML diagrams I drew didn't help my understanding, but drawing a dependency tree (functions that call nothing else are leaves, and functions that call other functions are internal nodes) did help, and I did that before I knew of this technique.

                                                                                This does have its disadvantages though so I think it's a good default in some cases but one should be given the choice to break it when needed.

                                                                                • earthboundkid 1 year ago
                                                                                  That makes the file backwards because main calls foo which calls bar, so it should be main, foo, bar; not bar, foo, main when you read it.
                                                                                  • zogrodea 1 year ago
                                                                                    That order makes sense as well and it has the advantage that you will often see how the leaf functions fit into the overall picture if you see how other functions depend on them. I just prefer some linear order over the alternatives.
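The dependency tree described above amounts to a depth-first post-order walk of the call graph: emitting callees before callers gives the ML-style order (bar, foo, main), and reversing it gives the top-down reading order (main, foo, bar). A hypothetical TypeScript sketch, with a hand-written call graph standing in for whatever a real tool would extract from source; `leavesFirst` and `CallGraph` are illustrative names.

```typescript
// Call graph: function name -> names of the functions it calls.
type CallGraph = Record<string, string[]>;

// Depth-first post-order: every callee is emitted before its callers.
// Throws if it finds a dependency cycle, which ML-style ordering forbids.
function leavesFirst(graph: CallGraph, roots: string[]): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  function visit(fn: string, path: string[]): void {
    const s = state.get(fn);
    if (s === "done") return;
    if (s === "visiting") {
      throw new Error(`cycle: ${[...path, fn].join(" -> ")}`);
    }
    state.set(fn, "visiting");
    for (const callee of graph[fn] ?? []) visit(callee, [...path, fn]);
    state.set(fn, "done");
    order.push(fn);
  }

  for (const r of roots) visit(r, []);
  return order;
}

const graph: CallGraph = { main: ["foo"], foo: ["bar"], bar: [] };
const bottomUp = leavesFirst(graph, ["main"]);
console.log(bottomUp.join(", ")); // bar, foo, main
console.log([...bottomUp].reverse().join(", ")); // main, foo, bar
```

Either order is linear, which seems to be the real point of agreement: the file reads in one direction without jumping back and forth.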
                                                                                • okwubodu 1 year ago
                                                                                  Within individual files I tend to create a mental index of names and purposes from top to bottom before tracing, which usually helps avoid redundant jumping or branching so deep it becomes difficult to keep track.
                                                                                  • dclowd9901 1 year ago
                                                                                    So thankful for Typescript/VSCode _chiefly_ for this reason. Doing this in JS before was hard, if not impossible. Now it's so much easier to follow control flow.

                                                                                    That said, staying on topic with the article, I think the times where I have a difficult time following along are when it's just a monolithic function or class, and especially if it has implicit behaviors via decorators or mixins.

                                                                                    • jbenoit 1 year ago
                                                                                      So do you never look at a program and try to figure out everything that it does? You always just read one function at a time?
                                                                                      • earthboundkid 1 year ago
                                                                                        If I wanted a big picture, I would look at the docs to get the overview of the public interfaces. Once you're in the file, you're tracing execution for the most part.
                                                                                      • MuffinFlavored 1 year ago
                                                                                          Maybe it’s a C thing, where (without forward declarations) you have to put main at the bottom and anything it needs above it (same for functions that reference other functions/structs that aren’t main) to make the compiler happy?
                                                                                        • darth_avocado 1 year ago
                                                                                          Haha, me too! I was concerned for a second. Though, I do organize my files bottom to top with the innermost part of the execution at the top.
                                                                                          • gipp 1 year ago
                                                                                            I honestly wish more people would try to organize their files to be read top-to-bottom, it makes it much faster to grok something new.

                                                                                            But of course there are lots of other factors constraining the way a file can/should be ordered, depending on language

                                                                                            • cellularmitosis 1 year ago
                                                                                              Yes, the "inverted pyramid" concept from journalism. Most important details up top, implementation details down below.

                                                                                              In our (Swift) codebase, this typically translates into "publicly-consumable interface up top, private and internal functions down below".
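A TypeScript analogue of that convention (the comment is about Swift, so this is a translation, not their code): the public method reads like the lede, and the private helpers it relies on sit below it. Method order inside a class has no effect on what can call what, so the layout is purely for the reader. `InvoiceFormatter` and its methods are hypothetical.

```typescript
class InvoiceFormatter {
  // --- Public interface: what a caller needs to know ---
  format(lines: { description: string; cents: number }[]): string {
    const body = lines.map((l) => this.formatLine(l.description, l.cents));
    return [...body, this.formatLine("Total", this.total(lines))].join("\n");
  }

  // --- Private implementation details, below the fold ---
  private total(lines: { cents: number }[]): number {
    return lines.reduce((sum, l) => sum + l.cents, 0);
  }

  private formatLine(label: string, cents: number): string {
    return `${label}: ${this.money(cents)}`;
  }

  private money(cents: number): string {
    return `$${(cents / 100).toFixed(2)}`; // amounts kept in cents
  }
}

console.log(
  new InvoiceFormatter().format([
    { description: "Widget", cents: 1250 },
    { description: "Gadget", cents: 99 },
  ]),
);
```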

                                                                                            • berlinquin 1 year ago
                                                                                              Have been using a version of this recently: with C++ in Visual Studio you can add `#pragma region X` that will let you expand/collapse regions of code in the same file. Can be useful for top-level organization.
                                                                                              • smusamashah 1 year ago
                                                                                                C# natively supports `#region <name>` and `#endregion`. IntelliJ supports the same using comments like `//#region <name>`.
                                                                                                • cglong 1 year ago
                                                                                                  TypeScript IDEs also typically support this via `// #region <name>` and `// #endregion`.
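For illustration, a small hypothetical TypeScript class using those markers. To the compiler they are ordinary comments; editors that honour them (VS Code does for TypeScript) let you fold the getters/setters section away and skim past it. `Settings` and its members are made up for the example.

```typescript
class Settings {
  // #region Fields
  private host = "localhost";
  private port = 8080;
  // #endregion

  // #region Getters and setters
  getHost(): string {
    return this.host;
  }
  setHost(value: string): void {
    this.host = value;
  }
  getPort(): number {
    return this.port;
  }
  setPort(value: number): void {
    if (!Number.isInteger(value) || value < 1 || value > 65535) {
      throw new RangeError(`invalid port: ${value}`);
    }
    this.port = value;
  }
  // #endregion

  // #region Behaviour
  url(): string {
    return `http://${this.host}:${this.port}`;
  }
  // #endregion
}

console.log(new Settings().url()); // http://localhost:8080
```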
                                                                                              • jeffbee 1 year ago
                                                                                                  Some of this may be true for TypeScript programmers; I have no idea, since I've never used it. However, sensible division of code into small files can have enormous benefits in C++. Throwing everything into giant files makes it annoying to write and run unit tests, or to reuse part of the code in a new translation unit. There is a noticeable difference between working in a codebase where the tests link and run in milliseconds and one where they take a long time to compile and link.
                                                                                                • dvt 1 year ago
                                                                                                    This is a much larger discussion, and it's not as easy as adding headings, subheadings, etc. to your comments. There has been a clear cultural shift against writing comments and properly documenting code. This is probably due to several factors, including unreasonable deadlines, bad architecture, job security, or meaningless output metrics (e.g. lines of code).

                                                                                                  This is why open-source code tends to be some of the best architected and documented code out there: it's pretty much the definition of "by committee" (in the best sense of that term) and meant to be inviting to anyone to contribute. So of course you want it to look nice.

                                                                                                    At BigCo, these factors don't really come into play (even though it would probably be better for the company if they did). There are also a lot of differences between writing software and building robots. Systems engineers build and refine requirements documents, pages upon pages of "this tiny part should be built like this, should have these constraints, etc." with traceability, discussion, and clear deliverables. Software, on the other hand, is unfortunately mostly just "patch a few things together and get the login form working."

                                                                                                  • mrkeen 1 year ago
                                                                                                    Where-clauses help break up long modules.

                                                                                                      somefunc = ...
                                                                                                        where
                                                                                                        subfunc1 = ...
                                                                                                        subfunc2 = ...
                                                                                                    
                                                                                                      otherfunc = ...
                                                                                                        where
                                                                                                        subfunc1 = ...
                                                                                                          where
                                                                                                          subfuncA = ...
                                                                                                        subfunc2 = ...
                                                                                                    
                                                                                                    Then the reader can instantly tell top-level concerns from implementation details, and the amount of jumping-around is reduced since those subfunctions are limited in scope.
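TypeScript has no `where`, but nested function declarations get close: they are hoisted, so the helpers can sit below the top-level expression, and they are invisible outside the function that owns them. A hypothetical sketch mirroring the shape of the snippet above; the arithmetic inside each helper is invented purely so the code runs.

```typescript
function somefunc(xs: number[]): number {
  // Top-level concern first...
  return subfunc1(xs) + subfunc2(xs);

  // ...implementation details below, scoped to somefunc only.
  function subfunc1(values: number[]): number {
    return values.reduce((a, b) => a + b, 0);
  }
  function subfunc2(values: number[]): number {
    return values.length;
  }
}

function otherfunc(xs: number[]): number {
  return subfunc1(xs) * 2;

  // This subfunc1 is unrelated to the one inside somefunc.
  function subfunc1(values: number[]): number {
    return values.map(subfuncA).reduce((a, b) => a + b, 0);

    // One level deeper, like subfuncA under subfunc1 above.
    function subfuncA(v: number): number {
      return Math.abs(v);
    }
  }
}

console.log(somefunc([1, 2, 3]), otherfunc([-2, 3])); // 9 10
```

Unlike a Haskell `where`, these helpers cannot see bindings introduced after the `return` (there are none to see), but they do close over the outer function's parameters if you let them.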
                                                                                                    • hoten 1 year ago
                                                                                                        Here's a good rule to use, though it only applies in the extreme.

                                                                                                      When your cpp file is so large that you begin losing debug info in upstream tools like Sentry, you should probably split it.

                                                                                                      Related, when your functions are so large that they fail to compile under higher debug modes for emscripten (too many locals!), you should probably split it.

                                                                                                      (send help)

                                                                                                      • wruza 1 year ago
                                                                                                        Was doing that for ages. I have snippets com and com2 for first and second level full-width headers. Third level is just a comment surrounded by empty or “//“ lines.

                                                                                                        “;” cycles through headers.

                                                                                                          Never understood that “more import lines than lines of code” style either.

                                                                                                        • eternityforest 1 year ago
                                                                                                            Why do we not have rich-text code yet? Headings could be large. We could embed diagrams with Graphviz or Mermaid. We could hyperlink, and have tables of contents or even embedded runnable forms like a Jupyter notebook.

                                                                                                          Comment anchors and similar are close though.

                                                                                                          https://marketplace.visualstudio.com/items?itemName=ExodiusS...

                                                                                                          • marcrosoft 1 year ago
                                                                                                              In typed languages like Go, I don’t need to look at module imports linked to files to navigate. With the vim shortcut ‘gd’ I can immediately jump to the definition, which makes this entire discussion about where to put things almost irrelevant. I would still try to organize for reading.
                                                                                                            • ilrwbwrkhv 1 year ago
                                                                                                              The rails world is filled with stuff like this:

                                                                                                                class Something
                                                                                                                  def execute
                                                                                                                    first_method
                                                                                                                  end

                                                                                                                  private

                                                                                                                  def first_method
                                                                                                                    second_method + 1
                                                                                                                  end

                                                                                                                  def second_method
                                                                                                                    2
                                                                                                                  end
                                                                                                                end

                                                                                                              This should have been just one method instead of thousands of tiny one line methods. I blame Uncle Bob and that whole group for brainwashing the Ruby community.

                                                                                                              • amadeuspagel 1 year ago
                                                                                                                Maybe plaintext is not the perfect way to organize code.
                                                                                                                • tadfisher 1 year ago
                                                                                                                  One of the best improvements in Kotlin (over Java) is the ability to have multiple top-level classes, functions, and static variables in one file. I hate jumping between 10 files when reviewing code.

                                                                                                                  You technically can do this in plain Java, but it comes with many caveats and probably won't work with a default build setup. And no, nested classes are not the same thing.

                                                                                                                  • beeboobaa 1 year ago
                                                                                                                    > And no, nested classes are not the same thing.

                                                                                                                    They actually are if they're public static. Quite literally so: nested static classes compile to exactly the same thing as "normal" classes, just named `OuterName$InnerName.class`.

                                                                                                                    • dtech 1 year ago
                                                                                                                      They are not the same at all. One practical reason is that in Kotlin (and other languages that support this) you can always move the class to a dedicated file if it or the file becomes too unwieldy. With a nested class, all the imports would need to change, which is a no-go unless you can refactor all code using it in one go.
                                                                                                                      • beeboobaa 1 year ago
                                                                                                                        This only matters if you are building a library intended to be consumed by other projects, you intend to release before deciding on a public API, and for some reason you put multiple classes in a single file. That's a pretty specific case of bad practice. Otherwise, just use your IDE's refactoring capabilities.
                                                                                                                      • occz 1 year ago
                                                                                                                        It might be for the compiler, but the nesting implies hierarchy for the reader, which may or may not be desirable.
                                                                                                                        • beeboobaa 1 year ago
                                                                                                                          Same goes for putting them in the same file.