Advantages of Monorepos (2015)
183 points by Naac 3 years ago | 140 comments
- lisper 3 years ago It's very simple: with a monorepo you always have access to everything you need, together with a ton of stuff you don't. Whether or not this is advantageous boils down to whether the cost of not having access to something you need is greater than the cost of having access to a bunch of stuff you don't. As long as your system is reasonably efficient at letting you select small subsets of everything you could potentially have access to, the cost of having access to a bunch of stuff you don't need is essentially zero. Perforce is good at that. Git isn't. So people who use Perforce tend to think that monorepos are good and people who use git don't. And they're both right.
- lhorie 3 years agoI don't think version control system is the major differentiator for whether people like monorepos or not, tbh. Having a good incremental build/test system is far more important to developer experience, IMHO.
The biggest dissonance when it comes to the purported benefits of monorepos is that a "good" monorepo generally assumes very good interface design skills across all teams, but in reality, the path of least resistance is tacking on more and more unique codepaths (e.g. forking/"rewriting" existing things), so in effect, likability often comes down to how well a team is able to isolate itself from global changes (by choosing stable/boring APIs, inventing their own abstractions in their own little corner, or what have you).
- dylan-m 3 years ago> Having a good incremental build/test system is far more important to developer experience, IMHO.
This is very true. Used poorly, monorepos are a crutch which allow a team to pretend that stable interfaces, versioning, and boundaries don't matter. Sure, your team can (theoretically) build the universe from a single git clone. Now what happens when another team needs to deal with your mess? What happens when you add some external dependency and now you have to deal with all of those problems anyway?
[You also shouldn't use git submodules to solve this, because that's basically the same thing but with the added annoyance of git. You should publish your bloody packages. With version numbers. And changelogs. Real version numbers. Real changelogs. Written by humans].
The author mentions the complexity barrier in open source, and I think that's a really interesting observation, but at the same time I think that complexity is the reason free software is alive today. It is definitely overwhelming for newcomers when a project requires a whole bunch of specific pieces that are all from different places. But once you've gotten past that, collaboration between a diverse range of people and organizations becomes an obvious and practical thing instead of a major undertaking. People don't all go off and write their own things from scratch[1], or clone code from place to place because it's too annoying to reuse it properly. Something internal feels similar to something external, which reinforces collective ownership.
Consider that Chromium includes its own everything, takes half a day to build from source, and is decidedly not a community-run project. Debian, meanwhile, is the polar opposite of a monorepo and continues to be alive and well without the oppressive shadow of a single 600 ton schizophrenic gorilla.
I think a lot of the time a team just wants a monorepo because they want a one-stop shop for fetching and building all of the things because internal dependencies are difficult. If that is the case, I think it's always worth considering something like BuildStream. It lets you specify where things are and how to build them, and it provides some useful tools on top of that. It doesn't solve brute-forcing a change across multitudes of applications, but it lowers the barrier to entry, it forces developers to care about deployment once in a while, and it can certainly help you to spot the integration issues when you change an interface without telling anyone.
[1] People will laugh at me for saying that from an operating system known for having more window managers than there are text editors, but really, have you seen some proprietary software projects?
- throwaway894345 3 years agoHonestly I would 100% do a monorepo every single time if there was good tooling for incrementally building and testing libraries. Having to rebuild every image from scratch for every single change scales miserably. Things like Bazel exist, but you basically have to have a team dedicated to operating it (maybe the difficulty varies by language, but it was a major pain when I tried to use it to build some relatively simple Python projects a few years ago).
- sayrer 3 years agoThis isn't really true anymore, in my experience. I've used Bazel with teams of 30-50 and no full-time maintainer, let alone a team.
Once the migration is done, all you need is a few people that do some Bazel gardening every few weeks, and it's certainly not a full time job. This can be someone that does operations (CI, deployments, etc) or a product/infrastructure engineer, or one of each. Github / Gitlab scale to all but the largest projects, and even then, you can just split into two or three "monorepos" and kick the can down the road. With things like BuildBuddy, it's even easier.
As the article states, there are a lot of little hidden costs and paper cuts when using a many-repo layout. The one I've seen that's most prevalent is that it obscures copy/paste behavior, since it's much more difficult to detect in a many-repo setup.
Going to Bazel or equivalent is a bit of a mind adjustment, and some languages are better supported than others, but it really starts to pay off in larger projects. Especially if there's more than a few languages in use.
- klodolph 3 years ago I have personally converted build systems to Bazel, and use it for personal projects as well.
Bazel 1.0 was released in October 2019. If you were using it "a few years ago", I'm guessing you were using a pre-1.0 version. There's not some cutoff where Bazel magically got easy to use, and I still wouldn't describe it as "easy", but the problem it solves is hard to solve well, and the community support for Bazel has gotten a lot better over the past years.
https://github.com/bazelbuild/rules_python
The difficulty and complexity of using Bazel is highly variable. I've seen some projects where using Bazel is just super simple and easy, and some projects where using Bazel required a massive effort (custom toolchains and the like).
- lhorie 3 years agoMy understanding is that the Python ecosystem is notoriously difficult to integrate w/ Bazel. Javascript is another ecosystem with a lot of fast and loose stuff going on during installs. Golang integration is way better. At work, we use wrappers over bazel (e.g. gazelle) mostly to handle things like auto-generation of BUILD files by parsing source code import declarations and the like. This takes most of the friction away, to the point that many folks don't actually need to understand Bazel to any significant degree.
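For readers who haven't seen it, gazelle works by scanning source files and writing the Bazel BUILD rules for you; a rough sketch of what it emits for a Go package (paths and names here are invented for illustration):

    # BUILD.bazel -- generated and kept up to date by `bazel run //:gazelle`
    load("@io_bazel_rules_go//go:def.bzl", "go_library")

    go_library(
        name = "payments",
        srcs = ["payments.go"],
        importpath = "example.com/monorepo/services/payments",
        visibility = ["//visibility:public"],
        deps = [
            # derived from the `import` block in payments.go
            "//libs/ledger",
            "@com_github_pkg_errors//:errors",
        ],
    )

Most engineers only ever touch the .go files; re-running gazelle keeps the deps list in sync, which is the friction removal being described.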
- AtlasBarfed 3 years agoIs there, say, IntelliJ support for Bazel? Do you need a central server?
I've heard bazel is a bear...
But... all mature build systems are, because they become essentially enterprise workflow engines, process execution engines, internal systems integration hubs, and schedulers. Why? Because that's what an enterprise/mature build system is, it only differs from other software packages with the same capabilities in that it concentrates on build / deploy / CI "business tasks".
My current employer uses Jenkins (which has workflows/pipelines, daemons) and then feeds into Spinnaker (which has a full DAG workflow engine and interface) and likely this is pretty close to a "standard" or "best of breed" cloud build CI system. Of course there is a dedicated team.
Oh and of course the gradle code build in github has its own pipelines and general Turing machine to do whatever you want.
- jvolkman 3 years ago> Is there, say, IntelliJ support for Bazel?
Seems like JetBrains has recently committed to some amount of 1st-party Bazel support: https://twitter.com/ebenwert/status/1506683612518887425
And there's now a component in their tracker: https://hub.jetbrains.com/projects/Bazel
- throwaway894345 3 years agoYeah, in my experience Bazel is painful, but I've prototyped my own build system before. There's no fundamental reason it has to be a pain, I think Bazel just made really strange decisions and didn't pay much thought to helping people find the happy path. When I looked into it at least, the documentation seemed to assume you've used Bazel or something like it before.
Jenkins isn't a solution because it doesn't understand the dependency graph and can't help you with things like incremental rebuilds. It's just a task runner component, which a distributed build tool would probably offer out of the box.
- dastbe 3 years ago> Is there, say, IntelliJ support for Bazel?
there is, but it heavily depends on language. It also has issues/makes choices with transitive resolves for things like java/kotlin, so you might have something that builds but the dependency is not resolved in intellij for autocomplete.
- lupire 3 years ago> incrementally building and testing libraries.
Like Make?
- throwaway894345 3 years agoMake is not very well suited to this problem for a lot of reasons. Perhaps the biggest and least controversial is that it's not hermetic. Makefiles often make assumptions about the build environment, so a build that succeeds for one contributor will fail for another. I'm sure others have done a much better analysis of make vs bazel than I could do here.
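To make the hermeticity point concrete, here's the kind of Makefile that "works on my machine" (library and target names invented): it silently picks up whatever compiler and system packages happen to be installed, so two contributors on the same commit can get different results.

    # Assumes the host's gcc, pkg-config, and libcurl -- none of which
    # are pinned or declared, so the build varies by machine.
    CC      := gcc
    CFLAGS  := -O2 $(shell pkg-config --cflags libcurl)
    LDLIBS  := $(shell pkg-config --libs libcurl)

    fetcher: fetcher.c
    	$(CC) $(CFLAGS) -o $@ $< $(LDLIBS)

Bazel-style tools pin the toolchain and declare libcurl as an explicit dependency, which is where a lot of their up-front complexity comes from.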
- wilgertvelinga 3 years agoDid you hear about nx.dev?
- anon23anon 3 years ago feel like the root problem companies run into when you don't have a monorepo is shit gets locked down - e.g. I didn't even know this repo existed b/c I couldn't see it/clone it b/c of permissions. the other thing is let's say we have microservices - now I need to call your service - and most places are terrible at documenting things - especially if it's a new service, which it probably is if I'm trying to connect w/ it for the first time - now I have to figure out how to call your service - I can do that on my own by cloning your project and reading the code and bugging you, but I'm much less likely to bug you if it's part of the mono and I just need to open the code in the existing project. I think this leads to a second point: mono does lead to more consistency and better knowledge sharing across codebases.
- cryptonector 3 years ago It's more than that. When you have to make changes that touch a lot of dependencies, it's much easier if all those dependencies:
* are in the same repo (making it easy to find and change all of them)
* are in the same universe of build/test/deploy services (making integration of your changes atomic)
Atomicity of integration is essential, especially in organizations that move fast and make lots of breaking interface changes. Where it's necessary to make a breaking interface change, it will be OK IFF you can make that change atomic.
Conversely, if you want to be able to make breaking interface changes, the integration and deployment of those has to be atomic.
Not having a monorepo & monobuild means that you have to have stringent interface backwards-compatibility commitments. That's fine if you're shipping an operating system, say, but it's usually too painful if you're not shipping anything to third parties.
For me, the atomicity feature is the killer feature of monorepos.
- lisper 3 years agoBut you can never have true atomicity like that unless you pull in all of the source for all of your dependencies. That means, for most people, the Linux kernel and the standard gnu libraries and utilities. That's a lot of source code. And then you have to maintain all of those. If you're Google, you can do that. If you're a startup, probably not so much.
- cryptonector 3 years agoCorrect, thus... monorepos.
Now, for Linux, the kernel<->user-land ABI is deemed stable, so you don't have to coordinate updates with the C (and other) run-times.
Other OSes did have the kernel and the C library in the same repository, so those have had the privilege of making their kernel<->user-land ABIs private. E.g., Solaris/Illumos, OS X.
Now, obviously if you have a monorepo for your startup, you might not include the Linux kernel in it mainly because you probably don't want your devs changing the kernel unless that's integral to the startup's purpose.
- 8note 3 years ago Deploying atomic changes is much harder than writing them. Having a host be updated atomically doesn't mean everything it communicates with has gotten the same change
- cryptonector 3 years agoIf there's no external linkage, then it's easy. If there is, then it's not. But usually the surface area of external linkage is much less than that of internal linkage. So, yes, there's value in this.
- parentheses 3 years agoThis is massively oversimplified.
- the cost of having access to more than you need: cognitive load and tooling for filtering, larger repositories require more tooling work to be performant
- there's also the atomicity of change and past changes which one can see/understand
- lisper 3 years agoHow is that different from what I said?
- mikepurvis 3 years agoYears ago, Google had a gcheckout tool that would trace the dependency information for whatever project you were working on, and then selectively grab the portions of the monorepo that you were going to need for it. Maybe they still have that or it's evolved into something else; I dunno, I haven't been there in a really long time.
Anyway, it seemed like such an obvious complement to the perforce/monorepo style of working that I came away surprised that perforce wouldn't have hoisted such a thing into their product as a first-class feature. Tracing dependencies across a lot of different build systems is obviously not trivial, but it's not intractable, particularly if the tool is pluggable so that orgs can provide modules to handle their own particular approach.
- jeffbee 3 years agoWhat you are describing is an artifact of the old Perforce which copied everything in your client to local storage. After the conversion to srcfs and piper, which was more than a decade ago, this became unnecessary.
http://google-engtools.blogspot.com/2011/06/build-in-cloud-a...
- captainmuon 3 years ago One upside of smaller repos that I rarely hear about is that it forces you to think about versioning. If you have a monorepo, you often don't version individual components, you just have master that always builds. If your product is a user facing website, that is fine. But if you make releases, and have multiple components in different versions that have a stable API, and are expected to work in different combinations, then it is a real hassle. Of course you can tag individual library versions in a monorepo, but that is not the path of least resistance.
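(For what it's worth, the usual compromise is per-component tags with a path prefix; Go modules formalize exactly this for nested modules. The names below are illustrative:)

    # tag only the auth library, independent of the rest of the tree
    git tag libs/auth/v1.4.0
    git push origin libs/auth/v1.4.0

    # what changed in that library between two of its releases?
    git log libs/auth/v1.3.0..libs/auth/v1.4.0 -- libs/auth/

But as the comment says, nothing forces you to do this, which is exactly the problem.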
One place I've worked at, the ATLAS experiment at CERN, migrated to a monorepo. It was not bad, although there were the usual problems with long checkout times. But it worked because we tended to version every single piece of software together in a big "release" anyway (to make scientific results reproducible).
- bentcorner 3 years agoThis almost feels like a version of Conway's Law: you inevitably ship the org structure.
Your dev tooling also influences the shape of the thing that you write. If you have a monorepo then it encourages you to ship a monolith that freely interoperates with itself. If you have multiple repos that need to be versioned against each other, you will ship components with more stable APIs.
So this means that if you ship a product within which customers are free to update portions of them at will, then using a monorepo will make things more difficult than necessary.
And if you ship a single unversioned monolith to the world, then using multiple repos adds unnecessary friction to working within the company.
- klodolph 3 years agoGoogle, at one point, had component versioning that was not just "build everything from the latest commit". Libraries within the tree would get tagged releases, and everything else would build from the latest tag of those libraries.
This practice was abandoned, but I don't know the reasoning for why it was abandoned.
- inoffensivename 3 years agoPeople hated that they couldn't make atomic changes across components. Google's monorepo means everybody has to move in lock-step, which is bad for everybody:
* library maintainers must make sure they don't introduce any regressions to any users at all. There's no major version number that you can increment to let people know that something has changed. Development necessarily slows.
* Library users must deal with any breakage in any library they use. Breakage can happen at any time because everybody effectively builds from HEAD. There are complicated systems in place for tracking ranges of bad CL numbers
Monorepo isn't entirely to blame for this, but it certainly doesn't help. I've been at Google 15 years and I'm tired of this.
- klodolph 3 years agoQuestion: Doesn't the same thing apply to managed services?
Let's say you want to make a change to the filesystem. You can change the client libraries today, but old client libraries are going to be in production for weeks, or longer. Your filesystem service has to be backwards compatible with some weeks or months of filesystem libraries.
- azornathogron 3 years agoRegarding tracking bad CL ranges: Ecosystems (outside Google) which use versioned packages have the same requirement. If some version of a package you depend on has a bug then you might detect it yourself if you're lucky but more likely you won't detect it, so you need to use tools to centrally track known-bad versions and check whether your systems are affected. Package repositories support removing versions that are known to be bad for the same reason. Most of the attention in these areas is on security related bugs right now, but that's really just a sub-category of the overall problem.
I don't think the bad-versions tracking outside Google is any less complicated than the bad-CL-ranges tracking inside Google.
- VMtest 3 years ago Your frustration has already been addressed in danluu's article under the header "Cross-project changes"
I believe he said that he wrote this article to avoid repeating the same convo again and again....
- WorldMaker 3 years ago Right, smaller repos add more friction to dependencies, that is certain, but the flip side of that is that it enforces API boundaries and thinking about systems building as SOLID components in their own right.
That friction sometimes helps: If it is painful to update Dependency A because it usually means upstreaming changes to A's Dependency B first, for instance, that can often indicate a tight-coupling problem that in a class diagram someone might easily discover and refactor over lunch but in a systems diagram was non-obvious without that "update hell" pain. Solving such tight-coupling problems is hard, and it may mean living with the pain for some time, and while monorepos make that pain go away they never solve those coupling problems (and arguably make it far easier to strongly couple systems that you likely don't want coupled). It's a lot like turning off all the Warnings in your compiler; it makes the immediate dev experience a lot nicer, but it risks missing things that while not problems now may be problems in the future.
I think there are also some benefits to using the same dependency managers for first-party components/libraries as for third-party components. The auto-updating of first-party versions is seen as a benefit to monorepos, but if recent and current CVEs have taught us anything you need to audit and update your third-party components quite regularly. Needing to also update first-party components/libraries with the same dependency managers has some benefits in terms of forcing a regular dependency update cadence, that then also benefits additional developer eyes on third party update rhythms. (Especially as increasingly more dependency managers pick up auto-auditing/security and CVE awareness tooling that runs on each update. There's more likely developer eyeballs on those audit reports if frequently run for first-party components and third-party components.) Dependency managers are their own friction in the process, but necessary friction for third-party components, and there are benefits to first-party components needing the same friction.
As with most software development practices there is no objectively "right" answer here. Monorepos have less friction in a large org. Friction and pain are sometimes useful tools, even though few people "want" them in their developer experience. Systems design is hard and tight-coupling is often an easy solution. Looser coupling is often better, more resilient design that is easier to work with at the boundaries and the "I can trust this other team's repo to be a black box and they let me file bug reports as if they were a second-party vendor" level, which can be its own tool for avoidance of mental fatigue.
- klodolph 3 years agoMy personal experience in large orgs is that friction is a much larger problem at larger orgs than it is at smaller orgs. The friction was always much lower, day-to-day, at small orgs. (Small orgs front-load the friction somewhat... "Here, set up your development environment.")
- WorldMaker 3 years agoThat's a fair way to view it, and my experience is also that a lot of that friction at its worst tends to accrete around bureaucratic barriers and fiefdoms. That also plays out in its own ways to the compiler warnings analogy: if there's a lot of friction touching a particular code for bureaucratic reasons, often the bureaucracy doesn't go away in the monorepo case it just disappears until it painfully shows back up later in the process. For instance, multi-repo may add a lot of friction to even finding/getting access to the repo in the first place but once you have access after bureaucratic red tape, PRs may be painless. Yet there are certainly all sorts of monorepo horror stories of making an "easy" PR and then finding that PR get bogged down in a lot of politics as bureaucrats crawl out of the woodwork from PR pings (sometimes pings they themselves set and the PR creator isn't ever aware of until the PR is sent). The bureaucracy is much the same in both cases, the pain is very similar, in one case it is just front loaded and obvious. (Everyone knows "Oh, Bob owns that repo. You need to fill out these forms, take it to the castle next door, and look for the ogre to give the forms to. That's Bob. Then you can make PRs to your hearts content if Bob likes you and doesn't eat you." versus a troll jumping out from under a bridge to completing a PR that you never expected and demanding a sacrifice of some goats before you may cross the bridge.)
- idunno246 3 years ago yea, that friction is good. i wrote some code. someone liked it and added a dependency of their app on my app. i needed to update my code - all of a sudden i was responsible for updating some other random app and ensuring it kept working - behavior we considered a bug and they didn't, so both code versions needed to work. the monorepo let them make an api where one was never intended to exist
- cryptonector 3 years agoWhether that's an upside depends. Mostly I think it's a downside.
- bob1029 3 years agoWe've been doing this for a few years now. Biggest non-intentional thing that came out of it was that the entire team started speaking in terms of commit hashes.
Once a non-technical person learns that the entire state of a product/project/organization can be described by a hash, they will begin to abuse it for literally everything. And I totally endorse this. It's incredible to watch unfold. An employee passively noting the current commit hash like it's the time of day puts a bit of joy into my brain every time.
Everyone can speak this language. The semantics are ridiculously simple.
- chubot 3 years agoHm can you give an example of that? Are they wondering if the features they care about are deployed?
The linear version numbers could have an advantage in that regard. If you want to know if CL 12345 is deployed, and you know the current deployment is running as of CL 12350, then it should be in there. Conversely if it's less than that number, it's definitely not in there.
git hashes also have good properties but I'm wondering how non-technical employees use them. Do they know how to dig through the git history?
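(The git equivalent of that "is CL 12345 <= 12350" check is an ancestry test rather than a numeric comparison; the hashes below are placeholders:)

    # exits 0 if abc1234 is contained in the currently deployed commit def5678
    git merge-base --is-ancestor abc1234 def5678 && echo deployed || echo "not yet"

    # or: which tags/releases already contain my change?
    git tag --contains abc1234

It answers the question, but it does assume someone comfortable at a terminal, which is probably the point about non-technical folks.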
- a9h74j 3 years ago> Everyone can speak this language. The semantics are ridiculously simple.
Similar to the engineering/BOM-oriented semantics of everything is a drawing with a matching part number?
- akshayshah 3 years agoI hope that Amazon open-sources Brazil and the surrounding version set ecosystem someday. They're the only large company I know of that uses individual project repos at scale, and they've built tools that solve many of these problems. (I've never worked there, so I don't know how loved those tools are internally.)
Edit: I worked at Microsoft, which also uses tons of tiny repos (at least within Azure). I didn't encounter any good cross-repo management tools, though; apart from having a Jira-like ticketing system built in, Azure DevOps seemed quite a bit worse than GitHub.
- qznc 3 years agoAt least, I would like to read more about it. A while ago, I collected some information [0] but I don't know it first-hand. One Amazon developer told me that Nix is the closest thing in the Open Source world.
- lliamander 3 years agoAll these advantages really come down to making it easier to manage tightly coupled systems. That's great that the monorepo approach used by large tech companies with whole departments devoted to developer tooling can make that work.
However, I think the "polyrepo" response to most of these advantages would be to focus on decoupling your systems instead.
Take for instance:
> With a monorepo, you just refactor the API and all of its callers in one commit. That's not always trivial, but it's much easier than it would be with lots of small repos. I've seen APIs with thousands of usages across hundreds of projects get refactored and with a monorepo setup it's so easy that no one even thinks twice.
Like, that's really cool you can do that. But why are you doing that?! Why are you breaking your API contract and forcing all of your clients to change all at once?
Of course, proper decoupling also requires good engineering. A polyrepo environment can still get horribly tangled, but the natural response to all of these tangling problems in a polyrepo is to move in a direction of looser coupling.
- jvolkman 3 years agoWhat is proper decoupling? In a properly-tooled monorepo, project A can't take a dependency on project B unless B is a public library or explicitly gives access to A [1]. Authors have full control over coupling.
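In Bazel terms that control is the visibility attribute; a minimal sketch, with package and team names invented:

    # //libs/billing/BUILD.bazel
    package(default_visibility = ["//visibility:private"])

    java_library(
        name = "billing",
        srcs = glob(["src/main/java/**/*.java"]),
        # only the checkout service may depend on this target; anyone else
        # gets a build error instead of an accidental coupling
        visibility = ["//services/checkout:__subpackages__"],
    )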
Sure, avoid changing the API contract. But when the time comes to change the API, you can 1) make the change backwards compatible and maintain both methods forever; 2) release a new major version and maintain both versions forever; or 3) just migrate the callers and immediately be free of all technical debt that would've been accrued in 1 and 2. This assumes internal clients whom you presumably can't break.
- lliamander 3 years agoVisibility is somewhat orthogonal. I mean yes, you can avoid coupling issues by preventing the dependency altogether, but really the interesting problems occur once you do have a dependency.
Once you have the dependency, the loosely coupled approach means that (when possible) you avoid making changes to your API contract that would break your clients. I see the appeal of approach #3 that you suggest, but here's the problem I see with that (and maybe you have an answer):
For any change that breaks the contract (outside of trivial things like renaming an identifier) you are necessarily either adding a requirement your clients may not be able to satisfy, or removing a capability they depend upon. Migrating your clients in that case is more than just a simple refactor; the client may need to re-architect so that it can adapt to the change in contract or even move to a new dependency altogether. If you're not the owner of that client, that means you are either interrupting the other team while they are forced to help you with the migration, or you are blocked waiting for them to have the time.
In general, I would say the best approach to making breaking changes to an API is to use a deprecation process. That allows clients to migrate at their own pace. You can of course do that in either a monorepo or a polyrepo approach, but my expectation would be that the monorepo doesn't really provide you with any advantages in that case.
- oceanplexian 3 years agoI know some of the FAANGs do monorepo (Google being the biggest) but AWS does not.
A monorepo is an organizational mess when you're trying to manage and transfer ownership across thousands of teams and contain the blast radius of changes, unless you invest a ton of resources into proprietary tooling that requires a bunch of maintenance, since all the open source solutions are terrible at this and the whole data model is built around splitting out individual project repositories. And then after all that effort, why wouldn’t you just use tooling the way it was intended, and the way it’s used in the open source model, so you can partition your CI/CD without a bunch of hacks and don’t run into bizarre scaling issues with your VCS?
It perplexes me people advocate for this strategy. All I can think is it’s another one of those cargo-cult ideas that everyone is doing because Google did it (So it must be good).
- throwaway894345 3 years agoNot having to submit and coordinate PRs across a dozen repos is a pretty tangible benefit.
> unless you invest a ton of resources into proprietary tooling that requires a bunch of maintenance, since all the open source solutions are terrible at this and the whole data model is built around splitting out individual project repositories.
Agree that there's a bunch of tooling needed to operate a monorepo, but there's also a bunch of tooling to sanely manage dozens of "microrepos" as well (when an upstream library changes, update downstream libraries' dependency manifest files to take the new version, run tests, report errors back to upstream, etc). I don't know of any open source tools that manage this problem, but I'm guessing they aren't high-quality if only due to the complex nature of the problem space.
> And then after all that effort, why wouldn’t you just use tooling the way it was intended, and the way it’s used in the open source model, so you can partition your CI/CD without a bunch of hacks, and don’t run into bizarre scaling issues with your VCS.
Because the tooling sucks, as previously mentioned. Many changes require touching many repos, which means coordinating many pull requests and manually changing dependency manifest files and so on.
Ultimately, the "repo" concept is limiting. We need tooling that is aware of dependencies one way or another, and sadly all such open source tooling sucks whether it assumes the relevant code lives in a single repo or across many repos.
- oceanplexian 3 years ago> when an upstream library changes, update downstream libraries' dependency manifest files
As someone who's more systems oriented, ideally projects are locked in to a specific versioned dependency, and nothing changes unless a developer of a project explicitly asks for it.
What I've seen is the opposite: someone owns a dependency and is lazy and wants to perform a breaking operation, and rather than version the change or orchestrate a backwards compatible change, they use mono-repos to "solve" the problem. IMHO it's a bad pattern and leads to a lot of risk.
- throwaway894345 3 years agoIt's fairly hard to do that in a monorepo world, because (in theory) upstream can't merge anything until downstream tests are passing. Moreover, if you do break something (especially out of laziness), you get reprimanded. And if you really wanted, you could still require downstream approval for upstream changes.
- erik_seaberg 3 years ago> when an upstream library changes, update downstream libraries' dependency manifest files
This needs to happen periodically, when we have slack. Doing it continuously adds risks that aren’t really our job to take.
- throwaway894345 3 years agoIn my experience, if it doesn't happen continuously, it simply doesn't happen at all until something breaks (and then there's a bunch of finger-pointing at upstream even though downstream didn't update). Your first line of defense is that downstream tests are run before anything that affects downstream is merged. The next line of defense is stuff like canary deployments which allow you to minimize blast radius and roll back quickly. Obviously this depends a great deal on your risk regime--if you're SaaS this is probably fine, but if you're embedded this is a non-starter.
- giaour 3 years agoAmazon has Brazil versionsets and workspaces, which solve many of the same problems monorepos do. I really liked the way those extra resources let you organize "ad-hoc monorepos" from smaller repositories, but it's infrastructure that I haven't seen elsewhere.
Since leaving Amazon, I've mostly worked with monorepos and wouldn't go back to multirepos without Brazil-style tooling.
- ghoward 3 years agoHey, I'm hoping you can answer some questions for me.
I'm building a build system and a VCS (separately). I want to do it right.
Could you explain to me what Brazil is? Is it the build system? [1] Or is it the VCS that Amazon uses?
If it is the build system, then it appears that versionsets are literally just a list of dependencies with their versions to use for a build. Is that correct? If not, or if you can give me more detail, what are versionsets, exactly?
Also, what are workspaces? Does this quote from one of the comments on the link match?
> A workspace consisted of a version set to track and any packages that were checked out.
[1]: https://gist.github.com/terabyte/15a2d3d407285b8b5a0a7964dd6...
- dastbe 3 years ago there's nothing particularly magical, and a lot of the behaviors are actually similar to bazel, in ways. a version set is a directed acyclic graph (DAG) of packages linked by dependencies. These edges comprise (more or less)
* normal deps
* compile/test deps (i.e. non-transitive)
* runtime deps
* tool deps (non-transitive, but also non impacting on the closure resolution algorithm)
version sets allow for multiple versions of the same package to exist in them (ex foobar-1.0 and foobar-1.1), which has some benefit but in practice is just painful.
dependencies are defined in a capital-c Config file. when you run brazil, it does a few things
* it resolves all of the tools and makes them available on your path
* it sets up some environment variables
* it invokes the build command defined in the Config file
Config files can also declare outputs (there are also some canonical outputs), so you can use a query tool to, for example:
* get all jars in the runtime closure
* generate a symlink farm of all client sdk configuration files
the results of your build go into a build directory, and when you want to generate the runtime for a particular package, it will symlink together the outputs of all packages in the deps + runtime closure, which you can do on demand.
version sets are updated by building new packages versions into them, which will rebuild the version set to make sure all builds pass. if they do, a new "commit" will be put onto the version set. you can also merge from one version set into another, where you can get packages and their associated dependencies merged in along with a full rebuild.
- giaour 3 years agoI am not an expert on Brazil, but the link you shared matches my recollection.
It's important to remember that Brazil covers a lot of ground, and many internal tools at Amazon are external to Brazil but rely on it and the way it organizes resources. So I (or others on the internet) may incorrectly call something Brazil if it's part of the larger Brazil ecosystem.
Let's start with a repository and build outwards:
You have some code (let's say it's a Java library) that does something cool. To compile it, you add a Brazil file to the repository root. This file specifies what Brazil packages your code depends on, how it should be built, and what kind of artifact it will produce. Once that file is there, you can run "brazil-build" to produce a Brazil package (which is just a jar with some metadata).
You want to use this library in a web service, so you check out both repositories in a single workspace. Every workspace has a source versionset where it fetches dependencies, but if the code repository for a package is checked out locally, "brazil-build" will build and use the local version instead. You make some changes to the library and web service, then test how they work together by running the web service from within the workspace folder. This ensures that it is using your local modifications to the library repository before those changes have been merged.
Once you're satisfied with the change, you open a PR with a brazil-integrated tool that can show changes to multiple repositories as an atomic change (a "change set"). The CI system for this tool uses a Brazil workspace to make sure your updated packages build together and that all tests pass.
If the PR is approved and you merge to main, there is probably a pipeline watching your repository for changes that proactively rebuilds one or more version sets based on the merged change set. Any package in the version set that depends on your library will be rebuilt using the changes from your change set. So while a versionset is implemented as a list of packages at specific versions, it's best to think of a versionset as more or less equivalent to a monorepo containing all packages on that list, since changes that cross package lines can be built into a versionset in an atomic unit. (This is very helpful if you need to push out a breaking change.) A package can exist in multiple versionsets, which is of course impossible with monorepos.
- dastbe 3 years agovery much this. there's also a cultural understanding that libraries/fat clients are equal or worse in terms of maintainability compared to services. equal in the sense that you have to treat them like a service, worse because it's easier to mess up and you don't get to control your rollout strategy.
EDIT: Though while most teams have gone towards many smaller packages for their applications, I suspect that most would be better served by a team level monorepo. That gets you all of the benefits of monorepo locally and all of the benefits of manyrepo globally, and unless your project hits the ~200+ developer mark, maintaining things will stay tractable.
- radicality 3 years ago
* Why is ownership management a mess in a monorepo? Can just decide “this team owns this folder hierarchy” etc. (one way to encode that is sketched below)
* ‘Contain blast radius of changes’ - is that actually difficult? Isn’t there tooling that figures out what changed and what dependencies need rebuilding? (eg Facebook buck)
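(A GitHub/GitLab-style CODEOWNERS file is one common way to encode "this team owns this folder hierarchy"; paths and team names are made up:)

    # .github/CODEOWNERS
    # On GitHub the last matching pattern wins, so the catch-all goes first.
    *                      @acme/platform-team
    /services/payments/    @acme/payments-team
    /services/search/      @acme/search-team
    /libs/ui/              @acme/frontend-guild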
For context, I was for a very long time at FB so am definitely used to the monorepo way, and recently switched to place which uses github + many repos, and it feels so much worse.
Honest question - how do you actually effectively share code between many repos? Example: How do I know that me changing my backend app’s API doesn’t break any other project in the company potentially calling it? It should be a compile/buildtime error for the other project, but how does that work if everything is in its own little repository?
- dragonwriter 3 years ago> Honest question - how do you actually effectively share code between many repos?
One way is: Each repo is a responsibility boundary and single source of truth, you use code from other repos the same as any other external dependency.
> How do I know that me changing my backend app’s API doesn’t break any other project in the company potentially calling it?
Changing an API breaks projects using it; you either do versioned APIs and/or coordinate changes with downstream consumers, the same as you would with an API with external customers.
(Another way is “downstream projects checkout their dependencies and build against them as a routine part of their process.“)
- sagarm 3 years ago> Changing an API breaks projects using it; you either do versioned APIs and/or coordinate changes with downstream consumers, the same as you would with an API with external customers.
This is a ton of work relative to building/running the tests for all your reverse dependencies and fixing the call sites for them (up to a certain level of scale, of course).
> (Another way is “downstream projects checkout their dependencies and build against them as a routine part of their process.“)
This happens automatically in a monorepo. Any breakages are revealed as soon as upstream makes the change.
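With Bazel-style tooling, "all your reverse dependencies" is one query away; a sketch, with the library target invented:

    # everything in the repo that transitively depends on the auth library
    bazel query "rdeps(//..., //libs/auth:auth)"

    # run just the tests in that set before merging the change to //libs/auth
    bazel test $(bazel query "kind('.*_test', rdeps(//..., //libs/auth:auth))")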
- giaour 3 years ago> Honest question - how do you actually effectively share code between many repos?
Locally, you can use an Amazon-internal tool to check out multiple repos and make changes to all of them. The tooling calls this a "workspace," but it feels very much like working in a monorepo since building and testing can happen at the workspace level.
> How do I know that me changing my backend app’s API doesn’t break any other project in the company potentially calling it?
In terms of change management, Amazon dependency graphs are managed as "version sets." Changes have to be built into a given version set, and that build will also rebuild any package in the version set that consumes the repository whose changes are being built in. (Usually, repositories are configured to build into one of the owning team's version sets on each commit to the primary branch.)
- ignoramous 3 years ago> It perplexes me people advocate for this strategy. All I can think is it's another one of those cargo-cult ideas that everyone is doing because Google did it (So it must be good).
Not sure if it is a generic comment or a comment on TFA:
i) If the latter, I'm compelled to point out that TFA doesn't nearly advocate for monorepos as much as it lists reasons why a few SV companies use it, how they use it, and what they get out of it.
ii) If the former, then this blog post makes for a good read: https://tailscale.com/blog/modules-monoliths-and-microservic...
- jvolkman 3 years agoLet's not pretend that Amazon hasn't spent significant effort over decades building and maintaining their own (non-monorepo) build systems and tooling.
- pbiggar 3 years agoMonorepos are also great for small monorepos with just a few projects. The darklang monorepo [1] has a devcontainer that installs all the build tools for 4 projects which create 21 different services, using 6 languages, and building everything is one step.
In fact, it makes it so easy to add new stuff that I didn't even realize we had 21 services til I counted. My first guess was 12.
- codenesium 3 years agoHaving been down the route of repos for every service I would always choose monorepo in the future. I could see separate repos for libraries. There is just too much overhead trying to manage multiple repos. With a single repo it's possible to build a package that represents all of your software vs being forced to version everything. Tasks almost always touch multiple services unless you are so big you have a team per service.
- Yeroc 3 years agoI don't agree with having libraries in their own repositories. For us one of the biggest costs to fixing bugs in libraries particularly was having to then update all dependent projects with the new version. By moving our libraries into a monorepo alongside all the consuming projects we got away from that busywork. It really streamlined things for us.
- hardwaregeek 3 years agoI agree that monorepos are great if you're using version control systems in their current state. But I can't help but wonder if it's a question of monorepos being good, or version control/tooling inhibiting other options. If you had a VC tool that could compose repositories with ease, that could understand multiple histories and allow for atomic commits across repos, perhaps monorepos wouldn't be the best? Or you could keep the monorepo, but allow a "lens" into a specific subsection.
Even with Dan's point about monorepos making tooling easier, if a VC tool had a good API, perhaps this point would be moot. Why is it hard to query files and repository dependencies? Should there be some way to model dependencies in your version control system? It'd be interesting to see someone tackle these problems in version control.
- throwaway894345 3 years ago> I agree that monorepos are great if you're using version control systems in their current state. But I can't help but wonder if it's a question of monorepos being good, or version control/tooling inhibiting other options.
Ultimately the problem is that we need tooling which is aware of dependencies, and the repo abstraction isn't. Whether that code lives in a single repo or in many repos is fairly irrelevant, but keeping the code in a single repo is usually a fair bit easier for many things (especially when you're working in a single language since the language's build tools are usually well-suited for this basic case) and you don't need to manually update dependency manifest files, test how a given upstream change affects every downstream package, or coordinate half a dozen PRs for every change.
- WorldMaker 3 years ago If you are using a dependency manager, the repo abstraction (with multiple repos) does start to line up with a useful node abstraction at the dependency-graph level. For instance, if you are working in JS/TS, every repo has a top-level package.json file that is very easily consumed by tooling to discover dependencies. Github has a dependency graph that's pretty comprehensive for public packages as depended on by public repos. For instance, the repos that depend on Typescript: https://github.com/microsoft/TypeScript/network/dependents?p...
There's often less tooling available for private repository hosts and private package feeds, but dependency management from a per-repo standpoint is if not a solved problem in practice, an easily solvable problem. (Github has some tools for private repos if you pay for them. Other systems can borrow from the same playbooks.)
(Other languages have similar dependency manifest files, most of which are similarly slurpable by easily automated tooling given the need/chance. Dependency discovery doesn't have to be a problem in multi-repository environments.)
> test how a given upstream change affects every downstream package, or coordinate half a dozen PRs for every change
Some of this is push versus pull questions. One developer needing to push a lot of changes is a lot of work for that one developer. Downstream "owners" needing to pull changes at a frequency is in some cases much tinier slices of work spread out/delegated over a larger team of people, many of whom may be closer to downstream projects to better fix and/or troubleshoot secondary impacts.
Monorepos make push easier, definitely. Sometimes pull is a better workflow. (Especially if you are using the same dependency tooling for third-party components. These days given CVEs and such you want a regular update cadence on third-party components anyway, using the same tools for first-party updates keeps more reason to keep that cadence regular. Lots of small changes over time rather than big upgrade processes all at once.)
- benreesman 3 years agoDan is diplomatic to a fault. Splitting repos on boundaries that aren’t necessary because of access control, legal obligation, or infrastructure constraint is for people who have nothing better to do.
All the big shops have multiple repositories. They all broke each one out grudgingly and under some kind of pressure.
- exitheone 3 years agoThat's factually untrue. Google, Microsoft, Facebook, Twitter, Airbnb all have a huge monorepo. Splitting it out is obviously not necessary with the right tools.
- benreesman 3 years agoAs someone who spent a decade at FB I can assure you that we had as few monorepos as possible. No more, no less.
The danger with mouthing off on HN is that this place is thick as thieves with people who actually do or did what the bloggers whinge on about.
Though in this case, the blogger has worked at all three of MS, Google, and Twitter, so I wouldn’t be quick to disregard him either.
- jeffbee 3 years agoGoogle at least has separate repos for the linux kernel and android, and probably others (chrome/os/ium?). The hugeness of google3 is not in doubt, but the mono-ness may be.
- williamsmj 3 years agoNo. Twitter absolutely has multiple repos. Some of them are very big, but they certainly have more than one. Why they have > 1 repo is a long story, but the post you're responding to is in the ballpark.
- disgruntledphd2 3 years agoAnd FB have multiple, really, really large monorepos.
- benreesman 3 years agoSo there was a (I hope flattering) joke behind Simpkins’ back that anyone of his broad-spectrum superiority, as he’s both a world-class hacker and conventionally handsome and consistently articulate that he must be carrying a “lizard”. His nickname was “the Iguana”.
He spent a lot of time optimizing ‘hg’, and one assumes that the Iguana did what could be done.
- 8note 3 years ago Legal obligation and unnecessary are mutually exclusive
- honkycat 3 years ago The thing about monorepos is similar to the thing about micro-services: they require a lot of tooling and discipline and documentation that most organizations do not have.
On our multi-repos I have consistently seen dozens, if not hundreds, of stale pull requests and branches and issues piling up never to be merged. This compounds with a monorepo.
Additionally, how do you avoid doing pointless builds when new features are pushed? I can only imagine what the `.github` folder in a monorepo looks like.
For me it is similar to the "one large file" argument, and why I don't agree: obfuscation is bad, but information hiding is GOOD. When I open a file, I want the information relevant to the current domain I am working in, not all of the information all at once.
Similarly, when I open a github page, I want its issues, pull requests, branches, and wiki to represent the state of a single project. The one I am currently interested in. You lose this with a monorepo.
You can argue "well tooling can..." yes tooling that does not exist and that I do not want to implement. Similar to the "one large file" argument, editors are set up to manage many different files with tabs. You COULD just compile the code and navigate symbols, but that isn't the world we currently live in.
- Orphis 3 years ago> Additionally, how do you avoid doing pointless builds when new features are pushed? I can only imagine what the `.github` folder in a monorepo looks like.
It's simple: with proper tooling, you know exactly the dependencies, so you know which tests depend on the affected files and can run just those tests; the rest shouldn't be impacted. And that tooling exists. It's not the one you may be using, but it exists, and not just in FAANG.
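Even without a full build graph, the crude version of this in a `.github` folder is per-path triggers; a sketch with invented paths:

    # .github/workflows/payments.yml
    name: payments
    on:
      pull_request:
        paths:
          - "services/payments/**"
          - "libs/ledger/**"   # hand-maintained list of direct dependencies
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: make -C services/payments test

The catch is that you're encoding the dependency edges by hand; graph-aware tools (Bazel, Buck, Nx and friends) compute the affected set for you, which is what the parent means by proper tooling.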
- jsnell 3 years ago(2015)-ish. Significant previous discussions:
- trollied 3 years ago> With a monorepo, projects can be organized and grouped together in whatever way you find to be most logically consistent, and not just because your version control system forces you to organize things in a particular way. Using a single repo also reduces overhead from managing dependencies.
I don't actually understand this. You can do this with git submodules. It's just a directory structure. Can somebody please explain? If the problem is committing to multiple things at the same time for a point-in-time release, then the answer is tags. Rather than terabytes of git history for a gigantic organisation that has many unrelated projects.
A good example for you: Google releases the Google Ad Manager API externally periodically, with dated releases. How does having that in a huge monorepo make sense?
- nickdandakis 3 years agoHave you used git submodules before? I've only used them once and vowed to never use them again.
It's effectively just a pointer to a hash, and ends up being useless for versioning + a really nice footgun for tracking upstream updates.
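(Concretely, that "pointer to a hash" is all a submodule is; the repository name below is a placeholder:)

    git submodule add https://github.com/acme/libwidget vendor/libwidget
    git diff --cached vendor/libwidget
    # all the parent repo records is a bare commit id (a "gitlink"):
    #   +Subproject commit 1a2b3c4d5e6f...
    # plus a URL/path entry in .gitmodules; no version range, no changelog,
    # and "updating" means committing a new raw hash.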
The monorepo vs manyrepo tradeoff boils down to this:
Do you want more complicated build + deploy tooling or do you want more complicated dependency management?
If the former, pick monorepo. If the latter, pick manyrepo.
- perrygeo 3 years agoThis is the best summary of the topic I've read. Having used all three (submodules, monorepo, manyrepo) the only thing I can say with any certainty is - don't use submodules. The mono/manyrepo decision is not as clear cut but your description nails it.
Edit: submodules IS a viable solution for truly third-party repos over which you have no control and don't expect to ever edit.
- slavik81 3 years agoHave you ever tried contributing to Qt? I rather liked their use of submodules. https://github.com/qt/qt5
- jkaptur 3 years ago> the downsides are already widely discussed.
Does anyone have any useful pointers? I'm in such total agreement with the article that I actually don't know the counterarguments.
- cortesoft 3 years agoThere are a bunch of downsides, although they are often just the opposite problem from what the monorepo solves.
For example, the article states:
> [In the other direction,] Forcing dependees to update is actually another benefit of a monorepo.
What happens when the other teams that depend on your work don’t have the time/priority to update their code to work with the new version of your code? The ideal case that monorepo proponents tout is that the team updating the code that is depended on can just update the code for everyone who depends on them… however, that update is not always trivial and might require rework for code deeper inside the other teams projects. Maybe they are depending on functionality that is going away and it requires major work to change to the new functionality, and the team is working on other high priority things and can’t spend the time right now to update their code.
What does the team do? Do they wait until every team who depends on them is ready to work on updating? Do they try to work out how the depending team is using their code so they can update it themselves? How does this work if there are dozens of teams that use the dependency? You can't have every team that creates core shared code be experts on every other team's work. You can end up stuck waiting for every team to be ready to work on updating this dependency.
Imagine if this was how all dependencies in your code worked, and every build task used the latest release of every dependency regardless of major version bumps. You might wake up on a Tuesday and your build fails and now you have to spend a week updating your project to use the latest version. Multiply this by all the dependencies and your priority list is no longer your own, you are forced to spend your time fixing dependencies.
This is why we specify versions in our dependencies, so we can update on our own schedule.
Of course, the downside of this is now you have to support multiple versions of your code, which is the trade off and the problem a monorepo solves.
You are going to end up with downsides either way, the question is which is worse.
- mvc 3 years ago> What does the team do? Do they wait until every team who depends on them is ready to work on updating? Do they try to work out how the depending team is using their code so they can update it themselves?
versioned multi-repos may solve this for the team[s] demanding incompatible changes to shared code, but any team that was happy to use the shared code as it currently is, and was expecting to also benefit from any upcoming compatible improvements, will see only problems with this "solution".
Better to give the new incompatible behavior a new name. Deprecate the old name. Then callers of the old thing can fix on their own schedule.
- cortesoft 3 years ago> versioned multi-repos may solve this for the team[s] demanding incompatible changes to shared code, but any team that was happy to use the shared code as it currently is, and was expecting to also benefit from any upcoming compatible improvements, will see only problems with this "solution"
Normally this is solved with semantic versioning... you pin the major version, so you automatically get all non-breaking (minor and patch) changes but don't pull in breaking ones.
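For example (package names here are made up), most package managers spell that out as a version range rather than an exact pin:

    # npm: stay on major version 2, accept any 2.x minor/patch release
    npm install 'shared-lib@^2.3.0'

    # pip: compatible-release operator, roughly >=2.3, <3.0
    pip install 'shared-lib~=2.3'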
- rbetts 3 years agoA monorepo assumes all your IP is either all open or all closed; otherwise you need a very reliable way to extract the OSS bits and publish them to a mirror without risking exposure of closed-source IP.
- jeffbee 3 years agoThe main downside that people always mention is it takes a long time to clone or pull a large repo. This is actually a flaw of git, not a flaw of the monorepo as a concept.
- dundarious 3 years agoPeople can't use the concept; they must use an actual tool.
The problem with these frequent monorepo discussion threads is that monorepos are at a significant disadvantage when it comes to good existing and available tools (especially open source ones), but most of the boosters work at companies that mostly use good existing and unavailable tools.
I've no problem with the discussion of course, and largely agree with the conceptual superiority in many cases, but on the practical side, the downsides are still significant and IMO overpowering. I've worked at insanely profitable medium sized companies that would use a monorepo if the tools were there, but instead used svn+externals and then git+a very simple script implementing essentially the same thing as svn:externals. The latter is a great option, IMO especially if you flatten all dependencies to the project/top level (i.e., all transitive dependencies specified and versioned at the top level), as you don't have the A->B->C problem where A using an updated C requires work from team C, B, A; you can just do C, A. It also discourages deeply nested dependencies, and bounds dependency count somewhat, and provides a very explicit and conscientious view of your total dependencies. Updates are also easy to partially automate.
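For a sense of how small that script can be, here is a rough sketch, assuming a flat manifest of path/url/pinned-commit triples at the top level (the file name and format are made up):

    # externals.txt lines look like: <path> <url> <pinned-commit>
    grep -v '^#' externals.txt | while read -r path url rev; do
        [ -d "$path/.git" ] || git clone "$url" "$path"   # first-time clone
        git -C "$path" fetch --quiet origin               # pick up new commits
        git -C "$path" checkout --quiet "$rev"            # pin to the recorded rev
    done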
- WorldMaker 3 years agoAt least in the case of git, a surprising amount of the monorepo tooling is making it upstream into git itself. I'm aware of engineering efforts from both Microsoft and Twitter that are in today's git (a lot of the work on things like the git commit-graph and git sparse checkouts in particular is designed for monorepo tooling, though in some cases it benefits smaller repos too).
Microsoft's monorepo tooling has been especially interesting to watch from an engineering standpoint as seemingly almost all of it has been in the public eye, open source, and in most cases upstreamed. VFS for Git [1] was one of their first approaches (simply virtualizing the git filesystem and proxying it through servers as necessary), and while portions of it will never be upstreamed (in particular because it needs OS drivers) it's all open source, a lot of concepts from it were upstreamed into git itself and VFS for Git is mostly considered legacy/deprecated. Microsoft's more recent follow up tool was Scalar [2], which started as a fork of most of the remaining relevant bits of VFS for git plus a repo config tool that helped setup sparse clones while the git CLI ("porcelain") for sparse cloning took a bit to catch up with what the "plumbing" could do. Most of that got directly upstreamed into the git "porcelain" and since that point so much of Scalar was upstreamed into git that the remaining tools of Scalar are now VCed directly in Microsoft's git fork rather than its own repo.
In terms of raw engineering capability it seems we are in something of a golden age of monorepo tools available as open source, for those trying to use git for monorepos. Admittedly the tools may be available now, but that doesn't make them any easier to work with than the era when they were simply unavailable because there's often a lot of engineering work still to be done to keep the tools humming along (in bandwidth and hosting alone).
It's just interesting to see more of the tools available transparently, sometimes because they still have benefits to even smaller scaled repos. (While VFS for Git is unlikely necessary for small/medium repos, there are some times where sparse clones can be handy at even medium sizes. A lot of the engineering work upstreamed to make sparse clones performant and capable indirectly benefit repositories of any scale in reducing filesystem reads overall and adding support for storing better computed caches on-disk such as commit-graphs and reachability bitmaps rather than repetitively rebuilding them in memory.)
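If you want to try the upstreamed bits without any Microsoft-specific infrastructure, recent git releases ship the scalar wrapper; something like this (the URL is a placeholder) should get you a partial, cone-mode sparse clone with background maintenance enabled:

    # one command: partial clone + sparse checkout + maintenance, all upstream git
    scalar clone https://example.com/big-monorepo.git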
- yboris 3 years agoI think you can clone just the last commit:
> Provide an argument of --depth 1 to the git clone command to copy only the latest revision of a repo:
git clone --depth [depth] [remote-url]
- WorldMaker 3 years agoGit has some strange behaviors with shallow clones (trying to manage volume using --depth). Shallow clones are great for CI builds, but not so great with working copies used actively by developers.
At this point in git the better tool is "sparse clones" (the --sparse option and some other associated tools). A lot of interesting engineering work has been put into making sparse clones very performant on "conical sections" ("sparse cones") of a repository at a time (i.e., checking out a single sub-directory of a monorepo, and just its history).
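Roughly, the workflow looks like this (the URL and directory names are placeholders, and this assumes a reasonably recent git):

    # blobless partial clone that starts with only top-level files checked out
    git clone --filter=blob:none --sparse https://example.com/monorepo.git
    cd monorepo

    # then pull in just the directories ("cones") you actually work on
    git sparse-checkout set services/payments libs/common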
- jeffbee 3 years agoOK but it still takes a long time with millions of source files.
- 8note 3 years agoIs that a flaw with git? Or a flaw with trying to use git for monorepos, vs some other change management built for that kind of repo?
- hkt 3 years agoPrinciple of least privilege springs to mind but I'm not familiar with the other issues.
- denimnerd42 3 years agogit seems like the wrong tool for monorepos, so what is used instead if you can't immediately just build your own tools?
- tom_ 3 years agoPerforce is ok. It has roughly the right model, it scales pretty well, and the tools are... good enough.
- WorldMaker 3 years agoMicrosoft and Twitter, at least, have very publicly invested a lot of engineering work in making git a tool for monorepos.
- Naac 3 years agoI think it's worth calling out that there are different types of monorepos.
For example, I've worked in a monorepo that built one giant binary, but I've also worked in a monorepo that contained 4-ish independent services, all in a single git repo.
- no_wizard 3 years agoI'm a big fan of monorepos. If they get too unwieldy or you need granular VCS permissions, you should use Perforce over git, but generally speaking I think either git or Perforce works fine for monorepos. The tooling has come such a long way from even 10 years ago, especially for front-end codebases, but even for things like Rust the story is really strong.
It comes down to how efficient you can be with tooling. That's the one thing monorepos really do require: a good upfront investment in tooling and long-term maintenance. However, I've found the initial setup "cost" of a complex monorepo with correct tooling is far outweighed by the simplified operational overhead of working inside it.
- atx42 3 years agoOur team is unique at our company in having a "monorepo" with 9 components, versus the standard 1 component / 1 repo that other teams use. With Maven, we can use one command to build any one component or all of them. If we split, we'd tell Jenkins how to build everything, but would say goodbye to simple local builds. I didn't see a good solution that avoids introducing more technology or complexity, and likely specifying how the build works in two different places.
I mention this here, as maybe I'm missing some obvious solution.
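In case it helps anyone in a similar spot, Maven's reactor flags already cover the "one or all" case; the module name below is a placeholder:

    # build everything from the top-level pom
    mvn clean install

    # build just one module plus whatever it depends on in the reactor
    mvn -pl component-a -am clean install

    # build one module plus everything that depends on it
    mvn -pl component-a -amd clean install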
- paulvnickerson 3 years agoHow do you address the blast radius problem with monorepos? For instance, I want to have a single GitLab repo for PostgreSQL clusters. Using jsonnet, I deploy and configure a cluster for each customer, and adding a new cluster is as easy as adding a config file.
However, my colleague explained that it's a bad idea because any config change or accidental button press on GitLab's CI/CD page can bring down or wipe out everybody's clusters. How can that problem be mitigated? It seems intrinsic to the monorepo style.
- pbalau 3 years agoNot sure why you got downvoted.
The problem is with your deploy system. You can consider each of the clusters to be a service; thus, a change in Service A (cluster A) should not trigger a deployment of Service B (cluster B).
My pipeline is split in two:
1. On Bitbucket, we run a pipeline that builds "build artefacts": Docker images and "packaged" CloudFormation templates.
Each of these artefacts has a list of triggers, either base Docker images or source code. I build the relevant Docker image or CloudFormation package based on the triggers (it's quite a naïve glob() use; a rough sketch of the idea is below).
2. On the AWS side, I have something I call AWS Apps: in short, a Stack Name along with a set of triggers (the above build artefacts). On merge to main, I only deploy the AWS Apps affected by new build artefacts.
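The trigger matching really can be that naïve; here is a sketch of the idea (the paths, the ref variable, and the deploy functions are all hypothetical):

    # list files changed since the last deployed commit
    changed=$(git diff --name-only "$LAST_DEPLOYED_SHA" HEAD)

    # only touch the artefacts whose trigger globs match a changed file
    for f in $changed; do
        case "$f" in
            clusters/customer-a/*) deploy_cluster customer-a ;;
            clusters/customer-b/*) deploy_cluster customer-b ;;
            docker/base/*)         rebuild_base_image ;;
        esac
    done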
- Thaxll 3 years agoHow do you manage versions/tags with a monorepo? If you need to tag something (a lib), everyone gets the same tag; the entire repo now has a tag v0.0.1 even though only your library changed.
- pvarangot 3 years agoI worked on a team that was doing embedded software inside a monorepo alongside Linux and cloud stuff, so we needed to version our stuff because we were not doing "continuous deployment", flashing the uCs every time a change landed on master. We just had big "feature branches" and those got rebased and merged weekly. For the cloud stuff, there was just one version, and that version was what the other team was striving to continuously deploy into staging and then production.
It's not ideal, but it was handy to have access to all the cloud and application code on the embedded side for things like interface definitions for communication protocols. At the same company I worked on another project where the definition files for the cloud interface were in a different repo, we had to use a submodule, and I preferred the monorepo.
As another commenter said, we used Bazel and there was indeed a smallish team that provided build support. Ramping up new hires on the build system was one of the more painful processes; I had to give support myself to teammates with only about six months' tenure, just because they hadn't been there from day one.
- jefftk 3 years agoI think the most common option is you don't. The organizations that Dan gives as examples mostly don't produce public facing libraries, and when they do it's a separate process that lives on GitHub.
Instead, everything is based around the idea that you check out the state of the world at some commit, do your build, whatever validation you need, and send it to production. You do this pretty often, ideally multiple times a week. Very occasionally you have an emergency where you want prod + cherrypick, and you generally build tooling that allows saying "build at this commit, but also with these later specific commits merged in".
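The cherry-pick flavor of that tooling doesn't have to be fancy; conceptually it's little more than this (the SHAs and branch name are placeholders):

    # start from the known-good production commit
    git checkout -b emergency-release 1234abcd

    # layer the specific later fixes on top, then build/deploy from this branch
    git cherry-pick 9876fedc
    git cherry-pick 5555aaaa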
- WorldMaker 3 years agoTechnically in git you can namespace tags to your heart's content, it's a relatively free form naming structure, just like branch names. Basically the same rules even to the point that if you use slash separators (example: mylib/v0.0.1) some UIs will even give you a directory structure of tag lists. (On the flipside, some UIs get very confused, but that's not git's fault.)
What monorepos I've seen rarely bother with tags in practice, in part because they rarely individually version libraries, but at a technical level you can do it in git, if you need to.
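For the record, the mechanics are just ordinary tags with a prefix; something like this (names are examples):

    # per-project tags under their own prefix, pushed like any other tag
    git tag mylib/v0.0.1
    git tag otherservice/v2.4.0
    git push origin mylib/v0.0.1

    # list just one project's tags
    git tag --list 'mylib/*'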
- jvolkman 3 years agoAt least at Google: you don't. There is just one current version of your library. There are exceptions, but for the vast majority of things there is no concept of versioning.
- pbalau 3 years agoFor multi-repo I would need to build automation to manage all the repos and enforce a consistent experience across them, including syncing the repos if we end up using stuff like submodules. And I would need to do this now. We tried to "trust" every repo owner to do the right thing, but it was a cluster fuck.
With a monorepo, I had to set things up once and go on my merry way. And I can kick the monorepo-is-too-slow can down the road for a few years.
- trasz 3 years agoMonorepo is one of the features I really like in FreeBSD. It makes adding functionality that goes across layers - eg adding a syscall implementation, its manual page, libc stub, and making use of it in some userspace component - trivial, compared to the hurdles necessary in the Linux world, where you'd need to interact with kernel folks, libc folks, some random userspace project folks, and then wait until it goes into distributions.
- wjmao88 3 years agoIt's Conway's Law: your code organization is a reflection of your engineering organization.
The number of repos you have should roughly equal the number of autonomous engineering "groups" you can divide into that work largely independently of other groups. Anything a group touches should probably be in the same repo as everything else that the same group touches.
- Maksadbek 3 years agoWe use git with monorepos. The codebase is so large that the git status command takes about 3-6 seconds. Do you also use git with monorepos?
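Yes, and if you haven't already, it's worth flipping on the large-repo features in a reasonably recent git; assuming git 2.37 or newer, something like:

    # built-in filesystem monitor + untracked cache speed up git status a lot
    git config core.fsmonitor true
    git config core.untrackedCache true

    # bundle of settings tuned for repos with many files (index v4, etc.)
    git config feature.manyFiles true

    # keep commit-graph and other caches fresh in the background
    git maintenance start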
- switch33 3 years agoLarge repos make sense or don't make sense based on whether companies work with large data, based on predicate calculus and derivatives usually dealing with repos as well as stories, and have more problems with SSDs too.
There are lots of problems associated with SSDs as well as large monorepos. They are more complicated than people realize, but if you did Google Code Jam it teaches them somewhat, though it needs to be explained too. The problem is stories sort of intersect with programming too. Clockwork with SSDs needs to be reworked for Google Code Jams. The problem is Elixir sort of works with stories and programming. Predicate calculus and proof theories sort of are the only way programming will really make sense in a world full of SSDs. LevelDB could be a more interesting problem for Google Code Jams if it had some newer features too. Conflict resolution is Tower of Hanoi, and that has problems with consensus algorithms and concat too. SSDs need to do derivatives for piecing and parting software too, and that is more interesting too.
- liminal 3 years agoAny suggestions for how to go from multiple Git repos to a monorepo? Preserving history would be really nice. I've looked at submodules and subtrees and both seem to have huge downsides and don't deliver the same benefits of a true monorepo.
- moojd 3 years agoYes! I once had to merge a dozen or so repos into a monorepo. I don't have my script handy, but git allows merging repos with unrelated histories into one repo while preserving the history.
If I remember correctly, this is how you do it:
1. Create a new empty repo for the monorepo.
2. In each old repo, 'git mv' all of the contents into a new directory with the repo's name.
3. Add the old repos to the monorepo as remotes.
4. Run 'git merge --allow-unrelated-histories' for each one.
You will now have a monorepo with preserved history, with each old repo living inside a sub-directory of the new monorepo (rough commands below).
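Fleshing that out, a sketch of the commands under the assumption that the repos sit side by side and use a master branch (names and paths are placeholders; paths with spaces would need more care):

    # one-time: create the monorepo with a root commit to merge onto
    git init monorepo
    cd monorepo
    git commit --allow-empty -m "Monorepo root"

    # for each old repo: first, inside it, move everything under a directory
    # named after the repo so the histories don't collide at the top level
    cd ../oldrepo
    mkdir oldrepo
    for f in $(git ls-tree --name-only HEAD); do git mv "$f" oldrepo/; done
    git commit -m "Move into oldrepo/ before monorepo merge"

    # then, back in the monorepo, pull that history in
    cd ../monorepo
    git remote add oldrepo ../oldrepo
    git fetch oldrepo
    git merge --allow-unrelated-histories oldrepo/master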
- urxvtcd 3 years agoYou just want to merge the repositories; there are plenty of guides online. git has pretty simple internals, so the procedure looks like this: 1. Inside one repository, you add a reference to the other repository with `git remote add`. 2. When you do a fetch, git will just download all the objects from the other repository. 3. Then you check the files out and make a commit. You can do it in a few ways; for example, you might wish to preserve tags from all the repositories but put them in their own namespaces, so you don't get conflicts. I wrote an answer on SO explaining exactly this: https://stackoverflow.com/questions/1425892/how-do-you-merge...
- Pathogen-David 3 years agoI've not used it extensively, but Josh can probably help here.
https://github.com/josh-project/josh
It's designed for making multiple Git repos from a monorepo, but I think you should be able to make a skeleton repo that represents your desired final monorepo layout and push your individual repos to the Josh subviews of that repo to combine them all.
(A big advantage of this approach over the multiple unrelated histories is that you don't have the mass move commits since Josh will rewrite all history as if the files were always in that folder, so you don't have to worry about history of individual files getting broken.)
- chrschilling 3 years agoYou don't even need to make a skeleton repo first. By passing `-o merge` as an extra option on the push to a non-existent view, the merging of unrelated histories will be done by the server. See: https://github.com/josh-project/josh/issues/596
- rkangel 3 years agoGit as a system has no objection to having more than one 'initial commit'. It will happily take that branching history and merge it together. With a bit of branch renaming you can add extra remotes to your repo so both 'masters' are present. Commit to both to make the resulting directory structure not overlap, and then just merge. You'll end up with full history of both.
I did a quick google and these instructions seem about right (without the delete step): https://gist.github.com/msrose/2feacb303035d11d2d05
- rubyist5eva 3 years agoWith Git I'm pretty sure you can literally just add a new remote pointing at a completely unrelated repo, fetch it, and then "merge" any branch from the other repos into the new monorepo (i.e. git checkout master; git merge other-remote/master). If all your projects are otherwise in a top-level directory inside the monorepo, this should merge cleanly and then they can just live beside each other in the checkout.
- WorldMaker 3 years agoYou just need the `--allow-unrelated-histories` flag (the first merge) which git requires as a small sanity check.
- MichaelMoser123 3 years agoOne problem with multiple repos: you may end up with multiple binary components (shared libraries, static libraries, etc.), where each binary is produced from the sources of a separate repository. Now it may turn out to be a bit tricky to trace a given binary found in a deployment back to its sources. (On the JVM you could partially get by without the sources, as you have good decompilers.)
I have never worked with mono repos, but I guess that this task would be somewhat easier, given that all sources are under a single repository.
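A common mitigation regardless of repo layout is to stamp each artifact with the commit it was built from; a minimal sketch (the image name and the GIT_SHA build-arg are made up, and the Dockerfile would need to declare and use that arg):

    # tag the artifact with the exact source revision it was built from
    REV=$(git rev-parse --short HEAD)
    docker build --build-arg GIT_SHA="$REV" -t myservice:"$REV" .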
- dqpb 3 years agoUse a monorepo, but organize your code as if it will someday be split into many repos.
- 88913527 3 years agoIn my experience, the developer experience for juniors is too much. Yarn + Lerna is just too much of a learning curve. However, having one repo and one CI/CD pipeline is convenient. But we've decided to divest from them. Your situation may not match mine, and that's okay.
- pcmaffey 3 years agoLerna is only needed if you're publishing multiple packages from the monorepo. If you're consuming your packages only within the monorepo for your various services, Yarn Workspaces is generally all that's needed.
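For reference, the day-to-day commands are pretty small once the root package.json declares the workspaces (the package name below is made up, and the "run all" form is Yarn 1 syntax):

    # one install at the repo root wires up all workspace packages
    yarn install

    # run a script in a single workspace package
    yarn workspace @acme/api build

    # or across all of them
    yarn workspaces run build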
- dimgl 3 years agoJust don't use Yarn + Lerna :) pnpm is amazing
- 88913527 3 years agoDoesn't work with older Angular projects. My monorepo is a hodgepodge of technologies.
- exfascist 3 years agoI'd argue that the optimal configuration is really a compromise: use submodules with a DVCS tool like git. You get the organizational benefits of monorepos with the isolation benefits of individual repos. Your branches take weeks rather than days to go stale, cloning even with full history can be very fast, and you don't need to learn new tools when you change organizations.