Data structures as topological spaces (2002) [pdf]

170 points by cpp_frog 1 year ago | 46 comments
  • mjhay 1 year ago
    Related to this, AlgebraicJulia has been doing a lot with applying concepts from algebra and category theory to data analysis and modelling.

    https://www.algebraicjulia.org/

    There's some blog posts that are also interesting:

    https://blog.algebraicjulia.org/

    • Verdex 1 year ago
      It feels like the following PDF better expresses what they're trying to accomplish.

      http://mgs.spatial-computing.org/PUBLICATIONS/lami-RR72--com...

      As far as I can tell, they're trying to model things like chemical reactions (and other stuff) where given a bunch of "stuff" in some solution it will combine with other "stuff" if it's in the same topological neighborhood (which I think is basically the idea that ALL of the "stuff" doesn't have to be next to each other b/c if you let a solution just sit there eventually reactions are going to react).

      It's kind of neat, but their website seems to indicate that it is not being actively supported. Or at the very least they don't seem to have any reason to make publicly available documentation after since around 2010.

      EDIT: So for example, you could use their "trans" transform primitive to implement conway's game of life by defining the birth rule as a pattern of an empty cell with three alive cells and a transform that results in the empty cell being alive and the alive cells being the same. The death rules in similar fashion. Neighbor here would be defined as being physically next to something (but the point is that because this is about topologies, then neighbor doesn't have to mean physical proximity ... although I'm not sure where that's defined ... in the collection maybe?).

      And then you just run the transform on a collection with some initial state.

      EDIT EDIT: Yeah, the notion of neighbor is defined on the collection. This allows you to use the same transform on different collection types and get the appropriate result.

      ALSO checkout figure 5 in the PDF I linked because it's an incredibly concise description of exactly what they're doing.

      EDIT EDIT EDIT:

      This also feels vaguely similar to what the Egison language is doing with their pattern matching. Documentation for Egison feels better at least to me.

      https://www.egison.org/

      However, I don't think that egison allows you to define arbitrary notions of neighbors in a collection like MGS does. But I haven't exactly tried to use it very much.

      • iamwil 1 year ago
        Figure 5 kinda reminds me of...Datalog? Where you have rules and a data set. Given the rules, you iteratively compute on the data set until there are no more rules that match the given data--the fixed point.

        Is that right?

        • Verdex 1 year ago
          I think that's accurate except it looks like you need to manually continue the applications of your transform.

          It looks like they've got a pure functional thing going on.

        • amelius 1 year ago
          I'm puzzled by what's happening on page 9 of that PDF.

          Ok, I quickly browsed through it, but a chemical reaction where two identical molecules react into those same molecules plus another molecule? How can this be possible?

          • Verdex 1 year ago
            I'm not up to speed on all of the things that computational biology is up to. But IIRC there are some metabolic pathways that cascade into producing a lot of whatever it is the organism is trying to produce. Which sounds kind of like what you're describing.
          • Cacti 1 year ago
            Is this for modeling a single reaction or chain of reactions? ie you don’t need neighborhoods if you are modeling 7 trillion of one thing combined with 14 trillion of another?
            • Verdex 1 year ago
              It looks like chains of reactions etc. As you say for something simple you don't need something complex.

              It's a DSL for computational biology where very little documentation has apparently made it to a public forum. I can decode the programming language part of it but I'm not up to scratch on computational biology so I'm not sure exactly what they want it for.

          • hackandthink 1 year ago
            Stephen Wolfram's ruliad seems to be some sort of concurrent topological computations.

            There are some beautiful pictures.

            https://content.wolfram.com/sites/43/2021/11/1110swimg46.png

            https://writings.stephenwolfram.com/2021/11/the-concept-of-t...

            • tudorw 1 year ago
              Fascinating, so would this mean an LLM is an approximator of a certain proportion of a 'ruliad'? Apologies for using GPT here, but it's a bit beyond my math to make a statement like that without reaching for my sidekick..."in a broad sense, one could conceptualize a Large Language Model (LLM) as an approximator of a specific, limited subsection of the ruliad" so I guess so, ish?
              • bibanez 1 year ago
                Considering the ruliad consists of all possible rules and their applications, you could say that. But it's oh so much more!
            • misja111 1 year ago
              How is this fundamentally different than considering data structures as graphs?
              • anon291 1 year ago
                Graphs are discrete, topologies are potentially continuous. Moreover, you can do different things with them such as create homeomorphisms to another topology much more easily than you can create bijections between graphs. In general, continuity lets you assume things that are impossible in discrete spaces. For example, many optimization problems are really easy in continuous spaces but really hard in discrete ones (linear programming for example)

                Given the recent success with vectors as a general model for data (as witnessed by the continued success with deep neural networks), it's an interesting discussion to have.

                • aleph_minus_one 1 year ago
                  > Moreover, you can do different things with them such as create homeomorphisms to another topology much more easily than you can create bijections between graphs. In general, continuity lets you assume things that are impossible in discrete spaces.

                  If you argue with more generality: why not consider sites (and, relatedly, topoi) instead of topological spaces then:

                  > https://en.wikipedia.org/wiki/Grothendieck_topology

                  > https://en.wikipedia.org/wiki/Topos

                  • anon291 1 year ago
                    > If you argue with more generality: why not consider sites (and, relatedly, topoi) instead of topological spaces then:

                    I thought linear programming would be something everyone knows, and I am not the original author so I can't speak for why they chose topological spaces instead of anything listed here. I think their e-mails are on the paper. Perhaps e-mailing them will help elucidate their choice.

                  • StevenXC 1 year ago
                    • Kalanos 1 year ago
                      so topologies are grids (e.g. coordinates) in this case? it's not a great naming choice as "topology" is frequently used to describe network/graph
                      • dlahoda 1 year ago
                        feels so true.
                        • mathgradthrow 1 year ago
                          Topologies are necessarily continuous
                          • xanderlewis 1 year ago
                            No, they are not.

                            In the usual mathematical sense of the words you are using, topologies aren’t even the right type of object to admit a notion of continuity. Your statement doesn’t even make sense. It’s maps between them that can be continuous.

                            In fact, a topological space is sort of the minimal amount of structure a set needs to have to be able to talk about continuity of maps to/from it.

                            • skhunted 1 year ago
                              You and OP are using the word continuous in two different contexts. Generally one would not say that the integers with the trivial topology is continuous. It’s a discrete space with a topology. But when someone says a space is continuous generally they mean not discrete.
                              • anon291 1 year ago
                                Perhaps the confusion is that I should have said topological spaces can be continuous. There are discrete topological spaces. Topologies (which I believe is typically used to refer to the collection of open sets in a topological space) are not functions or relations themselves, so I'm not sure a useful notion of continuity applies there, but if I'm wrong, please inform.
                                • bmacho 1 year ago
                                  This is not even false.
                                  • ducttapecrown 1 year ago
                                    You're going to tell me the discrete topology is continuous!?
                                  • JensW 1 year ago
                                    Discrete spaces can also be topological spaces, see discrete topology
                                    • HighFreqAsuka 1 year ago
                                      Did you not read the word “potentially”? Topological spaces are a more general case of spaces that contain the discrete case as a subset.
                                • whosthatguy 1 year ago
                                  > 1. selects a sub-collection B of A whose elements match the path pattern β,

                                  > 2. computes a new collection C as a function f of B and its neighbors,

                                  > 3. and specifies the insertion of C in place of B into A.

                                  This sounds a lot like the presentation of comonads as directed containers (e.g. https://arxiv.org/abs/1408.5809).

                                  • andoando 1 year ago
                                    I wish I understood what this paper was saying.
                                  • reuben364 1 year ago
                                    I'm left confused as to what the gluing in the rule replacement is. Must the boundary of a rule match on both sides? Also what examples there would be of what an example would of having topology that is not induced from a graph if it is at all possible.
                                    • andoando 1 year ago
                                      This is very interesting. Has anyone considered using paths/vectors in 2d/3d space as data structures?
                                    • anon291 1 year ago
                                      I wonder if this article is related to homotopy type theory at all, since they propose similar ideas.
                                      • Verdex 1 year ago
                                        Ultimately, I don't think so. All the HoTT stuff seems to really be focusing on constructing proof objects so that the computer can run them on mathematics which would otherwise not have any computer verification run on it.

                                        More or less advanced type checking for math.

                                        Meanwhile, the MGS language's topological collections and associated transforms seems to be about simulating things like chemical reactions. Not really verification so much as exploration.

                                        Although, to be sure, I think both are examples of how mathematics (even highly abstract mathematics like topology) can be useful to other disciplines.

                                        • Syzygies 1 year ago
                                          https://homotopytypetheory.org/book/

                                          I disagree. There's a migration from Haskell to Lean 4, which is influenced by HoTT, and is a credible general purpose programming language.

                                          Arguably, _the_ credible general purpose programming language, if one believes that programming should feel like doing mathematics. Languages are shaped by their target tasks, and writing tactics for proofs subsumes any other task one might consider. Programming is recognizing pattern, and pattern runs deep in Lean.

                                          When we're young, but past existential BS, we start to think that ten years of training will yield productive years that outstrip decades of muddling in ad hoc languages if we hadn't made the commitment. But soon, few want to make the commitment.

                                          If programming is ever going to become far more advanced, it will take the form of successors to Haskell and Lean.

                                      • richrichie 1 year ago
                                        [flagged]
                                        • xanderlewis 1 year ago
                                          It would have been slightly weird terminology even back then; it seems to have always been much more common to use the word 'children'.
                                          • penteract 1 year ago
                                            I suspect that English isn't the first language of the authors and there has been a translation from a language where all nouns are gendered. The phrase "an hexagon" also suggests that they don't pronounce 'h'.
                                            • 0x264 1 year ago
                                              That's correct. It's (and they are) French.
                                              • xanderlewis 1 year ago
                                                Ah — that makes sense.
                                          • HackerThemAll 1 year ago
                                            > So, the expression

                                            > 1, 1+1, 2+1, ():set

                                            > builds the set with the three elements 1, 2 and 3

                                            Regarding the "():set" part, and the "():something" idiom repeating in the article, is that from the same competition for the most absurd syntax where Golang got most of its awkwardness?