Show HN: Marqo – Vectorless Vector Search

62 points by jn2clark 1 year ago | 15 comments
Marqo is an end-to-end vector search engine. It contains everything required to integrate vector search into an application in a single API. Here is a code snippet for a minimal example of vector search with Marqo:

import marqo

# Connect to a running Marqo instance (defaults to http://localhost:8882).
mq = marqo.Client()

mq.create_index("my-first-index")

mq.index("my-first-index").add_documents([{"title": "The Travels of Marco Polo"}])

# results is a dict; the matching documents are under results["hits"].
results = mq.index("my-first-index").search(q="Marqo Polo")

Why Marqo? Vector similarity alone is not enough for vector search. Vector search requires more than a vector database - it also requires machine learning (ML) deployment and management, preprocessing and transformation of inputs, and the ability to modify search behavior without retraining a model. Marqo contains all of these pieces, enabling developers to build vector search into their applications with minimal effort.

Why not X, Y, Z vector database? Vector databases are specialized components for vector similarity. They are “vectors in - vectors out”: they still require the production of vectors, management of the ML models, and the associated orchestration and processing of the inputs. Marqo makes this easy by being “documents in, documents out”. Preprocessing of text and images, embedding the content, storing metadata, and deployment of inference and storage are all taken care of by Marqo. We have been running Marqo for production workloads with both low-latency and large-index requirements.
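
For contrast, here is a minimal sketch of what “vectors in - vectors out” means in practice with a bare vector database. sentence-transformers is a real library; the in-memory “database” below is a toy stand-in for whichever vector DB client you would actually use:

import numpy as np
from sentence_transformers import SentenceTransformer

# With a bare vector database, you choose, deploy, and manage the model yourself.
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [{"title": "The Travels of Marco Polo"}]
vectors = model.encode([d["title"] for d in docs])  # "vectors in"

def search(query, k=10):
    q = model.encode([query])[0]  # queries must be embedded too
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in sims.argsort()[::-1][:k]]  # "vectors out"

print(search("Marco Polo"))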

Marqo features:

- Low latency (10s of ms, configuration dependent) at large scale (10s to 100s of millions of vectors).
- Easily integrates with LLMs and other generative AI - augmented generation using a knowledge base.
- Pre-configured open-source embedding models - SBERT, Hugging Face, CLIP/OpenCLIP.
- Pre-filtering and lexical search.
- Multimodal model support - search text and/or images.
- Custom models - load models fine-tuned on your own data.
- Ranking with document metadata - bias the similarity with properties like popularity.
- Multi-term multimodal queries - allows per-query personalization and topic avoidance (see the sketch after this list).
- Multimodal representations - search over documents that have both text and images.
- GPU/CPU/ONNX/PyTorch inference support.
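
A sketch of how a few of these features look in a query. The parameter names (filter_string, search_method) and the weighted-query dict follow Marqo's Python client at the time of writing, but treat exact names as version dependent and check the docs:

import marqo

mq = marqo.Client()

# Pre-filtering: restrict the vector search to documents matching a filter.
results = mq.index("my-first-index").search(
    q="adventure travel",
    filter_string="genre:(non-fiction)",
)

# Multi-term weighted query: positive weights steer toward a topic,
# negative weights steer away from it (per-query personalization).
results = mq.index("my-first-index").search(
    q={"medieval trade routes": 1.0, "modern politics": -0.5},
)

# Lexical (keyword) search instead of tensor search.
results = mq.index("my-first-index").search(
    q="Marco Polo",
    search_method="LEXICAL",
)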

See some examples here:

Multimodal search: [1] https://www.marqo.ai/blog/context-is-all-you-need-multimodal...

Refining image quality and identifying unwanted content: [2] https://www.marqo.ai/blog/refining-image-quality-and-elimina...

Question answering over transcripts of speech: [3] https://www.marqo.ai/blog/speech-processing

Question answering over technical documents, and augmenting NPCs with a backstory: [4] https://www.marqo.ai/blog/from-iron-manual-to-ironman-augmen...

  • loxias 1 year ago
    I get your larger point, but the errors and phrasing are a bit off-putting.

    Vector similarity alone _IS_ enough for vector search. That's literally what "search" means in this context! Finding another vector within an epsilon bound given a metric. After the 3rd read I think I understand the point you're trying to make, and you might be right.

    There might be room in the market for an integrator, an all in one platform. It won't have the best performance or functionality, I doubt it would win in _any_ category. But if you can get the business model working right I could imagine such a product having sizeable market share. Hm...

    Edit: I'm also curious about the dimension and metric used. Any numbers about latency or size are kinda pointless without those :).

    1 point in 1536-D space (what OpenAI uses) at 4 bytes per float == ~6KB, so even 100 million points is only ~600GB...
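
    A quick worked check of that back-of-envelope arithmetic:

    dims = 1536              # OpenAI embedding dimensionality
    bytes_per_float = 4      # float32
    points = 100_000_000

    per_point = dims * bytes_per_float   # 6144 bytes, ~6 KB
    total_gb = points * per_point / 1e9  # ~614 GB
    print(per_point, total_gb)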

    • jn2clark 1 year ago
      Regarding metric and dimension - it is really problem dependent, as is throughput. Recall and latency numbers reported in benchmarks are typically on very well curated and structured datasets and averaged across all queries. Recall is not just a function of the HNSW algorithm. I can tell you, though, that you can do 70M-vector indexes with 768 dimensions in <100ms including inference on very real-world datasets. We will publish some benchmarks shortly as we are doing more evaluations on real-world data. I also compiled throughput on OpenCLIP models here as well: https://docs.google.com/spreadsheets/d/1ftHKf4MovnAyKhGyi05e.... If there are particular things you want to see, let us know and we can add them!
      • loxias 1 year ago
        > it is really problem dependent, as is throughput. Recall and latency numbers reported in benchmarks are typically on very well curated and structured datasets and averaged across all queries

        This is correct. :) Don't worry, I know enough to not trust any published benchmarks on this topic... (I'm also not your target market. I wrote my first "vector DB" in 2001 for music recognition.)

        I still think it's crucial to include just a few more facts though, because otherwise the statement is meaningless.

        Consider:

        A. "we can find an approximate NN match, euclidean, D=768, N=70000000, under 100ms on a modern laptop"

        vs

        B. "we can find an approximate NN match, euclidean, D=2, N=70000000, under 100ms on a modern laptop"

        vs

        C. "we can find an approximate NN match, euclidean, D=768, N=70000000, under 100ms on 1000x modern laptops"

        Notice how B and C aren't impressive; they're trivially beatable. :)

      • jn2clark 1 year ago
        I think it depends a bit on the definition of search here. It might satisfy a literal definition of search but not search as users would expect - which I think is the important point. IMHO vector similarity and vector search are conflated too much and solving search problems as users expect them requires more than similarity.
        • loxias 1 year ago
          I think you might be on to something, in thinking about it in terms of the platform from the perspective of the end user, and what they build on it.

          I humbly posit that you might be better off, at least from a communications/marketing perspective, ditching the "vector search without vectors" verbiage, because that alienates the segment that, uh, for lack of a better term, loves and understands high-dimensional applied math, and computers. :)

          Perhaps instead find language that couches it as an entirely new category. Blue ocean. Ditch the word "vector" entirely.

          -$0.02

          • jn2clark 1 year ago
            Thanks for the feedback and questions - really appreciate it.
            • _false 1 year ago
              Why not semantic search?
            • rmilejczz 1 year ago
              Definitely, RAG programs often grab lots of unneeded context and sometimes miss crucial context. Improving this would be huge imo, for example in something like Cursor.
            • blackkettle 1 year ago
              [dead]
            • bryanrasmussen 1 year ago
              I guess if you wanted to do decompounding and stemming you would have to make the fields with the stemmed values and the decompounded values yourself and ... then implement it for the queries as well? Or is there a way to do that kind of thing somewhere in there?
              • billythemaniam 1 year ago
                I found that stemming the text before generating vectors helps increase recall, and the vectors still capture context, etc. However, it does hurt precision, because some information is lost by stemming. The more recent vector training algorithms are better able to capture semantic, syntactic, and contextual similarity without a lot of preprocessing. So I have found that vectors can replace all the nonsense that used to be needed to increase recall: stemming, manual synonym lists, etc.

                However, vector similarity search only helps with the literal text matching, not ranking. TF-IDF, BM25, PageRank, learning-to-rank ML, etc. are still needed to rank documents. Whenever I find a new vector search engine, I always look to see what ranking features it has beyond vector similarity.
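
                A sketch of the preprocessing described above - stemming before embedding. NLTK and sentence-transformers are real libraries; this is illustrative, not a recommendation:

                from nltk.stem import PorterStemmer
                from sentence_transformers import SentenceTransformer

                stemmer = PorterStemmer()
                model = SentenceTransformer("all-MiniLM-L6-v2")

                def stem_text(text):
                    # e.g. "travelers" -> "travel", "mountainous" -> "mountain"
                    return " ".join(stemmer.stem(tok) for tok in text.split())

                raw = "Travelers crossing mountainous regions"
                # Embedding the stemmed variant trades away the precision that
                # stemming loses in exchange for higher recall.
                raw_vec, stemmed_vec = model.encode([raw, stem_text(raw)])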

                • bryanrasmussen 1 year ago
                  I would want to do something similar to Lucene's support for both stemmed and non-stemmed fields together - so that you could rank a hit in the non-stemmed field higher than a hit in the stemmed field, which helps precision.

                  In my experience this is more useful in complicated document searches.

                • jn2clark 1 year ago
                  At the moment you would need to do this yourself. It would be possible to add additional preprocessing to accommodate this though. Feel free to add a feature request here: https://github.com/marqo-ai/marqo/issues. The other consideration is that you would want the distribution of the content and queries to match what the selected model was trained on.
                • Alifatisk 1 year ago
                  I guess it's vectorless vector search in the same sense that we have serverless servers?
                  • bryanrasmussen 1 year ago
                    probably stupid question - is there a way to use this to search over graph data - like some way to do graph embeddings here to map a graph to the vectors?
                    • jn2clark 1 year ago
                      Good question! At the moment if you have the abstraction of data -> model -> vector then it is amenable to searching like this. It will depend a bit on the use case though.
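
                      For instance, a graph can be mapped into that data -> model -> vector abstraction with node embeddings. A sketch using the real networkx and node2vec packages (illustrative; whether the resulting vectors suit your search problem is use-case dependent):

                      import networkx as nx
                      from node2vec import Node2Vec

                      graph = nx.karate_club_graph()  # toy graph

                      # Random-walk-based node embeddings (trains a gensim Word2Vec model).
                      n2v = Node2Vec(graph, dimensions=64, walk_length=30, num_walks=200)
                      model = n2v.fit(window=10, min_count=1)

                      vector = model.wv["0"]  # 64-d embedding for node "0"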