Show HN: We Built Open-Source Alternative to OpenAI's Operator, Runs in Chrome
9 points by casslin 4 months ago | 6 comments

I'm one of the creators of Nanobrowser, an open-source Chrome extension that lets you automate web tasks using AI agents. We were inspired by the potential of tools like OpenAI's Operator, but we wanted something that was:
- Open-Source: You can see the code, modify it, and contribute to the project.
- Browser-Based: No complex setups or server deployments. It runs directly in your browser.
- Customizable: You can tailor the agents' behavior to your specific needs.
- BYO LLM: Bring your own large language model API key (OpenAI, Anthropic, or even local models). No vendor lock-in.
- Privacy-Focused: Everything runs locally in your browser.
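The BYO-LLM idea boils down to a thin, provider-agnostic interface around whatever API key the user supplies. A minimal sketch of that pattern in TypeScript (the `LLMProvider` interface and `makeProvider` factory are illustrative assumptions, not Nanobrowser's actual API):

```typescript
// Hypothetical provider-agnostic LLM interface; names are illustrative,
// not Nanobrowser's actual API.
interface LLMProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// The user supplies their own API key, so there is no vendor lock-in:
// swapping providers only changes the arguments passed to the factory.
function makeProvider(name: string, apiKey: string): LLMProvider {
  if (!apiKey) {
    throw new Error(`missing API key for provider "${name}"`);
  }
  return {
    name,
    async complete(prompt: string): Promise<string> {
      // A real implementation would POST to the provider's chat endpoint
      // using apiKey; this stub just echoes for illustration.
      return `[${name}] response to: ${prompt}`;
    },
  };
}
```

Because the rest of the extension only sees `LLMProvider`, an OpenAI, Anthropic, or local-model backend can be swapped in without touching the agent logic.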
So we built exactly that, along with a multi-agent system to optimize performance and cost efficiency.
We're launching today: https://github.com/nanobrowser/nanobrowser
We hope you enjoy tinkering with it, and we'd love your feedback!
- casslin 4 months ago
Forwarding technical co-founder Alex's note to the HN community:
Hi HN,
I'm Alex, technical co-founder of Nanobrowser. We have been working on this project to make web automation more accessible, powerful, and open.
In building Nanobrowser, we leveraged open-source libraries such as browser-use (https://github.com/browser-use/browser-use) and Chrome Extension Boilerplate (https://github.com/Jonghakseo/chrome-extension-boilerplate-r...).
We completely rewrote the DOM-processing part of browser-use in TypeScript, built a multi-agent system, and implemented the extension frontend with React, Vite, and TypeScript.
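A multi-agent system of this kind typically boils down to a planner proposing actions and an orchestrator executing them against the browser. A minimal sketch under that assumption (the `ScriptedPlanner`, `runLoop`, and `Action` names are hypothetical, not Nanobrowser's actual architecture):

```typescript
// A minimal sketch of a planner/executor agent loop; all names are
// illustrative assumptions, not Nanobrowser's actual architecture.
type Action =
  | { type: "navigate"; url: string }
  | { type: "click"; selector: string }
  | { type: "done" };

interface Agent {
  // Given the current page state, propose the next action.
  next(state: string): Action;
}

// A trivial planner that replays a fixed script; a real planner
// would call an LLM to decide each step from the page state.
class ScriptedPlanner implements Agent {
  constructor(private steps: Action[]) {}
  next(_state: string): Action {
    return this.steps.shift() ?? { type: "done" };
  }
}

// The orchestrator runs the loop until the planner signals "done"
// or a step budget is exhausted.
function runLoop(planner: Agent, maxSteps = 10): Action[] {
  const executed: Action[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const action = planner.next("page-state");
    if (action.type === "done") break;
    executed.push(action); // in the extension, this would drive the DOM
  }
  return executed;
}
```

Splitting planning from execution this way also helps with cost: a cheaper model can handle routine DOM steps while a stronger model is reserved for planning.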
We hope you enjoy tinkering with Nanobrowser! Check out our repo (https://github.com/nanobrowser/nanobrowser), create issues, or open pull requests.
Thanks for taking a look! I'm excited to see what the HN community thinks.
(Alex was not able to post this comment himself because he only registered his account today.)
- casslin 4 months ago
We built upon amazing open-source projects such as:
- browser-use: https://github.com/browser-use/browser-use
- chrome-extension-boilerplate: https://github.com/Jonghakseo/chrome-extension-boilerplate-r...
We are grateful to everyone contributing to open source, and we'd love for you to join our community to help make open-source AI automation better!
Even if you're not interested in contributing code, you can help by:
- Trying out Nanobrowser and sharing your feedback.
- Suggesting new prompts and use cases.
- Helping us build a comprehensive evaluation framework.
- Joining our Discord community: https://discord.gg/NN3ABHggMK
We know it's early days and there's still a lot to improve. Thanks for checking it out; we appreciate your support and are excited to see what you build with Nanobrowser!
- FloatArtifact 4 months ago
I love seeing tools like this. I could see a light LLM for classifying elements, augmented by voice recognition for accessibility. Natural language will never be a great interface for high-domain, low-latency use cases such as accessibility.
- casslin 4 months ago
Thanks! Great idea: voice recognition is on our roadmap, and we're exploring good open-source options.
- FloatArtifact 4 months ago
Quick clarification: low domain knowledge is okay for those who don't have experience and don't know what to say, like Alexa. High domain means somebody who has expertise with a specialized workflow.
So they will rely on voice commands for recognition, not natural language, often one to two words to set a chain of tasks in motion. Think of having to control your entire computer, including navigation, by voice. That would be very exhausting and inefficient through natural language. There needs to be a hybrid solution that can leverage low-domain natural language but also high-domain command-based recognition. I cannot overstate how important low latency is between the beginning of a command and the action it produces. High latency means a big cognitive load, not to mention plain inefficiency.
There's a lot of overlap between UI automation and accessibility control tools. However, UI automation has always been a slow process simply because the stack doesn't have demand from devs for low latency.
It's the difference between having an independent agent do something on your behalf, not caring how long it takes, versus you waiting for an asynchronous task to be completed.
- casslin 4 months ago
Appreciate the clarification. The low-domain vs. high-domain distinction is spot-on: latency kills expert workflows. We'll keep this in mind when integrating voice recognition and designing more accessibility control options.