Building an email-to-calendar LLM
128 points by bookmark99 1 year ago | 43 comments
- robertclaus 1 year ago"Setting up LLMs to output structured data is incredibly hard." resonated strongly with my experience working on similar one-off projects. I've almost always implemented some level of fuzzy matching to validate and convert the LLM output back into my expected structured format.
I've also noticed that the LLMs are much better at writing code than structured JSON (no real surprise given the popularity of code assistants). If it makes sense in the specific situation, I now have the LLM generate code and parse it into the right structure rather than requesting structured data directly:
`generate_event("I need to do X", new Date("1-1-2025"))` seems to be more reliable to generate than `{ "description": "I need to do X", "when": "1-1-2025" }`
- BoorishBears 1 year agoI really don't get what people are doing wrong here.
I have a 7000 token prompt that generates JSON chugging away in production and at scale I'm seeing ~1 in 4000 generations require a re-generation, and even that could probably be killed with some basic "healing" code.
OSS models are prone to outputting garbage in my experience, but OP mentions ChatGPT:
How are you running into issues if you simply prefill the response with ```json and set ``` as your stop token?
Also, are people just not trying to parse out everything between the first opening and last closing bracket, and instead treating the response as broken if there's a preamble? The prefill gets rid of the preamble, but even if you're not willing/able to prefill, how hard is getting JSON out of a string?
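Something like this is all I mean by "healing" — a rough sketch, not production code, with `call_llm` standing in for whatever client you use:

```python
import json

def extract_json(raw: str) -> dict | None:
    """Strip any preamble/epilogue and parse the first '{' ... last '}' span."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None

def generate_json(call_llm, prompt: str, max_attempts: int = 3) -> dict:
    """Re-generate the rare response that still doesn't parse."""
    for _ in range(max_attempts):
        parsed = extract_json(call_llm(prompt))
        if parsed is not None:
            return parsed
    raise RuntimeError("no parseable JSON after retries")
```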
- refulgentis 1 year agoIf you're doing it locally, it's likely got llama.cpp underneath it somewhere. Ask the dev to allow specifying a JSON schema via its grammar feature.
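For reference, this is roughly what the grammar feature looks like through the llama-cpp-python bindings (a sketch only: the model path and field names are placeholders, and the exact API varies by version, so check the grammar syntax against llama.cpp's GBNF docs):

```python
from llama_cpp import Llama, LlamaGrammar

# GBNF grammar constraining output to a two-field JSON object.
EVENT_GRAMMAR = r'''
root   ::= "{" ws "\"description\":" ws string "," ws "\"when\":" ws string ws "}"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
'''

llm = Llama(model_path="model.gguf")           # placeholder path
grammar = LlamaGrammar.from_string(EVENT_GRAMMAR)
out = llm(
    "Extract the event from: dinner with Sam next Friday at 7pm.\nJSON:",
    grammar=grammar,                            # decoding can only emit strings the grammar accepts
    max_tokens=128,
)
print(out["choices"][0]["text"])
```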
- el_nahual 1 year agoAs long as you can sanitize the LLM output somehow. You should never `eval` LLM code straight from the tap!
- dns_snek 1 year agoYou shouldn't sanitize; if you're taking the approach described above, you should run it inside a minimal interpreter that doesn't implement any potentially dangerous APIs.
- Mathnerd314 1 year agoI think it's the training data; there is not a lot of JSON in it. It's much easier to get it to generate list-style data, like "foo:\n* prop1 - val1\n* prop2 - val2", or similar formats, as the models seem to have seen a lot of that sort of data.
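Folding that list style back into a dict is then trivial, e.g. (a quick sketch, assuming the `key - value` bullet format above):

```python
def parse_list_block(text: str) -> dict:
    """Parse 'foo:\n* prop1 - val1\n* prop2 - val2' into {'prop1': 'val1', ...}."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("*") and " - " in line:
            key, _, value = line.lstrip("* ").partition(" - ")
            props[key.strip()] = value.strip()
    return props

print(parse_list_block("foo:\n* prop1 - val1\n* prop2 - val2"))
# {'prop1': 'val1', 'prop2': 'val2'}
```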
- bboygravity 1 year agoYou're aware GPT-4 has a JSON-only mode?
- nosefurhairdo 1 year agoSome ideas in this forum thread on the same topic: https://genai.stackexchange.com/questions/202/how-to-generat...
- Zetobal 1 year agoOpenAI, Claude, Mistral Large, and all models you can run with Ollama have JSON-only modes!?
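With OpenAI's Python SDK, JSON mode is roughly this (a sketch; the model name is just an example, and JSON mode only guarantees syntactically valid JSON, not your exact schema):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # example model
    response_format={"type": "json_object"},  # JSON mode: output is guaranteed to parse
    messages=[
        {"role": "system", "content": "Extract calendar events. Reply in JSON with 'description' and 'when'."},
        {"role": "user", "content": "Dinner with Sam next Friday at 7pm."},
    ],
)
print(resp.choices[0].message.content)
```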
- jakecodes 1 year agoHey! It's awesome to read other people's solutions to this.
I've been working on solving this for the past 2 years or so, and I went through many of the same struggles in the beginning until we came up with a fairly complex solution for getting LLMs to output data in a way we can use.
The big problem is that 95% accuracy is not good enough for calendars. People lose confidence after 1 failed attempt. Trying to get LLMs to output JSON can have a 1-in-1000 invalid-JSON problem, which is unrecoverable. What I wound up doing is training models for the tasks with tremendous amounts of data. I did not use OpenAI's models as they were not right for the job. Would love feedback.
convoke.ai
- politelemon 1 year agoI'm looking at the Apps Script site.
Is this like PowerAutomate, but for G Suite products?
Also, I don't know why, but looking at the site makes me think it's a candidate that Google is going to kill off without warning: https://www.google.com/script/start/
- simonw 1 year agoApps Script is positively ancient at this point, first released in 2009: https://en.wikipedia.org/wiki/Google_Apps_Script
It's one of the Google products I worry least about, mainly because there are 15+ years of existing Google Sheets documents that people have built using it at this point. I don't think even Google would lightly break THAT many of their existing (often paid) users.
- htrp 1 year ago> I don't think even Google would lightly break THAT many of their existing (often paid) users.
Says everyone using GCP services that get deprecated.
- sanity 1 year agoSo true. Today I was trying to decide whether to use Google's Vertex vector search in a project - until I remembered Google's track record of pulling the rug on services. Made it an easy "no".
- tmpz22 1 year agoIt stands out that it doesn't follow the design system of many other Google products; I wonder if that means it has its own fiefdom.
- dudus 1 year agoIt's definitely quite popular inside Google itself, where all sorts of small and one-off systems are built and deployed entirely in AppsScript.
And it has had significant updates as well. The newer runtime is based on V8 using some clever way to isolate the code. Previously they used Mozilla Rhino as the runtime because it was easier to sandbox, but it was also very frustrating to work with.
Now the DX is much better, with faster, more modern execution and a better UI.
If I sound like a sucker, it's because I am: building with Apps Script is fun even with its limitations.
Power Platform, in comparison, might be 100x more powerful for all I care; I still wouldn't touch it with a 10ft pole. Most frustrating experience I ever had.
- denton-scratch 1 year agoJust use CALDAV; it's designed for making calendar entries automatically via email. I'm not hip with the fashion for putting an LLM into everything. I think it's lazy.
- IanCal 1 year agoThat's a great solution to a different problem. Unless caldav has a process for extracting dates and actions from unstructured emails? But that doesn't seem related to caldav.
- denton-scratch 1 year agoPoint taken; you do need a CALDAV client to source or consume CALDAV messages, which are indeed very much structured.
- darby_eight 1 year agoIDK about "lazy", but it's certainly an extremely expensive solution.
- dugite-code 1 year agoIsn't the expensive part of LLMs the training? My understanding is that once they are trained they can often be optimized to run quite cheaply. Not as cheaply as a well-designed program, but cheaply enough that it shouldn't be too prohibitive to run.
- denton-scratch 1 year agoI'd love to be shown I'm wrong; but I thought most 'runtime' LLMs required a shit-ton of memory. Just downloading one seems to require more storage than I have on this laptop.
- refulgentis 1 year agoA 3B model runs on Android phones from 2 years ago at 6 tokens/s.
- darby_eight 1 year agoI'm not sure what you're comparing this to or how you're making this comparison—can you enlighten us?
(Somehow I doubt whatever caldav software the above poster references takes more than a second to process multiple emails.)
- swsieber 1 year agoIf I'm generating the data, sure, I'll use CALDAV.
But for ingesting unstructured data I didn't generate, I'm going to reach for an LLM now (provided latency isn't an issue).
- iot_devs 1 year agoI am working on GabrielAI, a tool that filters and auto-drafts replies for Gmail and Outlook, and of course it uses an LLM under the hood.
In cases like this the author could just set up a filter like:
"If the email contains a task for me" (or some variation)
Then add a Gmail label to it.
That way the author will immediately find all the actionable emails in a dedicated label, which is much faster to skim and makes it easier to keep track of them.
Another option would be to have GabrielAI generate a draft like "reply acknowledging the task and put a to-do date in the email in 1 week".
This would allow Google to track the email and the deadline.
- bookmark99 1 year agoAuthor here. This all sounds great, and I'll be trying out your app. Funnily enough, when we were building this, a friend pitched the very idea you're doing with GabrielAI (it seems fantastic), so we will be signing up for the beta.
A bit classless, but do you mind if I reach out to you about your experience building GabrielAI?
- iot_devs 1 year agoI definitely don't mind!
Feel free to use the email address on the GabrielAI website!
I'd be happy to chat about it!
- bookmark99 1 year agoSounds good. Can't find an email on your homepage, but I sent one to the address on your privacy policy page. It might land in your spam box. Looking forward to it.
- maliker 1 year agoI used to use calendar integrations for these kinds of things. Then I realized I'd prefer to have the low-priority stuff disappear until I have to deal with it, so I switched to followupthen.com and have been happy with it. A nice effect is that it creates a paper trail, so I know how many times I've put things off.
- rockwotj 1 year agoYou can also do this with https://shortwave.com (and get iOS/Android/Web)
- bookmark99 1 year agoMy friend and I built a Gmail add-on to easily parse tasks from an email and add them to your calendar.
- phillipcarter 1 year agoThe fact that GPT could reliably produce the right JSON structure but an open model couldn't is fascinating to me. It's impressive how far ahead OpenAI is.
- kkzz99 1 year agoHow is this fascinating? One is a 175B-1000B+ parameter model; the other is a 3-70B parameter model.
- phillipcarter 1 year agoI’m allowed to find it fascinating, that’s why.
- jerrygenser 1 year agoI have a probably janky habit of creating slack reminders and then rolling them over by some mix of 3 hours, next day, or next Monday.
- ac50hz 1 year agoI simplify my prioritization strategies to use only 2 priorities: High (do it now) and Low (do it later). There should only be 1 high-priority item at any time, and when it's completed, the (next) low-priority item becomes the single high-priority item.
Maintaining multi-level priorities requires more decisions to evaluate relative priorities of different tasks and possible priority re-evaluation when new tasks arrive. Throw some colleagues, friends or others into the mix and agreements on the decisions become more distant.
Within the low-priority list, items are sorted by the date and time required. If you then choose to ignore the priorities or the sorting, the deviation will take you down the priority re-evaluation rabbit hole again. It's then your choice to follow the process, or not. Avoiding added complexity in task scheduling and process keeps me focused.
Of course, this will not be for everyone. Good luck with that LLM!
- ekianjo 1 year agoStructured output from an LLM can be achieved using the Python outlines library.
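Something along these lines with the 0.x-style outlines API (the model name is only an example, and the library's interface has changed across releases, so treat this as a sketch):

```python
from pydantic import BaseModel
import outlines

class Event(BaseModel):
    description: str
    when: str

# outlines constrains decoding so the output is guaranteed to match the schema
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")  # example model
generator = outlines.generate.json(model, Event)
event = generator("Extract the event: dinner with Sam next Friday at 7pm.")
print(event)  # Event(description=..., when=...)
```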