OpenAI's o1-pro now available via API

131 points by davidbarker 3 months ago | 129 comments
  • davidbarker 3 months ago
    Pricing: $150 / 1M input tokens, $600 / 1M output tokens. (Not a typo.)

    Very expensive, but I've been using it with my ChatGPT Pro subscription and it's remarkably capable. I'll give it 100,000 token codebases and it'll find nuanced bugs I completely overlooked.

    (Now I almost feel bad considering the API price vs. the price I pay for the subscription.)

    • ldjkfkdsjnv 3 months ago
      As far as I'm concerned, all of the other models are a waste of time to use in comparison. Most people don't know how good this model is.
      • dinobones 3 months ago
        Interesting... Most benchmarks show this model as being worse than o3-mini-high and Sonnet 3.7.

        What difference are you seeing from these models that makes it better?

        I say this as someone considering shelling out $200 for ChatGPT pro for this.

        • jbellis 3 months ago
          If you're in the habit of breaking down problems to Sonnet-sized pieces you won't see a benefit. The win is that o1-pro lets you stop breaking down one level up from what you're used to.

          It may also have a larger usable context window, not totally sure about that.

          • Tiberium 3 months ago
            There actually were almost no benchmarks for o1 pro before because it wasn't on the API. o1 pro is a different model from o1 (yes, even o1 with high reasoning).
            • ldjkfkdsjnv 3 months ago
              I regularly push 100k+ tokens into it, so most of my codebase, or large portions of it. I use the Repo Prompt product to construct the code prompts. It finds bugs and solutions at a rate far better than the other models. I also speak into the prompt to describe my problem, and find spoken language is interpreted very well.

              I also frequently download all the source code of libraries I am debugging, and when running into issues, pass that code in along with my own broken code. It's very good.

            • Hugsun 3 months ago
              How long is its thinking time compared to o1?

              The naming would suggest that o1-pro is just o1 with more time to reason. The API pricing makes that less obvious. Are they charging for the thinking tokens? If so, why is it so much more expensive if there are just more thinking tokens anyway?

              • Tiberium 3 months ago
                I think o1 pro runs multiple instances of o1 in parallel and selects the best answer, or something of the sort. And you do always pay for thinking tokens with all providers, OpenAI included. It's especially interesting if you remember that OpenAI hides the CoT from you, so you're in fact getting billed for "thinking" that you can't even read yourself.
                • ldjkfkdsjnv 3 months ago
                  I don't have the answers for you, I just know that if they charged $400 a month I would pay it. It seems like a different model to me. I never use o3-mini or o3-mini-high, just gpt-4o or o1 pro.
              • jbellis 3 months ago
                Remarkably capable is a good description.

                Shameless plug: One of the reasons I wrote my AI coding assistant is to make it easier to get problems into o1-pro. https://github.com/jbellis/brokk

                • andrewinardeer 3 months ago
                  I wonder what the input/output tokens will be priced at for AGI.
                  • stavros 3 months ago
                    They won't. Your use cases won't be anything the AI can't do itself, so why would they sell it to you instead of replacing you with it?

                    AGI means the value of a human is the same as an LLM's, but the energy requirements of a human are higher than those of an LLM, so humans won't be economical any more.

                    • dnadler 3 months ago
                      Actually, I think humans require much less energy than LLMs. Even raising a human to adulthood would be cheaper from a calorie perspective than running an AGI algorithm (probably). It's the whole reason why the premise of the Matrix was ridiculous :)

                      Some quick back-of-the-envelope math says it would take around 35 MWh to get to 40 years old (2,000 kcal per day).
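
                      For reference, the arithmetic behind that (assuming 2,000 kcal/day for 40 years and 1 kcal ≈ 1.163 Wh):

                        kcal_per_day = 2000
                        wh_per_kcal = 1.163   # 1 kcal = 1.163 watt-hours
                        days = 365 * 40

                        total_mwh = kcal_per_day * wh_per_kcal * days / 1e6
                        print(total_mwh)      # ≈ 34 MWh, in line with the ~35 MWh figure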

                      • rlt 3 months ago
                        OpenAI doesn’t have the pre-existing business, relationships, domain knowledge, etc to just throw AGI at every possible use case. They will sell AGI for some fraction of what an equivalent human behind a computer screen would cost.

                        “AGI” is also an under-specified term. It will start (maybe it's already there) equivalent to, say, a human in an overseas call center, but over time improve to the equivalent of a Fortune 500 CEO or Nobel prize winner.

                        “ASI”, on the other hand, will just recreate entire businesses from scratch.

                        • jasfi 3 months ago
                          There could be something to what you wrote. If AGI were achieved by a model, why would they give access to it via an API? Why not just sell what it can do, e.g. business services? That would be far more of a moat.
                      • foobiekr 3 months ago
                        Can you describe this "find a bug" workflow?
                        • hooloovoo_zoo 3 months ago
                          Is your prompt "{$codebase} find bugs"?
                          • davidbarker 3 months ago
                            Typically something like:

                              Look carefully through my codebase and identify any bugs/issues, or refactors that could improve it.
                            
                              <codebase>
                              …
                              </codebase>
                            
                            Doesn't have to be anything overly complicated to get good results. It also does well if you give it a git diff.
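
                            If it helps, a minimal sketch of how one might assemble that prompt from a directory of files (the paths, glob, and comment headers are just my own convention, not anything the model requires):

                              from pathlib import Path

                              # Concatenate source files into the <codebase> tags
                              files = sorted(Path("src").rglob("*.js"))
                              codebase = "\n".join(f"// {f}\n{f.read_text()}" for f in files)

                              prompt = (
                                  "Look carefully through my codebase and identify any bugs/issues, "
                                  "or refactors that could improve it.\n\n"
                                  f"<codebase>\n{codebase}\n</codebase>"
                              )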
                            • ionwake 3 months ago
                              Sorry if this is a noob question, but are you just pasting file contents in between those tags? Like the contents of file1.js and file2.js?
                              • crossroadsguy 3 months ago
                                Do you get this -

                                You say: "But is that really a bug?"

                                GPT: "That's right. Now that I see it again, this is not a bug…" and a lot of blah blah.

                          • simonw 3 months ago
                            This is their first model to only be available via the new Responses API - if you have code that uses Chat Completions you'll need to upgrade to Responses in order to support this.

                            Could take me a while to add support for it to my LLM tool: https://github.com/simonw/llm/issues/839

                            • icelancer 3 months ago
                              Oh interesting. I thought they were going to have forward compatibility with Completions. Apparently not.
                              • dtagames 3 months ago
                                It does. There are two endpoints. Eventually, all new models will only be in the new endpoint. The data interfaces are compatible.
                                • dtagames 3 months ago
                                  It shouldn't be too bad. The responses API accepts the same basic interface as the chat completion one.
                                  • simonw 3 months ago
                                    The harder bit is the streaming response format - that's changed a bunch, and my tool supports both streaming and non-streaming for both Python sync and async IO - so there are four different cases I need to consider.
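
                                    For the sync cases, the two streaming shapes differ roughly like this (a sketch, with `client` and `msgs` assumed set up already; the Responses event names are per the docs and worth double-checking):

                                      # Chat Completions streaming: chunks with a choices/delta structure
                                      for chunk in client.chat.completions.create(model="gpt-4o", messages=msgs, stream=True):
                                          if chunk.choices and chunk.choices[0].delta.content:
                                              print(chunk.choices[0].delta.content, end="")

                                      # Responses streaming: a flat stream of typed events
                                      for event in client.responses.create(model="gpt-4o", input=msgs, stream=True):
                                          if event.type == "response.output_text.delta":
                                              print(event.delta, end="")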
                                    • Tiberium 3 months ago
                                      Even the basic interface is different, actually - "input" vs "messages", no "max_completion_tokens" nor "max_tokens". That said, changing those things is quite easy.
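
                                      Roughly, in the Python SDK (a sketch; parameter names as documented at launch, worth verifying against current docs):

                                        from openai import OpenAI

                                        client = OpenAI()

                                        # Chat Completions style
                                        chat = client.chat.completions.create(
                                            model="gpt-4o",
                                            messages=[{"role": "user", "content": "Hello"}],
                                            max_completion_tokens=1000,
                                        )
                                        print(chat.choices[0].message.content)

                                        # Responses style: "input" instead of "messages",
                                        # "max_output_tokens" instead of "max_completion_tokens"
                                        resp = client.responses.create(
                                            model="o1-pro",
                                            input=[{"role": "user", "content": "Hello"}],
                                            max_output_tokens=1000,
                                        )
                                        print(resp.output_text)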
                                      • dtagames 3 months ago
                                        If it's easier, just ask Cursor to make the upgrade. Give it a link to the OpenAI doc. You might be surprised at how easy it is.
                                      • luke-stanley 3 months ago
                                        Simon, I see it via Chat Completions as well as Responses in their API platform playground.
                                        • simonw 3 months ago
                                          I just tried sending an o1-pro prompt to the chat completions API in the playground and got:

                                            This is not a chat model and thus not supported in the
                                            v1/chat/completions endpoint. Did you mean to use v1/completions?
                                          • luke-stanley 3 months ago
                                            Sorry, since the Platform UI featured it as an option, I figured OpenAI might enable o1-pro via the chat completions endpoint. I just got around to testing it, and I also get the same 404 `invalid_request_error` via the platform UI and API. It's such an odd and old 404 message, suggesting the old completions API! It's hard to believe it could be an intentional design decision. Maybe they see it as an important feature to avoid wasting (and refunding) o1-pro credit.

                                            I noticed that their platform's dashboard queries https://api.openai.com/dashboard/ which lists a supported_methods property of models. I can't see anything similar in the huge https://raw.githubusercontent.com/openai/openai-openapi/refs... schema yet (commit ec54f88 right now), and it lacks any mention of o1-pro. Like the whole developer-messages thing, the UX of the API seems like such an afterthought.
                                      • simonw 3 months ago
                                        It cost me 94 cents to render a pelican riding a bicycle SVG with this one!

                                        Notes and SVG output here: https://simonwillison.net/2025/Mar/19/o1-pro/

                                        • mateus1 3 months ago
                                          I’m no expert but that does not look like a 94c pelican to me.
                                          • deciduously 3 months ago
                                            Better than my SVG pelican would be, but it's a low bar.
                                          • jascination 3 months ago
                                            Your collection of pelicans is so bloody funny, genuinely brightened my day.

                                            I don't know what I was expecting when I clicked the link but it definitely wasn't this: https://simonwillison.net/tags/pelican-riding-a-bicycle/

                                            • qingcharles 3 months ago
                                              Whenever you experience a new pelican I always have to check it against your past pelicans to see progress towards the Artificial Super Pelican Singularity:

                                              https://simonwillison.net/tags/pelican-riding-a-bicycle/

                                              • orzig 3 months ago
                                                At this point you’d come out ahead just buying a pelican. Even before the tax benefits.
                                                • prawn 3 months ago
                                                  I have been using ChatGPT to generate 3D models by pasting its output into OpenSCAD. It often feels like coaching someone wearing a blindfold, but it can sometimes kick things forward quickly for low effort.
                                                • serjester 3 months ago
                                                  Assuming a highly motivated office worker spends 6 hours per day listening or speaking, at a salary of $160k per year, that works out to a cost of ≈$10k per 1M tokens.

                                                  OpenAI is now within an order of magnitude of a highly skilled human with their frontier model pricing. o3 pro may change this, but at the same time I don't think they would have shipped this if o3 were right around the corner.
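
                                                  For what it's worth, the arithmetic behind that, with assumed rates of ~150 spoken words/minute, ~1.3 tokens/word, and 250 working days:

                                                    words_per_min = 150     # assumed speaking rate
                                                    tokens_per_word = 1.3   # rough tokenizer average
                                                    tokens_per_year = 6 * 60 * words_per_min * tokens_per_word * 250

                                                    print(160_000 / (tokens_per_year / 1e6))  # ≈ $9,100 per 1M tokens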

                                                  • danpalmer 3 months ago
                                                    If you start paying someone and give them some onboarding docs, to a first approximation they'll start doing the job and you'll get value.

                                                    If you attach a credit card to o3 and give it some onboarding docs, it'll give you a nice summary of your onboarding docs that you didn't need.

                                                    We're a long way from a model doing arbitrary roles. Currently at the very minimum, you need a competent office worker to run the model, filter its output through their judgement, and act on it.

                                                    • lherron 3 months ago
                                                      More like: every time you tell o3 to do something, it will first reread the onboarding docs (and charge you for doing so) before it does anything else.
                                                      • levocardia 3 months ago
                                                        Right, value per token is much more important (but harder to quantify). A medical AI that could provide a one-paragraph diagnosis and treatment plan for rare / untreatable diseases could be generating thousands of dollars of value per token. Meanwhile, Claude has probably racked up millions of tokens wandering around Mt. Moon aimlessly.
                                                        • elicksaur 3 months ago
                                                          “Untreatable” disease.

                                                          Yet somehow the AI knows a treatment?

                                                        • serjester 3 months ago
                                                          I think that’s the remarkable thing - even with all of its flaws and its insane pricing, there’s plenty of people that will pay for it (myself included).

                                                          LLMs are good at a class of tasks that humans aren't.

                                                        • dragonwriter 3 months ago
                                                          > Assuming a highly motivated office worker spends 6 hours per day listening or speaking, at a salary of $160k per year, that works out to a cost of ≈$10k per 1M tokens.

                                                          I guess...if by office worker you mean a manager that does nothing but attend meetings and otherwise talk to people. For other workers you probably want to count the token equivalent of their actual work output and not just the chatting.

                                                          • ben_w 3 months ago
                                                            I suspect inner monologue is the useful metric for token count. I don't know if (any, let alone most or all) human brains think in token-like chunks, but if we do, and that's at 180/minute, that's 180x60x5x48 (working weeks/year) = 20,736,000 tokens/year. At that rate, $160k/year would be ~$7700/million tokens.

                                                            My guess is that this is better than a human who would cost $16k/year to hire. But with the logarithmic improvements in quality for linear price increases, I'm not sure it would be good enough to replace a $160k/year worker.

                                                            • ben_w 3 months ago
                                                              Just noticed I missed the 8x in the LHS, but the total is correct:

                                                              > 180x60x5x48 (working weeks/year) = 20,736,000 tokens/year
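
                                                              i.e., with the 8 hours/day restored:

                                                                tokens_per_year = 180 * 60 * 8 * 5 * 48   # = 20,736,000
                                                                print(160_000 / (tokens_per_year / 1e6))  # ≈ $7,716 per 1M tokens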

                                                          • nebula8804 3 months ago
                                                            How do you reconcile issues like the o1 pro model erroring out on every third attempt with an extremely large context (one that still fits but is near the limit)?

                                                            Every time I try to get this thing to read my codebase and onboarding docs (an Angular codebase of about 40k lines), it fails in "pull your hair out" ways, leading to frustration.

                                                          • danpalmer 3 months ago
                                                            It has a 2023 knowledge cut-off and a 200k context window...? That's pretty underwhelming.
                                                            • gkoberger 3 months ago
                                                              On the flip side, the cutoff date probably makes it a lot more upbeat.
                                                              • throw310822 3 months ago
                                                                Don't know if it's me, but this is really funny.
                                                              • bearjaws 3 months ago
                                                                For a second I was like "2023 isn't that bad"... and then I realized we're well into 2025...
                                                              • EcommerceFlow 3 months ago
                                                                o1-pro still holds up against every other release, including Grok 3 Think and Claude 3.7 Think (haven't tried Max, though), and it's over 3 months old, practically an eternity in AI time.

                                                                Ironic since I was getting ready to cancel my Pro subscription, but 4.5 is too nice for non-coding/math tasks.

                                                                God I can't wait for o3 pro.

                                                                • Tiberium 3 months ago
                                                                  "Max" as in "Claude 3.7 Sonnet MAX" is apparently Cursor-specific marketing - by default they don't use all the context of the model and set the thinking budget to a lower value than the maximum allowed. So essentially it's the exact same 3.7 Sonnet model, just with different settings.
                                                                  • sheepscreek 3 months ago
                                                                    4.5 works on Plus! I know. I was surprised too.
                                                                  • jwpapi 3 months ago
                                                                    For those that have tested it and liked it: I feel very confident with Sonnet 3.7 right now; if I could wish for something, it's for it to be faster. Most of the problems I'm facing are execution problems, and I just want AI to do them faster than I could code everything on my own.

                                                                    To me it seems like o1-pro would be better used as a switch-in tool, or to double-check your codebase, than as a constant coding assistant (even at a lower price)? I assume it would need to save me a tremendous amount of work, including on domain knowledge, to make up for Sonnet's estimated 10x speed advantage.

                                                                    • CamperBob2 3 months ago
                                                                      o1-pro can be very useful but it's ridiculously slow. If you find yourself wishing Sonnet 3.7 was faster, you really won't like o1-pro.

                                                                      I pay for it and will probably keep doing so, but I find that I use it only as a last resort.

                                                                    • WiSaGaN 3 months ago
                                                                      I have always suspected that o1-pro is some kind of workflow on top of the o1 model. Is it possible that it dispatches to, say, 8 instances of o1 and then does some type of aggregation over the results?
                                                                      • ein0p 3 months ago
                                                                        Did not know it was that expensive to run. I'm going to use it more in my Pro subscription now. I frankly do not notice a huge difference between o1 Pro and o3-mini-high - both fail on the fairly straightforward practical problems I give them.
                                                                        • _pdp_ 3 months ago
                                                                          At first I thought, great, we can add it now to our platform. Now that I have seen the price, I am hesitant to enable the model for the majority of users (except rich enterprises), as they will most certainly shoot themselves in the foot.
                                                                          • danpalmer 3 months ago
                                                                            > they will most certainly shoot themselves in the foot

                                                                            ...and then ask you for a refund or service credit.

                                                                          • bakugo 3 months ago
                                                                            > $150/Mtok input, $600/Mtok output

                                                                            What use case could possibly justify this price?

                                                                            • refulgentis 3 months ago
                                                                              It enables obscene, unnatural things at a fraction of most SWE hourly rates. One win that jumps to mind was writing a complete implementation of a Windows PCM player as a Flutter plugin, with some unique design properties and emergent API behavior that it needed to replicate from existing iOS/Android code.
                                                                              • zipy124 3 months ago
                                                                                Does it really? Your average software engineer is like £20-30 an hour; for the cost of 1M output tokens you can get a dev for a full week.
                                                                                • sheepscreek 3 months ago
                                                                                  The math doesn't check out. A day, maybe. Also, it's not just about a placeholder dev. The person needs to know your use-case and have the tech chops to deliver successfully in that timeframe.

                                                                                  Now to have that delivered to you in less than an hour? That’s a huge win.

                                                                                  • refulgentis 3 months ago
                                                                                    Leaving the dissection of this to the separate reply, let's estimate cost:

                                                                                    - 80 chars per line, 30 occupied (avg'd across 300 KLOC in codebase)

                                                                                    - 500 lines of code

                                                                                    - 15000 characters

                                                                                    - 4 chars / token

                                                                                    - 3750 tokens output

                                                                                    - 10 full iterations, and don't apply cached token pricing that's 90% off

                                                                                    - 37,500 tokens req'd in output

                                                                                    - $600 / 1M tokens

                                                                                    - $0.60 / 1K tokens

                                                                                    - ≈ $22.50
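
                                                                                    Or as a quick sanity check in code:

                                                                                      tokens = 500 * 30 / 4 * 10    # lines x chars/line / chars-per-token x iterations
                                                                                      print(tokens * 600 / 1e6)     # $600 per 1M output tokens => ≈ $22.50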

                                                                                    • intelVISA 3 months ago
                                                                                      The avg. SWE is a toss-up as to whether they create more issues than they solve over time. Factor in onboarding, bugs, and taking time away from other expensive people, and it becomes >$100/hr real quick.
                                                                                    • risyachka 3 months ago
                                                                                      More mediocre software is all the world needs.
                                                                                      • refulgentis 3 months ago
                                                                                        A tool is a tool. Your output is what you decide.
                                                                                      • Snuggly73 3 months ago
                                                                                        Probably not a great (or even unnatural) example. There are tons of examples of PCM players as Flutter plugins on the net, and Gemini from the free AI Studio spits out an implementation in about 20 seconds for $0.

                                                                                        YMMV

                                                                                        • refulgentis 3 months ago
                                                                                          No, you're wrong. I wish you weren't. I hate posting this stuff because at least a few people reply to the absolutely weakest version of what I actually said.

                                                                                          Go check out flutter_pcm_sound_fork, find me even one package with the same streaming PCM => speakers functionality, and I'll give you $500. All I ask is, as a personal favor to me, you read the part in the Hacker News FAQ about "coming with curiosity"

                                                                                        • wincy 3 months ago
                                                                                          I used o1 pro to write a .NET authorization filter, which, when I wrote it, I didn't even know what that was. I was like "I have this problem, how can I fix it" and it just started going, and the solution worked on the first try. Everyone at work was like "great job!" I guess I did feed it a bunch of surrounding code and the authorization policy, but the policy only allowed us to attach one security trait when we wanted to attach any number of security attributes and verify the user has at least one. Still, it solved what likely would have been at least a day or two of research in an hour or so of conversation.
                                                                                      • serjester 3 months ago
                                                                                        Synthetic data generation. You can have a really powerful, expensive model create evals so you can tune a faster, cheaper system with similar performance.
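
                                                                                        A minimal sketch of that pattern (the model names, prompt, and JSONL output are placeholders, not a recipe):

                                                                                          from openai import OpenAI

                                                                                          client = OpenAI()

                                                                                          # Use the expensive model once to draft eval cases...
                                                                                          draft = client.responses.create(
                                                                                              model="o1-pro",
                                                                                              input="Write 5 tricky Q&A pairs for testing a SQL assistant, one JSON object per line.",
                                                                                          )

                                                                                          # ...then store them to grade a cheaper model against, repeatedly
                                                                                          with open("evals.jsonl", "w") as f:
                                                                                              f.write(draft.output_text)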
                                                                                        • jsheard 3 months ago
                                                                                          You could do that, but OpenAI specifically doesn't want you to: https://openai.com/policies/row-terms-of-use/

                                                                                          > What you cannot do. You may not use our Services for any illegal, harmful, or abusive activity. For example, you may not: Use Output to develop models that compete with OpenAI.

                                                                                          Presumably you run the risk of getting banned if they realize what you're doing.

                                                                                          • andyferris 3 months ago
                                                                                            > You may not use our Services for any illegal, harmful, or abusive activity. For example, you may not: Use Output to develop models that compete with OpenAI.

                                                                                            This reads as if they consider developing models that compete with OpenAI as illegal, harmful or abusive. Which is crazy. (The other dot points in their list in the linked terms seem better).

                                                                                            • echelon 3 months ago
                                                                                              Screw their TOS.

                                                                                              OpenAI trained on the world's data. Data they didn't license.

                                                                                              Anyone should be able to "rip them off" and copy their capabilities on the cheap.

                                                                                              • levocardia 3 months ago
                                                                                                I wonder if some of the high pricing is specifically an attempt to ward off this sort of "slow distillation" of a powerful model
                                                                                                • littlestymaar 3 months ago
                                                                                                  I suspect they have no way to enforce that without risking false positives hurting their rich customers (and their business).
                                                                                                  • SJC_Hacker 3 months ago
                                                                                                    If it were possible:

                                                                                                    1) Why wasn't OpenAI doing it themselves?

                                                                                                    2) It would mean we've reached the technological singularity, with AI models able to improve themselves (as in getting a smarter model, not just compressing existing ones like Deepseek)

                                                                                                    • serjester 3 months ago
                                                                                                      Synthetic data is just as useful for building app-layer evals. There are probably significantly cheaper ways to get the data if you're training your own model.
                                                                                                      • kelseyfrog 3 months ago
                                                                                                        I compete with AI, not my models.
                                                                                                    • icelancer 3 months ago
                                                                                                      Full file refactoring. But I just use the webUI for this and will continue to at these prices... probably.
                                                                                                    • irthomasthomas 3 months ago
                                                                                                      o1-pro doesn't support streaming, so it's reasonable to assume that they're doing some kind of best-of-n technique to search over multiple answers.

                                                                                                      I think you can probably get similar results for a much lower price using llm-consortium. This lets you prompt as many models as you can afford and then chooses or synthesises the best response from all of them. And it can loop until a confidence threshold is reached.
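
                                                                                                      A toy version of the best-of-n half of that idea (the models and the judging prompt are arbitrary choices, and llm-consortium's actual confidence loop is more involved):

                                                                                                        from openai import OpenAI

                                                                                                        client = OpenAI()
                                                                                                        question = "Why does my binary search loop forever on duplicate keys?"

                                                                                                        # Sample n candidate answers from a cheaper model
                                                                                                        candidates = [
                                                                                                            client.responses.create(model="gpt-4o", input=question).output_text
                                                                                                            for _ in range(4)
                                                                                                        ]

                                                                                                        # Ask a judge model to pick the best one
                                                                                                        numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
                                                                                                        verdict = client.responses.create(
                                                                                                            model="gpt-4o",
                                                                                                            input=f"Question: {question}\n\nCandidate answers:\n{numbered}\n\n"
                                                                                                                  "Reply with the index of the best answer.",
                                                                                                        )
                                                                                                        print(verdict.output_text)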

                                                                                                      • NoahZuniga 3 months ago
                                                                                                        Seems underwhelming when OpenAI's best model, o3, was demoed almost 4 months ago.
                                                                                                              • ilrwbwrkhv 3 months ago
                                                                                                                Deepseek r1 is much better than this.
                                                                                                                • nsoonhui 3 months ago
                                                                                                                  Interesting take, care to explain more exactly how it is much better?
                                                                                                                  • flippyhead 3 months ago
                                                                                                                    It's exactly "much" better!