OpenAI's o1-pro now available via API
131 points by davidbarker 3 months ago | 129 comments

- davidbarker 3 months ago: Pricing: $150 / 1M input tokens, $600 / 1M output tokens. (Not a typo.)
Very expensive, but I've been using it with my ChatGPT Pro subscription and it's remarkably capable. I'll give it 100,000 token codebases and it'll find nuanced bugs I completely overlooked.
(Now I almost feel bad considering the API price vs. the price I pay for the subscription.)
- ldjkfkdsjnv 3 months ago: As far as I'm concerned, all of the other models are a waste of time to use in comparison. Most people don't know how good this model is.
- dinobones 3 months ago: Interesting... Most benchmarks show this model as being worse than o3-mini-high and Sonnet 3.7.
What difference are you seeing from these models that makes it better?
I say this as someone considering shelling out $200 for ChatGPT Pro for this.
- jbellis 3 months ago: If you're in the habit of breaking down problems to Sonnet-sized pieces you won't see a benefit. The win is that o1-pro lets you stop breaking down one level up from what you're used to.
It may also have a larger usable context window, not totally sure about that.
- Tiberium 3 months ago: There actually were almost no benchmarks for o1 pro before because it wasn't on the API. o1 pro is a different model from o1 (yes, even o1 with high reasoning).
- ldjkfkdsjnv 3 months ago: I regularly push 100k+ tokens into it, so most of my codebase, or large portions of it. I use the Repo Prompt product to construct the code prompts. It finds bugs and solutions at a rate that is far better than the others. I also speak into the prompt to describe my problem, and find spoken language is interpreted very well.

I also frequently download all the source code of libraries I am debugging, and when running into issues, pass that code in along with my own broken code. It's very good.
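(For the curious, a minimal sketch of what that workflow looks like mechanically, assuming the openai Python SDK and a hypothetical src/ layout; Repo Prompt does the assembly with far more structure:)

```python
# Minimal sketch: stitch a codebase into one prompt and send it to o1-pro.
# The directory layout and <file>/<codebase> tags are illustrative
# assumptions, not Repo Prompt's actual format.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

files = sorted(Path("src").rglob("*.py"))  # hypothetical project layout
codebase = "\n".join(
    f"<file path='{f}'>\n{f.read_text()}\n</file>" for f in files
)

response = client.responses.create(
    model="o1-pro",
    input=(
        "Here is my codebase. Find nuanced bugs and explain each one.\n"
        f"<codebase>\n{codebase}\n</codebase>"
    ),
)
print(response.output_text)
```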
- Hugsun 3 months ago: How long is its thinking time compared to o1?

The naming would suggest that o1-pro is just o1 with more time to reason. The API pricing makes that less obvious. Are they charging for the thinking tokens? If so, why is it so much more expensive if there are just more thinking tokens anyway?
- Tiberium 3 months ago: I think o1 pro runs multiple instances of o1 in parallel and selects the best answer, or something of the sort. And you do actually always pay for thinking tokens with all providers, OpenAI included. It's especially interesting if you remember the fact that OpenAI hides the CoT from you, so you're in fact getting billed for "thinking" that you can't even read yourself.
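(If that guess is right, the shape of it would be something like this sketch - n samples plus a judge. This is pure speculation about o1 pro's internals, using the openai Python SDK; the judge prompt and model choices are assumptions:)

```python
# Hypothetical best-of-n: sample n answers (sequentially here, for brevity),
# then ask a judge model to pick the best one.
from openai import OpenAI

client = OpenAI()

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [
        client.responses.create(model="o1", input=prompt).output_text
        for _ in range(n)
    ]
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    verdict = client.responses.create(
        model="o1",
        input=(
            f"Question: {prompt}\n\nCandidate answers:\n{numbered}\n\n"
            "Reply with only the number of the best answer."
        ),
    )
    return candidates[int(verdict.output_text.strip())]
```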
- ldjkfkdsjnv 3 months ago: I don't have the answers for you, I just know that if they charged $400 a month I would pay it. It seems like a different model to me. I never use o3-mini or o3-mini-high, just GPT-4o or o1 pro.
- jbellis 3 months ago: "Remarkably capable" is a good description.
Shameless plug: One of the reasons I wrote my AI coding assistant is to make it easier to get problems into o1pro. https://github.com/jbellis/brokk
- andrewinardeer 3 months ago: I wonder what the input/output tokens will be priced at for AGI.
- stavros 3 months ago: They won't. Your use cases won't be something the AI can't do itself, so why would they sell it to you instead of replacing you with it?
AGI means the value of a human is the same as an LLM, but the energy requirements of a human are higher than those of an LLM, so humans won't be economical any more.
- dnadler 3 months ago: Actually, I think humans require much less energy than LLMs. Even raising a human to adulthood would be cheaper from a calorie perspective than running an AGI algorithm (probably). It's the whole reason why the premise of The Matrix was ridiculous :)

Some quick back-of-the-envelope math says it would take around 35 MWh to get to 40 years old (at 2,000 kcal per day).
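(The arithmetic, for anyone who wants to check it:)

```python
# 2,000 kcal/day over 40 years, converted to MWh.
kcal_per_day = 2000
joules = kcal_per_day * 4184 * 365 * 40   # 1 kcal = 4184 J
mwh = joules / 3.6e9                      # 1 MWh = 3.6e9 J
print(f"{mwh:.1f} MWh")                   # ~33.9 MWh, i.e. "around 35 MWh"
```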
- rlt 3 months ago: OpenAI doesn’t have the pre-existing business, relationships, domain knowledge, etc. to just throw AGI at every possible use case. They will sell AGI for some fraction of what an equivalent human behind a computer screen would cost.
“AGI” is also an under-specified term. It will start (maybe is already there) equivalent to, say, a human in an overseas call center, but over time improve to the equivalent of a Fortune 500 CEO or Nobel prize winner.
“ASI”, on the other hand, will just recreate entire businesses from scratch.
- jasfi 3 months ago: There could be something to what you wrote. If AGI were to be achieved by a model, why would they give access to it via an API? Why not just sell what it can do, e.g. business services? That would be far more of a moat.
- foobiekr 3 months ago: Can you describe this "find a bug" workflow?
- hooloovoo_zoo 3 months ago: Is your prompt "{$codebase} find bugs"?
- davidbarker 3 months ago: Typically something like:

"Look carefully through my codebase and identify any bugs/issues, or refactors that could improve it. <codebase> … </codebase>"

Doesn't have to be anything overly complicated to get good results. It also does well if you give it a git diff.
- ionwake 3 months ago: Sorry if this is a noob question, but are you just pasting file strings in between those tags? Like the contents of file1.js and file2.js?
- crossroadsguy 3 months ago: Do you get this?

You: But is that really a bug?

GPT: That's right. Now that I see it again, this is not a bug… and a lot of blah blah.
- simonw 3 months ago: This is their first model to only be available via the new Responses API - if you have code that uses Chat Completions you'll need to upgrade to Responses in order to support this.
Could take me a while to add support for it to my LLM tool: https://github.com/simonw/llm/issues/839
- icelancer 3 months ago: Oh interesting. I thought they were going to have forward compatibility with Completions. Apparently not.
- dtagames 3 months ago: It does. There are two endpoints. Eventually, all new models will only be on the new endpoint. The data interfaces are compatible.
- dtagames 3 months ago: It shouldn't be too bad. The Responses API accepts the same basic interface as the Chat Completions one.
- simonw 3 months ago: The harder bit is the streaming response format - that's changed a bunch, and my tool supports both streaming and non-streaming for both Python sync and async IO - so there are four different cases I need to consider.
- Tiberium 3 months ago: Even the basic interface is different, actually - "input" vs "messages", no "max_completion_tokens" nor "max_tokens". That said, changing those things is quite easy.
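(The mapping is roughly this - a sketch using the openai Python SDK; parameter names as currently documented, worth double-checking against the docs:)

```python
from openai import OpenAI

client = OpenAI()

# Chat Completions style (not supported by o1-pro):
chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    max_completion_tokens=100,
)
print(chat.choices[0].message.content)

# Responses style (required for o1-pro): "messages" becomes "input",
# "max_completion_tokens" becomes "max_output_tokens".
resp = client.responses.create(
    model="o1-pro",
    input=[{"role": "user", "content": "Hello"}],
    max_output_tokens=100,
)
print(resp.output_text)
```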
- dtagames 3 months ago: If it's easier, just ask Cursor to make the upgrade. Give it a link to the OpenAI doc. You might be surprised at how easy it is.
- luke-stanley 3 months ago: Simon, I see it via Chat Completions as well as Responses in their API platform playground.
- simonw 3 months ago: I just tried sending an o1-pro prompt to the Chat Completions API in the playground and got:
> This is not a chat model and thus not supported in the v1/chat/completions endpoint. Did you mean to use v1/completions?
- luke-stanley 3 months ago: Sorry, since the Platform UI featured it as an option, I figured OpenAI might enable o1-pro via the Chat Completions endpoint. I just got around to testing it, and I also get the same 404 `invalid_request_error` via the platform UI and the API. It's such an odd and old 404 message, to suggest using the old Completions API! It's hard to believe it could be an intentional design decision. Maybe they see it as an important feature to avoid wasting (and refunding) o1-pro credit. I noticed that their platform's dashboard queries https://api.openai.com/dashboard/ which lists a supported_methods property of models. I can't see anything similar in the huge https://raw.githubusercontent.com/openai/openai-openapi/refs... schema yet (commit ec54f88 right now), and it lacks any mention of o1-pro at all. Like the whole developer-messages thing, the UX of the API seems like such an afterthought.
- simonw 3 months ago: It cost me 94 cents to render a pelican riding a bicycle SVG with this one!
Notes and SVG output here: https://simonwillison.net/2025/Mar/19/o1-pro/
- mateus1 3 months ago: I'm no expert, but that does not look like a 94¢ pelican to me.
- deciduously 3 months ago: Better than my SVG pelican would be, but it's a low bar.
- jascination 3 months ago: Your collection of pelicans is so bloody funny, genuinely brightened my day.
I don't know what I was expecting when I clicked the link but it definitely wasn't this: https://simonwillison.net/tags/pelican-riding-a-bicycle/
- qingcharles 3 months ago: Whenever you experience a new pelican I always have to check it against your past pelicans to see progress towards the Artificial Super Pelican Singularity:
- orzig 3 months ago: At this point you’d come out ahead just buying a pelican. Even before the tax benefits.
- prawn 3 months ago: I have been using ChatGPT to generate 3D models by pasting output into OpenSCAD. It often feels like coaching someone wearing a blindfold, but it can sometimes kick things forward quickly for low effort.
- serjester 3 months ago: Assuming a highly motivated office worker spends 6 hours per day listening or speaking, at a salary of $160k per year, that works out to a cost of ≈$10k per 1M tokens.
OpenAI is now within an order of magnitude of a highly skilled human with their frontier model pricing. o3 pro may change this, but at the same time I don't think they would have shipped this if o3 were right around the corner.
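(A rough check of that figure, assuming ~150 spoken words per minute, ~0.75 words per token, and 250 working days a year:)

```python
# Back-of-envelope: cost per million "spoken" tokens for a $160k/year worker.
tokens_per_min = 150 / 0.75                        # ~200 tokens/minute
tokens_per_year = tokens_per_min * 60 * 6 * 250    # 6 h/day, 250 days/year
cost_per_million = 160_000 / (tokens_per_year / 1_000_000)
print(f"{tokens_per_year:,.0f} tokens/year -> ${cost_per_million:,.0f}/1M")
# 18,000,000 tokens/year -> ~$8,889 per 1M tokens, i.e. roughly $10k
```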
- danpalmer 3 months ago: If you start paying someone and give them some onboarding docs, to a first approximation they'll start doing the job and you'll get value.
If you attach a credit card to o3 and give it some onboarding docs, it'll give you a nice summary of your onboarding docs that you didn't need.
We're a long way from a model doing arbitrary roles. Currently at the very minimum, you need a competent office worker to run the model, filter its output through their judgement, and act on it.
- lherron 3 months ago: More like: every time you tell o3 to do something, it will first reread the onboarding docs (and charge you for doing so) before it does anything else.
- levocardia 3 months ago: Right, value per token is much more important (but harder to quantify). A medical AI that could provide a one-paragraph diagnosis and treatment plan for rare / untreatable diseases could be generating thousands of dollars of value per token. Meanwhile, Claude has probably racked up millions of tokens wandering around Mt. Moon aimlessly.
- elicksaur 3 months ago: “Untreatable” disease.
Yet somehow the AI knows a treatment?
- serjester 3 months ago: I think that's the remarkable thing - even with all of its flaws and its insane pricing, there are plenty of people who will pay for it (myself included).

LLMs are good at a class of tasks that humans aren't.
- dragonwriter 3 months ago:

> Assuming a highly motivated office worker spends 6 hours per day listening or speaking, at a salary of $160k per year, that works out to a cost of ≈$10k per 1M tokens.
I guess...if by office worker you mean a manager that does nothing but attend meetings and otherwise talk to people. For other workers you probably want to count the token equivalent of their actual work output and not just the chatting.
- ben_w 3 months ago: I suspect inner monologue is the useful metric for token count. I don't know if (any, let alone most or all) human brains think in token-like chunks, but if we do, and that's at 180/minute, that's 180x60x5x48 (working weeks/year) = 20,736,000 tokens/year. At that rate, $160k/year would be ~$7,700/million tokens.
My guess is that this is better than a human who would cost $16k/year to hire. But with the logarithmic improvements in quality for linear price increases, I'm not sure it would be good enough to replace a $160k/year worker.
- ben_w 3 months ago: Just noticed I missed the 8x (hours per day) in the LHS, but the total is correct; it should read:

> 180x60x8x5x48 (working weeks/year) = 20,736,000 tokens/year
- nebula8804 3 months ago: How do you reconcile issues such as the o1 pro model erroring out every third attempt with an extremely large context (one that still fits but is near the limit)?

Every time I try to get this thing to read my codebase and onboarding docs (an Angular codebase of about 40k lines), it fails in "pull your hair out" ways, leading to frustration.
- danpalmer 3 months ago: It has a 2023 knowledge cut-off and a 200k context window...? That's pretty underwhelming.
- gkoberger 3 months ago: On the flip side, the cutoff date probably makes it a lot more upbeat.
- throw310822 3 months ago: Don't know if it's me, but this is really funny.
- bearjaws 3 months ago: For a second I was like "2023 isn't that bad"... and then I realized we're well into 2025...
- EcommerceFlow 3 months ago: o1-pro still holds up against every other release, including Grok 3 Think and Claude 3.7 Think (haven't tried Max, though), and it's over 3 months old, practically an eternity in AI time.

Ironic, since I was getting ready to cancel my Pro subscription, but 4.5 is too nice for non-coding/math tasks.
God I can't wait for o3 pro.
- Tiberium 3 months ago: "Max" as in "Claude 3.7 Sonnet MAX" is apparently Cursor-specific marketing - by default they don't use all the context of the model and set the thinking budget to a lower value than the maximum allowed. So essentially it's the exact same 3.7 Sonnet model, just with different settings.
- sheepscreek 3 months ago: 4.5 works on Plus! I know. I was surprised too.
- jwpapi 3 months ago: A question for those that have tested it and liked it. I feel very confident with Sonnet 3.7 right now; if I could wish for anything, it would be for it to be faster. Most of the problems I'm facing are execution problems, and I just want AI to do them faster than I could code everything on my own.

To me it seems like o1-pro would be better used as a switch-in tool, or to double-check your codebase, than as a constant coding assistant (even at a lower price)? I assume I would need to get a tremendous amount of work done, including domain knowledge, to make up for Sonnet being roughly 10x faster (estimated).
- CamperBob2 3 months ago: o1-pro can be very useful, but it's ridiculously slow. If you find yourself wishing Sonnet 3.7 were faster, you really won't like o1-pro.
I pay for it and will probably keep doing so, but I find that I use it only as a last resort.
- WiSaGaN 3 months ago: I have always suspected that o1-pro is some kind of workflow on top of the o1 model. Is it possible that it dispatches to, say, 8 instances of o1 and then does some type of aggregation over the results?
- ein0p 3 months ago: Did not know it was that expensive to run. I'm going to use it more in my Pro subscription now. I frankly do not notice a huge difference between o1 Pro and o3-mini-high - both fail on the fairly straightforward practical problems I give them.
- _pdp_ 3 months ago: At first I thought: great, we can add it to our platform now. Now that I have seen the price, I am hesitant to enable the model for the majority of users (except rich enterprises), as they will most certainly shoot themselves in the foot.
- danpalmer 3 months ago:

> they will most certainly shoot themselves in the foot
...and then ask you for a refund or service credit.
- bakugo 3 months ago:

> $150/Mtok input, $600/Mtok output
What use case could possibly justify this price?
- refulgentis 3 months ago: It enables obscene, unnatural things at a fraction of most SWE hourly rates. One win that jumps to mind was writing a complete implementation of a Windows PCM player, as a Flutter plugin, with some unique design properties and emergent API behavior that it needed to replicate from existing iOS/Android code.
- zipy124 3 months ago: Does it really? Your average software engineer is like £20-30 an hour; for the cost of 1M output tokens you can get a dev for a full week.
- sheepscreek 3 months ago: The math doesn't check out. A day, maybe. Also, it's not just about a placeholder dev. The person needs to know your use-case and have the tech chops to deliver successfully in that timeframe.
Now to have that delivered to you in less than an hour? That’s a huge win.
- refulgentis 3 months ago: Leaving the dissection of this to the separate reply, let's estimate cost (sanity-checked in the snippet after the list):
- 80 chars per line, 30 occupied (avg'd across 300 KLOC in codebase)
- 500 lines of code
- 15000 characters
- 4 chars / token
- 3750 tokens output
- 10 full iterations, and don't apply cached token pricing that's 90% off
- 37,500 tokens req'd in output
- $600 / 1M tokens
- $0.60 / 1K tokens
- ≈ $22.50
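(A quick check of that list - all inputs are the assumptions above, not measurements:)

```python
# Re-run the estimate: 500 lines x 30 occupied chars, 4 chars/token,
# 10 full iterations, $600 per 1M output tokens, no cached-token discount.
tokens_per_iteration = (500 * 30) / 4        # ~3,750 tokens
tokens_total = tokens_per_iteration * 10     # 37,500 output tokens
cost = tokens_total * 600 / 1_000_000
print(f"{tokens_total:,.0f} output tokens -> ${cost:.2f}")  # -> $22.50
```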
- intelVISA 3 months ago: The avg. SWE is a toss-up as to whether they create more issues than they solve over time. Factor in onboarding, bugs, and taking time away from other expensive people, and it becomes >$100/hr real quick.
- risyachka 3 months ago: More mediocre software is all the world needs.
- refulgentis 3 months ago: A tool is a tool. Your output is what you decide.
- Snuggly73 3 months ago: Probably not a great (or even unnatural) example. There are tons of examples of PCM players as Flutter plugins on the net, and Gemini from the free AI Studio spits an implementation out in about 20 seconds for $0.
YMMV
- refulgentis 3 months ago: No, you're wrong. I wish you weren't. I hate posting this stuff because at least a few people reply to the absolutely weakest version of what I actually said.

Go check out flutter_pcm_sound_fork, find me even one package with the same streaming PCM => speakers functionality, and I'll give you $500. All I ask is, as a personal favor to me, that you read the part in the Hacker News FAQ about "coming with curiosity".
- wincy 3 months ago: I used o1 pro to write a .NET authorization filter when I didn't even know what that was. I was like, "I have this problem, how can I fix it?", and it just started going, and the solution worked on the first try. Everyone at work was like, "Great job!" I guess I did feed it a bunch of surrounding code and the authorization policy, but the policy only allowed us to attach one security trait when we wanted to "attach any number of security attributes and verify the user has at least one". Still, it solved in an hour or so of conversation what would likely have been at least a day or two of research.
- alphabettsy 3 months ago: Is it secure?
- serjester 3 months ago: Synthetic data generation. You can have a really powerful, expensive model create evals so you can tune a faster, cheaper system with similar performance.
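(A minimal sketch of that pattern, with a hypothetical prompt, topic list, and file name; assumes the openai Python SDK and that the model returns clean JSON, which in practice needs validation:)

```python
# Use an expensive model to generate labeled examples, then keep them as
# evals / tuning data for a cheaper system. Everything here is illustrative.
import json
from openai import OpenAI

client = OpenAI()

def generate_example(topic: str) -> dict:
    resp = client.responses.create(
        model="o1-pro",
        input=(
            f"Write one hard question about {topic}, then a correct, concise "
            "answer. Return only JSON with keys 'question' and 'answer'."
        ),
    )
    return json.loads(resp.output_text)  # may raise if output isn't pure JSON

with open("synthetic_evals.jsonl", "w") as f:
    for topic in ["SQL indexing", "Rust lifetimes", "TCP congestion control"]:
        f.write(json.dumps(generate_example(topic)) + "\n")
```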
- jsheard 3 months ago: You could do that, but OpenAI specifically doesn't want you to: https://openai.com/policies/row-terms-of-use/
> What you cannot do. You may not use our Services for any illegal, harmful, or abusive activity. For example, you may not: Use Output to develop models that compete with OpenAI.
Presumably you run the risk of getting banned if they realize what you're doing.
- andyferris 3 months ago:

> You may not use our Services for any illegal, harmful, or abusive activity. For example, you may not: Use Output to develop models that compete with OpenAI.
This reads as if they consider developing models that compete with OpenAI to be illegal, harmful, or abusive. Which is crazy. (The other dot points in their list in the linked terms seem better.)
- echelon 3 months ago: Screw their TOS.
OpenAI trained on the world's data. Data they didn't license.
Anyone should be able to "rip them off" and copy their capabilities on the cheap.
- levocardia 3 months ago: I wonder if some of the high pricing is specifically an attempt to ward off this sort of "slow distillation" of a powerful model.
- littlestymaar 3 months ago: I suspect they have no way to enforce that without risking false positives hurting their rich customers (and their business).
- SJC_Hacker 3 months ago: If it were possible:

1) Why wasn't OpenAI doing it themselves?

2) It would mean we've reached the technological singularity, if AI models can improve themselves (as in getting a smarter model, not just compressing existing ones like DeepSeek).
- serjester 3 months ago: Synthetic data is just as useful for building app-layer evals. There are probably significantly cheaper ways to get the data if you're training your own model.
- kelseyfrog 3 months ago: I compete with AI, not my models.
- icelancer 3 months ago: Full-file refactoring. But I just use the web UI for this and will continue to at these prices... probably.
- irthomasthomas 3 months ago: o1-pro doesn't support streaming, so it's reasonable to assume that they're doing some kind of best-of-n technique to search over multiple answers.
I think you can probably get similar results for a much lower price using llm-consortium. This lets you prompt as many models as you can afford and then chooses or synthesises the best response from all of them. And it can loop until a confidence threshold is reached.
- NoahZuniga 3 months ago: Seems underwhelming when OpenAI's best model, o3, was demoed almost 4 months ago.
- ilrwbwrkhv 3 months ago: DeepSeek R1 is much better than this.
- nsoonhui 3 months ago: Interesting take; care to explain exactly how it is much better?
- flippyhead 3 months ago: It's exactly "much" better!