Calculating OpenAI and Azure OpenAI Service Model Usage Costs
Apr 22, 2024
Azure OpenAI Service is a powerful cloud service offered by Microsoft Azure, enabling developers to leverage OpenAI’s state-of-the-art language models. With a range of models available, including gpt-3.5-turbo, gpt-4, DALL-E, and ada, developers can unlock new possibilities in content generation, summarization, semantic search, and even natural language to code translation. In this blog post, we will explore the capabilities of Azure OpenAI Service and delve into its pricing structure.
Tokens
From OpenAI's help center:
Tokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens. These tokens are not cut up exactly where the words start or end — tokens can include trailing spaces and even sub-words. Here are some helpful rules of thumb for understanding tokens in terms of lengths:
- 1 token ~= 4 chars in English
- 1 token ~= ¾ words
- 100 tokens ~= 75 words
Or
- 1–2 sentence ~= 30 tokens
- 1 paragraph ~= 100 tokens
- 1,500 words ~= 2048 tokens
To get additional context on how tokens stack up, consider this:
- Wayne Gretzky’s quote “You miss 100% of the shots you don’t take” contains 11 tokens.
- OpenAI’s charter contains 476 tokens.
- The transcript of the US Declaration of Independence contains 1,695 tokens.
How words are split into tokens is also language-dependent. For example, ‘Cómo estás’ (‘How are you’ in Spanish) contains 5 tokens (for 10 chars). The higher token-to-char ratio can make it more expensive to implement the API for languages other than English.
To further explore tokenization, you can use our interactive Tokenizer tool, which allows you to calculate the number of tokens and see how text is broken into tokens. Alternatively, if you’d like to tokenize text programmatically, use Tiktoken as a fast BPE tokenizer specifically used for OpenAI models. Other such libraries you can explore as well include transformers package for Python or the gpt-3-encoder package for node.js.
Depending on the model used, requests can use up to 4,097 tokens shared between prompt and completion. If your prompt is 4,000 tokens, your completion can be 97 tokens at most.
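The rules of thumb and the shared token limit above can be turned into a quick budget check. The following is a minimal sketch, assuming the 4-characters-per-token heuristic for English text and the 4,097-token limit quoted above; the function names are illustrative, and a real tokenizer such as Tiktoken will give exact counts.

```python
import math

CHARS_PER_TOKEN = 4    # rule of thumb for English text, not an exact count
CONTEXT_LIMIT = 4097   # shared prompt + completion budget quoted above


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on character count only."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def max_completion_tokens(prompt_tokens: int, context_limit: int = CONTEXT_LIMIT) -> int:
    """Tokens left for the completion once the prompt is accounted for."""
    return max(context_limit - prompt_tokens, 0)


print(max_completion_tokens(4000))  # 97
```

Because the estimate is only a heuristic, it is best used for rough budgeting; non-English text in particular can need noticeably more tokens per character.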
OpenAI and Azure OpenAI Service Model Parity
OpenAI and Azure OpenAI Service both offer access to the same set of state-of-the-art models, though there is a
notable delay in the availability of the latest OpenAI models on the Azure OpenAI Service.
Despite this, the pricing structure for both services remains consistent.
This means that while Azure OpenAI Service users might have to wait a bit longer to access the very
latest models that OpenAI offers, they can expect to pay the same rates as those using the OpenAI API directly.
The Azure Pricing Calculator is a tool for estimating the cost of your Azure OpenAI Service workloads, either on their own or
together with additional Azure services.
Actual Azure OpenAI Service pricing can be found
here, while
OpenAI API pricing can be found here.
Prices are changed regularly, so be sure to check the official pricing pages for the most up-to-date information.
GPT-4 and GPT-3.5 Models
Several models are available, each with different capabilities and price points.
Prices are quoted per 1,000 tokens.
Input (prompt) and output (completion) tokens are counted separately, and their prices can differ.
For example, a request with a 1,000-token prompt and a 1,000-token completion is billed for 2,000 tokens in total: 1,000 at the input rate and 1,000 at the output rate.
Current pricing for selected models, with the total cost of the "1,000 tokens in prompt and 1,000 tokens in completion" scenario:
Model | Input / 1K tokens | Output / 1K tokens | Cost |
---|---|---|---|
gpt-4-1106-preview | $0.01 | $0.03 | $0.04 |
gpt-4-32k | $0.06 | $0.12 | $0.18 |
gpt-3.5-turbo-1106 | $0.001 | $0.002 | $0.003 |
gpt-3.5-turbo-instruct | $0.0015 | $0.0020 | $0.0035 |
E.g., for gpt-4-32k the calculation logic is the following:
(1,000 / 1,000 * $0.06) + (1,000 / 1,000 * $0.12) = $0.06 + $0.12 = $0.18.
As the table shows, gpt-4-32k is the most expensive option ($0.18 for this scenario), while gpt-3.5-turbo-1106
is the cheapest ($0.003). Select the model that fits your needs and budget.
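The calculation logic above is easy to express in code. The following is a minimal sketch, assuming the per-1,000-token prices from the table above (which may be outdated by the time you read this); the `PRICES_PER_1K` dictionary and `request_cost` function are illustrative names, not part of any SDK.

```python
# USD per 1,000 tokens as (input, output), copied from the table above.
# These values are assumptions that change over time; check the official pricing pages.
PRICES_PER_1K = {
    "gpt-4-1106-preview": (0.01, 0.03),
    "gpt-4-32k": (0.06, 0.12),
    "gpt-3.5-turbo-1106": (0.001, 0.002),
    "gpt-3.5-turbo-instruct": (0.0015, 0.0020),
}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of one request: input and output tokens are priced separately."""
    input_price, output_price = PRICES_PER_1K[model]
    return (prompt_tokens / 1000 * input_price
            + completion_tokens / 1000 * output_price)


# Reproduce the "1,000 tokens in prompt and 1,000 tokens in completion" scenario.
for model in PRICES_PER_1K:
    print(f"{model}: ${request_cost(model, 1000, 1000):.4f}")
```

Token counts for real requests are reported in the API response, so the same function can be applied to actual usage rather than estimates.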
Assistants API
The Assistants API and its tools (retrieval, code interpreter) make it easy for developers to build AI assistants within their own applications. Each assistant incurs its own retrieval file storage fee based on the files passed to that assistant. The retrieval tool chunks and indexes your file content in OpenAI's vector database.
The tokens used by the Assistants API are billed at the chosen language model's per-token input /
output rates, and the assistant intelligently chooses which context from the thread to include when calling the model.
Tool | Price |
---|---|
Code interpreter | $0.03 / session |
Retrieval | $0.20 / GB / assistant |
The Assistants API is not available in Azure OpenAI Service.
Fine-tuned models
Only GPT-3 models (ada, curie, davinci, babbage) are available for fine-tuning (they are called “base” models).
Azure OpenAI fine-tuned models are charged based on three factors:
- training hours
- hosting hours
- inference per 1,000 tokens
The hosting hours cost is important to be aware of since once a fine-tuned model is deployed it continues to incur an hourly cost regardless of whether you’re actively using it. Fine-tuned model costs should be monitored closely.
Fine-tuning pricing in Azure OpenAI Service is as follows:
Model | Training / h | Hosting / h | Input / 1K tokens | Output / 1K tokens |
---|---|---|---|---|
Babbage-002 | $34 | $1.70 | $0.0004 | $0.0004 |
Davinci-002 | $68 | $3 | $0.0020 | $0.0020 |
GPT-3.5-Turbo | $102 | $7 | $0.0015 | $0.0020 |
GPT-4 | Waiting | Waiting | Waiting | Waiting |
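To see how the three factors combine, and how quickly hosting hours dominate, here is a minimal sketch. It assumes the rates from the table above; the `FINE_TUNE_RATES` dictionary, the `fine_tune_cost` function, and the example workload are hypothetical and only illustrate the arithmetic.

```python
# USD rates copied from the table above (assumed current; verify before relying on them).
FINE_TUNE_RATES = {
    "Babbage-002": {"training_h": 34.0, "hosting_h": 1.70, "input_1k": 0.0004, "output_1k": 0.0004},
    "Davinci-002": {"training_h": 68.0, "hosting_h": 3.0, "input_1k": 0.0020, "output_1k": 0.0020},
    "GPT-3.5-Turbo": {"training_h": 102.0, "hosting_h": 7.0, "input_1k": 0.0015, "output_1k": 0.0020},
}


def fine_tune_cost(model: str, training_hours: float, hosting_hours: float,
                   input_tokens: int = 0, output_tokens: int = 0) -> float:
    """Total USD cost: training + hosting + inference for a fine-tuned model."""
    rates = FINE_TUNE_RATES[model]
    training = training_hours * rates["training_h"]
    hosting = hosting_hours * rates["hosting_h"]  # accrues even when the model is idle
    inference = (input_tokens / 1000 * rates["input_1k"]
                 + output_tokens / 1000 * rates["output_1k"])
    return training + hosting + inference


# Hypothetical workload: 2 training hours, deployed for 30 days (720 hours),
# 1M input tokens and 1M output tokens.
total = fine_tune_cost("GPT-3.5-Turbo", 2, 720, 1_000_000, 1_000_000)
print(f"${total:,.2f}")  # $5,247.50
```

In this hypothetical example, hosting accounts for $5,040 of the total, which is why an idle deployment is worth deleting.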
Image Models
Azure OpenAI Service also includes image models, with pricing based on the number of images processed.
The standard image model, DALL·E 3, is priced at $2 per 100 images.
Embedding Model
In addition to language and image models, Azure OpenAI Service offers an embedding model. The standard embedding model, Ada, is priced at $0.0001 per 1,000 tokens.
Speech Models
Whisper can transcribe speech into text and translate many languages into English.
Text-to-speech (TTS) can convert text into spoken audio.
Whisper costs $0.006 / minute (rounded to the nearest second). TTS and TTS HD are not yet available in Azure OpenAI Service.
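As a rough illustration of per-minute pricing with per-second granularity, the following sketch estimates a Whisper transcription cost. It assumes the $0.006 / minute rate above and that "rounded to the nearest second" means the audio duration is rounded half-up to whole seconds before billing; the exact rounding used by the service is an assumption here, and `whisper_cost` is a hypothetical helper.

```python
import math

WHISPER_PRICE_PER_MINUTE = 0.006  # USD, from the rate quoted above


def whisper_cost(duration_seconds: float) -> float:
    """Estimated USD cost for transcribing audio of the given duration."""
    # Assumption: duration is rounded half-up to the nearest whole second.
    billed_seconds = math.floor(duration_seconds + 0.5)
    return billed_seconds / 60 * WHISPER_PRICE_PER_MINUTE


print(f"${whisper_cost(3600):.2f}")  # one hour of audio: $0.36
```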
How to Start Using OpenAI Models For Free?
OpenAI offers $5 in free credit that can be used during your first 3 months.
Microsoft provides new Azure users with a complimentary account, including a $200 credit valid for
the first 30 days. This offering is ideal for those exploring
Azure's capabilities without initial investment.
Additionally,
eligible nonprofits benefit significantly with an annual $2,000 Azure credit,
supporting their technological growth and innovation. For startups, Microsoft extends a helping
hand through the Microsoft for Startups program, offering valuable Azure credits to fuel early-stage growth.
However, it's important to note that Microsoft may restrict the availability of certain models
within your subscription. Specifically, for accessing GPT-4 models, a separate application is
required, which can be done through this link.
Azure OpenAI Service runs on Azure infrastructure that accrues costs when you deploy new resources.
Be aware that additional infrastructure costs may accrue beyond the model usage itself.
Keep in mind that enabling capabilities like sending data to Azure Monitor Logs, alerting, etc.
incurs additional costs for those services. These costs are visible under
those other services and at the subscription level, but aren’t visible when scoped just to your Azure OpenAI resource.