![](/leestott/ollamazure/raw/main/docs/images/ollamazure-logo.png)
⭐ If you like this tool, star it on GitHub — it helps a lot!
Overview • Usage • Azure OpenAI compatibility • Sample code
![ollamazure architecture](/leestott/ollamazure/raw/main/docs/images/architecture.drawio.png)
ollamazure is a local server that emulates the Azure OpenAI API on your machine using Ollama and open-source models.

With it, you can test your code against the Azure OpenAI API locally, without incurring costs or being rate-limited. This is especially useful for development and testing, or when you need to work offline.
By default, `phi3` is used as the model for completions, and `all-minilm:l6-v2` for embeddings. You can change these models using the configuration options.
> **Note:** This tool uses different models than Azure OpenAI, so you should expect differences in accuracy and performance. The API is compatible, however, so you can use the same code to interact with it.
You need Node.js v20+ and Ollama installed on your machine to use this tool.
You can start the emulator directly using `npx` without installing it:

```shell
npx ollamazure
```
Once the server is started, leave it running in a terminal window; you can then use the Azure OpenAI API to interact with it. You can find sample code for different languages and frameworks in the sample code section.

For example, if you have an existing project that uses the Azure OpenAI SDK, you can point it to your local server by setting the `AZURE_OPENAI_ENDPOINT` environment variable to `http://localhost:4041`, without changing the rest of your code.
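As a quick sketch, assuming your app reads the standard `AZURE_OPENAI_ENDPOINT` variable (and `app.js` stands in for your own entry point), redirecting it could look like this:

```shell
# Point your app at the local emulator instead of the real Azure endpoint
export AZURE_OPENAI_ENDPOINT=http://localhost:4041

# Then run your application unchanged (app.js is a placeholder)
node app.js
```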
Alternatively, you can install it globally with npm:

```shell
npm install -g ollamazure
```
Once the installation is complete, start the emulator by running the following command in a terminal:

```shell
ollamazure

# or use the shorter alias `oaz`
```
You can see all available options with the `--help` flag:

```shell
ollamazure --help
```

```
Usage: ollamazure [options]

Emulates Azure OpenAI API on your local machine using Ollama and open-source models.

Options:
  --verbose                show detailed logs
  -y, --yes                do not ask for confirmation (default: false)
  -m, --model <name>       model to use for chat and text completions (default: "phi3")
  -e, --embeddings <name>  model to use for embeddings (default: "all-minilm:l6-v2")
  -d, --use-deployment     use deployment name as model name (default: false)
  -h, --host <ip>          host to bind to (default: "localhost")
  -p, --port <number>      port to use (default: 4041)
  -o, --ollama-url <url>   Ollama base URL (default: "http://localhost:11434")
  -v, --version            show the current version
  --help                   display help for command
```
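For example, to run the emulator with a different chat model and port (the model name below is just an example; any model you pass must already be available in your local Ollama installation):

```shell
# Pull the model first with `ollama pull llama3` if you don't have it yet
ollamazure --model llama3 --port 5000
```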
| Feature | Supported | With streaming |
|---|---|---|
| Completions | ✅ | ✅ |
| Chat completions | ✅ | ✅ |
| Embeddings | ✅ | - |
| JSON mode | ✅ | ✅ |
| Function calling | ⛔ | ⛔ |
| Reproducible outputs | ✅ | ✅ |
| Vision | ⛔ | ⛔ |
| Assistants | ⛔ | ⛔ |
Unimplemented features are currently not supported by Ollama, but are being worked on and may be added in the future.
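Since the emulator exposes the Azure OpenAI REST routes, you can also probe it directly with `curl`. This is a sketch that assumes the emulator is running on the default port; the deployment name and API key are placeholders accepted (but not used) by the local server:

```shell
curl "http://localhost:4041/openai/deployments/gpt-4/chat/completions?api-version=2024-02-01" \
  -H "Content-Type: application/json" \
  -H "api-key: 123456" \
  -d '{"messages": [{"role": "user", "content": "Say hello!"}]}'
```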
See all code examples in the samples folder.
Azure OpenAI SDK
```ts
import { AzureOpenAI } from 'openai';

const openai = new AzureOpenAI({
  // This is where you point to your local server
  endpoint: 'http://localhost:4041',

  // Parameters below must be provided but are not used by the local server
  apiKey: '123456',
  apiVersion: '2024-02-01',
  deployment: 'gpt-4',
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: 'user', content: 'Say hello!' }],
});
console.log('Chat completion: ' + chatCompletion.choices[0]!.message?.content);
```
Alternatively, you can set the `AZURE_OPENAI_ENDPOINT` environment variable to `http://localhost:4041` instead of passing it to the constructor. Everything else will work the same.
If you're using managed identity, this will work as well, unless you're running in a local container. In that case, you can use a dummy function `() => '1'` for the `azureADTokenProvider` parameter in the constructor.
LangChain.js
```ts
import { AzureChatOpenAI } from '@langchain/openai';

// Chat completion
const model = new AzureChatOpenAI({
  // This is where you point to your local server
  azureOpenAIBasePath: 'http://localhost:4041/openai/deployments',

  // Parameters below must be provided but are not used by the local server
  azureOpenAIApiKey: '123456',
  azureOpenAIApiVersion: '2024-02-01',
  azureOpenAIApiDeploymentName: 'gpt-4'
});

const completion = await model.invoke([{ type: 'human', content: 'Say hello!' }]);
console.log(completion.content);
```
Alternatively, you can set the `AZURE_OPENAI_BASE_PATH` environment variable to `http://localhost:4041/openai/deployments` instead of passing it to the constructor. Everything else will work the same.
If you're using managed identity, this will work as well, unless you're running in a local container. In that case, you can use a dummy function `() => '1'` for the `azureADTokenProvider` parameter in the constructor.
LlamaIndex.TS
```ts
import { OpenAI } from 'llamaindex';

// Chat completion
const llm = new OpenAI({
  azure: {
    // This is where you point to your local server
    endpoint: 'http://localhost:4041',

    // Parameters below must be provided but are not used by the local server
    apiKey: '123456',
    apiVersion: '2024-02-01',
    deployment: 'gpt-4'
  }
});

const chatCompletion = await llm.chat({
  messages: [{ role: 'user', content: 'Say hello!' }]
});
console.log(chatCompletion.message.content);
```
Alternatively, you can set the `AZURE_OPENAI_ENDPOINT` environment variable to `http://localhost:4041` instead of passing it to the constructor. Everything else will work the same.
If you're using managed identity, this will work as well, unless you're running in a local container. In that case, you can use a dummy function `() => '1'` for the `azureADTokenProvider` parameter in the constructor.
Azure OpenAI SDK
```python
from openai import AzureOpenAI

openai = AzureOpenAI(
    # This is where you point to your local server
    azure_endpoint="http://localhost:4041",

    # Parameters below must be provided but are not used by the local server
    api_key="123456",
    api_version="2024-02-01"
)

# Chat completion
chat_completion = openai.chat.completions.create(
    # Model must be provided but is not used by the local server
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello!"}
    ]
)
print(chat_completion.choices[0].message.content)
```
Alternatively, you can set the `AZURE_OPENAI_ENDPOINT` environment variable to `http://localhost:4041` instead of passing it to the constructor. Everything else will work the same.
If you're using managed identity, this will work as well, unless you're running in a local container. In that case, you can use a dummy function `lambda: "1"` for the `azure_ad_token_provider` parameter in the constructor.
LangChain
```python
from langchain_openai import AzureChatOpenAI

# Chat completion
model = AzureChatOpenAI(
    # This is where you point to your local server
    azure_endpoint="http://localhost:4041",

    # Parameters below must be provided but are not used by the local server
    api_key="123456",
    api_version="2024-02-01",
    azure_deployment="gpt-4"
)

chat_completion = model.invoke([{"type": "human", "content": "Say hello!"}])
print(chat_completion.content)
```
Alternatively, you can set the `AZURE_OPENAI_ENDPOINT` environment variable to `http://localhost:4041` instead of passing it to the constructor. Everything else will work the same.
If you're using managed identity, this will work as well, unless you're running in a local container. In that case, you can use a dummy function `lambda: "1"` for the `azure_ad_token_provider` parameter in the constructor.
LlamaIndex
```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.azure_openai import AzureOpenAI

# Chat completion
llm = AzureOpenAI(
    # This is where you point to your local server
    azure_endpoint="http://localhost:4041",

    # Parameters below must be provided but are not used by the local server
    api_key="123456",
    api_version="2024-02-01",
    engine="gpt-4"
)

chat_completion = llm.chat([ChatMessage(role="user", content="Say hello!")])
print(chat_completion.message.content)
```
Alternatively, you can set the `AZURE_OPENAI_ENDPOINT` environment variable to `http://localhost:4041` instead of passing it to the constructor. Everything else will work the same.
If you're using managed identity, this will work as well, unless you're running in a local container. In that case, you can use a dummy function `lambda: "1"` for the `azure_ad_token_provider` parameter in the constructor.
Azure OpenAI SDK
```csharp
using Azure;
using Azure.AI.OpenAI;
using OpenAI.Chat;

// Chat completion
AzureOpenAIClient azureClient = new(
    new Uri("http://localhost:4041"),
    // Must be provided but is not used by the local server
    new AzureKeyCredential("123456"));

ChatClient chatClient = azureClient.GetChatClient("gpt-4");

ChatCompletion completion = chatClient.CompleteChat([new UserChatMessage("Say hello!")]);
Console.WriteLine(completion.Content[0].Text);
```
Semantic Kernel
```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var builder = Kernel.CreateBuilder();

// Chat completion
builder.AddAzureOpenAIChatCompletion(
    "gpt-4",                  // Deployment name (not used by the local server)
    "http://localhost:4041",  // Azure OpenAI endpoint
    "123456");                // Azure OpenAI key (not used by the local server)

var kernel = builder.Build();

var chatFunction = kernel.CreateFunctionFromPrompt(@"{{$input}}");
var chatCompletion = await kernel.InvokeAsync(chatFunction, new() { ["input"] = "Say hello!" });
Console.WriteLine(chatCompletion);
```