NLC AI API Reference

Full API documentation with code examples. NLC exposes four chat protocols (OpenAI Chat Completions, OpenAI Completions, OpenAI Responses, and Anthropic Messages), all served under the NLC brand and all billed in credits.

Start here · Cursor / VS Code / OpenAI SDK

Base URL	`https://oddsforge.org/v1`
Endpoint	`/chat/completions` (or `/messages` for the Anthropic SDK)
Auth header	`Authorization: Bearer nlk_…` (from Dashboard)
Recommended model	`nlc-shiplow` · budget multi-agent · live default
Other models	`nlc-fast` · `nlc-vision` · `nlc-pro` · `nlc-ultra`

Cursor setup: Settings → Models → Override OpenAI Base URL → paste the Base URL above → paste your nlk_ key → Verify → add custom model id nlc-shiplow. See full guide.

Which endpoint does Cursor call? Cursor uses /chat/completions automatically — you don't pick the path, just the Base URL. If you switch the provider to Anthropic in Cursor's settings, it calls /messages instead. Both work with every NLC model.

Endpoints at a glance

Endpoint	Method	Protocol	Use
`/v1/chat/completions`	POST	OpenAI	Chat (streaming + tools)
`/v1/completions`	POST	OpenAI	Raw text completion (custom prompt templates)
`/v1/responses`	POST	OpenAI Responses	Stateful conversations + tool use (previous_response_id)
`/v1/messages`	POST	Anthropic	Anthropic SDK compatible
`/v1/embeddings`	POST	OpenAI	Embeddings for RAG / semantic search
`/v1/rerank`	POST	NLC	Document reranking for second-stage RAG
`/v1/models`	GET	OpenAI	List chat models (brand ids only)
`/v1/available-models`	GET	NLC	Full catalog with capabilities + pricing
`/v1/images/generations`	POST	OpenAI	Image generation (when configured)

Which endpoint works with which model?

Not every model works on every endpoint. Here is the full compatibility matrix, followed by the differences between the protocols so you know which one to pick.

Endpoint	ShipLow	PRO	Vision	Fast	ULTRA	Embed	Rerank	Best for
`/v1/chat/completions`	✓	✓	✓	✓	✓	—	—	Chat, code, vision, multi-agent (the main endpoint)
`/v1/completions`	✓	✓	✓	✓	✓	—	—	Raw text generation, custom prompt templates, base models
`/v1/responses`	✓	✓	✓	✓	✓	—	—	Stateful conversations (previous_response_id), MCP tools
`/v1/messages`	✓	✓	✓	✓	✓	—	—	Anthropic SDK (Claude Code, Anthropic Python/JS SDK)
`/v1/embeddings`	—	—	—	—	—	✓	—	Vector embeddings for RAG / semantic search
`/v1/rerank`	—	—	—	—	—	—	✓	Rerank documents for second-stage RAG retrieval
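
The matrix can also be expressed as a lookup table, which is handy for rejecting a mismatched model before a request is sent. A minimal sketch: ENDPOINT_MODELS and supports are hypothetical names (not part of any NLC SDK), and the model ids are the ones listed in the Models table of this reference.

```python
# Model ids from the Models table; all five chat models share the chat-style endpoints.
CHAT_MODELS = {"nlc-shiplow", "nlc-fast", "nlc-vision", "nlc-pro", "nlc-ultra"}

# Hypothetical lookup mirroring the compatibility matrix.
ENDPOINT_MODELS = {
    "/v1/chat/completions": CHAT_MODELS,
    "/v1/completions": CHAT_MODELS,
    "/v1/responses": CHAT_MODELS,
    "/v1/messages": CHAT_MODELS,
    "/v1/embeddings": {"nlc-embed"},
    "/v1/rerank": {"nlc-rerank"},
}


def supports(endpoint: str, model: str) -> bool:
    """Return True if the matrix marks this model as usable on this endpoint."""
    return model in ENDPOINT_MODELS.get(endpoint, set())


print(supports("/v1/chat/completions", "nlc-pro"))
print(supports("/v1/embeddings", "nlc-pro"))
```

The first call reports a supported pairing and the second an unsupported one, matching the PRO column above.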

What's the difference between the protocols?

OpenAI Chat Completions — /v1/chat/completions

The standard endpoint for chat. Send messages (system + user + assistant), get a response. Supports streaming, tool calling, vision (image_url), and structured output (JSON schema). Use this with Cursor, VS Code (Continue, Copilot, Cline), and any OpenAI SDK. Works with all 5 chat models (ShipLow, Fast, Vision, PRO, ULTRA MAX).

OpenAI Completions — /v1/completions

Raw text completion — you provide a prompt string (not messages), and the model continues it. Use this when you need custom prompt formatting (few-shot, base models, legacy apps). Same models as Chat Completions.

OpenAI Responses — /v1/responses

Stateful conversations — the server stores your conversation, so you can continue with previous_response_id without resending the full history. Supports MCP/SSE server tools and client function tools. Use this for multi-turn apps where you don't want to manage message history yourself. Same models as Chat Completions.

Anthropic Messages — /v1/messages

For Claude Code and the Anthropic Python/JS SDKs. Same billing and same NLC key, just a different protocol shape. Set the SDK base URL to https://oddsforge.org/inference (the SDK appends /v1/messages). Same models as Chat Completions.

Embeddings — /v1/embeddings

Convert text to vectors for RAG / semantic search. Only works with NLC Embed 1.0. Supports variable-length output via the dimensions parameter.

Rerank — /v1/rerank

Rerank a list of documents against a query — second-stage RAG retrieval. Only works with NLC Rerank 1.0. Send a query + array of documents, get back relevance scores.

Authentication

Get your API key from the Dashboard. Pass it as a Bearer token on every request:

Authorization: Bearer nlk_xxxxxxxxxxxxxxxx

For the Anthropic SDK you can also use the x-api-key header (Anthropic convention) — both work.
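
If you are not using an SDK, the two header styles can be built by hand. A minimal sketch using only the standard library: auth_headers and build_request are hypothetical helper names, and the request object is constructed but not sent.

```python
import urllib.request

BASE_URL = "https://oddsforge.org/v1"


def auth_headers(api_key: str, style: str = "bearer") -> dict:
    """Build the auth header in either the Bearer or the x-api-key style."""
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        return {"x-api-key": api_key}
    raise ValueError(f"unknown auth style: {style}")


def build_request(path: str, api_key: str, style: str = "bearer") -> urllib.request.Request:
    """Create (but do not send) an authenticated GET request."""
    return urllib.request.Request(
        BASE_URL + path, headers=auth_headers(api_key, style), method="GET"
    )


req = build_request("/models", "nlk_your_key_here")
print(req.full_url)
```

Pass the request to urllib.request.urlopen to send it.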

OpenAI Chat Completions

The main endpoint for chat. Supports streaming, tool calling, structured output, and vision (image_url content parts).

POST https://oddsforge.org/v1/chat/completions

{
  "model": "nlc-shiplow",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "stream": true,
  "max_tokens": 4096
}

Python

from openai import OpenAI

client = OpenAI(base_url="https://oddsforge.org/v1", api_key="nlk_your_key_here")

response = client.chat.completions.create(
    model="nlc-shiplow",   # budget default · vision + code + multilingual
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

JavaScript

import OpenAI from 'openai';
const client = new OpenAI({ baseURL: 'https://oddsforge.org/v1', apiKey: 'nlk_your_key_here' });

const response = await client.chat.completions.create({
    model: 'nlc-shiplow',   // budget default · vision + code + multilingual
    messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(response.choices[0].message.content);
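
Streaming with a raw HTTP client

When stream is true, the response arrives as server-sent events. The sketch below assumes NLC follows the OpenAI chunk format (data: lines carrying choices[].delta.content, terminated by data: [DONE]); that is an assumption based on the protocol name, not something this reference spells out. iter_stream_text is a hypothetical helper, and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Illustrative sample, not captured from a real NLC response.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

With the OpenAI SDK you would normally pass stream=True and iterate the returned object instead; a parser like this is only needed when you read the HTTP response yourself.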

Vision (image input)

Use nlc-vision with image_url content parts (URL or base64 data URI).

response = client.chat.completions.create(
    model="nlc-vision",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
        ]
    }]
)
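
For local files, the image can be sent as a base64 data URI instead of a public URL. A minimal sketch: image_part is a hypothetical helper, and the three bytes below are only the start of a JPEG header used as placeholder data.

```python
import base64


def image_part(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an image_url content part with a data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder bytes; in practice read them with open(path, "rb").read().
part = image_part(b"\xff\xd8\xff")
print(part["image_url"]["url"])
```

The returned dict goes into the content list alongside the text part, exactly where the URL-based part appears in the example above.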

Tool calling (function calling)

response = client.chat.completions.create(
    model="nlc-pro",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]
            }
        }
    }],
    tool_choice="auto"
)
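
The request above only asks the model whether to call a tool; your code still has to run it and send the result back. A minimal sketch of that step, assuming the assistant message follows the OpenAI tool_calls shape (id, function.name, and function.arguments as a JSON string). The get_weather body, the TOOLS registry, and the sample message are hypothetical.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local tool; returns canned data instead of a real lookup."""
    return {"city": city, "forecast": "sunny"}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(message: dict) -> list:
    """Execute each requested tool and build the tool-role reply messages."""
    replies = []
    for call in message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return replies


# Illustrative assistant message in the assumed shape.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
    }],
}
tool_messages = run_tool_calls(assistant_message)
print(tool_messages)
```

Append the assistant message and these tool messages to the conversation, then call the endpoint again so the model can write its final answer.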

Structured output (JSON schema)

response = client.chat.completions.create(
    model="nlc-pro",
    messages=[{"role": "user", "content": "Extract: Alice, 30, engineer"}],
    response_format={"type": "json_schema", "json_schema": {
        "name": "person",
        "schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "age": {"type": "number"}, "job": {"type": "string"}},
            "required": ["name", "age", "job"]
        }
    }}
)
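
The structured result comes back as a JSON string in the message content, so it still needs parsing, and a client-side check is cheap insurance. A minimal sketch: parse_person and REQUIRED_FIELDS are hypothetical names that mirror the schema in the request above, and the sample string is illustrative.

```python
import json

# Field names and types mirror the "person" schema in the request above.
REQUIRED_FIELDS = {"name": str, "age": (int, float), "job": str}


def parse_person(content: str) -> dict:
    """Parse the JSON content and check the required fields and their types."""
    person = json.loads(content)
    for key, expected in REQUIRED_FIELDS.items():
        if key not in person:
            raise ValueError(f"missing field: {key}")
        value = person[key]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {key}")
    return person


person = parse_person('{"name": "Alice", "age": 30, "job": "engineer"}')
print(person["name"], person["age"], person["job"])
```

In real use, pass response.choices[0].message.content as the argument.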

OpenAI Completions (raw text)

For raw text generation with custom prompt templates. Use this when you need full control over prompt formatting (base models, few-shot, legacy).

POST https://oddsforge.org/v1/completions

{
  "model": "nlc-fast",
  "prompt": "Once upon a time",
  "max_tokens": 100,
  "temperature": 0.7
}

response = client.completions.create(
    model="nlc-fast",
    prompt="Task: classify sentiment.\nText: I love it.\nSentiment:"
)
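
Few-shot prompts for this endpoint are plain strings, so a small builder keeps the template consistent. A minimal sketch: few_shot_prompt is a hypothetical helper that reproduces the Task/Text/Sentiment layout used in the example above.

```python
def few_shot_prompt(task: str, examples: list, text: str) -> str:
    """Build a Task/Text/Sentiment prompt with optional labeled examples."""
    lines = [f"Task: {task}"]
    for example_text, label in examples:
        lines.append(f"Text: {example_text}")
        lines.append(f"Sentiment: {label}")
    # End on an open label so the model completes it.
    lines.append(f"Text: {text}")
    lines.append("Sentiment:")
    return "\n".join(lines)


prompt = few_shot_prompt(
    "classify sentiment.",
    [("I hate waiting.", "negative"), ("Best purchase ever.", "positive")],
    "I love it.",
)
print(prompt)
```

With an empty examples list the builder returns exactly the prompt string shown in the request above.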

OpenAI Responses API

Stateful conversations and advanced tool use. Continue chats with previous_response_id without resending the full history. Supports MCP/SSE server tools and client function tools. Responses are stored by default; set store: false to opt out.

POST https://oddsforge.org/v1/responses

{
  "model": "nlc-pro",
  "input": "What's the capital of France?",
  "max_output_tokens": 200
}

Continue a conversation

first = client.responses.create(model="nlc-pro", input="Tell me a joke")
second = client.responses.create(
    model="nlc-pro",
    input="Tell me another one",
    previous_response_id=first.id  # continues the conversation
)

Stored responses & privacy

Responses are stored upstream so previous_response_id works. Listing and deleting stored responses is not supported on NLC (those operations are account-scoped upstream, so we disable them to keep user data isolated). Because listing is unavailable, save each response id on your side if you plan to continue the conversation later, or set store: false on create to skip storage entirely.
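
If you opt out of storage, there is nothing for previous_response_id to point at, so the history has to travel with each request. A minimal sketch of client-side history: LocalConversation is a hypothetical class, and it assumes NLC accepts input as a list of role/content items the way the OpenAI Responses API does.

```python
class LocalConversation:
    """Keep the message history locally and build store: false payloads."""

    def __init__(self, model: str):
        self.model = model
        self.history = []

    def next_payload(self, user_text: str) -> dict:
        """Record the user turn and return the request body for /v1/responses."""
        self.history.append({"role": "user", "content": user_text})
        # Copy the list so later turns do not mutate an already-built payload.
        return {"model": self.model, "input": list(self.history), "store": False}

    def record_reply(self, assistant_text: str) -> None:
        """Record the assistant turn so the next payload includes it."""
        self.history.append({"role": "assistant", "content": assistant_text})


chat = LocalConversation("nlc-pro")
first = chat.next_payload("Tell me a joke")
chat.record_reply("(assistant text from the first response)")
second = chat.next_payload("Tell me another one")
print(len(first["input"]), len(second["input"]))
```

Each payload can be passed as client.responses.create(**payload). The trade-off is larger requests in exchange for nothing being stored upstream.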

Anthropic Messages API

Use the Anthropic Python or TypeScript SDK with any NLC chat model. Set the SDK base URL to https://oddsforge.org/inference (the SDK appends /v1/messages).

POST https://oddsforge.org/v1/messages

{
  "model": "nlc-pro",
  "max_tokens": 256,
  "messages": [{"role": "user", "content": "Say hello in Spanish. Reply in one word."}]
}

Python (Anthropic SDK)

import anthropic

client = anthropic.Anthropic(
    api_key="nlk_your_key_here",
    base_url="https://oddsforge.org/inference",
)

response = client.messages.create(
    model="nlc-pro",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.content[0].text)

JavaScript (Anthropic SDK)

import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
  apiKey: "nlk_your_key_here",
  baseURL: "https://oddsforge.org/inference",
});
const response = await client.messages.create({
  model: "nlc-pro",
  max_tokens: 256,
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.content[0].text);

Note: max_tokens is optional on NLC (required on Anthropic). Server-side tool families (code execution, memory, web fetch, web search) are not supported. Tool calling, streaming, structured output, reasoning, and vision work as expected.

Embeddings

Generate vector embeddings for RAG and semantic search. NLC Embed 1.0 supports variable-length output via the dimensions parameter.

POST https://oddsforge.org/v1/embeddings

{
  "model": "nlc-embed",
  "input": "The quick brown fox jumps over the lazy dog"
}

response = client.embeddings.create(
    model="nlc-embed",
    input="The quick brown fox jumps over the lazy dog",
    dimensions=128  # optional — variable-length embeddings
)
print(response.data[0].embedding[:8])
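
Once you have vectors, semantic search comes down to comparing them. A minimal sketch of cosine-similarity ranking in plain Python: the function names are hypothetical, and the two-dimensional vectors are toy values standing in for real embeddings.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec: list, doc_vecs: list) -> list:
    """Return (document index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors for illustration only.
ranking = rank_by_similarity([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
print(ranking)
```

Embed the query and the documents with the same dimensions value, since vectors of different lengths cannot be compared.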

Rerank (second-stage RAG retrieval)

Rerank a list of documents against a query. Use NLC Rerank 1.0 after a first-stage vector search to boost retrieval quality.

POST https://oddsforge.org/v1/rerank

{
  "model": "nlc-rerank",
  "query": "What is the capital of France?",
  "documents": [
    "Paris is the capital of France.",
    "France is in Western Europe.",
    "Python is a programming language."
  ],
  "top_n": 2,
  "return_documents": true
}

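import requests

# Call the rerank endpoint directly; the OpenAI SDK has no rerank method.
r = requests.post(
    "https://oddsforge.org/v1/rerank",
    headers={"Authorization": "Bearer nlk_your_key_here"},
    json={
        "model": "nlc-rerank",
        "query": "capital of France",
        "documents": ["Paris is the capital of France.", "Bananas are yellow."],
        "top_n": 1,
    },
    timeout=30,
)
r.raise_for_status()  # surface 4xx/5xx responses as exceptions
print(r.json()["results"])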
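
To use the scores, map each result back to its document. The response shape is not documented in this reference, so the sketch below assumes each entry in results carries an index into the submitted documents and a relevance_score, as in other rerank APIs; both field names and the sample values are assumptions. top_documents is a hypothetical helper.

```python
def top_documents(documents: list, results: list, min_score: float = 0.0) -> list:
    """Pair each result with its source document, best score first.

    Assumes entries look like {"index": int, "relevance_score": float}.
    """
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [
        (documents[r["index"]], r["relevance_score"])
        for r in ranked
        if r["relevance_score"] >= min_score
    ]


docs = ["Paris is the capital of France.", "Bananas are yellow."]
# Made-up scores in the assumed shape, not real NLC output.
sample_results = [
    {"index": 1, "relevance_score": 0.02},
    {"index": 0, "relevance_score": 0.97},
]
print(top_documents(docs, sample_results, min_score=0.5))
```

Check the actual field names in a live response before relying on this.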

Models

All five chat models support streaming and tool calling. The Fusion Router auto-selects the best model, or you can pass a model id explicitly. Context and output limits are set per model; there is no global cap, so long answers are supported. ShipLow (budget) and ULTRA MAX (premium) are multi-agent pipelines that route your request to the right executor for the task.

Brand	Model ID	Context	Max Output	Best For
NLC ShipLow 1.0 BUDGET	`nlc-shiplow`	131K	32K	Budget default — six-lane multi-agent (code/agent/math/vision/multilingual/general)
NLC Fast 1.0	`nlc-fast`	1M	900K	Cheap single-model, high-throughput coding
NLC Vision 1.0	`nlc-vision`	256K	200K	Images, visual understanding
NLC PRO 1.0	`nlc-pro`	1M	1.04M	Code, reasoning, long docs, large codebases (flagship)
NLC ULTRA MAX 1.0	`nlc-ultra`	1M	1.04M	Full-stack features (premium multi-agent router)
NLC Embed 1.0	`nlc-embed`	8K	—	Embeddings for RAG / semantic search
NLC Rerank 1.0	`nlc-rerank`	32K	—	Document reranking (second-stage RAG)

Pricing (per 1M tokens)

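1 credit = $0.001. Charges are based on the actual token counts returned in the usage field. Cached input tokens are billed at a discount. Dollar prices are per 1M tokens and credit prices are per 1K tokens, so both columns carry the same number: $0.30 per 1M tokens is $0.0003 per 1K tokens, which is 0.30 credits.

Model	Input ($ / 1M tokens)	Output ($ / 1M tokens)	Credits / 1K in	Credits / 1K out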
NLC ShipLow	$0.30	$0.60	0.30	0.60
NLC Fast	$0.45	$0.90	0.45	0.90
NLC Vision	$1.80	$5.50	1.80	5.50
NLC PRO	$3.20	$9.80	3.20	9.80
NLC ULTRA MAX	$3.50	$11.00	3.50	11.00
NLC Embed	$0.20	—	0.20	—
NLC Rerank	$0.20	—	0.20	—
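
The table translates directly into a cost estimate from the usage numbers on a response. A minimal sketch: the rates are copied from the table above, the model ids come from the Models table, and the function names are hypothetical. The cached-token discount rate is not stated in this reference, so the sketch bills every input token at the full rate.

```python
# (credits per 1K input tokens, credits per 1K output tokens), from the pricing table.
CREDITS_PER_1K = {
    "nlc-shiplow": (0.30, 0.60),
    "nlc-fast": (0.45, 0.90),
    "nlc-vision": (1.80, 5.50),
    "nlc-pro": (3.20, 9.80),
    "nlc-ultra": (3.50, 11.00),
    "nlc-embed": (0.20, 0.0),   # no output tokens
    "nlc-rerank": (0.20, 0.0),  # no output tokens
}
USD_PER_CREDIT = 0.001


def cost_in_credits(model: str, prompt_tokens: int, completion_tokens: int = 0) -> float:
    """Estimate the charge in credits, ignoring the cached-input discount."""
    rate_in, rate_out = CREDITS_PER_1K[model]
    return prompt_tokens / 1000 * rate_in + completion_tokens / 1000 * rate_out


def credits_to_usd(credits: float) -> float:
    """Convert credits to dollars at 1 credit = $0.001."""
    return credits * USD_PER_CREDIT


credits = cost_in_credits("nlc-pro", prompt_tokens=1000, completion_tokens=500)
print(round(credits, 4), round(credits_to_usd(credits), 6))
```

For nlc-pro, 1,000 input tokens cost 3.20 credits and 500 output tokens cost 4.90 credits, for a total of 8.10 credits ($0.0081).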

Advanced features

Streaming — every chat/completions, completions, messages, and responses call supports stream: true (SSE).
Tool calling — function calling and tool_choice: auto on all chat models.
Structured output — response_format: json_schema or json_object.
Vision — image input via image_url content parts (NLC Vision 1.0, NLC ULTRA MAX 1.0).
Prompt caching — automatic; cached input tokens billed at a discount (visible in usage.prompt_tokens_details.cached_tokens).
Multi-agent — NLC ULTRA MAX 1.0 routes through a router model to a vision-capable model (frontend) or a deep-reasoning model (backend) based on your task.
Responses API statefulness — continue conversations via previous_response_id without resending history.

Rate Limits

60 requests per minute per user for chat, completions, responses, messages, embeddings, and rerank; 10 requests per minute for image generation.
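
A client-side limiter helps you stay under the cap instead of discovering it through 429 responses. A minimal sliding-window sketch: SlidingWindowLimiter is a hypothetical class, and it treats the 60 requests as one budget. Whether the server counts per endpoint or across endpoints is not stated here, so adjust to what you observe.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling period (in seconds)."""

    def __init__(self, max_calls: int = 60, period: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls = deque()

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def try_acquire(self) -> bool:
        """Record a call if the budget allows it; return False otherwise."""
        if self.wait_time() > 0:
            return False
        self.calls.append(self.clock())
        return True


limiter = SlidingWindowLimiter()
if limiter.try_acquire():
    print("ok to send")
else:
    time.sleep(limiter.wait_time())
```

For image generation, create a second instance with max_calls=10.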

Errors

{"error": "out of credits - please top up at the dashboard"}  // 402
{"error": "unauthorized"}                                          // 401
{"error": "rate limit exceeded — try again in a minute"}           // 429
{"error": "please verify your email first"}                        // 403

Upstream errors are forwarded with provider names redacted — customers never see any third-party provider or model name in error messages, only NLC brand names.
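
On the client side, the four documented statuses can be folded into one exception type so callers can decide whether to retry. A minimal sketch: NLCError, ACTIONS, and parse_error are hypothetical names, and the suggested actions paraphrase the error messages above. Forwarded upstream errors may have a different body shape, so the parser falls back to the raw text.

```python
import json

# Suggested next step per documented status code (paraphrased from the messages above).
ACTIONS = {
    401: "check the API key",
    402: "top up credits at the dashboard",
    403: "verify your email",
    429: "wait a minute and retry",
}


class NLCError(Exception):
    """Hypothetical client-side wrapper for an NLC error response."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message

    @property
    def retryable(self) -> bool:
        # Only the rate-limit error is expected to clear on its own.
        return self.status == 429

    @property
    def action(self) -> str:
        return ACTIONS.get(self.status, "inspect the error message")


def parse_error(status: int, body: str) -> NLCError:
    """Extract the error field from a JSON body, falling back to the raw text."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return NLCError(status, body)
    if isinstance(payload, dict) and "error" in payload:
        return NLCError(status, str(payload["error"]))
    return NLCError(status, body)


err = parse_error(402, '{"error": "out of credits - please top up at the dashboard"}')
print(err.status, err.retryable, err.action)
```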