NLC AI API Reference

Full API documentation with code examples. NLC ships four protocols: OpenAI Chat, OpenAI Completions, OpenAI Responses, and Anthropic Messages — all under our brand, all billed in credits.

Start here · Cursor / VS Code / OpenAI SDK

Base URL: https://oddsforge.org/v1
Endpoint: /chat/completions (or /messages for Anthropic SDK)
Auth header: Authorization: Bearer nlk_… (from Dashboard)
Recommended model: nlc-shiplow · budget multi-agent · live default
Other models: nlc-fast · nlc-vision · nlc-pro · nlc-ultra

Cursor setup: Settings → Models → Override OpenAI Base URL → paste the Base URL above → paste your nlk_ key → Verify → add custom model id nlc-shiplow. See full guide.

Which endpoint does Cursor call? Cursor uses /chat/completions automatically — you don't pick the path, just the Base URL. If you switch the provider to Anthropic in Cursor's settings, it calls /messages instead. Both work with every NLC model.

Endpoints at a glance

Endpoint | Method | Protocol | Use
/v1/chat/completions | POST | OpenAI | Chat (streaming + tools)
/v1/completions | POST | OpenAI | Raw text completion (custom prompt templates)
/v1/responses | POST | OpenAI Responses | Stateful conversations + tool use (previous_response_id)
/v1/messages | POST | Anthropic | Anthropic SDK compatible
/v1/embeddings | POST | OpenAI | Embeddings for RAG / semantic search
/v1/rerank | POST | NLC | Document reranking for second-stage RAG
/v1/models | GET | OpenAI | List chat models (brand ids only)
/v1/available-models | GET | NLC | Full catalog with capabilities + pricing
/v1/images/generations | POST | OpenAI | Image generation (when configured)

Which endpoint works with which model?

Not every model works on every endpoint. Here's the full compatibility matrix — and the difference between each protocol so you know which one to pick.

Endpoint | PRO | Vision | Fast | ULTRA | Embed | Rerank | Best for
/v1/chat/completions | yes | yes | yes | yes | no | no | Chat, code, vision, multi-agent (the main endpoint)
/v1/completions | yes | yes | yes | yes | no | no | Raw text generation, custom prompt templates, base models
/v1/responses | yes | yes | yes | yes | no | no | Stateful conversations (previous_response_id), MCP tools
/v1/messages | yes | yes | yes | yes | no | no | Anthropic SDK (Claude Code, Anthropic Python/JS SDK)
/v1/embeddings | no | no | no | no | yes | no | Vector embeddings for RAG / semantic search
/v1/rerank | no | no | no | no | no | yes | Rerank documents for second-stage RAG retrieval

What's the difference between the protocols?

OpenAI Chat Completions — /v1/chat/completions

The standard endpoint for chat. Send messages (system + user + assistant), get a response. Supports streaming, tool calling, vision (image_url), and structured output (JSON schema). Use this with Cursor, VS Code (Continue, Copilot, Cline), any OpenAI SDK. Works with all 5 chat models (ShipLow, Fast, Vision, PRO, ULTRA MAX).

OpenAI Completions — /v1/completions

Raw text completion — you provide a prompt string (not messages), and the model continues it. Use this when you need custom prompt formatting (few-shot, base models, legacy apps). Same models as Chat Completions.

OpenAI Responses — /v1/responses

Stateful conversations — the server stores your conversation, so you can continue with previous_response_id without resending the full history. Supports MCP/SSE server tools and client function tools. Use this for multi-turn apps where you don't want to manage message history yourself. Same models as Chat Completions.

Anthropic Messages — /v1/messages

For Claude Code and the Anthropic Python/JS SDKs. Same models as Chat Completions, same billing, same NLC key; only the protocol shape differs. Base URL is https://oddsforge.org/inference (the SDK appends /v1/messages).

Embeddings — /v1/embeddings

Convert text to vectors for RAG / semantic search. Only works with NLC Embed 1.0. Supports variable-length output via the dimensions parameter.

Rerank — /v1/rerank

Rerank a list of documents against a query — second-stage RAG retrieval. Only works with NLC Rerank 1.0. Send a query + array of documents, get back relevance scores.

Authentication

Get your API key from the Dashboard. Pass it as a Bearer token on every request:

Authorization: Bearer nlk_xxxxxxxxxxxxxxxx

For the Anthropic SDK you can also use the x-api-key header (Anthropic convention) — both work.
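
The two header styles above can be sketched with a small helper. This is a minimal illustration only: the function name auth_headers and its style parameter are assumptions made for the example, not part of any NLC SDK.

```python
def auth_headers(api_key, style="bearer"):
    """Return request headers for one of the two documented auth styles.

    `auth_headers` and `style` are hypothetical names used for illustration.
    """
    if style == "bearer":
        # Default form: Authorization: Bearer nlk_...
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        # Anthropic convention, accepted alongside the Bearer form.
        return {"x-api-key": api_key}
    raise ValueError(f"unknown auth style: {style!r}")


print(auth_headers("nlk_your_key_here"))
print(auth_headers("nlk_your_key_here", style="x-api-key"))
```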

OpenAI Chat Completions

The main endpoint for chat. Supports streaming, tool calling, structured output, and vision (image_url content parts).

POST https://oddsforge.org/v1/chat/completions

{
  "model": "nlc-shiplow",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "stream": true,
  "max_tokens": 4096
}

Python

from openai import OpenAI

client = OpenAI(base_url="https://oddsforge.org/v1", api_key="nlk_your_key_here")

response = client.chat.completions.create(
    model="nlc-shiplow",   # budget default · vision + code + multilingual
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

JavaScript

import OpenAI from 'openai';
const client = new OpenAI({ baseURL: 'https://oddsforge.org/v1', apiKey: 'nlk_your_key_here' });

const response = await client.chat.completions.create({
    model: 'nlc-shiplow',   // budget default · vision + code + multilingual
    messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(response.choices[0].message.content);
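
Streaming

The request body above sets "stream": true, while the SDK examples read a complete response. The sketch below shows how streamed chunks could be assembled into text. It assumes NLC emits OpenAI-style server-sent events ("data: {json}" lines ending with "data: [DONE]"); this page does not spell out the wire format, so treat that as an assumption. The helper name iter_text_deltas is hypothetical.

```python
import json


def iter_text_deltas(lines):
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample chunks standing in for a real response body.
sample_stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print("".join(iter_text_deltas(sample_stream)))
```

With the OpenAI SDK you normally do not parse these lines yourself: passing stream=True to client.chat.completions.create returns an iterator of already-parsed chunks.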

Vision (image input)

Use nlc-vision with image_url content parts (URL or base64 data URI).

response = client.chat.completions.create(
    model="nlc-vision",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
        ]
    }]
)
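
For the base64 data URI option mentioned above, a content part can be built from raw image bytes as in this sketch. The helper name image_part_from_bytes is illustrative, and the bytes used here are only a placeholder (the PNG file signature), not a full image.

```python
import base64


def image_part_from_bytes(data, mime_type="image/png"):
    """Build an image_url content part that carries the image inline as a data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder bytes; in practice read them with open(path, "rb").read().
part = image_part_from_bytes(b"\x89PNG\r\n\x1a\n", mime_type="image/png")
message = {
    "role": "user",
    "content": [{"type": "text", "text": "What's in this image?"}, part],
}
print(part["image_url"]["url"])
```

The resulting message can be passed in the messages list exactly like the URL-based example.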

Tool calling (function calling)

response = client.chat.completions.create(
    model="nlc-pro",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]
            }
        }
    }],
    tool_choice="auto"
)
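
The request above only declares the tool. The second half of the loop, running the requested function and building the follow-up messages, might look like the sketch below. It assumes the response follows OpenAI's tool_calls shape (an id plus function.name and function.arguments as a JSON string). get_weather is a stub and run_tool_calls is a hypothetical helper.

```python
import json


def get_weather(city):
    """Stub implementation; a real one would call a weather service."""
    return {"city": city, "forecast": "sunny"}


LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Execute each requested tool and return the matching tool-role messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        func = LOCAL_TOOLS[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(func(**args)),
        })
    return results


# Hand-written example of an assistant turn that requests a tool call.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
    }],
}
tool_messages = run_tool_calls(assistant_message)
print(tool_messages)
```

Append the assistant message and the returned tool messages to messages, then call the endpoint again to get the final answer.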

Structured output (JSON schema)

response = client.chat.completions.create(
    model="nlc-pro",
    messages=[{"role": "user", "content": "Extract: Alice, 30, engineer"}],
    response_format={"type": "json_schema", "json_schema": {
        "name": "person",
        "schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "age": {"type": "number"}, "job": {"type": "string"}},
            "required": ["name", "age", "job"]
        }
    }}
)
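
With a JSON schema response format the reply content is a JSON string. A minimal sketch of parsing it and re-checking the required fields on the client side follows; parse_person is a hypothetical helper and the sample string stands in for response.choices[0].message.content.

```python
import json

# Field names and types mirror the "person" schema in the request above.
REQUIRED_FIELDS = {"name": str, "age": (int, float), "job": str}


def parse_person(content):
    """Parse the model's JSON reply and check the fields the schema requires."""
    person = json.loads(content)
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in person:
            raise ValueError(f"missing field: {field}")
        if not isinstance(person[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return person


# Stand-in for response.choices[0].message.content
sample_content = '{"name": "Alice", "age": 30, "job": "engineer"}'
print(parse_person(sample_content))
```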

OpenAI Completions (raw text)

For raw text generation with custom prompt templates. Use this when you need full control over prompt formatting (base models, few-shot, legacy).

POST https://oddsforge.org/v1/completions

{
  "model": "nlc-fast",
  "prompt": "Once upon a time",
  "max_tokens": 100,
  "temperature": 0.7
}

Python

response = client.completions.create(
    model="nlc-fast",
    prompt="Task: classify sentiment.\nText: I love it.\nSentiment:"
)

OpenAI Responses API

Stateful conversations and advanced tool use. Continue chats with previous_response_id without resending the full history. Supports MCP/SSE server tools and client function tools. Stored by default — set store: false to opt out.

POST https://oddsforge.org/v1/responses

{
  "model": "nlc-pro",
  "input": "What's the capital of France?",
  "max_output_tokens": 200
}

Continue a conversation

first = client.responses.create(model="nlc-pro", input="Tell me a joke")
second = client.responses.create(
    model="nlc-pro",
    input="Tell me another one",
    previous_response_id=first.id  # continues the conversation
)

Stored responses & privacy

Responses are stored upstream so previous_response_id works. Listing and deleting stored responses is not supported on NLC (those operations are account-scoped upstream, so we disable them to keep user data isolated). Set store: false on create to skip storage, or save the response id on your side.
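
The storage options above come down to two request fields. The sketch below builds the request body for each case; build_response_request is a hypothetical helper, and the field names (input, previous_response_id, store) are the ones used on this page.

```python
def build_response_request(model, text, previous_response_id=None, store=True):
    """Build a /v1/responses request body for a new or continued conversation."""
    body = {"model": model, "input": text}
    if previous_response_id is not None:
        # Continue from a stored response instead of resending history.
        body["previous_response_id"] = previous_response_id
    if not store:
        # Opt out of server-side storage; the default is to store.
        body["store"] = False
    return body


print(build_response_request("nlc-pro", "Tell me a joke"))
print(build_response_request("nlc-pro", "Tell me another one", previous_response_id="resp_123"))
print(build_response_request("nlc-pro", "One-off question", store=False))
```

The id resp_123 is a placeholder. Since storage is what makes previous_response_id work, a response created with store: false presumably cannot be continued this way, so keep your own message history in that case.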

Anthropic Messages API

Use the Anthropic Python or TypeScript SDK against our brand. Base URL is https://oddsforge.org/inference (the SDK appends /v1/messages).

POST https://oddsforge.org/v1/messages

{
  "model": "nlc-pro",
  "max_tokens": 256,
  "messages": [{"role": "user", "content": "Say hello in Spanish. Reply in one word."}]
}

Python (Anthropic SDK)

import anthropic

client = anthropic.Anthropic(
    api_key="nlk_your_key_here",
    base_url="https://oddsforge.org/inference",
)

response = client.messages.create(
    model="nlc-pro",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.content[0].text)

JavaScript (Anthropic SDK)

import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
  apiKey: "nlk_your_key_here",
  baseURL: "https://oddsforge.org/inference",
});
const response = await client.messages.create({
  model: "nlc-pro",
  max_tokens: 256,
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.content[0].text);

Note: max_tokens is optional on NLC (required on Anthropic). Server-side tool families (code execution, memory, web fetch, web search) are not supported. Tool calling, streaming, structured output, reasoning, and vision work as expected.

Embeddings

Generate vector embeddings for RAG and semantic search. NLC Embed 1.0 supports variable-length output via the dimensions parameter.

POST https://oddsforge.org/v1/embeddings

{
  "model": "nlc-embed",
  "input": "The quick brown fox jumps over the lazy dog"
}

Python

response = client.embeddings.create(
    model="nlc-embed",
    input="The quick brown fox jumps over the lazy dog",
    dimensions=128  # optional — variable-length embeddings
)
print(response.data[0].embedding[:8])
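
Once you have vectors, semantic search is usually a cosine-similarity ranking. The sketch below uses toy 3-dimensional vectors in place of real embeddings; the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for response.data[i].embedding
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
print(rank_by_similarity(query, docs))
```

The query and all documents must be embedded with the same model and the same dimensions value for the scores to be comparable.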

Rerank (second-stage RAG retrieval)

Rerank a list of documents against a query. Use NLC Rerank 1.0 after a first-stage vector search to boost retrieval quality.

POST https://oddsforge.org/v1/rerank

{
  "model": "nlc-rerank",
  "query": "What is the capital of France?",
  "documents": [
    "Paris is the capital of France.",
    "France is in Western Europe.",
    "Python is a programming language."
  ],
  "top_n": 2,
  "return_documents": true
}

Python

import requests
r = requests.post("https://oddsforge.org/v1/rerank",
    headers={"Authorization": "Bearer nlk_your_key_here"},
    json={
        "model": "nlc-rerank",
        "query": "capital of France",
        "documents": ["Paris is the capital of France.", "Bananas are yellow."],
        "top_n": 1
    })
print(r.json()["results"])
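
To map the scores back to your documents, something like the sketch below may help. The result fields index and relevance_score are assumptions: this page only says the endpoint returns relevance scores under results, so check a real response before relying on these names. top_documents is a hypothetical helper.

```python
def top_documents(documents, results):
    """Pair each result with its source document, best score first.

    Assumes each result carries `index` and `relevance_score` (unverified field names).
    """
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ordered]


documents = ["Paris is the capital of France.", "Bananas are yellow."]
# Hand-written sample in the assumed shape, standing in for r.json()["results"].
sample_results = [
    {"index": 1, "relevance_score": 0.02},
    {"index": 0, "relevance_score": 0.97},
]
for text, score in top_documents(documents, sample_results):
    print(f"{score:.2f}  {text}")
```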

Models

All five chat models support streaming and tool calling. The Fusion Router auto-selects the best model, or you can pass a model id explicitly. Context and output limits are set per model with no global cap, so long answers are supported. ShipLow (budget) and ULTRA MAX (premium) are multi-agent pipelines that route your request to the right executor for the task.

Brand | Model ID | Context | Max Output | Best For
NLC ShipLow 1.0 (BUDGET) | nlc-shiplow | 131K | 32K | Budget default: six-lane multi-agent (code/agent/math/vision/multilingual/general)
NLC Fast 1.0 | nlc-fast | 1M | 900K | Cheap single-model, high-throughput coding
NLC Vision 1.0 | nlc-vision | 256K | 200K | Images, visual understanding
NLC PRO 1.0 | nlc-pro | 1M | 1.04M | Code, reasoning, long docs, large codebases (flagship)
NLC ULTRA MAX 1.0 | nlc-ultra | 1M | 1.04M | Full-stack features (premium multi-agent router)
NLC Embed 1.0 | nlc-embed | 8K | - | Embeddings for RAG / semantic search
NLC Rerank 1.0 | nlc-rerank | 32K | - | Document reranking (second-stage RAG)

Pricing (per 1M tokens)

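1 credit = $0.001. Usage is charged on the actual upstream token counts returned in the response's usage field. Cached input tokens are billed at a discount. The credit columns express the same prices per 1K tokens: $0.30 per 1M tokens is $0.0003 per 1K, which is 0.30 credits.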

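Model | Input ($ / 1M) | Output ($ / 1M) | Credits / 1K in | Credits / 1K out
NLC ShipLow | $0.30 | $0.60 | 0.30 | 0.60
NLC Fast | $0.45 | $0.90 | 0.45 | 0.90
NLC Vision | $1.80 | $5.50 | 1.80 | 5.50
NLC PRO | $3.20 | $9.80 | 3.20 | 9.80
NLC ULTRA MAX | $3.50 | $11.00 | 3.50 | 11.00
NLC Embed | $0.20 | - | 0.20 | -
NLC Rerank | $0.20 | - | 0.20 | -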
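
As a worked example of the pricing above, the sketch below estimates the credit cost of one chat request from its token counts. The rates are copied from the table and the model ids from the Models section; estimate_credits is a hypothetical helper, and it ignores the cached-input discount because the discount rate is not stated here.

```python
from decimal import Decimal

# Credits per 1K tokens as (input, output), copied from the pricing table.
CREDITS_PER_1K = {
    "nlc-shiplow": (Decimal("0.30"), Decimal("0.60")),
    "nlc-fast": (Decimal("0.45"), Decimal("0.90")),
    "nlc-vision": (Decimal("1.80"), Decimal("5.50")),
    "nlc-pro": (Decimal("3.20"), Decimal("9.80")),
    "nlc-ultra": (Decimal("3.50"), Decimal("11.00")),
}
USD_PER_CREDIT = Decimal("0.001")


def estimate_credits(model, prompt_tokens, completion_tokens):
    """Estimate credits for one request; no cached-input discount applied."""
    rate_in, rate_out = CREDITS_PER_1K[model]
    return (rate_in * prompt_tokens + rate_out * completion_tokens) / 1000


# 2,000 input tokens and 500 output tokens on nlc-pro
credits = estimate_credits("nlc-pro", 2000, 500)
print(credits, "credits =", credits * USD_PER_CREDIT, "USD")
```

For that request the arithmetic is 2 x 3.20 + 0.5 x 9.80 = 11.3 credits, or $0.0113. In practice, feed in the prompt and completion token counts from the response's usage field.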

Advanced features

Rate Limits

60 requests per minute per user for chat / completions / responses / messages / embeddings / rerank. 10 per minute for image generation.
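
To stay under the limit on the client side, a simple sliding-window guard is one option. This is a sketch only: this page does not say whether the server counts in fixed or sliding windows, and the class name and parameters are illustrative.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard allowing at most `limit` calls per `window` seconds."""

    def __init__(self, limit=60, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps = deque()

    def try_acquire(self):
        """Return True and record the call if it fits in the current window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False


# Demonstration with a fake clock so the example runs instantly.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=60, window=60.0, clock=lambda: fake_now[0])
granted = sum(limiter.try_acquire() for _ in range(61))
fake_now[0] = 60.0  # one minute later the window has cleared
granted_after_wait = limiter.try_acquire()
print(granted, granted_after_wait)
```

Call try_acquire() before each request and wait briefly when it returns False. For image generation, construct the limiter with limit=10.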

Errors

{"error": "out of credits - please top up at the dashboard"}  // 402
{"error": "unauthorized"}                                          // 401
{"error": "rate limit exceeded — try again in a minute"}           // 429
{"error": "please verify your email first"}                        // 403

Upstream errors are forwarded with provider names redacted — customers never see any third-party provider or model name in error messages, only NLC brand names.
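
A minimal sketch of handling these errors on the client: retry only on 429 with exponential backoff, and surface everything else with a hint. The status codes and the "error" body field come from the list above; the hint wording, the ApiError class, call_with_retry, and the assumption that success is HTTP 200 are illustrative.

```python
import time

# Suggested reactions to the documented status codes (wording is illustrative).
ERROR_HINTS = {
    401: "check the API key",
    402: "top up credits in the Dashboard",
    403: "verify the account email",
    429: "wait and retry",
}


class ApiError(Exception):
    def __init__(self, status, message):
        hint = ERROR_HINTS.get(status, "inspect the error message")
        super().__init__(f"{status}: {message} ({hint})")
        self.status = status


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() -> (status, body); retry only on 429 with exponential backoff."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status == 429 and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        raise ApiError(status, body.get("error", "unknown error"))


# Fake transport for the demo: one 429, then success.
replies = iter([
    (429, {"error": "rate limit exceeded"}),
    (200, {"ok": True}),
])
delays = []
result = call_with_retry(lambda: next(replies), sleep=delays.append)
print(result, delays)
```

In real use, send would wrap your HTTP call and return the status code with the parsed JSON body.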