Changelog

Release Notes

What's new in Wombat. Major updates, new model integrations, and platform improvements.

v0.9

June 2026— Latest Model Refresh

Models

OpenAI GPT-5.4 & GPT-5.5

OpenAI's newest flagship generation with frontier reasoning for coding and professional work.

GPT-5.5: New flagship ($5.00/$30.00 per 1M tokens)
GPT-5.4: More affordable frontier model ($2.50/$15.00)
GPT-5.4 mini ($0.75/$4.50) & nano ($0.20/$1.25)
GPT-5.3 Codex: Most capable agentic coding model ($1.75/$14.00)

Models

xAI Grok

First-class Grok support via xAI's API, with the Grok 4.3 flagship and the Grok 4.20 reasoning/non-reasoning pair.

Grok 4.3: Flagship with 1M context ($1.25/$2.50)
Grok 4.20 Reasoning & Non-Reasoning variants
Industry-leading non-hallucination rate and agentic tool calling

Models

Gemini 3.5 Flash & Flash-Lite

Google's latest fast tier, plus the cost-efficient Gemini 2.5 Flash-Lite.

Gemini 3.5 Flash: Frontier speed ($1.50/$9.00)
Gemini 3.1 Flash-Lite: Low-latency, high-volume ($0.25/$1.50)
Gemini 2.5 Flash-Lite: Most cost-efficient ($0.10/$0.40)

Models

Groq Production Lineup

Current Groq-hosted models for ultra-fast inference.

Llama 3.1 8B Instant: Cheapest on Groq ($0.05/$0.08)
Qwen3 32B with switchable thinking modes ($0.29/$0.59)
Kimi K2 0905: 1T-param agentic coding model ($1.00/$3.00)

Deprecation

Groq Model Retirements

Several older Groq-hosted models were decommissioned upstream and have been removed from the model picker. Existing assistants should migrate to the current Groq lineup above.

Gemma2 9B, Mistral Saba 24B, Llama Guard 3 8B
DeepSeek R1 Distill (Llama 70B & Qwen 32B)
Qwen 2.5 32B / Coder 32B, Qwen QWQ 32B
Llama 3 8B/70B (8192), Llama 4 Maverick

Deprecation

Gemini Legacy Models

Google has retired its older Gemini tiers. Move to Gemini 3.5 Flash, 3.1 Flash-Lite, or 2.5 Flash-Lite.

Gemini 2.0 Flash & Flash-Lite shut down June 1, 2026
Gemini 1.5 Flash removed from the current lineup

v0.8

January 2026— Cerebras Integration

Models

Cerebras Models

Ultra-fast inference powered by Wafer-Scale Engine technology.

Llama 3.1 8B at $0.10/$0.10 per 1M tokens
Llama 3.3 70B with function calling at $0.60/$0.60
Qwen 3 32B supporting 29+ languages at $0.20/$0.20
GPT-OSS 120B reasoning model at 3,000 TPS

Enhancement

Speed Improvements

Industry-leading throughput for high-volume applications with simple per-token pricing.

v0.7

December 2025— Gemini 3 Models

Models

Gemini 3 Pro & Flash

Google's latest models with 1M token context and PhD-level reasoning (90.4% on GPQA Diamond).

Gemini 3 Pro: $2.00/$12.00 per 1M tokens
Gemini 3 Flash: 3x faster at $0.50/$3.00
Native multimodal: text, images, audio, video

Feature

Agentic Workflows

Built-in support for real-time tool use, coding, and function calling.

v0.6

November 2025— GPT-5.2 Launch

Models

GPT-5.2

OpenAI's flagship with 400K context, 128K output, and enhanced reasoning tokens.

Pricing: $1.75 input / $14.00 output per 1M tokens
Excels at agentic workflows and multi-step coding
Enhanced structured document analysis

v0.5

August 2025— GPT-5 Model Family

Models

Full GPT-5 Suite

Complete model family from premium to cost-efficient variants.

GPT-5: Premium flagship ($1.25/$10.00)
GPT-5 mini: Balanced performance ($0.25/$2.00)
GPT-5 nano: Fastest, most affordable ($0.05/$0.40)
GPT-5.1 Codex: Agentic coding ($1.50/$12.00)

v0.4

August 2025— GPT OSS Models via Groq

Models

Open Source GPT Models

OpenAI's open-weight models available through Groq's ultra-fast infrastructure.

GPT-OSS 20B: $0.10/$0.50 per 1M tokens
GPT-OSS 120B: $0.15/$0.75 per 1M tokens
Built-in browser search and code execution

v0.3

May 2025— Emotion & Sentiment Analysis

Feature

Emotions Detection

Detect emotions in English conversations for deeper user understanding.

Feature

Sentiment Analytics

Track sentiment for both user input and AI responses with performance stats.

v0.2

April 2025— Intelligent RAG

Feature

RAG Enhancements

AI-driven retrieval that chooses what to search and when, unified across all providers.

Content deduplication and cost savings
Context propagation while needed
Detailed RAG analytics

Models

New Models

LLaMA 4 Scout/Maverik via Groq, OpenAI GPT 4.1 family.

v0.1

March 2025— Multi-Provider Support

Feature

Provider Integration

One API for OpenAI, Gemini, and Groq with automatic failover.

Feature

Observability

Latency metrics, cost breakdowns, and cache savings tracking.

v0.0

September 2024— Initial Release

Feature

Core Platform

Conversation management, WhatsApp integration, RAG model support, and knowledge base.