LLM Clients (Layer 1)

Layer 1 provides standardized access to different language model providers through a unified interface. It is the foundation of the Arshai framework: consistent, reliable access to LLMs without vendor lock-in.

LLM Client Implementation

Core Philosophy

LLM clients in Arshai follow these principles:

Standardized Interface

All LLM clients implement the ILLM interface, providing consistent methods regardless of the underlying provider.

Direct Configuration

You create and configure clients explicitly using ILLMConfig. No hidden settings or magic configuration.

Full Feature Support

Clients support streaming, structured output, function calling, and background tasks across all providers.

Provider Abstraction

Switch between providers by changing the client instance - your application code remains the same.

Basic Usage Pattern

All LLM clients follow this consistent pattern:

from arshai.llms.openai import OpenAIClient
from arshai.core.interfaces.illm import ILLMConfig, ILLMInput

# 1. Configure the client
config = ILLMConfig(
    model="gpt-4o-mini",
    temperature=0.7,
    max_tokens=500
)

# 2. Create the client
llm_client = OpenAIClient(config)

# 3. Prepare input
input_data = ILLMInput(
    system_prompt="You are a helpful assistant",
    user_message="What is the capital of France?"
)

# 4. Get response (await requires an async context, e.g. inside an async def run via asyncio.run)
response = await llm_client.chat(input_data)
print(response["llm_response"])
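
Because every client implements the same interface, switching providers only means constructing a different client. A minimal sketch, assuming Google credentials are supplied via environment variables (the model name is illustrative):

from arshai.llms.google_genai import GoogleGenAIClient

# Same input_data as above; only the client and model change.
gemini_client = GoogleGenAIClient(ILLMConfig(model="gemini-pro", temperature=0.7))
response = await gemini_client.chat(input_data)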

Advanced Features

Streaming Responses:

async for chunk in llm_client.stream(input_data):
    if chunk.get("llm_response"):
        print(chunk["llm_response"], end="")

Function Calling:

def get_weather(city: str) -> str:
    return f"Weather in {city}: Sunny, 22°C"

input_data = ILLMInput(
    system_prompt="You can check weather for users",
    user_message="What's the weather in Paris?",
    regular_functions={"get_weather": get_weather}
)
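
The call itself does not change; during the chat turn the client executes get_weather when the model requests it and feeds the result back (the printed output is illustrative):

response = await llm_client.chat(input_data)
print(response["llm_response"])  # e.g. "It's sunny and 22°C in Paris."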

Structured Output:

from pydantic import BaseModel

class Analysis(BaseModel):
    sentiment: str
    confidence: float

input_data = ILLMInput(
    system_prompt="Analyze sentiment",
    user_message="I love this product!",
    structure_type=Analysis
)
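
A sketch of consuming the result, assuming the parsed Analysis instance comes back under the same llm_response key used above:

response = await llm_client.chat(input_data)
analysis = response["llm_response"]  # assumed: an Analysis instance
print(analysis.sentiment, analysis.confidence)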

Background Tasks:

def log_interaction(action: str, user_id: str = "anonymous"):
    # Runs in background, doesn't return to conversation
    print(f"Logged: {action} by {user_id}")

input_data = ILLMInput(
    system_prompt="You are a helpful assistant",
    user_message="Hello!",
    background_tasks={"log_interaction": log_interaction}
)
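
The call looks the same; a background task runs fire-and-forget, and its return value never enters the conversation (a sketch):

response = await llm_client.chat(input_data)
# The assistant replies normally; log_interaction may run in the
# background, and its output is not part of the response.
print(response["llm_response"])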

Available Providers

OpenAI (arshai.llms.openai.OpenAIClient)
  • Models: GPT-4o, GPT-4o mini, GPT-4, GPT-4 Turbo, GPT-3.5 Turbo

  • Features: Chat, streaming, function calling, structured output

  • Configuration: API key via environment or direct config

Azure OpenAI (arshai.llms.azure.AzureClient)
  • Models: Azure-hosted OpenAI models

  • Features: Same as OpenAI with Azure-specific configuration

  • Configuration: Azure endpoint, API key, deployment names

Google Gemini (arshai.llms.google_genai.GoogleGenAIClient)
  • Models: Gemini Pro, Gemini Pro Vision

  • Features: Chat, streaming, function calling (reference implementation)

  • Configuration: Google AI API key

OpenRouter (arshai.llms.openrouter.OpenRouterClient)
  • Models: Access to multiple providers through OpenRouter

  • Features: Unified access to Claude, GPT, Llama, and more

  • Configuration: OpenRouter API key

Interface Details

All clients implement the ILLM interface with these core methods:

async chat(input: ILLMInput) -> Dict[str, Any]

Single-turn conversation that returns the complete response.

async stream(input: ILLMInput) -> AsyncGenerator[Dict[str, Any], None]

Streaming conversation that yields response chunks as they arrive.

The ILLMInput contains the following fields (a usage sketch follows the list):

  • system_prompt: Instructions defining the AI’s behavior

  • user_message: The user’s input message

  • regular_functions: Dict of functions the LLM can call

  • background_tasks: Dict of fire-and-forget functions

  • structure_type: Pydantic model for structured output

  • max_turns: Maximum conversation turns for function calling
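
Because every client shares this interface, application code can be written against ILLM rather than a concrete provider. A minimal sketch (the ILLM import path is assumed to match the other interface imports shown above):

from arshai.core.interfaces.illm import ILLM, ILLMInput

async def ask(client: ILLM, question: str) -> str:
    # Works unchanged with any Layer 1 client.
    response = await client.chat(ILLMInput(
        system_prompt="You are a helpful assistant",
        user_message=question,
    ))
    return response["llm_response"]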

Testing and Reliability

All LLM clients are tested with identical test scenarios to ensure consistent behavior:

  • Simple knowledge queries

  • Structured output generation

  • Function calling (sequential and parallel)

  • Background task execution

  • Streaming capabilities

  • Usage tracking

This comprehensive testing ensures that switching between providers won’t break your application logic.
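
A sketch of what a provider-agnostic scenario can look like (the factories and model names are illustrative, not the framework's actual test suite; requires pytest-asyncio):

import pytest
from arshai.core.interfaces.illm import ILLMConfig, ILLMInput
from arshai.llms.openai import OpenAIClient
from arshai.llms.google_genai import GoogleGenAIClient

CLIENT_FACTORIES = [
    lambda: OpenAIClient(ILLMConfig(model="gpt-4o-mini")),
    lambda: GoogleGenAIClient(ILLMConfig(model="gemini-pro")),
]

@pytest.mark.asyncio
@pytest.mark.parametrize("make_client", CLIENT_FACTORIES)
async def test_simple_knowledge_query(make_client):
    # The same scenario runs against every provider.
    response = await make_client().chat(ILLMInput(
        system_prompt="You are a helpful assistant",
        user_message="What is the capital of France?",
    ))
    assert "Paris" in response["llm_response"]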

Error Handling

LLM clients implement defensive error handling (an application-level sketch follows the list):

  • Rate limiting with automatic retries

  • Graceful degradation when features aren’t available

  • Safe handling of provider-specific errors

  • Comprehensive logging for debugging
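
If you want an application-level safety net on top of the clients' built-in handling, a minimal sketch (the broad except is deliberate; the concrete exception types raised are provider-specific):

import logging

logger = logging.getLogger(__name__)

try:
    response = await llm_client.chat(input_data)
except Exception:
    # Rate-limit, timeout, and other provider errors surface here;
    # log and degrade gracefully instead of crashing the conversation.
    logger.exception("LLM call failed")
    response = {"llm_response": "Sorry, I couldn't process that right now."}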

Next Steps