OpenAI Client¶
The OpenAI client provides standardized access to OpenAI’s language models through the Arshai framework. It implements the full ILLM interface with support for chat, streaming, function calling, structured output, and background tasks.
Note
This documentation reflects the actual implementation based on tested functionality. All examples are verified through the framework’s test suite.
Configuration¶
Basic Setup:
from arshai.llms.openai import OpenAIClient
from arshai.core.interfaces.illm import ILLMConfig
# Configure the client
config = ILLMConfig(
    model="gpt-4o-mini",        # Any OpenAI model name
    temperature=0.7,            # 0.0 = deterministic, 1.0 = creative
    max_tokens=500,             # Response length limit
    top_p=1.0,                  # Nucleus sampling parameter
    frequency_penalty=0.0,      # Reduce repetition
    presence_penalty=0.0        # Encourage topic diversity
)
# Create client
client = OpenAIClient(config)
Environment Variables:
# Required
export OPENAI_API_KEY="your-openai-api-key"
# Optional - for organization usage
export OPENAI_ORG_ID="your-organization-id"
# Optional - for custom endpoints (e.g., Azure OpenAI)
export OPENAI_BASE_URL="https://your-custom-endpoint.com/v1"
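A quick pre-flight check can save a confusing failure later. This is a minimal sketch (not part of the framework) that fails fast when the required variable is missing:
import os
# OPENAI_API_KEY is required; the other variables above are optional
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; export it before creating the client")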
Supported Models¶
The OpenAI client supports all models available through OpenAI’s chat completions API. The client works with any model name that OpenAI’s API accepts, including:
- Current Popular Models (as examples):
gpt-4o: Latest GPT-4 optimized model
gpt-4o-mini: Fast and cost-effective option
gpt-4-turbo: Previous-generation high-performance model
gpt-3.5-turbo: Legacy but efficient model
- Model Compatibility
The client works directly against OpenAI’s chat completions endpoint, so any model that API accepts will work. This includes:
All current GPT models
Future models as they become available
Custom fine-tuned models in your organization
Regional model variants
Note
No Model Restrictions: The framework doesn’t enforce which models you can use. Simply specify any valid OpenAI model name in your configuration.
Current Models: Check OpenAI’s documentation for the most up-to-date list of available models, pricing, and capabilities.
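For example, a fine-tuned model is configured the same way as a stock model. The model ID below is a hypothetical placeholder, not a real deployment:
# Fine-tuned model IDs follow OpenAI's "ft:" naming convention
config = ILLMConfig(
    model="ft:gpt-4o-mini-2024-07-18:your-org::abc123",  # hypothetical fine-tune ID
    temperature=0.3
)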
Basic Usage¶
Simple conversation:
from arshai.core.interfaces.illm import ILLMInput
# Prepare input
input_data = ILLMInput(
    system_prompt="You are a helpful travel assistant with expertise in Japanese culture and tourism.",
    user_message="I'm planning a trip to Japan. What should I know about Tokyo?"
)
# Get response
response = await client.chat(input_data)
print(response["llm_response"])
print(f"Tokens used: {response['usage']['total_tokens']}")
Streaming responses:
async for chunk in client.stream(input_data):
    if chunk.get("llm_response"):
        print(chunk["llm_response"], end="", flush=True)
    if chunk.get("usage"):
        print(f"\nTotal tokens: {chunk['usage']['total_tokens']}")
Function Calling¶
The OpenAI client supports both regular functions and background tasks:
Regular Functions (results integrated into conversation):
def calculate_power(base: float, exponent: float) -> float:
    """Calculate base raised to the power of exponent."""
    return base ** exponent

def multiply_numbers(a: float, b: float) -> float:
    """Multiply two numbers together."""
    return a * b

input_data = ILLMInput(
    system_prompt="You are a mathematics assistant. Use the provided tools for calculations.",
    user_message="Calculate 5 to the power of 2, then multiply the result by 3. Show each step.",
    regular_functions={
        "calculate_power": calculate_power,
        "multiply_numbers": multiply_numbers
    },
    max_turns=10  # Allow multiple function calls
)
response = await client.chat(input_data)
# LLM will call calculate_power(5, 2), get 25, then call multiply_numbers(25, 3)
Background Tasks (fire-and-forget execution):
def log_user_interaction(action: str, details: str = "User interaction"):
    """Log user interactions for analytics (background task)."""
    import datetime
    timestamp = datetime.datetime.now().isoformat()
    print(f"[{timestamp}] ANALYTICS: {action} - {details}")

input_data = ILLMInput(
    system_prompt="You are a helpful assistant. For every user interaction, log it for analytics.",
    user_message="What is the capital of France?",
    background_tasks={
        "log_user_interaction": log_user_interaction
    }
)
response = await client.chat(input_data)
# LLM answers the question AND calls log_user_interaction in background
Parallel Function Calling:
input_data = ILLMInput(
    system_prompt="You can perform multiple calculations simultaneously.",
    user_message="Calculate: 3^2, 4^2, and 6*7. You can call multiple functions at once.",
    regular_functions={
        "calculate_power": calculate_power,
        "multiply_numbers": multiply_numbers
    }
)

response = await client.chat(input_data)
# The LLM can execute multiple function calls in parallel for efficiency
Structured Output¶
Generate structured data using Pydantic models:
from pydantic import BaseModel, Field
from typing import List
class SentimentAnalysis(BaseModel):
    """Structured sentiment analysis result."""
    topic: str = Field(description="Main topic being analyzed")
    sentiment: str = Field(description="Overall sentiment (positive/negative/neutral)")
    confidence: float = Field(description="Confidence score between 0.0 and 1.0")
    key_points: List[str] = Field(description="List of key points identified")

input_data = ILLMInput(
    system_prompt="You are an expert sentiment analyst. Analyze the provided text thoroughly.",
    user_message="The new renewable energy project is fantastic! It will create thousands of jobs and reduce emissions significantly.",
    structure_type=SentimentAnalysis
)
response = await client.chat(input_data)
analysis = response["llm_response"] # Returns SentimentAnalysis instance
print(f"Topic: {analysis.topic}")
print(f"Sentiment: {analysis.sentiment}")
print(f"Confidence: {analysis.confidence}")
print(f"Key points: {', '.join(analysis.key_points)}")
Streaming with structured output:
# For streaming, use dict-based models with a schema method
from typing import List, TypedDict

class StreamingSentiment(TypedDict):
    topic: str
    sentiment: str
    confidence: float
    key_points: List[str]

    @classmethod
    def model_json_schema(cls):
        return {
            "type": "object",
            "properties": {
                "topic": {"type": "string", "description": "Main topic"},
                "sentiment": {"type": "string", "description": "Sentiment"},
                "confidence": {"type": "number", "minimum": 0, "maximum": 1},
                "key_points": {"type": "array", "items": {"type": "string"}}
            },
            "required": ["topic", "sentiment", "confidence", "key_points"]
        }
input_data = ILLMInput(
    system_prompt="Analyze sentiment and return structured data.",
    user_message="I love this product! It's amazing and works perfectly.",
    structure_type=StreamingSentiment
)

async for chunk in client.stream(input_data):
    if chunk.get("llm_response") and isinstance(chunk["llm_response"], dict):
        result = chunk["llm_response"]
        if "sentiment" in result:
            print(f"Sentiment: {result['sentiment']}")
Advanced Features¶
- Custom Base URL Support
The client supports custom endpoints for Azure OpenAI or other compatible services:
# Via environment variable
export OPENAI_BASE_URL="https://your-azure-instance.openai.azure.com/openai/deployments/your-deployment/v1"
# The client automatically uses the custom endpoint
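The same endpoint can also be set in-process before constructing the client. This sketch assumes the client reads OPENAI_BASE_URL at construction time, as described above:
import os

# Point the client at a custom endpoint (e.g., Azure OpenAI) before creating it
os.environ["OPENAI_BASE_URL"] = "https://your-azure-instance.openai.azure.com/openai/deployments/your-deployment/v1"
client = OpenAIClient(config)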
- Safe HTTP Configuration
The client includes enhanced HTTP safety features when available:
# Automatically uses SafeHttpClientFactory if available
# Falls back to standard OpenAI client if not
client = OpenAIClient(config) # Safe by default
- Context Management
Use the client as a context manager for proper resource cleanup:
async with OpenAIClient(config) as client:
    response = await client.chat(input_data)
# Client automatically closes connections when exiting
Error Handling¶
The OpenAI client implements comprehensive error handling:
Rate Limiting:
import asyncio
async def chat_with_retry(client, input_data, max_retries=3):
    """Example retry logic for rate limiting."""
    for attempt in range(max_retries):
        try:
            return await client.chat(input_data)
        except Exception as e:
            if "429" in str(e) and attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                await asyncio.sleep(wait_time)
                continue
            raise
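Usage mirrors a normal chat call:
response = await chat_with_retry(client, input_data)
print(response["llm_response"])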
Configuration Validation:
# Invalid configuration will raise errors during client creation
try:
    config = ILLMConfig(model="invalid-model")
    client = OpenAIClient(config)
except ValueError as e:
    print(f"Configuration error: {e}")
Network and API Errors:
try:
    response = await client.chat(input_data)
except Exception as e:
    if "authentication" in str(e).lower():
        print("Check your OpenAI API key")
    elif "quota" in str(e).lower():
        print("API quota exceeded")
    else:
        print(f"Unexpected error: {e}")
Usage Tracking¶
The client provides detailed usage information compatible with OpenAI’s latest API format:
response = await client.chat(input_data)

if response["usage"]:
    usage = response["usage"]
    print(f"Input tokens: {usage['input_tokens']}")
    print(f"Output tokens: {usage['output_tokens']}")
    print(f"Total tokens: {usage['total_tokens']}")
    print(f"Thinking tokens: {usage['thinking_tokens']}")          # For reasoning models
    print(f"Tool calling tokens: {usage['tool_calling_tokens']}")  # Function calls

    # Provider information
    print(f"Provider: {usage['provider']}")
    print(f"Model: {usage['model']}")
    print(f"Request ID: {usage['request_id']}")
Performance Optimization¶
Model Selection:
# For simple tasks
config_fast = ILLMConfig(model="gpt-4o-mini", temperature=0.3)
# For complex reasoning
config_powerful = ILLMConfig(model="gpt-4o", temperature=0.7)
# For specific capabilities (check OpenAI docs for latest models)
config_latest = ILLMConfig(model="gpt-4o", temperature=0.5)
Token Management:
# Limit response length for cost control
config = ILLMConfig(
    model="gpt-4o-mini",
    max_tokens=200,   # Shorter responses
    temperature=0.2   # More focused responses
)
Streaming for Better UX:
# Use streaming for real-time user interfaces
async def stream_to_user(client, input_data):
    response_text = ""
    async for chunk in client.stream(input_data):
        if chunk.get("llm_response"):
            new_text = chunk["llm_response"]
            # Display incremental text to user
            print(new_text[len(response_text):], end="", flush=True)
            response_text = new_text
Testing Integration¶
The OpenAI client is thoroughly tested with scenarios including:
Simple knowledge queries with pattern validation
Structured output generation and validation
Sequential function calling with step-by-step execution
Parallel function calling for efficiency
Background task execution with verification
Streaming behavior validation
Usage tracking accuracy
These tests ensure reliable behavior across different use cases and can serve as examples for your own implementations; a minimal sketch of such a test follows below.
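As an illustration only (not the framework's actual suite), a simple knowledge-query test might look like this. It assumes pytest with the pytest-asyncio plugin; the assertions mirror the response shape documented above:
import pytest
from arshai.llms.openai import OpenAIClient
from arshai.core.interfaces.illm import ILLMConfig, ILLMInput

@pytest.mark.asyncio
async def test_simple_knowledge_query():
    config = ILLMConfig(model="gpt-4o-mini", temperature=0.0, max_tokens=100)
    client = OpenAIClient(config)
    input_data = ILLMInput(
        system_prompt="You are a concise assistant.",
        user_message="What is the capital of France? Answer in one word."
    )
    response = await client.chat(input_data)
    assert "paris" in response["llm_response"].lower()  # pattern validation
    assert response["usage"]["total_tokens"] > 0        # usage tracking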
Implementation Notes¶
- OpenAI Responses API
The client uses OpenAI’s latest Responses API for structured output and enhanced function calling capabilities.
- Progressive Function Execution
Functions execute immediately when detected during streaming, providing real-time responsiveness (see the sketch after this list).
- Safe HTTP Handling
Enhanced HTTP client configuration when SafeHttpClientFactory is available, with graceful fallback.
- Resource Management
Proper connection cleanup through context managers and destructor methods.
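To observe progressive execution, combine streaming with a tool. This sketch reuses the calculate_power function from the Function Calling section and assumes tool calls run as soon as the stream surfaces them:
input_data = ILLMInput(
    system_prompt="You are a mathematics assistant. Use the provided tools.",
    user_message="What is 2 to the power of 10?",
    regular_functions={"calculate_power": calculate_power}
)

async for chunk in client.stream(input_data):
    # Tool calls execute as they are detected; text keeps streaming to the user
    if chunk.get("llm_response"):
        print(chunk["llm_response"], end="", flush=True)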
Limitations and Considerations¶
- Rate Limits
OpenAI enforces rate limits based on your plan. Implement retry logic for production use.
- Cost Management
Monitor token usage carefully, especially with advanced models. Consider using max_tokens limits.
- Model Availability
Model names and availability change. The client works with any valid OpenAI model.
- Function Calling Limits
Complex function calling scenarios may hit context length limits. Design functions to be concise.
- Streaming Consistency
Streaming behavior may vary based on response length and complexity. Test thoroughly for your use cases.
Next Steps¶
Azure OpenAI Client - Azure OpenAI integration
Extending LLM Clients - Creating custom LLM clients
Agents (Layer 2) - Building agents with LLM clients