# Groq Provider

The Groq provider enables access to Groq's ultra-fast LLM inference engine with support for various open-weight models.

## Supported Models

```python
from aicore.models_metadata import METADATA

# List available Groq models
groq_models = [model for model in METADATA if model.startswith("groq-")]
print(groq_models)
```

## Configuration

### YAML Configuration

```yaml
provider: groq
api_key: "your_api_key_here"  # Get from Groq console
model: "meta-llama/llama-4-maverick-17b-128e-instruct"  # Default model
temperature: 0.7  # Optional
max_tokens: 1024  # Optional
```
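
If you keep these settings in a file, one way to turn them into a config object is to parse the YAML and pass the fields to `LlmConfig`. This is a minimal sketch under two assumptions: the file is named `groq.yml` (a placeholder) and its keys map one-to-one onto `LlmConfig` fields.

```python
import yaml

from aicore.llm.config import LlmConfig

# "groq.yml" is a placeholder path for the YAML shown above.
with open("groq.yml") as f:
    raw = yaml.safe_load(f)

# Assumes the YAML keys match LlmConfig's field names one-to-one.
config = LlmConfig(**raw)
```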

### Python Configuration

```python
from aicore.llm.config import LlmConfig

config = LlmConfig(
    provider="groq",
    api_key="your_api_key_here",
    model="meta-llama/llama-4-maverick-17b-128e-instruct",
    temperature=0.7,
    max_tokens=1024
)
```
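
Hard-coding the key is fine for a quick test, but in real deployments it is safer to read it from the environment. A small sketch, assuming the key is exported as `GROQ_API_KEY` (the variable name is just an illustration, not something aicore requires):

```python
import os

from aicore.llm.config import LlmConfig

config = LlmConfig(
    provider="groq",
    # Keep the key out of source control by reading it from the environment.
    api_key=os.environ["GROQ_API_KEY"],
    model="meta-llama/llama-4-maverick-17b-128e-instruct",
    temperature=0.7,
    max_tokens=1024
)
```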

## Key Features

- Ultra-low latency: Optimized for high-speed inference
- Streaming support: Real-time token streaming (see the sketch after this list)
- Cost tracking: Automatic token counting and cost estimation
- Multiple model support: Access to various open-weight models
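
Streaming is typically consumed as an iterator of token chunks. The snippet below is a hypothetical sketch only: the `stream=True` keyword and the per-chunk loop are assumptions about how the call might look, not confirmed aicore API, so check the provider reference for the actual method signature.

```python
from aicore.llm import Llm

llm = Llm(config=config)  # `config` as built in the configuration examples above

# Hypothetical: `stream=True` and chunk iteration are assumed, not confirmed API.
for chunk in llm.complete("Summarize the history of GPUs", stream=True):
    print(chunk, end="", flush=True)
```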

## Usage Examples

### Basic Completion

```python
from aicore.llm import Llm

llm = Llm(config=config)
response = llm.complete("Explain quantum computing in simple terms")
print(response)
```

### Advanced Usage

```python
# With conversation history
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What's the weather today?"}
]
response = llm.chat_complete(messages)
```
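
To keep a conversation going, append the assistant's reply and the next user turn to the same `messages` list before calling `chat_complete` again. A small sketch, assuming `chat_complete` returns the reply as plain text:

```python
# Carry the reply forward so the next call sees the full history.
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "And what about tomorrow?"})
follow_up = llm.chat_complete(messages)
print(follow_up)
```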

## Best Practices

- Model Selection: Choose the appropriate model for your use case.
- Performance Tuning:
  - Adjust `temperature` for creativity vs. consistency.
  - Set `max_tokens` to control response length.
- Error Handling (a minimal sketch follows this list):
  - The provider implements automatic retries for transient errors.
  - See the retry mechanism documentation for details.
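
Even with automatic retries, a request can still fail (rate limits, invalid keys, persistent network issues), so it is worth adding your own handling around calls. A minimal, hedged sketch: the broad `except Exception` is used only because the specific exception classes raised by the provider are not documented here.

```python
# Reuses the `llm` instance from the usage examples above.
try:
    response = llm.complete("Explain quantum computing in simple terms")
except Exception as exc:
    # The concrete exception types depend on the provider implementation;
    # log the failure and decide whether to retry, fall back, or re-raise.
    print(f"Groq request failed: {exc}")
    raise
```
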
For advanced usage, refer to the base provider documentation.