Core LLaMA plugin for Eliza OS that provides local Large Language Model capabilities.
The LLaMA plugin serves as a foundational component of Eliza OS, providing local LLM capabilities using LLaMA models. It enables efficient and customizable text generation with both CPU and GPU support.
- Local LLM Support: Run LLaMA models locally
- GPU Acceleration: CUDA support for faster inference
- Flexible Configuration: Customizable parameters for text generation
- Message Queuing: Efficient handling of multiple requests
- Automatic Model Management: Download and verification systems
npm install @elizaos/plugin-llama
The plugin can be configured through environment variables:
LLAMALOCAL_PATH=your_model_storage_path
OLLAMA_MODEL=optional_ollama_model_name
import { createLlamaPlugin } from "@elizaos/plugin-llama";
// Initialize the plugin
const llamaPlugin = createLlamaPlugin();
// Register with Eliza OS
elizaos.registerPlugin(llamaPlugin);
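If you set the environment variables above, you can optionally confirm that `LLAMALOCAL_PATH` points at an existing directory before registering the plugin. This is a minimal sketch using only standard Node.js APIs, independent of how the plugin itself handles a missing path:

```typescript
import { existsSync } from "node:fs";

// Optional sanity check: warn early if the configured model directory is missing.
const modelDir = process.env.LLAMALOCAL_PATH;
if (!modelDir || !existsSync(modelDir)) {
    console.warn("LLAMALOCAL_PATH is unset or does not point to an existing directory");
}
```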
The plugin provides local LLM capabilities using LLaMA models, with the following default setup:
- Model: Hermes-3-Llama-3.1-8B (8-bit quantized)
- Source: Hugging Face (NousResearch/Hermes-3-Llama-3.1-8B-GGUF)
- Context Size: 8192 tokens
- Inference: CPU and GPU (CUDA) support
- Text Generation (see the inference sketch below)
  - Completion-style inference
  - Temperature control
  - Stop token configuration
  - Frequency and presence penalties
  - Maximum token limit control
- Model Management (see the download and verification sketch below)
  - Automatic model downloading
  - Model file verification
  - Automatic retry on initialization failures
  - GPU detection for acceleration
- Performance (see the queuing sketch below)
  - Message queuing system
  - CUDA acceleration when available
  - Configurable context size
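For orientation, here is a rough sketch of inference with the controls listed above, driven directly through node-llama-cpp (the binding listed under Credits). It assumes node-llama-cpp v3's documented API and a hypothetical local model path; the plugin performs the equivalent internally, and a chat session is used here for brevity even though the plugin does completion-style inference:

```typescript
import { getLlama, LlamaChatSession } from "node-llama-cpp";

const llama = await getLlama(); // picks the best available backend (CUDA when present)
const model = await llama.loadModel({
    modelPath: "./models/Hermes-3-Llama-3.1-8B.Q8_0.gguf", // hypothetical path
});
const context = await model.createContext({ contextSize: 8192 });
const session = new LlamaChatSession({ contextSequence: context.getSequence() });

const reply = await session.prompt("Write a haiku about local inference.", {
    temperature: 0.7, // sampling temperature
    maxTokens: 256,   // cap on generated tokens
    // Stop tokens and frequency/presence penalties are configured through
    // additional generation options; see the node-llama-cpp documentation.
});
console.log(reply);
```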
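The download and verification behaviour can be pictured with a small standalone sketch. The URL, file path, and checksum below are placeholders (the plugin resolves these itself from the Hugging Face repository listed above), and Node 18+ is assumed for the built-in fetch:

```typescript
import { createHash } from "node:crypto";
import { createReadStream, createWriteStream, existsSync } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

const MODEL_URL = "https://huggingface.co/<repo>/resolve/main/<model>.gguf"; // placeholder
const MODEL_PATH = "./models/model.gguf";                                    // placeholder
const EXPECTED_SHA256 = "<published checksum>";                              // placeholder

// Stream the file through a SHA-256 hash so large models are never held in memory.
async function sha256(file: string): Promise<string> {
    const hash = createHash("sha256");
    for await (const chunk of createReadStream(file)) hash.update(chunk);
    return hash.digest("hex");
}

async function download(): Promise<void> {
    const res = await fetch(MODEL_URL);
    if (!res.ok || !res.body) throw new Error(`Download failed: HTTP ${res.status}`);
    await pipeline(Readable.fromWeb(res.body), createWriteStream(MODEL_PATH));
}

// Verify the file and re-download a few times on failure, in the spirit of the
// plugin's automatic retry on initialization failures.
export async function ensureModel(retries = 3): Promise<void> {
    for (let attempt = 0; attempt <= retries; attempt++) {
        if (existsSync(MODEL_PATH) && (await sha256(MODEL_PATH)) === EXPECTED_SHA256) return;
        if (attempt < retries) await download();
    }
    throw new Error("Model verification failed after retries");
}
```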
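The message queuing system exists because a single model context should not serve overlapping requests. A minimal illustration of the idea follows; it is not the plugin's actual implementation, and `generate()` stands in for the real inference call:

```typescript
type Task<T> = () => Promise<T>;

class GenerationQueue {
    private tail: Promise<unknown> = Promise.resolve();

    enqueue<T>(task: Task<T>): Promise<T> {
        // Start this task only after every previously queued task has settled.
        const result = this.tail.then(task, task);
        // Keep the chain alive even if this task rejects.
        this.tail = result.catch(() => undefined);
        return result;
    }
}

declare function generate(prompt: string): Promise<string>; // placeholder

const queue = new GenerationQueue();
// Concurrent callers are serialised; the model only ever sees one request at a time.
const [first, second] = await Promise.all([
    queue.enqueue(() => generate("first prompt")),
    queue.enqueue(() => generate("second prompt")),
]);
```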
- Model Initialization Failures
  - Symptom: `Error: Model initialization failed`
  - Verify model file exists and is not corrupted
  - Check available system memory
  - Ensure CUDA is properly configured (if using GPU)
- Performance Issues
  - Symptom: `Warning: No CUDA detected - local response will be slow`
  - Verify CUDA installation if using GPU
  - Check system resources
  - Consider reducing context size
Enable debug logging for detailed troubleshooting:
process.env.DEBUG = "eliza:plugin-llama:*";
- Node.js 16.x or higher
- Minimum 8GB RAM recommended
- CUDA-compatible GPU (optional, for acceleration)
- Sufficient storage for model files
- Model Selection
  - Choose appropriate model size
  - Use quantized versions when possible
  - Balance quality vs speed
- Resource Management (see the monitoring sketch after this list)
  - Monitor memory usage
  - Configure appropriate context size
  - Optimize batch processing
- GPU Utilization
  - Enable CUDA when available
  - Monitor GPU memory
  - Balance CPU/GPU workload
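To support the resource-management and GPU-utilization practices above, a small helper can log process and GPU memory while you tune context size and batch settings. The nvidia-smi flags are standard, but the snippet as a whole is illustrative and not part of the plugin:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

export function logProcessMemory(): void {
    const { rss, heapUsed } = process.memoryUsage();
    const mb = (n: number) => (n / 1024 ** 2).toFixed(0);
    console.log(`memory: rss=${mb(rss)} MB, heapUsed=${mb(heapUsed)} MB`);
}

export async function logGpuMemory(): Promise<void> {
    try {
        // Standard nvidia-smi query; fails harmlessly when no NVIDIA GPU/driver is present.
        const { stdout } = await run("nvidia-smi", [
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ]);
        console.log(`GPU memory used/total (MiB): ${stdout.trim()}`);
    } catch {
        console.log("nvidia-smi not available - running without CUDA acceleration");
    }
}
```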
For issues and feature requests, please:
- Check the troubleshooting guide above
- Review existing GitHub issues
- Submit a new issue with:
  - System information
  - Error logs
  - Steps to reproduce
This plugin integrates with and builds upon:
- LLaMA - Base language model
- node-llama-cpp - Node.js bindings
- GGUF - Model format
Special thanks to:
- The LLaMA community for model development
- The Node.js community for tooling support
- The Eliza community for testing and feedback
This plugin is part of the Eliza project. See the main project repository for license information.