🤖💡 LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context

x66ccff/liveideabench

"It's not like finding a needle in a haystack, it is like creating new needles."

🏆 Leaderboard: http://liveideabench.com 💡

🧠✨🎉 News (2025/1/27): Latest Dataset Update on Hugging Face!

We are excited to announce that the latest dataset, including supplementary tests for models like deepseek-R1, deepseek-V3, minimax-01, phi-4, and Opus, has been uploaded to Hugging Face! 🚀

Check it out here: https://huggingface.co/datasets/6cf/liveideabench-DLC-250127


LiveIdeaBench Evaluation Framework

[Figure: LiveIdeaBench Evaluation Framework]

[Figure: Leaderboard]

Evaluation Instructions

Database Initialization

Run the Python script to initialize the database:

python -c "from utils.database import init_database; init_database()"

Configuring API Keys

Before running the program, you need to configure at least one API key:

  1. Create an apikey file and write your OpenRouter API key:

    echo "your-openrouter-api-key" > apikey

  2. Alternatively, set environment variables for the providers you use:

    export OPENROUTER_API_KEY="your-openrouter-api-key"
    export STEP_API_KEY="your-step-api-key"
    export GEMINI_API_KEYS="key1,key2,key3"
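The two configuration routes above can be sketched in Python as follows. This is only an illustration, not the repository's actual loader: the function names `load_openrouter_key` and `load_gemini_keys` are hypothetical, and the precedence (environment variable first, then the `apikey` file) is an assumption. The one detail taken directly from the example above is that `GEMINI_API_KEYS` holds a comma-separated list.

```python
import os
from pathlib import Path


def load_openrouter_key(key_file="apikey"):
    """Return the OpenRouter key, or None if none is configured.

    Assumption: the environment variable takes precedence over the file.
    """
    key = os.environ.get("OPENROUTER_API_KEY", "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        # `echo "..." > apikey` leaves a trailing newline, so strip it.
        return path.read_text(encoding="utf-8").strip() or None
    return None


def load_gemini_keys():
    """Split the comma-separated GEMINI_API_KEYS value into a list of keys."""
    raw = os.environ.get("GEMINI_API_KEYS", "")
    return [part.strip() for part in raw.split(",") if part.strip()]
```

Stripping whitespace and skipping empty entries keeps a stray space or trailing comma in `GEMINI_API_KEYS` from producing an invalid key.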

Running Examples

Generate and evaluate ideas using a specified model:

# Generate ideas using a specified model
python run.py --idea_model "openai/gpt-4o-mini"

# Use a specific provider
python run.py --idea_model "openai/gpt-4o-mini" --provider openrouter

# Use a single keyword
python run.py --idea_model "openai/gpt-4o-mini" --keyword "relativity"

# Use multiple keywords
python run.py --idea_model "openai/gpt-4o-mini" --keyword "relativity" "periodic table"

# Do not specify a keyword (use all keywords)
python run.py --idea_model "openai/gpt-4o-mini"
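To run several models in sequence, the commands above could be assembled from Python. The following is a minimal sketch, not part of the repository: `build_command` and `run_all` are hypothetical helpers, and the only flags used are the ones shown above (`--idea_model`, `--provider`, `--keyword`). The provider names come from the "Supported Model Providers" list in this README.

```python
import shlex
import subprocess
import sys

# Provider names as listed under "Supported Model Providers".
SUPPORTED_PROVIDERS = ("openrouter", "gemini", "stepfun", "ollama")


def build_command(idea_model, provider=None, keywords=()):
    """Build the argument list for one `python run.py ...` invocation."""
    if provider is not None and provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    cmd = [sys.executable, "run.py", "--idea_model", idea_model]
    if provider is not None:
        cmd += ["--provider", provider]
    if keywords:
        # Every keyword follows a single --keyword flag, as in the examples.
        cmd += ["--keyword", *keywords]
    return cmd


def run_all(models, provider=None, keywords=()):
    """Run run.py once per model; stop at the first failing run."""
    for model in models:
        cmd = build_command(model, provider=provider, keywords=keywords)
        print("Running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

For example, calling `run_all(["openai/gpt-4o-mini"], keywords=["relativity", "periodic table"])` from the repository root would issue the same command as the multiple-keyword example. Passing the arguments as a list, not a shell string, avoids quoting problems with keywords that contain spaces.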

Database Export

python view_database.py

Then run stats.ipynb to generate data/data.parquet.

Evaluate Fluency

python hash.py

Supported Model Providers

  • openrouter (default)
  • gemini
  • stepfun
  • ollama

File Structure

  • run.py: Main program
  • config.py: Configuration management
  • utils/LLM.py: LLM interaction and processing
  • utils/database.py: Database management
  • utils/prompts.json: Prompt templates

BibTeX

@article{ruan2024liveideabench,
  title={LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context},
  author={Kai Ruan and Xuan Wang and Jixiang Hong and Peng Wang and Yang Liu and Hao Sun},
  journal={arXiv preprint arXiv:2412.17596},
  year={2024}
}