LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context
"It's not like finding a needle in a haystack, it is like creating new needles."
Leaderboard: http://liveideabench.com
We are excited to announce that the latest dataset, including supplementary tests for models such as deepseek-R1, deepseek-V3, minimax-01, phi-4, and Opus, has been uploaded to Hugging Face!
Check it out here: https://huggingface.co/datasets/6cf/liveideabench-DLC-250127
Run the Python script to initialize the database:
python -c "from utils.database import init_database; init_database()"
Before running the program, you need to configure at least one API key:
- Create an apikey file and write your OpenRouter API key:
echo "your-openrouter-api-key" > apikey
Alternatively, set environment variables:
export OPENROUTER_API_KEY="your-openrouter-api-key"
export STEP_API_KEY="your-step-api-key"
export GEMINI_API_KEYS="key1,key2,key3"
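For illustration, the two configuration routes above could be read from Python roughly as follows. This is a minimal sketch, not the project's actual logic in config.py: the helper names `resolve_openrouter_key` and `parse_gemini_keys` are hypothetical, and checking the environment variable before the apikey file is an assumption.

```python
import os
from pathlib import Path


def resolve_openrouter_key(env=None, key_file="apikey"):
    """Return an OpenRouter key from the environment or the apikey file.

    Hypothetical helper: the lookup order (environment first, file second)
    is an assumption, not necessarily what the project does.
    """
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        # `echo "..." > apikey` leaves a trailing newline, so strip it.
        return path.read_text(encoding="utf-8").strip() or None
    return None


def parse_gemini_keys(env=None):
    """Split the comma-separated GEMINI_API_KEYS value into a list of keys."""
    env = os.environ if env is None else env
    raw = env.get("GEMINI_API_KEYS", "")
    return [k.strip() for k in raw.split(",") if k.strip()]
```

Calling `resolve_openrouter_key()` with no arguments reads the real environment and the apikey file in the current directory; passing a plain dict as `env` makes the helpers easy to check in isolation.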
Generate and evaluate ideas using a specified model:
# Generate ideas using a specified model
python run.py --idea_model "openai/gpt-4o-mini"
# Use a specific provider
python run.py --idea_model "openai/gpt-4o-mini" --provider openrouter
# Use a single keyword:
python run.py --idea_model "openai/gpt-4o-mini" --keyword "relativity"
# Use multiple keywords:
python run.py --idea_model "openai/gpt-4o-mini" --keyword "relativity" "periodic table"
# Do not specify a keyword (use all keywords):
python run.py --idea_model "openai/gpt-4o-mini"
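When several models or keyword sets need to be run in sequence, the commands above can be assembled programmatically. The following is a minimal sketch under stated assumptions: `build_run_command` and `run_batch` are hypothetical helpers that are not part of the repository, and they use only the `--idea_model`, `--provider`, and `--keyword` flags shown in the examples.

```python
import subprocess
import sys


def build_run_command(idea_model, provider=None, keywords=None):
    """Assemble the argv list for run.py using only the flags shown above."""
    cmd = [sys.executable, "run.py", "--idea_model", idea_model]
    if provider:
        cmd += ["--provider", provider]
    if keywords:
        # Multiple keywords follow a single --keyword flag, as in the examples.
        cmd += ["--keyword", *keywords]
    return cmd


def run_batch(idea_models, provider=None, keywords=None):
    """Run run.py once per model, stopping at the first failing run."""
    for model in idea_models:
        cmd = build_run_command(model, provider=provider, keywords=keywords)
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `run_batch(["openai/gpt-4o-mini"], keywords=["relativity", "periodic table"])` would invoke the same command as the multiple-keyword example, provided it is called from the repository root.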
python view_database.py
Then run stats.ipynb to generate data/data.parquet.
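Once data/data.parquet exists, it can be inspected from Python. The sketch below assumes pandas and a parquet engine such as pyarrow are installed; `load_results` and `summarize` are hypothetical helpers, and no particular column schema is assumed because the README does not describe one.

```python
from pathlib import Path


def load_results(path="data/data.parquet"):
    """Load the generated parquet file, or return None if it does not exist yet."""
    parquet_path = Path(path)
    if not parquet_path.is_file():
        return None
    # Imported lazily so the existence check works even without pandas installed.
    import pandas as pd  # requires a parquet engine such as pyarrow

    return pd.read_parquet(parquet_path)


def summarize(df):
    """Return the row count and column names; the schema is not assumed."""
    return len(df), list(df.columns)
```

A typical use would be `df = load_results()` followed by `summarize(df)` when `df` is not None, to see which columns the notebook produced before doing any analysis.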
python hash.py
Supported providers:
- openrouter (default)
- gemini
- stepfun
- ollama
- run.py: Main program
- config.py: Configuration management
- utils/LLM.py: LLM interaction and processing
- utils/database.py: Database management
- utils/prompts.json: Prompt templates
@article{ruan2024liveideabench,
title={LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context},
author={Kai Ruan and Xuan Wang and Jixiang Hong and Peng Wang and Yang Liu and Hao Sun},
journal={arXiv preprint arXiv:2412.17596},
year={2024}
}