🤖💡 LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context

"It's not like finding a needle in a haystack, it is like creating new needles."

🏆 Leaderboard: http://liveideabench.com 💡

🧠✨🎉 News (2025/1/27): Latest Dataset Update on Hugging Face!

We are excited to announce that the latest dataset, including supplementary tests for models like deepseek-R1, deepseek-V3, minimax-01, phi-4, and Opus, has been uploaded to Hugging Face! 🚀

Check it out here: https://huggingface.co/datasets/6cf/liveideabench-DLC-250127

LiveIdeaBench Evaluation Framework

Evaluation Instructions

Database Initialization

Run the Python script to initialize the database:

python -c "from utils.database import init_database; init_database()"

Configuring API Keys

Before running the program, configure at least one API key.

Create a file named apikey containing your OpenRouter API key:

echo "your-openrouter-api-key" > apikey

Alternatively, set environment variables:

export OPENROUTER_API_KEY="your-openrouter-api-key"
export STEP_API_KEY="your-step-api-key"
export GEMINI_API_KEYS="key1,key2,key3"
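The two configuration routes above could be combined in a small helper. The following is a minimal sketch, not the repository's actual loading logic: the function names `resolve_openrouter_key` and `split_gemini_keys` are hypothetical, and both the precedence (environment variable first, then the apikey file) and the comma-splitting of `GEMINI_API_KEYS` are assumptions based only on the examples shown here.

```python
import os
from pathlib import Path


def resolve_openrouter_key(env=None, key_file="apikey"):
    """Return an OpenRouter key from the environment or the apikey file.

    Assumption: the environment variable takes precedence over the file.
    Returns None when neither source provides a non-empty key.
    """
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        # `echo "..." > apikey` leaves a trailing newline, so strip it.
        key = path.read_text(encoding="utf-8").strip()
        if key:
            return key
    return None


def split_gemini_keys(env=None):
    """Split GEMINI_API_KEYS ("key1,key2,key3") into a list of keys.

    Assumption: keys are comma-separated, as in the export example.
    """
    env = os.environ if env is None else env
    raw = env.get("GEMINI_API_KEYS", "")
    return [k.strip() for k in raw.split(",") if k.strip()]
```

Calling `resolve_openrouter_key()` with no arguments reads the real environment and the apikey file in the current directory; a `None` result means no OpenRouter key is configured.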

Running Examples

Generate and evaluate ideas using a specified model:

# Generate ideas using a specified model
python run.py --idea_model "openai/gpt-4o-mini"

# Use a specific provider
python run.py --idea_model "openai/gpt-4o-mini" --provider openrouter

# Use a single keyword
python run.py --idea_model "openai/gpt-4o-mini" --keyword "relativity"

# Use multiple keywords
python run.py --idea_model "openai/gpt-4o-mini" --keyword "relativity" "periodic table"

# Do not specify a keyword (use all keywords)
python run.py --idea_model "openai/gpt-4o-mini"
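To run the same command for several models, the invocations above can be scripted. This is a rough sketch rather than part of the repository: `build_command` and `run_models` are hypothetical helpers, and they use only the flags shown in the examples (`--idea_model`, `--provider`, `--keyword`) plus the provider names listed under Supported Model Providers.

```python
import subprocess
import sys

# Provider names as listed under "Supported Model Providers".
SUPPORTED_PROVIDERS = ("openrouter", "gemini", "stepfun", "ollama")


def build_command(idea_model, provider=None, keywords=None):
    """Build the argument list for one run.py invocation."""
    if provider is not None and provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    cmd = [sys.executable, "run.py", "--idea_model", idea_model]
    if provider is not None:
        cmd += ["--provider", provider]
    if keywords:
        # Several keywords follow a single --keyword flag, as in the examples.
        cmd += ["--keyword", *keywords]
    return cmd


def run_models(models, provider=None, keywords=None):
    """Run run.py once per model, stopping at the first failure."""
    for model in models:
        subprocess.run(build_command(model, provider, keywords), check=True)
```

For instance, `run_models(["openai/gpt-4o-mini"], provider="openrouter", keywords=["relativity", "periodic table"])` would issue the multi-keyword command shown above from the repository root. Passing arguments as a list avoids shell quoting problems with keywords that contain spaces.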

Database Export

python view_database.py

Then run stats.ipynb to generate data/data.parquet.

Evaluate Fluency

python hash.py

Supported Model Providers

openrouter (default)
gemini
stepfun
ollama

File Structure

run.py: Main program
config.py: Configuration management
utils/LLM.py: LLM interaction and processing
utils/database.py: Database management
utils/prompts.json: Prompt templates

BibTeX

@article{ruan2024liveideabench,
  title={LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context},
  author={Kai Ruan and Xuan Wang and Jixiang Hong and Peng Wang and Yang Liu and Hao Sun},
  journal={arXiv preprint arXiv:2412.17596},
  year={2024}
}
