This repository contains a Named Entity Recognition (NER) framework that explores the capabilities of Large Language Models (LLMs) and Large Reasoning Models (LRMs) for extracting entities from medical texts. The project features both an interactive demo using Together AI and experimental pipelines using Ollama for research purposes.
- Interactive web demo for real-time NER extraction
- Support for multiple languages (English, Spanish, Italian, more can be added easily)
- Modular architecture for easy dataset and model integration
- Comprehensive evaluation pipeline for NER experiments
- Support for various medical datasets (MultiCardioNER, PharmaCoNER)
- Python 3.12+
- Together AI API key (for demo only)
- Ollama (for experiments only)
```bash
# Clone repository
git clone https://github.com/alexfdez1010/ner-llm
cd ner-llm

# Install dependencies
pip install -r requirements.txt
```
- For the demo (`app.py`), set your Together AI API key: rename the file `.streamlit/secrets.toml.example` to `.streamlit/secrets.toml` and set your API key in it:

  ```toml
  TOGETHER_API_KEY = "your-api-key"
  ```
- For experiments (`main.py`), install Ollama:

  ```bash
  # Install Ollama (macOS/Linux)
  curl https://ollama.ai/install.sh | sh

  # Start Ollama service
  ollama serve
  ```
The demo uses Together AI's models for real-time NER:
```bash
streamlit run app.py
```
You can also try the demo here. Note that the LLM used in the demo is limited to 6 requests per minute, as it runs on a free endpoint.
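As a rough sketch of how the key configured above reaches the model (the actual wiring lives in `app.py` and `ai/llm.py`; the snippet below is an assumption, not the repository's code), Streamlit exposes values from `.streamlit/secrets.toml` through `st.secrets`, and the Together SDK accepts the key directly:

```python
import streamlit as st
from together import Together  # Together AI's official Python SDK

# Read the key configured in .streamlit/secrets.toml
api_key = st.secrets["TOGETHER_API_KEY"]

# Create a client; the model name below is only a placeholder,
# use whichever model the demo is configured with
client = Together(api_key=api_key)
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
    messages=[{"role": "user", "content": "Extract the drug names from: 'The patient was given aspirin.'"}],
)
print(response.choices[0].message.content)
```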
The experimental pipeline uses Ollama models:
```bash
python main.py --model "deepseek-r1:7b" --dataset "multicardioner_track1"
```
Available models:
- deepseek-r1 (7B, 8B, 14B, 32B)
- phi3.5 (3.6B)
- granite3.1-dense (8B)
- falcon3 (10B)
- llama3.2-vision (11B)
- phi4 (14B)
- qwen2.5 (32B)
Supported datasets:
- MultiCardioNER Track 1
- PharmaCoNER
- MultiCardioNER Track 2 (English, Spanish, Italian)
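To sweep several of the models and datasets listed above in one run, a small driver script along the following lines can be used. This is only a sketch: the repository's own `experiments/` shell scripts serve the same purpose, and the dataset keys below (other than `multicardioner_track1`, shown earlier) are assumptions; check `DATASETS` in `main.py` for the exact identifiers.

```python
import subprocess

# Ollama model tags and dataset keys as accepted by main.py
MODELS = ["deepseek-r1:7b", "phi3.5", "granite3.1-dense"]
DATASETS = ["multicardioner_track1", "pharmaconer"]  # hypothetical keys, verify in main.py

for model in MODELS:
    for dataset in DATASETS:
        print(f"Running {model} on {dataset}...")
        subprocess.run(
            ["python", "main.py", "--model", model, "--dataset", dataset],
            check=True,
        )
```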
```
ner-llm/
├── ai/                  # AI components
│   ├── extractor_ner.py # NER extraction logic
│   ├── llm.py           # LLM integrations
│   └── prompts.py       # Prompt templates
├── datasets/            # Where datasets are stored
├── datasets_info/       # Dataset definitions
├── experiments/         # Shell scripts for experiments
├── graphs/              # Graphs generated from the results
├── model/               # Data models
├── tests/               # Test suite (includes unit, integration, and e2e)
├── app.py               # Interactive demo
└── main.py              # Experimental pipeline
```
- Create a new dataset info class in `datasets_info/`:
  ```python
  from typing import List

  from datasets_info.dataset_info_interface import DatasetInfo
  # Dataset and Category are the project's data models (see model/);
  # adjust the import paths below to match the actual module names.
  from model.dataset import Dataset
  from model.category import Category


  class NewDatasetInfo(DatasetInfo):
      def load_dataset(self) -> Dataset:
          """Load the dataset.

          Returns:
              Dataset: The loaded dataset
          """

      def categories(self) -> List[Category]:
          """Get the categories of the dataset.

          Returns:
              List[Category]: List of categories in the dataset
          """

      def language(self) -> str:
          """Get the language of the dataset.

          Returns:
              str: Language code of the dataset
          """

      def example_prompt(self) -> str:
          """Get an example prompt for the dataset.

          Returns:
              str: Example prompt for the dataset
          """
  ```
- Register it in `main.py`:

  ```python
  DATASETS = {
      "new_dataset": ("datasets_info.new_dataset", "NewDatasetInfo"),
      # ...
  }
  ```
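Each registry entry maps a dataset key to a `(module path, class name)` pair, so the class can be resolved dynamically. A minimal sketch of how such a lookup works (an assumption about the mechanism, not the actual code in `main.py`):

```python
import importlib

def load_dataset_info(key: str):
    """Resolve a DATASETS entry to an instantiated dataset info object."""
    module_path, class_name = DATASETS[key]
    module = importlib.import_module(module_path)
    dataset_info_cls = getattr(module, class_name)
    return dataset_info_cls()

# Example: info = load_dataset_info("new_dataset")
```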
For experiments, add new Ollama models to `MODELS` in `main.py`:

```python
MODELS = [
    "your-new-model",
    # ...
]
```
Experiment results are saved to `results.csv` for analysis.
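For a quick look at the results, something like the snippet below works once the file exists. The column names `model`, `dataset`, and `f1` are assumptions for illustration; adjust them to the actual header of `results.csv`.

```python
import pandas as pd

# Load the experiment results produced by main.py
results = pd.read_csv("results.csv")

# Hypothetical columns: replace "model", "dataset" and "f1" with the real header names
summary = results.groupby(["model", "dataset"])["f1"].mean().unstack()
print(summary)
```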
This project is licensed under the MIT License - see the LICENSE file for details.