Scientific PDF Analysis Tool

A Python application that processes scientific papers in PDF format and generates detailed scientific analyses using AI models (OpenAI GPT or DeepSeek).

Features

Text extraction from PDF files
Scientific analysis using AI models (OpenAI GPT-4/3.5 or DeepSeek)
Multiple AI provider support
Batch PDF processing
Automatic result reporting
Error handling and retry mechanism
Configurable model selection

Installation

Clone the project:

git clone [repository-url]

Install required packages:

pip install -r requirements.txt

Create a .env file in the project root directory:
- Copy .env.example to .env
- Add your API keys (OpenAI and/or DeepSeek)
- Select your preferred API provider and model

Example .env file:

# API Configuration
OPENAI_API_KEY=your-openai-api-key-here
DEEPSEEK_API_KEY=your-deepseek-api-key-here

# Model Selection
API_PROVIDER=openai     # Options: openai, deepseek
MODEL_NAME=gpt-4       # OpenAI options: gpt-4, gpt-3.5-turbo
                      # DeepSeek options: deepseek-chat, deepseek-coder
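
The settings above could be read and validated along these lines. This is a minimal sketch, not the code in gpt_paper.py: the function name load_settings, the KEY_VARIABLES table, and the returned tuple are illustrative assumptions, and it expects the .env values to already be present in the process environment (however the project loads them).

```python
import os

# Provider names and key variable names come from the example .env file;
# this lookup table itself is an illustrative assumption.
KEY_VARIABLES = {
    "openai": "OPENAI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def load_settings(env=os.environ):
    """Return (provider, model, api_key) read from an environment mapping."""
    provider = env.get("API_PROVIDER", "").strip().lower()
    if provider not in KEY_VARIABLES:
        raise ValueError(f"Unknown or missing API_PROVIDER: {provider!r}")

    model = env.get("MODEL_NAME", "").strip()
    if not model:
        raise ValueError("MODEL_NAME is not set")

    # Only the key for the selected provider is required.
    key_name = KEY_VARIABLES[provider]
    api_key = env.get(key_name, "").strip()
    if not api_key:
        raise ValueError(f"{key_name} is not set")

    return provider, model, api_key
```

Failing early on a missing key or an unknown provider avoids starting a batch that would fail on the first API call.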

You can get your API keys from:

OpenAI API key: https://platform.openai.com/api-keys
DeepSeek API key: https://platform.deepseek.com/api_keys

Usage

Place your PDF files in the Papers directory:
- The program will process all PDF files in this directory
- Files are processed sequentially in alphabetical order
- Each PDF is analyzed independently
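
The discovery step described above can be sketched as follows. The function name list_pdfs is hypothetical, and treating "alphabetical" as case-insensitive is an assumption; the actual script may order files differently.

```python
from pathlib import Path


def list_pdfs(papers_dir="Papers"):
    """Return the PDF files in papers_dir, sorted alphabetically by file name."""
    folder = Path(papers_dir)
    # Keep regular files with a .pdf extension, ignoring case (.pdf, .PDF).
    pdfs = [p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"]
    # Case-insensitive ordering is an assumption, not confirmed by the project.
    return sorted(pdfs, key=lambda p: p.name.lower())
```

Each returned path can then be processed on its own, so a failure on one file does not need to affect the others.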

Run the program:

python gpt_paper.py

Check results in the Result directory:
- For each PDF, a corresponding .txt file is created
- Output files are named as: [original_pdf_name]_output.txt
- Results include detailed scientific analysis in a structured format
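
The naming rule above maps each input file to its report as in this short sketch. The helper name output_path_for is hypothetical; only the directory name and the _output.txt suffix are taken from this README.

```python
from pathlib import Path


def output_path_for(pdf_path, result_dir="Result"):
    """Map Papers/<name>.pdf to Result/<name>_output.txt."""
    # Path.stem drops the directory and the final extension.
    return Path(result_dir) / f"{Path(pdf_path).stem}_output.txt"
```

For example, a file named smith2020.pdf in Papers would be reported in Result as smith2020_output.txt.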

Important Notes

API Token Usage

Each PDF analysis consumes API tokens
Token consumption and cost depend on:
- Length of the PDF document
- Selected model (GPT-4 costs more per token than GPT-3.5-turbo)
- Selected API provider (pricing varies between OpenAI and DeepSeek)
- Number of API calls made
Monitor your API usage at:
- OpenAI: https://platform.openai.com/usage
- DeepSeek: https://platform.deepseek.com/usage
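
To get a feel for cost before a large batch, a rough estimate can be made from the extracted text length. This is only a sketch under stated assumptions: about 4 characters per token is a common rule of thumb for English text and not an exact tokenizer count, and the price per 1,000 tokens is a value you supply from your provider's current pricing page. Both function names are hypothetical.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(tokens, price_per_1k_tokens):
    """Estimated cost in the currency of price_per_1k_tokens (user supplied)."""
    return tokens / 1000 * price_per_1k_tokens
```

The estimate covers input text only; the generated analysis adds output tokens on top.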

Processing Time

Processing time varies based on:
- PDF size and complexity
- Number of files being processed
- API response time
- Rate limiting and retry mechanisms
- Selected API provider and model
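
Retries are one reason processing time varies. A retry loop with exponential backoff, of the kind commonly used around rate-limited API calls, might look like the sketch below. It is not taken from gpt_paper.py: the name call_with_retry, the default of 3 attempts, and the 1-second base delay are all assumptions.

```python
import time


def call_with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as 1s, 2s, 4s, ... with the default base_delay.
            sleep(base_delay * 2 ** attempt)
```

With these defaults a call that fails twice and then succeeds adds about 3 seconds of waiting, which accumulates across a batch.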

Best Practices

Start with a small number of PDFs to test the system
Monitor the console output for processing status
Use English-language PDFs for best results
Ensure PDFs are text-searchable (not scanned images)
Check that your API account has sufficient credits before processing large batches
Compare results between different models and providers

Output Format

The analysis report generated for each PDF includes:

Article Citation
- Title
- Authors
- Journal
- Volume and Issue
- Publication Date
- DOI
- Publisher
Research Purpose and Hypothesis
- Research Topic
- Hypothesis/Problem Statement
Participants and Study Area
- Participant Information
- Study Area
Methodology
- Data Collection Method
- Tools/Instruments Used
- Data Analysis Method
Results
- Key Findings
- Statistical Results
Authors' Recommendations and Discussion
- Research Success Status
- Authors' Recommendations and Future Research
Scientific Contribution and Strengths/Weaknesses
- Strengths
- Weaknesses
Summary and Scientific Evaluation
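
Because the report is generated by a language model, a quick completeness check on each output file can be useful. The sketch below looks for the top-level section titles listed above; it assumes the titles appear verbatim (ignoring case) in the report text, which the model is not guaranteed to do, and the function name missing_sections is hypothetical.

```python
# Top-level section titles as listed in this README.
EXPECTED_SECTIONS = [
    "Article Citation",
    "Research Purpose and Hypothesis",
    "Participants and Study Area",
    "Methodology",
    "Results",
    "Authors' Recommendations and Discussion",
    "Scientific Contribution and Strengths/Weaknesses",
    "Summary and Scientific Evaluation",
]


def missing_sections(report_text):
    """Return the expected section titles that do not occur in report_text."""
    lowered = report_text.lower()
    # Plain substring match: simple, but it can be fooled by a title
    # that also occurs inside another heading or sentence.
    return [title for title in EXPECTED_SECTIONS if title.lower() not in lowered]
```

An empty list means every expected title was found; a non-empty list is a hint to re-run that PDF or inspect the report by hand.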

Author

Burak Can KARA
Email: burakcankara@gmail.com

License

This project is licensed under the MIT License - see the LICENSE file for details.
