Skip to content

A Python application that processes scientific papers in PDF format and generates detailed scientific analyses using AI models (OpenAI GPT or DeepSeek).

License

Notifications You must be signed in to change notification settings

bcankara/gptPaper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Scientific PDF Analysis Tool

A Python application that processes scientific papers in PDF format and generates detailed scientific analyses using AI models (OpenAI GPT or DeepSeek).

Features

  • Text extraction from PDF files
  • Scientific analysis using AI models (OpenAI GPT-4/3.5 or DeepSeek)
  • Multiple AI provider support
  • Batch PDF processing
  • Automatic result reporting
  • Error handling and retry mechanism
  • Configurable model selection

Installation

  1. Clone the project:
git clone [repository-url]
  1. Install required packages:
pip install -r requirements.txt
  1. Create a .env file in the project root directory:
    • Copy .env.example to .env
    • Add your API keys (OpenAI and/or DeepSeek)
    • Select your preferred API provider and model

Example .env file:

# API Configuration
OPENAI_API_KEY=your-openai-api-key-here
DEEPSEEK_API_KEY=your-deepseek-api-key-here

# Model Selection
API_PROVIDER=openai     # Options: openai, deepseek
MODEL_NAME=gpt-4       # OpenAI options: gpt-4, gpt-3.5-turbo
                      # DeepSeek options: deepseek-chat, deepseek-coder

You can get your API keys from:

Usage

  1. Place your PDF files in the Papers directory:

    • The program will process all PDF files in this directory
    • Files are processed sequentially in alphabetical order
    • Each PDF is analyzed independently
  2. Run the program:

python gpt_paper.py
  1. Check results in the Result directory:
    • For each PDF, a corresponding .txt file is created
    • Output files are named as: [original_pdf_name]_output.txt
    • Results include detailed scientific analysis in a structured format

Important Notes

API Token Usage

  • Each PDF analysis consumes API tokens
  • Token consumption depends on:
    • Length of the PDF document
    • Selected model (GPT-4 costs more than GPT-3.5-turbo)
    • Selected API provider (pricing varies between OpenAI and DeepSeek)
    • Number of API calls made
  • Monitor your API usage at:

Processing Time

  • Processing time varies based on:
    • PDF size and complexity
    • Number of files being processed
    • API response time
    • Rate limiting and retry mechanisms
    • Selected API provider and model

Best Practices

  • Start with a small number of PDFs to test the system
  • Monitor the console output for processing status
  • Keep PDFs in English for best results
  • Ensure PDFs are text-searchable (not scanned images)
  • Check your API keys have sufficient credits before processing large batches
  • Compare results between different models and providers

Output Format

The analysis report generated for each PDF includes:

  1. Article Citation

    • Title
    • Authors
    • Journal
    • Volume and Issue
    • Publication Date
    • DOI
    • Publisher
  2. Research Purpose and Hypothesis

    • Research Topic
    • Hypothesis/Problem Statement
  3. Participants and Study Area

    • Participant Information
    • Study Area
  4. Methodology

    • Data Collection Method
    • Tools/Instruments Used
    • Data Analysis Method
  5. Results

    • Key Findings
    • Statistical Results
  6. Authors' Recommendations and Discussion

    • Research Success Status
    • Authors' Recommendations and Future Research
  7. Scientific Contribution and Strengths/Weaknesses

    • Strengths
    • Weaknesses
  8. Summary and Scientific Evaluation

Author

Burak Can KARA Email: burakcankara@gmail.com

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

A Python application that processes scientific papers in PDF format and generates detailed scientific analyses using AI models (OpenAI GPT or DeepSeek).

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages