A local AI-powered tool that converts PDF documents into engaging audio formats, such as podcasts, using local LLMs and TTS models.
- PDF text extraction and processing
- Customizable podcast generation with different styles and lengths
- Support for various LLM providers (OpenAI, Groq, LMStudio, Ollama, Azure)
- Text-to-Speech conversion with voice selection
- Fully configurable pipeline
- Preference-based content focus
- Programmatic API for integration in other projects
- FastAPI server for web-based access
- Example podcast included for demonstration
- Python 3.12+
- Local LLM server (optional, for local inference)
- Local TTS server (optional, for local audio generation)
- At least 8GB RAM (16GB+ recommended for local models)
- 10GB+ free disk space
pip install local-notebooklm
- Clone the repository:
git clone https://github.com/Goekdeniz-Guelmez/Local-NotebookLM.git
cd Local-NotebookLM
- Create and activate a virtual environment (conda works too):
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate
- Install the required packages:
pip install -r requirements.txt
- Follow one of the installation methods (Docker, docker-compose, or uv) at https://github.com/remsky/Kokoro-FastAPI
- Verify in your browser that http://localhost:8880/v1 returns the JSON {"detail":"Not Found"}
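The same check can be scripted. The helper below is our own illustration (not part of the project); it treats the 404 body from the bare /v1 route as proof that the server is alive:

```python
import json
import urllib.error
import urllib.request

def check_tts_server(base_url: str = "http://localhost:8880/v1") -> bool:
    """Return True if a Kokoro-FastAPI server answers at base_url.

    The bare /v1 route has no handler, so a healthy server replies
    404 with the JSON body {"detail": "Not Found"}.
    """
    try:
        urllib.request.urlopen(base_url, timeout=5)
        return True  # unexpected 200, but something is listening
    except urllib.error.HTTPError as err:
        # A live server answers 404 here; inspect the JSON body
        body = json.loads(err.read().decode())
        return body.get("detail") == "Not Found"
    except urllib.error.URLError:
        # Connection refused or unreachable: the server is not running
        return False
```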
The repository includes an example podcast in examples/podcast.wav to demonstrate the quality and format of the output. The models used were GPT-4o and GPT-4o-mini with tts-hd on Azure. You can listen to this example to get a sense of what Local-NotebookLM can produce before running it on your own PDFs.
You can use the default configuration or create a custom JSON config file with the following structure:
{
  "Co-Host-Speaker-1-Voice": "af_sky+af_bella",
  "Co-Host-Speaker-2-Voice": "af_echo",
  "Co-Host-Speaker-3-Voice": "af_nova",
  "Co-Host-Speaker-4-Voice": "af_shimmer",
  "Host-Speaker-Voice": "af_alloy",
  "Small-Text-Model": {
    "provider": {
      "name": "groq",
      "key": "your-api-key"
    },
    "model": "llama-3.2-90b-vision-preview"
  },
  "Big-Text-Model": {
    "provider": {
      "name": "groq",
      "key": "your-api-key"
    },
    "model": "llama-3.2-90b-vision-preview"
  },
  "Text-To-Speech-Model": {
    "provider": {
      "name": "custom",
      "endpoint": "http://localhost:8880/v1",
      "key": "not-needed"
    },
    "model": "kokoro",
    "audio_format": "wav"
  },
  "Step1": {
    "system": "",
    "max_tokens": 1028,
    "temperature": 0.7,
    "chunk_size": 1000,
    "max_chars": 100000
  },
  "Step2": {
    "system": "",
    "max_tokens": 8126,
    "temperature": 1,
    "chunk_token_limit": 2000,
    "overlap_percent": 10
  },
  "Step3": {
    "system": "",
    "max_tokens": 8126,
    "temperature": 1,
    "chunk_token_limit": 2000,
    "overlap_percent": 20
  }
}
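The Step2 and Step3 keys chunk_token_limit and overlap_percent control how text is split into overlapping chunks before being sent to the model. A minimal sketch of what the overlap means (our own illustration; the project's actual chunking logic may differ):

```python
def chunk_with_overlap(tokens: list, chunk_token_limit: int = 2000,
                       overlap_percent: int = 10) -> list[list]:
    """Split a token sequence into chunks of at most chunk_token_limit,
    where consecutive chunks share overlap_percent of their length."""
    overlap = chunk_token_limit * overlap_percent // 100
    step = chunk_token_limit - overlap  # how far each window advances
    return [tokens[i:i + chunk_token_limit]
            for i in range(0, len(tokens), step)]

# With chunk_token_limit=20 and overlap_percent=10,
# consecutive chunks share their last/first 2 tokens.
chunks = chunk_with_overlap(list(range(100)), chunk_token_limit=20, overlap_percent=10)
```

The overlap keeps context from one chunk flowing into the next, which helps the model produce a transcript without abrupt seams.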
The following provider options are supported:
- OpenAI: Use OpenAI's API
  "provider": { "name": "openai", "key": "your-openai-api-key" }
- Groq: Use Groq's API for faster inference
  "provider": { "name": "groq", "key": "your-groq-api-key" }
- Azure OpenAI: Use Azure's OpenAI service
  "provider": { "name": "azure", "key": "your-azure-api-key", "endpoint": "your-azure-endpoint", "version": "api-version" }
- LMStudio: Use a local LMStudio server
  "provider": { "name": "lmstudio", "endpoint": "http://localhost:1234/v1", "key": "not-needed" }
- Ollama: Use a local Ollama server
  "provider": { "name": "ollama", "endpoint": "http://localhost:11434", "key": "not-needed" }
- Google Generative AI: Use Google's API
  "provider": { "name": "google", "key": "your-google-genai-api-key" }
- Anthropic: Use Anthropic's API
  "provider": { "name": "anthropic", "key": "your-anthropic-api-key" }
- ElevenLabs: Use ElevenLabs's API
  "provider": { "name": "elevenlabs", "key": "your-elevenlabs-api-key" }
- Custom: Use any OpenAI-compatible API
  "provider": { "name": "custom", "endpoint": "your-custom-endpoint", "key": "your-api-key-or-not-needed" }
Run the script with the following command:
python -m local_notebooklm.start --pdf PATH_TO_PDF [options]
Option | Description | Default |
---|---|---|
--pdf | Path to the PDF file (required) | - |
--config | Path to custom config file | Uses base_config |
--format | Output format type (summary, podcast, article, interview, panel-discussion, debate, narration, storytelling, explainer, lecture, tutorial, q-and-a, news-report, executive-brief, meeting, analysis) | podcast |
--length | Content length (short, medium, long, very-long) | medium |
--style | Content style (normal, casual, formal, technical, academic, friendly, gen-z, funny) | normal |
--preference | Additional focus preferences or instructions | None |
--language | Language the audio should be in | english |
--output-dir | Directory to store output files | ./output |
Local-NotebookLM now supports both single-speaker and two-speaker formats:
Single-Speaker Formats:
- summary
- narration
- storytelling
- explainer
- lecture
- tutorial
- news-report
- executive-brief
- analysis
Two-Speaker Formats:
- podcast
- interview
- panel-discussion
- debate
- q-and-a
- meeting
Multi-Speaker Formats:
- panel-discussion (3, 4, or 5 speakers)
- debate (3, 4, or 5 speakers)
Basic usage:
python -m local_notebooklm.start --pdf documents/research_paper.pdf
Customized podcast:
python -m local_notebooklm.start --pdf documents/research_paper.pdf --format podcast --length long --style casual
With custom preferences:
python -m local_notebooklm.start --pdf documents/research_paper.pdf --preference "Focus on practical applications and real-world examples"
Using custom config:
python -m local_notebooklm.start --pdf documents/research_paper.pdf --config custom_config.json --output-dir ./my_podcast --language german
You can also use Local-NotebookLM programmatically in your Python code:
from local_notebooklm.processor import podcast_processor
success, result = podcast_processor(
    pdf_path="documents/research_paper.pdf",
    config_path="config.json",
    format_type="interview",
    length="long",
    style="professional",
    preference="Focus on the key technical aspects",
    output_dir="./test_output",
    language="english"
)

if success:
    print(f"Successfully generated podcast: {result}")
else:
    print(f"Failed to generate podcast: {result}")
Local-NotebookLM now includes a user-friendly Gradio web interface that makes it easy to use the tool without command line knowledge:
python -m local_notebooklm.web_ui
By default, the web UI runs locally on http://localhost:7860. You can access it from your browser.
The main interface of the Local-NotebookLM web UI
Option | Description | Default |
---|---|---|
--share | Make the UI accessible over the network | False |
--port | Specify a custom port | 7860 |
Basic local usage:
python -m local_notebooklm.web_ui
Share with others on your network:
python -m local_notebooklm.web_ui --share
Use a custom port:
python -m local_notebooklm.web_ui --port 8080
The web interface provides all the same options as the command line tool in an intuitive UI, making it easier for non-technical users to generate audio content from PDFs.
Start the FastAPI server to access the functionality via a web API:
python -m local_notebooklm.server
By default, the server runs on http://localhost:8000. You can access the API documentation at http://localhost:8000/docs.
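Rather than guessing route names, you can discover them from the schema that FastAPI auto-generates at /openapi.json. The small helper below is our own (the port assumes the default shown above):

```python
import json
import urllib.request

def list_routes(schema: dict) -> list[str]:
    """Render 'PATH -> METHODS' lines from an OpenAPI schema dict."""
    return [
        f"{path} -> {', '.join(m.upper() for m in methods)}"
        for path, methods in schema.get("paths", {}).items()
    ]

if __name__ == "__main__":
    # FastAPI serves the auto-generated schema at /openapi.json
    with urllib.request.urlopen("http://localhost:8000/openapi.json") as resp:
        print("\n".join(list_routes(json.load(resp))))
```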
- Extracts text from PDF documents
- Cleans and formats the content
- Removes irrelevant elements like page numbers and headers
- Handles LaTeX math expressions and special characters
- Splits content into manageable chunks for processing
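The word-bounded chunking in this step can be sketched as follows. This is a simplified stand-in for the project's create_word_bounded_chunks (named in the architecture diagram below); chunk_size is in characters, matching the Step1 config:

```python
def create_word_bounded_chunks(text: str, chunk_size: int = 1000) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    breaking only at word boundaries. A single word longer than
    chunk_size ends up in its own oversized chunk."""
    chunks, current, length = [], [], 0
    for word in text.split():
        # +1 accounts for the joining space
        if length + len(word) + 1 > chunk_size and current:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Breaking at word boundaries keeps each chunk readable on its own, so the small text model can clean it without seeing half a word at either edge.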
- Generates an initial podcast script based on the extracted content
- Applies the specified style (casual, formal, technical, academic)
- Formats content according to the desired length (short, medium, long, very-long)
- Structures content for a conversational format
- Incorporates user-specified format type (summary, podcast, article, interview)
- Rewrites content specifically for better text-to-speech performance
- Creates a two-speaker conversation format
- Adds speech markers and natural conversation elements
- Optimizes for natural flow and engagement
- Incorporates user preferences for content focus
- Formats output as a list of speaker-text tuples
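For illustration, the pickled step-3 output has roughly this shape (the speaker labels and lines here are made up; the real file is step3/podcast_ready_data.pkl):

```python
import pickle

# Hypothetical example of the speaker-text tuple list produced by step 3
podcast_ready_data = [
    ("Speaker 1", "Welcome to the show! Today we're looking at a new paper."),
    ("Speaker 2", "Thanks for having me. Let's start with the core idea."),
]

with open("podcast_ready_data.pkl", "wb") as f:
    pickle.dump(podcast_ready_data, f)
```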
- Converts the optimized text to speech using the specified TTS model
- Applies different voices for each speaker
- Generates individual audio segments for each dialogue part
- Concatenates segments into a final audio file
- Maintains consistent audio quality and sample rate
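Concatenating same-format WAV segments can be sketched with the stdlib wave module (the project itself lists soundfile as a dependency; this is only an illustration of the idea):

```python
import wave

def concatenate_wavs(segment_paths: list[str], output_path: str = "podcast.wav") -> None:
    """Append WAV segments that share sample rate, width, and channel count."""
    with wave.open(output_path, "wb") as out:
        for i, path in enumerate(segment_paths):
            with wave.open(path, "rb") as seg:
                if i == 0:
                    # copy format parameters from the first segment
                    out.setparams(seg.getparams())
                out.writeframes(seg.readframes(seg.getnframes()))
```

Because all segments come from the same TTS model, they share a sample rate, which is what makes this naive byte-level concatenation safe.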
flowchart TD
subgraph "Main Controller"
processor["podcast_processor()"]
end
subgraph "AI Services"
smallAI["Small Text Model Client"]
bigAI["Big Text Model Client"]
ttsAI["Text-to-Speech Model Client"]
end
subgraph "Step 1: PDF Processing"
s1["step1()"]
validate["validate_pdf()"]
extract["extract_text_from_pdf()"]
chunk1["create_word_bounded_chunks()"]
process["process_chunk()"]
end
subgraph "Step 2: Transcript Generation"
s2["step2()"]
read2["read_input_file()"]
gen2["generate_transcript()"]
chunk2["Chunking with Overlap"]
end
subgraph "Step 3: TTS Optimization"
s3["step3()"]
read3["read_pickle_file()"]
gen3["generate_rewritten_transcript()"]
genOverlap["generate_rewritten_transcript_with_overlap()"]
validate3["validate_transcript_format()"]
end
subgraph "Step 4: Audio Generation"
s4["step4()"]
load4["load_podcast_data()"]
genAudio["generate_speaker_audio()"]
concat["concatenate_audio_files()"]
end
%% Flow connections
processor --> s1
processor --> s2
processor --> s3
processor --> s4
processor -.-> smallAI
processor -.-> bigAI
processor -.-> ttsAI
%% Step 1 flow
s1 --> validate
validate --> extract
extract --> chunk1
chunk1 --> process
process -.-> smallAI
%% Step 2 flow
s2 --> read2
read2 --> gen2
gen2 --> chunk2
gen2 -.-> bigAI
%% Step 3 flow
s3 --> read3
read3 --> gen3
read3 --> genOverlap
gen3 --> validate3
genOverlap --> validate3
gen3 -.-> bigAI
genOverlap -.-> bigAI
%% Step 4 flow
s4 --> load4
load4 --> genAudio
genAudio --> concat
genAudio -.-> ttsAI
%% Data flow
pdf[("PDF File")] --> s1
s1 --> |"cleaned_text.txt"| file1[("Cleaned Text")]
file1 --> s2
s2 --> |"data.pkl"| file2[("Transcript")]
file2 --> s3
s3 --> |"podcast_ready_data.pkl"| file3[("Optimized Transcript")]
file3 --> s4
s4 --> |"podcast.wav"| fileAudio[("Final Audio")]
%% Styling
classDef controller fill:#f9d5e5,stroke:#333,stroke-width:2px
classDef ai fill:#eeeeee,stroke:#333,stroke-width:1px
classDef step fill:#d0e8f2,stroke:#333,stroke-width:1px
classDef data fill:#fcf6bd,stroke:#333,stroke-width:1px,stroke-dasharray: 5 5
class processor controller
class smallAI,bigAI,ttsAI ai
class s1,s2,s3,s4,validate,extract,chunk1,process,read2,gen2,chunk2,read3,gen3,genOverlap,validate3,load4,genAudio,concat step
class pdf,file1,file2,file3,fileAudio data
Local-NotebookLM now supports multiple languages. You can specify the language when using the programmatic API or through the command line.
Important Note: When using a non-English language, ensure that both your selected LLM and TTS models support the desired language. Language support varies significantly between different models and providers. For optimal results, verify that your chosen models have strong capabilities in your target language before processing.
The pipeline generates the following files:
- step1/extracted_text.txt: Raw text extracted from the PDF
- step1/clean_extracted_text.txt: Cleaned and processed text
- step2/data.pkl: Initial transcript data
- step3/podcast_ready_data.pkl: TTS-optimized conversation data
- step4/segments/podcast_segment_*.wav: Individual audio segments
- step4/podcast.wav: Final concatenated podcast audio file
- PDF Extraction Fails
  - Try a different PDF file
  - Check if the PDF is password-protected
  - Ensure the PDF contains extractable text (not just images)
- API Connection Errors
  - Verify your API keys are correct
  - Check your internet connection
  - Ensure the API endpoints are accessible
- Out of Memory Errors
  - Reduce the chunk size in the configuration
  - Use a smaller model
  - Close other memory-intensive applications
- Audio Quality Issues
  - Try different TTS voices
  - Adjust the sample rate in the configuration
  - Check if the TTS server is running correctly
If you encounter issues not covered here, please:
- Check the logs for detailed error messages
- Open an issue on the GitHub repository with details about your problem
- Include the error message and steps to reproduce the issue
- Python 3.12+
- PyPDF2
- tqdm
- numpy
- soundfile
- requests
- pathlib
- fastapi
- uvicorn
Full requirements are listed in requirements.txt.
- This project uses various open-source libraries and models
- Special thanks to the developers of LLaMA, OpenAI, and other AI models that make this possible
For more information, visit the GitHub repository.
Best,
Gökdeniz Gülmez
The Local-NotebookLM software suite was developed by Gökdeniz Gülmez. If you find Local-NotebookLM useful in your research and wish to cite it, please use the following BibTeX entry:
@software{Local-NotebookLM,
  author = {Gökdeniz Gülmez},
  title = {{Local-NotebookLM}: A Local-NotebookLM to convert PDFs into Audio.},
  url = {https://github.com/Goekdeniz-Guelmez/Local-NotebookLM},
  version = {0.1.5},
  year = {2025},
}