AttributeError: 'list' object has no attribute 'split' when running deep research #1279

Open
TineKolenik opened this issue Mar 19, 2025 · 2 comments


TineKolenik commented Mar 19, 2025

When I try to use GPTR's deep research (pip package, version 0.12.12; Python 3.13), I get the following error:

File "..\gpt_researcher\skills\deep_research.py", line 17, in count_words
    return len(text.split())
AttributeError: 'list' object has no attribute 'split'

This happens after already collecting, verifying and ranking top sources.
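
The last frame suggests that count_words received a list where it expected a string. A minimal sketch of a defensive variant follows, assuming the value can arrive as either a string or a list of strings. The name count_words_safe is hypothetical and not part of GPT Researcher; this is not the library's implementation or an official fix.

```python
def count_words_safe(text):
    """Count whitespace-separated words in a string or a list of strings.

    Hypothetical helper for illustration only; not GPT Researcher code.
    """
    if isinstance(text, str):
        return len(text.split())
    if isinstance(text, (list, tuple)):
        # Sum the word counts of each item, converting non-string items to str.
        return sum(len(str(item).split()) for item in text)
    raise TypeError(f"expected str or list of str, got {type(text).__name__}")
```

With this shape, a plain string and a list of strings are both counted instead of raising AttributeError.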

This is my config.json:

{
    "RETRIEVER": "duckduckgo, arxiv",
    "EMBEDDING": "openai:text-embedding-3-small",
    "FAST_LLM": "openai:gpt-4o-mini",
    "SMART_LLM": "openai:gpt-4o",
    "STRATEGIC_LLM": "openai:gpt-4o",
    "LANGUAGE": "english",
    "CURATE_SOURCES": true,
    "FAST_TOKEN_LIMIT": 2000,
    "SMART_TOKEN_LIMIT": 4000,
    "STRATEGIC_TOKEN_LIMIT": 4000,
    "BROWSE_CHUNK_MAX_LENGTH": 8192,
    "SUMMARY_TOKEN_LIMIT": 700,
    "TEMPERATURE": 0.55,
    "REPORT_FORMAT": "IEEE",
    "MAX_ITERATIONS": 3,
    "AGENT_ROLE": null,
    "MAX_SUBTOPICS": 5,
    "SCRAPER": "bs",
    "MAX_SCRAPER_WORKERS": 15,
    "DOC_PATH": null,
    "USER_AGENT": null,
    "MEMORY_BACKEND": "local",
    "deep_research_breadth": 4,
    "deep_research_depth": 2,
    "deep_research_concurrency": 4,
    "total_words": 2500
}

My code:

from gpt_researcher import GPTResearcher
from gpt_researcher.utils.enum import ReportType, Tone
import asyncio
import os

os.environ["OPENAI_API_KEY"] = MY_KEY

query = MY_QUERY

async def main():
    # Initialize researcher with deep research type
    researcher = GPTResearcher(
        query=query,
        report_type="deep",
        config_path="config_research_deep.json"
    )

    # Run research
    research_data = await researcher.conduct_research()

    # Generate report
    report = await researcher.write_report()
    print(report)


if __name__ == "__main__":
    asyncio.run(main())

I also tried a YAML config file, but that didn't work either, even though https://docs.gptr.dev/docs/gpt-researcher/gptr/deep_research mentions a YAML file.

BTW, normal research works fine.
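
Since only the last frame of the error is shown above, capturing the full traceback may help locate where the list is passed in. A minimal sketch, assuming the script's main() coroutine from above; run_and_capture is a hypothetical helper name, not a GPT Researcher API.

```python
import asyncio
import traceback


async def run_and_capture(coro_fn):
    """Await coro_fn() and return (result, None), or (None, traceback text) on failure."""
    try:
        return await coro_fn(), None
    except Exception:
        # format_exc() returns the complete traceback of the active exception.
        return None, traceback.format_exc()
```

It could be called as `result, tb = asyncio.run(run_and_capture(main))`, with `tb` printed or pasted into the issue when it is not None.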

Collaborator

ElishaKay commented Mar 20, 2025

Welcome @TineKolenik

Can you try reducing your config.json to a minimal set of values and running again?

Alternatively, remove the config_path parameter from the GPTResearcher call. Instead, create a minimal .env file in your root directory with:

OPENAI_API_KEY=
TAVILY_API_KEY=
DOC_PATH=./my-docs

Then try running your script again and let us know if it fails. I'd like to get to the root cause.
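
Before rerunning, it may help to confirm that the keys from the .env file are actually visible to the process. A minimal sketch, assuming the .env values have already been loaded into the environment; missing_env_vars is a hypothetical helper, not part of GPT Researcher.

```python
import os


def missing_env_vars(required=("OPENAI_API_KEY", "TAVILY_API_KEY"), environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

An empty list from `missing_env_vars()` means both keys are set; any names returned are unset or blank.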

Author

TineKolenik commented Mar 24, 2025

@ElishaKay thanks for the welcome and the quick help. Here are my findings:

Suggestion 1 - config.json value minimization:

I reduced the config.json to:

{
    "EMBEDDING": "openai:text-embedding-3-small",
    "FAST_LLM": "openai:gpt-4o-mini",
    "SMART_LLM": "openai:gpt-4o",
    "STRATEGIC_LLM": "openai:gpt-4o",
    "LANGUAGE": "english",
    "CURATE_SOURCES": true,
    "deep_research_breadth": 4,
    "deep_research_depth": 2,
    "deep_research_concurrency": 4,
    "total_words": 2500
}

And I added a Tavily API key to use for web search. This reproduced the same error. As before, it does a bunch of research, calculates costs, ranks sources, etc., but in the end comes up with the same error.
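
One way to narrow this down further is to test configs that each omit a single key. A minimal sketch of generating those variants; drop_one_key_variants is a hypothetical helper, and whether any single key is responsible is an assumption, not something confirmed in this thread.

```python
def drop_one_key_variants(config):
    """Yield (removed_key, reduced_config) pairs, each missing exactly one key."""
    for key in config:
        reduced = {k: v for k, v in config.items() if k != key}
        yield key, reduced
```

Each reduced config can be written out with json.dump and passed as config_path. If one variant runs without the error, the key it omits is the likely culprit.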

Suggestion 2 - config_path removal and .env utilization:

This ran successfully, but the resulting report was short (under 800 words, even though the default length is 2500). The culprit seems to be either config_path itself or one of the values still present above. I feel that a config is necessary for good results. Do you have any idea how to produce longer reports without a config, or how to get deep research to work with config_path? That would also help with other settings, such as the citation style or switching to another web search option.

Thanks!
