Questions AI Project

Project Description

The "Questions" project is part of Harvard's CS50 AI course. It builds an AI system that answers questions in two stages: document retrieval, which picks the most relevant files from a text corpus, and passage retrieval, which picks the most relevant passages from those files. The system uses term frequency-inverse document frequency (tf-idf) to identify relevant documents and passages in response to user queries. The project is a hands-on exercise in natural language processing (NLP) and information retrieval.

Project Goal

The primary goal of this project is to implement a question-answering system that identifies and returns the most relevant passages from a set of documents, using tf-idf scoring to rank candidates.

Implementation

The project is implemented in Python using the NLTK library. The main steps include:

Loading Files: Load all text files from a specified directory.
Tokenization: Convert documents into a list of words, filtering out punctuation and stopwords.
Computing IDFs: Calculate Inverse Document Frequency (IDF) values for each word in the corpus.
Query Processing: Tokenize and process user queries.
Document Scoring: Score documents based on tf-idf and identify top matches.
Sentence Extraction and Scoring: Extract sentences from top documents and score them based on query relevance.
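
The document-retrieval steps above can be sketched in a few lines. This is a minimal illustration, not the notebook's actual code: the function names (tokenize, compute_idfs, top_files), the tiny stopword set, and the three-file toy corpus are all assumptions made for the example. It also assumes the common definition of IDF as the natural logarithm of the number of documents divided by the number of documents containing the word, and it uses plain whitespace splitting instead of NLTK so that it runs without any downloads.

```python
import math
import string
from collections import Counter

# Illustrative stopword set (an assumption); the project uses NLTK's stopword list.
STOPWORDS = {"a", "an", "the", "is", "of", "in", "and", "to", "what", "are", "how", "do"}


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation, drop stopwords."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w and w not in STOPWORDS]


def compute_idfs(documents):
    """Map each word to ln(N / number of documents that contain it)."""
    n = len(documents)
    doc_freq = Counter()
    for words in documents.values():
        doc_freq.update(set(words))  # count each word once per document
    return {word: math.log(n / df) for word, df in doc_freq.items()}


def top_files(query, documents, idfs, n=1):
    """Return the n filenames with the highest summed tf-idf over the query words."""
    scores = {}
    for name, words in documents.items():
        counts = Counter(words)
        scores[name] = sum(counts[w] * idfs.get(w, 0.0) for w in set(query))
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy corpus (made up for this sketch): filename -> list of tokens.
corpus = {
    "ml.txt": tokenize("Supervised learning includes classification and regression."),
    "nn.txt": tokenize("Neurons in a neural network connect through weighted edges."),
    "py.txt": tokenize("Python is a programming language."),
}
idfs = compute_idfs(corpus)
query = tokenize("What are the types of supervised learning?")
best = top_files(query, corpus, idfs, n=1)
print(best)
```

Words that appear in every document get an IDF of zero under this definition, so they contribute nothing to a document's score even if they survive stopword filtering.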

How to Use

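Setup Environment: Make sure Python and NLTK are installed, then download the required NLTK data (the punkt tokenizer models and the stopwords list). Recent NLTK releases may also need the punkt_tab resource for tokenization.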
```
pip install nltk
```
```
import nltk
nltk.download('punkt')
nltk.download('stopwords')
```
Prepare Corpus: Place your text files in a directory (e.g., corpus).

Run the Project: Call the main function with the corpus directory and queries.

```
path_to_corpus_directory = 'corpus'
queries = [
    "What are the types of supervised learning?",
    "How do neurons connect in a neural network?",
    "When was Python 3.0 released?"
]
main(path_to_corpus_directory, queries)
```

Example Queries

"What are the types of supervised learning?"
"How do neurons connect in a neural network?"
"When was Python 3.0 released?"

Output

For each query, the system prints the most relevant answer along with the source document.
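
As a rough sketch of how an answer and its source might be selected, the example below ranks candidate sentences and keeps track of the file each one came from. Everything here is hypothetical: the rank_sentences helper, the toy sentences, and the hand-picked IDF values are illustrative only, and the ranking rule (sum of IDF values of matched query words, with ties broken by the fraction of sentence words that appear in the query) is an assumption rather than a description of the notebook's code.

```python
def rank_sentences(query, sentences, idfs):
    """Order (sentence, source) pairs by matched-IDF sum, then by query term density."""
    query_words = set(query)

    def score(item):
        _, words = item
        matched = sum(idfs.get(w, 0.0) for w in query_words if w in words)
        density = sum(1 for w in words if w in query_words) / len(words) if words else 0.0
        return (matched, density)

    ordered = sorted(sentences.items(), key=score, reverse=True)
    return [key for key, _ in ordered]


# Toy data: (sentence text, source file) -> tokens of that sentence.
sentences = {
    ("Python 3.0 was released in 2008.", "python.txt"): ["python", "3.0", "released", "2008"],
    ("Python is a programming language.", "python.txt"): ["python", "programming", "language"],
    ("Neurons connect through weighted edges.", "nn.txt"): ["neurons", "connect", "weighted", "edges"],
}
# Hand-picked IDF values for illustration only.
idfs = {"python": 0.4, "3.0": 1.1, "released": 1.1, "neurons": 1.1}

query = ["python", "3.0", "released"]
ranked = rank_sentences(query, sentences, idfs)
answer, source = ranked[0]
print(f"{answer} (source: {source})")
```

Keying each sentence by its text and source file is one simple way to make sure the printed answer can always be traced back to the document it came from.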

More Information

For more details on the project, please visit the CS50 AI Project page.
