TTDS Movie Search IR Project 2020

This is a group project for the course TTDS (Text Technologies for Data Science) at the University of Edinburgh. We received the "Best Project Award" for the term 2019-2020 and the webpage will be soon hosted in one of the University's servers.

Currently it can be accessed from this link: http://167.71.139.222

Summary

It is a webpage that finds quotes and quote details for popular movies and TV shows. Given a query, it returns the most relevant quotes, along with the corresponding movie details. The collection of documents contains 77,584,425 quotes from 218,000 movies and TV shows. It uses a multi-level inverted index occupying 21.7GB on disk. The ranking algorithm uses BM25 and logarithmic popularity weighting for phrase search. Instead of quotes, the website can also perform a second search for full movies using weighted TFIDF algorithm. It also contains advanced search features, such as filtering by year, actors and keywords, and it includes genre filtering and query completion.

The project consists of 4 basic parts:

data_collection
ir_eval: IR indexing and ranking
gui: gui build with React
api: api built with Flask

Read the README files under each project for more details.

Installation

You need to have Python 3 installed.

Create a local environment and install the requirements:

python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
pip install -e .

Run the front-end

cd gui
npm start

If the folder node_modules doesn't exist under the gui folder, run npm install first and then npm start

Run the back-end

cd api
./run.sh

Testing

Add your tests to the tests/ folder. The file names should start with test_. Each test method should also start with test_. See tests/test_phrase_search.py for an example. Execute the following command to run tests:

python -m unittest discover -p test*.py

Name		Name	Last commit message	Last commit date
Latest commit History 171 Commits
api		api
data_collection		data_collection
db		db
gui		gui
ir_eval		ir_eval
query_completion		query_completion
tests		tests
.gitignore		.gitignore
README.md		README.md
TTDS_CW3_Report_Final.pdf		TTDS_CW3_Report_Final.pdf
package-lock.json		package-lock.json
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TTDS Movie Search IR Project 2020

Summary

Installation

Run the front-end

Run the back-end

Testing

About

Releases

Packages

Contributors 6

Languages

marinapts/ttds_movie_search

Folders and files

Latest commit

History

Repository files navigation

TTDS Movie Search IR Project 2020

Summary

Installation

Run the front-end

Run the back-end

Testing

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 6

Languages

Packages