Multithread Map-Reduce Inverted Index

Overview

This project implements a parallel inverted index using the Map-Reduce paradigm inspired by Google's MapReduce framework, implemented with Pthreads. Given a set of input text files, the program extracts unique words, determines which files contain them, and outputs a structured index for efficient word lookups.

Performance

Running with 4 Mapper threads (M=4) and 4 Reducer threads (R=4):

Execution time: 0.39 seconds
Speedup: 3.03x

Features

Implements Map-Reduce for parallel word indexing
Uses Pthreads for parallelization
Outputs results in sorted files categorized alphabetically
Supports dynamic workload distribution for efficiency
Compatible with Docker for easy testing and deployment

How it works

Mapper Phase:
- Reads assigned files
- Extracts words and converts them to lowercase
- Outputs {word, file_ID} pairs
Reducer Phase:
- Aggregates {word, {file_IDs}} entries
- Groups words by their starting letter
- Sorts words by frequency across files
Output:
- Generates one file per alphabet letter (a.txt, b.txt, etc.)
- Words inside each file are sorted by frequency (descending)
- If multiple words have the same frequency, they are sorted alphabetically

Usage

Compile using Makefile:
```
make
```
Run the program:
```
./map_reduce <Map_num_threads> <Reduce_num_threads> <input_file>
```
- Map_num_threads: Number of threads for the Mapper phase
- Reduce_num_threads: Number of threads for the Reducer phase
- input_file: Path to the input file containing the list of text files to index

Example

To run the program with 2 Mapper threads, 3 Reducer threads, and the input file input.txt:
```
./map_reduce 2 3 input.txt
```
Where input.txt contains the following:
```
3
file1.txt
file2.txt
file3.txt
```

The program will output the inverted index in the following format:

a.txt
```
and:[2]
are:[1]
as:[1]
```

b.txt

blue:[1 2]
birds:[1]
bright:[1]
brightly:[3]
by:[2]

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
checker		checker
readme_imgs		readme_imgs
src		src
Dockerfile		Dockerfile
README.md		README.md
README_assignment.pdf		README_assignment.pdf
docker-compose.yml		docker-compose.yml
run_with_docker.sh		run_with_docker.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multithread Map-Reduce Inverted Index

Overview

Performance

Features

How it works

Usage

Example

References

About

Releases

Packages

Languages

robertpaulp/map-reduce-inverted-index

Folders and files

Latest commit

History

Repository files navigation

Multithread Map-Reduce Inverted Index

Overview

Performance

Features

How it works

Usage

Example

References

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages