GitHub - Wuj94/ADM_HW4: Repository for the 4th homework of the course ADM @ Sapienza University of Rome

ADM_HW4

Repository for the 4th homework of the course ADM @ Sapienza University of Rome, from group #23, composed of Francisca Alliende, Giuseppe Calabrese and Francesco Russo.

Below is a summary of the files in this repository. To open a document, click the link on the name of the corresponding file.

Homework_4

Jupyter Notebook with the code and comments for the entire homework.

First Part: Does basic house information reflect house's description?

Modules:

scraper.py: functions related to the scraping process.
preprocessing.py: functions related to the preprocessing process.
matrixbuilder.py: functions that build the information and the description matrices.
clustering.py: functions for k-means++, the Elbow Method and Jaccard similarity.
wordcloudgenerator.py: wordcloud generator function.
mykmeans.py: homemade k-means algorithm.
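
As a rough illustration of the Jaccard similarity step listed under clustering.py, the sketch below compares the clusters obtained from two different clusterings of the same announcements. It is not the code from this repository: the function names `jaccard_similarity` and `most_similar_pairs` and the toy cluster data are assumptions made for this example.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity |A & B| / |A | B| between two collections of ids."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty clusters
    return len(a & b) / len(union)


def most_similar_pairs(clusters_a, clusters_b, top=3):
    """Rank every (cluster in A, cluster in B) pair by Jaccard similarity."""
    scored = [
        (jaccard_similarity(members_a, members_b), label_a, label_b)
        for label_a, members_a in clusters_a.items()
        for label_b, members_b in clusters_b.items()
    ]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


# Toy data (hypothetical): announcement ids grouped by cluster label.
info_clusters = {0: [1, 2, 3, 4], 1: [5, 6, 7]}
desc_clusters = {0: [1, 2, 3], 1: [4, 5, 6, 7]}
print(most_similar_pairs(info_clusters, desc_clusters, top=2))
```

A high score for a pair means the two clusterings grouped largely the same announcements, which is one way to approach the question asked in this part.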

Databases:

datasetIndex.csv: database with all the announcements after the scraping process.
datasetIndex_preprocessed.csv: database that contains the data from "datasetIndex.csv", preprocessed.
datastIndex_infmatrix.csv: database with the information matrix. Input for clustering.
datastIndex_tfidf.csv: database with the description matrix. Input for clustering. Not available in the repository because of its size.
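
The file name datastIndex_tfidf.csv suggests that the description matrix holds tf-idf weights. The sketch below shows one plain way to build such a matrix from short texts. It is only an assumption about the general technique, not the code in matrixbuilder.py: the function name `tfidf_matrix`, the whitespace tokenization, the unsmoothed natural-log idf and the sample descriptions are all hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Return (vocabulary, rows); rows[i][j] is the tf-idf of term j in document i."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty texts
        rows.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, rows


# Hypothetical announcement descriptions, for illustration only.
descriptions = ["bright flat near metro", "bright flat with garden"]
vocabulary, rows = tfidf_matrix(descriptions)
print(vocabulary)
print(rows[0])
```

Terms that appear in every description (here "bright" and "flat") get weight zero, so only the distinguishing words contribute to the clustering input.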

Second Part: Find the duplicates!

All the code and comments for this part are contained in the main file, Homework_4.
