# GreatDatasets

| Link | Description | Local path |
|------|-------------|------------|
| https://gittables.github.io/ | GitTables: a large-scale corpus of relational tables. | - |
| https://github.com/facebookresearch/PyTorch-BigGraph | BigGraph: pre-trained embeddings of the full Wikidata graph. | /local/data/embedding_dataset/BigGraph |
| https://research.yandex.com/datasets/biganns | DEEP1B: image embeddings for similarity search. | /local/data/pqdata/deep1b |
| http://corpus-texmex.irisa.fr/ | BIGANN (SIFT1B): SIFT descriptors extracted from images. | /local/data/pqdata/sift |
| https://research.google.com/youtube8m/ | YouTube-8M: labeled video dataset with audio-visual features. | /local/data/embedding_dataset/YT8M |
| https://laion.ai/laion-5b-a-new-era-of-open-large-scale-multi-modal-datasets/ | LAION-5B: multi-modal image-text dataset. | /local/data/embedding_dataset/laion5b |