Link | Description | Location |
---|---|---|
https://gittables.github.io/ | GitTables: a large-scale corpus of relational tables. | - |
https://github.com/facebookresearch/PyTorch-BigGraph | BigGraph: pre-trained embeddings on the full Wikidata graph. | /local/data/embedding_dataset/BigGraph |
https://research.yandex.com/datasets/biganns | DEEP1B: image embeddings for similarity search. | /local/data/pqdata/deep1b |
http://corpus-texmex.irisa.fr/ | BIGANN (SIFT1B): SIFT descriptors extracted from images. | /local/data/pqdata/sift |
https://research.google.com/youtube8m/ | YouTube-8M: labeled video dataset with audio-visual features. | /local/data/embedding_dataset/YT8M |
https://laion.ai/laion-5b-a-new-era-of-open-large-scale-multi-modal-datasets/ | LAION-5B: image-text multi-modal dataset. | /local/data/embedding_dataset/laion5b |