diff --git a/dataset_preparation/amazon_products/readme.md b/dataset_preparation/amazon_products/readme.md
new file mode 100644
index 000000000..8e9a6e5a9
--- /dev/null
+++ b/dataset_preparation/amazon_products/readme.md
@@ -0,0 +1,5 @@
+This dataset contains around 2M vectors for Amazon products.
+The embeddings are generated with the Cohere embed-english-light-v3.0 model (https://huggingface.co/Cohere/Cohere-embed-english-light-v3.0).
+The base text used to generate each embedding is the product's title + description.
+The queries are derived from products randomly sampled from the base: after sampling, we prompt GPT-3.5 to output a simple query phrase for which the product is a suitable result, and embed that phrase using the Cohere model.
+We also choose brands from the appropriate category of the query and provide them as OR filters. The price of the sampled item is used as an indicative value for a PRICE range filter.