Merge branch 'main' of github.com:artefactory/choice-learn-private

artefactory · Apr 12, 2024 · 186e28b
2 parents 7a7a477 + 73f5843
Showing 70 changed files with 10,222 additions and 8,042 deletions.
diff --git a/.gitignore b/.gitignore
@@ -139,3 +139,5 @@ secrets/*
 
 # Specific data
 choice_learn/datasets/data/expedia.csv
+choice_learn/datasets/cache/*
+!choice_learn/datasets/cache/.gitkeep
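
Note on the `.gitignore` hunk above: the two added patterns ignore everything inside `choice_learn/datasets/cache/` and then re-include `.gitkeep`, so the otherwise empty directory stays tracked. The sketch below illustrates only the "last matching pattern wins" rule. It uses Python's `fnmatch` as a stand-in, which is an assumption: git's real matcher has additional rules (for example, how `*` treats `/` and how excluded parent directories behave). `is_ignored` is a hypothetical helper, not part of git or of this repository.

```python
from fnmatch import fnmatch


def is_ignored(path, patterns):
    """Simplified gitignore-style check: the last matching pattern decides."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatch(path, glob):
            # A plain pattern ignores the path; a "!" pattern re-includes it.
            ignored = not negated
    return ignored


# The two patterns added by this commit.
patterns = [
    "choice_learn/datasets/cache/*",
    "!choice_learn/datasets/cache/.gitkeep",
]

# "expedia.pkl" is a made-up file name used only for illustration.
for path in (
    "choice_learn/datasets/cache/expedia.pkl",
    "choice_learn/datasets/cache/.gitkeep",
    "README.md",
):
    print(path, "->", "ignored" if is_ignored(path, patterns) else "tracked")
```

Pattern order matters: if the `!` line came first, the later `cache/*` pattern would match `.gitkeep` again and it would be ignored.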
diff --git a/README.md b/README.md
@@ -13,8 +13,8 @@
 
 <img src="docs/choice_learn_official_logo.png" width="256">
 
-Choice-Learn is a Python package designed to help you build discrete choice models.
-The package provides ready-to-use datasets and models from the litterature. It also provides a lower level use if you want to customize any model or create your own from scratch. In particular you will find efficient data handling to limit RAM usage and structure commons to any choice model.
+Choice-Learn is a Python package designed to help you estimate discrete choice models and use them (e.g., assortment optimization plug-in).
+The package provides ready-to-use datasets and models from the litterature. It also provides a lower level use if you wish to customize any model or create your own from scratch. In particular you will find efficient data handling to limit RAM usage and structure common to any choice model.
 
 Choice-Learn uses NumPy and pandas as data backend engines and TensorFlow for models.
 
@@ -24,7 +24,6 @@ This repository contains a private version of the package.
 
 - [choice-learn-private](#choice-learn-private)
   - [Introduction - Discrete Choice Modelling](#introduction---discrete-choice-modelling)
-  - [Table of Contents](#table-of-contents)
   - [What's in there ?](#whats-in-there)
   - [Getting Started](#getting-started---fast-track)
   - [Installation](#installation)
@@ -34,45 +33,46 @@ This repository contains a private version of the package.
 
 ## Introduction - Discrete Choice Modelling
 
-Discrete choice models aim at explaining or predicting a choice from a set of alternatives. Well known use-cases include analyzing people choice of mean of transport or products purchases in stores.
+Discrete choice models aim at explaining or predicting choices over a set of alternatives. Well known use-cases include analyzing people's choice of mean of transport or products purchases in stores.
 
 If you are new to choice modelling, you can check this [resource](https://www.publichealth.columbia.edu/research/population-health-methods/discrete-choice-model-and-analysis). The different notebooks from the [Getting Started](#getting-started---fast-track) section can also help you understand choice modelling and more importantly help you for your usecase.
 
 ## What's in there ?
 
 ### Data
-- Generic dataset handling with the ChoiceDataset class [[Example]](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/choice_learn_introduction_data.ipynb)
+- Generic dataset handling with the ChoiceDataset class [[Example]](notebooks/introduction/2_data_handling.ipynb)
 - Ready-To-Use datasets:
   - [SwissMetro](./choice_learn/datasets/data/swissmetro.csv.gz) [[2]](#citation)
   - [ModeCanada](./choice_learn/datasets/data/ModeCanada.csv.gz) [[3]](#citation)
   - The [Train](./choice_learn/datasets/data/train_data.csv.gz) [[5]](#citation)
   - The [Heating](./choice_learn/datasets/data/heating_data.csv.gz) & [Electricity](./choice_learn/datasets/data/electricity.csv.gz) datasets from Kenneth Train described [here](https://rdrr.io/cran/mlogit/man/Electricity.html) and [here](https://rdrr.io/cran/mlogit/man/Heating.html)
   - The [TaFeng](./choice_learn/datasets/data/ta_feng.csv.zip) dataset from [Kaggle](https://www.kaggle.com/datasets/chiranjivdas09/ta-feng-grocery-dataset)
-  - The IDCM-2013 [Expedia](./choice_learn/datasets/expedia.py) dataset from [Kaggle](https://www.kaggle.com/c/expedia-personalized-sort) [[6]](#citation)
+  - The ICDM-2013 [Expedia](./choice_learn/datasets/expedia.py) dataset from [Kaggle](https://www.kaggle.com/c/expedia-personalized-sort) [[6]](#citation)
 
-### Models
+### Model estimation
 - Ready-to-use models:
-  - Conditional MultiNomialLogit [[4]](#citation)[[Example]](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/choice_learn_introduction_clogit.ipynb)
-  - Latent Class MultiNomialLogit [[Example]](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/latent_class_model.ipynb)
-  - RUMnet [[1]](#citation)[[Example]](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/rumnet_example.ipynb)
+  - Conditional MultiNomialLogit [[4]](#citation)[[Example]](notebooks/introduction/3_model_clogit.ipynb)
+  - Latent Class MultiNomialLogit [[Example]](notebooks/models/latent_class_model.ipynb)
+  - RUMnet [[1]](#citation)[[Example]](notebooks/models/rumnet.ipynb)
+  - TasteNet [[7]](#citation)[[Example]](notebooks/models/tastenet.ipynb)
 - (WIP) - Ready-to-use models to be implemented:
-  - Nested MultiNomialLogit
-  - [TasteNet](https://arxiv.org/abs/2002.00922)
+  - Nested Logit
   - [SHOPPER](https://projecteuclid.org/journals/annals-of-applied-statistics/volume-14/issue-1/SHOPPER--A-probabilistic-model-of-consumer-choice-with-substitutes/10.1214/19-AOAS1265.full)
   - Others ...
-- Custom modelling is made easy by subclassing the ChoiceModel class [[Example]](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/custom_model.ipynb)
+- Custom modelling is made easy by subclassing the ChoiceModel class [[Example]](notebooks/introduction/4_model_customization.ipynb)
 
-### Different tools
-- Assortment optimization from model [[Example]](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/assortment_example.ipynb)
+### Auxiliary tools
+- Assortment optimization algorithms [[Example]](notebooks/auxiliary_tools/assortment_example.ipynb) [[8]](#citation)
 - (WIP) Standardization of evaluation protocols
 - (WIP) Interfaces
 
-## Getting Started - Fast Track
+## Getting Started
 
-You can find the following notebooks to help you getting started with the package:
-- [Introduction to data management](notebooks/choice_learn_introduction_data.ipynb)
-- [Introduction to modelling with the conditional logit model on ModeCanada dataset](notebooks/choice_learn_introduction_clogit.ipynb)
-- [Introduction to custom modelling with the ModeCanada dataset](notebooks/custom_model.ipynb)
+You can find the following [notebooks](notebooks/introduction/) to help you getting started with the package:
+- [Generic and simple introduction](notebooks/introduction/1_introductive_example.ipynb)
+- [Detailed explanations of data handling depending on the data format](notebooks/introduction/2_data_handling.ipynb)
+- [A detailed example of conditional logit estimation](notebooks/introduction/3_model_clogit.ipynb)
+- [Introduction to custom modelling and more complex parametrization](notebooks/introduction/4_model_customization.ipynb)
 
 ## Installation
 
@@ -109,51 +109,42 @@ pip install choice-learn
 ## Usage
 ```python
 from choice_learn.data import ChoiceDataset
-from choice_learn.models import ConditionalMNL, RUMnet
+from choice_learn.models import ConditionalLogit, RUMnet
 
 # Instantiation of a ChoiceDataset from a pandas.DataFrame
 # Onl need to specify how the file is encoded:
 dataset = ChoiceDataset.from_single_long_df(df=transport_df,
                                             items_id_column="alt",
-                                            contexts_id_column="case",
+                                            choices_id_column="case",
                                             choices_column="choice",
-                                            contexts_features_columns=["income"],
-                                            contexts_items_features_columns=["cost", "freq", "ovt", "ivt"],
+                                            shared_features_columns=["income"],
+                                            items_features_columns=["cost", "freq", "ovt", "ivt"],
                                             choice_format="item_id")
 
 # Initialization of the model
-model = ConditionalMNL(optimizer="lbfgs")
+model = ConditionalLogit()
 
 # Creation of the different weights:
 
-
 # add_coefficients adds one coefficient for each specified item_index
 # intercept, and income are added for each item except the first one that needs to be zeroed
-model.add_coefficients(coefficient_name="beta_inter",
-                       feature_name="intercept",
+model.add_coefficients(feature_name="intercept",
                        items_indexes=[1, 2, 3])
-model.add_coefficients(coefficient_name="beta_income",
-                       feature_name="income",
+model.add_coefficients(feature_name="income",
                        items_indexes=[1, 2, 3])
-
-# ivt is added for each item:
-model.add_coefficients(coefficient_name="beta_ivt",
-                       feature_name="ivt",
+model.add_coefficients(feature_name="ivt",
                        items_indexes=[0, 1, 2, 3])
 
 # shared_coefficient add one coefficient that is used for all items specified in the items_indexes:
 # Here, cost, freq and ovt coefficients are shared between all items
-model.add_shared_coefficient(coefficient_name="beta_cost",
-                             feature_name="cost",
+model.add_shared_coefficient(feature_name="cost",
                              items_indexes=[0, 1, 2, 3])
-model.add_shared_coefficient(coefficient_name="beta_freq",
-                             feature_name="freq",
+model.add_shared_coefficient(feature_name="freq",
                              items_indexes=[0, 1, 2, 3])
-model.add_shared_coefficient(coefficient_name="beta_ovt",
-                             feature_name="ovt",
+model.add_shared_coefficient(feature_name="ovt",
                              items_indexes=[0, 1, 2, 3])
 
-history = model.fit(dataset, epochs=1000, get_report=True)
+history = model.fit(dataset, get_report=True)
 print("The average neg-loglikelihood is:", model.evaluate(dataset).numpy())
 print(model.report)
 ```
@@ -183,7 +174,9 @@ The use of this software is under the MIT license, with no limitation of usage,
 [3][Applications and Interpretation of Nested Logit Models of Intercity Mode Choice](https://trid.trb.org/view/385097), Forinash, C., V.; Koppelman, F., S. (1993)\
 [4][The Demand for Local Telephone Service: A Fully Discrete Model of Residential Calling Patterns and Service Choices](https://www.jstor.org/stable/2555538), Train K., E.; McFadden, D., L.; Moshe, B. (1987)\
 [5] [Estimation of Travel Choice Models with Randomly Distributed Values of Time](https://ideas.repec.org/p/fth/lavaen/9303.html), Ben-Akiva, M.; Bolduc, D.; Bradley, M. (1993)\
-[6] [Personalize Expedia Hotel Searches - ICDM 2013](https://www.kaggle.com/c/expedia-personalized-sort), Ben Hamner, A.; Friedman, D.; SSA_Expedia. (2013)
+[6] [Personalize Expedia Hotel Searches - ICDM 2013](https://www.kaggle.com/c/expedia-personalized-sort), Ben Hamner, A.; Friedman, D.; SSA_Expedia. (2013)\
+[7] [A Neural-embedded Discrete Choice Model: Learning Taste Representation with Strengthened Interpretability](https://arxiv.org/abs/2002.00922), Han, Y.; Calara Oereuran F.; Ben-Akiva, M.; Zegras, C. (2020)\
+[8] [A branch-and-cut algorithm for the latent-class logit assortment problem](https://www.sciencedirect.com/science/article/pii/S0166218X12001072), Méndez-Díaz, I.; Miranda-Bront, J. J.; Vulcano, G.; Zabala, P. (2014)
 
 ### Code and Repositories
 - [1][RUMnet](https://github.com/antoinedesir/rumnet)
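
Note on the Usage hunk above: the commit renames the `from_single_long_df` keyword arguments (`contexts_id_column` to `choices_id_column`, `contexts_features_columns` to `shared_features_columns`, `contexts_items_features_columns` to `items_features_columns`), but the README never defines `transport_df`. The sketch below builds a toy long-format frame with the column names used in that snippet: one row per (case, alternative) pair. All values are hypothetical, and the 0/1 encoding of `choice` is an assumption. The README passes `choice_format="item_id"`, which may expect a different encoding. The sketch uses only pandas and does not call choice-learn.

```python
import pandas as pd

# Hypothetical toy data, not taken from the repository.
ALTERNATIVES = ["air", "bus", "car", "train"]

# (case id, income shared by every row of the case, chosen alternative)
CASES = [(1, 45, "car"), (2, 70, "air")]

# Illustrative per-alternative features: (cost, freq, ovt, ivt).
ITEM_FEATURES = {
    "air": (150.0, 4, 60, 70),
    "bus": (40.0, 3, 30, 300),
    "car": (60.0, 0, 0, 250),
    "train": (55.0, 2, 40, 220),
}

rows = []
for case, income, chosen in CASES:
    for alt in ALTERNATIVES:
        cost, freq, ovt, ivt = ITEM_FEATURES[alt]
        rows.append(
            {
                "case": case,                  # passed as choices_id_column
                "alt": alt,                    # passed as items_id_column
                "choice": int(alt == chosen),  # assumed 0/1 indicator
                "income": income,              # a shared feature
                "cost": cost,                  # items features from here on
                "freq": freq,
                "ovt": ovt,
                "ivt": ivt,
            }
        )

transport_df = pd.DataFrame(rows)
print(transport_df)
```

In this layout a shared feature such as `income` repeats on every row of a case, while the items features vary by alternative, which matches the split between `shared_features_columns` and `items_features_columns` in the renamed arguments.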