Merge branch 'main' of github.com:artefactory/choice-learn-private
VincentAuriau committed Apr 12, 2024
2 parents 7a7a477 + 73f5843 commit 186e28b
Showing 70 changed files with 10,222 additions and 8,042 deletions.
.gitignore: 2 changes (2 additions & 0 deletions)
@@ -139,3 +139,5 @@ secrets/*
 
 # Specific data
 choice_learn/datasets/data/expedia.csv
+choice_learn/datasets/cache/*
+!choice_learn/datasets/cache/.gitkeep
README.md: 77 changes (35 additions & 42 deletions)
@@ -13,8 +13,8 @@
 
 <img src="docs/choice_learn_official_logo.png" width="256">
 
-Choice-Learn is a Python package designed to help you build discrete choice models.
-The package provides ready-to-use datasets and models from the litterature. It also provides a lower level use if you want to customize any model or create your own from scratch. In particular you will find efficient data handling to limit RAM usage and structure commons to any choice model.
+Choice-Learn is a Python package designed to help you estimate discrete choice models and use them (e.g., assortment optimization plug-in).
+The package provides ready-to-use datasets and models from the litterature. It also provides a lower level use if you wish to customize any model or create your own from scratch. In particular you will find efficient data handling to limit RAM usage and structure common to any choice model.
 
 Choice-Learn uses NumPy and pandas as data backend engines and TensorFlow for models.
 
@@ -24,7 +24,6 @@ This repository contains a private version of the package.
 
 - [choice-learn-private](#choice-learn-private)
 - [Introduction - Discrete Choice Modelling](#introduction---discrete-choice-modelling)
-- [Table of Contents](#table-of-contents)
 - [What's in there ?](#whats-in-there)
 - [Getting Started](#getting-started---fast-track)
 - [Installation](#installation)
@@ -34,45 +33,46 @@ This repository contains a private version of the package.
 
 ## Introduction - Discrete Choice Modelling
 
-Discrete choice models aim at explaining or predicting a choice from a set of alternatives. Well known use-cases include analyzing people choice of mean of transport or products purchases in stores.
+Discrete choice models aim at explaining or predicting choices over a set of alternatives. Well known use-cases include analyzing people's choice of mean of transport or products purchases in stores.
 
 If you are new to choice modelling, you can check this [resource](https://www.publichealth.columbia.edu/research/population-health-methods/discrete-choice-model-and-analysis). The different notebooks from the [Getting Started](#getting-started---fast-track) section can also help you understand choice modelling and more importantly help you for your usecase.
 
 ## What's in there ?
 
 ### Data
-- Generic dataset handling with the ChoiceDataset class [[Example]](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/choice_learn_introduction_data.ipynb)
+- Generic dataset handling with the ChoiceDataset class [[Example]](notebooks/introduction/2_data_handling.ipynb)
 - Ready-To-Use datasets:
 - [SwissMetro](./choice_learn/datasets/data/swissmetro.csv.gz) [[2]](#citation)
 - [ModeCanada](./choice_learn/datasets/data/ModeCanada.csv.gz) [[3]](#citation)
 - The [Train](./choice_learn/datasets/data/train_data.csv.gz) [[5]](#citation)
 - The [Heating](./choice_learn/datasets/data/heating_data.csv.gz) & [Electricity](./choice_learn/datasets/data/electricity.csv.gz) datasets from Kenneth Train described [here](https://rdrr.io/cran/mlogit/man/Electricity.html) and [here](https://rdrr.io/cran/mlogit/man/Heating.html)
 - The [TaFeng](./choice_learn/datasets/data/ta_feng.csv.zip) dataset from [Kaggle](https://www.kaggle.com/datasets/chiranjivdas09/ta-feng-grocery-dataset)
-- The IDCM-2013 [Expedia](./choice_learn/datasets/expedia.py) dataset from [Kaggle](https://www.kaggle.com/c/expedia-personalized-sort) [[6]](#citation)
+- The ICDM-2013 [Expedia](./choice_learn/datasets/expedia.py) dataset from [Kaggle](https://www.kaggle.com/c/expedia-personalized-sort) [[6]](#citation)
 
-### Models
+### Model estimation
 - Ready-to-use models:
-- Conditional MultiNomialLogit [[4]](#citation)[[Example]](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/choice_learn_introduction_clogit.ipynb)
-- Latent Class MultiNomialLogit [[Example]](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/latent_class_model.ipynb)
-- RUMnet [[1]](#citation)[[Example]](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/rumnet_example.ipynb)
+- Conditional MultiNomialLogit [[4]](#citation)[[Example]](notebooks/introduction/3_model_clogit.ipynb)
+- Latent Class MultiNomialLogit [[Example]](notebooks/models/latent_class_model.ipynb)
+- RUMnet [[1]](#citation)[[Example]](notebooks/models/rumnet.ipynb)
+- TasteNet [[7]](#citation)[[Example]](notebooks/models/tastenet.ipynb)
 - (WIP) - Ready-to-use models to be implemented:
-- Nested MultiNomialLogit
-- [TasteNet](https://arxiv.org/abs/2002.00922)
+- Nested Logit
 - [SHOPPER](https://projecteuclid.org/journals/annals-of-applied-statistics/volume-14/issue-1/SHOPPER--A-probabilistic-model-of-consumer-choice-with-substitutes/10.1214/19-AOAS1265.full)
 - Others ...
-- Custom modelling is made easy by subclassing the ChoiceModel class [[Example]](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/custom_model.ipynb)
+- Custom modelling is made easy by subclassing the ChoiceModel class [[Example]](notebooks/introduction/4_model_customization.ipynb)
 
-### Different tools
-- Assortment optimization from model [[Example]](https://github.com/artefactory/choice-learn-private/blob/main/notebooks/assortment_example.ipynb)
+### Auxiliary tools
+- Assortment optimization algorithms [[Example]](notebooks/auxiliary_tools/assortment_example.ipynb) [[8]](#citation)
 - (WIP) Standardization of evaluation protocols
 - (WIP) Interfaces
 
-## Getting Started - Fast Track
+## Getting Started
 
-You can find the following notebooks to help you getting started with the package:
-- [Introduction to data management](notebooks/choice_learn_introduction_data.ipynb)
-- [Introduction to modelling with the conditional logit model on ModeCanada dataset](notebooks/choice_learn_introduction_clogit.ipynb)
-- [Introduction to custom modelling with the ModeCanada dataset](notebooks/custom_model.ipynb)
+You can find the following [notebooks](notebooks/introduction/) to help you getting started with the package:
+- [Generic and simple introduction](notebooks/introduction/1_introductive_example.ipynb)
+- [Detailed explanations of data handling depending on the data format](notebooks/introduction/2_data_handling.ipynb)
+- [A detailed example of conditional logit estimation](notebooks/introduction/3_model_clogit.ipynb)
+- [Introduction to custom modelling and more complex parametrization](notebooks/introduction/4_model_customization.ipynb)
 
 ## Installation
 
@@ -109,51 +109,42 @@ pip install choice-learn
 ## Usage
 ```python
 from choice_learn.data import ChoiceDataset
-from choice_learn.models import ConditionalMNL, RUMnet
+from choice_learn.models import ConditionalLogit, RUMnet
 
 # Instantiation of a ChoiceDataset from a pandas.DataFrame
 # Onl need to specify how the file is encoded:
 dataset = ChoiceDataset.from_single_long_df(df=transport_df,
 items_id_column="alt",
-contexts_id_column="case",
+choices_id_column="case",
 choices_column="choice",
-contexts_features_columns=["income"],
-contexts_items_features_columns=["cost", "freq", "ovt", "ivt"],
+shared_features_columns=["income"],
+items_features_columns=["cost", "freq", "ovt", "ivt"],
 choice_format="item_id")
 
 # Initialization of the model
-model = ConditionalMNL(optimizer="lbfgs")
+model = ConditionalLogit()
 
-# Creation of the different weights:
-
-
 # add_coefficients adds one coefficient for each specified item_index
 # intercept, and income are added for each item except the first one that needs to be zeroed
-model.add_coefficients(coefficient_name="beta_inter",
-feature_name="intercept",
+model.add_coefficients(feature_name="intercept",
 items_indexes=[1, 2, 3])
-model.add_coefficients(coefficient_name="beta_income",
-feature_name="income",
+model.add_coefficients(feature_name="income",
 items_indexes=[1, 2, 3])
 
 # ivt is added for each item:
-model.add_coefficients(coefficient_name="beta_ivt",
-feature_name="ivt",
+model.add_coefficients(feature_name="ivt",
 items_indexes=[0, 1, 2, 3])
 
 # shared_coefficient add one coefficient that is used for all items specified in the items_indexes:
 # Here, cost, freq and ovt coefficients are shared between all items
-model.add_shared_coefficient(coefficient_name="beta_cost",
-feature_name="cost",
+model.add_shared_coefficient(feature_name="cost",
 items_indexes=[0, 1, 2, 3])
-model.add_shared_coefficient(coefficient_name="beta_freq",
-feature_name="freq",
+model.add_shared_coefficient(feature_name="freq",
 items_indexes=[0, 1, 2, 3])
-model.add_shared_coefficient(coefficient_name="beta_ovt",
-feature_name="ovt",
+model.add_shared_coefficient(feature_name="ovt",
 items_indexes=[0, 1, 2, 3])
 
-history = model.fit(dataset, epochs=1000, get_report=True)
+history = model.fit(dataset, get_report=True)
 print("The average neg-loglikelihood is:", model.evaluate(dataset).numpy())
 print(model.report)
 ```
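The usage example in this hunk declares item-specific coefficients with `add_coefficients` (intercept and income for items 1 to 3, with the first item's intercept "zeroed") and shared coefficients with `add_shared_coefficient` (cost, freq, ovt). It then reports the average negative log-likelihood. The dependency-free sketch below illustrates what those quantities mean. It is not how choice-learn computes them, since the package relies on TensorFlow.

The sketch groups hypothetical long-format rows by case, in the way `choices_id_column="case"` and `items_id_column="alt"` suggest. It builds utilities from item-specific intercepts plus one cost coefficient shared by all items, then averages -log P(chosen item) over cases. The rows, the single `cost` feature, and the coefficient values are all made up for illustration.

```python
import math
from collections import defaultdict

# Hypothetical long-format rows: one row per (case, alternative),
# with choice == 1 on the alternative that was actually chosen.
ROWS = [
    {"case": 1, "alt": 0, "choice": 1, "cost": 1.0},
    {"case": 1, "alt": 1, "choice": 0, "cost": 2.0},
    {"case": 2, "alt": 0, "choice": 0, "cost": 3.0},
    {"case": 2, "alt": 1, "choice": 1, "cost": 1.0},
]


def group_by_case(rows):
    """Turn long rows into {case: ({alt: cost}, chosen alt)}; assumes one chosen alt per case."""
    costs, chosen = defaultdict(dict), {}
    for row in rows:
        costs[row["case"]][row["alt"]] = row["cost"]
        if row["choice"] == 1:
            chosen[row["case"]] = row["alt"]
    return {case: (costs[case], chosen[case]) for case in costs}


def utilities(costs, intercepts, beta_cost):
    """Item-specific intercept plus one cost coefficient shared by every item."""
    return {alt: intercepts[alt] + beta_cost * cost for alt, cost in costs.items()}


def average_negative_log_likelihood(rows, intercepts, beta_cost):
    """Average of -log P(chosen alternative) over cases under a conditional logit."""
    cases = group_by_case(rows)
    total = 0.0
    for costs, chosen in cases.values():
        u = utilities(costs, intercepts, beta_cost)
        # log-sum-exp, shifted by the max utility for numerical stability
        m = max(u.values())
        log_denominator = m + math.log(sum(math.exp(v - m) for v in u.values()))
        total -= u[chosen] - log_denominator
    return total / len(cases)


if __name__ == "__main__":
    # The intercept of item 0 is fixed to 0 so that the model is identified.
    print(average_negative_log_likelihood(ROWS, intercepts={0: 0.0, 1: 0.5}, beta_cost=-1.0))
```

With all coefficients at zero, every alternative is equally likely, so the average negative log-likelihood equals log(number of alternatives). This gives a convenient baseline to compare a fitted model against.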
@@ -183,7 +174,9 @@ The use of this software is under the MIT license, with no limitation of usage,
 [3][Applications and Interpretation of Nested Logit Models of Intercity Mode Choice](https://trid.trb.org/view/385097), Forinash, C., V.; Koppelman, F., S. (1993)\
 [4][The Demand for Local Telephone Service: A Fully Discrete Model of Residential Calling Patterns and Service Choices](https://www.jstor.org/stable/2555538), Train K., E.; McFadden, D., L.; Moshe, B. (1987)\
 [5] [Estimation of Travel Choice Models with Randomly Distributed Values of Time](https://ideas.repec.org/p/fth/lavaen/9303.html), Ben-Akiva, M.; Bolduc, D.; Bradley, M. (1993)\
-[6] [Personalize Expedia Hotel Searches - ICDM 2013](https://www.kaggle.com/c/expedia-personalized-sort), Ben Hamner, A.; Friedman, D.; SSA_Expedia. (2013)
+[6] [Personalize Expedia Hotel Searches - ICDM 2013](https://www.kaggle.com/c/expedia-personalized-sort), Ben Hamner, A.; Friedman, D.; SSA_Expedia. (2013)\
+[7] [A Neural-embedded Discrete Choice Model: Learning Taste Representation with Strengthened Interpretability](https://arxiv.org/abs/2002.00922), Han, Y.; Calara Oereuran F.; Ben-Akiva, M.; Zegras, C. (2020)\
+[8] [A branch-and-cut algorithm for the latent-class logit assortment problem](https://www.sciencedirect.com/science/article/pii/S0166218X12001072), Méndez-Díaz, I.; Miranda-Bront, J. J.; Vulcano, G.; Zabala, P. (2014)
 
 ### Code and Repositories
 - [1][RUMnet](https://github.com/antoinedesir/rumnet)
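Reference [8], added in this commit, describes a branch-and-cut algorithm for the latent-class logit assortment problem. The new "Auxiliary tools" entry points to it. The deliberately naive sketch below makes the underlying problem concrete. It assumes a single multinomial logit, a no-purchase option with utility 0, and known item prices, then enumerates every assortment and keeps the one with the highest expected revenue.

This brute-force search is a swapped-in illustration. It is neither the package's algorithm nor the method of [8], and its cost grows exponentially with the number of items. The function names, the optional `max_size` cardinality limit, and the numbers are all hypothetical.

```python
import itertools
import math


def mnl_choice_probabilities(utilities, assortment):
    """Multinomial logit probabilities restricted to the offered items.

    The outside option ("no purchase") has utility 0, so exp(0) = 1 is
    added to the denominator.
    """
    denominator = 1.0 + sum(math.exp(utilities[i]) for i in assortment)
    return {i: math.exp(utilities[i]) / denominator for i in assortment}


def expected_revenue(utilities, prices, assortment):
    """Expected revenue from one customer facing `assortment`."""
    probabilities = mnl_choice_probabilities(utilities, assortment)
    return sum(prices[i] * p for i, p in probabilities.items())


def best_assortment(utilities, prices, max_size=None):
    """Exhaustively score every non-empty assortment of at most `max_size` items."""
    items = range(len(utilities))
    max_size = len(utilities) if max_size is None else max_size
    best, best_revenue = (), 0.0
    for size in range(1, max_size + 1):
        for assortment in itertools.combinations(items, size):
            revenue = expected_revenue(utilities, prices, assortment)
            if revenue > best_revenue:
                best, best_revenue = assortment, revenue
    return best, best_revenue


if __name__ == "__main__":
    # Hypothetical utilities and prices for four items.
    utilities = [1.0, 0.5, 0.0, -0.5]
    prices = [1.0, 2.0, 3.0, 4.0]
    print(best_assortment(utilities, prices))
```

In the unconstrained single-logit case, an optimal assortment is known to be revenue-ordered: it consists of the k highest-priced items for some k. Exhaustive search is therefore only useful there as a reference point. For latent-class (mixture) logit models this property no longer holds in general, and the problem becomes much harder.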