
The jsdp Library

Roberto Rossi edited this page Jan 4, 2017 · 45 revisions

The example presented in section Stochastic Dynamic Programming in Java leveraged high-level modelling constructs (functional interfaces, lambda expressions, MapReduce, etc.) available in Java 8 to implement a forward recursion algorithm for two specific problems.

jsdp builds upon these modelling constructs and provides an additional layer of abstraction to model problems of decision making under uncertainty via Stochastic Dynamic Programming.

The library features off-the-shelf algorithms (forward recursion, backward recursion, scenario reduction, etc) to tackle generic problems of decision making under uncertainty with univariate or multivariate state descriptors.

In what follows, we survey the key modelling constructs offered by jsdp and demonstrate its flexibility on a well-known problem from stochastic inventory control.

Key Library Classes

For a complete overview of the library classes please refer to the library javadoc.

The key abstractions required to model a stochastic dynamic program are provided in package jsdp.sdp. More specifically,

  • State
  • Action
  • ImmediateValueFunction
  • TransitionProbability
  • StateTransitionFunction and RandomOutcomeFunction

The abstraction for the functional equation is provided by class Recursion, which generalizes BackwardRecursion and ForwardRecursion.

To model a problem with jsdp, one may define concrete implementations of the abstract classes in package jsdp.sdp. An example of this approach is given in package jsdp.app.lotsizing. This is a cumbersome solution that should be adopted only for complex problems.

In most cases, it is sufficient to rely upon the general-purpose concrete implementations in package jsdp.sdp.impl. In what follows, we shall illustrate this latter solution. A generic skeleton for a stochastic dynamic program developed in jsdp is provided in package jsdp.app.skeleton; the following example extends this skeleton.

A Motivating Example: Stochastic Inventory Control

Consider a T-period inventory control problem. At the beginning of each period the firm must decide how many units of a product to produce. If x > 0 units are produced, we incur a production cost c(x). This cost comprises both a fixed and a variable component: c(x) = 0 if x = 0; c(x) = a + v*x otherwise. The order is delivered immediately at the beginning of the period.
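The piecewise production cost described above can be sketched as a small helper (the class and method names are illustrative, not part of jsdp):

```java
public class OrderingCost {

    // c(x): zero when nothing is produced, otherwise a fixed cost a
    // plus a variable cost v per unit produced
    static double productionCost(double a, double v, double x) {
        return x > 0 ? a + v * x : 0.0;
    }

    public static void main(String[] args) {
        System.out.println(productionCost(300, 2, 10)); // 320.0
        System.out.println(productionCost(300, 2, 0));  // 0.0
    }
}
```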

Demand in each period t is normally distributed with known mean and coefficient of variation. Demand is observed in each period only after production has occurred.

After meeting the current period's demand, a holding cost of $h per unit is incurred for any item carried over from one period to the next. Unmet demand in any given period is backordered at a cost of $b per unit per period.

The initial inventory is known.

Stochastic Inventory Control as a Stochastic Dynamic Program

We formulate the stochastic inventory control problem as a Stochastic Dynamic Program.

  • There are T periods in the planning horizon
  • The state s in period t represents the initial inventory level at the beginning of period t
  • The action given state s in period t is the order quantity Q
  • The transition probability p[i,j,a] from state i to state j when action a is taken in state i is derived from the probability distribution of the demand d in period t
  • The immediate cost c(t,s,Q) incurred if action Q is taken in state s at the beginning of period t is a + v*Q + E[h max(s+Q-d,0) + b max(d-s-Q,0)] if Q > 0, and E[h max(s+Q-d,0) + b max(d-s-Q,0)] otherwise

The functional equation is

v(t,s) = min over Q of { c(t,s,Q) + E[v(t+1,s+Q-d)] }

with boundary condition v(T,s) = min over Q of c(T,s,Q). Let i be the initial inventory; the aim is to find v(1,i).
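For readers who prefer standard notation, the same recursion can be written as follows, where d_t denotes the random demand in period t:

```latex
v(t,s) = \min_{Q \geq 0} \Big\{ c(t,s,Q) + \mathbb{E}\big[\, v(t+1,\, s + Q - d_t) \,\big] \Big\},
\qquad v(T,s) = \min_{Q \geq 0} c(T,s,Q)
```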

Given the functional equation, an optimal ordering policy can be obtained via forward recursion or backward recursion.

Stochastic Inventory Control via jsdp

We next illustrate how to model the previously introduced stochastic inventory control problem in jsdp.

We create a class StochasticInventoryControl and build a main method as follows.

public static void main(String args[]){
   ...
}

First we define problem parameters.

/*******************************************************************
 * Problem parameters
 */
double fixedOrderingCost = 300; 
double proportionalOrderingCost = 0; 
double holdingCost = 1;
double penaltyCost = 10;
  
double[] meanDemand = {10,20,15,20,15,10};
double coefficientOfVariation = 0.2;

We next define the quantile at which the probability distributions used in the model will be truncated.

double truncationQuantile = 0.999;

We introduce the probability distributions used in our model.

// Random variables

Distribution[] distributions = IntStream.iterate(0, i -> i + 1)
                                        .limit(meanDemand.length)
                                        //.mapToObj(i -> new NormalDist(meanDemand[i],meanDemand[i]*coefficientOfVariation)) //Alternative formulation using the normal distribution.
                                        .mapToObj(i -> new PoissonDist(meanDemand[i]))
                                        .toArray(Distribution[]::new);

We create two arrays storing the lower bound and the upper bound of random variable supports.

double[] supportLB = IntStream.iterate(0, i -> i + 1)
                              .limit(meanDemand.length)
                              .mapToDouble(i -> distributions[i].inverseF(1-truncationQuantile))
                              .toArray();
double[] supportUB = IntStream.iterate(0, i -> i + 1)
                              .limit(meanDemand.length)
                              .mapToDouble(i -> distributions[i].inverseF(truncationQuantile))
                              .toArray();

We define a variable representing the initial inventory level.

double initialInventory = 0;

We then proceed to the model definition.

/*******************************************************************
 * Model definition
 */

The first step is to characterize the state space.

// State space
  
double stepSize = 1;       //Stepsize must be 1 for discrete distributions
double minState = -50;     //Inventory level lower bound in each period
double maxState = 150;     //Inventory level upper bound in each period

The following instruction bounds the state space.

StateImpl.setStateBoundaries(stepSize, minState, maxState);
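As a quick sanity check (a standalone sketch, not jsdp code), the bounds above imply the following number of discrete states per period:

```java
public class StateSpaceSize {
    public static void main(String[] args) {
        // Same bounds as in the text above
        double stepSize = 1, minState = -50, maxState = 150;
        long states = Math.round((maxState - minState) / stepSize) + 1;
        System.out.println(states); // 201
    }
}
```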

Next we introduce a functional interface that dynamically computes an ArrayList<Action> of feasible actions for a given state s.

// Actions
  
Function<State, ArrayList<Action>> buildActionList = s -> {
   StateImpl state = (StateImpl) s;
   ArrayList<Action> feasibleActions = new ArrayList<Action>();
   for(double i = state.getInitialState(); 
       i <= StateImpl.getMaxState(); 
       i += StateImpl.getStepSize()){
      feasibleActions.add(new ActionImpl(state, i - state.getInitialState()));
   }
   return feasibleActions;
};

The next functional interface defines the idempotent action: an action that does not affect the state variable.

Function<State, Action> idempotentAction = s -> new ActionImpl(s, 0.0);

The following functional interface defines the immediate value function

// Immediate Value Function
  
ImmediateValueFunction<State, Action, Double> immediateValueFunction = (initialState, action, finalState) -> {
   ActionImpl a = (ActionImpl)action;
   StateImpl fs = (StateImpl)finalState;
   double orderingCost = 
         a.getAction() > 0 ? (fixedOrderingCost + a.getAction()*proportionalOrderingCost) : 0;
   double holdingAndPenaltyCost =   
         holdingCost*Math.max(fs.getInitialState(),0) + penaltyCost*Math.max(-fs.getInitialState(),0);
   return orderingCost+holdingAndPenaltyCost;
};

and the random outcome function, which - given the present state, the future state, and the action chosen - returns the associated random outcome. This is essentially equivalent to defining the dynamics - i.e. the state transition function - of the system; however, adopting a random outcome function in place of a state transition function is desirable when we aim to implement sampling.

// Random Outcome Function
  
RandomOutcomeFunction<State, Action, Double> randomOutcomeFunction = (initialState, action, finalState) -> {
   double realizedDemand = ((StateImpl)initialState).getInitialState() +
                           ((ActionImpl)action).getAction() -
                           ((StateImpl)finalState).getInitialState();
   return realizedDemand;
};

To summarize, the key steps required to model a problem are the following: bound the state space, then define the functional interfaces that compute the action list, the immediate value function, and the random outcome function of the system.

Once these modeling components are defined, we can move to the solution process.

/*******************************************************************
 * Solve
 */

The first step is to decide whether to sample the state space or to adopt an exhaustive state space enumeration.

SamplingScheme.NONE carries out an exhaustive state space enumeration; in the following example we will adopt simple random sampling.

// Sampling scheme
  
SamplingScheme samplingScheme = SamplingScheme.SIMPLE_RANDOM_SAMPLING;
int maxSampleSize = 100;
double reductionFactorPerStage = 1;

The maximum sample size (maxSampleSize) refers to the sample size adopted for the random variables at stage 1. If the reduction factor per stage (reductionFactorPerStage) is 1, this sample size is also adopted in subsequent stages; otherwise the sample size in subsequent stages shrinks according to the rule

 Math.ceil(maxSampleSize/Math.pow(reductionFactorPerStage, period));

where period denotes the stage for which we are computing the sample size.
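The shrinking rule can be illustrated with a standalone sketch (variable names mirror the text; this is not jsdp code):

```java
public class SampleSizeSchedule {

    // Sample size at a given stage under the shrinking rule quoted above
    static int sampleSize(int maxSampleSize, double reductionFactorPerStage, int period) {
        return (int) Math.ceil(maxSampleSize / Math.pow(reductionFactorPerStage, period));
    }

    public static void main(String[] args) {
        // With a reduction factor of 2 the sample size roughly halves at each stage
        for (int period = 0; period < 4; period++)
            System.out.println("stage " + period + ": " + sampleSize(100, 2, period));
    }
}
```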

Finally, we apply backward recursion to solve the problem. The discount factor (discountFactor) captures value discounting from one period to the next. stateSpaceLowerBound and loadFactor are parameters used to initialise the hash tables. In this specific instance, we use HashType.THASHMAP as the hash table. Enum HashType contains other possible choices, including hash tables (MapDB) that provide memory or disk storage.

 // Value Function Processing Method: backward recursion

 double discountFactor = 1.0;
 int stateSpaceLowerBound = 10000000;
 float loadFactor = 0.8F;
 BackwardRecursionImpl recursion = new BackwardRecursionImpl(OptimisationDirection.MIN,
                                                              distributions,
                                                              supportLB,
                                                              supportUB,
                                                              immediateValueFunction,
                                                              randomOutcomeFunction,
                                                              buildActionList,
                                                              idempotentAction,
                                                              discountFactor,
                                                              samplingScheme,
                                                              maxSampleSize,
                                                              reductionFactorPerStage,
                                                              stateSpaceLowerBound,
                                                              loadFactor,
                                                              HashType.THASHMAP);

Backward recursion is invoked as follows.

System.out.println("--------------Backward recursion--------------");

recursion.runBackwardRecursionMonitoring();

System.out.println();

double ETC = recursion.getExpectedCost(initialInventory);
StateDescriptorImpl initialState = new StateDescriptorImpl(0, initialInventory);
double action = recursion.getOptimalAction(initialState).getAction();
long percent = recursion.getMonitoringInterfaceBackward().getPercentCPU();

System.out.println("Expected total cost (assuming an initial inventory level "+initialInventory+"): "+ETC);
System.out.println("Optimal initial action: "+action);
System.out.println("Time elapsed: "+recursion.getMonitoringInterfaceBackward().getTime());
System.out.println("Cpu usage: "+percent+"% ("+Runtime.getRuntime().availableProcessors()+" cores)");
System.out.println();

The complete code is available here.

After running the code the output obtained is

--------------Backward recursion--------------
Expected total cost (assuming an initial inventory level 0.0): 567.7537178866613
Optimal initial action: 91.0
Time elapsed: 10
Cpu usage: 168% (4 cores)

Note that the complete code includes additional elements related to charting and simulation of the optimal policy.

A modified version of the code, which deals with the problem in which a capacity constraint is enforced on the order quantity in each period, is available here.
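A capacity constraint of this kind can be obtained by capping the feasible-action enumeration; a minimal standalone sketch follows (plain doubles stand in for jsdp's State and Action classes, and the capacity parameter is an assumption for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class CapacitatedActions {

    // Feasible order quantities Q for a given initial inventory level:
    // the resulting level must not exceed maxState, and Q must not exceed capacity
    static List<Double> feasibleOrderQuantities(double initialState, double maxState,
                                                double stepSize, double capacity) {
        List<Double> actions = new ArrayList<>();
        for (double level = initialState;
             level <= maxState && level - initialState <= capacity;
             level += stepSize) {
            actions.add(level - initialState);
        }
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(feasibleOrderQuantities(0, 150, 1, 5)); // [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    }
}
```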