Approximate dynamic programming for weakly coupled Markov decision processes with perfect and imperfect information
Salemi Parizi, Mahshid
A broad range of optimization problems in applications such as healthcare operations, revenue management, telecommunications, high-performance computing, logistics and transportation, business analytics, and defense has the following form. Heterogeneous service requests arrive dynamically and stochastically over slotted time. A request may require multiple resources to complete. The decision-maker may collect a reward on successfully completing a service request, and may also incur costs for rejecting requests or for delaying service. The decision-maker's goal is to choose how to dynamically allocate limited resources to the various service requests so as to optimize a certain performance metric. Despite the prevalence of these problems, a majority of existing research focuses only on stylized models of them. While such stylized models are often insightful, several experts have commented in recent literature reviews that their applicability is limited in practice. On the other hand, more realistic models of these problems are computationally difficult to solve owing to the curse of dimensionality. The research objective of this dissertation is to build Markov decision process (MDP) models of four classes of dynamic resource allocation problems under uncertainty, and then to develop algorithms for their approximate solution. Specifically, most MDP models in this dissertation possess the so-called weakly coupled structure. That is, the MDP is composed of several sub-MDPs; the reward is additively separable and the transition probabilities are multiplicatively separable over these sub-MDPs; and the sub-MDPs are joined only via linking constraints on the actions they choose. The dissertation proposes mathematical programming-based and simulation-based approximate dynamic programming methods for their solution. The performance of these methods is compared against one another and against heuristic resource allocation policies.
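As an illustration, the weakly coupled structure just described can be sketched in code: rewards add across sub-MDPs, transition probabilities multiply, and only a shared resource budget links the sub-MDPs' actions. All model data and names below are hypothetical placeholders, not instances from the dissertation.

```python
import itertools

def joint_reward(sub_rewards, states, actions):
    """Additively separable reward: sum of the sub-MDP rewards."""
    return sum(r[s][a] for r, s, a in zip(sub_rewards, states, actions))

def joint_transition_prob(sub_probs, states, actions, next_states):
    """Multiplicatively separable transitions: product of sub-MDP probabilities."""
    p = 1.0
    for probs, s, a, s2 in zip(sub_probs, states, actions, next_states):
        p *= probs[s][a][s2]
    return p

def feasible_joint_actions(action_sets, resource_use, budget):
    """Linking constraint: joint actions whose total resource use fits the budget."""
    for actions in itertools.product(*action_sets):
        if sum(resource_use[a] for a in actions) <= budget:
            yield actions
```

The linking constraint is the only place the sub-MDPs interact, which is precisely what the decomposition methods later in the dissertation exploit.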
An outline of this dissertation is described below. Chapter 1 investigates a class of scheduling problems where dynamically and stochastically arriving appointment requests are either rejected or booked for future slots. A customer may cancel an appointment. A customer who does not cancel may fail to show up. The planner may overbook appointments to mitigate the detrimental effects of cancellations and no-shows. A customer needs multiple renewable resources. The system receives a reward for providing service, and incurs costs for rejecting requests, appointment delays, and overtime. Customers are heterogeneous in all problem parameters. The chapter provides a weakly coupled MDP formulation of these problems. Exact solution of this MDP is intractable. An approximate dynamic programming method rooted in Lagrangian relaxation, affine value function approximation, and constraint generation is applied to this weakly coupled MDP. This method is compared with a myopic scheduling heuristic on 1800 problem instances. These numerical experiments show that there was a statistically significant difference in the performance of the two methods in 77% of these instances. Of these statistically significant instances, the Lagrangian method outperformed the myopic method in 97% of instances. Chapter 2 focuses on a class of non-preemptive scheduling problems, where a decision-maker stochastically and dynamically receives requests to work on heterogeneous projects over discrete time. The projects are comprised of precedence-constrained tasks that require multiple resources with limited availabilities. Incomplete projects are held in virtual queues with finite capacities. When a queue is full, an arriving project must be rejected. The projects differ in their stochastic arrival patterns; completion rewards; rejection, waiting and operating costs; activity-on-node networks and task durations; queue capacities; and resource requirements.
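The Lagrangian relaxation idea behind Chapter 1's method can be made concrete with a rough sketch: dualizing the linking resource constraint with a multiplier lets each sub-MDP be solved independently by value iteration, and the penalized sub-values combine into an upper bound on the optimal value. Everything below (model data, discount factor, function names) is an illustrative toy, not the dissertation's actual formulation.

```python
def solve_sub_mdp(rewards, probs, resource_use, lam, gamma=0.9, iters=500):
    """Value iteration for one sub-MDP with Lagrangian-penalized rewards."""
    n_states = len(rewards)
    v = [0.0] * n_states
    for _ in range(iters):
        v = [max(rewards[s][a] - lam * resource_use[a]
                 + gamma * sum(p * v[s2] for s2, p in enumerate(probs[s][a]))
                 for a in range(len(rewards[s])))
             for s in range(n_states)]
    return v

def lagrangian_bound(sub_models, lam, budget, states, gamma=0.9):
    """Upper bound on the optimal value: lam*budget/(1-gamma) plus the
    sum of the independently solved, penalized sub-MDP values."""
    total = lam * budget / (1.0 - gamma)
    for (rewards, probs, resource_use), s in zip(sub_models, states):
        total += solve_sub_mdp(rewards, probs, resource_use, lam, gamma)[s]
    return total
```

Minimizing this bound over the multiplier, and fitting affine value function approximations via constraint generation, are the additional ingredients of the method the chapter applies.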
The decision-maker's goal is to choose which tasks to start in each time-slot to maximize the infinite-horizon discounted expected profit. The chapter provides a weakly coupled MDP formulation of such dynamic resource-constrained project scheduling problems (DRCPSPs). Unfortunately, existing mathematical programming-based approximate dynamic programming techniques (similar to those in Chapter 1) are computationally tedious for DRCPSPs owing to their exceedingly large scale and complex combinatorial structure. Therefore, the chapter applies a simulation-based policy iteration method that uses least-squares fitting to tune the parameters of a value function approximation. The performance of this method is numerically compared against a myopic scheduling heuristic on 480 randomly generated problem instances. These numerical experiments show that the difference between the two methods was statistically significant in about 60% of the instances. The approximate policy iteration method outperformed the myopic heuristic in 74% of these statistically significant instances. In Chapters 1 and 2, the decision-maker is assumed to know all parameters that describe the weakly coupled MDPs. Chapter 3 investigates an extension where the decision-maker only has imperfect information about the weakly coupled MDP. Rather than only focusing on weakly coupled MDPs that arise in specific applications as in Chapters 1 and 2, Chapter 3 works with general weakly coupled MDPs. Two different scenarios with imperfect information are studied. In the first case, the transition probabilities for each subproblem are unknown to the decision-maker. In particular, these transition probabilities are parameterized, and the decision-maker does not know the values of these parameters. The decision-maker begins with prior probabilistic beliefs about these parameters and updates these beliefs using Bayes' Theorem as the state evolution is observed.
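This belief-updating step can be sketched for one common special case: a Dirichlet prior over an unknown transition distribution, where observing a transition simply increments a pseudo-count. The dissertation considers general parameterized transition probabilities; the conjugate Dirichlet choice here is an illustrative assumption only.

```python
def update_dirichlet(alpha, observed_next_state):
    """Bayes update: observing a transition to state j increments alpha[j]."""
    posterior = list(alpha)
    posterior[observed_next_state] += 1
    return posterior

def posterior_mean(alpha):
    """Point estimate of the transition distribution under the current belief,
    as used by certainty-equivalent-style control."""
    total = sum(alpha)
    return [a / total for a in alpha]
```

A certainty-equivalent controller would plug the posterior mean into the MDP as if it were the true model; the sampling-based alternatives discussed next instead draw parameters from the posterior.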
This yields a Bayes-adaptive weakly coupled MDP formulation whose exact solution is intractable. Computationally tractable approximate dynamic programming methods that combine semi-stochastic certainty equivalent control or Thompson sampling with Lagrangian relaxation are proposed. These ideas are applied to a class of dynamic stochastic resource allocation problems and numerical results are presented. In the second case, the decision-maker cannot observe the actual state of the system, but only receives a noisy signal about it. The decision-maker thus needs to probabilistically infer the actual state. This yields a partially observable weakly coupled MDP formulation whose exact solution is also intractable. Computationally tractable approximate dynamic programming methods rooted in semi-stochastic certainty equivalent control and Thompson sampling are again proposed. These ideas are applied to a restless multi-armed bandit problem and numerical results are presented. Chapter 4 investigates a class of sequential auction design problems under imperfect information. There, the resource corresponds to the seller's inventory on hand, which is to be allocated to dynamically and stochastically arriving buyers' requests (bids). In particular, the seller needs to decide lot-sizes in a sequential, multi-unit auction setting, where bidder demand and bid distributions are not known in their entirety. The chapter formulates a Bayes-adaptive MDP to study a profit maximization problem in this scenario. The number of bidders is Poisson distributed with a Gamma prior on its mean, and the bid distribution is categorical with a Dirichlet prior. The seller updates these beliefs using data collected over auctions while simultaneously making lot-sizing decisions until all inventory is depleted. Exact solution of this Bayes-adaptive MDP is intractable.
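Both of the seller's belief updates are conjugate and therefore have closed forms, which a short sketch can make concrete: the Gamma prior on the Poisson arrival mean absorbs observed bidder counts, and the Dirichlet prior on the categorical bid distribution absorbs observed bid levels. Parameter names below are generic, not the dissertation's notation.

```python
def update_gamma_poisson(shape, rate, bidder_counts):
    """Gamma posterior on a Poisson mean after observing counts:
    shape gains the total count, rate gains the number of observations."""
    return shape + sum(bidder_counts), rate + len(bidder_counts)

def update_dirichlet_bids(alpha, observed_bids):
    """Dirichlet posterior on a categorical bid distribution:
    each observed bid level increments its pseudo-count."""
    posterior = list(alpha)
    for b in observed_bids:
        posterior[b] += 1
    return posterior
```

The posterior mean of the arrival rate is then shape/rate, so each auction's data sharpens the seller's demand estimate while lot-sizing decisions are being made.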
The chapter proposes three approximation methods (semi-stochastic certainty equivalent control, knowledge gradient, and Thompson sampling) and compares them via numerical experiments.
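Of the three methods, Thompson sampling has a particularly compact generic form: sample a model from the current posterior, act optimally for the sampled model, observe the outcome, and update the belief. The sketch below illustrates this loop on a two-action toy with Beta beliefs over Bernoulli rewards; it is a stand-in for the auction and weakly coupled settings above, not the dissertation's implementation.

```python
import random

def thompson_step(beliefs, true_probs, rng):
    """One Thompson-sampling step: draw from each action's Beta posterior,
    play the action with the largest draw, then update its pseudo-counts."""
    samples = [rng.betavariate(a, b) for a, b in beliefs]
    arm = max(range(len(beliefs)), key=samples.__getitem__)
    reward = 1 if rng.random() < true_probs[arm] else 0
    a, b = beliefs[arm]
    beliefs[arm] = (a + reward, b + 1 - reward)
    return arm, reward

rng = random.Random(0)
beliefs = [(1, 1), (1, 1)]          # uniform Beta priors on two actions
for _ in range(2000):
    thompson_step(beliefs, [0.3, 0.7], rng)
```

Because actions are drawn in proportion to their posterior probability of being best, the loop concentrates play on the better action while still occasionally exploring, which is the behavior the chapter's numerical comparisons probe.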