Evaluating Reinforcement Learning Approaches for Supply Chain Optimization: A Comparative Study of Approximate SARSA and REINFORCE Algorithms
Abstract
This paper presents a comparative analysis of reinforcement learning (RL) algorithms applied to supply chain optimization problems. Specifically, I implement and evaluate three agents within a supply chain environment: an approximate SARSA agent, a REINFORCE agent, and a baseline heuristic based on the (s, Q)-policy. The supply chain model under consideration comprises a factory and multiple warehouses, with the goal of determining production and inventory levels that best manage seasonal demand fluctuations. I address the curse of dimensionality by employing function approximation techniques and policy search methods. The results demonstrate that while both approximate SARSA and REINFORCE outperform the baseline heuristic, REINFORCE with a linear feature mapping consistently yields superior performance, particularly in complex scenarios with multiple warehouses. The approximate SARSA approach, though effective in simpler environments, exhibits limitations in more intricate setups. These findings suggest that feature engineering plays a critical role in the effectiveness of RL algorithms for supply chain optimization, and future work may benefit from exploring deep learning techniques to capture complex, non-linear dependencies in supply chain dynamics.