Modern large-scale computing deployments consist of complex applications running over machine clusters. An important issue in such deployments is elasticity, i.e., the dynamic allocation of resources to applications to meet fluctuating workload demands. Threshold-based approaches are typically employed, yet they are difficult to calibrate and optimize. Approaches based on reinforcement learning (RL) have been proposed, but they require a large number of states in order to model complex application behavior. Methods that adaptively partition the state space have been proposed, but their partitioning criteria and strategies are sub-optimal. In this work we present MDP DT, a novel full-model-based reinforcement learning algorithm for elastic resource management that employs adaptive state space partitioning. We propose two novel statistical criteria and three strategies and we experimentally
show that they correctly decide both where and when to partition, outperforming existing approaches. We evaluate MDP DT in a real large-scale cluster over variable, previously unseen workloads and show that it makes more informed decisions than static, model-free and threshold-based approaches, while requiring a minimal amount of training data. The experiments further show that this adaptation enables MDP DT to optimize the achieved profit while being 40% cheaper than calibrated RL and threshold-based approaches.