A central concern of AutoML systems is how to discover the best pipeline configuration for a given task in the shortest amount of time. Recent approaches tackle this problem by learning a model that relates the configuration space to the objective being optimized. However, relying on such a model poses two difficulties. First, both pipelines and datasets have to be represented with meta-features. Second, the results depend strongly on the chosen model and its hyperparameters. In this paper, we present a simple yet effective model-free reinforcement learning approach based on an adaptation of the Monte Carlo tree search (MCTS) algorithm for trees and context-free grammars. We run experiments on the OpenML-CC18 benchmark suite and show superior performance compared to the state of the art.
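To make the idea of tree search over a grammar concrete, the following is a minimal illustrative sketch of standard UCT-style MCTS applied to a toy context-free grammar of pipelines. It is not the adaptation proposed in this paper: the grammar, the component names, the exploration constant, and the `toy_evaluate` scoring function are all hypothetical stand-ins, and in a real AutoML setting the evaluation step would train and validate the pipeline on a dataset.

```python
import math
import random

# Hypothetical toy grammar (an assumption for illustration only):
# each nonterminal maps to a list of productions.
GRAMMAR = {
    "PIPELINE": [["SCALER", "MODEL"], ["MODEL"]],
    "SCALER": [["standard_scaler"], ["minmax_scaler"]],
    "MODEL": [["decision_tree"], ["logistic_regression"], ["knn"]],
}


def actions(state):
    """Productions available for the leftmost nonterminal, or [] if none."""
    for sym in state:
        if sym in GRAMMAR:
            return GRAMMAR[sym]
    return []


def expand_leftmost(state, production):
    """Rewrite the leftmost nonterminal in `state` using `production`."""
    for i, sym in enumerate(state):
        if sym in GRAMMAR:
            return state[:i] + tuple(production) + state[i + 1:]
    return state


def is_terminal(state):
    return not any(sym in GRAMMAR for sym in state)


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.untried = list(actions(state))  # productions not yet expanded
        self.visits = 0
        self.total_reward = 0.0


def uct_select(node, c=1.4):
    """Pick the child maximizing mean reward plus an exploration bonus."""
    return max(
        node.children,
        key=lambda ch: ch.total_reward / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(state, rng):
    """Complete a partial derivation with uniformly random productions."""
    while not is_terminal(state):
        state = expand_leftmost(state, rng.choice(actions(state)))
    return state


def mcts(evaluate, iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node(("PIPELINE",))
    best_reward, best_pipeline = float("-inf"), None
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_select(node)
        # Expansion: apply one untried production.
        if node.untried:
            production = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(expand_leftmost(node.state, production), parent=node)
            node.children.append(child)
            node = child
        # Simulation: finish the derivation and score the pipeline.
        pipeline = rollout(node.state, rng)
        reward = evaluate(pipeline)
        if reward > best_reward:
            best_reward, best_pipeline = reward, pipeline
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_reward, best_pipeline


def toy_evaluate(pipeline):
    """Hypothetical stand-in for training and validating a pipeline."""
    base = {"decision_tree": 0.70, "logistic_regression": 0.80, "knn": 0.75}
    bonus = 0.05 if "standard_scaler" in pipeline else 0.0
    return base[pipeline[-1]] + bonus


if __name__ == "__main__":
    print(mcts(toy_evaluate))
```

Note that the search only needs a scalar reward for each completed pipeline. No meta-features of pipelines or datasets and no surrogate model are involved, which is the sense in which such a search is model-free.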