Recent work has shown the considerable promise of neural networks for program synthesis from input-output examples. In this paper, we propose Transformer-based architectures as a baseline for the program synthesis task on the Karel dataset. Specifically, we use DistillGPT as our decoder model to synthesize programs in the Karel DSL. We show that by changing the model architecture from an LSTM to a Transformer-based architecture, we significantly improve on supervised learning approaches, obtaining a top-1 generalization accuracy of 82.4%. Further, applying execution-guided search to the output beams of the model increases the accuracy of our approach to 89.64%.
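The idea behind execution-guided search over output beams can be illustrated with a minimal sketch. This is not the procedure used in this paper, whose details are not given here. It assumes the simplest variant: run each beam candidate on the given input-output examples and keep the highest-ranked one that reproduces all of them. The toy integer DSL (`TOY_OPS`) and the names `run_program` and `execution_guided_select` are hypothetical stand-ins for a Karel interpreter and the real search.

```python
from typing import Optional, Sequence, Tuple

# An I/O example is an (input, expected_output) pair.
# Integers stand in for Karel grid states (an assumption for illustration).
Example = Tuple[int, int]

# Hypothetical toy DSL: each token maps an integer state to a new state.
TOY_OPS = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "double": lambda x: x * 2,
}


def run_program(tokens: Sequence[str], state: int) -> Optional[int]:
    """Execute a toy program; return None if it contains an unknown token."""
    for tok in tokens:
        op = TOY_OPS.get(tok)
        if op is None:
            return None  # treat malformed programs as execution failures
        state = op(state)
    return state


def execution_guided_select(
    beams: Sequence[Sequence[str]], examples: Sequence[Example]
) -> Sequence[str]:
    """Pick the first candidate, in model-ranked order, that is consistent
    with every example. Falls back to the top-1 candidate if none is.
    Assumes `beams` is non-empty."""
    for candidate in beams:
        if all(run_program(candidate, inp) == out for inp, out in examples):
            return candidate
    return beams[0]


# Candidates as a decoder might rank them, best first (made-up data).
beams = [["inc", "inc"], ["double"], ["inc", "double"]]
examples = [(1, 2), (3, 6)]
chosen = execution_guided_select(beams, examples)
print(chosen)
```

In this example the top-1 candidate fails the first example, so the second-ranked candidate is returned instead. This is how filtering beams by execution can raise accuracy above top-1 decoding.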