We propose a framework for learning single-stage optimized policies in a way that is fair with respect to membership in a sensitive group. Unlike the fair prediction setting, we aim to design an unseen, future policy that reduces disparities while maximizing utility. Unlike other work on fair policies, we take a pragmatic view: we ask what is the best we can do with the action space available to us, without relying on counterfactuals of the protected attributes or planning for an idealized 'fair world'. Specifically, we examine two scenarios: one in which it is not possible or necessary to reduce historical disparities among groups, but we can avoid increasing them when introducing a new policy; and one in which it is possible to reduce disparities while also maximizing outcomes. We formulate disparity control in these two scenarios, respectively, as avoiding differences in individual effects between the new and the old policy, and as smoothing out differences in expected outcomes across the space of sensitive attributes. We propose two policy design methods that leverage observational data under causal assumptions, and we illustrate their use in experiments with semi-synthetic models.