User modeling is central to personalization services. Many existing methods treat users as sequences of interacted items in order to encode item transition patterns. Another line of research models users as graphs in which interacted items are nodes and item-item relations are edges. Graph-based user modeling offers more flexibility to encode complex item relationships of different types (e.g., co-occurrence, similarity) but usually overlooks sequential patterns. Here, we introduce a novel user representation, the Heterogeneous User Graph (HUG), which unifies sequence-based and graph-based user modeling to take advantage of both. A HUG has two types of edges: sequential edges that preserve the order of interactions, and collaborative edges that encode different item-item relationships. To learn users' latent representations from their HUGs, we propose a multi-head attention-based architecture called the Heterogeneous User Graph Transformer (HUGT). HUGT builds on SASRec and concurrently captures sequential transition patterns and complex graph topology. We conduct experiments on four real-world datasets from three application domains. Experimental results show that (1) jointly modeling users as sequences and graphs with HUG yields better recommendation performance than sequence-only or graph-only user modeling; (2) HUGT is effective in learning user latent representations from HUGs; and (3) HUGT outperforms the baselines by up to 10\% on datasets with long sequences and is on par with state-of-the-art performance on datasets with short sequences.
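
To make the two edge types concrete, the following is a minimal sketch of how a HUG might be assembled from one user's interaction sequence. It is an illustration under stated assumptions, not the construction used in the paper: the names `HeterogeneousUserGraph` and `build_hug`, the representation of item-item relations as a dictionary of unordered item pairs, and the toy data are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class HeterogeneousUserGraph:
    """Hypothetical container for one user's graph (not the paper's data structure)."""
    nodes: list                                            # unique items, in order of first interaction
    sequential_edges: set = field(default_factory=set)     # directed (previous_item, next_item)
    collaborative_edges: set = field(default_factory=set)  # (item_a, item_b, relation_type)


def build_hug(sequence, relations):
    """Build a graph from a chronological item sequence.

    sequence:  list of item ids in interaction order.
    relations: dict mapping a relation type (e.g. "similarity") to a set of
               frozenset({a, b}) item pairs. This input format is an assumption.
    """
    # dict.fromkeys removes repeats while keeping first-seen order.
    nodes = list(dict.fromkeys(sequence))
    graph = HeterogeneousUserGraph(nodes=nodes)

    # Sequential edges: link each interaction to the one that follows it.
    for src, dst in zip(sequence, sequence[1:]):
        graph.sequential_edges.add((src, dst))

    # Collaborative edges: link any two of the user's items that are related
    # under at least one relation type, keeping the type as an edge label.
    for a, b in combinations(nodes, 2):
        for rel_type, pairs in relations.items():
            if frozenset((a, b)) in pairs:
                graph.collaborative_edges.add((a, b, rel_type))

    return graph


# Toy example with made-up items and relations.
sequence = ["shoes", "socks", "laces", "shoes"]
relations = {
    "co-occurrence": {frozenset(("shoes", "socks"))},
    "similarity": {frozenset(("socks", "laces"))},
}
hug = build_hug(sequence, relations)
print(hug.nodes)
print(sorted(hug.sequential_edges))
print(sorted(hug.collaborative_edges))
```

In this sketch the sequential edges are directed and the collaborative edges are undirected and labeled by relation type. Both are simplifying choices made for illustration only.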