Spotlight
Entropy Rate Estimation for Markov Chains with Large State Space
Yanjun Han · Jiantao Jiao · Chuan-Zheng Lee · Tsachy Weissman · Yihong Wu · Tiancheng Yu
Entropy estimation is one of the prototypical problems in distribution property testing. To consistently estimate the Shannon entropy of a distribution on $S$ elements with independent samples, the optimal sample complexity scales sublinearly with $S$ as $\Theta(\frac{S}{\log S})$, as shown by Valiant and Valiant (2011). Extending the theory and algorithms for entropy estimation to dependent data, this paper considers the problem of estimating the entropy rate of a stationary reversible Markov chain with $S$ states from a sample path of $n$ observations. We show that:
- Provided the Markov chain does not mix too slowly, i.e., the relaxation time is at most $O(\frac{S}{\ln^3 S})$, consistent estimation is achievable when $n \gg \frac{S^2}{\log S}$.
- Provided the Markov chain exhibits at least slight dependency, i.e., the relaxation time is at least $1+\Omega(\frac{\ln^2 S}{\sqrt{S}})$, consistent estimation is impossible when $n \lesssim \frac{S^2}{\log S}$.
Under both assumptions, the optimal estimation accuracy is shown to be $\Theta(\frac{S^2}{n \log S})$. In comparison, the empirical entropy rate requires at least $\Omega(S^2)$ samples to be consistent, even when the Markov chain is memoryless. In addition to synthetic experiments, we apply the estimators that achieve the optimal sample complexity to estimate the entropy rate of the English language in the Penn Treebank and the Google One Billion Words corpora; this provides a natural benchmark for language modeling and relates it directly to the widely used perplexity measure.
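For reference, the entropy rate of a stationary Markov chain with transition matrix $T$ and stationary distribution $\pi$ is $\bar{H} = \sum_i \pi_i \sum_j T_{ij} \log \frac{1}{T_{ij}}$. The sketch below implements the empirical (plug-in) baseline discussed in the abstract, which estimates $\pi$ and $T$ by transition counts along the sample path; the function name and interface are illustrative assumptions, not code from the paper.

import numpy as np

def empirical_entropy_rate(path, S):
    # Plug-in estimate: count transitions, normalize rows, and evaluate
    # the entropy-rate formula at the empirical (pi_hat, T_hat).
    # NOTE: illustrative sketch, not the paper's optimal estimator.
    counts = np.zeros((S, S))
    for a, b in zip(path[:-1], path[1:]):
        counts[a, b] += 1.0
    row_sums = counts.sum(axis=1)
    pi_hat = row_sums / row_sums.sum()        # empirical state frequencies
    rate = 0.0
    for i in range(S):
        if row_sums[i] == 0:
            continue                          # unvisited state: no contribution
        T_i = counts[i] / row_sums[i]         # empirical transition row for state i
        nz = T_i > 0
        rate -= pi_hat[i] * np.sum(T_i[nz] * np.log(T_i[nz]))
    return rate                               # in nats; divide by log(2) for bits

# Sanity check on a memoryless chain: i.i.d. uniform on 4 states has
# entropy rate log(4) ≈ 1.386 nats.
rng = np.random.default_rng(0)
path = rng.integers(0, 4, size=100_000)
print(empirical_entropy_rate(path, S=4))

As the abstract notes, this plug-in estimator needs at least $\Omega(S^2)$ samples to be consistent even for memoryless chains, whereas the paper's estimators attain consistency once $n \gg \frac{S^2}{\log S}$.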
Author Information
Yanjun Han (Stanford University)
Jiantao Jiao (University of California, Berkeley)
Chuan-Zheng Lee (Stanford University)
Tsachy Weissman (Stanford University)
Yihong Wu (Yale University)
Tiancheng Yu (Tsinghua University)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: Entropy Rate Estimation for Markov Chains with Large State Space
  Wed. Dec 5th through Thu. Dec 6th, Room 210 #82
More from the Same Authors
- 2021 Poster: Optimal prediction of Markov chains with and without spectral gap
  Yanjun Han · Soham Jana · Yihong Wu
- 2021 Poster: On the Value of Interaction and Function Approximation in Imitation Learning
  Nived Rajaraman · Yanjun Han · Lin Yang · Jingbo Liu · Jiantao Jiao · Kannan Ramchandran
- 2020 Poster: Minimax Optimal Nonparametric Estimation of Heterogeneous Treatment Effects
  Zijun Gao · Yanjun Han
- 2020 Spotlight: Minimax Optimal Nonparametric Estimation of Heterogeneous Treatment Effects
  Zijun Gao · Yanjun Han
- 2019 Workshop: Information Theory and Machine Learning
  Shengjia Zhao · Jiaming Song · Yanjun Han · Kristy Choi · Pratyusha Kalluri · Ben Poole · Alexandros Dimakis · Jiantao Jiao · Tsachy Weissman · Stefano Ermon
- 2019 Poster: Batched Multi-armed Bandits Problem
  Zijun Gao · Yanjun Han · Zhimei Ren · Zhengqing Zhou
- 2019 Oral: Batched Multi-armed Bandits Problem
  Zijun Gao · Yanjun Han · Zhimei Ren · Zhengqing Zhou
- 2018 Poster: The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal
  Jiantao Jiao · Weihao Gao · Yanjun Han
- 2018 Spotlight: The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal
  Jiantao Jiao · Weihao Gao · Yanjun Han
- 2018 Poster: Data Amplification: A Unified and Competitive Approach to Property Estimation
  Yi Hao · Alon Orlitsky · Ananda Theertha Suresh · Yihong Wu