# Variable-order Markov model

﻿
Variable-order Markov model

Variable-order Markov (VOM) models are an important class of models that extend the well known Markov chain models. In contrast to the Markov chain models, where each random variable in a sequence with a Markov property depends on a fixed number of random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization.

This realization sequence is often called the "context"; therefore the VOM models are also called "context trees" cite journal|last = Rissanen|first = J.|title = A Universal Data Compression System|journal = IEEE Transactions on Information Theory|volume = 29|issue = 5|date = Sep 1983|pages = 656–664|url = http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?isnumber=22734&arnumber=1056741|doi = 10.1109/TIT.1983.1056741] . The flexibility in the number of conditioning random variables turns out to be of real advantage for many applications, such as statistical analysis, classification and prediction.cite journal|last = Shmilovici|first = A.|coauthors = Ben-Gal, I.|title = Using a VOM Model for Reconstructing Potential Coding Regions in EST Sequences|journal = Computational Statistics|volume = 22|issue = 1|date = 2007|pages = 49–69|url=http://www.springerlink.com/content/a447865604519210/|doi = 10.1007/s00180-007-0021-8] cite journal|last = Begleiter|first = R.|coauthors = El-Yaniv, R. and Yona, G.|title = On Prediction Using Variable Order Markov Models|journal = Journal of Artificial Intelligence Research|volume = 22|date = 2004|pages = 385–421|url = http://www.jair.org/media/1491/live-1491-2335-jair.pdf] cite journal|last = Ben-Gal|first = I.|coauthors = Morag, G. and Shmilovici, A.|title = CSPC: A Monitoring Procedure for State Dependent Processes|journal = Technometrics|volume = 45|issue = 4|date = 2003|pages = 293–311|url = http://www.eng.tau.ac.il/~bengal/Technometrics_final.pdf|doi = 10.1198/004017003000000122]

Example

Consider for example a sequence of random variables, each of which takes a value from the ternary alphabet {"a", "b", "c"}. Specifically, consider the string "aaabcaaabcaaabcaaabc...aaabc" constructed from infinite concatenations of the sub-string "aaabc".

The VOM model of maximal order 2 can approximate the above string using "only" the following five conditional probability components: {Pr("a"|"aa") = 0.5, Pr("b"|"aa") = 0.5, Pr("c"|"b") = 1.0, Pr("a"|"c")= 1.0, Pr("a"|"ca")= 1.0}.

In this example, Pr("c"|"ab") = Pr("c"|"b") = 1.0; therefore, the shorter context "b" is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can generate the string exactly using only five conditional probability components, which are all equal to 1.0. To construct the Markov chain of order 1 for the next character in that string, one must estimate the following 9 conditional probability components: {Pr("a"|"a"), Pr("a"|"b"), Pr("a"|"c"), Pr("b"|"a"), Pr("b"|"b"), Pr("b"|"c"), Pr("c"|"a"), Pr("c"|"b"), Pr("c"|"c")}. To construct the Markov chain of order 2 for the next character, one must estimate 27 conditional probability components: {Pr("a"|"aa"), Pr("a"|"ab"), ..., Pr("c"|"cc")}. And to construct the Markov chain of order three for the next character one must estimate the following 81 conditional probability components: {Pr("a"|"aaa"), Pr("a"|"aab"), ..., Pr("c"|"ccc")}.

In practical settings there is seldom sufficient data to accurately estimate the exponentially increasing number of conditional probability components as the order of the Markov chain increases.

The variable-order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent from the future states; accordingly, "a great reduction in the number of model parameters can be achieved."

Definition

Let $A$ be a state space (finite alphabet) of size |A|.

Consider a sequence with the Markov property $x_1^\left\{n\right\}=x_1x_2dots x_n$ of $n$ realizations of random variables, where $x_iin A$ is the state (symbol) at position $i$ 1≤$i$$n$, and the concatenation of states $x_i$ and $x_\left\{i+1\right\}$ is denoted by $x_ix_\left\{i+1\right\}$.

Given a training set of observed states, $x_1^\left\{n\right\}$, the construction algorithm of the VOM models learns a model $P$ that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states. Specifically, the learner generates a conditional probability distribution $P\left(x_i|s\right)$ for a symbol $x_i in A$ given a context $sin A^*$, where the * sign represents a sequence of states of any length, including the empty context.

VOM models attempt to estimate conditional distributions of the form $P\left(x_i|s\right)$ where the context length |$s$|≤$D$ varies depending on the available statistics. In contrast, conventional Markov models attempt to estimate these conditional distributions by assuming a fixed contexts' length |$s$|=$D$ and, hence, can be considered as special cases of the VOM models.

Effectively, for a given training sequence, the VOM models are found to obtain better model parameterization than the fixed-order Markov models that leads to a better variance-bias tradeoff of the learned models.

Application areas

Various efficient algorithms have been devised for estimating the parameters of the VOM model.

VOM models have been successfully applied to areas such as machine learning, information theory and bioinformatics, including specific applications such as coding and data compression, document compression, classification and identification of DNA and protein sequences, statistical process control, spam filteringcite journal|last = Bratko|first = A.|coauthors = Cormack, G. V., Filipic, B., Lynam, T. and Zupan, B.|title = Spam Filtering Using Statistical Data Compression Models|journal = Journal of Machine Learning Research|volume = 7|date = 2006|pages = 2673–2698|url = http://www.jmlr.org/papers/volume7/bratko06a/bratko06a.pdf] and others.

ee also

* Markov chain
* Examples of Markov chains
* Variable order Bayesian network
* Markov process
* Markov chain Monte Carlo
* Semi-Markov process
* Bioinformatics
* Machine learning
* Artificial intelligence

References

Wikimedia Foundation. 2010.

### Look at other dictionaries:

• Markov model — In probability theory, a Markov model is a stochastic model that assumes the Markov property. Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. Contents 1 Introduction 2 Markov chain… …   Wikipedia

• Variable-order Bayesian network — (VOBN) models provide an important extension of both the Bayesian network models and the variable order Markov models. VOBN models are used in machine learning in general and have shown great potential in bioinformatics applications.cite… …   Wikipedia

• Hidden Markov model — Probabilistic parameters of a hidden Markov model (example) x mdash; states y mdash; possible observations a mdash; state transition probabilities b mdash; output probabilitiesA hidden Markov model (HMM) is a statistical model in which the system …   Wikipedia

• Markov chain — A simple two state Markov chain. A Markov chain, named for Andrey Markov, is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process characterized …   Wikipedia

• Markov process — In probability theory and statistics, a Markov process, named after the Russian mathematician Andrey Markov, is a time varying random phenomenon for which a specific property (the Markov property) holds. In a common description, a stochastic… …   Wikipedia

• Model selection — is the task of selecting a statistical model from a set of candidate models, given data. In the simplest cases, a pre existing set of data is considered. However, the task can also involve the design of experiments such that the data collected is …   Wikipedia

• Graphical model — In probability theory, statistics, and machine learning, a graphical model (GM) is a graph that represents independencies among random variables by a graph in which each node is a random variable, and the missing edges between the nodes represent …   Wikipedia

• Continuous-time Markov process — In probability theory, a continuous time Markov process is a stochastic process { X(t) : t ≥ 0 } that satisfies the Markov property and takes values from a set called the state space; it is the continuous time version of a Markov chain. The… …   Wikipedia

• Semi-Markov process — A continuous time stochastic process is called a semi Markov process or Markov renewal process if the embedded jump chain (the discrete process registering what values the process takes) is a Markov chain, and where the holding times (time… …   Wikipedia

• Mixture model — See also: Mixture distribution In statistics, a mixture model is a probabilistic model for representing the presence of sub populations within an overall population, without requiring that an observed data set should identify the sub population… …   Wikipedia