Manual Simulation of Markov Chain in R

Manual simulation of Markov Chain in R

Let

alpha <- c(1, 1) / 2
mat <- matrix(c(1 / 2, 0, 1 / 2, 1), nrow = 2, ncol = 2) # Different than yours

be the initial distribution and the transition matrix. Your func2 only finds n-th step distribution, which isn't needed, and doesn't simulate anything. Instead we may use

chainSim <- function(alpha, mat, n) {
  out <- numeric(n)
  out[1] <- sample(1:2, 1, prob = alpha)
  for(i in 2:n)
    out[i] <- sample(1:2, 1, prob = mat[out[i - 1], ])
  out
}

where out[1] is generated using only the initial distribution and then for subsequent terms we use the transition matrix.

Then we have

set.seed(1)
# Doing once
chainSim(alpha, mat, 1 + 5)
# [1] 2 2 2 2 2 2

so that the chain initiated at 2 and got stuck there due to the specified transition probabilities.

Doing it for 100 times we have

# Doing 100 times
sim <- replicate(chainSim(alpha, mat, 1 + 5), n = 100)
rowMeans(sim - 1)
# [1] 0.52 0.78 0.87 0.94 0.99 1.00

where the last line shows how often we ended up in state 2 rather than 1. That gives one (out of many) reasons why 100 repetitions are more informative: we got stuck at state 2 doing just a single simulation, while repeating it for 100 times we explored more possible paths.

Then the conditional probability can be found with

mean(sim[2, sim[1, ] == 1] == 1)
# [1] 0.4583333

while the true probability is 0.5 (given by the upper left entry of the transition matrix).

Manual simulation of Markov Chain in R (2)

There are two issues:

Transposed Matrix

If you check the matrix you have input, it is the transpose of what you wanted:

> mat
     [,1] [,2]
[1,]  0.5    0
[2,]  0.5    1

So, change that.

States are not Chained

In the step function, the returned state is not used to initiate the subsequent state. Instead, X0 just keeps getting passed in repeatedly:

for (i in 1:n1) 
{
    X <- nextX(X0, mat1)
    vec[i] <- X
}

Honestly, you don't need X0 at all. Just change all the X0s in the step function to X and it should work.

Manual simulation of Markov Chain in R (3)

I didn't inspect the rest of your code, but it seems that only prob has a mistake; you are mixing up rows with columns and instead it should be

prob <- function(simMat, fromStep, toStep, fromState, toState)
  mean(simMat[simMat[, fromStep + 1] == fromState, toStep + 1] == toState)

Then NaN still remains a valid possibility for the following reason. We are looking at a conditional probability P(X₁=1|X₀=1) which, by definition, is well defined only when P(X₀=1)>0. The same holds with sample estimates: if there are no cases where X₀=1, then the "denominator" in the mean inside of prob is zero. Thus, it cannot and should not be fixed (i.e., returning 0 in those cases would be wrong).

R library for discrete Markov chain simulation

A while back I wrote a set of functions for simulation and estimation of Discrete Markov Chain probability matrices: http://www.feferraz.net/files/lista/DTMC.R.

Relevant code for what you're asking:

simula <- function(trans,N) {
        transita <- function(char,trans) {
                sample(colnames(trans),1,prob=trans[char,])
        }

 sim <- character(N)
 sim[1] <- sample(colnames(trans),1)
 for (i in 2:N) {
  sim[i] <- transita(sim[i-1],trans)
 }

 sim
}

#example
#Obs: works for N >= 2 only. For higher order matrices just define an
#appropriate mattrans
mattrans <- matrix(c(0.97,0.03,0.01,0.99),ncol=2,byrow=TRUE)
colnames(mattrans) <- c('0','1')
row.names(mattrans) <- c('0','1')
instancia <- simula(mattrans,255) # simulates 255 steps in the process

Simulate Markov Chain using R

I am not clear what you are exactly looking for. If you are trying to obtain the transition matrix after 10 and 100 times, you can keep track of the evolution progress via Reduce, where option accumulate should be set to TRUE

P10 <- Reduce(`%*%`,replicate(10,P,simplify = FALSE),accumulate = TRUE)
P100 <- Reduce(`%*%`,replicate(100,P,simplify = FALSE),accumulate = TRUE)

markov chain simulation using R

The problem is your calculation of P1^k. In R, using ^ takes the power element by element, not through matrix multiplication. To calculate the matrix power of P1, you need to multiply it out, or use the %^% operator from the expm package. (There may be other implementations of this operation in other packages, too.)

So write your loop like this:

# Calculate probabilities for 50 steps.
P1tok <- diag(4)   # the identity matrix
for(k in 1:50){
  P1tok <- P1tok %*% P1     # multiply by another P1
  nsteps <- initState*P1tok
  one[k] <- nsteps[1,1]
  two[k] <- nsteps[1,2]
  three[k] <- nsteps[1,3]
  four[k] <- nsteps[1,4]
}

Assigning Unique Colors To Multiple Lines on a Graph

You could use a palette, e.g. built in rainbow. However 100 colors may not be very distinguishable.

clr <- rainbow(num.chains)  ## create `num.chains` colors
matplot(chain.states, type='l', lty=1, col=clr, ylim=c(0, 9), 
        ylab='state', xlab='time')
abline(h=1, lty=3)
abline(h=8, lty=3)

Markov Chain simulation, calculating limit distribution

Your function correctly stores a single Markov chain realization of length Nsim to x, but then prop1, ..., prop4 aren't really proportions of ones, ..., fours; they seem to be more related to the expected value in this whole chain. You also overestimate the number of fours, but @StéphaneLaurent's answer deals with that too.

Then, once fixed, your approach with a very large Nsim works because starting from, say, step 30 we are already close to the stationary distribution, and while the initial 30 values are "noisy", they become negligible with a large Nsim.

An alternative approach would be to focus on P^k for some large and fixed k, which should be less efficient, but probably more intuitive. In particular, in that case we simulate many (for the law of large number to work) realizations of relatively long (as for something close to the limiting distribution to be at work) Markov chains. Also, the simulation can be written much more compactly. In particular, consider a generalization of my other answer:

chainSim <- function(alpha, mat, n) {
  out <- numeric(n)
  out[1] <- sample(1:ncol(mat), 1, prob = alpha)
  for(i in 2:n)
    out[i] <- sample(1:ncol(mat), 1, prob = mat[out[i - 1], ])
  out
}

Now let's simulate 30000 chains of length 30, again starting from state 1, as in your case. This gives (see also here)

set.seed(1)
k <- 30
n <- 30000
table(replicate(chainSim(c(1, 0, 0, 0), M, k), n = n)[k, ]) / n
#         1         2         3         4 
# 0.1557333 0.3442333 0.3490333 0.1510000

where

M
#       [,1]  [,2]  [,3]  [,4]
# [1,] 0.180 0.274 0.426 0.120
# [2,] 0.171 0.368 0.274 0.188
# [3,] 0.161 0.339 0.375 0.125
# [4,] 0.079 0.355 0.384 0.182

with

M <- structure(c(0.18, 0.171, 0.161, 0.079, 0.274, 0.368, 0.339, 0.355, 
                 0.426, 0.274, 0.375, 0.384, 0.12, 0.188, 0.125, 0.182), .Dim = c(4L, 4L))

In this way we approximate the stationary distribution using n observations of a state in the k-th step.