# Introduction

In *A Mathematical Theory of Evidence*, Glenn Shafer shows how Dempster’s rule of combination generalizes Bayesian conditioning. In this document we investigate numerically how a simple Bayesian model can be encoded in the language of belief functions.

Recall the Bayes Rule of conditioning in simple terms:

$$P(H|E) = \dfrac{P(H) \cdot P(E|H)}{P(E)}$$

Let’s see how this translates into the belief-functions setup.
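For reference, here is a minimal numeric sketch of Bayes conditioning itself, in plain Python, using the prior and conditioning event that appear in the example below (the numbers come from this document; nothing here is package code):

```python
# Plain Bayes conditioning with the numbers used throughout this document.
prior = {"a": 0.2, "b": 0.3, "c": 0.5}   # P(H)
E = {"b", "c"}                            # conditioning event

# P(E|H) is an indicator here: 1 if the outcome lies in E, 0 otherwise,
# so P(E) is just the prior mass of E.
P_E = sum(p for h, p in prior.items() if h in E)             # P(E) = 0.8
posterior = {h: p / P_E for h, p in prior.items() if h in E}
# posterior ~ {'b': 0.375, 'c': 0.625}
```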

# 1. Simple Bayes Example

In particular, a Bayesian belief function concentrates its masses on singletons only, unlike more general basic mass assignments. For instance, on a frame $$\Theta=\{a,b,c\}$$, the basic mass assignment $$m(\{a\})=0.2$$, $$m(\{b\})=0.3$$ and $$m(\{c\})=0.5$$ defines a Bayesian belief function.

In the Bayesian language, this is the prior distribution $$P(H)$$. Function bca is used to set the distribution of H.

## The prior distribution H
##   H specnb mass
## 1 a      1  0.2
## 2 b      2  0.3
## 3 c      3  0.5

The law of conditional probability is the special case of Dempster’s rule of combination where all the mass is focused on the conditioning event. For instance, a basic mass assignment can focus all the mass on the subset $$E =\{b,c\}$$. Hence, using function bca, we set $$m(\{b,c\})=1$$.

## Setting an Event E = {b,c} with mass = 1
##   Event specnb mass
## 1 b + c      4    1

Bayes_Rule.R

Now we set the computation of Bayes’s Theorem in motion.

As a first step, we use function dsrwon to combine our two basic mass assignments, H and Event. The non-normalized Dempster’s rule of combination gives a mass distribution H_Event composed of two parts:

1. the distribution of the product $$P(H) \cdot P(E|H)$$ on $$\Theta$$;
2. a mass allotted to the empty set $$m(\varnothing)$$.
## The combination of H and Event E
##   H_Event specnb mass
## 1       ø      1  0.2
## 2       b      3  0.3
## 3       c      4  0.5
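This non-normalized combination can be checked with a short sketch in plain Python (not the dst package; the function name merely mimics dst’s dsrwon, and representing focal sets as frozensets is an assumption of this sketch):

```python
from itertools import product

def dsrwon(m1, m2):
    """Non-normalized Dempster combination: intersect every pair of focal
    sets and multiply their masses; conflict accumulates on the empty set."""
    out = {}
    for (A, x), (B, y) in product(m1.items(), m2.items()):
        C = A & B
        out[C] = out.get(C, 0.0) + x * y
    return out

prior = {frozenset({"a"}): 0.2, frozenset({"b"}): 0.3, frozenset({"c"}): 0.5}
event = {frozenset({"b", "c"}): 1.0}      # m({b,c}) = 1

H_Event = dsrwon(prior, event)
# H_Event: m(empty) = 0.2, m({b}) = 0.3, m({c}) = 0.5
```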

It turns out that we can obtain the marginal $$P(E)$$ from $$m(\varnothing)$$: $$P(E) = 1 - m(\varnothing)$$.

Hence, $$P(E)$$ is nothing other than the normalization constant of Dempster’s rule of combination.

In the second step of the computation, we use function nzdsr to apply the normalization constant to the distribution H_Event, which gives the posterior distribution $$P(H|E)$$.

## The posterior distribution P(H|E)
##   H_given_E specnb  mass
## 1         b      2 0.375
## 2         c      3 0.625
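The normalization step (what dst’s nzdsr performs) can be reproduced with a short plain-Python sketch, starting from the combined masses shown above:

```python
# Combined masses from the previous step (non-normalized combination).
H_Event = {frozenset(): 0.2, frozenset({"b"}): 0.3, frozenset({"c"}): 0.5}

# The normalization constant is P(E) = 1 - m(empty).
K = 1 - H_Event[frozenset()]                         # P(E) = 0.8

# Drop the empty set and rescale the remaining masses by K.
posterior = {A: m / K for A, m in H_Event.items() if A}
# posterior ~ m({b}) = 0.375, m({c}) = 0.625
```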


Note that H_given_E is defined on singletons only and the mass allocated to $$\Theta$$ is zero. Hence $$bel(\cdot) = P(\cdot) = Pl(\cdot)$$, as shown in the following table.

##         bel disbel unc  plau rplau
## a     0.000  1.000   0 0.000 0.000
## b     0.375  0.625   0 0.375 0.600
## c     0.625  0.375   0 0.625 1.667
## frame 1.000  0.000   0 1.000   Inf
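The equality of belief and plausibility on singletons can also be verified directly from the definitions, in a plain-Python sketch (the helper names bel and pl are this sketch’s own, not package functions):

```python
def bel(m, A):
    # Belief: total mass of focal sets contained in A.
    return sum(v for B, v in m.items() if B <= A)

def pl(m, A):
    # Plausibility: total mass of focal sets intersecting A.
    return sum(v for B, v in m.items() if B & A)

posterior = {frozenset({"b"}): 0.375, frozenset({"c"}): 0.625}

rows = {x: (bel(posterior, frozenset({x})), pl(posterior, frozenset({x})))
        for x in "abc"}
# For a Bayesian mass function, bel == pl on every singleton:
# a -> (0, 0), b -> (0.375, 0.375), c -> (0.625, 0.625)
```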


# 2. Example with two variables

In the first example, the conditioning event was a subset of the frame $$\Theta$$ of variable H. We now show the computation of Bayes’s rule of conditioning by Dempster’s Rule in the case of two variables.

Let’s say we have the variable X (taking the role of H) defined on $$\Theta = \{a, b, c\}$$ as before.

## The prior distribution
##   X specnb mass
## 1 a      1  0.2
## 2 b      2  0.3
## 3 c      3  0.5

Let’s add a second variable E with three outcomes, $$\Lambda =\{d, e, f\}$$, and suppose the conditional probabilities of observing $$E = d$$ are known:

$$P(d|a)=0.1$$, $$P(d|b)=0.2$$ and $$P(d|c)=0.7$$.

This distribution will be encoded in the product space $$\Theta \times \Lambda$$ by setting

$$m(\{(a,d)\}) = 0.1$$; $$m(\{(b,d)\}) = 0.2$$; $$m(\{(c,d)\}) = 0.7$$.

We now do this using function bcaRel.

## Specify information on variables, description matrix and mass vector
## Identifying variables and frames
##      varnb size
## [1,]     1    3
## [2,]     4    3
## Note that variables numbers must be in increasing order
## The description matrix of the relation between X and E
##      a b c d e f
## [1,] 1 0 0 1 0 0
## [2,] 0 1 0 1 0 0
## [3,] 0 0 1 1 0 0
## [4,] 1 1 1 1 1 1
## Note Columns of matrix must follow variables ordering.
## Mass specifications
##      specnb mass
## [1,]      1  0.1
## [2,]      2  0.2
## [3,]      3  0.7
## [4,]      4  0.0
## The relation between Evidence E and X
##   rel_EX specnb mass
## 1    a d      1  0.1
## 2    b d      2  0.2
## 3    c d      3  0.7


Now we combine the prior $$P(X)$$ with rel_EX. But first, we need to extend X to the product space $$\Theta \times \Lambda$$.

## Prior X extended in product space of (X,E)
##            X_xtnd specnb mass
## 1 a d + a e + a f      1  0.2
## 2 b d + b e + b f      2  0.3
## 3 c d + c e + c f      3  0.5
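This vacuous extension can be sketched in plain Python: each focal set $$\{h\}$$ of the prior becomes the cylinder $$\{h\} \times \Lambda$$ in the product space (the representation as frozensets of pairs is an assumption of this sketch, not the dst data structure):

```python
# Vacuous extension of the prior on X to the product space Theta x Lambda.
Lambda_ = ["d", "e", "f"]
prior = {"a": 0.2, "b": 0.3, "c": 0.5}

X_xtnd = {frozenset((h, e) for e in Lambda_): m for h, m in prior.items()}
# e.g. the focal set for h = 'a' is {(a,d), (a,e), (a,f)} with mass 0.2
```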

We now combine X extended and rel_EX in the product space $$\Theta \times \Lambda$$.

## Mass distribution of the combination of X extended and E_X
##   comb_X_EX specnb mass
## 1         ø      1 0.57
## 2       a d      2 0.02
## 3       b d      3 0.06
## 4       c d      4 0.35

As we can see, we have

1. the distribution of the product $$P(X) \cdot P(E|X)$$ on $$\Theta \times \Lambda$$;

2. a mass allotted to the empty set $$m(\varnothing)$$, which is $$1 - P(E)$$.

Using function nzdsr, we apply the normalization constant to obtain the desired result. Then, using function elim, we obtain the marginal of X, which turns out to be the posterior distribution $$P(X | E = d)$$.

## The normalized mass distribution of the combination of X extended and E_X
##   norm_comb_X_EX specnb               mass
## 1            a d      1 0.0465116279069768
## 2            b d      2   0.13953488372093
## 3            c d      3  0.813953488372093
## The posterior distribution P(X|E) for (a,d), (b,d), (c,d), after eliminating variable E
##   dist_XgE specnb               mass
## 1        a      1 0.0465116279069768
## 2        b      2   0.13953488372093
## 3        c      3  0.813953488372093
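The whole two-variable computation can be reproduced end to end in a plain-Python sketch (the steps mirror dst’s dsrwon, nzdsr and elim, but the code below is an independent illustration, not the package):

```python
from itertools import product

Lambda_ = ["d", "e", "f"]
prior = {"a": 0.2, "b": 0.3, "c": 0.5}

# Step 1: vacuous extension of X to Theta x Lambda ({h} -> {h} x Lambda).
X_xtnd = {frozenset((h, e) for e in Lambda_): m for h, m in prior.items()}

# Step 2: the relation encoding P(d|a)=0.1, P(d|b)=0.2, P(d|c)=0.7.
rel_EX = {frozenset({("a", "d")}): 0.1,
          frozenset({("b", "d")}): 0.2,
          frozenset({("c", "d")}): 0.7}

# Step 3: non-normalized combination in the product space.
comb = {}
for (A, x), (B, y) in product(X_xtnd.items(), rel_EX.items()):
    C = A & B
    comb[C] = comb.get(C, 0.0) + x * y

# Step 4: normalize (K = P(E=d)), then marginalize E out by keeping
# the X-coordinate of each remaining singleton focal set.
K = 1 - comb.pop(frozenset())            # P(E=d) = 0.43
posterior = {next(iter(A))[0]: m / K for A, m in comb.items()}
# posterior ~ {'a': 0.0465, 'b': 0.1395, 'c': 0.8140}
```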
