This vignette summarizes the workflow for simulating a host and symbiont tree using the `sim_cophylo_bdp`

function.

This model has a number of parameters that are input into the `sim_cophylo_bdp`

function. Let us assume that we just want to simulate to a time of 2.0 units. We can then use `calculate_expected_leaves_sptree`

and `calculate_expected_leaves_locustree`

to figure out the average number of extant tips that will be on our host and symbiont trees respectively.

```
library(treeducken)
#> Loading required package: ape
set.seed(42)
calculate_expected_leaves_sptree(1.0, 0.5, 2.0)
exp_tips_host <-# on average we will get about 5 tips
# maybe we want something more like 8, so let's decrease the extinction rate
calculate_expected_leaves_sptree(1.0, 0.3, 2.0)
exp_tips_host <-# that looks about right, let's assume there are no host speciations that are
# not cospeciations
0.0
h_lambda <- 0.3
h_mu <- 1.0
c_lambda <-# now we need to worry about the symbiont tree since only cospeciations are
# occurring
# we can use the same math as for the locus tree simulator
calculate_expected_leaves_locustree(t = 2.0,
exp_tips_symb <-dup_rate = 0.2,
loss_rate = 0.1, 8)
# a little less than 10 symbiont tips on average
0.2
s_lambda <- 0.1
s_mu <-# let's assume that when symbionts speciate that they always inherit their
# ancestor's host repertoire
0.0 s_her <-
```

Now we have all the necessary parameters set with some idea of what they mean.

We will now use the main function, `sim_cophylo_bdp`

with the parameters we set in the previous section. Let’s simulate ten sets and then we can print out a summary for each set.

```
# simulate 10 paired trees with our set parameters.
sim_cophylo_bdp(hbr = h_lambda,
host_symb_sets <-hdr = h_mu,
sbr = s_lambda,
cosp_rate = c_lambda,
sdr = s_mu,
host_exp_rate = s_her,
time_to_sim = 2.0,
numbsim = 10)
print(host_symb_sets, details = TRUE)
#> 10 cophylogenetic sets.
#> Cophylogenetic set 1 :
#> Symbiont tree has 20 tips.
#> Host tree has 18 tips.
#> Cophylogenetic set 2 :
#> Symbiont tree has 16 tips.
#> Host tree has 14 tips.
#> Cophylogenetic set 3 :
#> Symbiont tree has 3 tips.
#> Host tree has 3 tips.
#> Cophylogenetic set 4 :
#> Symbiont tree has 15 tips.
#> Host tree has 12 tips.
#> Cophylogenetic set 5 :
#> Symbiont tree has 8 tips.
#> Host tree has 6 tips.
#> Cophylogenetic set 6 :
#> Symbiont tree has 2 tips.
#> Host tree has 2 tips.
#> Cophylogenetic set 7 :
#> Symbiont tree has 4 tips.
#> Host tree has 3 tips.
#> Cophylogenetic set 8 :
#> Symbiont tree has 14 tips.
#> Host tree has 11 tips.
#> Cophylogenetic set 9 :
#> Symbiont tree has 5 tips.
#> Host tree has 4 tips.
#> Cophylogenetic set 10 :
#> Symbiont tree has 4 tips.
#> Host tree has 4 tips.
```

Notice that some of these are not particularly interesting with only 2 host or symbiont tips. This is expected behavior, and occurs since there is a nonzero probability that no events occur in the time we simulated.

Let’s check out one set.

```
# # plot one or two of them
host_symb_sets[[1]]
tree_set_of_interest <-print(tree_set_of_interest)
#>
#> Host Tree:
#>
#>
#> Phylogenetic tree with 18 tips and 17 internal nodes.
#>
#> Tip labels:
#> H1, H2, X3, H4, H5, X6, ...
#>
#> Rooted; includes branch lengths.
#>
#>
#> Symbiont Tree:
#>
#>
#> Phylogenetic tree with 20 tips and 19 internal nodes.
#>
#> Tip labels:
#> X1, X2, S3, X4, X5, S6, ...
#>
#> Rooted; includes branch lengths.
#>
#> Association Matrix:
#>
#>
#> There are 15 rows (i.e. extant symbionts).
#> There are 14 cols (i.e. extant hosts).
```

`plot(tree_set_of_interest, col = "red", lty = "dotted")`

Notice that the output of `sim_cophylo_bdp`

is a list of `cophy`

objects. Each `cophy`

object is composed of 4 elements: `host_tree`

, `symb_tree`

, `association_mat`

, an `event_history`

. The `host_tree`

an `symb_tree`

are both of the `phylo`

object (from the ape package). The `association_mat`

which is a matrix with number of rows equal to the number of extant tips in the symbiont tree, and number of columns equal to the number of extant tips in the host tree. The `event_history`

is a data frame which shows the full history of the simulation. It contains which events occur on which symbiont and/or host and at which time. The following event code is used: “C”- cospeciation, “HG” - host gain (a host speciation), “HL” - host less (host extinction), “SG” - symbiont gain (symbiont speciation), “SL” - symbiont loss (symbiont extinction), “AG” - association gain, and “AL” - association loss. This bit is a mess at present so please let me know if you think of ways I could trim this down.

We can also perform the parafit test using the results.

```
tree_set_of_interest$host_tree
host_tree <- tree_set_of_interest$symb_tree
symb_tree <- tree_set_of_interest$association_mat
a <- parafit_stat(host_tr = host_tree, symb_tr = symb_tree, assoc_mat = a)
d <-parafit_test(host_tr = host_tree,
symb_tr = symb_tree,
assoc_mat = a,
D = d,
reps = 99)
#> [1] 0.01
```

We can also do a full summary of the results with the following function. This tabulates the number of each event in the `event_history`

data frame, and then performs the Parafit test of Legendre et al. 2004.

```
# of course that bit is maybe a little too verbose
# we may be interested in a lot of trees
# we can instead use cophylo_summary_stat
cophy_summary_stat(host_symb_sets)
host_symb_summary_df <-#> Calculating for replicate 1
#> Calculating for replicate 2
#> Calculating for replicate 3
#> Calculating for replicate 4
#> Calculating for replicate 5
#> Calculating for replicate 6
#> Warning in treeducken::parafit_stat(cophy_obj[[cophy_obj_indx]]$host_tree, : 'host_tr' must be a tree with more than 2 extant tips to calculate the parafit stat returning NA.
#> Calculating for replicate 7
#> Warning in treeducken::parafit_stat(cophy_obj[[cophy_obj_indx]]$host_tree, : 'host_tr' must be a tree with more than 2 extant tips to calculate the parafit stat returning NA.
#> Calculating for replicate 8
#> Calculating for replicate 9
#> Calculating for replicate 10
host_symb_summary_df#> Cospeciations Host_Speciations Host_Extinctions Symbiont_Speciations
#> 1 17 NA 4 2
#> 2 13 NA NA 2
#> 3 2 NA NA NA
#> 4 11 NA 3 3
#> 5 5 NA NA 2
#> 6 1 NA NA NA
#> 7 2 NA 1 1
#> 8 10 NA NA 3
#> 9 3 NA NA 1
#> 10 3 NA 1 NA
#> Symbiont_Extinctions Parafit_Stat Parafit_P-value
#> 1 5 1467.6208709 0.01
#> 2 1 844.7867082 0.01
#> 3 NA 0.2159458 0.06
#> 4 4 431.3475091 0.01
#> 5 NA 78.3393946 0.01
#> 6 NA NA NA
#> 7 1 NA NA
#> 8 1 650.1323319 0.01
#> 9 1 121.7864797 0.03
#> 10 1 3.7077994 0.04
```

To model gene tree-species tree discordance on these trees we use the three-tree model (Rasmussen and Kellis 2012). This model has three levels: species trees, locus trees, and gene trees. The species tree models speciation and extinction, the locus tree models gene birth, death and transfers, and the gene tree models incomplete lineage sorting. Transfers can occur randomly to any recipient throughout a tree or to species that are more closely related to the transfer donor species.

We can use either symbiont or host trees to simulate locus trees. First we will use the host tree:

```
sim_locustree_bdp(host_tree,
host_tree_locus_trees <-gbr = 0.4,
gdr = 0.2,
lgtr = 0.0,
num_loci = 10)
host_tree_locus_trees#> 10 phylogenetic trees
```

`plot(host_tree_locus_trees)`

Now we will use the symbiont tree:

```
sim_locustree_bdp(symb_tree,
symb_tree_locus_trees <-gbr = 0.2,
gdr = 0.1,
lgtr = 0.1,
num_loci = 10)
str(symb_tree_locus_trees[[1]])
#> List of 6
#> $ edge : num [1:38, 1:2] 21 21 23 23 25 25 27 27 26 26 ...
#> $ edge.length: num [1:38] 0.391 0.5276 0.9907 0.1069 0.0804 ...
#> $ Nnode : int 19
#> $ tip.label : chr [1:20] "S20_1" "S19_1" "X4_1" "S14_1" ...
#> $ root.edge : num 0.0557
#> $ node.label : chr [1:19] "" "" "" "" ...
#> - attr(*, "class")= chr "phylo"
```

Using these locus trees we can simulate under the multi-locus coalescent process (see Rasmussen and Kellis 2012). We can specify an effective population size and a generation time measured in generations per unit time. For now we will assume that the time of our trees is in units of millions of years. The default for generation time is then 1 time per year (1e-6). And we will randomly draw an effective population size from a Lognormal with mean 14 and standard deviation 0.4. Note that the only reason these values chosen is to mirror the validation test of Mallo et al. (2015) for the SimPhy program.

```
host_tree_locus_trees[[3]]
host_locus_tree <-# randomly choose an effective popsize
1e6
popsize <- sim_multilocus_coal(host_locus_tree,
host_loci_gene_trees <-effective_pop_size = popsize,
num_reps = 100)
```

And we can calculate some tree summary statistics on all of genes that were simulated on one of the locus trees (in this case the first one).

You can also combine a host tree, symbiont tree, and association matrix into a `cophy`

object that can be input into functions in `treeducken`

. The example below reads in the classic gopher and lice dataset from Hafner et al. 1994. The more common association table format with two columns with hosts in the first and symbionts (or parasites) in the second column is converted into an association matrix. These are all then converted into `treeducken`

’s `cophy`

class. You can then print out a summary of these and then calculate summary statistics on these. Currently, this only calculates the parafit statistic and conducts the permutation test for that statistic if the `event_history`

element of `cophy`

is not set. If that is set it will count the numbers of each event.

```
read.table(system.file("extdata",
gopher_lice_map <-"gopher_lice_mapping.txt",
package = "treeducken"),
stringsAsFactors = FALSE, header = TRUE)
convert_assoc_table_to_matrix(gopher_lice_map)
gopher_lice_assoc_matrix <- ape::read.nexus(system.file("extdata",
gopher_tree <-"gophers_bd.tre",
package = "treeducken"))
ape::read.nexus(system.file("extdata",
lice_tree <-"lice_bd.tre",
package = "treeducken"))
convert_to_cophy(hostTree = gopher_tree,
gopher_lice_cophylo <-symbTree = lice_tree,
assocMat = gopher_lice_assoc_matrix)
print(gopher_lice_cophylo)
#>
#> Host Tree:
#>
#>
#> Phylogenetic tree with 15 tips and 14 internal nodes.
#>
#> Tip labels:
#> Cratogeomys_castanops_L32685, Cratogeomys_merriami_L32688, Geomys_breviceps_L32683, Geomys_bursarius_L32693, Geomys_bursarius_L32694, Geomys_personatus_L32689, ...
#>
#> Rooted; includes branch lengths.
#>
#>
#> Symbiont Tree:
#>
#>
#> Phylogenetic tree with 17 tips and 16 internal nodes.
#>
#> Tip labels:
#> Geomydoecus_actuosi_L32665, Geomydoecus_chapini_L32667, Geomydoecus_cherriei_L32668, Geomydoecus_costaricensis_L32669, Geomydoecus_ewingi_L32671, Geomydoecus_expansus_L32670, ...
#>
#> Rooted; includes branch lengths.
#>
#> Association Matrix:
#>
#>
#> There are 17 rows (i.e. extant symbionts).
#> There are 15 cols (i.e. extant hosts).
cophy_summary_stat(gopher_lice_cophylo)
#> Cospeciations Host_Speciations Host_Extinctions Symbiont_Speciations
#> 1 0 0 0 0
#> Symbiont_Extinctions Parafit_Stat Parafit_P-value
#> 1 0 2.720485 0.1
plot(gopher_lice_cophylo,
fsize = 0.5,
show_tip_label = FALSE,
gap = 1,
col = "purple",
lty = "dashed")
```