`graph4lg`

- TutorialThe creation of the `graph4lg`

package on R should make easier the construction and analysis of genetic and landscape graphs in landscape genetics studies (hence the name `graph4lg`

). The package allows to weight the links and to prune the graphs by several ways. To our knowledge, it is the first software which enables to create genetic graphs with such a large variety of parameters. Besides, it allows to carry out preliminary analyses of the spatial pattern of genetic differentiation in order to choose the best genetic distance and pruning method in every context. Lastly, it makes possible the comparison of two spatial graphs sharing the same nodes.

In the following sections, we describe the functions of the package following their separation into three parts. We illustrate them with results’ examples and report the command lines to copy in order to reproduce them.

The package includes genetic and spatial data allowing users to discover its different functionalities. We included two datasets. The first is from French Guadeloupe and consists of 432 Plumbeous Warblers (*Setophaga plumbea*) genotyped at 12 microsatellite loci (169 different alleles in total) from 20 populations whose spatial coordinates are available. These data were collected and analysed by Khimoun et al. (2017). They are also freely available on the website datadryad.org. They were included in the package in different formats: `data.frame`

format with columns of `locus`

class from the package `gstud`

, `genind`

class from `adegenet`

, `loci`

class for `pegas`

, text files (.txt) for GENEPOP or STRUCTURE.

The second data set was created during simulations done with CDPOP (Landguth and Cushman 2010) on a simulated landscape. It consists of 1500 individuals from 50 populations genotyped at 20 microsatellite loci. Individuals dispersed less when the cost-distance between populations was large. These data were included in format `genind`

and as a text file compatible with GENEPOP software. A landscape graph was created with GRAPHAB (Foltête, Clauzel, and Vuidel 2012) whose nodes were the 50 simulated populations and the links were weighted by cost-distance values between populations. The project created with GRAPHAB was included into the package such that the landscape graphs and the cost-distance matrix can be easily imported into the R environment.

The first type of functions from this package allows to process spatial and genetic data used as input data in subsequent analyses performed with other functions. These functions convert genetic data, compute genetic or geographical distances, import landscape graphs and convert cost-distance values into Euclidean geographical distances.

In order to make the package user-friendly and compatible with genetic data commonly used in landscape genetics, the functions `gstud_to_genind`

, `loci_to_genind`

, `structure_to_genind`

and `genepop_to_genind`

allow to convert genetic data from formats used respectively in the GENETIC STUDIO (Dyer 2009) and `pegas`

(Paradis 2010) packages on R and in STRUCTURE (Pritchard, Stephens, and Donnelly 2000) and GENEPOP (Raymond 1995) software into R objects with the class attribute `genind`

from ADEGENET package (Jombart 2008). The format `genind`

allows to use time-efficient functions from ADEGENET (coded in C). This package was developed and is regularly maintained by Thibaut Jombart (his website). The function `genind_to_genepop`

enables to convert `genind`

object into text files in format `genepop`

in order to perform easily analyses with this commonly used R package and executable software.

Genetic data in formats used by software commonly used in population genetics can be converted into `genind`

objects in R. This class of object was created for the package `adegenet`

(Jombart 2008).

The GENEPOP software (Raymond 1995) developed by M. Raymond and F. Rousset (Montpellier) can be used as an executable file, with or without graphical user interface, or as a R package. It is frequently used to compute F_{ST} values and to test for Hardy-Weinberg equilibrium, linkage disequilibrium or genetic differentiation.

When performing simulations with CDPOP, individuals’ genotypes can be saved as GENEPOP files at the end of the simulation.

The function `genepop_to_genind`

loads a GENEPOP file (.txt extension) and converts it into a `genind`

object. To use it, the path to the file, the total number and names of the loci and the populations’ names must be indicated.

```
data_genind <- genepop_to_genind(path = paste0(system.file('extdata',
package = 'graph4lg'), "/gpop_51_sim22_01_25.txt"),
n.loci = 20, pop_names = as.character(1:50))
#> Registered S3 method overwritten by 'spdep':
#> method from
#> plot.mst ape
#> Registered S3 method overwritten by 'pegas':
#> method from
#> print.amova ade4
data_genind
#> /// GENIND OBJECT /////////
#>
#> // 1,500 individuals; 20 loci; 568 alleles; size: 3.4 Mb
#>
#> // Basic content
#> @tab: 1500 x 568 matrix of allele counts
#> @loc.n.all: number of alleles per locus (range: 26-30)
#> @loc.fac: locus factor for the 568 columns of @tab
#> @all.names: list of allele names for each locus
#> @ploidy: ploidy of each individual (range: 2-2)
#> @type: codom
#> @call: adegenet::df2genind(X = as.matrix(x[, attr(x, "locicol"), drop = FALSE]),
#> sep = "/", pop = pop, ploidy = ploidy)
#>
#> // Optional content
#> @pop: population of each individual (group size range: 30-30)
```

We get a `genind`

object. It contains the genotypes of the 1500 individuals from the 50 populations created during the simulation.

Similarly, the function `genind_to_genepop`

allows to perform the reverse conversion, i.e. to convert a `genind`

object into a GENEPOP file. This file is created and saved in the working directory defined earlier.

`genind_to_genepop(x = data_genind, output = "data_gpop_test.txt")`

This function allows for example to create a GENEPOP file to test for between populations genetic differentiation or to compute fixation indices with GENEPOP software.

STRUCTURE software (Pritchard, Stephens, and Donnelly 2000) is frequently used in population genetics and landscape genetics. It enables to create populations’ clusters via a Bayesian approach aiming at minimizing the deviation from Hardy-Weinberg equilibrium when gathering populations with one another. The input files have a particular structure. The function `structure_to_genind`

allows to convert this type of file into a `genind`

object.

To use the function, we need to indicate the path to the file, the names of the loci, the individuals’ ID and the populations’ names in the same order as in the original file.

```
loci_names <- c("DkiD104", "DkiD124", "DkiD102", "CAM19",
"DkiC118", "DkiD128", "DkiB12", "Lswmu7",
"DkiD109", "Lswmu5", "TG12_15", "DkiD12" )
data(data_pc_genind)
ind_names <- row.names(data_pc_genind@tab)
pop_names <- c("BT-1", "BT-10", "BT-11", "BT-12", "BT-13", "BT-2",
"BT-3", "BT-4", "BT-5", "BT-6", "BT-7", "BT-8",
"BT-9", "GT-1", "GT-2", "GT-3", "GT-4", "GT-5",
"GT-6", "GT-7")
data_paru <- structure_to_genind(path = paste0(system.file('extdata',
package = 'graph4lg'),
"/data_PC_str.txt"),
loci_names = loci_names,
pop_names = pop_names,
ind_names = ind_names)
data_paru
#> /// GENIND OBJECT /////////
#>
#> // 432 individuals; 12 loci; 169 alleles; size: 352.3 Kb
#>
#> // Basic content
#> @tab: 432 x 169 matrix of allele counts
#> @loc.n.all: number of alleles per locus (range: 5-23)
#> @loc.fac: locus factor for the 169 columns of @tab
#> @all.names: list of allele names for each locus
#> @ploidy: ploidy of each individual (range: 2-2)
#> @type: codom
#> @call: adegenet::df2genind(X = as.matrix(x[, attr(x, "locicol"), drop = FALSE]),
#> sep = "/", pop = pop, ploidy = ploidy)
#>
#> // Optional content
#> @pop: population of each individual (group size range: 6-33)
```

Packages `gstudio`

and `popgraph`

developed by R. Dyer (Dyer 2009) use as input data R `data.frames`

with columns of class `locus`

. These `data.frame`

objects constitute `gstud`

objects. Given these packages are often used to create genetic graphs, we created a function to convert them into the `genind`

format.

A `gstud`

object generally has the following structure (Plumbeous Warbler data from Guadeloupe as an example):

`head(data_pc_gstud)`

To convert it with the function `gstud_to_genind`

, we indicate the name of the `data.frame`

and of the columns with populations’ names and individuals’ names:

```
gstud_to_genind(x = data_pc_gstud, pop_col = "Cluster",
ind_col = "ID")
#> /// GENIND OBJECT /////////
#>
#> // 432 individuals; 12 loci; 169 alleles; size: 352.3 Kb
#>
#> // Basic content
#> @tab: 432 x 169 matrix of allele counts
#> @loc.n.all: number of alleles per locus (range: 5-23)
#> @loc.fac: locus factor for the 169 columns of @tab
#> @all.names: list of allele names for each locus
#> @ploidy: ploidy of each individual (range: 2-2)
#> @type: codom
#> @call: adegenet::df2genind(X = as.matrix(x[, attr(x, "locicol"), drop = FALSE]),
#> sep = "/", pop = pop, ploidy = ploidy)
#>
#> // Optional content
#> @pop: population of each individual (group size range: 6-33)
```

From a `genind`

object, the function `mat_gen_dist`

calculates several types of between populations genetic distances:

F

_{ST}(Weir and Cockerham 1984) or linearised F_{ST}(Rousset 1997) (options ‘`dist=FST`

’ and ‘`dist=FST_lin`

’).G’

_{ST}(Hedrick 2005) (option ‘`dist=GST`

’).D

_{Jost}(Jost 2008) (option ‘`dist=D`

’).D

_{PS}(1 - proportion of shared alleles) (Bowcock et al. 1994, @murphy_graph_2015) (option ‘`dist=DPS`

’).Euclidean genetic distance (Excoffier, Smouse, and Quattro 1992) (option ‘

`dist=basic`

’).Euclidean genetic distance with a weighting depending on allelic frequencies giving more weight to rare alleles (Fortuna et al. 2009) (option ‘

`dist=weight`

’).Euclidean genetic distance computed after a PCA of the matrix of allelic frequencies by population. The axes considered to compute the Euclidean distance are the non-collinear principal components (total number of alleles - number of loci) (Paschou et al. 2014, @shirk2017comparison) (option ‘

`dist=PCA`

’).Euclidean genetic distance computed in the same way as with the function

`popgraph`

from`popgraph`

package, i.e. after a PCA and two SVD, among other computation steps (option ‘`dist=PG`

’). This distance is different from the conditional genetic distance (cGD) computed from a population graph by summing genetic distances along shortest paths.

To do these calculations with the function `mat_gen_dist`

, you just have to indicate the name of the `genind`

object which includes the genetic data of the individuals as well as the populations to which each of them belongs. The other argument of the function is the type of genetic distance to compute. Here are two examples:

`mat_dps <- mat_gen_dist(x = data_genind, dist = "DPS")`

```
mat_dps[1:5, 1:5]
#> 1 10 11 12 13
#> 1 0.0000000 0.7883333 0.5900000 0.6816667 0.8733333
#> 10 0.7883333 0.0000000 0.5741667 0.5450000 0.8816667
#> 11 0.5900000 0.5741667 0.0000000 0.3750000 0.8658333
#> 12 0.6816667 0.5450000 0.3750000 0.0000000 0.8633333
#> 13 0.8733333 0.8816667 0.8658333 0.8633333 0.0000000
```

`mat_pg <- mat_gen_dist(x = data_genind, dist = "PG")`

```
mat_pg[1:5, 1:5]
#> 1 10 11 12 13
#> 1 0.00000 29.93508 21.67789 25.40865 29.17504
#> 10 29.93508 0.00000 19.15163 20.32432 29.99936
#> 11 21.67789 19.15163 0.00000 13.50181 24.69579
#> 12 25.40865 20.32432 13.50181 0.00000 25.18634
#> 13 29.17504 29.99936 24.69579 25.18634 0.00000
```

The function `graphab_to_igraph`

allows to import landscape graphs created with software into the R environment. It takes the path to the directory of a project created with as a first argument. Then, the names of the shapefile layers corresponding to the nodes and links (the least-cost paths or a set of graph links) of the graphs must be indicated (for the nodes, by default, the shapefile layers “patches.shp” is imported). The links of the imported graph are weighted by cost-distance values or by geographical distance values depending on the `weight`

option. The graph can be plotted on a geographical map. Besides, an object of type `data.frame`

with the spatial coordinates of the nodes’ centroids can be created.

As an example, the following command allows to import the project created to parameterize the gene-flow simulation on performed to create `data_simul_genind`

:

```
land_graph <- graphab_to_igraph(dir_path = system.file('extdata',
package = 'graph4lg'),
nodes = "patches",
links = "liens_rast2_1_11_01_19-links",
weight = "cost", fig = FALSE, crds = TRUE)
#> OGR data source with driver: ESRI Shapefile
#> Source: "C:\Users\psavary\AppData\Local\Temp\RtmpMR98nP\Rinst71436f941b3\graph4lg\extdata", layer: "patches"
#> with 50 features
#> It has 4 fields
#> OGR data source with driver: ESRI Shapefile
#> Source: "C:\Users\psavary\AppData\Local\Temp\RtmpMR98nP\Rinst71436f941b3\graph4lg\extdata", layer: "liens_rast2_1_11_01_19-links"
#> with 1225 features
#> It has 5 fields
```

The `land_graph`

object is a list composed of an object of class `igraph`

corresponding to the landscape graph, and of a table of class `data.frame`

with the coordinates of the patches’ centroids.

```
crds_patches <- land_graph[[2]]
land_graph <- land_graph[[1]]
```

We create a matrix with the cost-distances between populations from the links of the landscape graph imported, which is a complete graph. You have to pay attention to the order of the populations’ names in the distance matrices, even if error messages will appear when using the functions of this package if you use two matrices in which rows’ and columns’ names do not match. The function `reorder_mat`

reorders the rows and columns of a symmetric matrix according to a specified order. To use it, you just need to create a character vector with the rows and columns names from the symmetric matrix in the order you want them to be ordered in the new matrix.

```
mat_ld <- as_adjacency_matrix(land_graph, attr = "weight",
type = "both", sparse = FALSE)
order <- row.names(mat_ld)[order(c(as.character(row.names(mat_ld))))]
mat_ld <- reorder_mat(mat = mat_ld, order = order)
```

You can also calculate Euclidean geographical distances between populations with the function `mat_geo_dist`

. It takes as an argument a `data.frame`

with 3 columns:

ID : populations’ ID (or simple points’ ID when using this function in another context)

x : populations’ or points’ longitude

y : populations’ or points’ latitude

Geographical coordinates must be expressed in a projected coordinates system (metric) because Pythagoras’s theorem is used to compute the distances (a warning message is displayed in every case).

Example :

`head(crds_patches)`

```
mat_geo <- mat_geo_dist(data = crds_patches, ID = "ID", x = "x", y = "y")
#> Coordinates were treated as projected coordinates. Check whether
#> it is the case.
mat_geo <- reorder_mat(mat = mat_geo, order = order)
```

```
mat_geo[1:5, 1:5]
#> 1 10 11 12 13
#> 1 0.00 25292.687 15033.296 20465.825 13354.40
#> 10 25292.69 0.000 12001.667 5622.277 31907.68
#> 11 15033.30 12001.667 0.000 6407.027 19906.28
#> 12 20465.83 5622.277 6407.027 0.000 26300.76
#> 13 13354.40 31907.679 19906.280 26300.760 0.00
```

Cost-distances are expressed in cost units arbitrarily defined based on the cost values assigned to every land cover type when creating resistance surfaces. However, species dispersal distances are usually known as distances expressed in metric units. When we know that the probability that a species covers 10 km is 5 %, we can ask what is the equivalent of this distance in cost distance units according to the scenario of cost values assumed. It allows to prune a landscape graph with a given distance threshold e.g. or provides an order of magnitude of the cost-distance values.

To that purpose, you can perform a regression of the between populations cost-distance values against the corresponding geographical distances. It estimates the relationship between both types of distance. Then, the resulting parameters estimates enable to convert a geographical distance into its cost-distance equivalent according to the cost scenario.

The function `convert_cd`

performs the linear regression or log-log linear regression between the geographical distance matrix and the cost-distance matrix, in the same way as Tournant et al. (2013) and as performed by GRAPHAB software.

Below, we estimate the relationship between geographical distance and cost-distance between the populations used to perform the gene-flow simulation. We convert 10 km into cost-distance units. The function also plots the relationship between these distances.

```
convert_res <- convert_cd(mat_euc = mat_geo, mat_ld = mat_ld,
to_convert = 10000, fig = TRUE,
method = "log-log", pts_col = "grey")
convert_res
#> $`Converted values`
#> 10000
#> 1605.543
#>
#> $`Model parameters`
#> Intercept Slope
#> -2.251216 1.045828
#>
#> $`Model multiple R-squared`
#> Multiple R-squared
#> 0.6870757
#>
#> $Plot
```

In this case, we can see that 10 km are equivalent to 1606 cost-distance units. The log-log linear model estimates the relationship between geographical distance (GD) and cost-distance (CD) such that: \(log(CD)=-2.2512+1.0458 \times log(GD)\). The determination coefficient \(R^2\) associated to this linear model is 0.69. The black dot represented on the figure refers to the 10 km value on the regression line characterizing the relationship between cost-distance and geographical distance.

Some functions from `graph4lg`

package construct graphs from either `genind`

objects or genetic or landscape distance matrices. To choose a genetic distance and a pruning method for the genetic graphs’ construction, we developed functions to perform preliminary analyses of the spatial pattern of genetic differentiation. Indeed, a genetic graph can be created in order to i) identify the direct dispersal paths between populations or to ii) select the set of population pairs to consider to infer landscape effects on dispersal. According to the use of a genetic graph and to the spatial pattern of genetic differentiation (type-I or type-IV pattern of IBD (Hutchison and Templeton 1999, @van2015isolation)), the choice of a genetic distance and of a pruning method will not be the same.

Van Strien, Holderegger, and Van Heck (2015) computed the so-called distance of maximum correlation (DMC) as the distance between populations below which population pairs should be considered in order to maximize the correlation between landscape distance (geographical distance in their case, but applies similarly to cost-distance) and genetic distance. This distance threshold is computed by increasing iteratively the maximum distance between populations above which population pairs are not taken into account to compute the correlation. Thus, an increasing number of population pairs is considered in the inference. When the correlation coefficient between landscape distance and genetic distance reaches a maximum, the distance threshold considered is the DMC. When the DMC is equal to the maximum distance between populations, it means that an equilibrium established between gene flow and genetic drift at the scale of the study area. Conversely, when the DMC is lower than this maximum distance, it means that there is a “plateau” in the relationship between landscape distance and genetic distance because migration-drift equilibrium has not been reached yet at the scale considered. It can be due to recent modifications of the landscape which consistently reduced the connectivity in a previously connected context. In this case, graph pruning is needed to well infer landscape effect on dispersal. Similarly, genetic distances that do not assume this equilibrium should be used.

The function `dist_max_corr`

calculates the DMC from two distance matrices. You need to specify the interval between two distance thresholds iteratively considered to select population pairs and compute the correlation coefficient. The function `scatter_dist`

, on the other hand, allows to visualize the relationship between two distance matrices by making a scatter plot. The shape of this relationship can be compared to the four different types of IBD patterns described by Hutchison and Templeton (1999) in order to characterize the spatial pattern of genetic differentiation.

Here are the command lines to execute to use these two functions:

```
dmc <- dist_max_corr(mat_gd = mat_dps, mat_ld = mat_ld,
interv = 500, pts_col = "black")
```

The `dmc`

object is a list with 1) the DMC value, 2) a vector containing all the computed correlation coefficients, 3) a vector with all the distance thresholds tested and 4) a graphic object created with the `ggplot2`

package.

```
# DMC value
dmc[[1]]
#> [1] 4500
# Correlation coefficients
dmc[[2]]
#> [1] NA 0.2986565 0.3154498 0.5188747 0.7059633 0.7559539 0.7850267
#> [8] 0.7947691 0.8038470 0.7853646 0.7760106 0.7641339 0.7530264 0.7462445
#> [15] 0.7386713 0.7333936 0.7305631 0.7226695 0.7137972 0.7110962 0.7041702
# Threshold distances tested
dmc[[3]]
#> [1] 500.00 1000.00 1500.00 2000.00 2500.00 3000.00 3500.00 4000.00
#> [9] 4500.00 5000.00 5500.00 6000.00 6500.00 7000.00 7500.00 8000.00
#> [17] 8500.00 9000.00 9500.00 10000.00 10230.05
```

The figure below represents the evolution of the correlation coefficient values when distance thresholds increase.

The scatter plot is displayed with the function `scatter_dist`

.

```
scatter_dist(mat_gd = mat_dps, mat_ld = mat_ld,
pts_col = "black")
#> 1225 out of 1225 values were used.
```

In this particular case, we notice a type-IV pattern of isolation by distance with a “plateau” in the relationship between cost-distance and genetic-distance (D_{PS}). Graph pruning will be needed to select the population pairs to include in the inference of landscape effects on dispersal.

In the following section, we present the different pruning methods available.

To prune a graph whose links are weighted by distances, you can remove all the links associated to geographical or genetic distances larger (or lower) than a specific threshold distance. This distance can for example be equal to the maximum dispersal distance of an individual of the study species at the scale of its lifespan so that the resulting graph represents the direct dispersal paths of the species. It can also be equal to the DMC if the objective is to infer landscape effects on dispersal.

The function `gen_graph_thr`

takes as arguments a distance matrix used to weight the links of the resulting graph (`mat_w`

) and a distance matrix on which the “thresholding” is based (`mat_thr`

). The selected links are selected according to the values of this latter matrix. The argument `thr`

is the numerical value of the threshold distance. If `mat_thr`

is not specified, `mat_w`

is used by default for the thresholding. Lastly, you have to specify if the links to remove take larger or lower values than the threshold value.

```
graph_thr <- gen_graph_thr(mat_w = mat_dps, mat_thr = mat_geo,
thr = 12000, mode = "larger")
graph_thr
#> IGRAPH a8b03dc UNW- 50 162 --
#> + attr: name (v/c), weight (e/n)
#> + edges from a8b03dc (vertex names):
#> [1] 1 --2 1 --4 1 --5 1 --6 1 --9 10--12 10--20 10--21 10--8 11--12
#> [11] 11--16 11--18 11--20 11--4 11--5 11--8 11--9 12--18 12--20 12--21
#> [21] 12--8 13--14 13--15 13--16 13--17 13--2 13--22 13--5 13--6 13--7
#> [31] 14--15 14--16 14--17 14--22 14--5 14--6 15--16 15--17 15--22 15--5
#> [41] 15--9 16--17 16--18 16--22 16--26 16--4 16--5 16--9 17--19 17--22
#> [51] 17--23 17--27 17--28 17--6 18--20 18--21 18--24 18--25 18--26 18--8
#> [61] 18--9 19--23 19--27 19--7 2 --4 2 --5 2 --6 2 --9 20--21 20--24
#> [71] 20--25 20--26 20--8 21--24 21--29 22--26 22--28 22--30 23--27 23--28
#> + ... omitted several edges
```

The function returns a graph in the form of an `igraph`

object, which is consequently compatible with all functions from `igraph`

package (Csardi and Nepusz 2006), one of the most used R package to create and analyse graphs (together with `sna`

and `networks`

). In the latter example, the graph has 50 nodes and 162 links when we prune it using a 12-km distance threshold. Its links are weighted with the values of the `mat_dps`

matrix.

A graph can be pruned according to a topological criterion. The function `gen_graph_topo`

can use 4 different criteria. As with the previous function, topological criteria are applied by considering the distance values of the `mat_topo`

matrix, but the links are weighted with the values of the `mat_w`

matrix (except when `mat_topo`

is not specified, cf. previous section).

**Gabriel graph**: in the created graph, two nodes are connected by a link if, when we draw a circle whose center is set at the middle of the segment linking them and whose radius is equal to half the length of this segment, there is no other node inside the circle. In mathematical terms, it means that there is a segment between \(x\) and \(y\) if and only if for every other point \(z\), we have: \(d_{xy}\leq \sqrt{d_{xz}^{2}+d_{yz}^{2}}\). We can compute such a graph from geographical distances (Gabriel and Sokal 1969) (`graph_gab_geo`

below) but also, less commonly, from genetic distances (Naujokaitis-Lewis et al. 2013) (`graph_gab_gen`

below). In the latter case, it is to some extent as if Pythagoras’s theorem was applied to genetic distances, which can seem a bit strange even if this method has already been used by Naujokaitis-Lewis et al. (2013).

```
graph_gab_geo <- gen_graph_topo(mat_w = mat_dps, mat_topo = mat_geo,
topo = "gabriel")
graph_gab_geo
#> IGRAPH a91b5bd UNW- 50 98 --
#> + attr: name (v/c), weight (e/n)
#> + edges from a91b5bd (vertex names):
#> [1] 1 --2 1 --5 10--12 10--21 11--12 11--16 11--18 11--8 11--9 12--20
#> [11] 12--8 13--14 13--17 13--5 13--6 14--15 14--17 14--5 15--16 15--22
#> [21] 15--5 16--18 16--22 16--9 17--19 17--23 17--27 17--28 18--20 18--25
#> [31] 19--23 19--7 2 --6 20--21 20--24 20--25 21--24 21--29 22--26 22--28
#> [41] 23--27 24--25 24--29 24--31 25--26 25--31 25--32 26--30 27--28 27--33
#> [51] 28--30 28--34 28--36 29--31 29--37 3 --7 30--32 30--34 31--32 31--35
#> [61] 31--37 32--34 32--35 33--36 33--42 34--36 34--44 35--37 35--39 35--45
#> [71] 36--40 37--38 37--39 37--41 38--41 38--43 39--41 39--49 4 --5 4 --9
#> + ... omitted several edges
graph_gab_gen <- gen_graph_topo(mat_w = mat_dps, mat_topo = mat_dps,
topo = "gabriel")
```

**Minimum Spanning Tree (MST)**: it creates a minimum spanning tree, i.e a graph in which every node is connected by a link to at least another node and whose total links’ weight is minimum. By definition, its number of links is equal to the number of nodes - 1.

```
graph_mst <- gen_graph_topo(mat_w = mat_dps, mat_topo = mat_dps,
topo = "mst")
graph_mst
#> IGRAPH a97b081 UNW- 50 49 --
#> + attr: name (v/c), weight (e/n)
#> + edges from a97b081 (vertex names):
#> [1] 1 --2 1 --4 10--8 11--12 11--18 11--20 11--4 12--8 13--14 13--6
#> [11] 14--15 15--17 15--22 15--28 16--23 17--23 19--23 2 --7 20--25 21--24
#> [21] 23--27 24--29 25--29 25--30 26--30 27--33 3 --6 3 --7 30--34 31--32
#> [31] 32--34 32--35 32--37 36--40 37--38 38--43 39--43 4 --9 40--42 41--43
#> [41] 41--49 42--48 43--45 43--47 44--45 44--48 46--47 48--50 5 --9
```

**“Percolation” graph**: the graph is created by removing iteratively some links, beginning with those with the highest weights until the graph breaks into more than one component. We conserve the link whose removal entails the creation of another component to obtain a connected graph. This method is also called the *edge-thinning method* (Urban et al. 2009). Such a method is linked to percolation theory (Rozenfeld et al. 2008). The function `gen_graph_topo`

indicates the number of conserved links and the weight of the link whose removal disconnects the graph (maximum link weight of the created graph).

```
graph_percol <- gen_graph_topo(mat_w = mat_dps, mat_topo = mat_dps,
topo = "percol")
#> Number of conserved links : 325
#> Maximum weight of the conserved links : 0.7525
```

**Complete graph**: the function allows to create a complete graph from a distance matrix. In that case, there is no pruning and, by definition, all population pairs are connected.

```
graph_comp <- gen_graph_topo(mat_w = mat_dps, mat_topo = mat_dps,
topo = "comp")
```

The last pruning method implemented by the `graph4lg`

package is based upon the conditional independence principle. The function `gen_graph_indep`

is largely inspired by the function `popgraph`

created by R. Dyer (Dyer and Nason 2004), but does not need the package `popgraph`

to function. Besides, as some calculations are performed with functions from the `adegenet`

package (coded in C), it is faster than the original `popgraph`

function. It is also more flexible than `popgraph`

function given you can vary i) the way we compute genetic distances used to weight the links and to compute the covariance between populations, ii) the formula used to compute the covariance from squared distances or alternatively simple distances, iii) the statistical tolerance threshold, iv) the p-values adjustment and v) the returned objects created by the function. Without entering further into the details, here is an implementation example.

```
graph_ci <- gen_graph_indep(x = data_genind,
dist = "PCA",
cov = "sq",
adj = "holm")
```

```
graph_ci
#> IGRAPH 326a1c8 UNW- 50 105 --
#> + attr: name (v/c), weight (e/n)
#> + edges from 326a1c8 (vertex names):
#> [1] 1 --2 1 --3 1 --4 1 --9 10--11 10--8 11--12 11--18 11--20 12--20
#> [11] 12--25 12--8 13--14 13--15 13--22 13--6 14--15 14--17 14--19 14--27
#> [21] 14--32 14--6 15--17 15--23 15--28 16--22 16--23 16--33 16--41 17--23
#> [31] 17--33 18--19 18--20 18--9 19--23 19--27 2 --5 2 --7 20--4 20--5
#> [41] 21--24 21--29 21--45 22--23 22--8 23--27 23--49 24--29 25--26 25--3
#> [51] 25--30 25--32 25--37 26--3 26--30 26--34 26--39 26--44 26--8 27--33
#> [61] 3 --31 3 --6 3 --7 30--34 30--7 31--32 31--35 31--36 32--34 32--35
#> [71] 32--8 33--40 34--6 35--37 36--40 36--48 37--38 37--39 37--46 38--43
#> + ... omitted several edges
```

Once the graphs have been created, you can perform calculation from them, visualize and export them.

The function `plot_graph_lg`

integrates functions from `igraph`

and `ggplot2`

to represent graphs on a map. Most frequently, graphs are spatial and a table with populations’ coordinates must be given as an argument. It must have exactly the same structure as the table given as an argument to `mat_geo_dist`

(3 columns : ID, x, y). The visual representation can make visible the links’ weights by plotting the links with a width proportional to the weight (`width = "w"`

) or the inverse weight (`width = "inv"`

) of the links.

For example, with the graph `graph_mst`

:

```
p <- plot_graph_lg(graph = graph_mst, crds = crds_patches,
mode = "spatial", weight = TRUE, width = "inv")
p
```

If the populations’ spatial coordinates are not available, you can still display the graph on a two-dimensional plane. In that case, the nodes’ positions are computed with Fruchterman and Reingold (1991) algorithm to optimize the representation. With the graph `graph_mst`

, we obtain:

```
p <- plot_graph_lg(graph = graph_mst, crds = crds_patches,
mode = "aspatial", weight = TRUE, width = "inv")
p
```

You can also plot the results of a graph’s partition carried out by computing a modularity index with the function `plot_graph_modul`

. This function is similar to `plot_graph_lg`

. The main difference is that the graph’s nodes from the same modules are displayed in the same color.

Several algorithms can be used: `fast greedy`

(Clauset, Newman, and Moore 2004), `louvain`

(Blondel et al. 2008), `optimal`

(Brandes et al. 2008) and `walktrap`

(Pons and Latapy 2006). The number of created modules in each graph is adjustable but can be fixed depending on the optimal values of the modularity indices. Besides, the modularity calculation can take into account the links’ weights or not. When taken into account, the weight given to a link in the calculation will be i) proportional to or ii) inversely proportional to the genetic or landscape distance to which it is associated.

In the following example, the graph is partitioned into 6 modules with the `fast_greedy`

algorithm (default option).

`plot_graph_modul(graph = graph_gab_geo, crds = crds_patches)`

In landscape genetics, a graph is generally pruned from a distance matrix in which a set of distance values between population pairs or sample sites are chosen. This matrix is usually a genetic distance matrix. The relationship between these genetic distances and corresponding landscape distances (geographical or cost-distance) can be studied. When a scatterplot is created to do that (with the function `scatter_dist`

), you can display the points corresponding to population pairs connected in the pruned graph in a different color. The function `scatter_dist_g`

thereby allows to understand the pruning and to assess its intensity.

In the following example, you can see that all connected population pairs from `graph_gab_geo`

are separated by short landscape distances. Besides, not a single population pair is located where a plateau is reached in the relationship between genetic and landscape distances on the scatterplot.

`scatter_dist_g(mat_y = mat_dps , mat_x = mat_ld, graph = graph_gab_geo)`

You can also create an histogram with the links’ weights distribution with the function `plot_hist_w`

.

```
p <- plot_w_hist(graph = graph_gab_gen)
p
```