## Introduction

The creation of the graph4lg package on R should make easier the construction and analysis of genetic and landscape graphs in landscape genetics studies (hence the name graph4lg). The package allows to weight the links and to prune the graphs by several ways. To our knowledge, it is the first software which enables to create genetic graphs with such a large variety of parameters. Besides, it allows to carry out preliminary analyses of the spatial pattern of genetic differentiation in order to choose the best genetic distance and pruning method in every context. Lastly, it makes possible the comparison of two spatial graphs sharing the same nodes.

In the following sections, we describe the functions of the package following their separation into three parts. We illustrate them with results’ examples and report the command lines to copy in order to reproduce them.

The package includes genetic and spatial data allowing users to discover its different functionalities. We included two datasets. The first is from French Guadeloupe and consists of 432 Plumbeous Warblers (Setophaga plumbea) genotyped at 12 microsatellite loci (169 different alleles in total) from 20 populations whose spatial coordinates are available. These data were collected and analysed by Khimoun et al. (2017). They are also freely available on the website datadryad.org. They were included in the package in different formats: data.frame format with columns of locus class from the package gstud, genind class from adegenet, loci class for pegas, text files (.txt) for GENEPOP or STRUCTURE.

The second data set was created during simulations done with CDPOP (Landguth and Cushman 2010) on a simulated landscape. It consists of 1500 individuals from 50 populations genotyped at 20 microsatellite loci. Individuals dispersed less when the cost-distance between populations was large. These data were included in format genind and as a text file compatible with GENEPOP software. A landscape graph was created with GRAPHAB (Foltête, Clauzel, and Vuidel 2012) whose nodes were the 50 simulated populations and the links were weighted by cost-distance values between populations. The project created with GRAPHAB was included into the package such that the landscape graphs and the cost-distance matrix can be easily imported into the R environment.

## Input data processing

The first type of functions from this package allows to process spatial and genetic data used as input data in subsequent analyses performed with other functions. These functions convert genetic data, compute genetic or geographical distances, import landscape graphs and convert cost-distance values into Euclidean geographical distances.

### Genetic data conversion

In order to make the package user-friendly and compatible with genetic data commonly used in landscape genetics, the functions gstud_to_genind, loci_to_genind, structure_to_genind and genepop_to_genind allow to convert genetic data from formats used respectively in the GENETIC STUDIO (Dyer 2009) and pegas (Paradis 2010) packages on R and in STRUCTURE (Pritchard, Stephens, and Donnelly 2000) and GENEPOP (Raymond 1995) software into R objects with the class attribute genind from ADEGENET package (Jombart 2008). The format genind allows to use time-efficient functions from ADEGENET (coded in C). This package was developed and is regularly maintained by Thibaut Jombart (his website). The function genind_to_genepop enables to convert genind object into text files in format genepop in order to perform easily analyses with this commonly used R package and executable software.

Genetic data in formats used by software commonly used in population genetics can be converted into genind objects in R. This class of object was created for the package adegenet (Jombart 2008).

#### GENEPOP to genind

The GENEPOP software (Raymond 1995) developed by M. Raymond and F. Rousset (Montpellier) can be used as an executable file, with or without graphical user interface, or as a R package. It is frequently used to compute FST values and to test for Hardy-Weinberg equilibrium, linkage disequilibrium or genetic differentiation.

When performing simulations with CDPOP, individuals’ genotypes can be saved as GENEPOP files at the end of the simulation.

The function genepop_to_genind loads a GENEPOP file (.txt extension) and converts it into a genind object. To use it, the path to the file, the total number and names of the loci and the populations’ names must be indicated.

data_genind <- genepop_to_genind(path = paste0(system.file('extdata',
package = 'graph4lg'), "/gpop_51_sim22_01_25.txt"),
n.loci = 20, pop_names = as.character(1:50))
#> Registered S3 method overwritten by 'spdep':
#>   method   from
#>   plot.mst ape
#> Registered S3 method overwritten by 'pegas':
#>   method      from
data_genind
#> /// GENIND OBJECT /////////
#>
#>  // 1,500 individuals; 20 loci; 568 alleles; size: 3.4 Mb
#>
#>  // Basic content
#>    @tab:  1500 x 568 matrix of allele counts
#>    @loc.n.all: number of alleles per locus (range: 26-30)
#>    @loc.fac: locus factor for the 568 columns of @tab
#>    @all.names: list of allele names for each locus
#>    @ploidy: ploidy of each individual  (range: 2-2)
#>    @type:  codom
#>    @call: adegenet::df2genind(X = as.matrix(x[, attr(x, "locicol"), drop = FALSE]),
#>     sep = "/", pop = pop, ploidy = ploidy)
#>
#>  // Optional content
#>    @pop: population of each individual (group size range: 30-30)

We get a genind object. It contains the genotypes of the 1500 individuals from the 50 populations created during the simulation.

#### genind to GENEPOP

Similarly, the function genind_to_genepop allows to perform the reverse conversion, i.e. to convert a genind object into a GENEPOP file. This file is created and saved in the working directory defined earlier.

genind_to_genepop(x = data_genind, output = "data_gpop_test.txt")

This function allows for example to create a GENEPOP file to test for between populations genetic differentiation or to compute fixation indices with GENEPOP software.

#### STRUCTURE to genind

STRUCTURE software (Pritchard, Stephens, and Donnelly 2000) is frequently used in population genetics and landscape genetics. It enables to create populations’ clusters via a Bayesian approach aiming at minimizing the deviation from Hardy-Weinberg equilibrium when gathering populations with one another. The input files have a particular structure. The function structure_to_genind allows to convert this type of file into a genind object.

To use the function, we need to indicate the path to the file, the names of the loci, the individuals’ ID and the populations’ names in the same order as in the original file.

loci_names <- c("DkiD104", "DkiD124", "DkiD102", "CAM19",
"DkiC118", "DkiD128", "DkiB12",  "Lswmu7",
"DkiD109", "Lswmu5",  "TG12_15", "DkiD12" )
data(data_pc_genind)
ind_names <- row.names(data_pc_genind@tab)
pop_names <- c("BT-1", "BT-10", "BT-11", "BT-12", "BT-13", "BT-2",
"BT-3",  "BT-4",  "BT-5",  "BT-6",  "BT-7",  "BT-8",
"BT-9",  "GT-1",  "GT-2", "GT-3",  "GT-4",  "GT-5",
"GT-6", "GT-7")
data_paru <- structure_to_genind(path = paste0(system.file('extdata',
package = 'graph4lg'),
"/data_PC_str.txt"),
loci_names = loci_names,
pop_names = pop_names,
ind_names = ind_names)
data_paru
#> /// GENIND OBJECT /////////
#>
#>  // 432 individuals; 12 loci; 169 alleles; size: 352.3 Kb
#>
#>  // Basic content
#>    @tab:  432 x 169 matrix of allele counts
#>    @loc.n.all: number of alleles per locus (range: 5-23)
#>    @loc.fac: locus factor for the 169 columns of @tab
#>    @all.names: list of allele names for each locus
#>    @ploidy: ploidy of each individual  (range: 2-2)
#>    @type:  codom
#>    @call: adegenet::df2genind(X = as.matrix(x[, attr(x, "locicol"), drop = FALSE]),
#>     sep = "/", pop = pop, ploidy = ploidy)
#>
#>  // Optional content
#>    @pop: population of each individual (group size range: 6-33)

#### gstud to genind

Packages gstudio and popgraph developed by R. Dyer (Dyer 2009) use as input data R data.frames with columns of class locus. These data.frame objects constitute gstud objects. Given these packages are often used to create genetic graphs, we created a function to convert them into the genind format.

A gstud object generally has the following structure (Plumbeous Warbler data from Guadeloupe as an example):

head(data_pc_gstud)

To convert it with the function gstud_to_genind, we indicate the name of the data.frame and of the columns with populations’ names and individuals’ names:

gstud_to_genind(x = data_pc_gstud, pop_col = "Cluster",
ind_col = "ID")
#> /// GENIND OBJECT /////////
#>
#>  // 432 individuals; 12 loci; 169 alleles; size: 352.3 Kb
#>
#>  // Basic content
#>    @tab:  432 x 169 matrix of allele counts
#>    @loc.n.all: number of alleles per locus (range: 5-23)
#>    @loc.fac: locus factor for the 169 columns of @tab
#>    @all.names: list of allele names for each locus
#>    @ploidy: ploidy of each individual  (range: 2-2)
#>    @type:  codom
#>    @call: adegenet::df2genind(X = as.matrix(x[, attr(x, "locicol"), drop = FALSE]),
#>     sep = "/", pop = pop, ploidy = ploidy)
#>
#>  // Optional content
#>    @pop: population of each individual (group size range: 6-33)

### Genetic and geographical distances calculation

#### Genetic distances

From a genind object, the function mat_gen_dist calculates several types of between populations genetic distances:

• FST (Weir and Cockerham 1984) or linearised FST (Rousset 1997) (options ‘dist=FST’ and ‘dist=FST_lin’).

• G’ST (Hedrick 2005) (option ‘dist=GST’).

• DJost (Jost 2008) (option ‘dist=D’).

• DPS (1 - proportion of shared alleles) (Bowcock et al. 1994, @murphy_graph_2015) (option ‘dist=DPS’).

• Euclidean genetic distance (Excoffier, Smouse, and Quattro 1992) (option ‘dist=basic’).

• Euclidean genetic distance with a weighting depending on allelic frequencies giving more weight to rare alleles (Fortuna et al. 2009) (option ‘dist=weight’).

• Euclidean genetic distance computed after a PCA of the matrix of allelic frequencies by population. The axes considered to compute the Euclidean distance are the non-collinear principal components (total number of alleles - number of loci) (Paschou et al. 2014, @shirk2017comparison) (option ‘dist=PCA’).

• Euclidean genetic distance computed in the same way as with the function popgraph from popgraph package, i.e. after a PCA and two SVD, among other computation steps (option ‘dist=PG’). This distance is different from the conditional genetic distance (cGD) computed from a population graph by summing genetic distances along shortest paths.

To do these calculations with the function mat_gen_dist, you just have to indicate the name of the genind object which includes the genetic data of the individuals as well as the populations to which each of them belongs. The other argument of the function is the type of genetic distance to compute. Here are two examples:

mat_dps <- mat_gen_dist(x = data_genind, dist = "DPS")
mat_dps[1:5, 1:5]
#>            1        10        11        12        13
#> 1  0.0000000 0.7883333 0.5900000 0.6816667 0.8733333
#> 10 0.7883333 0.0000000 0.5741667 0.5450000 0.8816667
#> 11 0.5900000 0.5741667 0.0000000 0.3750000 0.8658333
#> 12 0.6816667 0.5450000 0.3750000 0.0000000 0.8633333
#> 13 0.8733333 0.8816667 0.8658333 0.8633333 0.0000000
mat_pg <- mat_gen_dist(x = data_genind, dist = "PG")
mat_pg[1:5, 1:5]
#>           1       10       11       12       13
#> 1   0.00000 29.93508 21.67789 25.40865 29.17504
#> 10 29.93508  0.00000 19.15163 20.32432 29.99936
#> 11 21.67789 19.15163  0.00000 13.50181 24.69579
#> 12 25.40865 20.32432 13.50181  0.00000 25.18634
#> 13 29.17504 29.99936 24.69579 25.18634  0.00000

#### Import of landscape graphs created with GRAPHAB

The function graphab_to_igraph allows to import landscape graphs created with software into the R environment. It takes the path to the directory of a project created with as a first argument. Then, the names of the shapefile layers corresponding to the nodes and links (the least-cost paths or a set of graph links) of the graphs must be indicated (for the nodes, by default, the shapefile layers “patches.shp” is imported). The links of the imported graph are weighted by cost-distance values or by geographical distance values depending on the weight option. The graph can be plotted on a geographical map. Besides, an object of type data.frame with the spatial coordinates of the nodes’ centroids can be created.

As an example, the following command allows to import the project created to parameterize the gene-flow simulation on performed to create data_simul_genind:

land_graph <- graphab_to_igraph(dir_path = system.file('extdata',
package = 'graph4lg'),
nodes = "patches",
weight = "cost", fig = FALSE, crds = TRUE)
#> OGR data source with driver: ESRI Shapefile
#> Source: "C:\Users\psavary\AppData\Local\Temp\RtmpMR98nP\Rinst71436f941b3\graph4lg\extdata", layer: "patches"
#> with 50 features
#> It has 4 fields
#> OGR data source with driver: ESRI Shapefile
#> with 1225 features
#> It has 5 fields

The land_graph object is a list composed of an object of class igraph corresponding to the landscape graph, and of a table of class data.frame with the coordinates of the patches’ centroids.

crds_patches <- land_graph[[2]]
land_graph <- land_graph[[1]]

We create a matrix with the cost-distances between populations from the links of the landscape graph imported, which is a complete graph. You have to pay attention to the order of the populations’ names in the distance matrices, even if error messages will appear when using the functions of this package if you use two matrices in which rows’ and columns’ names do not match. The function reorder_mat reorders the rows and columns of a symmetric matrix according to a specified order. To use it, you just need to create a character vector with the rows and columns names from the symmetric matrix in the order you want them to be ordered in the new matrix.

mat_ld <- as_adjacency_matrix(land_graph, attr = "weight",
type = "both", sparse = FALSE)
order <- row.names(mat_ld)[order(c(as.character(row.names(mat_ld))))]
mat_ld <- reorder_mat(mat = mat_ld, order = order)

#### Geographical distances

You can also calculate Euclidean geographical distances between populations with the function mat_geo_dist. It takes as an argument a data.frame with 3 columns:

• ID : populations’ ID (or simple points’ ID when using this function in another context)

• x : populations’ or points’ longitude

• y : populations’ or points’ latitude

Geographical coordinates must be expressed in a projected coordinates system (metric) because Pythagoras’s theorem is used to compute the distances (a warning message is displayed in every case).

Example :

head(crds_patches)
mat_geo <- mat_geo_dist(data = crds_patches, ID = "ID", x = "x", y = "y")
#> Coordinates were treated as projected coordinates. Check whether
#>               it is the case.
mat_geo <- reorder_mat(mat = mat_geo, order = order) 
mat_geo[1:5, 1:5]
#>           1        10        11        12       13
#> 1      0.00 25292.687 15033.296 20465.825 13354.40
#> 10 25292.69     0.000 12001.667  5622.277 31907.68
#> 11 15033.30 12001.667     0.000  6407.027 19906.28
#> 12 20465.83  5622.277  6407.027     0.000 26300.76
#> 13 13354.40 31907.679 19906.280 26300.760     0.00

Cost-distances are expressed in cost units arbitrarily defined based on the cost values assigned to every land cover type when creating resistance surfaces. However, species dispersal distances are usually known as distances expressed in metric units. When we know that the probability that a species covers 10 km is 5 %, we can ask what is the equivalent of this distance in cost distance units according to the scenario of cost values assumed. It allows to prune a landscape graph with a given distance threshold e.g. or provides an order of magnitude of the cost-distance values.

To that purpose, you can perform a regression of the between populations cost-distance values against the corresponding geographical distances. It estimates the relationship between both types of distance. Then, the resulting parameters estimates enable to convert a geographical distance into its cost-distance equivalent according to the cost scenario.

The function convert_cd performs the linear regression or log-log linear regression between the geographical distance matrix and the cost-distance matrix, in the same way as Tournant et al. (2013) and as performed by GRAPHAB software.

Below, we estimate the relationship between geographical distance and cost-distance between the populations used to perform the gene-flow simulation. We convert 10 km into cost-distance units. The function also plots the relationship between these distances.

convert_res <- convert_cd(mat_euc = mat_geo, mat_ld = mat_ld,
to_convert = 10000, fig = TRUE,
method = "log-log", pts_col = "grey")
convert_res
#> $Converted values #> 10000 #> 1605.543 #> #>$Model parameters
#> Intercept     Slope
#> -2.251216  1.045828
#>
#> $Model multiple R-squared #> Multiple R-squared #> 0.6870757 #> #>$Plot

In this case, we can see that 10 km are equivalent to 1606 cost-distance units. The log-log linear model estimates the relationship between geographical distance (GD) and cost-distance (CD) such that: $$log(CD)=-2.2512+1.0458 \times log(GD)$$. The determination coefficient $$R^2$$ associated to this linear model is 0.69. The black dot represented on the figure refers to the 10 km value on the regression line characterizing the relationship between cost-distance and geographical distance.

## Graphs’ construction

Some functions from graph4lg package construct graphs from either genind objects or genetic or landscape distance matrices. To choose a genetic distance and a pruning method for the genetic graphs’ construction, we developed functions to perform preliminary analyses of the spatial pattern of genetic differentiation. Indeed, a genetic graph can be created in order to i) identify the direct dispersal paths between populations or to ii) select the set of population pairs to consider to infer landscape effects on dispersal. According to the use of a genetic graph and to the spatial pattern of genetic differentiation (type-I or type-IV pattern of IBD (Hutchison and Templeton 1999, @van2015isolation)), the choice of a genetic distance and of a pruning method will not be the same.

Van Strien, Holderegger, and Van Heck (2015) computed the so-called distance of maximum correlation (DMC) as the distance between populations below which population pairs should be considered in order to maximize the correlation between landscape distance (geographical distance in their case, but applies similarly to cost-distance) and genetic distance. This distance threshold is computed by increasing iteratively the maximum distance between populations above which population pairs are not taken into account to compute the correlation. Thus, an increasing number of population pairs is considered in the inference. When the correlation coefficient between landscape distance and genetic distance reaches a maximum, the distance threshold considered is the DMC. When the DMC is equal to the maximum distance between populations, it means that an equilibrium established between gene flow and genetic drift at the scale of the study area. Conversely, when the DMC is lower than this maximum distance, it means that there is a “plateau” in the relationship between landscape distance and genetic distance because migration-drift equilibrium has not been reached yet at the scale considered. It can be due to recent modifications of the landscape which consistently reduced the connectivity in a previously connected context. In this case, graph pruning is needed to well infer landscape effect on dispersal. Similarly, genetic distances that do not assume this equilibrium should be used.

The function dist_max_corr calculates the DMC from two distance matrices. You need to specify the interval between two distance thresholds iteratively considered to select population pairs and compute the correlation coefficient. The function scatter_dist, on the other hand, allows to visualize the relationship between two distance matrices by making a scatter plot. The shape of this relationship can be compared to the four different types of IBD patterns described by Hutchison and Templeton (1999) in order to characterize the spatial pattern of genetic differentiation.

Here are the command lines to execute to use these two functions:

dmc <- dist_max_corr(mat_gd = mat_dps, mat_ld = mat_ld,
interv = 500, pts_col = "black")

The dmc object is a list with 1) the DMC value, 2) a vector containing all the computed correlation coefficients, 3) a vector with all the distance thresholds tested and 4) a graphic object created with the ggplot2 package.

# DMC value
dmc[[1]]
#> [1] 4500
# Correlation coefficients
dmc[[2]]
#>  [1]        NA 0.2986565 0.3154498 0.5188747 0.7059633 0.7559539 0.7850267
#>  [8] 0.7947691 0.8038470 0.7853646 0.7760106 0.7641339 0.7530264 0.7462445
#> [15] 0.7386713 0.7333936 0.7305631 0.7226695 0.7137972 0.7110962 0.7041702
# Threshold distances tested
dmc[[3]]
#>  [1]   500.00  1000.00  1500.00  2000.00  2500.00  3000.00  3500.00  4000.00
#>  [9]  4500.00  5000.00  5500.00  6000.00  6500.00  7000.00  7500.00  8000.00
#> [17]  8500.00  9000.00  9500.00 10000.00 10230.05

The figure below represents the evolution of the correlation coefficient values when distance thresholds increase.

The scatter plot is displayed with the function scatter_dist.

scatter_dist(mat_gd = mat_dps, mat_ld = mat_ld,
pts_col = "black")
#> 1225 out of 1225 values were used.

In this particular case, we notice a type-IV pattern of isolation by distance with a “plateau” in the relationship between cost-distance and genetic-distance (DPS). Graph pruning will be needed to select the population pairs to include in the inference of landscape effects on dispersal.

In the following section, we present the different pruning methods available.

### Pruning based on distance thresholds

To prune a graph whose links are weighted by distances, you can remove all the links associated to geographical or genetic distances larger (or lower) than a specific threshold distance. This distance can for example be equal to the maximum dispersal distance of an individual of the study species at the scale of its lifespan so that the resulting graph represents the direct dispersal paths of the species. It can also be equal to the DMC if the objective is to infer landscape effects on dispersal.

The function gen_graph_thr takes as arguments a distance matrix used to weight the links of the resulting graph (mat_w) and a distance matrix on which the “thresholding” is based (mat_thr). The selected links are selected according to the values of this latter matrix. The argument thr is the numerical value of the threshold distance. If mat_thr is not specified, mat_w is used by default for the thresholding. Lastly, you have to specify if the links to remove take larger or lower values than the threshold value.

graph_thr <- gen_graph_thr(mat_w = mat_dps, mat_thr = mat_geo,
thr = 12000, mode = "larger")
graph_thr
#> IGRAPH a8b03dc UNW- 50 162 --
#> + attr: name (v/c), weight (e/n)
#> + edges from a8b03dc (vertex names):
#>  [1] 1 --2  1 --4  1 --5  1 --6  1 --9  10--12 10--20 10--21 10--8  11--12
#> [11] 11--16 11--18 11--20 11--4  11--5  11--8  11--9  12--18 12--20 12--21
#> [21] 12--8  13--14 13--15 13--16 13--17 13--2  13--22 13--5  13--6  13--7
#> [31] 14--15 14--16 14--17 14--22 14--5  14--6  15--16 15--17 15--22 15--5
#> [41] 15--9  16--17 16--18 16--22 16--26 16--4  16--5  16--9  17--19 17--22
#> [51] 17--23 17--27 17--28 17--6  18--20 18--21 18--24 18--25 18--26 18--8
#> [61] 18--9  19--23 19--27 19--7  2 --4  2 --5  2 --6  2 --9  20--21 20--24
#> [71] 20--25 20--26 20--8  21--24 21--29 22--26 22--28 22--30 23--27 23--28
#> + ... omitted several edges

The function returns a graph in the form of an igraph object, which is consequently compatible with all functions from igraph package (Csardi and Nepusz 2006), one of the most used R package to create and analyse graphs (together with sna and networks). In the latter example, the graph has 50 nodes and 162 links when we prune it using a 12-km distance threshold. Its links are weighted with the values of the mat_dps matrix.

### Pruning based on topological constraints

A graph can be pruned according to a topological criterion. The function gen_graph_topo can use 4 different criteria. As with the previous function, topological criteria are applied by considering the distance values of the mat_topo matrix, but the links are weighted with the values of the mat_w matrix (except when mat_topo is not specified, cf. previous section).

Gabriel graph: in the created graph, two nodes are connected by a link if, when we draw a circle whose center is set at the middle of the segment linking them and whose radius is equal to half the length of this segment, there is no other node inside the circle. In mathematical terms, it means that there is a segment between $$x$$ and $$y$$ if and only if for every other point $$z$$, we have: $$d_{xy}\leq \sqrt{d_{xz}^{2}+d_{yz}^{2}}$$. We can compute such a graph from geographical distances (Gabriel and Sokal 1969) (graph_gab_geo below) but also, less commonly, from genetic distances (Naujokaitis-Lewis et al. 2013) (graph_gab_gen below). In the latter case, it is to some extent as if Pythagoras’s theorem was applied to genetic distances, which can seem a bit strange even if this method has already been used by Naujokaitis-Lewis et al. (2013).

graph_gab_geo <- gen_graph_topo(mat_w = mat_dps, mat_topo = mat_geo,
topo = "gabriel")
graph_gab_geo
#> IGRAPH a91b5bd UNW- 50 98 --
#> + attr: name (v/c), weight (e/n)
#> + edges from a91b5bd (vertex names):
#>  [1] 1 --2  1 --5  10--12 10--21 11--12 11--16 11--18 11--8  11--9  12--20
#> [11] 12--8  13--14 13--17 13--5  13--6  14--15 14--17 14--5  15--16 15--22
#> [21] 15--5  16--18 16--22 16--9  17--19 17--23 17--27 17--28 18--20 18--25
#> [31] 19--23 19--7  2 --6  20--21 20--24 20--25 21--24 21--29 22--26 22--28
#> [41] 23--27 24--25 24--29 24--31 25--26 25--31 25--32 26--30 27--28 27--33
#> [51] 28--30 28--34 28--36 29--31 29--37 3 --7  30--32 30--34 31--32 31--35
#> [61] 31--37 32--34 32--35 33--36 33--42 34--36 34--44 35--37 35--39 35--45
#> [71] 36--40 37--38 37--39 37--41 38--41 38--43 39--41 39--49 4 --5  4 --9
#> + ... omitted several edges
graph_gab_gen <- gen_graph_topo(mat_w = mat_dps, mat_topo = mat_dps,
topo = "gabriel")

Minimum Spanning Tree (MST): it creates a minimum spanning tree, i.e a graph in which every node is connected by a link to at least another node and whose total links’ weight is minimum. By definition, its number of links is equal to the number of nodes - 1.

graph_mst <- gen_graph_topo(mat_w = mat_dps, mat_topo = mat_dps,
topo = "mst")
graph_mst
#> IGRAPH a97b081 UNW- 50 49 --
#> + attr: name (v/c), weight (e/n)
#> + edges from a97b081 (vertex names):
#>  [1] 1 --2  1 --4  10--8  11--12 11--18 11--20 11--4  12--8  13--14 13--6
#> [11] 14--15 15--17 15--22 15--28 16--23 17--23 19--23 2 --7  20--25 21--24
#> [21] 23--27 24--29 25--29 25--30 26--30 27--33 3 --6  3 --7  30--34 31--32
#> [31] 32--34 32--35 32--37 36--40 37--38 38--43 39--43 4 --9  40--42 41--43
#> [41] 41--49 42--48 43--45 43--47 44--45 44--48 46--47 48--50 5 --9

“Percolation” graph: the graph is created by removing iteratively some links, beginning with those with the highest weights until the graph breaks into more than one component. We conserve the link whose removal entails the creation of another component to obtain a connected graph. This method is also called the edge-thinning method (Urban et al. 2009). Such a method is linked to percolation theory (Rozenfeld et al. 2008). The function gen_graph_topo indicates the number of conserved links and the weight of the link whose removal disconnects the graph (maximum link weight of the created graph).

graph_percol <- gen_graph_topo(mat_w = mat_dps, mat_topo = mat_dps,
topo = "percol")
#> Number of conserved links : 325
#> Maximum weight of the conserved links : 0.7525

Complete graph: the function allows to create a complete graph from a distance matrix. In that case, there is no pruning and, by definition, all population pairs are connected.

graph_comp <- gen_graph_topo(mat_w = mat_dps, mat_topo = mat_dps,
topo = "comp")

### Pruning based on the conditional independence principle

The last pruning method implemented by the graph4lg package is based upon the conditional independence principle. The function gen_graph_indep is largely inspired by the function popgraph created by R. Dyer (Dyer and Nason 2004), but does not need the package popgraph to function. Besides, as some calculations are performed with functions from the adegenet package (coded in C), it is faster than the original popgraph function. It is also more flexible than popgraph function given you can vary i) the way we compute genetic distances used to weight the links and to compute the covariance between populations, ii) the formula used to compute the covariance from squared distances or alternatively simple distances, iii) the statistical tolerance threshold, iv) the p-values adjustment and v) the returned objects created by the function. Without entering further into the details, here is an implementation example.

graph_ci <- gen_graph_indep(x = data_genind,
dist = "PCA",
cov = "sq",
adj = "holm")
graph_ci
#> IGRAPH 326a1c8 UNW- 50 105 --
#> + attr: name (v/c), weight (e/n)
#> + edges from 326a1c8 (vertex names):
#>  [1] 1 --2  1 --3  1 --4  1 --9  10--11 10--8  11--12 11--18 11--20 12--20
#> [11] 12--25 12--8  13--14 13--15 13--22 13--6  14--15 14--17 14--19 14--27
#> [21] 14--32 14--6  15--17 15--23 15--28 16--22 16--23 16--33 16--41 17--23
#> [31] 17--33 18--19 18--20 18--9  19--23 19--27 2 --5  2 --7  20--4  20--5
#> [41] 21--24 21--29 21--45 22--23 22--8  23--27 23--49 24--29 25--26 25--3
#> [51] 25--30 25--32 25--37 26--3  26--30 26--34 26--39 26--44 26--8  27--33
#> [61] 3 --31 3 --6  3 --7  30--34 30--7  31--32 31--35 31--36 32--34 32--35
#> [71] 32--8  33--40 34--6  35--37 36--40 36--48 37--38 37--39 37--46 38--43
#> + ... omitted several edges

## Graphs’ processing and analysis

Once the graphs have been created, you can perform calculation from them, visualize and export them.

### Graphical application

#### Representation of the graph on a map

The function plot_graph_lg integrates functions from igraph and ggplot2 to represent graphs on a map. Most frequently, graphs are spatial and a table with populations’ coordinates must be given as an argument. It must have exactly the same structure as the table given as an argument to mat_geo_dist (3 columns : ID, x, y). The visual representation can make visible the links’ weights by plotting the links with a width proportional to the weight (width = "w") or the inverse weight (width = "inv") of the links.

For example, with the graph graph_mst :

p <- plot_graph_lg(graph = graph_mst, crds = crds_patches,
mode = "spatial", weight = TRUE, width = "inv")
p

If the populations’ spatial coordinates are not available, you can still display the graph on a two-dimensional plane. In that case, the nodes’ positions are computed with Fruchterman and Reingold (1991) algorithm to optimize the representation. With the graph graph_mst, we obtain:

p <- plot_graph_lg(graph = graph_mst, crds = crds_patches,
mode = "aspatial", weight = TRUE, width = "inv")
p

#### Representation of the graph’s modules on a map

You can also plot the results of a graph’s partition carried out by computing a modularity index with the function plot_graph_modul. This function is similar to plot_graph_lg. The main difference is that the graph’s nodes from the same modules are displayed in the same color.

Several algorithms can be used: fast greedy (Clauset, Newman, and Moore 2004), louvain (Blondel et al. 2008), optimal (Brandes et al. 2008) and walktrap (Pons and Latapy 2006). The number of created modules in each graph is adjustable but can be fixed depending on the optimal values of the modularity indices. Besides, the modularity calculation can take into account the links’ weights or not. When taken into account, the weight given to a link in the calculation will be i) proportional to or ii) inversely proportional to the genetic or landscape distance to which it is associated.

In the following example, the graph is partitioned into 6 modules with the fast_greedy algorithm (default option).

plot_graph_modul(graph = graph_gab_geo, crds = crds_patches)