# psSubpathway: a software package for flexible identification of phenotype specific subpathways in the cancer progression

## Introduction

psSubpathway is a method designed to discover the relationship between subpathway and multiple sample phenotype.psSubpathway consists of four parts:

1. Extracting subpathways from canonical pathways. We used k-clique method in social network analysis to extract the subpathways and eliminated the smaller subpathways that had an overlap above 80%; The subpathway data is stored in a list structure.
2. Inferring subpathway activity profiles. The smaple-specific subpathway activity profiles was inferred by using sample-based gene set enrichment method (GSVA or ssGSEA);
3. Identifying subtype-specific subpathways. We developed a Subtype Set Enrichment Analysis (SubSEA) method to identify subtype-specific subpathways across multi-subtype groups of cancer samples;
4. Identifying the dynamic changed subpathways. We developed a Dynamic Changed Subpathway Analysis (DCSA) method to identify the dynamic changed subpathways associated with the developmental stage of cancer.

Its capabilities render psSubpathway could find the specific dysregulated subpathways in multi-phenotypes samples, and fill the gap in recent subpathway identification methods which generally found the differentially expressed subpathways between normal and cancer samples. psSubpathway may identify more precision biomarkers and phenotype-related disease mechanisms to help to tailored treatment for patients with cancer.

## SubSEA：identifying subtype-specific subpathways

The function SubSEA can identify subtype-specific subpathways across multi-subtype groups of cancer samples. For each subpathway, we ranked the N samples in the dataset to form a sample list L=<s1, s2…sN> according to decreasing subpathway activity. The samples in a given subtype were mapped to the sample list L. If the samples in the subtype significantly cluster at the top or bottom of the entire ranked list L, the subpathway will be associated with the specific subtype. We used the weighted Kolmogorov-Smirnov statistic to calculate an sample enrichment score (SES), which reflects the degree to which the samples in a subtype is overrepresented toward the extremes (top or bottom) of the sample list L. The SES is calculated by walking down the list L, increasing a running-sum statistic when we encounter a sample in the subtype and decreasing it when we encounter a sample not in the subtype.

This function requires user input the gene expression profile identified and the corresponding phenotype file of the sample (CLS format).In addition, this function requires input of subpathway list data,which has been extracted from KEGG pathway data and saved into the package environment variables. Of course, users can also define their own dataset list data as input, as long as the gene identification and gene expression profile of gene sets are maintained.

We selected the breast cancer gene expression profile data in the GDC TCGA database and pm50 phenotype file containing four breast cancer subtypes for SubSEA and obtained the results of subpathways related to each subtype of breast cancer. The commands are as follows.

require(GSVA)
require(parallel)
# get breast cancer disease subtype gene expression profile.
Bregenematrix<-get("Subgenematrix")
# get path of the sample disease subtype files.
Subtypelabels<- system.file("extdata", "Sublabels.cls", package = "psSubpathway")
# perform the SubSEA method.
#SubSEAresult<-SubSEA(Bregenematrix,
#                     input.cls=Subtypelabels,
#                     nperm=50,                     # Number of random permutations
#                     fdr.th=0.01,                  # Cutoff value for fdr.when fdr.th=1,get all subpathway
#                     parallel.sz=2)                # Number of processors to use

# get the result of the SubSEA function
SubSEAresult<-get("Subspwresult")
str(SubSEAresult)
#> List of 5
#>  $Basal :'data.frame': 15 obs. of 6 variables: #> ..$ SubpathwayID  : Factor w/ 15 levels "00120_19","00120_9",..: 2 1 3 4 5 6 7 8 9 10 ...
#>   ..$PathwayName : Factor w/ 14 levels "Apoptosis","C-type lectin receptor signaling pathway",..: 10 10 3 13 8 6 4 11 1 14 ... #> ..$ SubpathwayGene: Factor w/ 15 levels "AMACR ACOX2 HSD17B4 SCP2",..: 1 13 6 7 9 3 15 12 2 4 ...
#>   ..$SES : num [1:15] -0.885 -0.829 -0.848 -0.767 -0.665 ... #> ..$ Pvalue        : num [1:15] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$FDR : num [1:15] 0 0 0 0 0 0 0 0 0 0 ... #>$ Her2     :'data.frame':   25 obs. of  6 variables:
#>   ..$SubpathwayID : Factor w/ 25 levels "00010_6","00020_4",..: 1 2 3 4 5 6 7 8 9 10 ... #> ..$ PathwayName   : Factor w/ 24 levels "Arginine and proline metabolism",..: 12 3 8 1 16 11 24 22 5 17 ...
#>   ..$SubpathwayGene: Factor w/ 25 levels "ADCY1 ADCY2 ADCY3 ADCY5 ADCY6 ADCY7 ADCY8 ADCY9 ADCY4 PRKACA PRKACB PRKACG CREB1 ATF2 CREM ATF1 ATF3 ATF4 XBP1 "| __truncated__,..: 3 18 2 23 13 22 12 24 7 16 ... #> ..$ SES           : num [1:25] 0.704 0.681 0.725 0.817 0.711 ...
#>   ..$Pvalue : num [1:25] 0 0 0 0 0 0 0 0 0 0 ... #> ..$ FDR           : num [1:25] 0 0 0 0 0 0 0 0 0 0 ...
#>  $LumA :'data.frame': 49 obs. of 6 variables: #> ..$ SubpathwayID  : Factor w/ 49 levels "00052_2","00340_5",..: 1 2 3 5 4 6 7 8 9 10 ...
#>   ..$PathwayName : Factor w/ 36 levels "Adipocytokine signaling pathway",..: 15 18 32 11 11 30 13 21 7 14 ... #> ..$ SubpathwayGene: Factor w/ 49 levels "ADCY1 ADCY2 ADCY3 ADCY5 ADCY6 ADCY7 ADCY8 ADCY9 ADCY4 PRKACA PRKACB PRKACG CREB1 ATF2 CREM ATF1 ATF3 ATF4 XBP1 "| __truncated__,..: 3 23 14 17 12 49 16 42 41 10 ...
#>   ..$SES : num [1:49] -0.731 0.773 -0.554 0.677 0.703 ... #> ..$ Pvalue        : num [1:49] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$FDR : num [1:49] 0 0 0 0 0 0 0 0 0 0 ... #>$ LumB     :'data.frame':   22 obs. of  6 variables:
#>   ..$SubpathwayID : Factor w/ 22 levels "00350_2","00563_1",..: 1 2 3 4 5 6 7 8 9 10 ... #> ..$ PathwayName   : Factor w/ 22 levels "Amyotrophic lateral sclerosis (ALS)",..: 22 7 18 15 6 4 12 8 2 13 ...
#>   ..$SubpathwayGene: Factor w/ 22 levels "ADAM17 HBEGF ADAM10 CXCL8 CXCR1 CXCR2 EGFR",..: 6 17 7 11 10 14 12 18 19 21 ... #> ..$ SES           : num [1:22] 0.576 0.559 0.608 0.627 -0.658 ...
#>   ..$Pvalue : num [1:22] 0 0 0 0 0 0 0 0 0 0 ... #> ..$ FDR           : num [1:22] 0 0 0 0 0 0 0 0 0 0 ...
#>  $spwmatrix: num [1:725, 1:67] 0.1892 0.0467 0.1907 0.0874 0.1993 ... #> ..- attr(*, "dimnames")=List of 2 #> .. ..$ : chr [1:725] "00010_1" "00010_2" "00010_5" "00010_6" ...
#>   .. ..$: chr [1:67] "Her2" "LumA" "LumB" "LumA" ... head(SubSEAresult$Basal)
#>          SubpathwayID
#> 00120_9       00120_9
#> 00120_19     00120_19
#> 00232_1       00232_1
#> 00280_10     00280_10
#> 00510_8       00510_8
#> 00601_8       00601_8
#>                                                         PathwayName
#> 00120_9                              Primary bile acid biosynthesis
#> 00120_19                             Primary bile acid biosynthesis
#> 00232_1                                         Caffeine metabolism
#> 00280_10                 Valine, leucine and isoleucine degradation
#> 00510_8                                       N-Glycan biosynthesis
#> 00601_8  Glycosphingolipid biosynthesis - lacto and neolacto series
#>                                                                                                                                                                           SubpathwayGene
#> 00120_9                                                                                                                                                         AMACR ACOX2 HSD17B4 SCP2
#> 00120_19                                                                                                                                             SLC27A5 CYP27A1 AMACR ACOX2 HSD17B4
#> 00232_1                                                                                                                                                      CYP1A2 XDH NAT2 NAT1 CYP2A6
#> 00510_8                                                                                                   FUT8 MGAT3 MGAT2 MAN2A2 MAN2A1 MAN1A2 MAN1B1 MAN1A1 MAN1C1 MGAT4B MGAT4A MGAT1
#> 00601_8  B4GALT4 B4GALT3 B4GALT2 FUT4 FUT7 FUT3 FUT5 FUT6 ST8SIA1 ST3GAL6 FUT9 B3GNT3 B3GNT2 B3GNT4 B4GALT1 ABO FUT1 FUT2 B3GNT5 B3GALT1 B3GALT2 B3GALT5 ST3GAL3 ST3GAL4 A4GALT B3GALNT1
#>               SES Pvalue FDR
#> 00120_9  -0.88462      0   0
#> 00120_19 -0.82945      0   0
#> 00232_1  -0.84790      0   0
#> 00280_10 -0.76735      0   0
#> 00510_8  -0.66495      0   0
#> 00601_8   0.74271      0   0

## Visualize

We provide a set of visual analysis functions including plotSubSEScurve,plotSpwACmap,plotSpwNetmap and plotheatmap.

The plotSubSEScurve function can plot the subtype set sample enrichment curve graph of the subpathway. The follows commands can plot the subtype set sample enrichment curve graph and we can see the enrichment distribution of the four subtypes of breast cancer samples in the subpathway 00120_9 activity. We can observe that the sample of subtype Basal is specifically enriched to the low expression region of subpathway 00120_9.

# plot enrichment score curve of the subpathway 00120_9 in all breast cancer subtypes
plotSubSEScurve(Subspwresult,
spwid="00120_9", # The subpathway id which subpahtway is ploted
phenotype="all")  # Which phenotypic phenotype set enrichment curve is drawn

# plot enrichment score curve of the subpathway 00120_9 in the Basal breast cancer subtype.
plotSubSEScurve(Subspwresult,spwid="00120_9",phenotype="Basal")

The plotSpwACmap function can plot subpathway activity change map (includes subpathway active change box plot and subpathway active change heat map). We can observe the changes in subpathway activity values in breast cancer subtypes or stages and the distribution of each subtype samples. The commands are as follows:

# plot the subpathway 00120_9 in the SubSEA function result.
plotSpwACmap(Subspwresult,spwid="00120_9")

# plot the subpathway 00982_2 in the DCSA function result.
plotSpwACmap(DCspwresult,spwid="00982_2")

The plotheatmap function presents a heat map of the subpathway matrix according to the user’s set conditions. The following commands plot heat map of significant up-regulation or down-regulation subpathways. We can observe the obvious block area from the heat map in each subtype.

require(pheatmap)
# plot significant up-regulation subpathway heat map specific for each breast cancer subtype.
plotheatmap(Subspwresult,
fdr.th=0.01,      # Cutoff value for fdr
plotSubSEA=TRUE,  # Indicate that the input data is the result of the SubSEA function
SES="positive",   # Obtain a subpathway with a positive SES value
phenotype="all")

# plot significant down-regulation subpathway heat map specific for each breast cancer subtype.
plotheatmap(Subspwresult,plotSubSEA=TRUE,fdr.th=0.01,SES="negative",phenotype="all")

We can also draw a single subtype-specific significant pathway heat map. We can see that there are distinct specific regions under the Basal subtype sample. The commands and results are as follows:

# plot Basal subtype specific significant subpathway heat map.
plotheatmap(Subspwresult,plotSubSEA=TRUE,fdr.th=0.01,SES="all",phenotype="Basal")

Since the DCSA function is used to mine the subpathways that change dynamically with the phenotype, the subpathway heat map is scattered. As follows:

# plot a heat map of the subpathway that is significantly associated with breast cancer stages.
plotheatmap(DCspwresult,plotSubSEA=FALSE,fdr.th=0.01)

The plotSpwPSheatmap function presents a heat map of the T-test P-value of the activity of the subpathway between the phenotypes. The lower the number in the cells in the heat map, the greater the difference in the activity of the subpathways between the two phenotypes. The following commands plot the heat map. The activity of subpathway 00120_9 is significantly different from other subtypes in the breast cancer basal subtype.

# get the Subspwresult which is the result of SubSEA method.
Subspwresult<-get("Subspwresult")
# plot significant heat map between the activity of the subpathway in each subtype of breast cancer.
plotSpwPSheatmap(Subspwresult,spwid="00120_9")

The plotSpwPSheatmap function can also be used for the results of the DCSA method.

# get the DCspwresult which is the result of DCSA method.
DCspwresult<-get("DCspwresult")
##' # plot significant heat map between the activity of the subpathway in each stage of breast cancer.
plotSpwPSheatmap(DCspwresult,spwid="00982_2")

The function plotSpwNetmap for visualization of a subpathway network map. The commands are as follows: