This document contains the statistical analysis performed in the publication Pathobiochemical signatures of cholestatic liver disease in bile duct ligated mice.
A comprehensive data set of serum markers, histological parameters and transcript profiles was compiled at 8 time points after bile duct ligation (BDL) in mice, comprising different stages of the disease. The data set consists of \(N_{r}=5\) repeats (\(N_{r}=3\) for CTGF, \(\alpha\)-SMA and S100a4) for \(N_{t}=8\) time points denoted by \(t_1,..., t_{N_t}\) consisting of a total \(N_{f}=153\) measured parameters in the following referred to as factors (Fluidigm gene expression, biochemical markers, and (immuno-)histochemical measurements).
The following naming conventions are used
The main steps of the analysis comprise
The HTML version of this document is available from
http://matthiaskoenig.github.io/bdl-analysis/
All data sets, source code and documentation are provided at
https://github.com/matthiaskoenig/bdl-analysis
All results of this analysis are written to the results directory defined via the BDL_RESULTS
environment variable. To reproduce this analysis create the respective variable and run the Koenig_BDL_analysis.Rmd
file.
# read results directory from environment variable
resultsPath <- Sys.getenv("BDL_RESULTS")
if (identical(resultsPath, "")){
stop("No results directory defined, set the BDL_RESULTS environment variable")
}
# create directory to store data sets of analysis
dir.create(file.path(resultsPath, 'data'), showWarnings=FALSE)
In a first step the processed data sets are loaded from the data
folder: These are the time course data for all factors (BDLdata
), additional information for the factors (BDLfactors
), the sample definition, i.e. the assignment of sample ids to respective time point and repeat (BDLsamples
), and a mapping of the Fluidigm (gene) probe ids to UniProt identifiers and names (BDLprobes
).
suppressPackageStartupMessages(library(BDLanalysis))
suppressPackageStartupMessages(library(calibrate))
suppressPackageStartupMessages(library(pander))
dir.create(file.path(resultsPath, 'control'), showWarnings=FALSE)
# path definition
baseLoc <- system.file(package="BDLanalysis")
extPath <- file.path(baseLoc, "extdata")
# load data
data(BDLdata)
data(BDLsamples)
data(BDLfactors)
data(BDLprobes)
# counters
Nr <- 5 # repeats
Nt <- length(levels(BDLsamples$time_fac)) # Nt=8 time points
# store all data sets in the results folder
save(BDLdata, file=file.path(resultsPath, "data", "BDLdata.Rdata"))
save(BDLsamples, file=file.path(resultsPath, "data", "BDLsamples.Rdata"))
save(BDLfactors, file=file.path(resultsPath, "data", "BDLfactors.Rdata"))
save(BDLprobes, file=file.path(resultsPath, "data", "BDLprobes.Rdata"))
In addition to the individual sample data, the mean data averaged over the \(N_{r}\) repeats per time points is used in parts of the analysis. The mean factor data set is calculated once via
BDLmean <- bdl_mean_data(BDLdata, BDLsamples)
BDLmean.time <- as.numeric(levels(as.factor(BDLsamples$time)))
In total 153 factors were measured in the this BDL study falling in the categories: Biochemistry, GE_ADME, GE_Cytokines, GE_Fibrosis, Histochemistry. The majority of factors belongs hereby to the 3 fluidigm chips with 47 probes per chip (one probe was filtered from GE fibrosis during preprocessing of the chips).
An overview of the number of factors per category is provided in the following table
cat_table <- as.data.frame(table(BDLfactors$ftype))
colnames(cat_table) <- c("Category", "Freq")
set.caption(sub(".", " ", "Factors per category", fixed = TRUE))
pander(cat_table)
Category | Freq |
---|---|
Biochemistry | 4 |
GE_ADME | 47 |
GE_Cytokines | 47 |
GE_Fibrosis | 46 |
Histochemistry | 9 |
rm(cat_table)
In the following an overview of the gene probes is given providing full names and links to UniProt.
# create data frame with probe information
create_probe_df <- function() {
# get the gene factors
f_names <- colnames(BDLdata)[BDLfactors$ftype.short == ""]
# get the probe information
Nf = length(f_names)
probe_names <- rep(NA, length=Nf)
probe_uniprot <- rep(NA, length=Nf)
probe_genes <- rep(NA, length=Nf)
names(probe_names) <- f_names
names(probe_uniprot) <- f_names
names(probe_genes) <- f_names
for (name in colnames(BDLdata)){
probe_info <- ProbeInformation(name)
if (!is.null(probe_info)){
if (!is.null(probe_info$Protein.name)){
probe_names[name] <- probe_info$Protein.name
probe_uniprot[name] <- probe_info$Entry
probe_genes[name] <- probe_info$Gene.name
}
}
}
probe.df <- data.frame(name=probe_names, genes=probe_genes, uniprot=probe_uniprot)
# sort
probe.df <- probe.df[order(rownames(probe.df)), ]
# remove the double Act probes
probe.df <- probe.df[!(rownames(probe.df) %in% c("Actb.x", "Actb.y")) , ]
return(probe.df)
}
# print probe information
probe.df <- create_probe_df()
options(width=200)
print(probe.df)
name genes uniprot
Abcb1a Multidrug resistance protein 1A Abcb1a, Abcb4, Mdr1a, Mdr3, Pgy-3, Pgy3 P21447
Abcc2 Canalicular multispecific organic a... Abcc2 Q8VI47
Abcg2 ATP-binding cassette sub-family G m... Abcg2, Abcp, Bcrp1 Q7TMS5
Acta2 Actin, aortic smooth muscle Acta2, Actsa, Actvs P62737
Actb Actin, cytoplasmic 1 Actb P60710
Ahr Aryl hydrocarbon receptor Ahr P30561
Bad Bcl2-associated agonist of cell dea... Bad, Bbc6 Q61337
Bak1 Bcl-2 homologous antagonist/killer Bak1, Bak O08734
Bax Apoptosis regulator BAX Bax Q07813
Bcl2l11 Bcl-2-like protein 11 Bcl2l11, Bim O54918
Birc5 Baculoviral IAP repeat-containing p... Birc5, Api4, Iap4 O70201
Ccl2 C-C motif chemokine 2 Ccl2, Je, Mcp1, Scya2 P10148
Ccl3 C-C motif chemokine 3 Ccl3, Mip1a, Scya3 P10855
Ccl4 C-C motif chemokine 4 Ccl4, Mip1b, Scya4 P14097
Ccl5 C-C motif chemokine 5 Ccl5, Scya5 P30882
Ccl7 C-C motif chemokine 7 Ccl7, Fic, Mcp3, Scya7 Q03366
Ccl8 C-C motif chemokine 8 Ccl8, Mcp2, Scya8 Q9Z121
Ccr2 C-C chemokine receptor type 2 Ccr2, Cmkbr2 P51683
Ccr3 Probable C-C chemokine receptor typ... Ccr3, Cmkbr1l2, Cmkbr3 P51678
Ccr5 C-C chemokine receptor type 5 Ccr5, Cmkbr5 P51682
Cd14 Monocyte differentiation antigen CD... Cd14 P10810
Cd69 Early activation antigen CD69 Cd69 P37217
Cd86 T-lymphocyte activation antigen CD8... Cd86 P42082
Cdh1 Cadherin-1 Cdh1 P09803
Cdh2 Cadherin-2 Cdh2 P15116
Cebpa CCAAT/enhancer-binding protein alph... Cebpa, Cebp P53566
Cebpb CCAAT/enhancer-binding protein beta Cebpb P28033
Cebpd CCAAT/enhancer-binding protein delt... Cebpd, Crp3 Q00322
Ch25h Cholesterol 25-hydroxylase Ch25h Q9Z0F5
Col1a1 Collagen alpha-1(I) chain Col1a1, Cola1 P11087
Col3a1 Collagen alpha-1(III) chain Col3a1 P08121
Col4a3 Collagen alpha-3(IV) chain Col4a3 Q9QZS0
Col6a6 Collagen alpha-6(VI) chain Col6a6 Q8C6K9
Col8a1 Collagen alpha-1(VIII) chain Col8a1 Q00780
Ctgf Connective tissue growth factor Ctgf, Ccn2, Fisp-12, Fisp12, Hcs24 P29268
Cxcl1 Growth-regulated alpha protein Cxcl1, Gro, Gro1, Mgsa, Scyb1 P12850
Cxcl15 C-X-C motif chemokine 15 Cxcl15, Scyb15 Q9WVL7
Cxcl2 C-X-C motif chemokine 2 Cxcl2, Mip-2, Mip2, Scyb2 P10889
Cxcl3 C-X-C motif chemokine 3 Cxcl3, Dcip1, Gm1960 Q6W5C0
Cxcl5 C-X-C motif chemokine 5 Cxcl5, Scyb5 P50228
Cxcr1 C-X-C chemokine receptor type 1 Cxcr1, Il8ra Q810W6
Cxcr2 C-X-C chemokine receptor type 2 Cxcr2, Cmkar2, Gpcr16, Il8rb P35343
Cyp1a2 Cytochrome P450 1A2 Cyp1a2, Cyp1a-2 P00186
Cyp24a1 1,25-dihydroxyvitamin D(3) 24-hydro... Cyp24a1, Cyp-24, Cyp24 Q64441
Cyp2b10 Cytochrome P450 2B10 Cyp2b10, Cyp2b-10, Cyp2b20 P12791
Cyp2c29 Cytochrome P450 2C29 Cyp2c29 Q64458
Cyp2c37 Cytochrome P450 2C37 Cyp2c37 P56654
Cyp2c39 Cytochrome P450 2C39 Cyp2c39 P56656
Cyp2d22
Cyp2e1 Cytochrome P450 2E1 Cyp2e1, Cyp2e, Cyp2e-1 Q05421
Cyp3a11 Cytochrome P450 3A11 Cyp3a11, Cyp3a-11 Q64459
Cyp4a10 Cytochrome P450 4A10 Cyp4a10, Cyp4a-10 O88833
Cyp7a1 Cholesterol 7-alpha-monooxygenase Cyp7a1, Cyp7 Q64505
Dpyd Dihydropyrimidine dehydrogenase [NA... Dpyd Q8CHR6
Edn1 Endothelin-1 Edn1 P22387
Egf Pro-epidermal growth factor Egf P01132
Egfr Epidermal growth factor receptor Egfr Q01279
Fasl Tumor necrosis factor ligand superf... Faslg, Apt1lg1, Cd95l, Fasl, gld, Tnfsf6 P41047
Fn1 Fibronectin Fn1 P11276
Gdf2 Growth/differentiation factor 2 Gdf2, Bmp9 Q9WV56
Gsta2 Glutathione S-transferase A2 Gsta2 P10648
Gstm1 Glutathione S-transferase Mu 1 Gstm1 P10649
Gstp1 Glutathione S-transferase P 1 Gstp1, Gstpib P19157
Hgf Hepatocyte growth factor receptor Met P16056
Hk2 Hexokinase-2 Hk2 O08528
Hmox1 Heme oxygenase 1 Hmox1 P14901
Hnf4a Hepatocyte nuclear factor 4-alpha Hnf4a, Hnf-4, Hnf4, Nr2a1, Tcf14 P49698
Ifna1 Interferon alpha-1 Ifna1 P01572
Ifnar1 Interferon alpha/beta receptor 1 Ifnar1, Ifar, Ifnar P33896
Ifnar2 Interferon alpha/beta receptor 2 Ifnar2 O35664
Ifnb1 Interferon beta Ifnb1, Ifb, Ifnb P01575
Ifng Interferon gamma Ifng P01580
Igf1 Insulin-like growth factor I Igf1, Igf-1 P05017
Il10 Interleukin-10 Il10, Il-10 P18893
Il10ra Interleukin-10 receptor subunit alp... Il10ra, Il10r Q61727
Il10rb Interleukin-10 receptor subunit bet... Il10rb, Crfb4 Q61190
Il13 Interleukin-13 Il13, Il-13 P20109
Il17a Interleukin-17A Il17a, Ctla8, Il17 Q62386
Il1b Interleukin-1 beta Il1b P10749
Il1rn Interleukin-1 receptor antagonist p... Il1rn, Il-1ra P25085
Il2 Interleukin-2 Il2, Il-2 P04351
Il28b Interferon lambda-3 Ifnl3, Il28, Il28b Q8CGK6
Il4 Interleukin-4 Il4, Il-4 P07750
Il6 Interleukin-6 Il6, Il-6 P08505
Il6st Interleukin-6 receptor subunit beta Il6st Q00560
Lama1 Laminin subunit alpha-1 Lama1, Lama, Lama-1 P19137
Met Hepatocyte growth factor receptor Met P16056
Mki67 MKI67 FHA domain-interacting nucleo... Nifk, Mki67ip Q91VE6
Mrc1 Macrophage mannose receptor 1 Mrc1 Q61830
Nes Nestin Nes Q6P5H2
Nfkbia NF-kappa-B inhibitor alpha Nfkbia, Ikba Q9Z1E3
Nos2 Nitric oxide synthase, inducible Nos2, Inosl P29477
Notch1 Neurogenic locus notch homolog prot... Notch1, Motch Q01705
Notch3 Neurogenic locus notch homolog prot... Notch3 Q61982
Nr0b2 Nuclear receptor subfamily 0 group ... Nr0b2, Shp Q62227
Nr1h3 Oxysterols receptor LXR-alpha Nr1h3, Lxra Q9Z0Y9
Nr1i2 Nuclear receptor subfamily 1 group ... Nr1i2, Pxr O54915
Nr1i3 Nuclear receptor subfamily 1 group ... Nr1i3, Car O35627
Nr2f1 COUP transcription factor 1 Nr2f1, Erbal3, Tfcoup1 Q60632
Nr2f2 COUP transcription factor 2 Nr2f2, Aporp1, Arp-1, Arp1, Tfcoup2 P43135
Nr3c1 Glucocorticoid receptor Nr3c1, Grl, Grl1 P06537
Osm Oncostatin-M Osm P53347
Osmr Oncostatin-M-specific receptor subu... Osmr, Osmrb O70458
Pde4a cAMP-specific 3',5'-cyclic phosphod... Pde4a O89084
Pde4b cAMP-specific 3',5'-cyclic phosphod... Pde4a O89084
Pde4d cAMP-specific 3',5'-cyclic phosphod... Pde4d Q01063
Pdgfb Platelet-derived growth factor subu... Pdgfb, Sis P31240
Por NADPH--cytochrome P450 reductase Por P37040
Ppara Peroxisome proliferator-activated r... Ppara, Nr1c1, Ppar P23204
Pparg Peroxisome proliferator-activated r... Pparg, Nr1c3 P37238
Prom1 Prominin-1 Prom1, Prom, Proml1 O54990
Pten Phosphatidylinositol 3,4,5-trisphos... Pten, Mmac1 O08586
Ptgs2 Prostaglandin G/H synthase 2 Ptgs2, Cox-2, Cox2, Pghs-b, Tis10 Q05769
Rarres1 Cytosolic carboxypeptidase 2 Agbl2, Ccp2 Q8CDK2
Rps18 40S ribosomal protein S18 Rps18 P62270
Rxra Retinoic acid receptor RXR-alpha Rxra, Nr2b1 P28700
Slc10a1 Sodium/bile acid cotransporter Slc10a1, Ntcp O08705
Smad6 Mothers against decapentaplegic hom... Smad6, Madh6, Madh7, Msmad6 O35182
Smad7 Mothers against decapentaplegic hom... Smad7, Madh7, Madh8 O35253
Socs1 Suppressor of cytokine signaling 1 Socs1, Cish1, Ssi1 O35716
Socs3 Suppressor of cytokine signaling 3 Socs3, Cis3, Cish3 O35718
Sod2 Superoxide dismutase [Mn], mitochon... Sod2, Sod-2 P09671
Sparc SPARC Sparc P07214
Sult1a1 Sulfotransferase 1A1 Sult1a1, St1a1, Stp, Stp1 P52840
Sult1b1 Sulfotransferase family cytosolic 1... Sult1b1 Q9QWG7
Tgfb1 Transforming growth factor beta-1 Tgfb1 P04202
Tgfb2 Transforming growth factor beta-2 Tgfb2 P27090
Tgfbr2 TGF-beta receptor type-2 Tgfbr2 Q62312
Timp1 Metalloproteinase inhibitor 1 Timp1, Timp, Timp-1 P12032
Timp2 Metalloproteinase inhibitor 2 Timp2, Timp-2 P25785
Tnc Tenascin Tnc, Hxb Q80YX1
Tnf Tumor necrosis factor Tnf, Tnfa, Tnfsf2 P06804
Tnfrsf1a Tumor necrosis factor receptor supe... Tnfrsf1a, Tnfr-1, Tnfr1 P25118
Tnfrsf1b Tumor necrosis factor receptor supe... Tnfrsf1b, Tnfr-2, Tnfr2 P25119
Ugt1a1 UDP-glucuronosyltransferase 1-1 Ugt1a1, Ugt1 Q63886
Vdr Vitamin D3 receptor Vdr, Nr1i1 P48281
Wisp1 WNT1-inducible-signaling pathway pr... Wisp1, Ccn4, Elm1 O54775
Xiap E3 ubiquitin-protein ligase XIAP Xiap, Aipa, Api3, Birc4, Miha Q60989
options(width=75)
Create heatmap of the mean chip data
suppressPackageStartupMessages(library(gplots))
suppressPackageStartupMessages(library(RColorBrewer))
data_ADME = t(as.matrix(BDLmean[BDLfactors$ftype == "GE_ADME"]))
data_ADME <- data_ADME[rownames(data_ADME) != "Actb.x", ]
data_Cytokines = t(as.matrix(BDLmean[BDLfactors$ftype == "GE_Cytokines"]))
data_Cytokines <- data_Cytokines[rownames(data_Cytokines) != "Actb.y", ]
data_Fibrosis = t(as.matrix(BDLmean[BDLfactors$ftype == "GE_Fibrosis"]))
data_Fibrosis <- data_Fibrosis[rownames(data_Fibrosis) != "Actb", ]
# fold changes relative to sham mice (0 h) displayed in log2 scale
log2_to_control <- function(data){
data <- data[sort(rownames(data)), ] # sort by name
reldata <- data[, 2:ncol(data)]/data[, 1]
log2_reldata <- log2(reldata)
}
data_list <- list(ADME=data_ADME,
Fibrosis=data_Fibrosis,
Cytokines=data_Cytokines)
for (name in names(data_list)){
data <- data_list[[name]]
cor_colors <- HeatmapColors() # color palette for correlation (red - white - blue)
hmap_colors <- cor_colors(100)
# plot to file
pdf(file.path(resultsPath, "control", paste("chip_heatmap_", name, ".pdf", sep="")),
width=8, height=10, pointsize=12)
heatmap.2(log2_to_control(data), col=hmap_colors, breaks=seq(from=-2, to=2, length.out=length(hmap_colors)+1),
scale="none",
key=TRUE, symkey=FALSE, key.xlab = "log2",
trace="none", cexRow=0.8, cexCol=0.8,
symm=FALSE,
density.info="none", dendrogram="none",
Rowv=NULL, Colv=NULL,
keysize=0.8,
sepwidth=c(0.01,0.01),
sepcolor="white")
invisible(dev.off())
}
To get an overview over the BDL data set plots of the raw and mean data for all individual factors over time were generated. These are available in the resultsPath/factors
folder under the respective names of the factors
# Create figures for all factors
factors_path <- file.path(resultsPath, 'factors')
dir.create(factors_path, showWarnings=FALSE)
plot_all_factors(path=factors_path)
rm(factors_path)
An example plot for a single factor is depicted for bilirubin
. A continous increase in bilirubin can be observed after BDL.
plot_single_factor('bilirubin', path=NULL)
Next the heatmap overview of the complete data set is generated, i.e. the data of all factors for of all time points and repeats. This provides a first overview over the complete data set.
# define horizontal and vertical helper lines
v_lines <- ((1:Nt)*Nr+0.5)
factor_types <- c("Histochemistry", "Biochemistry",
"GE_Fibrosis", "GE_Cytokines", "GE_ADME")
factor_table <- table(BDLfactors$ftype)
h_lines <- 0.5 + cumsum(factor_table[factor_types])
timecourse_heatmap <- function(){
# create better row names
dtmp <- BDLdata
rownames(dtmp) <- paste(rownames(BDLsamples), BDLsamples$time_fac, sep=" ")
# heatmap colors
hmap_colors <- HeatmapColors()
# colors for factor groups
colorset <- brewer.pal(length(factor_types), "Set2")
color.map <- function(factor_id) {
return(
colorset[which(factor_types==BDLfactors$ftype[which(BDLfactors$id==factor_id)])]
)
}
factorColors <- unlist(lapply(BDLfactors$id, color.map))
# heatmap
heatmap.2(t(as.matrix(dtmp)), col=hmap_colors(100), scale="row",
dendrogram="none", Rowv=NULL, Colv=NULL,
key=TRUE, trace="none", cexRow=0.5, keysize=0.8, density.info="none",
RowSideColors=factorColors,
add.expr=abline(v=v_lines, h=h_lines, col="black", lwd=0.5),
main="Heatmap of BDL time course data")
# xlab="sample", ylab="factor")
# legend
legend("left",
inset=c(-0.03,0),
legend = rev(factor_types), # category labels
col = rev(colorset), # color key
lty= 1, lwd = 10, cex = 0.7, bty="n"
)
}
# plot to file
pdf(file.path(resultsPath, "control", "timecourse_heatmap.pdf"),
width=10, height=10, pointsize=12)
timecourse_heatmap()
invisible(dev.off())
timecourse_heatmap()
Results: Various patterns are visible in the plotted raw data set:
Actb (Actin, cytoplasmic 1) probes were included on all Fluidigm chips (GE_ADME
, GE_Cytokines
, GE_Fibrosis
) and not used in the normalization of the transcription data. Hence, ActB can serve as quality control for the technical reproducibility of the Fluidigm chips. If the data is reproducible between chips the pairwise correlation between all individual Actb measurements should have high correlation coefficients close to 1. Plotting the data of the Actb measurements of two chips against each other should lie on a straight line.
# Actb control figure
plot_actb_control <- function(){
par(mfrow=c(2,3))
plot_single("Actb")
plot_single("Actb.x")
plot_single("Actb.y")
plot_cor_pair("Actb", "Actb.x", single_plots=FALSE)
plot_cor_pair("Actb", "Actb.y", single_plots=FALSE)
plot_cor_pair("Actb.x", "Actb.y", single_plots=FALSE)
par(mfrow=c(1,1))
}
# plot to file
pdf(file.path(resultsPath, "control", "Actb_control.pdf"),
width=10, height=6, pointsize=12)
plot_actb_control()
invisible(dev.off())
# calculate Spearman and Pearson correlation coefficients on N=8*5=40 data points
actb.spearman <- cor(data.frame(Actb=BDLdata$Actb,
Actb.x=BDLdata$Actb.x,
Actb.y=BDLdata$Actb.y), method="spearman")
actb.pearson <- cor(data.frame(Actb=BDLdata$Actb,
Actb.x=BDLdata$Actb.x,
Actb.y=BDLdata$Actb.y), method="pearson")
# table of correlation coefficients
set.caption(sub(".", " ", "Spearman correlation of Actb controls", fixed = TRUE))
pander(round(actb.spearman, digits=3))
Actb | Actb.x | Actb.y | |
---|---|---|---|
Actb | 1 | 0.908 | 0.944 |
Actb.x | 0.908 | 1 | 0.917 |
Actb.y | 0.944 | 0.917 | 1 |
set.caption(sub(".", " ", "Pearson correlation of Actb controls", fixed = TRUE))
pander(round(actb.pearson, digits=3))
Actb | Actb.x | Actb.y | |
---|---|---|---|
Actb | 1 | 0.945 | 0.938 |
Actb.x | 0.945 | 1 | 0.935 |
Actb.y | 0.938 | 0.935 | 1 |
plot_actb_control()