Vesalius: Spatial Transcriptomics
Patrick C.N. Martin
Source:vignettes/Vesalius_Analysis/Spatial_RNA_seq.Rmd
Spatial_RNA_seq.Rmd
Forward
The following notebook highlights the analysis shown in the Vesalius 2.0 manuscript. More specifcially, we show the analysis for Spatial Transcriptomic Slide-SeqV2 in mouse brain and mouse embryo.
The aim of this notebook is two-fold:
- Provide a clear demonstration of how the figures in the manuscript were produced.
- A contextual vignette that demonstrates how one could use Vesalius
You will find through out this article subsection with the prefix “See also”. These sections will highlight link towards documentation that will enable you to go beyond what we did with vesalius for this analysis.
You will also find a Code Block section that contains the entire analysis in a single block of code to facilitate copy/pasting.
Introduction
Vesalius
Vesalius is a tool to decipher tissue anatomy and spatial domains from spatial omics data. To achieve this, Vesalius converts latent space embeddings into images upon which image processing techniques can be applied. In contrast to many other tools, Vesalius does not create a spatially aware latent space but utilises images that represent latent space to isolate tissue domains.
Slide-seq V2
Slide-seq version 2 is a high resolution spatial trancriptomics data that uses spatially indexed beads that hybridise with mRNA species prensent in the tissue. As such, there are a few aspect to consider when using this type of data:
- Spatially Indexed beads will not necessarily line up with the cells from the tissue. As a consequence, each bead can recover mRNA transcript from multiple cells. Conversly, in the case of large cells, transcripts might be spread over multiple spatial indices.
- Slide-seq provides an unbiased representation of the transcriptomic landscape that is limited by its sensitvity to mRNA transcript binding.
Aims
Vesalius aims to recover the finer details of spatial patterning by providing uniform tissue territories.
Uniform territories enable:
- In depth analysis of spatial patterning.
- Differential gene expression analysis between territories but also cell types present in different territories.
- Tissue border expression.
- Morphology dependant expression.
Spatial transcriptomic analysis
Before starting, make sure you have vesalius installed and that you have your data ready and downloaded.
To facilitate the analysis process, we use input/output folder to
store and save data. For visualization purposes, we also load a few
extra packages. We also load the future
package to limit
the number of cores being used.
# Load libraries and set seed
library(ggplot2)
library(patchwork)
library(ggpubr)
library(future)
library(Matrix)
library(tidyverse)
set.seed(1514)
library(vesalius, lib.loc = "/common/martinp4/R")
library(pwr, lib.loc = "/common/martinp4/R")
# Using Multi core !!! See note below !!!
plan(multicore, workers = 10)
# Input data directory Brain
spatial_coordinates_brain <- "/common/wonklab/SSv2/Puck_200115_08_bead_locations.csv"
counts_brain <- "/common/wonklab/SSv2/Puck_200115_08.digital_expression.txt.gz"
# Input data directory Embryo
spatial_coordinates_embryo <- "/common/wonklab/SSv2/Puck_190926_03_bead_locations.csv"
counts_embryo <- "/common/wonklab/SSv2/Puck_190926_03.digital_expression.txt.gz"
# Output directory
if (!dir.exists("/common/martinp4/output_plots/")) {
dir.create("/common/martinp4/output_plots/")
}
output_plots <- "/common/martinp4/output_plots/"
if (!dir.exists("/common/martinp4/output_data/")) {
dir.create("/common/martinp4/output_data/")
}
output_data <- "/common/martinp4/output_data/"
NOTE Here, we use multicore processing from the future and future.apply packages.
To use a single core, use plan(sequential)
If you are using a windows machine, multicore is not supported.
Instead you can use plan(multisession, workers = 10)
Loading data and building objects
The first step is to load the spatial transcriptomic data. Generally, spatial data comes in two seperate files.
- Spatial coordinates: This contains the x, y (even z) coordinates of each spatially indexed barcodes.
- Count matrix: This contains the number of gene counts at every spatial index.
Reading in spatial coordinates.
spatial_coordinates_brain <- read.csv(spatial_coordinates_brain,
header = TRUE,
skip = 1)
colnames(spatial_coordinates_brain) <- c("barcodes", "x", "y")
spatial_coordinates_embryo <- read.csv(spatial_coordinates_embryo,
header = TRUE)
colnames(spatial_coordinates_embryo) <- c("barcodes", "x", "y")
str(spatial_coordinates_brain)
str(spatial_coordinates_embryo)
Note the use of skip = 1
. This may not be required
depending on how your data is formatted.
Reading in count data
counts_brain <- read.table(counts_brain, header = TRUE, row.names = 1)
counts_embryo <- read.table(counts_embryo, header = TRUE, row.names = 1)
counts_brain[1:5, 1:5]
Building a Vesalius Assay
The first step in the Vesalius Pipeline is to build a vesalius_assay object. This object is used throughout the vesalius workflow and stores data after every round of computation.
To build a vesalius_assay
:
vesalius_brain <- build_vesalius_assay(coordinates = spatial_coordinates_brain,
counts = counts_brain,
assay = "Spatial_transcriptomics_brain",
verbose = FALSE)
vesalius_embryo <- build_vesalius_assay(coordinates = spatial_coordinates_embryo,
counts = counts_embryo,
assay = "Spatial_transcriptomics_embryo",
verbose = FALSE)
vesalius_brain
vesalius_embryo
See also - Vesalius assays
vesalius_assay
objects require as minimal input the
spatial coordinates.
- To add custom counts see add_counts.
Generating embeddings
The next step is to generate pixels tiles from spatial coordinates and latent space embedding that will be used to populate each tile with a grey scale color value.
vesalius_brain <- generate_embeddings(vesalius_brain,
dim_reduction = "PCA",
dimensions = 50,
tensor_resolution = 1,
verbose = FALSE)
vesalius_embryo <- generate_embeddings(vesalius_embryo,
dim_reduction = "PCA",
dimensions = 50,
tensor_resolution = 0.3,
verbose = FALSE)
vesalius_brain
vesalius_embryo
Here, we will use Principal Component Analysis (PCA) to reduce data dimensionality and produce our latent space. We will also reduce the dimension of the image tensor. Effectively, we are reducing the size of the image by combining neighboring spatial indices if they fall within the same local area. Reducing the image resolution ensures lower run times.
In addition to merging spatial indices, we adjust the count matrix by taking the average count values for all spatial indices that have been merged together. Note that the spatial index names become a combination of the original spatial indices.
For example:
See also - Generating embeddings
The generate_embeddings
function generates tiles from
spatial indices, pre-processes the count matrix (Finding variable
features and normalisation) , and generates latent space embeddings used
in the vesalius image stack.
- To add custom counts see add_counts. For more information, refer to (modifying objects vignette)[].
- To add custom embeddings see add_embeddings. For more information, refer to (modifying objects vignette)[].
- To only generate tiles seegenerate_tiles
- To run multiple embeddings, see user guide
Image processing
The core concept of Vesalius is to provide spatial domains by utilizing image processing techniques. In contrast to other methods, Vesalius does not provide spatially aware latent space by default. It utilises Principal Component Analsyis (as well as UMAP or SVD/LSI) to generate a reduced dimesnionaility space.
We find that Principal Components (PC) contain subtle spatial information already embedded within them. We can viszualise this by looking at the grey scale embeddings for each PC.
pca_embedding_brain <- vector("list", 30)
pca_embedding_embryo <- vector("list", 30)
for (i in seq(1, 30)){
pca_embedding_brain[[i]] <- image_plot(vesalius_brain,
embedding = "last",
dimensions = i)
pca_embedding_embryo[[i]] <- image_plot(vesalius_embryo,
embedding = "last",
dimensions = i)
}
For the mouse brain:
For the mouse embryo:
We can see that many of these PCs contain spatially relevant information even though the latent space was not produced in an explitely spatially aware manner. By using image processing, we can further highlight these regions and better utilise them during spatial domain isolation.
vesalius_brain <- regularise_image(vesalius_brain,
dimensions = seq(1, 50),
verbose = FALSE)
vesalius_brain <- equalize_image(vesalius_brain,
dimensions = seq(1, 50),
sleft = 5, sright = 5,
verbose = FALSE)
vesalius_brain <- smooth_image(vesalius_brain,
dimensions = seq(1, 50),
method = c("iso", "box"),
sigma = 2,
box = 10,
iter = 10,
verbose = FALSE)
vesalius_embryo <- regularise_image(vesalius_embryo,
embedding = "PCA",
dimensions = seq(1, 50),
lambda = 10,
verbose = FALSE)
vesalius_embryo <- equalize_image(vesalius_embryo,
dimensions = seq(1, 50),
sleft = 1, sright = 1,
verbose = FALSE)
vesalius_embryo <- smooth_image(vesalius_embryo,
dimensions = seq(1, 50),
method = c("iso", "box"),
sigma = 1,
box = 10,
across_levels = "min",
iter = 10,
verbose = FALSE)
We can viszualise each PC in the image tensor after image processing.
pca_embedding_brain <- vector("list", 30)
pca_embedding_embryo <- vector("list", 30)
for (i in seq(1, 30)){
pca_embedding_brain[[i]] <- image_plot(vesalius_brain,
embedding = "last",
dimensions = i)
pca_embedding_embryo[[i]] <- image_plot(vesalius_embryo,
embedding = "last",
dimensions = i)
}
For example, in the mous brain, we can cleary see that PC17 contains information related to the CA2 field - a field that is often lost during standard analysis.
For the mouse embryo:
In vesalius, each PC is handled independantly in the form of a grey scale image. We can use this image stack to generate isolated spatial domains and recover subtle expression patterns. The selection of image processing parameters is dependant on the data set used and on the biological question at hand. You might want to have more or less territory granularity. However, the ability to visualize PCs greatly aids is selecting the parameters that fits your needs.
See also - image processing
Image prcessing is iteratively applied to the active embedding.
- Understanding active embeddings in vesalius see the user guide
- Selecting different PCs see the user guide
Image segmentation & Territory Isolation
The isolation of territories is achieved by a combination of image segmentation and 2D isolation of image segments. By default, we use kmeans for image segmentation. This produces a series of image segments that can then be further isolated if they are sperated in 2D space.
vesalius_brain <- segment_image(vesalius_brain,
dimensions = seq(1, 30),
method = "kmeans",
col_resolution = 25,
verbose = FALSE)
vesalius_brain <- isolate_territories(vesalius_brain, verbose = FALSE)
vesalius_embryo <- segment_image(vesalius_embryo,
dimensions = seq(1, 30),
method = "kmeans",
col_resolution = 32,
verbose = FALSE)
vesalius_embryo <- isolate_territories(vesalius_embryo, verbose = FALSE)
In this instance, we elected to use all images in the image stack
(i.e dimensions = seq(1,30)
). In the context of vesalius,
the col_resolution
argument represents the number of
clusters to parse to kmeans
however this does not define
the final number of territories. These segments will be subdivided into
small segments based on the distribution in space.
See also - Image segmentation & Territory Isolation
By default, vesalius uses kmeans clustering as it provides a simple and effective way to segment images.
- Using louvain based segmentation see the image_segmentation function
- Using leiden based segmentation see the image_segmentation function
Visualization of territories
As mentionned above we distinguish image segements and territories. We can vizualise these differences with the vesalius plotting functions.
brain_segments <- territory_plot(vesalius_brain,
trial = "Segment",
cex_pt = 0.25) +
labs(title = "Brain Segments")
brain_territories <- territory_plot(vesalius_brain,
trial = "Territory",
cex_pt = 0.25) +
labs(title = "Brain Territories")
embryo_segments <- territory_plot(vesalius_embryo,
trial = "Segment",
cex_pt = 0.25) +
labs(title = "Embryo Segments")
embryo_territories <- territory_plot(vesalius_embryo,
trial = "Territory",
cex_pt = 0.25) +
labs(title = "Embryo Territories")
all <- (brain_segments + brain_territories) /
(embryo_segments + embryo_territories)
print(all)
We can see how image segments and territories differ when compareing
their plots. While segments are defined by the value parsed to
col_resolution
during image segmentation, the final number
of territories will depend on how the images were processed, the value
parsed to col_resolution
and the capture radius
defined in isolate_territories.
In short, this parameters defines the maximum distance between color
segments for them to be pooled into the same territory.
Differential Gene Expression
The advantage of uniform territories is that it enables the comparison of spatial territories rather than comparing cell clusters.
We can start by getting all genes that are differentially expressed in each territory and export the results.
vesalius_brain <- identify_markers(vesalius_brain,
trial = "Territory", # Using PCA territories here
method = "DESeq2",
seed = NULL,
query = NULL
)
vesalius_embryo <- identify_markers(vesalius_embryo,
trial = "Territory", # Using PCA territories here
method = "DESeq2",
seed = NULL,
query = NULL
)
deg_brain <- get_markers(vesalius_brain)
write.csv(deg_brain, file = paste0(output_data, "DEG_SSv2_brain.csv"))
deg_embryo <- get_markers(vesalius_embryo)
write.csv(deg_embryo, file = paste0(output_data, "DEG_SSv2_embryo.csv"))
head(deg_brain)
Using these tables, we can ivestigate differntial gene expression and
find which genes are more highly expressed in a given territory comapred
to everything else. If you want to only look at one or a subset of
territories, you can specifiy your territories in the seed
argument.
vesalius_brain <- identify_markers(vesalius_brain,
trial = "Territory", # Using PCA territories here
method = "DESeq2",
seed = c(1, 2, 3),
query = NULL)
We can also use this option to compare territories between each other. If there are many territories it can be easier to visualize each territory seperately.
brain_split <- territory_plot(vesalius_brain,
trial = "Territory",
split = TRUE,
randomise = FALSE # we don't reallyt need to randomise colors here
)
brain_split
In this case, we can compare territory 18 and territory 3 (medial habenula compartments).
vesalius_brain <- identify_markers(vesalius_brain,
trial = "Territory",
method = "DESeq2",
seed = 18,
query = 3)
med_comp <- get_markers(vesalius_brain)
head(med_comp)
See also - using different DEG methods
The indentify_markers
function offers different
comparison options.
- Using different DEG methods - see (identify_markers function)[https://patrickcnmartin.github.io/Vesalius/reference/identify_markers]
- Extracting markers from object - see (get_markers function)[https://patrickcnmartin.github.io/Vesalius/reference/get_markers]
- **Comparing cells between territories - see (User guide)[https://patrickcnmartin.github.io/Vesalius/articles/vesalius.html]
Visualization of Gene expresssion
The final step is to visualise gene expression patterns.
For Pcp4 in the mouse hippocampus:
pcp4 <- view_gene_expression(vesalius_brain,
genes = "Pcp4",
trial = "Territory",
cex = 4)
pcp4_layer <- view_gene_expression(vesalius_brain,
genes = "Pcp4",
trial = "Territory",
as_layer = TRUE,
cex = 4)
pcp4 + pcp4_layer + plot_annotation(tag_levels = "A")
For Tac1 in the mouse hippocampus:
Tac1 <- view_gene_expression(vesalius_brain,
genes = "Tac1",
trial = "Territory",
cex = 4)
Tac1_layer <- view_gene_expression(vesalius_brain,
genes = "Tac1",
trial = "Territory",
as_layer = TRUE,
cex = 4)
Tac1 + Tac1_layer + plot_annotation(tag_levels = "A")
For Pmel in the mouse embryo:
Pmel <- view_gene_expression(vesalius_embryo,
genes = "Pmel",
trial = "Territory",
cex = 4)
Pmel_layer <- view_gene_expression(vesalius_embryo,
genes = "Pmel",
trial = "Territory",
as_layer = TRUE,
cex = 4)
Pmel + Pmel_layer + plot_annotation(tag_levels = "A")
We can visualise gene expression as an overall pattern (“A” plots) or an averge expression in each territory (“B” plots). It is important to use “B” plots with caution. While these plots can help to clarify the expression distribution across territories, they represent a discretised view of expression patterns. As such many transcriptional dynamics such as gradients or overall low expression can altered.
See also - Visualization of multiple genes
You can also parse a vector of genes to the genes
argument. This will return a ggarrange
object that can
directly be plotted.
- Visualization of multiple genes - see view_gene_expression function
- Understaing ggarrange - see ggpubr page
Conclusion
Here, we present a Vesalius workflow using 2 high resolution spatial transcriptomic data sets. This pipeline is applicabled to any spatial transcriptomic data set. However, image processing parameteres should be set according to your needs and the data set!
Session Info
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 desc_1.4.3 R6_2.5.1 fastmap_1.2.0
## [5] xfun_0.50 cachem_1.1.0 knitr_1.49 htmltools_0.5.8.1
## [9] rmarkdown_2.29 lifecycle_1.0.4 cli_3.6.3 sass_0.4.9
## [13] pkgdown_2.1.1 textshaping_0.4.1 jquerylib_0.1.4 systemfonts_1.2.0
## [17] compiler_4.4.2 tools_4.4.2 ragg_1.3.3 bslib_0.8.0
## [21] evaluate_1.0.3 yaml_2.3.10 jsonlite_1.8.9 rlang_1.1.4
## [25] fs_1.6.5 htmlwidgets_1.6.4