Vesalius: Spatial Transcriptomics

Forward

The following notebook highlights the analysis shown in the Vesalius 2.0 manuscript. More specifcially, we show the analysis for Spatial Transcriptomic Slide-SeqV2 in mouse brain and mouse embryo.

The aim of this notebook is two-fold:

Provide a clear demonstration of how the figures in the manuscript were produced.
A contextual vignette that demonstrates how one could use Vesalius

You will find through out this article subsection with the prefix “See also”. These sections will highlight link towards documentation that will enable you to go beyond what we did with vesalius for this analysis.

You will also find a Code Block section that contains the entire analysis in a single block of code to facilitate copy/pasting.

Introduction

Vesalius

Vesalius is a tool to decipher tissue anatomy and spatial domains from spatial omics data. To achieve this, Vesalius converts latent space embeddings into images upon which image processing techniques can be applied. In contrast to many other tools, Vesalius does not create a spatially aware latent space but utilises images that represent latent space to isolate tissue domains.

Slide-seq V2

Slide-seq version 2 is a high resolution spatial trancriptomics data that uses spatially indexed beads that hybridise with mRNA species prensent in the tissue. As such, there are a few aspect to consider when using this type of data:

Spatially Indexed beads will not necessarily line up with the cells from the tissue. As a consequence, each bead can recover mRNA transcript from multiple cells. Conversly, in the case of large cells, transcripts might be spread over multiple spatial indices.
Slide-seq provides an unbiased representation of the transcriptomic landscape that is limited by its sensitvity to mRNA transcript binding.

Aims

Vesalius aims to recover the finer details of spatial patterning by providing uniform tissue territories.

Uniform territories enable:

In depth analysis of spatial patterning.
Differential gene expression analysis between territories but also cell types present in different territories.
Tissue border expression.
Morphology dependant expression.

Spatial transcriptomic analysis

Before starting, make sure you have vesalius installed and that you have your data ready and downloaded.

To facilitate the analysis process, we use input/output folder to store and save data. For visualization purposes, we also load a few extra packages. We also load the future package to limit the number of cores being used.

# Load libraries and set seed
library(ggplot2)
library(patchwork)
library(ggpubr)
library(future)
library(Matrix)
library(tidyverse)
set.seed(1514)
library(vesalius, lib.loc = "/common/martinp4/R")
library(pwr, lib.loc = "/common/martinp4/R")


# Using Multi core !!! See note below !!!
plan(multicore, workers = 10)


# Input data directory Brain
spatial_coordinates_brain <- "/common/wonklab/SSv2/Puck_200115_08_bead_locations.csv"
counts_brain <- "/common/wonklab/SSv2/Puck_200115_08.digital_expression.txt.gz"

# Input data directory Embryo
spatial_coordinates_embryo <- "/common/wonklab/SSv2/Puck_190926_03_bead_locations.csv"
counts_embryo <- "/common/wonklab/SSv2/Puck_190926_03.digital_expression.txt.gz"

# Output directory
if (!dir.exists("/common/martinp4/output_plots/")) {
    dir.create("/common/martinp4/output_plots/")
}
output_plots <- "/common/martinp4/output_plots/"

if (!dir.exists("/common/martinp4/output_data/")) {
    dir.create("/common/martinp4/output_data/")
}
output_data <- "/common/martinp4/output_data/"

NOTE Here, we use multicore processing from the future and future.apply packages.

To use a single core, use plan(sequential)

If you are using a windows machine, multicore is not supported. Instead you can use plan(multisession, workers = 10)

Loading data and building objects

The first step is to load the spatial transcriptomic data. Generally, spatial data comes in two seperate files.

Spatial coordinates: This contains the x, y (even z) coordinates of each spatially indexed barcodes.
Count matrix: This contains the number of gene counts at every spatial index.

Reading in spatial coordinates.

spatial_coordinates_brain <- read.csv(spatial_coordinates_brain,
  header = TRUE,
  skip = 1)
colnames(spatial_coordinates_brain) <- c("barcodes", "x", "y")

spatial_coordinates_embryo <- read.csv(spatial_coordinates_embryo,
  header = TRUE)
colnames(spatial_coordinates_embryo) <- c("barcodes", "x", "y")

str(spatial_coordinates_brain)
str(spatial_coordinates_embryo)

Note the use of skip = 1. This may not be required depending on how your data is formatted.

Reading in count data

counts_brain <- read.table(counts_brain, header = TRUE, row.names = 1)
counts_embryo <- read.table(counts_embryo, header = TRUE, row.names = 1)
counts_brain[1:5, 1:5]

Building a Vesalius Assay

The first step in the Vesalius Pipeline is to build a vesalius_assay object. This object is used throughout the vesalius workflow and stores data after every round of computation.

To build a vesalius_assay :

vesalius_brain <- build_vesalius_assay(coordinates = spatial_coordinates_brain,
  counts = counts_brain,
  assay = "Spatial_transcriptomics_brain",
  verbose = FALSE)

vesalius_embryo <- build_vesalius_assay(coordinates = spatial_coordinates_embryo,
  counts = counts_embryo,
  assay = "Spatial_transcriptomics_embryo",
  verbose = FALSE)

vesalius_brain
vesalius_embryo

Generating embeddings

The next step is to generate pixels tiles from spatial coordinates and latent space embedding that will be used to populate each tile with a grey scale color value.

vesalius_brain <- generate_embeddings(vesalius_brain,
    dim_reduction = "PCA",
    dimensions = 50,
    tensor_resolution = 1,
    verbose = FALSE)
vesalius_embryo <- generate_embeddings(vesalius_embryo,
    dim_reduction = "PCA",
    dimensions = 50,
    tensor_resolution = 0.3,
    verbose = FALSE)
vesalius_brain
vesalius_embryo

Here, we will use Principal Component Analysis (PCA) to reduce data dimensionality and produce our latent space. We will also reduce the dimension of the image tensor. Effectively, we are reducing the size of the image by combining neighboring spatial indices if they fall within the same local area. Reducing the image resolution ensures lower run times.

In addition to merging spatial indices, we adjust the count matrix by taking the average count values for all spatial indices that have been merged together. Note that the spatial index names become a combination of the original spatial indices.

For example:

head(grep(pattern = "_et_",
  x = unique(get_tiles(vesalius_brain)$barcodes),
  value = TRUE))

Image processing

The core concept of Vesalius is to provide spatial domains by utilizing image processing techniques. In contrast to other methods, Vesalius does not provide spatially aware latent space by default. It utilises Principal Component Analsyis (as well as UMAP or SVD/LSI) to generate a reduced dimesnionaility space.

We find that Principal Components (PC) contain subtle spatial information already embedded within them. We can viszualise this by looking at the grey scale embeddings for each PC.

pca_embedding_brain <- vector("list",  30)
pca_embedding_embryo <- vector("list",  30)
for (i in seq(1, 30)){
    pca_embedding_brain[[i]] <- image_plot(vesalius_brain,
        embedding = "last",
        dimensions = i)
    pca_embedding_embryo[[i]] <- image_plot(vesalius_embryo,
        embedding = "last",
        dimensions = i)
}

For the mouse brain:

print(ggarrange(plotlist = pca_embedding_brain), ncol = 5)

For the mouse embryo:

print(ggarrange(plotlist = pca_embedding_embryo), ncol = 5)

We can see that many of these PCs contain spatially relevant information even though the latent space was not produced in an explitely spatially aware manner. By using image processing, we can further highlight these regions and better utilise them during spatial domain isolation.

vesalius_brain <- regularise_image(vesalius_brain,
    dimensions = seq(1, 50),
    verbose = FALSE)
vesalius_brain <- equalize_image(vesalius_brain,
    dimensions = seq(1, 50),
    sleft = 5, sright = 5,
    verbose = FALSE)
vesalius_brain <- smooth_image(vesalius_brain,
    dimensions = seq(1, 50),
    method = c("iso", "box"),
    sigma = 2,
    box = 10,
    iter = 10,
    verbose = FALSE)

vesalius_embryo <- regularise_image(vesalius_embryo,
   embedding = "PCA",
   dimensions = seq(1, 50),
   lambda = 10,
   verbose = FALSE)
vesalius_embryo <- equalize_image(vesalius_embryo,
    dimensions = seq(1, 50),
    sleft = 1, sright = 1,
    verbose = FALSE)
vesalius_embryo <- smooth_image(vesalius_embryo,
    dimensions = seq(1, 50),
    method = c("iso", "box"),
    sigma = 1,
    box = 10,
    across_levels = "min",
    iter = 10,
    verbose = FALSE)

We can viszualise each PC in the image tensor after image processing.

pca_embedding_brain <- vector("list",  30)
pca_embedding_embryo <- vector("list",  30)
for (i in seq(1, 30)){
    pca_embedding_brain[[i]] <- image_plot(vesalius_brain,
        embedding = "last",
        dimensions = i)
    pca_embedding_embryo[[i]] <- image_plot(vesalius_embryo,
        embedding = "last",
        dimensions = i)
}

For example, in the mous brain, we can cleary see that PC17 contains information related to the CA2 field - a field that is often lost during standard analysis.

print(ggarrange(plotlist = pca_embedding_brain), ncol = 5)

For the mouse embryo:

print(ggarrange(plotlist = pca_embedding_embryo), ncol = 5)

In vesalius, each PC is handled independantly in the form of a grey scale image. We can use this image stack to generate isolated spatial domains and recover subtle expression patterns. The selection of image processing parameters is dependant on the data set used and on the biological question at hand. You might want to have more or less territory granularity. However, the ability to visualize PCs greatly aids is selecting the parameters that fits your needs.

Image segmentation & Territory Isolation

The isolation of territories is achieved by a combination of image segmentation and 2D isolation of image segments. By default, we use kmeans for image segmentation. This produces a series of image segments that can then be further isolated if they are sperated in 2D space.

vesalius_brain <- segment_image(vesalius_brain,
    dimensions = seq(1, 30),
    method = "kmeans",
    col_resolution = 25,
    verbose = FALSE)
vesalius_brain <- isolate_territories(vesalius_brain, verbose = FALSE)

vesalius_embryo <- segment_image(vesalius_embryo,
    dimensions = seq(1, 30),
    method = "kmeans",
    col_resolution = 32,
    verbose = FALSE)
vesalius_embryo <- isolate_territories(vesalius_embryo, verbose = FALSE)

In this instance, we elected to use all images in the image stack (i.e dimensions = seq(1,30)). In the context of vesalius, the col_resolution argument represents the number of clusters to parse to kmeans however this does not define the final number of territories. These segments will be subdivided into small segments based on the distribution in space.

Visualization of territories

As mentionned above we distinguish image segements and territories. We can vizualise these differences with the vesalius plotting functions.

brain_segments <- territory_plot(vesalius_brain,
  trial = "Segment",
  cex_pt = 0.25) +
  labs(title = "Brain Segments")
brain_territories <- territory_plot(vesalius_brain,
  trial = "Territory",
  cex_pt = 0.25) +
  labs(title = "Brain Territories")
embryo_segments <- territory_plot(vesalius_embryo,
  trial = "Segment",
  cex_pt = 0.25) +
  labs(title = "Embryo Segments")
embryo_territories <- territory_plot(vesalius_embryo,
  trial = "Territory",
  cex_pt = 0.25) +
  labs(title = "Embryo Territories")


all <- (brain_segments + brain_territories) /
  (embryo_segments + embryo_territories)

print(all)

We can see how image segments and territories differ when compareing their plots. While segments are defined by the value parsed to col_resolution during image segmentation, the final number of territories will depend on how the images were processed, the value parsed to col_resolution and the capture radius defined in isolate_territories. In short, this parameters defines the maximum distance between color segments for them to be pooled into the same territory.

Differential Gene Expression

The advantage of uniform territories is that it enables the comparison of spatial territories rather than comparing cell clusters.

We can start by getting all genes that are differentially expressed in each territory and export the results.

vesalius_brain <- identify_markers(vesalius_brain,
  trial = "Territory", # Using PCA territories here
  method = "DESeq2",
  seed =  NULL,
  query = NULL
)

vesalius_embryo <- identify_markers(vesalius_embryo,
  trial = "Territory", # Using PCA territories here
  method = "DESeq2",
  seed =  NULL,
  query = NULL
)

deg_brain <- get_markers(vesalius_brain)
write.csv(deg_brain, file = paste0(output_data, "DEG_SSv2_brain.csv"))
deg_embryo <- get_markers(vesalius_embryo)
write.csv(deg_embryo, file = paste0(output_data, "DEG_SSv2_embryo.csv"))

head(deg_brain)

Using these tables, we can ivestigate differntial gene expression and find which genes are more highly expressed in a given territory comapred to everything else. If you want to only look at one or a subset of territories, you can specifiy your territories in the seed argument.

vesalius_brain <- identify_markers(vesalius_brain,
  trial = "Territory", # Using PCA territories here
  method = "DESeq2",
  seed =  c(1, 2, 3),
  query = NULL)

We can also use this option to compare territories between each other. If there are many territories it can be easier to visualize each territory seperately.

brain_split  <- territory_plot(vesalius_brain,
  trial = "Territory",
  split = TRUE,
  randomise = FALSE # we don't reallyt need to randomise colors here
)

brain_split

In this case, we can compare territory 18 and territory 3 (medial habenula compartments).

vesalius_brain <- identify_markers(vesalius_brain,
    trial = "Territory",
    method = "DESeq2",
    seed = 18,
    query = 3)
med_comp <- get_markers(vesalius_brain)
head(med_comp)

Visualization of Gene expresssion

The final step is to visualise gene expression patterns.

For Pcp4 in the mouse hippocampus:

pcp4 <- view_gene_expression(vesalius_brain,
  genes = "Pcp4",
  trial = "Territory",
  cex = 4) 
pcp4_layer <- view_gene_expression(vesalius_brain,
  genes = "Pcp4",
  trial = "Territory",
  as_layer = TRUE,
  cex = 4)

pcp4 + pcp4_layer + plot_annotation(tag_levels = "A")

For Tac1 in the mouse hippocampus:

Tac1 <- view_gene_expression(vesalius_brain,
  genes = "Tac1",
  trial = "Territory",
  cex = 4) 
Tac1_layer <- view_gene_expression(vesalius_brain,
  genes = "Tac1",
  trial = "Territory",
  as_layer = TRUE,
  cex = 4)

Tac1 + Tac1_layer + plot_annotation(tag_levels = "A")

For Pmel in the mouse embryo:

Pmel <- view_gene_expression(vesalius_embryo,
  genes = "Pmel",
  trial = "Territory",
  cex = 4) 
Pmel_layer <- view_gene_expression(vesalius_embryo,
  genes = "Pmel",
  trial = "Territory",
  as_layer = TRUE,
  cex = 4)

Pmel + Pmel_layer + plot_annotation(tag_levels = "A")

We can visualise gene expression as an overall pattern (“A” plots) or an averge expression in each territory (“B” plots). It is important to use “B” plots with caution. While these plots can help to clarify the expression distribution across territories, they represent a discretised view of expression patterns. As such many transcriptional dynamics such as gradients or overall low expression can altered.

Conclusion

Here, we present a Vesalius workflow using 2 high resolution spatial transcriptomic data sets. This pipeline is applicabled to any spatial transcriptomic data set. However, image processing parameteres should be set according to your needs and the data set!

Session Info

sessionInfo()

## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     desc_1.4.3        R6_2.6.1          fastmap_1.2.0    
##  [5] xfun_0.52         cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1
##  [9] rmarkdown_2.29    lifecycle_1.0.4   cli_3.6.5         sass_0.4.10      
## [13] pkgdown_2.1.3     textshaping_1.0.1 jquerylib_0.1.4   systemfonts_1.2.3
## [17] compiler_4.5.1    tools_4.5.1       ragg_1.4.0        bslib_0.9.0      
## [21] evaluate_1.0.4    yaml_2.3.10       jsonlite_2.0.0    rlang_1.1.6      
## [25] fs_1.6.6          htmlwidgets_1.6.4

Patrick C.N. Martin