Align and integrate spatial assays from the same modality using super pixels

Usage

map_assays(
  seed_assay,
  query_assay,
  signal = "variable_features",
  use_cost = c("feature", "niche"),
  method = "pearson",
  neighborhood = "knn",
  k = 20,
  radius = 0.05,
  depth = 1,
  dimensions = seq(1, 30),
  batch_size = 10000,
  epochs = 1,
  allow_duplicates = TRUE,
  threshold = 0.3,
  filter_cells = FALSE,
  use_norm = "raw",
  scale = FALSE,
  custom_cost = NULL,
  seed_territory_labels = "Territory",
  query_territory_labels = "Territory",
  seed_meta_labels = NULL,
  query_meta_labels = NULL,
  jitter = 0,
  digits = 5,
  verbose = TRUE
)

Arguments

seed_assay: vesalius_assay object - data to be mapped to
query_assay: vesalius_assay object - data to map
signal: character (variable_features, all_features, embeddings, custom) - What should be used as cell signal to generate the cost matrix. See details
use_cost: character string defining how the total cost should be computed. Available: feature, niche, territory, composition (See details for combinations and custom matrices)
neighborhood: character - how should the neighborhood be selected? "knn", "radius", "graph" (See details)
k: int ]2, n_points] number of nearest neighbors to be considered for neighborhood computation.
radius: numeric ]0,1[ proportion of max distance between points to consider for the neighborhood
depth: int [1, NA] graph depth from cell to consider for the neighborhood (See details)
dimensions: Int vector containing latent space dimensions to use
batch_size: number of points per batch in query during assignment problem solving
threshold: score threshold below which indices should be removed. Scores will always be between 0 and 1
use_norm: character - which count data to use
scale: logical - should signal be scaled
custom_cost: matrix - matrix of size n (query cells) by p (seed cells) containing custom cost matrix. Used instead of vesalius cost matrix
verbose: logical - should progress messages be printed?

Value

vesalius_assay

Details

The goal is to assign the best matching point between a seed set and a query set.

To do so, map_assays will first extract a biological signal. This can be the latent space embeddings of each cell, or gene counts (or counts from any other modality).

If using gene counts, there are a few more options available to you. First, you can select "variable_features" and vesalius will find the intersection between the variable features in your seed_assay and your query_assay. "all_features" will find the intersection of all genes across assays (even if they are not highly variable). Finally, you can also select a custom gene vector, containing only the gene set you are interested in.
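As a minimal sketch of the feature intersection described above, the base R snippet below subsets two toy count matrices to their shared features. The gene names and matrices are hypothetical, and this is not the internal vesalius code.

```r
# Hypothetical variable feature sets for a seed and a query assay
seed_features <- c("Gene1", "Gene2", "Gene3", "Gene5")
query_features <- c("Gene2", "Gene3", "Gene4", "Gene5")

# Features present in both assays (order follows the seed set)
shared <- intersect(seed_features, query_features)

# Toy count matrices: features in rows, cells in columns
seed_counts <- matrix(1:12, nrow = 4,
  dimnames = list(seed_features, paste0("seed_", 1:3)))
query_counts <- matrix(1:8, nrow = 4,
  dimnames = list(query_features, paste0("query_", 1:2)))

# Restrict both matrices to the shared features so rows line up
seed_signal <- seed_counts[shared, , drop = FALSE]
query_signal <- query_counts[shared, , drop = FALSE]
```

Both matrices now have identical row names in the same order, which is what a feature-wise correlation between seed and query cells requires.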

The second step is to create a cost matrix. The total cost matrix is obtained as the pairwise sum of several individual cost matrices. By default, the map_assays function will use the "feature" and "niche" cost matrices.

The feature matrix computes the Pearson correlation between the seed and query using whichever signal was defined by the signal argument (for example, "variable_features" will compute the correlation between the variable features shared by seed and query).

The niche matrix is computed using the Pearson correlation between niche expression profiles (based on signal). Niches are defined using the neighborhood argument: "knn" uses the k nearest neighbors algorithm (with k defining the number of nearest neighbors), "graph" uses a local neighborhood graph (with depth defining the graph depth), and "radius" uses a spatial radius surrounding a center cell. The signal (expression or embedding) is averaged across all cells in the niche.

The territory matrix compares the average signal of vesalius territories between seed and query. The composition matrix computes a frequency-aware Jaccard index between the cell types present in a niche. Cell types must be assigned to the seed and query vesalius objects (see the add_cells function).

The total cost matrix is computed as the pairwise sum of the complement (1 - p) of each cost matrix.
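The default "feature" plus "niche" combination can be sketched in base R as follows. This is an illustration under stated assumptions, not the vesalius implementation: the data are random, niche_average is a hypothetical helper, the niche is assumed to include the center cell, and any rescaling of scores that vesalius applies internally is not reproduced.

```r
set.seed(1)
n_features <- 50
n_seed <- 6
n_query <- 5

# Toy signal matrices: features in rows, cells in columns
seed_signal <- matrix(rnorm(n_features * n_seed), n_features, n_seed)
query_signal <- matrix(rnorm(n_features * n_query), n_features, n_query)

# Toy spatial coordinates: one row per cell
seed_coord <- matrix(runif(n_seed * 2), n_seed, 2)
query_coord <- matrix(runif(n_query * 2), n_query, 2)

# Hypothetical helper: average the signal over each cell's k nearest
# neighbors (assumption: the center cell is part of its own niche)
niche_average <- function(signal, coord, k) {
  d <- as.matrix(dist(coord))
  sapply(seq_len(ncol(signal)), function(i) {
    nn <- order(d[i, ])[seq_len(k + 1)]
    rowMeans(signal[, nn, drop = FALSE])
  })
}

# Pearson correlation between every query cell (rows) and seed cell (columns)
feature_cor <- cor(query_signal, seed_signal, method = "pearson")
niche_cor <- cor(
  niche_average(query_signal, query_coord, k = 2),
  niche_average(seed_signal, seed_coord, k = 2),
  method = "pearson"
)

# Total cost: sum of the complements (1 - p) of each matrix
total_cost <- (1 - feature_cor) + (1 - niche_cor)
```

The result has one row per query cell and one column per seed cell, matching the n (query cells) by p (seed cells) layout described for custom_cost.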

This cost matrix is then passed to a Kuhn–Munkres algorithm that will generate point pairs that minimize the overall cost.
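To make the assignment problem concrete, the sketch below solves a tiny 3 x 3 example by brute force in base R. It enumerates every permutation instead of running Kuhn–Munkres, so it is only practical for very small matrices, but it returns the same kind of minimum-cost pairing. The cost values and the permutations helper are hypothetical.

```r
# Hypothetical helper: all permutations of a vector, built recursively
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

# Toy cost matrix: rows are query points, columns are seed points
cost <- matrix(c(4, 1, 3,
                 2, 0, 5,
                 3, 2, 2), nrow = 3, byrow = TRUE)

# Total cost of each one-to-one assignment (row i paired with column p[i])
perms <- permutations(seq_len(ncol(cost)))
totals <- vapply(perms, function(p) {
  sum(cost[cbind(seq_len(nrow(cost)), p)])
}, numeric(1))

# Assignment with the lowest overall cost
best <- perms[[which.min(totals)]]
best_total <- min(totals)
```

Here best is c(2, 1, 3) with a total cost of 5: the first query point is paired with the second seed point, the second with the first, and the third with the third.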

Since the algorithm complexity is O(n^3), it can be time consuming to run on larger data sets. As such, mapping will be approximated by dividing seed and query into batches defined by batch_size. For an exact mapping, ensure that batch_size is larger than the number of cells in both query and seed.
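The effect of batching on run time can be sketched with simple arithmetic. The snippet below splits point indices into consecutive batches and compares the cubic cost of one full problem with the summed cost of the batches. How vesalius actually assigns points to batches is not specified here; consecutive splitting and the batch_indices helper are assumptions for illustration only.

```r
# Hypothetical helper: split point indices into consecutive batches
batch_indices <- function(n_points, batch_size) {
  idx <- seq_len(n_points)
  split(idx, ceiling(idx / batch_size))
}

n_points <- 25000
batches <- batch_indices(n_points, batch_size = 10000)
batch_sizes <- unname(lengths(batches))  # 10000, 10000, 5000

# Relative work for an O(n^3) solver: one exact problem vs. batched problems
full_cost <- n_points^3
batched_cost <- sum(as.numeric(batch_sizes)^3)
speed_up <- full_cost / batched_cost
```

With batch_size set above the number of points, batch_indices returns a single batch and the two costs coincide, which corresponds to the exact mapping case.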

Finally, once the matches are found, the coordinates are mapped to their corresponding points and a new object is returned.

Examples

if (FALSE) { # \dontrun{
data(vesalius)
# Create Vesalius object for processing
vesalius <- build_vesalius_assay(coordinates, counts)
jitter_ves <- build_vesalius_assay(jitter_coord, jitter_counts)
mapped <- map_assays(vesalius, jitter_ves)
} # }