Align and integrate spatial assays from the same modality using super pixels
Source: R/map_assays.R
map_assays.Rd
Align and integrate spatial assays from the same modality using super pixels
Usage
map_assays(
  seed_assay,
  query_assay,
  signal = "variable_features",
  use_cost = c("feature", "niche"),
  method = "pearson",
  neighborhood = "knn",
  k = 20,
  radius = 0.05,
  depth = 1,
  dimensions = seq(1, 30),
  batch_size = 10000,
  epochs = 1,
  allow_duplicates = TRUE,
  threshold = 0.3,
  filter_cells = FALSE,
  use_norm = "raw",
  scale = FALSE,
  custom_cost = NULL,
  seed_territory_labels = "Territory",
  query_territory_labels = "Territory",
  seed_meta_labels = NULL,
  query_meta_labels = NULL,
  jitter = 0,
  digits = 5,
  verbose = TRUE
)
Arguments
- seed_assay
vesalius_assay object - data to be mapped to
- query_assay
vesalius_assay object - data to map
- signal
character (variable_features, all_features, embeddings, custom) - What should be used as cell signal to generate the cost matrix. See details
- use_cost
character string defining how the total cost should be computed. Available: feature, niche, territory, composition (See details for combinations and custom matrices)
- neighborhood
character - how should the neighborhood be selected? "knn", "radius", "graph" (See details)
- k
int ]2, n_points] number of nearest neighbors to be considered for neighborhood computation.
- radius
numeric ]0,1[ proportion of max distance between points to consider for the neighborhood
- depth
int [1, NA] graph depth around a cell to consider for the neighborhood (See details)
- dimensions
int vector containing latent space dimensions to use
- batch_size
number of points per batch in query during assignment problem solving
- threshold
score threshold below which indices should be removed. Scores will always be between 0 and 1
- use_norm
character - which count data to use
- scale
logical - should signal be scaled
- custom_cost
matrix - matrix of size n (query cells) by p (seed cells) containing custom cost matrix. Used instead of vesalius cost matrix
- verbose
logical - should I be a noisy boy?
Details
The goal is to assign the best matching point between a seed set and a query set.
To do so, map_assays will first extract a biological signal. This can be latent space embeddings per cell, or gene counts (or any other modality).
If using gene counts, there are a few more options available to you. First, you can select "variable_features" and vesalius will find the intersection between the variable features in your seed_assay and your query_assay. "all_features" will find the intersection of all genes across assays (even if they are not highly variable). Finally, you can also select a custom gene vector, containing only the gene set you are interested in.
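The three feature-selection options above can be illustrated with plain set intersections. This is a minimal sketch on hypothetical gene names, not the vesalius internals: the vectors `seed_variable`, `query_variable`, `seed_all`, `query_all` and `custom_genes` are invented for illustration, and restricting a custom vector to genes present in both assays is an assumption.

```r
# Hypothetical feature sets; in practice these would come from the two assays.
seed_variable  <- c("Gene1", "Gene2", "Gene3", "Gene5")
query_variable <- c("Gene2", "Gene3", "Gene4", "Gene5")
seed_all       <- paste0("Gene", 1:6)
query_all      <- paste0("Gene", 2:8)

# "variable_features": variable features shared by seed and query
shared_variable <- intersect(seed_variable, query_variable)

# "all_features": every feature shared by seed and query, variable or not
shared_all <- intersect(seed_all, query_all)

# custom gene vector: assumed here to be restricted to genes found in both assays
custom_genes  <- c("Gene3", "Gene6", "Gene9")
shared_custom <- Reduce(intersect, list(custom_genes, seed_all, query_all))

print(shared_variable)
print(shared_all)
print(shared_custom)
```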
The second step is to create a cost matrix. The total cost matrix is the pairwise sum of several individual cost matrices. By default, the map_assays function will use the "feature" and "niche" cost matrices.
The feature matrix computes the Pearson correlation between the seed and query using whichever signal was defined by the signal argument (for example, "variable_features" will compute the correlation between the variable features shared by seed and query).
The niche matrix is computed using the Pearson correlation between niche expression profiles (based on signal). Niches are defined using the neighborhood argument, where knn represents the k nearest neighbors algorithm (with k defining the number of nearest neighbors), depth represents the graph depth of a local neighborhood graph, and radius defines a spatial radius surrounding a center cell. The signal (expression or embedding) is averaged across all cells in the niche.
The territory matrix compares the average signal of vesalius territories between seed and query.
The composition matrix computes a frequency-aware Jaccard index between the cell types present in a niche. Cell types must be assigned to the seed and query vesalius objects (see the add_cells function).
The total cost matrix is computed as the pairwise sum of the complements (1 - p) of each cost matrix.
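To make the default "feature" plus "niche" combination concrete, the following base R sketch builds both matrices on simulated data and sums their complements. It is an illustration under stated assumptions, not the package's implementation: the simulated matrices, the helper `niche_signal()`, the choice of k = 3, and the inclusion of the center cell in its own niche are all hypothetical.

```r
set.seed(1)
n_genes <- 10; n_seed <- 6; n_query <- 5

# Simulated signal (features x cells) and 2D coordinates for each assay
seed_signal  <- matrix(rnorm(n_genes * n_seed),  nrow = n_genes)
query_signal <- matrix(rnorm(n_genes * n_query), nrow = n_genes)
seed_xy      <- matrix(runif(n_seed * 2),  ncol = 2)
query_xy     <- matrix(runif(n_query * 2), ncol = 2)

# Hypothetical helper: average the signal over each cell's k nearest
# spatial neighbors (the cell itself is included, at distance 0)
niche_signal <- function(signal, xy, k) {
  d <- as.matrix(dist(xy))
  sapply(seq_len(ncol(signal)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    rowMeans(signal[, nn, drop = FALSE])
  })
}

# cor() correlates columns of x with columns of y,
# so rows index query cells and columns index seed cells
feature_sim <- cor(query_signal, seed_signal, method = "pearson")
niche_sim   <- cor(niche_signal(query_signal, query_xy, k = 3),
                   niche_signal(seed_signal,  seed_xy,  k = 3),
                   method = "pearson")

# Complement (1 - p) of each matrix, then element-wise sum
total_cost <- (1 - feature_sim) + (1 - niche_sim)
print(dim(total_cost))
```

With two correlation-based matrices, each entry of `total_cost` lies between 0 (perfectly correlated on both criteria) and 4 (perfectly anti-correlated on both).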
This cost matrix is then passed to a Kuhn–Munkres algorithm that generates the point pairs that minimize the overall cost.
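What the solver optimizes can be shown on a tiny example. The sketch below does not implement Kuhn–Munkres; it simply enumerates every one-to-one assignment of a 3 x 3 toy cost matrix and keeps the cheapest, which is the same optimum a Kuhn–Munkres solver would return far more efficiently. The matrix values and the helper `permutations()` are hypothetical.

```r
# Hypothetical helper: all permutations of a vector (tiny inputs only)
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

# Toy cost matrix: rows are query points, columns are seed points
cost <- matrix(c(4, 1, 3,
                 2, 0, 5,
                 3, 2, 2), nrow = 3, byrow = TRUE)

# Total cost of each complete one-to-one assignment
perms  <- permutations(seq_len(ncol(cost)))
totals <- vapply(perms, function(p) {
  sum(cost[cbind(seq_len(nrow(cost)), p)])
}, numeric(1))

best <- perms[[which.min(totals)]]
print(best)         # seed index assigned to query 1, 2, 3: 2 1 3
print(min(totals))  # 5
```

Note that the greedy choice (query 2 taking its zero-cost seed 2) is not part of the optimal assignment, which is why a global solver is needed.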
Since the algorithm complexity is O(n^3), it can be time consuming to run on larger data sets. As such, mapping will be approximated by dividing seed and query into batches defined by batch_size. For an exact mapping, ensure that batch_size is larger than the number of cells in both query and seed.
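The effect of batching can be sketched as follows. How vesalius actually forms its batches is not described here, so the contiguous split below is an assumption, and `n_query` is a hypothetical data set size; the point is only that several small assignment problems replace one large one.

```r
n_query    <- 25000   # hypothetical number of query cells
batch_size <- 10000

# Contiguous batches of at most batch_size indices (assumed splitting scheme)
query_idx <- seq_len(n_query)
batches   <- split(query_idx, ceiling(query_idx / batch_size))
print(lengths(batches))

# Rough O(n^3) work: one full problem versus the sum over batches
full_work    <- n_query^3
batched_work <- sum(lengths(batches)^3)
print(full_work / batched_work)
```

Setting `batch_size` above `n_query` in this sketch yields a single batch, which corresponds to the exact mapping case.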
Finally, once the matches are found, the coordinates are mapped to their corresponding points and a new object is returned.
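As a rough illustration of this last step, the sketch below drops matches whose score falls below a threshold and gives each remaining query cell the coordinates of its matched seed cell. The data frames, their column names, and the direction of the transfer (query cells receiving seed coordinates) are assumptions made for the example, not the structure of the returned vesalius_assay.

```r
# Hypothetical seed coordinates and match table
seed_coord <- data.frame(barcodes = paste0("seed_", 1:4),
                         x = c(1, 2, 3, 4),
                         y = c(10, 20, 30, 40))
matches <- data.frame(query = paste0("query_", 1:3),
                      seed  = c("seed_2", "seed_4", "seed_1"),
                      score = c(0.9, 0.2, 0.6))

# Remove matches scoring below the threshold
threshold <- 0.3
kept <- matches[matches$score >= threshold, ]

# Look up the matched seed coordinates for each remaining query cell
idx <- match(kept$seed, seed_coord$barcodes)
mapped <- data.frame(barcodes = kept$query,
                     x = seed_coord$x[idx],
                     y = seed_coord$y[idx])
print(mapped)
```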
Examples
if (FALSE) { # \dontrun{
  data(vesalius)
  # Create Vesalius objects for processing
  vesalius <- build_vesalius_assay(coordinates, counts)
  jitter_ves <- build_vesalius_assay(jitter_coord, jitter_counts)
  # Map the query (jitter_ves) onto the seed (vesalius)
  mapped <- map_assays(vesalius, jitter_ves)
} # }