05-cellular_niches.qmd

# Identifying spatial domains with unsupervised clustering

```{r 05-code chunk timing, include = FALSE}
knitr::knit_hooks$set(time_it = local({
  now <- NULL
  function(before, options) {
    if (before) {
      # record the current time before each chunk
      now <<- Sys.time()
    } else {
      # calculate the time difference after a chunk
      res <- difftime(Sys.time(), now, units = "secs")
      # return a character string to show the time in bold and right-aligned, with extra spacing
      paste("<div style='text-align: right;'><em>Time for this code chunk to run with ", nCores, "cores: ", round(res, 2), "s</div></em>")
    }
  }
}))
```

Beyond spatial relationships between cell types, imaging datasets also contain another source of rich information - spatial domains. To give an idea of what spatial domains might visually look like, we've provided an image on the right, where we can clearly map out our healthy epithelial tissue spatial domain on the left of the image, and our immune and tumour domains on the right of the image. <img src="images/IMC_colon.png" align="right" style="height: 200px; border: 0px"/>

However, spatial domains tend to be highly dependent on the biological question being answered. For example, when your primary tissue of interest are solid tumours, spatial domain analysis can provide insights into proportion of tumour domains vs immune domains, or how tumour domains differ between progressive and non-progressive cancers. Alternatively, if your primary question of interest is diabetes, spatial domains can provide insights into marker or cell type differences in your pancreatic islets.

In this section, we'll be exploring the use of `lisaClust` on two different datasets to identify spatial domains and help predict patient survival.

```{r 05-load libraries, echo=FALSE, results="hide", warning=FALSE}
suppressPackageStartupMessages({
  library(lisaClust)
  library(spicyR)
  library(ggplot2)
  library(SingleCellExperiment)
  library(SpatialDatasets)
})
```

```{r 05-libraries, eval = FALSE}
library(lisaClust)
library(spicyR)
library(ggplot2)
library(SingleCellExperiment)
library(SpatialDatasets)
```

```{r 05-set parameters}
# set parameters
set.seed(51773)
use_mc <- TRUE
if (use_mc) {
  nCores <- max(parallel::detectCores()/2, 1)
} else {
  nCores <- 1
}
BPPARAM <- simpleSeg:::generateBPParam(nCores)
theme_set(theme_classic())
```

## lisaClust

Clustering Local Indicators of Spatial Association (LISA) functions is a methodology for identifying consistent spatial organisation of multiple cell-types in an unsupervised way. This can be used to enable the characterisation of interactions between multiple cell-types simultaneously and can complement traditional pairwise analysis. In our implementation our LISA curves are a localised summary of an L-function from a Poisson point process model. Our framework `lisaClust` can be used to provide a high-level summary of cell type co-localisation in high-parameter spatial cytometry data, facilitating the identification of distinct tissue compartments or complex cellular microenvironments.

<img src="images/lisaClust_fig1.jpg" align="center" style="height: 300px; border: 0px"/>

The workflow that lisaClust uses to identify regions of tissue with similar localisation patterns of cells contains multiple key steps. First, cells are treated as objects and assigned coordinates in an x-y space. Second, distances between all cells are calculated and then, by modeling the cells as a multi-type Poisson point process, the distances are used to calculate local indicators of spatial association (LISA). These LISA curves summarize the spatial association between each cell and a specific cell type over a range of radii. The LISA curves are calculated for each cell and cell type and then clustered to assign a region label for each cell.

### Case study: Keren

We will start by reading in the [Keren 2018](datasets.qmd) dataset from the `SpatialDatasets` package as a `SingleCellExperiment` object. Here the data is in a format consistent with that outputted by CellProfiler.

```{r 05-load keren data}
kerenSPE <- SpatialDatasets::spe_Keren_2018()
```

#### Generate LISA curves

For the purpose of this demonstration, we will be using only images 5 and 6 of the dataset.

```{r 05-filter keren dataset}
kerenSPE <- kerenSPE[,kerenSPE$imageID %in% c("5", "6")]
```

This data comes with pre-annotated cell types, sowe can move directly to performing k-means clustering on the local indicators of spatial association (LISA) functions using the `lisaClust` function. The image ID, cell type column, and spatial coordinates can be specified using the `imageID`, `cellType`, and `spatialCoords` arguments respectively. We will identify 5 regions of co-localisation by setting `k = 5`.

```{r 05-lisaClust, message = FALSE, warning = FALSE}
kerenSPE <- lisaClust(kerenSPE,
                      k = 5)
```

These regions are stored in `colData` and can be extracted.

```{r 05-extract regions}
colData(kerenSPE)[, c("imageID", "region")] |>
  head(10)
```

#### Examine cell type enrichment

lisaClust also provides a convenient function, `regionMap`, for examining which cell types are located in which regions. In this example, we use this to check which cell types appear more frequently in each region than expected by chance.

```{r 05-regionMap}
regionMap(kerenSPE,
  type = "bubble")
```

Above, we can see that tumour cells are concentrated in region 5, and immune cells are concentrated in region 1 and 4. We can further segregate these cells by increasing the number of clusters, i.e., increasing the parameter `k =` in the `lisaClust` function.

::: {.callout-tip title="Choosing the number of spatial domains"}
**How do we choose an appropriate value for `k`?**

-   The choice of `k` depends largely on the biological question being asked. For instance, if we are interested in understanding the interactions between immune cells in a tumor microenvironment, the number of clusters should reflect the known biological subtypes of immune cells, such as T cells, B cells, macrophages, etc. In this case, a larger value of `k` may be needed to capture the diversity within these immune cell populations.

-   On the other hand, if the focus is on interactions between immune cells and tumor cells, we might choose a smaller value of `k` to group immune cells into broader categories.

-   Additionally, methods like the Gap statistic, Jump statistic, or Silhouette score could be employed to determine an optimal value of `k`.
:::

#### Plot identified regions

We can use the `hatchingPlot` function to visualise all 5 regions and 17 cell types simultaneously for a specific image or set of images. The output is a `ggplot` object where the regions are marked by different hatching patterns. The `nbp` argument can be used to tune the granularity of the grid used for defining regions.

```{r 05-hatchingPlot, fig.height=7, fig.width=9, time_it = TRUE}
hatchingPlot(kerenSPE, useImages = 5, nbp = 300)
```

In accordance with the `regionMap` output, we can see that region 5 is mostly made up of tumour cells, and region 2 and 4 both contain our immune cell populations.

::: {.callout-tip title="Combining localisation scores with spatial domains"}
**How could results from lisaClust be used in conjunction with results from spicyR?**

lisaClust provides a high-resolution view of the tissue architecture, while spicyR can quantify how these spatial relationships or features contribute to clinical outcomes. spicyR's L-function metric can be used to determine the degree of localisation or dispersion between different spatial domains. For instance, we can look at co-localisation between region 5 (our tumour cells) and regions 2 or 4 (our immune cells).
:::

## sessionInfo

```{r}
sessionInfo()
```