module dataset


function validate_counts

validate_counts(counter, threshold, label)

Checks that every count in a counter dictionary meets a minimum threshold.

Args:

  • counter (collections.Counter): The counter dictionary containing the counts.
  • threshold (int): The minimum count threshold.
  • label (str): The label to be used in the assertion error message.

Raises:

  • AssertionError: If any count in the counter dictionary is less than the threshold.
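
The check described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the name validate_counts_sketch and the wording of the error message are assumptions.

```python
from collections import Counter


def validate_counts_sketch(counter, threshold, label):
    """Assert that every count in `counter` is at least `threshold`.

    Hypothetical stand-in for validate_counts; the message format is assumed.
    """
    # Collect the offending keys so the error message is actionable.
    too_small = {key: n for key, n in counter.items() if n < threshold}
    assert not too_small, (
        f"{label} entries with fewer than {threshold} items: {too_small}"
    )


# Both keys occur three times, so the check passes silently.
counts = Counter(["a", "a", "a", "b", "b", "b"])
validate_counts_sketch(counts, threshold=3, label="clone")
```

Because the check relies on assert, it is skipped when Python runs with the -O flag.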

function filter_and_encode

filter_and_encode(df, node_encoder, all_nodes, use_index=False)

Filters the given DataFrame to the nodes listed in all_nodes and replaces node IDs with their encoded values from node_encoder.

Args:

  • df (pandas.DataFrame): The DataFrame to be filtered and encoded.
  • node_encoder (dict): A dictionary mapping node IDs to encoded values.
  • all_nodes (list): A list of all node IDs.
  • use_index (bool, optional): Whether to filter based on DataFrame index. Defaults to False.

Returns:

  • pandas.DataFrame: The filtered and encoded DataFrame.
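
One way the filter-then-encode step might look is sketched below. The column names node1 and node2, the columns parameter, and the function name are assumptions made for illustration; the source does not state which columns hold node IDs when use_index is False.

```python
import pandas as pd


def filter_and_encode_sketch(df, node_encoder, all_nodes, use_index=False,
                             columns=("node1", "node2")):
    """Keep rows whose node IDs are in `all_nodes`, then encode the IDs.

    Hypothetical sketch: `columns` is an assumed parameter naming the
    node-ID columns used when `use_index` is False.
    """
    keep = list(all_nodes)
    if use_index:
        # Node IDs live in the index (e.g. an embedding table).
        out = df[df.index.isin(keep)].copy()
        out.index = out.index.map(node_encoder)
    else:
        # Node IDs live in columns (e.g. an edge list); keep a row only
        # if every node-ID column refers to a known node.
        cols = list(columns)
        mask = df[cols].isin(keep).all(axis=1)
        out = df[mask].copy()
        for col in cols:
            out[col] = out[col].map(node_encoder)
    return out


edges = pd.DataFrame({"node1": ["a", "b", "c"], "node2": ["b", "c", "d"]})
encoder = {"a": 0, "b": 1, "c": 2}
encoded = filter_and_encode_sketch(edges, encoder, ["a", "b", "c"])
```

In this example the third edge is dropped because node "d" is not in all_nodes.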

function drop_small

drop_small(edges, numb)

Drop clones and cell types with fewer than ‘numb’ cells from the edges DataFrame.

Parameters:

  • edges (DataFrame): The DataFrame containing the edge information.
  • numb (int): The minimum number of cells required for a clone or cell type to be kept.

Returns:

  • DataFrame: The edges DataFrame with small clones and cell types dropped.
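
The filtering described above can be sketched as follows. The column names clone, type, and node1, and the choice to count unique cells per label, are assumptions; the source does not specify the layout of the edges DataFrame.

```python
import pandas as pd


def drop_small_sketch(edges, numb, label_cols=("clone", "type"),
                      cell_col="node1"):
    """Drop rows whose clone or cell type has fewer than `numb` cells.

    Hypothetical sketch: `label_cols` and `cell_col` are assumed names.
    """
    out = edges
    for col in label_cols:
        # Count distinct cells per label, since a cell may appear in
        # several edges.
        cells_per_label = out.groupby(col)[cell_col].nunique()
        keep = cells_per_label[cells_per_label >= numb].index
        out = out[out[col].isin(keep)]
    return out.reset_index(drop=True)


edges = pd.DataFrame({
    "node1": ["c1", "c2", "c3", "c4"],
    "clone": ["A", "A", "A", "B"],
    "type": ["t1", "t1", "t1", "t1"],
})
kept = drop_small_sketch(edges, numb=2)
```

The sketch filters the label columns one after another in a single pass, so dropping a small clone can leave a cell type below the threshold that is not rechecked.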


function preprocess_data

preprocess_data(edges, overcl, spatial_edges, grid_edges)

Preprocesses the given data by filtering and filling missing values.

Args:

  • edges (DataFrame): The edges data.
  • overcl (DataFrame): The annotation data with clone and cell type labels.
  • spatial_edges (str): The type of spatial edges.
  • grid_edges (str): The type of grid edges.

Returns:

  • Tuple[DataFrame, DataFrame]: The preprocessed edges and overcl data.

function read_and_merge_embeddings

read_and_merge_embeddings(paths, edges, drop_less=10)

Read and merge the embeddings from spatial and RNA datasets.

Parameters:

  • paths (dict): A dictionary containing the file paths for the spatial and RNA datasets.
  • edges (pd.DataFrame): A DataFrame containing the edges of the graph.
  • drop_less (int, optional): The minimum number of occurrences required for an edge to be kept. Defaults to 10.

Returns:

  • emb_vis (pd.DataFrame): The merged embeddings from the spatial dataset.
  • emb_rna (pd.DataFrame): The merged embeddings from the RNA dataset.
  • edges (pd.DataFrame): The filtered edges of the graph.
  • node_encoder (dict): A dictionary mapping node IDs to encoded node IDs.

function create_data_object

create_data_object(
    edges,
    emb_vis,
    emb_rna,
    node_encoder,
    sim=None,
    with_diploid=True
)

Create a data object for graph neural network training.

Args:

  • edges (pandas.DataFrame): DataFrame containing the edges of the graph.
  • emb_vis (pandas.DataFrame): DataFrame containing the spatial embeddings.
  • emb_rna (pandas.DataFrame): DataFrame containing the RNA embeddings.
  • node_encoder (dict): Dictionary mapping node IDs to their corresponding encodings.
  • sim (pandas.DataFrame, optional): Similarity matrix between clone values. Defaults to None.
  • with_diploid (bool, optional): Flag indicating whether to include diploid values in the encoding. Defaults to True.

Returns:

  • tuple: A tuple containing the data object and dictionaries for node, clone, and cell type encodings. If sim is provided, an additional similarity matrix is returned.

Raises:

  • AssertionError: If the data object is not valid or the shapes of the data arrays are not consistent.

function create_encoding_dict

create_encoding_dict(df, column, extras=[])

Create a dictionary that maps unique values in a column of a DataFrame to their corresponding indices.

Parameters:

  • df (pandas.DataFrame): The DataFrame containing the column.
  • column (str): The name of the column.
  • extras (list, optional): Additional values to exclude from the dictionary. Defaults to an empty list.

Returns:

  • dict: A dictionary mapping unique values to their corresponding indices.
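
A minimal sketch of such a mapping is shown below. The function name is hypothetical, and assigning indices in order of first appearance is an assumption; the actual function may order values differently.

```python
import pandas as pd


def create_encoding_dict_sketch(df, column, extras=()):
    """Map each unique value in `df[column]` to a consecutive index.

    Hypothetical sketch: values listed in `extras` are left out, and
    indices follow order of first appearance (an assumption).
    """
    excluded = set(extras)
    values = [v for v in pd.unique(df[column]) if v not in excluded]
    return {value: idx for idx, value in enumerate(values)}


cells = pd.DataFrame({"clone": ["A", "B", "A", "diploid"]})
clone_dict = create_encoding_dict_sketch(cells, "clone", extras=["diploid"])
```

The sketch uses a tuple as the default for extras, which avoids sharing one mutable default list across calls.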

This file was automatically generated via lazydocs.