module dataset
function validate_counts
validate_counts(counter, threshold, label)
Validates the counts in a counter dictionary against a threshold.
Args:
- counter (collections.Counter): The counter dictionary containing the counts.
- threshold (int): The minimum count threshold.
- label (str): The label to be used in the assertion error message.
Raises:
- AssertionError: If any count in the counter dictionary is less than the threshold.
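The documented behavior can be illustrated with a minimal sketch. It is not the module's actual implementation: the name `validate_counts_sketch` and the wording of the error message are assumptions made for illustration.

```python
from collections import Counter


def validate_counts_sketch(counter, threshold, label):
    """Hypothetical stand-in: assert that every count meets the threshold."""
    # Collect the offending entries so the error message names them.
    too_small = {key: n for key, n in counter.items() if n < threshold}
    # The message format is an assumption; the source only says that
    # `label` is used in the assertion error message.
    assert not too_small, f"{label}: counts below {threshold}: {too_small}"


# Every label occurs at least twice, so this call passes silently.
clone_counts = Counter(["a", "a", "b", "b", "b"])
validate_counts_sketch(clone_counts, threshold=2, label="clone")
```

Calling the sketch with a higher threshold, such as `threshold=3`, would raise an AssertionError because "a" occurs only twice.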
function filter_and_encode
filter_and_encode(df, node_encoder, all_nodes, use_index=False)
Filters and encodes the given DataFrame based on the provided node encoder and all nodes.
Args:
- df (pandas.DataFrame): The DataFrame to be filtered and encoded.
- node_encoder (dict): A dictionary mapping node IDs to encoded values.
- all_nodes (list): A list of all node IDs.
- use_index (bool, optional): Whether to filter based on DataFrame index. Defaults to False.
Returns:
- pandas.DataFrame: The filtered and encoded DataFrame.
function drop_small
drop_small(edges, numb)
Drop clones and cell types with fewer than 'numb' cells from the edges DataFrame.
Parameters:
- edges (DataFrame): The DataFrame containing the edge information.
- numb (int): The minimum number of cells required for a clone or cell type to be kept.
Returns:
- DataFrame: The modified edges DataFrame with small clones and cell types dropped.
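One way such a filter could work is sketched below. This is an illustration, not the module's code: the column names `clone` and `cell_type` are hypothetical, and counting one row per cell is an assumption about how cells are counted.

```python
import pandas as pd


def drop_small_sketch(edges, numb, columns=("clone", "cell_type")):
    """Hypothetical sketch: keep rows whose label occurs at least `numb` times."""
    for col in columns:  # column names are assumptions, not taken from the source
        # Map every row to the number of rows that share its label.
        sizes = edges[col].map(edges[col].value_counts())
        edges = edges[sizes >= numb]
    return edges


edges = pd.DataFrame({
    "clone": ["c1", "c1", "c1", "c2"],
    "cell_type": ["t1", "t1", "t1", "t1"],
})
# Clone "c2" has a single row, so it is dropped when numb=2.
kept = drop_small_sketch(edges, numb=2)
```

The sketch filters the columns one after another, so a cell type can fall below the threshold once small clones have been removed. Whether the real function behaves this way is not stated in the source.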
function preprocess_data
preprocess_data(edges, overcl, spatial_edges, grid_edges)
Preprocesses the given data by filtering and filling missing values.
Args:
- edges (DataFrame): The edges data.
- overcl (DataFrame): The annotation data with clone and cell type labels.
- spatial_edges (str): The type of spatial edges.
- grid_edges (str): The type of grid edges.
Returns:
- Tuple[DataFrame, DataFrame]: The preprocessed edges and overcl data.
function read_and_merge_embeddings
read_and_merge_embeddings(paths, edges, drop_less=10)
Read and merge the embeddings from spatial and RNA datasets.
Parameters:
- paths (dict): A dictionary containing the file paths for the spatial and RNA datasets.
- edges (pd.DataFrame): A DataFrame containing the edges of the graph.
- drop_less (int, optional): The minimum number of occurrences required for an edge to be kept. Defaults to 10.
Returns:
- emb_vis (pd.DataFrame): The merged embeddings from the spatial dataset.
- emb_rna (pd.DataFrame): The merged embeddings from the RNA dataset.
- edges (pd.DataFrame): The filtered edges of the graph.
- node_encoder (dict): A dictionary mapping node IDs to encoded node IDs.
function create_data_object
create_data_object(edges, emb_vis, emb_rna, node_encoder, sim=None, with_diploid=True)
Create a data object for graph neural network training.
Args:
- edges (pandas.DataFrame): DataFrame containing the edges of the graph.
- emb_vis (pandas.DataFrame): DataFrame containing the spatial embeddings.
- emb_rna (pandas.DataFrame): DataFrame containing the RNA embeddings.
- node_encoder (dict): Dictionary mapping node IDs to their corresponding encodings.
- sim (pandas.DataFrame, optional): Similarity matrix between clone values. Defaults to None.
- with_diploid (bool, optional): Flag indicating whether to include diploid values in the encoding. Defaults to True.
Returns:
- tuple: A tuple containing the data object and dictionaries for node, clone, and cell type encodings. If sim is provided, an additional similarity matrix is returned.
Raises:
- AssertionError: If the data object is not valid or the shapes of the data arrays are not consistent.
function create_encoding_dict
create_encoding_dict(df, column, extras=[])
Create a dictionary that maps unique values in a column of a DataFrame to their corresponding indices.
Parameters:
- df (pandas.DataFrame): The DataFrame containing the column.
- column (str): The name of the column.
- extras (list, optional): Additional values to exclude from the dictionary.
Returns:
- dict: A dictionary mapping unique values to their corresponding indices.
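A minimal sketch of the mapping described above follows. It is an illustration under stated assumptions, not the module's implementation: the name `create_encoding_dict_sketch` is hypothetical, the sorted ordering of indices is an assumption, and `extras` is treated as a list of values to skip, following the parameter description.

```python
import pandas as pd


def create_encoding_dict_sketch(df, column, extras=()):
    """Hypothetical sketch: map each unique value in `column` to an index."""
    # Assumption: values are indexed in sorted order, and anything listed
    # in `extras` is left out of the dictionary.
    values = [v for v in sorted(df[column].unique()) if v not in extras]
    return {value: index for index, value in enumerate(values)}


df = pd.DataFrame({"clone": ["b", "a", "b", "diploid"]})
encoding = create_encoding_dict_sketch(df, "clone", extras=["diploid"])
# encoding == {"a": 0, "b": 1}
```

The sketch uses an immutable tuple as the default for `extras` instead of the documented `extras=[]`. That is a design choice made here to avoid sharing one mutable default between calls.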
This file was automatically generated via lazydocs.