thematic_search.SoftClusterTree
- class thematic_search.SoftClusterTree(cluster_matrices: list, cluster_tree: dict, sparsity_threshold: float = 0.0)
A hierarchical soft clustering structure storing inclusion strengths as uint8 sparse matrices (0-255, divide by 255 to recover floats).
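The uint8 encoding described above can be sketched in a few lines of plain Python. This is only an illustration: the helpers below are standalone stand-ins that borrow the names to_int and to_float from the method list, and rounding to the nearest integer is an assumption rather than confirmed library behaviour.

```python
def to_int(float_value: float) -> int:
    """Quantize a strength in [0, 1] to an integer in 0..255.

    Assumption: round to nearest; the library may round differently.
    """
    if not 0.0 <= float_value <= 1.0:
        raise ValueError("inclusion strength must lie in [0, 1]")
    return int(round(float_value * 255))


def to_float(uint8_value: int) -> float:
    """Recover an approximate strength in [0, 1] from a 0..255 integer."""
    return uint8_value / 255


# Round-trip a few example strengths and measure the quantization error.
strengths = [0.0, 0.1, 0.5, 0.9, 1.0]
codes = [to_int(s) for s in strengths]
recovered = [to_float(c) for c in codes]
max_error = max(abs(s - r) for s, r in zip(strengths, recovered))
```

Under the round-to-nearest assumption the round-trip error is bounded by half a quantization step, i.e. 0.5 / 255, while the endpoints 0.0 and 1.0 are recovered exactly.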
The cluster hierarchy may be a DAG (directed acyclic graph), meaning a node may have multiple parents. Edges must respect the layer ordering: if (s, k) is a parent of (l, i) then s > l. The inclusion strength consistency assumption must hold for every edge: c^s_k(r) >= c^l_i(r) for all records r, where c^s_k(r) denotes the inclusion strength of record r in cluster k of layer s.
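The two constraints above (layer ordering and inclusion strength consistency) can be checked before constructing the tree. The following is a minimal sketch in plain Python; check_hierarchy and the strengths dictionary are hypothetical helpers introduced for illustration and are not part of thematic_search.

```python
def check_hierarchy(cluster_tree, strengths):
    """Return a list of violation messages (empty if the hierarchy is valid).

    cluster_tree: {(layer, cluster): [(child_layer, child_cluster), ...]}
    strengths:    {(layer, cluster): [inclusion strength per record]}
    """
    problems = []
    for parent, children in cluster_tree.items():
        for child in children:
            # Layer ordering: a parent must sit on a strictly higher layer.
            if parent[0] <= child[0]:
                problems.append(f"{parent} -> {child}: parent layer must be higher")
            # Consistency: the parent's strength must dominate the child's
            # for every record.
            pairs = zip(strengths[parent], strengths[child])
            if any(p < c for p, c in pairs):
                problems.append(f"{parent} -> {child}: parent strength below child")
    return problems


# A small DAG in which (0, 0) has two parents, (1, 0) and (1, 1).
tree = {(2, 0): [(1, 0), (1, 1)], (1, 0): [(0, 0)], (1, 1): [(0, 0)]}
strengths = {
    (2, 0): [1.0, 1.0, 1.0],
    (1, 0): [0.9, 0.2, 0.0],
    (1, 1): [0.3, 0.8, 0.6],
    (0, 0): [0.3, 0.1, 0.0],
}
violations = check_hierarchy(tree, strengths)

# Raising the child's strength for the first record breaks the (1, 1) edge.
bad_strengths = dict(strengths)
bad_strengths[(0, 0)] = [0.5, 0.1, 0.0]
bad_violations = check_hierarchy(tree, bad_strengths)
```

Because a node may have several parents, the consistency check runs per edge rather than per node: the same child is compared against each of its parents in turn.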
- Parameters:
cluster_matrices (list of np.ndarray) – List of L dense float arrays, one per layer. cluster_matrices[l] has shape (n_docs, n_clusters_at_layer_l), with values in [0, 1].
cluster_tree (dict) – A dict mapping (layer, cluster_number) tuples to lists of child tuples, e.g. {(2, 0): [(1, 0), (1, 1)], (2, 1): [(1, 2)], …}. The unique root node is the key with no parents, i.e. the node that does not appear in any value list.
sparsity_threshold (float, optional (default=0.0)) – Inclusion strengths below this value are set to zero before sparsification. Useful for cleaning up near-zero soft memberships.
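To illustrate the cluster_tree format, the sketch below locates the root as the only key that never appears in a value list and inverts the child lists into parent lists. The helper names find_root and parent_map are hypothetical and the small tree is made up for the example; none of this is taken from the library's implementation.

```python
def find_root(cluster_tree):
    """Return the single key that is not listed as anyone's child."""
    all_children = {c for kids in cluster_tree.values() for c in kids}
    roots = [node for node in cluster_tree if node not in all_children]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]


def parent_map(cluster_tree):
    """Invert {parent: [children]} into {node: [parents]}."""
    parents = {}
    for parent, kids in cluster_tree.items():
        parents.setdefault(parent, [])
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    return parents


# Hypothetical three-layer hierarchy; (0, 1) belongs to two parents.
tree = {
    (2, 0): [(1, 0), (1, 1)],
    (1, 0): [(0, 0), (0, 1)],
    (1, 1): [(0, 1), (0, 2)],
}
root = find_root(tree)
parents = parent_map(tree)
```

Here the root is (2, 0), its parent list is empty, and (0, 1) ends up with both (1, 0) and (1, 1) as parents, which is what makes the structure a DAG rather than a tree.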
- __init__(cluster_matrices: list, cluster_tree: dict, sparsity_threshold: float = 0.0)
Methods
__init__(cluster_matrices, cluster_tree[, ...])
children(idx) – Return the child indices of a cluster.
cluster(layer, cluster_number) – Convenience method to construct a Cluster from a (layer, cluster_number) pair.
inside(expr[, threshold]) – Return indices of documents satisfying the cluster expression with inclusion strength >= threshold.
join(indices) – Return the indices of the least upper bounds (LUBs) of a set of clusters, i.e. their lowest common ancestors in the DAG.
parents(idx) – Return the parent indices of a cluster, or an empty list if it is the root.
strengths(expr[, indices, as_float]) – Return the inclusion strengths of a set of documents for a given cluster expression.
to_float(uint8_value) – Convert uint8 inclusion strength to float in [0, 1].
to_int(float_value) – Convert float inclusion strength in [0, 1] to uint8.
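The idea behind join can be sketched independently of the library. In a DAG a set of clusters may have more than one least upper bound, which is why the method returns a collection. The code below is one possible way to compute this under stated assumptions: nodes are identified by (layer, cluster_number) tuples, a cluster counts as its own ancestor, and the functions ancestors_or_self and join are illustrative rather than the library's actual code.

```python
def ancestors_or_self(node, parents):
    """Collect node plus every ancestor reachable through parent links."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for p in parents.get(current, []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def join(nodes, parents):
    """Return the lowest common ancestors of nodes, sorted for stable output."""
    common = set.intersection(*(ancestors_or_self(n, parents) for n in nodes))
    # Keep a common ancestor only if no other common ancestor lies below it.
    lowest = [
        c for c in common
        if not any(d != c and c in ancestors_or_self(d, parents) for d in common)
    ]
    return sorted(lowest)


# Hypothetical parent links: (0, 1) has two parents.
parents = {
    (0, 0): [(1, 0)],
    (0, 1): [(1, 0), (1, 1)],
    (0, 2): [(1, 1)],
    (1, 0): [(2, 0)],
    (1, 1): [(2, 0)],
    (2, 0): [],
}
single = join([(0, 0), (0, 1)], parents)
via_root = join([(0, 0), (0, 2)], parents)

# Two leaves sharing both parents have two least upper bounds.
shared = {
    (0, 0): [(1, 0), (1, 1)],
    (0, 1): [(1, 0), (1, 1)],
    (1, 0): [(2, 0)],
    (1, 1): [(2, 0)],
    (2, 0): [],
}
double = join([(0, 0), (0, 1)], shared)
```

In the first hierarchy (0, 0) and (0, 1) meet at (1, 0), while (0, 0) and (0, 2) only meet at the root. In the second hierarchy both layer-1 clusters are least upper bounds, since neither is an ancestor of the other.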
Attributes
cluster_matrices – Reconstruct the dense cluster matrices for saving purposes.
topics