EVōC: Embedding Vector Oriented Clustering
EVōC (pronounced “evoke”) is a clustering algorithm designed specifically for high-dimensional embedding vectors such as CLIP vectors, sentence-transformers output, and other dense vector representations.
The algorithm combines a node embedding approach (related to UMAP) with density-based clustering (related to HDBSCAN), providing improved efficiency and quality for clustering high-dimensional embedding vectors.
Key Features
Optimized for High-Dimensional Embeddings: Specifically designed for modern embedding vectors
Multi-Layer Clustering: Provides hierarchical clustering with multiple granularity levels
Performance Optimized: Uses Numba for high-performance computation
Flexible Parameters: Extensive parameter set for fine-tuning clustering behavior
Scikit-learn Compatible: Follows scikit-learn API conventions
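As an illustration of what multiple granularity levels can look like, the sketch below relates a fine layer of labels to a coarser one by majority vote. The helper `parent_map` and both label arrays are hypothetical and not part of EVōC. The sketch assumes each layer is an integer label array over the same points, with -1 marking noise.

```python
import numpy as np

def parent_map(fine, coarse):
    """Map each fine cluster id to the coarse cluster holding most of its points."""
    fine = np.asarray(fine)
    coarse = np.asarray(coarse)
    mapping = {}
    for cid in np.unique(fine[fine >= 0]):
        # Coarse labels of the points in this fine cluster, ignoring noise.
        parents = coarse[(fine == cid) & (coarse >= 0)]
        if parents.size == 0:
            mapping[int(cid)] = -1  # all points are noise at the coarse level
        else:
            values, counts = np.unique(parents, return_counts=True)
            mapping[int(cid)] = int(values[np.argmax(counts)])
    return mapping

# Hypothetical label arrays for the same eight points (-1 = noise).
fine_layer = np.array([0, 0, 1, 1, 2, 2, 3, -1])
coarse_layer = np.array([0, 0, 0, 0, 1, 1, 1, -1])

print(parent_map(fine_layer, coarse_layer))  # {0: 0, 1: 0, 2: 1, 3: 1}
```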
Quick Start
from evoc import EVoC
import numpy as np
# Generate placeholder data (uniform noise; substitute real embeddings for meaningful clusters)
X = np.random.rand(1000, 512)  # 1000 samples, 512-dimensional embeddings
# Initialize and fit the clusterer
clusterer = EVoC()
labels = clusterer.fit_predict(X)
# Count the clusters (negative labels are excluded as noise) and the cluster layers
print(f"Number of clusters: {len(np.unique(labels[labels >= 0]))}")
print(f"Number of cluster layers: {len(clusterer.cluster_layers_)}")
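Once a model is fitted, a per-layer summary helps when choosing a granularity. The helper below is a minimal sketch and not part of EVōC: `summarize_layers` is a hypothetical name, and it assumes each entry of `cluster_layers_` is an integer label array with negative values marking noise. The demo arrays stand in for real output so the block runs without the library.

```python
import numpy as np

def summarize_layers(layers):
    """Return (layer index, cluster count, noise fraction) for each layer."""
    rows = []
    for index, layer in enumerate(layers):
        layer = np.asarray(layer)
        n_clusters = len(np.unique(layer[layer >= 0]))
        noise_fraction = float(np.mean(layer < 0))
        rows.append((index, n_clusters, noise_fraction))
    return rows

# Stand-in layers; with a fitted model one would pass clusterer.cluster_layers_
demo_layers = [
    np.array([0, 0, 1, 1, 2, 2, -1, -1]),
    np.array([0, 0, 0, 0, 1, 1, 1, -1]),
]

for index, n_clusters, noise in summarize_layers(demo_layers):
    print(f"layer {index}: {n_clusters} clusters, {noise:.1%} noise")
```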