SearchHound

Architecture: Distributed Lucene Hash Join with Bloom Filters

This page documents the core architecture and distributed join protocol used by SearchHound. The system is organized as a hierarchical tree of SearchCoordinator nodes, with SearchLeaf nodes at the leaves. Join planning is cost-based. Build-side nodes generate Bloom filters, which are merged and distributed to the probe nodes. The probe nodes use them to prune non-matching keys before rows cross the network, which substantially reduces network transfer.
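The hierarchical filter merge can be sketched as a recursive walk over the coordinator tree. This is a minimal Python sketch under stated assumptions: `Node` and `merge_filters` are hypothetical names, and each Bloom filter is modelled as a Python int used as a bit array. OR-merging is only valid when every filter uses the same bit length and hash functions, which the sketch assumes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical node in the coordinator tree.

    Leaves (no children) carry a local Bloom filter bitmask; coordinators
    carry children and no filter of their own.
    """
    name: str
    children: List["Node"] = field(default_factory=list)
    local_filter: int = 0  # bit array stored in an int; used only by leaves


def merge_filters(node: Node) -> int:
    """Return the union of all leaf filters under `node`.

    Bloom filters with identical size and hash functions merge by bitwise
    OR: the result answers "maybe" for any key that any input answers
    "maybe" for, so no true match is lost on the way up the tree.
    """
    if not node.children:
        return node.local_filter
    merged = 0
    for child in node.children:
        merged |= merge_filters(child)
    return merged


# Usage: two leaves under a zone coordinator, one leaf directly under the root.
leaf_a = Node("leaf-a", local_filter=0b0011)
leaf_b = Node("leaf-b", local_filter=0b0100)
leaf_c = Node("leaf-c", local_filter=0b1000)
zone = Node("zone-1", children=[leaf_a, leaf_b])
root = Node("root", children=[zone, leaf_c])
print(bin(merge_filters(root)))  # 0b1111
```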

Key Components

Root SearchCoordinator: Global planner and aggregator.
Region / Zone SearchCoordinators: Mid-level aggregators for hierarchical distribution.
SearchLeaf nodes: Leaf nodes hosting a Lucene index, a join index (embedded RDBMS), and a local Bloom filter cache.
BloomFilterService: Builds, compresses, diffs, and distributes Bloom filters.
JoinCoordinatorService: Cost-based strategy selection and join execution orchestration.
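The BloomFilterService responsibilities listed above (build, compress, diff) can be illustrated with a small Bloom filter class. This is a sketch under assumptions, not SearchHound's implementation: the double-hashing scheme over SHA-256, XOR-based diffs, and zlib compression are choices made for illustration, and the class and method names are hypothetical.

```python
import hashlib
import zlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash positions per key.

    Positions use double hashing (h1 + i * h2) mod m over one SHA-256
    digest, a common textbook construction assumed here for illustration.
    """

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # second hash, forced odd
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    def diff(self, older: "BloomFilter") -> bytes:
        # XOR against an older version: set bits mark changed positions, so a
        # mostly unchanged filter yields a mostly zero, highly compressible delta.
        return bytes(a ^ b for a, b in zip(self.bits, older.bits))

    def apply_diff(self, delta: bytes) -> None:
        # old XOR (new XOR old) == new, so a receiver holding the old filter
        # can reconstruct the new one from the delta alone.
        self.bits = bytearray(a ^ b for a, b in zip(self.bits, delta))

    def compress(self) -> bytes:
        return zlib.compress(bytes(self.bits))

    @classmethod
    def decompress(cls, data: bytes, m_bits: int, k_hashes: int) -> "BloomFilter":
        bf = cls(m_bits, k_hashes)
        bf.bits = bytearray(zlib.decompress(data))
        return bf
```

Sending an XOR delta rather than the full filter only pays off when a receiver already caches an earlier version, which is what the local Bloom filter cache on SearchLeaf nodes would make possible.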

Join Execution Summary

Client submits join query to Root SearchCoordinator.
Planner selects build and probe sides using cardinality estimates.
Build side SearchLeaf nodes extract join keys and produce compressed Bloom filters.
SearchCoordinators merge the filters into a global Bloom filter and distribute it to the probe nodes.
Probe nodes use the filter to prune data, perform local joins, and stream partial results upward.
SearchCoordinators aggregate and merge partial results and apply final join semantics.
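The planning step (choosing build and probe sides from cardinality estimates) can be sketched as follows. The rule "build on the smaller side" and the sizing formulas are standard assumptions, not confirmed SearchHound behaviour; `JoinPlan`, `plan_join`, and `target_fpr` are hypothetical names. The sizing uses the usual Bloom filter optimum m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.

```python
import math
from dataclasses import dataclass


@dataclass
class JoinPlan:
    build_side: str
    probe_side: str
    filter_bits: int    # m: Bloom filter size in bits
    filter_hashes: int  # k: hash functions per key


def plan_join(left: str, left_rows: int, right: str, right_rows: int,
              target_fpr: float = 0.01) -> JoinPlan:
    """Build on the side with the smaller estimated cardinality, then size
    its Bloom filter for the target false-positive rate."""
    if left_rows <= right_rows:
        build, n, probe = left, left_rows, right
    else:
        build, n, probe = right, right_rows, left
    n = max(n, 1)  # guard against n = 0 (empty build side), which would divide by zero
    m = math.ceil(-n * math.log(target_fpr) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return JoinPlan(build, probe, m, k)


# Usage: 200k customers against 5M orders at a 1% false-positive target.
plan = plan_join("orders", 5_000_000, "customers", 200_000)
print(plan.build_side, plan.filter_hashes)  # customers 7
```

At a 1% target this works out to roughly 9.6 bits per build key, so the filter for 200,000 keys is about 240 KB, far smaller than the build-side rows themselves.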

Distributed Join Engine

SearchHound introduces a Bloom-filter-optimized join mechanism that reduces network transfer by 60–90% in typical join scenarios. Join planning is adaptive: the chosen strategy depends on key cardinality, join selectivity, and cluster topology.
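Why Bloom-filter pruning cuts transfer can be seen from a standard back-of-envelope model. This is an assumption for illustration, not SearchHound's published derivation: let σ be the fraction of probe rows whose key actually appears on the build side and ε the filter's false-positive rate (k hashes, m bits, n build keys). The fraction f of probe rows still shipped is then given below. The numeric values are hypothetical, not measurements, and the model ignores the cost of distributing the filter itself.

```latex
% sigma: fraction of probe rows with a matching build key
% epsilon: Bloom filter false-positive rate (k hashes, m bits, n build keys)
f = \sigma + (1 - \sigma)\,\varepsilon,
\qquad
\varepsilon \approx \left(1 - e^{-kn/m}\right)^{k}

% Hypothetical example: sigma = 0.1, epsilon = 0.01
f = 0.1 + 0.9 \times 0.01 = 0.109 \;\Rightarrow\; 1 - f \approx 89\%

% Hypothetical example: sigma = 0.4, epsilon = 0.01
f = 0.4 + 0.6 \times 0.01 = 0.406 \;\Rightarrow\; 1 - f \approx 59\%
```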

Real-time distributed join execution
Bloom-filter pruning of non-matching keys
Cross-shard and cross-region join support
Runtime strategy switching
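Cost-based strategy selection with runtime switching can be sketched with a deliberately simplified two-strategy model. The strategy names ("broadcast", "bloom"), the `Estimates` fields, and the cost formulas are all hypothetical; the source does not specify SearchHound's actual strategies or cost model. The idea shown is that the planner compares estimated network bytes, then re-plans when observed selectivity differs from the estimate.

```python
from dataclasses import dataclass, replace


@dataclass
class Estimates:
    build_rows: int
    probe_rows: int
    row_bytes: int
    probe_nodes: int
    selectivity: float  # fraction of probe rows with a matching build key
    filter_bytes: int   # size of the Bloom filter sent to each probe node
    fpr: float          # Bloom filter false-positive rate


def broadcast_cost(e: Estimates) -> float:
    # Ship every build row to every probe node.
    return e.build_rows * e.row_bytes * e.probe_nodes


def bloom_cost(e: Estimates) -> float:
    # Ship the filter to every probe node, then ship the surviving probe rows
    # (true matches plus false positives) upward.
    surviving = e.probe_rows * (e.selectivity + (1 - e.selectivity) * e.fpr)
    return e.filter_bytes * e.probe_nodes + surviving * e.row_bytes


def choose_strategy(e: Estimates) -> str:
    return "broadcast" if broadcast_cost(e) < bloom_cost(e) else "bloom"


def maybe_switch(e: Estimates, observed_selectivity: float) -> str:
    # Re-plan once real selectivity is observed, e.g. after the first probe batches.
    return choose_strategy(replace(e, selectivity=observed_selectivity))


est = Estimates(build_rows=200_000, probe_rows=5_000_000, row_bytes=100,
                probe_nodes=10, selectivity=0.1, filter_bytes=240_000, fpr=0.01)
print(choose_strategy(est))     # bloom
print(maybe_switch(est, 0.9))   # broadcast
```

With these illustrative numbers, Bloom pruning wins while few probe rows match. If most probe rows turn out to match, the filter prunes almost nothing and broadcasting the small build side becomes cheaper.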

                Join Execution Flow:
                1. Query → Root SearchCoordinator
                2. Extract join keys
                3. Build Bloom filters
                4. Distribute filters to probe nodes
                5. Local Lucene + RDBMS join
                6. Aggregate results upward
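Step 5 (the local Lucene + RDBMS join on a probe node) can be sketched using Python's built-in sqlite3 as a stand-in for the embedded RDBMS. Assumptions: Lucene hits are modelled as `(doc_id, join_key)` pairs, the join index is a hypothetical `join_index(key, payload)` table, and the Bloom filter is passed as any membership predicate. In the usage below a Python set stands in for it, so there are no false positives there. `local_join` is a hypothetical name.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple


def local_join(
    lucene_hits: Iterable[Tuple[str, str]],  # (doc_id, join_key) from a local search
    conn: sqlite3.Connection,                # embedded RDBMS holding the join index
    might_match: Callable[[str], bool],      # Bloom filter membership test
) -> List[Tuple[str, str, str]]:
    # 1. Prune hits whose key cannot match the build side.
    survivors = [(doc, key) for doc, key in lucene_hits if might_match(key)]
    if not survivors:
        return []
    # 2. One indexed lookup for the surviving keys. Bloom false positives find
    #    no row here and drop out. (Large key sets should be batched, since
    #    SQLite caps the number of bound parameters per statement.)
    keys = sorted({key for _, key in survivors})
    placeholders = ",".join("?" * len(keys))
    rows = conn.execute(
        f"SELECT key, payload FROM join_index WHERE key IN ({placeholders})", keys
    ).fetchall()
    # 3. Hash join on the key, preserving the order of the Lucene hits.
    by_key = {}
    for key, payload in rows:
        by_key.setdefault(key, []).append(payload)
    return [(doc, key, p) for doc, key in survivors for p in by_key.get(key, [])]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE join_index (key TEXT, payload TEXT)")
conn.execute("CREATE INDEX idx_join_key ON join_index(key)")
conn.executemany("INSERT INTO join_index VALUES (?, ?)", [("c1", "Alice"), ("c2", "Bob")])
hits = [("doc1", "c1"), ("doc2", "c3"), ("doc3", "c2")]
print(local_join(hits, conn, {"c1", "c2"}.__contains__))
# [('doc1', 'c1', 'Alice'), ('doc3', 'c2', 'Bob')]
```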

Core Components

Main coordination layer with adaptive join selection.

SearchLeaf

Leaf Lucene node with an RDBMS join index and Bloom-filter probing.

SearchCoordinator

Aggregation node orchestrating the join pipeline.

Research Areas

Bloom filter optimization and compression
Join-aware data placement & partitioning
Adaptive distributed query planning
RDBMS–Lucene hybrid indexing models
Federated joins across clusters

Collaborate With Lucentrix Research

Contact us to explore research collaboration or distributed join experiments.

Get in Touch

GitHub: lucentrixdev/searchhound

SearchHound - Distributed Lucene Join Search

Overview