
SearchHound - Distributed Lucene Join Search

A research platform combining Lucene, distributed join execution, Bloom-filter optimized probing, and hierarchical query planning - built for real-time, relationship-aware search.

Overview

SearchHound is an experimental Lucentrix-native engine designed for real-time distributed JOIN execution over Lucene indexes. It introduces a hierarchical coordination model, Bloom-filter optimized hashing, adaptive join strategies, and hybrid RDBMS + Lucene indexing for high-speed relational lookups.

Architecture: Distributed Lucene Hash Join with Bloom Filters

This page documents the core architecture and distributed join protocol used by SearchHound. The system uses a hierarchical tree of SearchCoordinators and SearchLeaf nodes. Join planning is cost-based. Build-side nodes generate Bloom filters, which are merged and distributed to probe nodes so that non-matching keys are pruned locally before any rows cross the network.
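To make the filter mechanics concrete, here is a minimal sketch of a Bloom filter that supports building, merging, and probing. It is not SearchHound's implementation: the `BloomFilter` class, its parameters, the SHA-256 double-hashing scheme, and the sample keys are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; the bit array is stored in a Python int."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Double hashing: derive all positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def union(self, other):
        # Filters with identical parameters merge with a bitwise OR.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


# Two build-side leaves each summarize their local join keys.
leaf_a, leaf_b = BloomFilter(), BloomFilter()
for key in ("cust-1", "cust-2"):
    leaf_a.add(key)
leaf_b.add("cust-3")

global_filter = leaf_a.union(leaf_b)

# A probe node keeps only rows whose key may exist on the build side.
probe_rows = [("order-10", "cust-2"), ("order-11", "cust-9"), ("order-12", "cust-3")]
candidates = [row for row in probe_rows if global_filter.might_contain(row[1])]
```

A Bloom filter never produces false negatives, so every truly matching row survives the pruning step. It can produce false positives, which is why an exact join is still needed afterwards.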

Key Components

  • Root SearchCoordinator: Global planner and aggregator.
  • Region / Zone SearchCoordinators: Mid-level aggregators for hierarchical distribution.
  • SearchLeaf nodes: Leaf nodes with Lucene index, join index (embedded RDBMS), and local Bloom filter cache.
  • BloomFilterService: Builds, compresses, diffs, and distributes Bloom filters.
  • JoinCoordinatorService: Cost-based strategy selection and join execution orchestration.
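
The hierarchy formed by these components can be sketched as a small tree in which each coordinator merges the filters of its own subtree before passing one filter upward. This is an illustrative model only: the `Node` class, the plain-integer bitset encoding, and the node names are assumptions, not part of SearchHound.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A SearchCoordinator (has children) or a SearchLeaf (has local filter bits)."""
    name: str
    local_bits: int = 0  # leaf-local Bloom filter as a bitset (hypothetical encoding)
    children: List["Node"] = field(default_factory=list)

    def merged_bits(self) -> int:
        # OR is associative and commutative, so each coordinator can merge
        # its own subtree and forward a single filter to its parent.
        bits = self.local_bits
        for child in self.children:
            bits |= child.merged_bits()
        return bits

    def leaves(self) -> List["Node"]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


root = Node("root", children=[
    Node("region-a", children=[Node("leaf-1", 0b0001), Node("leaf-2", 0b0100)]),
    Node("region-b", children=[Node("leaf-3", 0b0100), Node("leaf-4", 0b1000)]),
])

global_bits = root.merged_bits()
```

Because the merge is a plain OR, the result at the root is the same regardless of how leaves are grouped under regions, which is what allows mid-level coordinators to aggregate independently.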

Join Execution Summary

  1. Client submits join query to Root SearchCoordinator.
  2. Planner selects build and probe sides using cardinality estimates.
  3. Build side SearchLeaf nodes extract join keys and produce compressed Bloom filters.
  4. SearchCoordinators merge filters into a global Bloom filter and distribute to probe nodes.
  5. Probe nodes use the filter to prune data, perform local joins and stream partial results upward.
  6. SearchCoordinators aggregate and merge partial results and apply final join semantics.
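
The six steps above can be simulated in a single process. The sketch below is a simplification under stated assumptions: `run_join`, the filter parameters, and the sample data are hypothetical, and the exact join is done centrally at the end rather than locally on each probe node as described in step 5.

```python
import hashlib

NUM_BITS, NUM_HASHES = 2048, 3  # illustrative filter parameters


def positions(key):
    # Take NUM_HASHES independent 32-bit slices from one SHA-256 digest.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % NUM_BITS
            for i in range(NUM_HASHES)]


def build_filter(keys):
    bits = 0
    for key in keys:
        for pos in positions(key):
            bits |= 1 << pos
    return bits


def may_contain(bits, key):
    return all((bits >> pos) & 1 for pos in positions(key))


def run_join(left_leaves, right_leaves):
    """Each argument is a list of leaves; a leaf is a list of (key, payload) rows."""
    # Step 2: the side with fewer rows becomes the build side.
    left_rows = sum(len(leaf) for leaf in left_leaves)
    right_rows = sum(len(leaf) for leaf in right_leaves)
    if left_rows <= right_rows:
        build, probe = left_leaves, right_leaves
    else:
        build, probe = right_leaves, left_leaves

    # Steps 3-4: per-leaf filters, merged into one global filter with OR.
    global_filter = 0
    for leaf in build:
        global_filter |= build_filter(key for key, _ in leaf)

    # Step 5: probe leaves ship only rows that pass the filter.
    shipped = [row for leaf in probe for row in leaf
               if may_contain(global_filter, row[0])]

    # Step 6: an exact hash join removes any Bloom false positives.
    build_index = {}
    for leaf in build:
        for key, payload in leaf:
            build_index.setdefault(key, []).append(payload)
    results = [(key, build_payload, payload)
               for key, payload in shipped
               for build_payload in build_index.get(key, [])]
    return results, len(shipped)


customers = [[("c1", "Ada")], [("c2", "Lin")]]
orders = [[("c1", "o-1"), ("c7", "o-2")],
          [("c2", "o-3"), ("c1", "o-4"), ("c8", "o-5")]]
results, shipped_count = run_join(customers, orders)
```

Here `customers` has fewer rows and is chosen as the build side. `shipped_count` is the number of probe rows that passed the filter, which is at least the number of true matches and at most the total number of probe rows.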

[Figure: High-level architecture. A Root SearchCoordinator sits above the Region A, Region B, and Region C SearchCoordinators, which sit above SearchLeaf nodes 1 to 6. The Bloom filter network / BloomFilterService (merge, compress, distribute) spans the hierarchy. Caption: Root → Regions → SearchLeaf nodes; Bloom filters distribute across the hierarchy.]

Distributed Join Engine

SearchHound introduces a novel Bloom-filter optimized join mechanism that reduces network transfer by 60–90% in typical join scenarios. Join planning is adaptive and depends on cardinality, selectivity, and cluster topology.

  • Real-time distributed join execution
  • Bloom-filter pruning of non-matching keys
  • Cross-shard and cross-region join support
  • Runtime strategy switching
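
One way to picture adaptive planning and runtime switching is a small rule-based chooser. This is a hedged sketch, not SearchHound's planner: the strategy names, the `JoinStats` fields, the thresholds, and the 2x re-planning rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class JoinStats:
    build_rows: int        # estimated rows on the smaller side
    probe_rows: int        # estimated rows on the larger side
    match_fraction: float  # estimated share of probe rows with a build-side match


# Illustrative thresholds only; a real planner would derive these from measured costs.
BROADCAST_MAX_ROWS = 10_000
BLOOM_MAX_MATCH_FRACTION = 0.5


def choose_strategy(stats: JoinStats) -> str:
    if stats.build_rows <= BROADCAST_MAX_ROWS:
        # Small build side: shipping the keys themselves is cheap and exact.
        return "broadcast"
    if stats.match_fraction <= BLOOM_MAX_MATCH_FRACTION:
        # Most probe rows will not match, so a filter prunes a lot of traffic.
        return "bloom_hash_join"
    # Most probe rows match anyway; a filter would add cost without pruning much.
    return "shuffle_hash_join"


def maybe_switch(planned: str, estimated: JoinStats,
                 observed_match_fraction: float) -> str:
    """Re-plan when the observed match fraction is off by more than 2x."""
    baseline = max(estimated.match_fraction, 1e-9)  # avoid division by zero
    ratio = observed_match_fraction / baseline
    if ratio > 2.0 or ratio < 0.5:
        revised = JoinStats(estimated.build_rows, estimated.probe_rows,
                            observed_match_fraction)
        return choose_strategy(revised)
    return planned


stats = JoinStats(build_rows=5_000_000, probe_rows=80_000_000, match_fraction=0.05)
planned = choose_strategy(stats)
revised = maybe_switch(planned, stats, observed_match_fraction=0.8)
```

In this example the plan starts as a Bloom-filtered join because few probe rows are expected to match, then switches once the observed match fraction turns out to be far higher than estimated.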

Join Execution Flow:

  1. Query → Root Node
  2. Extract join keys
  3. Build Bloom filters
  4. Distribute filters to probe nodes
  5. Local Lucene + RDBMS join
  6. Aggregate results upward
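
How much traffic the filter distribution step saves depends on how many probe rows truly match and on the filter's false-positive rate. The sketch below uses the standard Bloom filter approximation p = (1 - e^(-kn/m))^k; the function names and input numbers are assumptions, the cost of shipping the filter itself is ignored, and the output is not intended to reproduce any figure quoted on this page.

```python
import math


def false_positive_rate(num_keys: int, num_bits: int, num_hashes: int) -> float:
    # Standard approximation: p = (1 - e^(-k*n/m))^k
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


def shipped_fraction(match_fraction: float, fp_rate: float) -> float:
    # Matching rows always pass; non-matching rows pass only as false positives.
    return match_fraction + (1.0 - match_fraction) * fp_rate


def transfer_reduction(match_fraction: float, fp_rate: float) -> float:
    # Share of probe rows that never leave the probe node.
    return 1.0 - shipped_fraction(match_fraction, fp_rate)


# Hypothetical inputs: 1M build keys, 10 bits per key, 7 hash functions,
# and 20% of probe rows having a real match on the build side.
p = false_positive_rate(num_keys=1_000_000, num_bits=10_000_000, num_hashes=7)
reduction = transfer_reduction(match_fraction=0.2, fp_rate=p)
print(f"false-positive rate: {p:.4f}, probe rows pruned: {reduction:.1%}")
```

The reduction simplifies to (1 - match_fraction) * (1 - p), so the saving is bounded by the share of non-matching probe rows. A filter helps little when most probe rows match.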
                

Core Components

SearchHound

Main coordination layer with adaptive join selection.

SearchLeaf

Leaf Lucene node with RDBMS join index and Bloom probing.

SearchCoordinator

Aggregation node orchestrating the join pipeline.

Research Areas

Collaborate With Lucentrix Research

Contact us to explore research collaboration or distributed join experiments.
