Cross-Detector Descriptor Fusion: Scale Control and Spatial Alignment for Local Feature Matching

Sossi, Frank Thomas

dc.contributor.advisor	Olson, Clark F
dc.contributor.author	Sossi, Frank Thomas
dc.date.accessioned	2026-04-20T15:24:23Z
dc.date.available	2026-04-20T15:24:23Z
dc.date.issued	2026-04-20
dc.date.submitted	2026
dc.description	Thesis (Master's)--University of Washington, 2026
dc.description.abstract	Chair of the Supervisory Committee: Professor Clark Olson, Computing & Software Systems. Local feature descriptors are fundamental to many computer vision applications, including SLAM, structure from motion, and image retrieval. This thesis evaluates two approaches to improving local feature matching: using multiple detectors as a quality filter for keypoint selection, and fusing complementary descriptors to combine their strengths. We show that spatial intersection between different keypoint detectors acts as a quality filter. When two detection methods, whether SIFT and SURF or SIFT and KeyNet, both identify a keypoint at the same location, this consensus indicates a distinctive feature. Descriptors computed at intersection keypoints consistently outperform those computed on single-detector sets: HardNet achieves 82.1% mAP on the SIFT-KeyNet intersection, a 25% relative improvement and the best single-descriptor result in our study. To evaluate color-aware descriptors, we re-implemented a color version of the HPatches patch benchmark. Using this dataset, we show that fusing the color histogram descriptor HoNC with learned CNN descriptors yields substantial improvements: HoNC+SOSNet concatenation achieves 50.6% mAP on patch matching, outperforming all individual descriptors. HoNC’s strong discriminative capability (a high verification-to-matching ratio) complements the CNN’s matching-optimized representations. Cross-family fusion (SIFT+CNN) requires pre-fusion L2 normalization so that each descriptor contributes equally; with proper normalization, SIFT+HardNet achieves 46.0% mAP on patches. Keypoint scale is also a dominant factor: filtering to large-scale keypoints yields a 39% relative improvement for SIFT and 21% for CNN descriptors.
We develop DescriptorWorkbench, an open-source evaluation framework, and conduct over 100 experiments. The results show that keypoint quality, as determined by detector consensus and scale, has a greater impact on matching performance than descriptor algorithm choice alone.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Sossi_washington_0250O_29297.pdf
dc.identifier.uri	https://hdl.handle.net/1773/55427
dc.language.iso	en_US
dc.rights	CC BY
dc.subject	Computer vision
dc.subject	Image matching
dc.subject	Keypoint Descriptors
dc.subject	Computer science
dc.subject.other	To Be Assigned
dc.title	Cross-Detector Descriptor Fusion: Scale Control and Spatial Alignment for Local Feature Matching
dc.type	Thesis
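
Illustrative note: the abstract names two mechanisms, keeping only keypoints that two detectors agree on and L2-normalizing each descriptor before concatenating them. The sketch below shows one plausible reading of both on toy data. It is not taken from the thesis or from DescriptorWorkbench: the function names, the brute-force search, and the 2-pixel `radius` are all assumptions made for illustration.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a descriptor to unit L2 norm (eps guards against all-zero input)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def fuse_descriptors(desc_a, desc_b):
    """Concatenate two descriptors after normalizing each one separately.

    Normalizing first keeps a descriptor with large raw values from
    dominating distances computed on the fused vector.
    """
    return l2_normalize(desc_a) + l2_normalize(desc_b)


def intersect_keypoints(kps_a, kps_b, radius=2.0):
    """Keep keypoints from kps_a that have a kps_b keypoint within `radius` pixels.

    Brute-force O(n*m) search; `radius` is a hypothetical tolerance, not a
    value reported in the thesis.
    """
    kept = []
    for xa, ya in kps_a:
        if any(math.hypot(xa - xb, ya - yb) <= radius for xb, yb in kps_b):
            kept.append((xa, ya))
    return kept


# Toy data: a 4-D vector with large raw values and a 2-D unit-scale vector.
sift_like = [120.0, 30.0, 0.0, 40.0]
cnn_like = [0.6, 0.8]
fused = fuse_descriptors(sift_like, cnn_like)

# Toy (x, y) keypoint locations from two hypothetical detectors.
detector_a = [(10.0, 10.0), (50.0, 52.0), (200.0, 80.0)]
detector_b = [(11.0, 10.5), (120.0, 40.0), (199.0, 81.0)]
consensus = intersect_keypoints(detector_a, detector_b)
```

In this toy run, each half of `fused` has unit norm, so both source descriptors carry equal weight in a Euclidean distance, and `consensus` keeps only the two locations in `detector_a` that have a neighbor in `detector_b`. A real pipeline would likely replace the brute-force loop with a spatial index and would need to choose the tolerance empirically.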

Files

Original bundle

Name: Sossi_washington_0250O_29297.pdf
Size: 465.94 KB
Format: Adobe Portable Document Format

Collections

To Be Assigned