Cross-Detector Descriptor Fusion: Scale Control and Spatial Alignment for Local Feature Matching
| DC field | Value | Language |
| --- | --- | --- |
| dc.contributor.advisor | Olson, Clark F | |
| dc.contributor.author | Sossi, Frank Thomas | |
| dc.date.accessioned | 2026-04-20T15:24:23Z | |
| dc.date.available | 2026-04-20T15:24:23Z | |
| dc.date.issued | 2026-04-20 | |
| dc.date.submitted | 2026 | |
| dc.description | Thesis (Master's)--University of Washington, 2026 | |
| dc.description.abstract | Cross-Detector Descriptor Fusion: Scale Control and Spatial Alignment for Local Feature Matching. Frank Sossi. Chair of the Supervisory Committee: Professor Clark Olson, Computing & Software Systems. Local feature descriptors are fundamental to many computer vision applications, including SLAM, structure from motion, and image retrieval. This thesis evaluates two approaches to improving local feature matching: using multiple detectors as a quality filter for keypoint selection, and fusing complementary descriptors to combine their strengths. We show that the spatial intersection of keypoints from different detectors acts as a quality filter: when two detection methods (for example, SIFT and SURF, or SIFT and KeyNet) both identify a keypoint at the same location, this consensus indicates a distinctive feature. Descriptors computed at intersection keypoints consistently outperform those computed on single-detector keypoint sets; HardNet achieves 82.1% mAP on the SIFT-KeyNet intersection, a 25% relative improvement and the best single-descriptor result in our study. To evaluate color-aware descriptors, we re-implemented a color version of the HPatches patch benchmark. Using this dataset, we show that fusing the color-histogram descriptor HoNC with learned CNN descriptors yields substantial improvements: HoNC+SOSNet concatenation achieves 50.6% mAP on patch matching, outperforming all individual descriptors. HoNC’s strong discriminative capability (a high verification-to-matching ratio) complements the CNN’s matching-optimized representations. Cross-family fusion (SIFT+CNN) requires L2 normalization of each descriptor before fusion so that both contribute equally; with this normalization, SIFT+HardNet achieves 46.0% mAP on patches. Keypoint scale is also a dominant factor: keeping only large-scale keypoints yields a 39% relative improvement for SIFT and 21% for CNN descriptors. We develop DescriptorWorkbench, an open-source evaluation framework, and conduct over 100 experiments. The results show that keypoint quality, as determined by detector consensus and scale, has a greater impact on matching performance than the choice of descriptor algorithm alone. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Sossi_washington_0250O_29297.pdf | |
| dc.identifier.uri | https://hdl.handle.net/1773/55427 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY | |
| dc.subject | Computer vision | |
| dc.subject | Image matching | |
| dc.subject | Keypoint Descriptors | |
| dc.subject | Computer science | |
| dc.subject.other | To Be Assigned | |
| dc.title | Cross-Detector Descriptor Fusion: Scale Control and Spatial Alignment for Local Feature Matching | |
| dc.type | Thesis |
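The abstract's two keypoint-quality ideas, cross-detector spatial intersection (consensus) and large-scale filtering, can be illustrated with a minimal sketch under stated assumptions; it is not the thesis's own DescriptorWorkbench code. It assumes each detector's keypoints are an (N, 3) NumPy array of x, y, and size (for example, built from OpenCV's `cv2.KeyPoint.pt` and `cv2.KeyPoint.size`). The function names `intersect_keypoints` and `filter_by_scale`, the 2-pixel tolerance, the size threshold, and the toy coordinates are all hypothetical.

```python
import numpy as np

def intersect_keypoints(kps_a, kps_b, tol=2.0):
    """Return detector-A keypoints that have at least one detector-B keypoint
    within `tol` pixels. Arrays are (N, 3): x, y, size.
    The 2-pixel default is a hypothetical choice, not a value from the thesis."""
    if len(kps_a) == 0 or len(kps_b) == 0:
        return kps_a[:0]  # empty result that keeps the (0, 3) column layout
    # Pairwise Euclidean distances between keypoint locations, shape (N_a, N_b).
    diff = kps_a[:, None, :2] - kps_b[None, :, :2]
    dist = np.linalg.norm(diff, axis=2)
    # Consensus: keep an A keypoint if its nearest B keypoint is close enough.
    return kps_a[dist.min(axis=1) <= tol]

def filter_by_scale(kps, min_size):
    """Keep keypoints whose size (column 2) is at least `min_size`."""
    return kps[kps[:, 2] >= min_size]

# Toy keypoints (hypothetical values) standing in for SIFT and KeyNet detections.
sift = np.array([[10.0, 10.0, 4.0], [50.0, 50.0, 12.0], [80.0, 20.0, 20.0]])
keynet = np.array([[11.0, 10.5, 6.0], [79.0, 21.0, 18.0]])

consensus = intersect_keypoints(sift, keynet)
print(consensus)  # the first and third SIFT keypoints survive
large = filter_by_scale(consensus, min_size=10.0)
print(large)      # only the size-20 keypoint remains
```

This sketch keeps detector A's keypoint (and its size) when a match is found; which detector's geometry to retain is a design choice the thesis may make differently. The brute-force distance matrix costs O(N_a × N_b) memory, so for thousands of keypoints a k-d tree query (for example, SciPy's `cKDTree.query` with `distance_upper_bound`) would be the more usual choice.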
Files
Original bundle
- Name: Sossi_washington_0250O_29297.pdf
- Size: 465.94 KB
- Format: Adobe Portable Document Format
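The abstract states that cross-family fusion (SIFT+CNN) needs L2 normalization of each descriptor before concatenation. The minimal NumPy sketch below illustrates why, under stated assumptions: the function names, the random stand-in descriptors, and the 128-dimensional sizes are hypothetical, and the thesis's exact fusion code may differ. `block_share` is a hypothetical helper that reports how much of the squared L2 distance between two fused vectors comes from the first descriptor block.

```python
import numpy as np

def l2_normalize(desc, eps=1e-12):
    """Scale each row (one descriptor) to unit L2 norm; eps avoids division by zero."""
    norms = np.linalg.norm(desc, axis=1, keepdims=True)
    return desc / np.maximum(norms, eps)

def fuse_concat(desc_a, desc_b):
    """Concatenate two descriptor sets computed at the same keypoints,
    L2-normalizing each set first so neither dominates by raw magnitude."""
    if desc_a.shape[0] != desc_b.shape[0]:
        raise ValueError("descriptor sets must cover the same keypoints")
    return np.hstack([l2_normalize(desc_a), l2_normalize(desc_b)])

def block_share(x, y, split):
    """Fraction of the squared L2 distance between x and y from the first `split` dims."""
    d = (x - y) ** 2
    return d[:split].sum() / d.sum()

rng = np.random.default_rng(0)
# Hypothetical stand-ins: large-magnitude SIFT-style vectors and unit-norm CNN-style vectors.
sift_like = rng.uniform(0.0, 255.0, size=(4, 128))
cnn_like = l2_normalize(rng.normal(size=(4, 128)))

naive = np.hstack([sift_like, cnn_like])   # concatenation without pre-fusion normalization
fused = fuse_concat(sift_like, cnn_like)   # concatenation with pre-fusion normalization

print(fused.shape)                               # (4, 256)
print(np.linalg.norm(fused[:, :128], axis=1))    # each SIFT-style block now has unit norm
print(block_share(naive[0], naive[1], 128))      # close to 1: the SIFT-style block dominates
print(block_share(fused[0], fused[1], 128))      # far smaller once both blocks are unit-norm
```

Whether to renormalize the concatenated vector afterwards (for example, dividing by sqrt(2) so it has unit norm) does not affect nearest-neighbor rankings, because scaling every vector by the same constant scales every distance by that constant.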
