Carnegie Mellon University
SegSub: Enhancing Robustness in Vision-Language Models with Knowledge Conflicts and Counterfactual Image Augmentation

dataset
posted on 2025-01-28, 20:43 authored by Peter Carragher

We propose SegSub, a Segmentation Substitution framework that improves the robustness of vision-language models (VLMs) through three kinds of augmentation: modified object features (shape or color), counterfactual images, and knowledge conflicts between image sources. Existing VLMs perform poorly on counterfactual examples (<30% accuracy) and fail to address knowledge conflicts (<1% accuracy). We mitigate this by finetuning models on SegSub, which leads to significant improvements in reasoning over counterfactual samples. We also find a link between hallucinations and image context: GPT-4o is prone to hallucination when presented with counterfactual examples in SegSub.

History

Date

2025-01-28
