We propose SegSub, a Segmentation Substitution framework for improving the robustness of vision-language models (VLMs) through targeted augmentations that modify object features (shape or color), add counterfactual images, and introduce knowledge conflicts between image sources. Existing VLMs perform poorly on counterfactual examples (<30% accuracy) and almost entirely fail to resolve knowledge conflicts (<1% accuracy). We mitigate these weaknesses by fine-tuning models on SegSub data, which yields significant improvements in reasoning over counterfactual samples. We also find a link between hallucination and image context: GPT-4o is prone to hallucinating when presented with SegSub counterfactual examples.
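For concreteness, below is a minimal sketch of one SegSub-style feature perturbation: a color substitution applied inside an object's segmentation mask. The function name `recolor_object`, the `hue_shift` parameter, and the assumption of a precomputed boolean mask are illustrative choices for this sketch, not the paper's released implementation.

```python
# A minimal sketch of a color-substitution augmentation, assuming a
# precomputed boolean segmentation mask (e.g., from any off-the-shelf
# segmentation model). Names and parameters here are hypothetical.
import numpy as np
from PIL import Image

def recolor_object(image: Image.Image, mask: np.ndarray, hue_shift: int = 90) -> Image.Image:
    """Shift the hue of the masked object region to create a color-perturbed image."""
    hsv = np.array(image.convert("HSV"), dtype=np.uint8)
    # Rotate the hue channel (0..255 in PIL's HSV mode) only where mask is True.
    shifted = (hsv[..., 0].astype(np.int32) + hue_shift) % 256
    hsv[..., 0] = np.where(mask, shifted, hsv[..., 0])
    return Image.fromarray(hsv, mode="HSV").convert("RGB")

# Usage: given an (H, W) boolean mask for an object, produce a counterfactual
# variant of the image in which only that object's color has changed:
#   perturbed = recolor_object(Image.open("photo.jpg"), object_mask)
```

Shape substitutions and cross-source conflicts would follow the same pattern, replacing the masked region rather than recoloring it.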