Vision language models are blind