BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

arXiv V1: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models