Thanks to the high quality of its training data, the Florence-2 model is a leading choice across a variety of tasks, including visual question answering, image captioning, and object detection. This article explores how to use Florence-2 for instance segmentation, a technique that combines object detection and semantic segmentation: after an object is detected, every pixel inside its bounding box is assigned a class, allowing objects to be classified more precisely.
First, set up Colab to use a GPU and install the required libraries, including transformers, einops, timm, and Roboflow Supervision.
!nvidia-smi
!pip install -q transformers einops timm flash_attn
!pip install -q roboflow git+https://github.com/roboflow/supervision.git
Import the following libraries to use the model and the annotator.
from transformers import AutoProcessor, AutoModelForCausalLM
import requests
from PIL import Image
import supervision as sv
Fetch the model from Hugging Face, and make sure you have an access token.
CHECKPOINT = "microsoft/Florence-2-base-ft"
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(CHECKPOINT, trust_remote_code=True)
Create a function to generate segmentation results. It prepends the task prompt to the text input, processes the prompt and image, and prepares the data for the model to generate a prediction.
from typing import Dict

def run_example(task_prompt: str, text_input: str = "", image=None) -> Dict:
    prompt = task_prompt + text_input
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed_answer = processor.post_process_generation(
        generated_text, task=task_prompt, image_size=(image.width, image.height)
    )
    return parsed_answer
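For reference, here is a hedged sketch of the shape of the parsed answer for the segmentation task: the dictionary is keyed by the task token and holds polygon coordinates plus labels. The values below are illustrative placeholders, not real model output.

```python
# Illustrative shape of the parsed answer for the segmentation task.
# The polygon coordinates are made up; real values come from the model.
parsed_answer = {
    "<REFERRING_EXPRESSION_SEGMENTATION>": {
        "polygons": [[[10.0, 12.0, 44.0, 12.0, 44.0, 50.0, 10.0, 50.0]]],
        "labels": [""],
    }
}

result = parsed_answer["<REFERRING_EXPRESSION_SEGMENTATION>"]
print(len(result["polygons"]))  # one polygon group per segmented region
```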
Create another function that uses Supervision to draw the predictions on a given image.
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)

def annotate_seg(image, detections):
    annotated = mask_annotator.annotate(image, detections=detections)
    return annotated
Now you can run predictions on an image. First, download a picture of a man holding a dog, and specify the object you want to detect in the text input, for example a backpack.
!wget -q https://media.roboflow.com/notebooks/examples/dog.jpeg
dog_image_path = "dog.jpeg"
dog_image = Image.open(dog_image_path)
dog_image
Next, call Florence-2 by running the function with the text input, the task (segmentation in this case), and the image.
text_input = "the backpack"
answer = run_example(task_prompt="<REFERRING_EXPRESSION_SEGMENTATION>", text_input=text_input, image=dog_image)
Finally, display the source image and the predicted image side by side.
detections = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, answer, resolution_wh=dog_image.size)
annotated_image = annotate_seg(dog_image.copy(), detections)
sv.plot_images_grid(
    images=[dog_image, annotated_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)
Florence-2 also works well for detecting small objects. For example, the ball on a soccer pitch is tiny, yet it can still be detected.
!wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=1LR84dxRJmmdLk60HJIZg4wSZ8ZO0fOV4' -O players.jpg
players_image_path = "/content/players.jpg"
players_image = Image.open(players_image_path)
players_image = players_image.convert("RGB")
players_image
The ball can be detected with code similar to the above.
text_input = "ball"
answer = run_example(task_prompt="<REFERRING_EXPRESSION_SEGMENTATION>", text_input=text_input, image=players_image)
detections = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, answer, resolution_wh=players_image.size)
annotated_image = annotate_seg(players_image.copy(), detections)
sv.plot_images_grid(
    images=[players_image, annotated_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)
Although the detection is somewhat fuzzy, the purple outline of the detected ball is still visible.