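Applying Florence-2 to Instance Segmentation

Thanks in large part to its high-quality training data, Florence-2 performs strongly across many tasks, including visual question answering, image captioning, and object detection. This article shows how to use Florence-2 for instance segmentation, a technique that combines object detection with semantic segmentation. Once an object is detected, each pixel in its bounding box is assigned either to that object or to the background, so the resulting mask follows the object's outline far more precisely than a box alone.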
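To make the difference between semantic and instance segmentation concrete, here is a toy NumPy sketch that is not part of the Florence-2 workflow. It assumes a per-pixel instance map (0 = background, positive integers = instance IDs); two objects of the same class keep separate instance masks, while the semantic view merges them into a single class mask. The helper name `split_instances` is hypothetical.

```python
import numpy as np


def split_instances(instance_map: np.ndarray, instance_classes: dict):
    """Hypothetical helper: derive per-instance and per-class masks.

    instance_map: H x W integer array, 0 = background, k > 0 = instance ID k.
    instance_classes: maps each instance ID to its class name.
    """
    instances = []  # one (class_name, boolean mask) pair per object instance
    semantic = {}   # class_name -> union of all masks of that class
    for inst_id, cls in instance_classes.items():
        mask = instance_map == inst_id
        instances.append((cls, mask))
        semantic[cls] = semantic.get(cls, np.zeros_like(mask)) | mask
    return instances, semantic


# Two dogs in a 4 x 4 image: instance IDs 1 and 2, both of class "dog"
instance_map = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
])
instances, semantic = split_instances(instance_map, {1: "dog", 2: "dog"})
print(len(instances))              # number of separate instance masks
print(int(semantic["dog"].sum()))  # pixels covered by the merged "dog" mask
```

Instance segmentation keeps the two dogs apart, which is what lets a detector-style model report each object with its own mask.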

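Setting Up the Colab Environment

First, switch the Colab runtime to a GPU, then install the required libraries: transformers, einops, timm, flash_attn, roboflow, and Roboflow Supervision (installed from its GitHub repository).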


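        # Confirm that a GPU runtime is attached
        !nvidia-smi
        # Model dependencies
        !pip install -q transformers einops timm flash_attn
        # Roboflow client and the latest Supervision from GitHub
        !pip install -q roboflow git+https://github.com/roboflow/supervision.git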

Importing the Required Libraries

Import the following libraries to work with the model and the annotators.


        from transformers import AutoProcessor, AutoModelForCausalLM
        from PIL import Image
        import supervision as sv

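Loading the Florence-2 Model

Load the model and processor from Hugging Face. `trust_remote_code=True` is required because Florence-2 ships its own modeling and processing code. If your environment requires authentication, make sure a Hugging Face access token is configured.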


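        import torch

        CHECKPOINT = "microsoft/Florence-2-base-ft"
        DEVICE = "cuda" if torch.cuda.is_available() else "cpu"  # use the GPU when available
        model = AutoModelForCausalLM.from_pretrained(CHECKPOINT, trust_remote_code=True).to(DEVICE)
        processor = AutoProcessor.from_pretrained(CHECKPOINT, trust_remote_code=True)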

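Creating the Segmentation Function

Next, create a helper function that builds the prompt from the task token and the optional text input, preprocesses the prompt together with the image, runs generation, and parses the generated text into a task-specific result.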


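        from typing import Dict

        def run_example(task_prompt: str, text_input: str = "", image=None) -> Dict:
              # Florence-2 prompts are a task token optionally followed by free text
              prompt = task_prompt + text_input
              # Preprocess text and image, and move the tensors to the model's device
              inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
              generated_ids = model.generate(
                    input_ids=inputs["input_ids"],
                    pixel_values=inputs["pixel_values"],
                    max_new_tokens=1024,
                    num_beams=3
              )
              # Keep special tokens: the post-processor needs Florence-2's location tokens
              generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
              # Convert the raw text into a task-specific result (polygons for segmentation)
              parsed_answer = processor.post_process_generation(generated_text, task=task_prompt, image_size=(image.width, image.height))
              return parsed_answer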
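For segmentation tasks the parsed answer holds polygons rather than pixel masks. The sketch below assumes the structure `{task: {"polygons": [...], "labels": [...]}}`, where each instance has one or more polygons given as flat `[x1, y1, x2, y2, ...]` coordinate lists, and rasterizes them with Pillow's `ImageDraw.polygon`. `answer_to_masks` is a hypothetical helper for illustration; in this tutorial, `sv.Detections.from_lmm` handles the conversion.

```python
import numpy as np
from PIL import Image, ImageDraw


def answer_to_masks(answer: dict, task: str, width: int, height: int) -> list:
    """Hypothetical helper: rasterize Florence-2 segmentation polygons.

    Assumes answer[task]["polygons"] is a list with one entry per instance,
    and each entry is a list of flat [x1, y1, x2, y2, ...] coordinate lists.
    """
    masks = []
    for instance_polygons in answer[task]["polygons"]:
        canvas = Image.new("L", (width, height), 0)  # single channel, all background
        draw = ImageDraw.Draw(canvas)
        for flat in instance_polygons:
            if len(flat) < 6:  # a polygon needs at least three points
                continue
            points = list(zip(flat[0::2], flat[1::2]))  # pair up (x, y) coordinates
            draw.polygon(points, fill=1)
        masks.append(np.array(canvas) > 0)  # boolean H x W mask
    return masks


# Synthetic answer in the assumed format: one instance, one square polygon
fake_answer = {
    "<REFERRING_EXPRESSION_SEGMENTATION>": {
        "polygons": [[[2.0, 2.0, 7.0, 2.0, 7.0, 7.0, 2.0, 7.0]]],
        "labels": [""],
    }
}
masks = answer_to_masks(fake_answer, "<REFERRING_EXPRESSION_SEGMENTATION>", 10, 10)
print(masks[0].shape, masks[0][4, 4], masks[0][0, 0])
```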

Visualizing Predictions with Supervision

Create a second function that uses Supervision's `MaskAnnotator` to draw the predicted masks onto an image.


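        # Pick mask colors by detection index so each instance gets its own color
        mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)

        def annotate_seg(image, detections):
              # Draw the semi-transparent masks onto the image and return the result
              annotated = mask_annotator.annotate(image, detections=detections)
              return annotated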
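Conceptually, a mask annotator blends a color into the image only where the mask is true. The following NumPy sketch shows that idea with a hypothetical `overlay_mask` function and a default opacity of 0.5; it is a simplified illustration, not Supervision's actual `MaskAnnotator` implementation.

```python
import numpy as np


def overlay_mask(image: np.ndarray, mask: np.ndarray,
                 color=(255, 0, 255), opacity: float = 0.5) -> np.ndarray:
    """Hypothetical helper: alpha-blend `color` over the pixels where `mask` is True.

    image: H x W x 3 uint8 array; mask: H x W boolean array.
    """
    blended = image.astype(np.float32)  # astype returns a copy, so `image` is untouched
    color_arr = np.asarray(color, dtype=np.float32)
    # Linear blend inside the mask: (1 - opacity) * original + opacity * color
    blended[mask] = (1.0 - opacity) * blended[mask] + opacity * color_arr
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


image = np.zeros((2, 2, 3), dtype=np.uint8)
mask = np.array([[True, False], [False, False]])
result = overlay_mask(image, mask, color=(200, 100, 0))
print(result[0, 0], result[1, 1])
```

Pixels outside the mask keep their original values, which is why the annotated image still shows the scene around each segmented object.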

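Instance Segmentation with Florence-2

Now we can run predictions on an image. First, download a photo of a man holding a dog, then use the text input to specify the object to segment, for example the backpack.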


        !wget -q https://media.roboflow.com/notebooks/examples/dog.jpeg
        dog_image_path = "dog.jpeg"
        dog_image = Image.open(dog_image_path)
        dog_image

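Next, call `run_example` with the task prompt, the text input, and the image. Here the task is `<REFERRING_EXPRESSION_SEGMENTATION>`, which segments the object described by the text input.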


        text_input = "the backpack"
        answer = run_example(task_prompt="<REFERRING_EXPRESSION_SEGMENTATION>", text_input=text_input, image=dog_image)

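Finally, convert Florence-2's output into a Supervision `Detections` object, annotate a copy of the image, and show the original and the segmented image side by side.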


        detections = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, answer, resolution_wh=dog_image.size)
        annotated_image = annotate_seg(dog_image.copy(), detections)
        sv.plot_images_grid(
              images=[dog_image, annotated_image],
              grid_size=(1, 2),
              titles=['source image', 'segmented image']
        )
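`sv.Detections` stores bounding boxes (`xyxy`) alongside optional masks, so a segmentation result still carries detection information. As an illustration of how a tight box can be derived from a boolean mask, here is a small NumPy sketch; `mask_to_box` is a hypothetical name, and its coordinates are inclusive pixel indices (Supervision's own conventions may differ).

```python
import numpy as np
from typing import Optional, Tuple


def mask_to_box(mask: np.ndarray) -> Optional[Tuple[int, int, int, int]]:
    """Hypothetical helper: tightest (x_min, y_min, x_max, y_max) box around a mask.

    Coordinates are inclusive pixel indices; returns None for an empty mask.
    """
    ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of foreground pixels
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


mask = np.zeros((5, 6), dtype=bool)
mask[1:3, 2:5] = True  # rows 1-2, columns 2-4
print(mask_to_box(mask))
print(mask_to_box(np.zeros((5, 6), dtype=bool)))
```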

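Detecting Small Objects

Florence-2 also handles small objects. For example, the ball on a soccer field occupies only a tiny part of the image, yet it can still be detected.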


        !wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=1LR84dxRJmmdLk60HJIZg4wSZ8ZO0fOV4' -O players.jpg
        players_image_path = "players.jpg"  # the file saved by wget -O above
        players_image = Image.open(players_image_path)
        # Convert to 3-channel RGB in case the file uses another mode (e.g. RGBA)
        players_image = players_image.convert("RGB")
        players_image

Using essentially the same code as above, the ball can be segmented.


        text_input = "ball"
        answer = run_example(task_prompt="<REFERRING_EXPRESSION_SEGMENTATION>", text_input=text_input, image=players_image)
        detections = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, answer, resolution_wh=players_image.size)
        annotated_image = annotate_seg(players_image.copy(), detections)
        sv.plot_images_grid(
              images=[players_image, annotated_image],
              grid_size=(1, 2),
              titles=['source image', 'segmented image']
        )

Although the segmentation is somewhat coarse, the purple outline of the detected ball is still visible.
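One way to quantify how small an object like the ball is: compute the area of its polygons with the shoelace formula and divide by the image area. `polygon_area` and `relative_area` are hypothetical helpers; they assume each polygon is a flat `[x1, y1, x2, y2, ...]` coordinate list and that an instance's polygons do not overlap.

```python
def polygon_area(flat_coords: list) -> float:
    """Shoelace formula for a simple polygon given as [x1, y1, x2, y2, ...]."""
    xs, ys = flat_coords[0::2], flat_coords[1::2]
    n = len(xs)
    if n < 3:
        return 0.0  # fewer than three points enclose no area
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to the first
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2.0


def relative_area(polygons: list, width: int, height: int) -> float:
    """Fraction of the image covered by an instance's (non-overlapping) polygons."""
    return sum(polygon_area(p) for p in polygons) / (width * height)


# A 10 x 10 square in a 100 x 100 image
square = [0.0, 0.0, 10.0, 0.0, 10.0, 10.0, 0.0, 10.0]
print(polygon_area(square))
print(relative_area([square], 100, 100))
```

Working directly on the polygons avoids rasterizing a mask just to measure the object's size.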

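Florence-2: A Unified Representation Model for Vision Tasks

Florence-2 performs strongly across a wide range of vision tasks in both zero-shot and fine-tuned settings. Trained on the large-scale FLD-5B dataset, it achieves results comparable to much larger models.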

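Foundation Models in Computer Vision: An Analysis of the Florence Model

This piece examines the Florence model released by Microsoft, analyzing its broad applications and far-reaching impact on computer vision tasks.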

Shanghai ICP Registration No. 2024098111-1

Shanghai Qiudan Network Technology Center: Building 1, No. 8218 Jinda Road, Fengxian District, Shanghai. Tel: 17898875485
