Automatic Data Annotation with Grounding DINO

Building your own YOLO TXT training set with Grounding DINO

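I recently needed to build my own YOLO training dataset. Manually annotating 2k images would be far too much work, so I went looking for a way to annotate them automatically. As it happened, Zhihu recommended a related article that introduced Grounding DINO, a project that fits my needs perfectly. So, without further ado, I modified it to do what I need.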

# Requirements

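First, let me spell out what I need:

  • Draw bounding boxes based on a prompt
  • Batch-process all the data in a folder
  • Save the detection results in YOLO TXT format

The project already provides the first feature; the second and third I have to implement myself.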
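For reference, a YOLO TXT label file holds one object per line in the form `class_id x_center y_center width height`, with the four coordinates expressed as fractions of the image size. Below is a minimal sketch of producing such a line; the helper name `format_yolo_line` and the six-decimal formatting are my own choices for illustration, not part of Grounding DINO.

```python
def format_yolo_line(class_id, x_center, y_center, width, height):
    """Return one YOLO TXT label line (hypothetical helper for illustration).

    All four coordinates are fractions of the image size, so each must lie in [0, 1].
    """
    values = (x_center, y_center, width, height)
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError("YOLO coordinates must be normalized to [0, 1]")
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# A box centered in the image, a quarter of its width and half of its height
print(format_yolo_line(0, 0.5, 0.5, 0.25, 0.5))
# prints: 0 0.500000 0.500000 0.250000 0.500000
```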

# Implementation

Grounding DINO can run on a CPU, but for efficiency it is better to run it on a GPU, so all of my code runs on Colab.

First comes the environment setup: we mount Google Drive and install Grounding DINO.

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

import os
HOME = os.getcwd()
print(HOME)

# Clone and install Grounding DINO
%cd {HOME}
!git clone https://github.com/IDEA-Research/GroundingDINO.git
%cd {HOME}/GroundingDINO
!pip install -q -e .
!pip install -q roboflow

# Model config; if this prints False, check groundingdino/config for the
# file name used by your checkout of the repository
CONFIG_PATH = os.path.join(HOME, "GroundingDINO/groundingdino/config/GroundingDINO_SwinB.cfg.py")
print(CONFIG_PATH, "; exist:", os.path.isfile(CONFIG_PATH))

# Download the pretrained weights (-p avoids an error if the folder already exists)
%cd {HOME}
!mkdir -p {HOME}/weights
%cd {HOME}/weights
!wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth

WEIGHTS_NAME = "groundingdino_swinb_cogcoor.pth"
WEIGHTS_PATH = os.path.join(HOME, "weights", WEIGHTS_NAME)
print(WEIGHTS_PATH, "; exist:", os.path.isfile(WEIGHTS_PATH))

# Load the model
%cd {HOME}/GroundingDINO
from groundingdino.util.inference import load_model, load_image, predict, annotate
model = load_model(CONFIG_PATH, WEIGHTS_PATH)

With the code above, Grounding DINO is set up and the model is loaded.

Next is the automatic annotation code I wrote for the training set.

TRAIN_DATA_PATH = "/content/my_data/images"
print(TRAIN_DATA_PATH, "; exist:", os.path.isdir(TRAIN_DATA_PATH))

TEXT_PROMPT = "face . phone"
BOX_THRESHOLD = 0.35
TEXT_THRESHOLD = 0.25

for filename in os.listdir(TRAIN_DATA_PATH):
    if not filename.endswith(".png"):
        continue
    image_path = os.path.join(TRAIN_DATA_PATH, filename)
    image_source, image = load_image(image_path)

    # boxes come back as normalized (cx, cy, w, h), which is what YOLO TXT expects
    boxes, logits, phrases = predict(
        model=model,
        image=image,
        caption=TEXT_PROMPT,
        box_threshold=BOX_THRESHOLD,
        text_threshold=TEXT_THRESHOLD,
    )

    # Write the label file next to the image, e.g. a.png -> a.txt
    label_path = os.path.splitext(image_path)[0] + ".txt"
    with open(label_path, "w") as f:
        for box, phrase in zip(boxes.tolist(), phrases):
            # Any phrase other than "face" is written as class 1 (phone)
            class_id = 0 if phrase == "face" else 1
            cx, cy, w, h = box
            f.write(f"{class_id} {cx} {cy} {w} {h}\n")
    print(f"{filename} is done")

The code walks through every .png file in the folder and calls Grounding DINO to annotate it. In this code, face gets class_id 0 and phone gets class_id 1.
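One thing to be aware of: the loop above assigns class 1 to every phrase that is not exactly "face". If the model ever returns a phrase that matches neither category, it would silently be labeled as a phone. A possible alternative, sketched here with a hypothetical `CLASS_IDS` table and `phrase_to_class_id` helper of my own, is an explicit lookup that returns `None` for unknown phrases so the caller can skip them.

```python
# Hypothetical lookup table; the ids match the ones used in the annotation loop
CLASS_IDS = {"face": 0, "phone": 1}


def phrase_to_class_id(phrase, class_ids=CLASS_IDS):
    """Return the class id for a predicted phrase, or None if it is not a known class."""
    return class_ids.get(phrase.strip().lower())


for phrase in ["face", "phone", "face phone"]:
    class_id = phrase_to_class_id(phrase)
    if class_id is None:
        print(f"skipping unknown phrase: {phrase!r}")
    else:
        print(f"{phrase!r} -> {class_id}")
```

Skipped detections could also be logged to a file and reviewed by hand instead of being dropped.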

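Notably, following an issue in the project's repository, the prompt separates the categories with " . ", which keeps the order of the categories from affecting the results.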
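If the list of categories grows, the prompt can be assembled from a list rather than typed by hand. This is only a small sketch; `build_prompt` is a hypothetical helper, and lowercasing the names is my own assumption to keep them consistent with the phrases compared against later.

```python
def build_prompt(class_names):
    """Join category names into a single prompt, separated by ' . ' (hypothetical helper)."""
    names = [name.strip().lower() for name in class_names if name.strip()]
    return " . ".join(names)


print(build_prompt(["face", "phone"]))
# prints: face . phone
```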

# Testing

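The results are stored as txt files, which are hard for a human to check by eye, so we can use cv2 to draw the detected boxes on the images with a small script.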

import os
import cv2

# Raw string so the backslashes are not treated as escape sequences
DATA_PATH = r"D:\Code\Python\yolo"

for filename in os.listdir(DATA_PATH):
    if not filename.endswith(".png"):
        continue
    image_path = os.path.join(DATA_PATH, filename)
    txt_path = os.path.splitext(image_path)[0] + ".txt"

    # Read the image and get its size in pixels
    image = cv2.imread(image_path)
    h, w = image.shape[:2]

    with open(txt_path, "r") as f:
        for line in f:
            msg = line.split()
            if len(msg) != 5:
                continue  # skip blank or malformed lines
            class_id = msg[0]
            x_center, y_center, width, height = map(float, msg[1:])
            # Convert normalized center/size values to pixel corner coordinates
            x1 = int((x_center - width / 2) * w)
            y1 = int((y_center - height / 2) * h)
            x2 = int((x_center + width / 2) * w)
            y2 = int((y_center + height / 2) * h)
            print(x1, ",", y1, ",", x2, ",", y2)
            # Colors are BGR: green for class 0 (face), blue for anything else
            color = (0, 255, 0) if class_id == "0" else (255, 0, 0)
            cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)

    # Show the image; press any key to move on to the next one
    cv2.imshow("image", image)
    cv2.waitKey(0)

cv2.destroyAllWindows()
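The coordinate conversion in the script above can also be isolated into a small function, which makes it easy to check on its own. The sketch below is a hypothetical helper (`yolo_to_corners` is my own name); unlike the script, it rounds instead of truncating and clamps the result to the image bounds, which is a design choice rather than something the script requires.

```python
def yolo_to_corners(x_center, y_center, width, height, img_w, img_h):
    """Convert a normalized YOLO box to pixel corners (x1, y1, x2, y2).

    Hypothetical helper: results are rounded and clamped to the image bounds.
    """
    x1 = round((x_center - width / 2) * img_w)
    y1 = round((y_center - height / 2) * img_h)
    x2 = round((x_center + width / 2) * img_w)
    y2 = round((y_center + height / 2) * img_h)

    def clamp(value, upper):
        return max(0, min(value, upper))

    return (clamp(x1, img_w - 1), clamp(y1, img_h - 1),
            clamp(x2, img_w - 1), clamp(y2, img_h - 1))


# A centered box on a 640x480 image
print(yolo_to_corners(0.5, 0.5, 0.25, 0.5, 640, 480))
# prints: (240, 120, 400, 360)
```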

The model's detections are quite good, though a few spurious ones still need to be removed by hand.