Training a custom model with YOLO v3 for pedestrian-flow detection

Project background

keras yolo客流量检测.md

![result.jpg](https://raw.githubusercontent.com/AllentDan/PedestrianDetection/master/yoloImage/result.png)

Table of Contents


[TOC]

Testing with the official YOLO weights

About YOLO

I won't go into the theory here; if you want to understand how YOLO detection works, read a detailed YOLO walkthrough article directly. YOLO is an end-to-end network built by a graduate student at a US university, who even gave a TED talk about it. Look at that grad student, then look at us, haha. "End to end" in object detection, as I understand it, means skipping the classify-first, then-crop-and-recognize pipeline: the network predicts object locations and sizes directly from the image, turning two-stage detection into one-stage detection.

Setting up the environment

You can follow CSDN blogger 王氏小明's post YOLO基础教程(一):Python环境搭建与测试 (YOLO basics, part 1: Python environment setup and testing), or refer to my earlier blog posts.

Running YOLO detection on the passenger-flow video

To be fair, even on high-definition wildlife footage the official YOLO weights produce decent results, though still with plenty of false detections; on the low-quality video our teacher supplied (shown above) it fares worse. That said, YOLO is fast: it reaches about 30 fps on two GTX 1080 Ti GPUs. Judging from the output, its bounding boxes also capture object position and size more accurately than a two-stage sliding-window approach does.

Training your own weights with YOLO

Building the sample set first

It is fair to say that training a deep network is roughly one third preparing samples, one third writing code, and one third tuning hyperparameters. As for sample preparation: our teacher supplied videos, while YOLO's annotation tool reads still images, so the first step is to grab one image every frame (or every few dozen frames) and save the images into a folder.

(The script is stuck on my office computer; it is trivial anyway: search for "python 视频转图片" (video to images) and adapt what you find.)
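For completeness, a minimal sketch of that video-to-images step, assuming OpenCV (`cv2`) is installed; the paths, file-naming scheme, and 30-frame step are illustrative, not the original script:

```python
import os

def frame_indices(total_frames, step):
    """Indices of the frames we keep: every `step`-th frame."""
    return list(range(0, total_frames, step))

def video_to_images(video_path, out_dir, step=30):
    """Dump every `step`-th frame of a video as a JPEG into `out_dir`."""
    import cv2  # assumed dependency
    os.makedirs(out_dir, exist_ok=True)
    vid = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = vid.read()
        if not ok:          # end of stream
            break
        if idx % step == 0:
            cv2.imwrite(os.path.join(out_dir, "%05d.jpg" % saved), frame)
            saved += 1
        idx += 1
    vid.release()
```

With `step=30` on a 25 to 30 fps video you get roughly one image per second, which keeps adjacent samples from being near-duplicates.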

Annotating with labelImg

Following the reference blog from the Project background section, create the folder structure first (some of the folders seem unused, but follow the layout anyway). The LabelImg download link in that blog is dead, so I listed another one in the Project background: the labelImg source repository, whose README explains how to use it.

Annotating in the labelImg GUI is a three-step routine; finishing a hundred images takes roughly half an hour to an hour. Try to annotate as many distinct images as possible. You can stick to a single class, head, or add a few more classes if you have the memory for it.
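labelImg saves each image's boxes as a Pascal VOC XML file, which the standard library can read back. A sketch, assuming the default VOC layout (`object`/`name`/`bndbox` elements):

```python
import xml.etree.ElementTree as ET

def parse_voc(xml_text):
    """Return [(class_name, xmin, ymin, xmax, ymax), ...] from a VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                      int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return boxes

# A toy annotation, like what labelImg writes for one "head" box:
sample = """<annotation><object><name>head</name>
<bndbox><xmin>10</xmin><ymin>20</ymin><xmax>30</xmax><ymax>40</ymax></bndbox>
</object></annotation>"""
boxes = parse_voc(sample)
```

This is handy for spot-checking annotations before converting them into YOLO's training list format.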

Add an extra class for negative samples

Boom! The results are quite good. I will hold off on posting screenshots for now and update later, including running a sliding window over the whole image.

………………………..

Alright, it is the afternoon of May 13, 2019, and I am back. As I said before, I cropped out both positive and negative samples for training, then used a sliding window to generate a large number of test crops to push through the network.

Generating the training and test splits with test.py

test.py

```python
import os
import random

trainval_percent = 0.1   # fraction of all files held out as trainval
train_percent = 0.9      # fraction of trainval sampled again as "train"
xmlfilepath = 'Annotations'
txtsavepath = 'ImageSets/Main'
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open('ImageSets/Main/trainval.txt', 'w')
ftest = open('ImageSets/Main/test.txt', 'w')
ftrain = open('ImageSets/Main/train.txt', 'w')
fval = open('ImageSets/Main/val.txt', 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'  # strip the .xml extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftest.write(name)
        else:
            fval.write(name)
    else:
        ftrain.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
```
---------------------
Author: 王氏小明 (CSDN)
Original: https://blog.csdn.net/weixin_43472830/article/details/88320099
(Shamelessly lifted as-is.)
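A quick sanity check of what those percentages mean for a 100-image set. Note the script's naming quirk: the `train` subset sampled from `trainval` is actually written to test.txt, and everything outside `trainval` goes to train.txt:

```python
num = 100
trainval_percent, train_percent = 0.1, 0.9

tv = int(num * trainval_percent)  # 10 files end up in trainval.txt
tr = int(tv * train_percent)      # 9 of those are written to test.txt
rest = num - tv                   # the remaining 90 go to train.txt
print(tv, tr, rest)
```

If you want a more conventional 90/10 train/validation split, adjust the two percentages (and arguably the file names) accordingly.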

Follow the tutorial step by step

A few problems you may run into. If you only have a CPU, add the os environment settings from the reference blog. I trained on a GPU and still hit issues: during the first epoch the memory sometimes ran out, causing a black screen or a crash, in which case you may need to shrink your training set. Even after fixing that, the first epoch is slow (20 to 40 minutes for me) because it involves reading in all the images. After that, each epoch takes around ten-odd seconds, and each step a few tens of milliseconds.
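For CPU-only machines, the usual trick (and, I believe, what the reference blog's os settings amount to) is to hide the GPU from TensorFlow before Keras is imported. Place this at the very top of the training script:

```python
import os

# Must run before TensorFlow is imported anywhere in the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide all GPUs, forcing CPU execution
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"   # silence TF's informational log spam
```

For the out-of-memory crashes on GPU, lowering the batch size in the training script is usually the first thing to try before shrinking the dataset.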

My weight file

Finally, my own weights. The training set was only about 100 images and training converged after roughly 30 epochs, with the loss bottoming out in the low twenties, so the results are mediocre. The score and IoU thresholds at prediction time are not well tuned either, so, as you can see earlier, there are still plenty of problems. Overall, though, it is a big improvement over the earlier SVM and Inception approaches.
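For reference when tuning, the `iou` threshold compares candidate boxes by intersection-over-union during non-max suppression. A plain-Python version, assuming boxes are given as `(left, top, right, bottom)`:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (left, top, right, bottom)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)
```

Raising the threshold keeps more overlapping detections (useful in a dense crowd of heads); lowering it suppresses them more aggressively.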

My trained weight file is linked here; the extraction password is b2p5.

If the link has expired, leave a comment on my bilibili account, haha.

Here is my YOLO prediction script. When using it, adjust the score lower bound and the IoU threshold as needed, then call detect_video or detect_img until you get results you are happy with. The steps are roughly:

1. Run `python yolo.py` from a terminal, or open and run it in an IDE.

2. `yolo = YOLO()`

3. Set the corresponding input/output paths and call `detect_video` (or `detect_img`).

```python
# -*- coding: utf-8 -*-
"""
Class definition of YOLO_v3 style detection model on image and video
"""

import colorsys
import os
from timeit import default_timer as timer

import numpy as np
from keras import backend as K
from keras.models import load_model
from keras.layers import Input
from PIL import Image, ImageFont, ImageDraw

from yolo3.model import yolo_eval, yolo_body, tiny_yolo_body
from yolo3.utils import letterbox_image
from keras.utils import multi_gpu_model
import cv2


class YOLO(object):
    _defaults = {
        "model_path": 'logs/000/trained_weights_stage_1.h5',
        "anchors_path": 'model_data/yolo_anchors.txt',
        "classes_path": 'model_data/coco_classes.txt',
        "score": 0.15,
        "iou": 0.3,
        "model_image_size": (416, 416),
        "gpu_num": 1,
    }

    @classmethod
    def get_defaults(cls, n):
        if n in cls._defaults:
            return cls._defaults[n]
        else:
            return "Unrecognized attribute name '" + n + "'"

    def __init__(self, **kwargs):
        self.__dict__.update(self._defaults)  # set up default values
        self.__dict__.update(kwargs)  # and update with user overrides
        self.class_names = self._get_class()
        self.anchors = self._get_anchors()
        self.sess = K.get_session()
        self.boxes, self.scores, self.classes = self.generate()

    def _get_class(self):
        classes_path = os.path.expanduser(self.classes_path)
        with open(classes_path) as f:
            class_names = f.readlines()
        class_names = [c.strip() for c in class_names]
        return class_names

    def _get_anchors(self):
        anchors_path = os.path.expanduser(self.anchors_path)
        with open(anchors_path) as f:
            anchors = f.readline()
        anchors = [float(x) for x in anchors.split(',')]
        return np.array(anchors).reshape(-1, 2)

    def generate(self):
        model_path = os.path.expanduser(self.model_path)
        assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.'

        # Load model, or construct model and load weights.
        num_anchors = len(self.anchors)
        num_classes = len(self.class_names)
        is_tiny_version = num_anchors == 6  # default setting
        try:
            self.yolo_model = load_model(model_path, compile=False)
        except:
            self.yolo_model = tiny_yolo_body(Input(shape=(None, None, 3)), num_anchors//2, num_classes) \
                if is_tiny_version else yolo_body(Input(shape=(None, None, 3)), num_anchors//3, num_classes)
            self.yolo_model.load_weights(self.model_path)  # make sure model, anchors and classes match
        else:
            assert self.yolo_model.layers[-1].output_shape[-1] == \
                num_anchors/len(self.yolo_model.output) * (num_classes + 5), \
                'Mismatch between model and given anchor and class sizes'

        print('{} model, anchors, and classes loaded.'.format(model_path))

        # Generate colors for drawing bounding boxes.
        hsv_tuples = [(x / len(self.class_names), 1., 1.)
                      for x in range(len(self.class_names))]
        self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
        self.colors = list(
            map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),
                self.colors))
        np.random.seed(10101)  # Fixed seed for consistent colors across runs.
        np.random.shuffle(self.colors)  # Shuffle colors to decorrelate adjacent classes.
        np.random.seed(None)  # Reset seed to default.

        # Generate output tensor targets for filtered bounding boxes.
        self.input_image_shape = K.placeholder(shape=(2, ))
        if self.gpu_num >= 2:
            self.yolo_model = multi_gpu_model(self.yolo_model, gpus=self.gpu_num)
        boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors,
                                           len(self.class_names), self.input_image_shape,
                                           score_threshold=self.score, iou_threshold=self.iou)
        return boxes, scores, classes

    def detect_image(self, image):
        start = timer()

        if self.model_image_size != (None, None):
            assert self.model_image_size[0] % 32 == 0, 'Multiples of 32 required'
            assert self.model_image_size[1] % 32 == 0, 'Multiples of 32 required'
            boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))
        else:
            new_image_size = (image.width - (image.width % 32),
                              image.height - (image.height % 32))
            boxed_image = letterbox_image(image, new_image_size)
        image_data = np.array(boxed_image, dtype='float32')

        print(image_data.shape)
        image_data /= 255.
        image_data = np.expand_dims(image_data, 0)  # Add batch dimension.

        out_boxes, out_scores, out_classes = self.sess.run(
            [self.boxes, self.scores, self.classes],
            feed_dict={
                self.yolo_model.input: image_data,
                self.input_image_shape: [image.size[1], image.size[0]],
                K.learning_phase(): 0
            })

        print('Found {} boxes for {}'.format(len(out_boxes), 'img'))

        font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
                                  size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
        thickness = (image.size[0] + image.size[1]) // 300

        for i, c in reversed(list(enumerate(out_classes))):
            predicted_class = self.class_names[c]
            box = out_boxes[i]
            score = out_scores[i]

            label = '{} {:.2f}'.format(predicted_class, score)
            draw = ImageDraw.Draw(image)
            label_size = draw.textsize(label, font)

            top, left, bottom, right = box
            top = max(0, np.floor(top + 0.5).astype('int32'))
            left = max(0, np.floor(left + 0.5).astype('int32'))
            bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
            right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
            print(label, (left, top), (right, bottom))

            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])

            # My kingdom for a good redistributable image drawing library.
            for i in range(thickness):
                draw.rectangle(
                    [left + i, top + i, right - i, bottom - i],
                    outline=self.colors[c])
            draw.rectangle(
                [tuple(text_origin), tuple(text_origin + label_size)],
                fill=self.colors[c])
            draw.text(text_origin, label, fill=(0, 0, 0), font=font)
            del draw
        # Overlay the total detection count (the passenger-flow figure).
        draw = ImageDraw.Draw(image)
        draw.text((3, 15), "All:" + str(len(out_boxes)), (255, 0, 0))
        del draw
        end = timer()
        print(end - start)
        return image

    def close_session(self):
        self.sess.close()


def detect_video(yolo, video_path, output_path=""):
    vid = cv2.VideoCapture(video_path)
    if not vid.isOpened():
        cv2.destroyAllWindows()
        raise IOError("Couldn't open webcam or video")
    video_FourCC = int(vid.get(cv2.CAP_PROP_FOURCC))
    video_fps = vid.get(cv2.CAP_PROP_FPS)
    video_size = (int(vid.get(cv2.CAP_PROP_FRAME_WIDTH)),
                  int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    isOutput = True if output_path != "" else False
    if isOutput:
        print("!!! TYPE:", type(output_path), type(video_FourCC), type(video_fps), type(video_size))
        out = cv2.VideoWriter(output_path, video_FourCC, video_fps, video_size)
    accum_time = 0
    curr_fps = 0
    fps = "FPS: ??"
    prev_time = timer()
    while True:
        return_value, frame = vid.read()
        if not return_value:  # stop cleanly at the end of the stream
            break
        image = Image.fromarray(frame)
        image = yolo.detect_image(image)
        result = np.asarray(image)
        curr_time = timer()
        exec_time = curr_time - prev_time
        prev_time = curr_time
        accum_time = accum_time + exec_time
        curr_fps = curr_fps + 1
        if accum_time > 1:
            accum_time = accum_time - 1
            fps = "FPS: " + str(curr_fps)
            curr_fps = 0
        # cv2.putText(result, text=fps, org=(3, 15), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
        #             fontScale=0.50, color=(255, 0, 0), thickness=2)
        cv2.namedWindow("result", cv2.WINDOW_NORMAL)
        cv2.imshow("result", result)
        if isOutput:
            out.write(result)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    yolo.close_session()


def detect_img(yolo):
    while True:
        img = input('Input image filename:')
        try:
            image = Image.open(img)
        except:
            print('Open Error! Try again!')
            continue
        else:
            r_image = yolo.detect_image(image)
            r_image.show()
    yolo.close_session()
```