Trying SVM + HOG + Sliding Windows for Passenger-Flow Detection (Success)

Project Background

  • I had just tried pedestrian detection with Google's Inception V3, and the results were mediocre (click through if you want to see it)
  • Not ready to give up, I wanted to struggle a bit longer with traditional methods, since they are computationally cheap. So I tried training my own sample set with HOG + SVM and then testing, mainly following this blog post. Be warned, though: that post uses Python 2, and running it under Python 3 raises quite a few problems. Where the code was copied from is anyone's guess; it's been patched all over with mixed Chinese/English comments, though the original was presumably an official example
  • If you just want YOLO-based detection, feel free to head out the door to the left — it's below, with more in later updates
![cover](https://github.com/AllentDan/PedestrianDetection/raw/master/images/threeePeople.jpg)

Reproducing the Reference Blog Post

Download and Fix-ups

Not much to say here: the wget command is great, Linux is great. Windows isn't bad either — select wget's target URL, right-click and open it (I used Chrome with the Thunder download extension installed; not sure about other browsers) and Thunder pops up to grab it. Opening the result in an IDE shows a pile of errors. It is an ancient blog post, after all, and heavily patched over. Most of the errors are just Python 2 print statements, though — nothing a bit of patience can't fix.

Resizing Images to a Uniform Size

These are the same samples prepared in the previous post, except this time the sizes are unified first, because the OpenCV SVM pipeline here expects 64×128 input images (even though each training feature ends up as a one-dimensional 1024-element vector). Nothing more to say — here's the code:

```python
import numpy as np
import cv2
from os.path import dirname, join, basename
from glob import glob

num = 0
# grab every .jpg in the nega folder next to this script
# (nega is where I keep all the negative samples)
for fn in glob(join(dirname(__file__) + '/nega', '*.jpg')):
    print(fn)
    img = cv2.imread(fn)
    # resize every sample to the uniform 64x128 (INTER_AREA resampling)
    res = cv2.resize(img, (64, 128), interpolation=cv2.INTER_AREA)
    # write the processed images into a new nega_resized folder
    cv2.imwrite('E:\\Python\\SVM4Pedestrian\\nega_resized\\' + str(num) + '.jpg', res)
    num = num + 1

print('all done!')
```

train+predict.py

```python
# -*- coding: utf-8 -*-
import numpy as np
import cv2
#from matplotlib import pyplot as plt
from os.path import dirname, join, basename
import sys
from glob import glob


bin_n = 16*16  # number of orientation bins

def hog(img):
    # simplified HOG: split the image into four quadrants and build an
    # orientation histogram, weighted by gradient magnitude, in each
    x_pixel, y_pixel = 194, 259  # hard-coded dims used only to pick the split point
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)
    bins = np.int32(bin_n*ang/(2*np.pi))
    x_pixel_int = int(x_pixel/2)
    y_pixel_int = int(y_pixel/2)
    bin_cells = bins[:x_pixel_int,:y_pixel_int], bins[x_pixel_int:,:y_pixel_int], bins[:x_pixel_int,y_pixel_int:], bins[x_pixel_int:,y_pixel_int:]
    mag_cells = mag[:x_pixel_int,:y_pixel_int], mag[x_pixel_int:,:y_pixel_int], mag[:x_pixel_int,y_pixel_int:], mag[x_pixel_int:,y_pixel_int:]
    hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells, mag_cells)]
    hist = np.hstack(hists)  # bin_n*4 = 1024-dimensional feature vector
    return hist

img = {}
num = 0
for fn in glob(join(dirname(__file__)+'/posi_resized', '*.jpg')):
    img[num] = cv2.imread(fn, 0)  # the 0 flag reads grayscale; drop it for color
    num = num + 1
positive = num
for fn in glob(join(dirname(__file__)+'/nega_resized', '*.jpg')):
    img[num] = cv2.imread(fn, 0)  # grayscale again
    num = num + 1

trainpic = []
for i in img:
    trainpic.append(img[i])

#hogdata = [map(hog, img[i]) for i in img]
hogdata = list(map(hog, trainpic))  # Python 3: map() returns an iterator, so wrap it in list()
trainData = np.float32(hogdata).reshape(-1, bin_n*4)
responses = np.int32(np.repeat(1.0, trainData.shape[0])[:, np.newaxis])
responses[positive:trainData.shape[0]] = -1.0  # negatives get label -1


svm = cv2.ml.SVM_create()  # create the SVM model (OpenCV 3+: it lives under cv2.ml)
# configure it
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.setC(0.01)
# train
result = svm.train(trainData, cv2.ml.ROW_SAMPLE, responses)
svm.save('svm_cat_data.dat')

###### testing
test_temp = []
for fn in glob(join(dirname(__file__)+'/predict_resized', '*.jpg')):
    img = cv2.imread(fn, 0)  # grayscale
    test_temp.append(img)
testdata = list(map(hog, test_temp))
testData = np.float32(testdata).reshape(-1, bin_n*4)
re = svm.predict(testData)
```
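The `hog()` above is the quadrant-histogram trick from the OpenCV tutorials rather than a full HOG descriptor. A numpy-only sketch of the same idea (with `np.gradient` standing in for `cv2.Sobel` — an assumption, not the script's actual call) shows why the feature always comes out `bin_n*4` = 1024-dimensional:

```python
import numpy as np

bin_n = 16 * 16  # 256 orientation bins, as in the script

def quadrant_hog(img):
    # gradients via np.gradient (a stand-in for cv2.Sobel)
    gy, gx = np.gradient(img.astype(np.float32))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)        # angle in [0, 2*pi)
    bins = np.int32(bin_n * ang / (2 * np.pi))    # quantize to 256 bins
    bins = np.clip(bins, 0, bin_n - 1)
    h, w = img.shape
    quadrants = [(slice(None, h // 2), slice(None, w // 2)),
                 (slice(h // 2, None), slice(None, w // 2)),
                 (slice(None, h // 2), slice(w // 2, None)),
                 (slice(h // 2, None), slice(w // 2, None))]
    # one magnitude-weighted orientation histogram per quadrant
    hists = [np.bincount(bins[q].ravel(), mag[q].ravel(), bin_n) for q in quadrants]
    return np.hstack(hists)  # 4 quadrants x 256 bins = 1024 values

feat = quadrant_hog(np.random.rand(128, 64))  # a fake 64x128 grayscale sample
print(feat.shape)  # → (1024,)
```

Whatever the input size, four quadrants times 256 bins gives the fixed 1024-dimensional vector the SVM trains on.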

Fixing the Errors

Once train.py runs, try predicting the first sample of the training set — it should almost always come out right, otherwise the SVM would be in real trouble. Aside from small issues that could be fixed on the spot, there were really three main problems:

The first is the map function, which raises TypeError: float() argument must be a string or a number, not 'map'. Old Python (2.x) was fine with this, but from Python 3 onward map() returns a lazy iterator, so it basically always needs a list() wrapped around it; I followed another blogger's fix for this.
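A minimal illustration of that Python 2 → 3 change (toy data here, not the real HOG features):

```python
import numpy as np

def fake_feature(x):
    return [x, x * 2]  # stand-in for the real hog() output

samples = [1, 2, 3]

m = map(fake_feature, samples)
print(type(m).__name__)  # → map  (an iterator in Python 3, not a list)

# np.float32(m) would raise the TypeError from the post;
# materializing the iterator with list() first makes it work:
data = np.float32(list(map(fake_feature, samples)))
print(data.shape)  # → (3, 2)
```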

The second is concentrated around the OpenCV update: the SVM-related functions all changed and were moved under the ml module, so you go through cv2.ml to reach the SVM, and both creating and configuring it work differently now (cv2.ml.SVM_create() plus setType/setKernel/setC, instead of the old cv2.SVM interface);

The third is that after saving the SVM model, loading the file back for prediction failed, and I never solved it. So I simply don't save — I test right after training, and in the end training and testing live in the same .py file.

SVM Classification Results

Painful results: of 28 test images, 3 were misclassified — a world apart from the accuracy the neural network achieved. There are many possible reasons: the images themselves don't carry much information, so separability is poor to begin with, and the SVM simply trails neural networks in accuracy here. No point building the sliding window on top of this. I gave up and wrote off the afternoon.

Conclusion

Using a CNN to extract features first and then an SVM to classify would probably improve accuracy, but I didn't want to spend more effort on it — straight to YOLO.

Update, 2019-05-20

Using YOLO to detect in video on Windows turned out to be hard — I hear that even on a GPU, running a few hundred images takes days, and building the sample set is a hassle too. After some thought I swapped the SVM tool instead: no more OpenCV built-in SVM for training and classification, because it can only output a class label, not a confidence for that class, and its classification quality is poor anyway. Switching to the SVM classifier in the sklearn toolkit gives an approximate confidence, which can then be used in non-maximum suppression. See the cover image for the result — though parameter tuning is a big project (and something of a dark art).

A Quick Look at sklearn's SVM

Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, …

That's Wikipedia's blurb — in short, a useful Python toolkit. The part that matters here is its SVM, and the interface is very simple:

```python
from sklearn import svm

X = [[0, 0], [1, 1], [1, 0]]  # samples
y = [0, 1, 1]                 # labels
clf = svm.SVC()               # create the classifier
clf.fit(X, y)                 # train

result = clf.predict([[2, 2]])  # predict
print(result)                   # print the result
```
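Since the whole point of the switch is getting a confidence score, here is the same kind of toy example with probability=True (made-up data; the per-class probabilities come from Platt scaling inside sklearn, so treat them as approximate):

```python
from sklearn import svm

# two well-separated toy clusters
X = [[0, 0], [0, 1], [1, 0], [0.2, 0.1], [0.1, 0.3],
     [2, 2], [2, 3], [3, 2], [2.5, 2.5], [3, 3]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

clf = svm.SVC(probability=True, random_state=0)  # enable probability estimates
clf.fit(X, y)

proba = clf.predict_proba([[2.5, 2.5]])  # one row per sample, one column per class
print(proba.shape)                       # → (1, 2)
print(clf.predict([[2.5, 2.5]]))         # → [1]
```

It is exactly this predict_proba output that gets thresholded and fed into non-maximum suppression later.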

For parameter tuning I referred directly to these three posts:

sklearn.svm.SVC parameter reference

Saving and restoring trained models with sklearn

Notes and experience on tuning SVMs in sklearn

sklearn_svm_train.py

The training part of the code, straight up:

```python
# -*- coding: utf-8 -*-
"""
Created on Mon May 20 15:07:38 2019

@author: Allent_Computer
"""

from sklearn import svm
from sklearn.externals import joblib  # newer scikit-learn: use `import joblib` instead
import numpy as np
import cv2
#from matplotlib import pyplot as plt
from os.path import dirname, join, basename
import shutil
from glob import glob

bin_n = 16*16  # number of orientation bins
(winW, winH) = (64, 64)

def hog(img):
    x_pixel, y_pixel = 352, 288  # hard-coded dims used only to pick the split point
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)
    bins = np.int32(bin_n*ang/(2*np.pi))
    x_pixel_int = int(x_pixel/2)
    y_pixel_int = int(y_pixel/2)
    bin_cells = bins[:x_pixel_int,:y_pixel_int], bins[x_pixel_int:,:y_pixel_int], bins[:x_pixel_int,y_pixel_int:], bins[x_pixel_int:,y_pixel_int:]
    mag_cells = mag[:x_pixel_int,:y_pixel_int], mag[x_pixel_int:,:y_pixel_int], mag[:x_pixel_int,y_pixel_int:], mag[x_pixel_int:,y_pixel_int:]
    hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells, mag_cells)]
    hist = np.hstack(hists)  # bin_n*4 = 1024-dimensional feature vector
    return hist

img = {}
num = 0
for fn in glob(join(dirname(__file__)+'/posi_resized', '*.jpg')):
    img[num] = cv2.imread(fn, 0)  # the 0 flag reads grayscale; drop it for color
    num = num + 1
positive = num
for fn in glob(join(dirname(__file__)+'/nega_resized', '*.jpg')):
    img[num] = cv2.imread(fn, 0)  # grayscale again
    num = num + 1

trainpic = []
for i in img:
    trainpic.append(img[i])

hogdata = list(map(hog, trainpic))
trainData = np.float32(hogdata).reshape(-1, bin_n*4)
responses = np.int32(np.repeat(1.0, trainData.shape[0]))
responses[positive:trainData.shape[0]] = -1.0  # negatives get label -1
trainData = list(trainData)

# test samples, to check the SVM's classification quality
test_temp = []
for fn in glob(join(dirname(__file__)+'/predict_resized', '*.jpg')):
    img = cv2.imread(fn, 0)  # grayscale
    test_temp.append(img)
testdata = list(map(hog, test_temp))
testData = np.float32(testdata).reshape(-1, bin_n*4)

model = svm.SVC(C=1, gamma=1, kernel='linear', probability=True)
model.fit(trainData, responses)  # train the SVC model
joblib.dump(model, "train_model.m")
#model.predict(testData)
```
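Unlike the OpenCV save/load headache from earlier, the joblib round-trip is straightforward. A toy sketch (note the plain `import joblib`, since `sklearn.externals.joblib` has been removed from newer scikit-learn; the file name is arbitrary):

```python
import os
import tempfile
import joblib
from sklearn import svm

# tiny linearly separable toy set: class decided by the x coordinate
X = [[0, 0], [1, 1], [1, 0], [0, 1]]
y = [0, 1, 1, 0]
model = svm.SVC(kernel='linear').fit(X, y)

path = os.path.join(tempfile.gettempdir(), "toy_model.m")
joblib.dump(model, path)      # save the fitted model to disk
restored = joblib.load(path)  # load it back

print(restored.predict([[1, 1]]))  # → [1]
```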

sklearn_svm_predict.py

The image-testing part of the code, straight up:

```python
# -*- coding: utf-8 -*-
"""
Created on Mon May 20 10:06:37 2019

@author: Allent_Computer
"""


from sklearn import svm
from sklearn.model_selection import GridSearchCV
from sklearn.externals import joblib  # newer scikit-learn: use `import joblib` instead
import numpy as np
import cv2
#from matplotlib import pyplot as plt
from os.path import dirname, join, basename
import shutil
import sys
from glob import glob


bin_n = 16*16  # used by hog()

# shrink an image by the given factor
def frame_resize(img, scaleFactor):
    return cv2.resize(img, (int(img.shape[1] * (1 / scaleFactor)),
                            int(img.shape[0] * (1 / scaleFactor))),
                      interpolation=cv2.INTER_AREA)

# build an image pyramid: keep yielding downscaled images until
# width or height drops below the given minimum
def pyramid(image, scale=1.5, minSize=(100, 80)):
    # yield the original image first
    yield image

    while True:
        image = frame_resize(image, scale)
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break
        yield image

# non-maximum suppression: discard overlapping windows
def py_nms(boxs, thresh_l=0, thresh_r=0.9, mode="Union"):
    """
    greedily select boxes with high confidence;
    keep boxes with overlap <= thresh_r, rule out overlap > thresh_r
    (thresh_l is accepted but unused here)
    :param boxs: [[x1, y1, x2, y2, score], ...]
    :return: the boxes to keep
    """
    dets = np.array(boxs)
    if len(dets) == 0:
        return []
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        if mode == "Union":
            ovr = inter / (areas[i] + areas[order[1:]] - inter)
        elif mode == "Minimum":
            ovr = inter / np.minimum(areas[i], areas[order[1:]])
        inds = np.where(ovr <= thresh_r)[0]
        order = order[inds + 1]
    return dets[keep]

# extract the HOG feature of an image
def hog(img):
    x_pixel, y_pixel = 352, 288
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)
    bins = np.int32(bin_n*ang/(2*np.pi))
    x_pixel_int = int(x_pixel/2)
    y_pixel_int = int(y_pixel/2)
    bin_cells = bins[:x_pixel_int,:y_pixel_int], bins[x_pixel_int:,:y_pixel_int], bins[:x_pixel_int,y_pixel_int:], bins[x_pixel_int:,y_pixel_int:]
    mag_cells = mag[:x_pixel_int,:y_pixel_int], mag[x_pixel_int:,:y_pixel_int], mag[:x_pixel_int,y_pixel_int:], mag[x_pixel_int:,y_pixel_int:]
    hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells, mag_cells)]
    hist = np.hstack(hists)  # bin_n*4 = 1024-dimensional feature vector
    return hist

# sliding window: crop patches for prediction
def sliding_window(image, stepSize, windowSize):
    # slide a window across the image
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            # yield the current window
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

# collect the training samples into trainData
img = {}
num = 0
for fn in glob(join(dirname(__file__)+'/posi_resized', '*.jpg')):
    img[num] = cv2.imread(fn, 0)  # the 0 flag reads grayscale; drop it for color
    num = num + 1
positive = num
for fn in glob(join(dirname(__file__)+'/nega_resized', '*.jpg')):
    img[num] = cv2.imread(fn, 0)  # grayscale again
    num = num + 1
trainpic = []
for i in img:
    trainpic.append(img[i])
hogdata = list(map(hog, trainpic))
trainData = np.float32(hogdata).reshape(-1, bin_n*4)
responses = np.int32(np.repeat(1.0, trainData.shape[0]))
responses[positive:trainData.shape[0]] = -1.0
trainData = list(trainData)

# collect test samples, to check the SVM's classification quality
test_temp = []
for fn in glob(join(dirname(__file__)+'/predict_resized', '*.jpg')):
    img = cv2.imread(fn, 0)  # grayscale
    test_temp.append(img)
testdata = list(map(hog, test_temp))
testData = np.float32(testdata).reshape(-1, bin_n*4)

### training code, commented out because train_model.m is already saved;
### we just load the trained model below
#model = svm.SVC(C=0.01, gamma=1, kernel='linear', probability=True)
#model = GridSearchCV(svm.SVC(probability=True),
#                     param_grid={"C": [0.00001, 0.0001], "gamma": [0.5, 5]}, cv=4)
#model.fit(trainData, responses)  # training the svc model
#print("The best parameters are %s with a score of %0.2f"
#      % (model.best_params_, model.best_score_))
#joblib.dump(model, "train_model.m")
#result1 = model.predict(testData)
#result2 = model.predict_proba(testData)
#print(result1)
#print(result2)

# load the model and run the test
model = joblib.load("E:\\Python\\SVM4Pedestrian\\train_model.m")
image = cv2.imread("E:\\Python\\detected_image.jpg")       # color, for display
imggray = cv2.imread("E:\\Python\\detected_image.jpg", 0)  # grayscale, for prediction
(winW, winH) = (64, 64)  # sliding-window size
winWBox = int(winW)  # box size mapped back from the current pyramid level
winHBox = int(winH)
boxs = []  # collected boxes, fed to non-maximum suppression

for img in pyramid(imggray, 1.5):  # sliding window + NMS to count heads
    for (x, y, window) in sliding_window(img, stepSize=21, windowSize=(winW, winH)):
        if window.shape[0] != winH or window.shape[1] != winW:
            continue
        clone = img.copy()
        cv2.rectangle(clone, (x, y), (x + winW, y + winH), (0, 255, 0), 2)
        cut_img = img[y:y + winH, x:x + winW]
        cut_img = cv2.resize(cut_img, (64, 128), interpolation=cv2.INTER_AREA)
        data = hog(cut_img)[None, :]
        data = np.float32(data).reshape(-1, bin_n*4)
        prediction = model.predict_proba(data)[0]
        if prediction[1] > 0.98:
            # map the window back to original-image coordinates
            scaleBox = float(winWBox) / float(winW)
            box = [x*scaleBox, y*scaleBox,
                   x*scaleBox + winWBox, y*scaleBox + winHBox, prediction[1]]
            boxs.append(box)
            cv2.rectangle(clone, (x, y), (x + winW, y + winH), (0, 255, 0), 2)
        cv2.imshow("Window", clone)
        cv2.waitKey(1)
    winWBox = winWBox*1.5
    winHBox = winHBox*1.5

# draw the results
result = py_nms(boxs, thresh_l=0.0, thresh_r=0.1, mode="Union")
saveNum = 0
for i in range(len(result)):
    x1 = int(result[i][0])
    y1 = int(result[i][1])
    x2 = int(result[i][2])
    y2 = int(result[i][3])
    save_img = image[y1:y2, x1:x2]
    cv2.imwrite(str(saveNum)+".jpg", save_img)
    saveNum += 1
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 1)
    cv2.putText(image, str(saveNum), (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 1)

cv2.putText(image, "All:"+str(saveNum), (0, 15), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 1)

cv2.imshow("result:", image)
k = cv2.waitKey(0)  # wait for a key press; the argument is the timeout in ms, 0 = wait forever
if k == 27:  # Esc key
    cv2.destroyAllWindows()
    cv2.imwrite("E:\\Python\\SVM4Pedestrian\\result.jpg", image)
```
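To sanity-check the NMS step in isolation, here is a numpy-only condensation of the same greedy IoU suppression run on toy boxes (made-up coordinates, not real detections):

```python
import numpy as np

def nms(boxes, iou_thresh=0.1):
    # boxes: [[x1, y1, x2, y2, score], ...]; greedy suppression, highest score first
    dets = np.array(boxes, dtype=np.float64)
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[np.where(iou <= iou_thresh)[0] + 1]  # drop heavy overlaps
    return dets[keep]

boxes = [[0, 0, 10, 10, 0.9],    # overlaps the next box heavily -> it wins
         [1, 1, 11, 11, 0.8],    # suppressed (IoU ~0.70 with the box above)
         [50, 50, 60, 60, 0.7]]  # far away, survives
kept = nms(boxes)
print(len(kept))  # → 2
```

With thresh_r=0.1 as in the script, two boxes whose IoU exceeds 0.1 collapse into the higher-scoring one, which is exactly how duplicate sliding-window hits on the same head get merged.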

Results

Detection with many people:

![多人检测](https://github.com/AllentDan/PedestrianDetection/raw/master/images/multiPeople.jpg)

Detection with three people:

![三人检测](https://github.com/AllentDan/PedestrianDetection/raw/master/images/threeePeople.jpg)

Single-person detection:

![cover](https://github.com/AllentDan/PedestrianDetection/raw/master/images/singlePerson.jpg)

Clearly there are still plenty of problems, and these shots were cherry-picked from parameter settings that happened to look decent. Making it genuinely usable would take much more time spent tuning parameters, revising the samples, adjusting the step size, and so on.

Runtime

On an i7 CPU, processing one 352×288 frame takes a few tenths of a second, so even on a CPU a video of roughly 20,000 frames can be handled in about one to two hours.
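A quick back-of-the-envelope check of that estimate (0.3 s/frame is an assumed midpoint of "a few tenths of a second"):

```python
seconds_per_frame = 0.3  # assumed midpoint of "a few tenths of a second"
frames = 20000
total_hours = seconds_per_frame * frames / 3600
print(round(total_hours, 2))  # → 1.67
```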

Conclusion

Usable, fast, and with room for improvement.