Principles of Convolutional Neural Networks

Source: 微智科技网

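1. A Brief Introduction to CNN Principles

This article is mainly a detailed walkthrough of code that implements a CNN. If you have not studied CNNs before, I recommend the blog post by my senior labmate 周晓艺 (Zhou Xiaoyi) and the related material on UFLDL.

The defining features of a CNN are sparse connectivity (local receptive fields) and weight sharing, shown in the two figures below (left: sparse connectivity; right: weight sharing). Both reduce the number of parameters that have to be trained and lower the computational cost. As for the overall structure of a CNN, the classic LeNet5 serves as the example: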
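To make the parameter savings concrete, here is a minimal sketch that counts weights under three connection schemes. The sizes (a 28x28 input, a 24x24 grid of hidden units, 5x5 patches) and the helper names are assumptions chosen for illustration, and biases are ignored.

```python
def dense_params(in_h, in_w, n_hidden):
    """Weights when every hidden unit sees every input pixel."""
    return in_h * in_w * n_hidden


def local_params(n_hidden, k):
    """Weights when each hidden unit sees only its own k x k patch
    (sparse connectivity, no sharing)."""
    return n_hidden * k * k


def shared_params(n_maps, k):
    """Weights when all units of a feature map reuse one k x k kernel
    (sparse connectivity plus weight sharing)."""
    return n_maps * k * k


n_hidden = 24 * 24                      # hypothetical hidden grid
full = dense_params(28, 28, n_hidden)   # 451584
local = local_params(n_hidden, 5)       # 14400
shared = shared_params(1, 5)            # 25
```

Sparse connectivity alone cuts the count by a factor of about 31 here, and sharing one kernel across the whole feature map reduces it to the 25 weights of the kernel itself.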

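This figure is everywhere: any discussion of CNNs brings up LeNet5. It comes from the LeNet5 paper. The paper is long; the LeNet5 architecture is covered starting around page 7, and that part is worth reading.

Here is a brief summary, reading the LeNet5 figure from left to right. First comes input, the input layer, i.e. the input image. From the input layer to C1 is a convolutional layer (convolution operation), and from C1 to S2 is a subsampling layer (pooling operation); the figure below shows how convolution and subsampling work in detail. S2 to C3 is again a convolution and C3 to S4 is again a subsampling, so convolution and subsampling come in pairs, with a convolution usually followed by a subsampling. S4 to C5 is fully connected, which makes it equivalent to a hidden layer of an MLP (if MLPs are unfamiliar, see the earlier article on them). C5 to F6 is likewise fully connected and is also equivalent to an MLP hidden layer. Finally, F6 to output is simply a classifier, and this layer is called the classification layer.

That is the basic structure of a CNN: it is assembled from an input, convolutional layers, subsampling layers, fully connected layers, a classification layer, and an output. How many convolutional and subsampling layers to use, and which classifier, depends on the application. Once the structure is fixed, how are the connection parameters between layers found? They are usually trained with forward propagation (FP) followed by backpropagation (BP). See the references given above for details.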

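2. A Detailed Walkthrough of the CNN Code (Python + Theano)

The code comes from the Deep Learning Tutorials. It implements a simplified LeNet5, which differs from the original as follows:

• It does not implement location-specific gain and bias parameters.

• It uses max pooling rather than average pooling.

• The classifier is softmax, whereas LeNet5 uses RBF.

• In LeNet5 the second convolutional layer is not fully connected to the feature maps of the layer before it; this program uses full connectivity.

In addition, the code merges the convolutional layer and the subsampling layer into a single class, "LeNetConvPoolLayer" (convolution + subsampling layer). This is natural, since the two always appear as a pair. One detail deserves attention: the code feeds the convolution output directly into the subsampling step, without first adding the bias b and applying a sigmoid. In terms of the figure below, the b_x and sigmoid steps after f_x are dropped, so C_x is obtained directly from f_x.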
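The following NumPy sketch (plain NumPy, not the tutorial's Theano code) checks one consequence of this simplification. With max pooling, a bias that is constant per feature map, and a monotonically increasing activation such as tanh, applying the bias and activation after pooling gives the same values as applying them before pooling. The helper `max_pool_2x2`, the random feature map, and the bias value are assumptions for illustration; the equivalence would not hold for average pooling.

```python
import numpy as np


def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling over a 2D array with even sides."""
    h, w = x.shape
    # Element [i, a, j, c] of the reshaped array is x[2*i + a, 2*j + c].
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


rng = np.random.RandomState(0)
conv_out = rng.randn(8, 8)   # stand-in for one feature map after convolution
b = 0.3                      # hypothetical bias of that feature map

pool_then_act = np.tanh(max_pool_2x2(conv_out) + b)   # order used in the code
act_then_pool = max_pool_2x2(np.tanh(conv_out + b))   # bias and tanh first
same = np.allclose(pool_then_act, act_then_pool)
```

Pooling first means the activation is evaluated on a quarter as many values, which is one plausible reason for the ordering.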

Finally, the first convolutional layer in the code uses 20 kernels and the second uses 50, rather than the 6 and 16 shown in the LeNet5 figure above.
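As a quick check of what these kernel counts imply, this sketch traces the feature-map sizes through the two convolution + pooling stages. It assumes 28x28 MNIST inputs, 5x5 kernels, and 2x2 pooling, which are the values used in the code later in this article; the helper name is hypothetical.

```python
def conv_pool_side(size, kernel, pool):
    """Side length after a 'valid' convolution followed by non-overlapping pooling."""
    conv = size - kernel + 1
    return conv // pool


nkerns = [20, 50]   # kernels per convolution + pooling layer
side = 28           # MNIST images are 28x28
trace = []
for n in nkerns:
    side = conv_pool_side(side, kernel=5, pool=2)
    trace.append((n, side, side))

# Number of values per image handed to the fully connected layer
flat = nkerns[-1] * side * side
```

Here `trace` is `[(20, 12, 12), (50, 4, 4)]` and `flat` is 800, matching the `n_in=nkerns[1]*4*4` used for the hidden layer.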

With that in mind, here is the code.

(1) Import the required modules

[python]

import cPickle
import gzip
import os
import sys
import time

import numpy

import theano
import theano.tensor as T
from theano.tensor.signal import downsample
from theano.tensor.nnet import conv

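(2) Define the basic CNN building blocks

The building blocks of the CNN are the convolution + subsampling layer, the hidden layer, and the classifier, as follows.

• Define LeNetConvPoolLayer (convolution + subsampling layer)

See the comments in the code: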

[python]

"""
Convolution and subsampling combined into one layer, LeNetConvPoolLayer.

rng: random number generator used to initialize W
input: symbolic image tensor of shape image_shape
filter_shape: (number of filters, num input feature maps, filter height, filter width)
image_shape: (batch size, num input feature maps, image height, image width)
poolsize: (#rows, #cols)
"""

class LeNetConvPoolLayer(object):
    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
        # assert condition: execution continues if the condition is True
        # and the program aborts if it is False.
        # image_shape[1] and filter_shape[1] are both the number of input
        # feature maps, so they must be equal.
        assert image_shape[1] == filter_shape[1]
        self.input = input

        # Each hidden unit (i.e. each output pixel) has
        # num input feature maps * filter height * filter width
        # connections to the previous layer, which is
        # numpy.prod(filter_shape[1:]).
        fan_in = numpy.prod(filter_shape[1:])
        # Each unit in the lower layer receives gradients from
        # "num output feature maps * filter height * filter width" / pooling size
        fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) /
                   numpy.prod(poolsize))

        # Plug fan_in and fan_out into the bound formula and use it to
        # initialize W randomly. W holds the linear convolution kernels.
        W_bound = numpy.sqrt(6. / (fan_in + fan_out))
        self.W = theano.shared(
            numpy.asarray(
                rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
                dtype=theano.config.floatX
            ),
            borrow=True
        )

        # the bias is a 1D tensor -- one bias per output feature map
        # The number of output feature maps equals the number of filters,
        # so b is sized with filter_shape[0] (number of filters).
        b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, borrow=True)

        # Convolve the input images with the filters using conv.conv2d.
        # Simplification: no bias and no sigmoid right after the convolution.
        conv_out = conv.conv2d(
            input=input,
            filters=self.W,
            filter_shape=filter_shape,
            image_shape=image_shape
        )

        # Max pooling (the subsampling step)
        pooled_out = downsample.max_pool_2d(
            input=conv_out,
            ds=poolsize,
            ignore_border=True
        )

        # Add the bias, then apply tanh to get the final output of the
        # convolution + subsampling layer.
        # b is a 1D vector, so dimshuffle is used to reshape it: if b has
        # shape (10,), b.dimshuffle('x', 0, 'x', 'x') has shape (1, 10, 1, 1).
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))

        # Parameters of the convolution + subsampling layer
        self.params = [self.W, self.b]
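The bias broadcasting done by `dimshuffle('x', 0, 'x', 'x')` can be mimicked in plain NumPy with a reshape. The sketch below uses small hypothetical sizes and a zero tensor in place of the pooled feature maps; it illustrates the broadcasting rule rather than the Theano implementation.

```python
import numpy as np

batch_size, n_filters, h, w = 2, 3, 4, 4          # small hypothetical sizes
pooled_out = np.zeros((batch_size, n_filters, h, w))
b = np.array([0.1, 0.2, 0.3])                     # one bias per output feature map

# NumPy counterpart of b.dimshuffle('x', 0, 'x', 'x'): shape (3,) -> (1, 3, 1, 1)
b4d = b.reshape(1, n_filters, 1, 1)

# The size-1 axes broadcast over the batch, row and column dimensions,
# so every pixel of feature map k receives b[k].
output = np.tanh(pooled_out + b4d)
```

Every entry of `output[:, k]` equals `tanh(b[k])`, because the same bias is shared by all positions of feature map `k` in every image of the batch.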

• Define the hidden layer, HiddenLayer

This is the same HiddenLayer as in the previous article, so it is reused directly:

[python]

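"""
Notes:

This class defines the hidden layer. Its input is `input`, and its output has
one value per hidden neuron. The input layer and the hidden layer are fully
connected.

Suppose the input is an n_in-dimensional vector (equivalently, n_in neurons)
and the hidden layer has n_out neurons. Because of the full connectivity there
are n_in*n_out weights, so W has shape (n_in, n_out), i.e. n_in rows and n_out
columns, and each column holds the connection weights of one hidden neuron.

b is the bias. The hidden layer has n_out neurons, so b is an n_out-dimensional
vector.

rng: random number generator used to initialize W

input: all the inputs used to train the model, not the input layer of the MLP
itself. The MLP input layer has n_in neurons, whereas `input` has shape
(n_example, n_in): one sample per row, and each row is fed to the input layer.

activation: the activation function, tanh by default
"""

class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None,
                 activation=T.tanh):
        self.input = input  # the input of HiddenLayer is the input passed in
        """
        Notes:

        There is a rule for initializing W: with tanh, draw the values
        uniformly from -sqrt(6./(n_in+n_hidden)) to sqrt(6./(n_in+n_hidden));
        with sigmoid, multiply those values by 4.
        """
        # If W is not given, initialize it as described above.
        # The check exists because W can also be initialized from
        # previously trained parameters (see the previous article).
        if W is None:
            W_values = numpy.asarray(
                rng.uniform(
                    low=-numpy.sqrt(6. / (n_in + n_out)),
                    high=numpy.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)
                ),
                dtype=theano.config.floatX
            )
            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4
            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        # Use the W and b defined above as the layer's W and b
        self.W = W
        self.b = b

        # Output of the hidden layer
        lin_output = T.dot(input, self.W) + self.b
        self.output = (
            lin_output if activation is None
            else activation(lin_output)
        )

        # Parameters of the hidden layer
        self.params = [self.W, self.b]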
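For readers without Theano at hand, the hidden layer's initialization and forward pass can be sketched in NumPy. The function names `init_hidden` and `hidden_forward`, the seed, and the batch of 4 random samples are assumptions for illustration; the sizes 800 and 500 are the ones the full model uses later.

```python
import numpy as np


def init_hidden(rng, n_in, n_out):
    """Uniform initialization in [-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))]
    for a tanh layer; the bias starts at zero."""
    bound = np.sqrt(6.0 / (n_in + n_out))
    W = rng.uniform(low=-bound, high=bound, size=(n_in, n_out))
    b = np.zeros(n_out)
    return W, b


def hidden_forward(x, W, b):
    """x has shape (n_example, n_in); the result has shape (n_example, n_out)."""
    return np.tanh(np.dot(x, W) + b)


rng = np.random.RandomState(1234)
W, b = init_hidden(rng, n_in=800, n_out=500)
x = rng.rand(4, 800)                 # four hypothetical flattened samples
out = hidden_forward(x, W, b)
```

Each row of `out` is the hidden representation of one sample, which is why the layer can process a whole minibatch with a single matrix product.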

• Define the classifier (softmax regression)

The classifier is softmax, the same LogisticRegression class as in the earlier article, so it is reused directly:

[python]

"""
Define the classification layer LogisticRegression, i.e. softmax regression.

The deep learning tutorial treats LogisticRegression as softmax directly; the
familiar two-class logistic regression is LogisticRegression with n_out=2.
"""

# Parameters:
# input: shape (n_example, n_in), where n_example is the size of one batch.
#        Training uses minibatch SGD, which is why input is defined this way.
# n_in: size of the output of the previous (hidden) layer
# n_out: number of output classes
class LogisticRegression(object):
    def __init__(self, input, n_in, n_out):
        # W has n_in rows and n_out columns, and b is an n_out-dimensional
        # vector: each output corresponds to one column of W and one
        # element of b.
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # input is (n_example, n_in) and W is (n_in, n_out); their matrix
        # product is (n_example, n_out), to which the bias b is added.
        # Each row of p_y_given_x therefore holds one sample's estimated
        # probability for every class.
        # Note: b is an n_out-dimensional vector added to an
        # (n_example, n_out) matrix. Internally b is replicated n_example
        # times, so b is added to every row of the matrix.
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # argmax returns the index of the largest value. The dataset here is
        # MNIST, so the index is exactly the class. axis=1 means the
        # operation is applied row by row.
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)

        # Parameters of LogisticRegression
        self.params = [self.W, self.b]
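The training code later in this article calls `negative_log_likelihood(y)` and `errors(y)` on this layer, which the listing above does not show. The NumPy sketch below illustrates what such methods conventionally compute (mean negative log-probability of the correct class, and the fraction of misclassified samples). It is an assumption-labeled illustration, not the tutorial's Theano code, and the tiny shapes and labels are made up.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; subtracting the row maximum keeps exp() stable."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def negative_log_likelihood(p_y_given_x, y):
    """Mean of -log p(correct class) over the minibatch."""
    return -np.mean(np.log(p_y_given_x[np.arange(y.shape[0]), y]))


def errors(p_y_given_x, y):
    """Zero-one loss: fraction of samples whose argmax differs from the label."""
    return np.mean(np.argmax(p_y_given_x, axis=1) != y)


n_in, n_out = 5, 3
W = np.zeros((n_in, n_out))          # zero initialization, as in the class above
b = np.zeros(n_out)
x = np.ones((4, n_in))               # four hypothetical samples
y = np.array([0, 1, 2, 0])           # hypothetical labels

p = softmax(np.dot(x, W) + b)        # every row is [1/3, 1/3, 1/3]
nll = negative_log_likelihood(p, y)
err = errors(p, y)
```

With all-zero weights every class gets probability 1/3, so the loss is log(3) whatever the labels are, and argmax breaks the tie in favour of class 0.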

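All the basic CNN building blocks are now in place. The next step is to assemble them into LeNet5 (the simplified version described above). Concretely:

LeNet5 = input + LeNetConvPoolLayer_1 + LeNetConvPoolLayer_2 + HiddenLayer + LogisticRegression + output.

The model is then applied to the MNIST dataset and solved with the BP algorithm to obtain the optimal parameters.

(3) Load the MNIST dataset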

[python]

"""
Load the MNIST dataset: load_data()
"""

def load_data(dataset):
    # dataset is the path to the data file. The function first checks
    # whether the MNIST data exists at that path and downloads it if not.
    # This part is not explained further; it is unrelated to the softmax
    # regression algorithm.
    # Assumption: the file name 'mnist.pkl.gz' and the download URL below
    # are taken from the Theano Deep Learning Tutorials.
    data_dir, data_file = os.path.split(dataset)
    if data_dir == "" and not os.path.isfile(dataset):
        # Check if dataset is in the data directory.
        new_path = os.path.join(
            os.path.split(__file__)[0],
            "..",
            "data",
            dataset
        )
        if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
            dataset = new_path

    if (not os.path.isfile(dataset)) and data_file == 'mnist.pkl.gz':
        import urllib
        origin = (
            'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz'
        )
        print 'Downloading data from %s' % origin
        urllib.urlretrieve(origin, dataset)

    print '... loading data'

    # This mainly uses Python's gzip.open() and cPickle.load().
    # 'rb' opens the file for reading in binary mode.
    f = gzip.open(dataset, 'rb')
    train_set, valid_set, test_set = cPickle.load(f)
    f.close()

    # Store the data in shared variables, mainly for GPU acceleration:
    # only shared variables can be kept in GPU memory.
    # Data on the GPU must be float. data_y holds class labels, so it is
    # cast back to int before being returned.
    def shared_dataset(data_xy, borrow=True):
        data_x, data_y = data_xy
        shared_x = theano.shared(numpy.asarray(data_x,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        shared_y = theano.shared(numpy.asarray(data_y,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        return shared_x, T.cast(shared_y, 'int32')

    test_set_x, test_set_y = shared_dataset(test_set)
    valid_set_x, valid_set_y = shared_dataset(valid_set)
    train_set_x, train_set_y = shared_dataset(train_set)

    rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
            (test_set_x, test_set_y)]
    return rval

(4) Implement LeNet5 and test it

[python]

"""
Implement LeNet5.

This LeNet5 has two convolutional layers: the first has 20 kernels and the
second has 50.
"""

# Assumption: the default dataset file name 'mnist.pkl.gz' follows the
# Theano Deep Learning Tutorials.
def evaluate_lenet5(learning_rate=0.1, n_epochs=200,
                    dataset='mnist.pkl.gz',
                    nkerns=[20, 50], batch_size=500):
    """
    learning_rate: the learning rate, i.e. the factor in front of the
                   stochastic gradient.
    n_epochs: number of training epochs; each epoch goes through every
              batch, i.e. every sample.
    batch_size: set to 500 here, so the gradient is computed and the
                parameters are updated after every 500 samples.
    nkerns=[20, 50]: number of kernels in each LeNetConvPoolLayer; the first
                     has 20 kernels and the second has 50.
    """
    rng = numpy.random.RandomState(23455)

    # Load the data
    datasets = load_data(dataset)
    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # Compute the number of batches
    n_train_batches = train_set_x.get_value(borrow=True).shape[0]
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
    n_test_batches = test_set_x.get_value(borrow=True).shape[0]
    n_train_batches /= batch_size
    n_valid_batches /= batch_size
    n_test_batches /= batch_size

    # Define a few variables: index is the batch index, x is the input
    # training data, and y holds the corresponding labels.
    index = T.lscalar()
    x = T.matrix('x')
    y = T.ivector('y')

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print '... building the model'

    # A loaded batch has shape (batch_size, 28*28), but LeNetConvPoolLayer
    # expects a 4D input, so it has to be reshaped.
    layer0_input = x.reshape((batch_size, 1, 28, 28))

    # layer0 is the first LeNetConvPoolLayer.
    # A single (28, 28) input image becomes (28-5+1, 28-5+1) = (24, 24)
    # after the convolution and (24/2, 24/2) = (12, 12) after max pooling.
    # Each batch has batch_size images and the first LeNetConvPoolLayer has
    # nkerns[0] kernels, so the output of layer0 has shape
    # (batch_size, nkerns[0], 12, 12).
    layer0 = LeNetConvPoolLayer(
        rng,
        input=layer0_input,
        image_shape=(batch_size, 1, 28, 28),
        filter_shape=(nkerns[0], 1, 5, 5),
        poolsize=(2, 2)
    )

    # layer1 is the second LeNetConvPoolLayer.
    # Its input is the output of layer0. Each (12, 12) feature map becomes
    # (12-5+1, 12-5+1) = (8, 8) after the convolution and
    # (8/2, 8/2) = (4, 4) after max pooling.
    # Each batch has batch_size sets of feature maps and the second
    # LeNetConvPoolLayer has nkerns[1] kernels, so the output of layer1 has
    # shape (batch_size, nkerns[1], 4, 4).
    layer1 = LeNetConvPoolLayer(
        rng,
        input=layer0.output,
        # nkerns[0] input feature maps, i.e. the nkerns[0] maps output by layer0
        image_shape=(batch_size, nkerns[0], 12, 12),
        filter_shape=(nkerns[1], nkerns[0], 5, 5),
        poolsize=(2, 2)
    )

    # With the two LeNetConvPoolLayers (layer0 and layer1) defined, layer1
    # is followed by layer2, a fully connected layer equivalent to the
    # hidden layer of an MLP, so it can be built with the HiddenLayer class.
    # The input of layer2 is 2D, (batch_size, num_pixels), so the feature
    # maps produced from one image by the different kernels have to be
    # merged into a single 1D vector.
    # That is, the output of layer1, (batch_size, nkerns[1], 4, 4), is
    # flattened to (batch_size, nkerns[1]*4*4) = (500, 800) and used as the
    # input of layer2. (500, 800) means 500 samples, one per row.
    # The output of layer2 has shape (batch_size, n_out) = (500, 500).
    layer2_input = layer1.output.flatten(2)
    layer2 = HiddenLayer(
        rng,
        input=layer2_input,
        n_in=nkerns[1] * 4 * 4,
        n_out=500,
        activation=T.tanh
    )

    # The last layer, layer3, is the classification layer, built with the
    # LogisticRegression class defined for logistic regression.
    # Its input is the output of layer2, (500, 500), and its output has
    # shape (batch_size, n_out) = (500, 10).
    layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

    # Cost function: negative log-likelihood (NLL)
    cost = layer3.negative_log_likelihood(y)

    # test_model computes the test error. x and y are bound to concrete data
    # by the given index, and then layer3 is evaluated. layer3 in turn pulls
    # in layer2, layer1 and layer0, so test_model is effectively the whole
    # CNN.
    # The input of test_model is x and y; its output is the value of
    # layer3.errors(y), i.e. the error.
    test_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # validate_model evaluates on the validation set; same analysis as above.
    validate_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # train_model comes next. It involves the optimization algorithm (SGD),
    # which needs gradients and parameter updates.
    # The set of parameters
    params = layer3.params + layer2.params + layer1.params + layer0.params

    # The gradient of the cost with respect to each parameter
    grads = T.grad(cost, params)

    # There are too many parameters to write each update rule by hand, so a
    # for ... in ... comprehension generates the pairs
    # (param_i, param_i - learning_rate * grad_i) automatically.
    updates = [
        (param_i, param_i - learning_rate * grad_i)
        for param_i, grad_i in zip(params, grads)
    ]

    # train_model is analysed the same way as test_model. Compared with
    # test_model and validate_model it additionally has the updates rule.
    train_model = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
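The two mechanical ideas in these functions, the per-parameter SGD update and the minibatch slice selected by `index`, can be tried out without Theano. The sketch below uses a toy cost (the sum of squared parameters) whose gradient is known in closed form; the helper names and the toy values are assumptions for illustration.

```python
def sgd_step(params, grads, learning_rate):
    """Return the new values param_i - learning_rate * grad_i for each pair."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


def batch_slice(index, batch_size):
    """Half-open range of sample indices covered by minibatch number `index`."""
    return index * batch_size, (index + 1) * batch_size


# Toy cost: sum of p**2, so the gradient with respect to each p is 2*p.
params = [1.0, -2.0]
for _ in range(3):
    grads = [2.0 * p for p in params]
    params = sgd_step(params, grads, learning_rate=0.1)

start, stop = batch_slice(2, 500)    # third minibatch of size 500
```

Each step multiplies every parameter by 0.8, so after three steps they are approximately 0.512 and -1.024, and the third minibatch covers samples 1000 up to (but not including) 1500.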

    ###############
    # TRAIN MODEL #
    ###############
    print '... training'
    patience = 10000
    patience_increase = 2
    improvement_threshold = 0.995
    # This setting guarantees a validation pass in every epoch.
    validation_frequency = min(n_train_batches, patience / 2)

    best_validation_loss = numpy.inf  # best (i.e. lowest) validation loss
    # Best iteration, counted in batches. For example, best_iter=10000 means
    # best_validation_loss was reached after training on the 10000th batch.
    best_iter = 0
    test_score = 0.
    start_time = time.clock()

    epoch = 0
    done_looping = False

    # The training process follows. The while loop controls the epoch count;
    # one epoch goes through every batch, i.e. every image.
    # The for loop goes through the batches one at a time and trains on each
    # with train_model(minibatch_index), whose updates adjust the parameters.
    # The for loop also tracks iter, the number of batches trained so far.
    # Whenever iter + 1 is a multiple of validation_frequency, the model is
    # evaluated on the validation set.
    # If this_validation_loss is lower than best_validation_loss, then
    # best_validation_loss and best_iter are updated and the model is also
    # evaluated on the test set.
    # If this_validation_loss is lower than
    # best_validation_loss * improvement_threshold, patience is updated.
    # Training ends when the maximum epoch count n_epochs is reached or
    # when patience <= iter.
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):

            iter = (epoch - 1) * n_train_batches + minibatch_index

            if iter % 100 == 0:
                print 'training @ iter = ', iter
            # cost_ij is not used afterwards; train_model is called for its
            # updates and simply happens to return a value.
            cost_ij = train_model(minibatch_index)

            if (iter + 1) % validation_frequency == 0:
                # compute zero-one loss on validation set
                validation_losses = [validate_model(i) for i
                                     in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)
                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches,
                       this_validation_loss * 100.))

                if this_validation_loss < best_validation_loss:
                    if this_validation_loss < best_validation_loss * \
                            improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    test_losses = [
                        test_model(i)
                        for i in xrange(n_test_batches)
                    ]
                    test_score = numpy.mean(test_losses)
                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print('Optimization complete.')
    print('Best validation score of %f %% obtained at iteration %i, '
          'with test performance %f %%' %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print >> sys.stderr, ('The code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((end_time - start_time) / 60.))
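The patience-based early stopping can be studied in isolation by replaying it over a fixed list of validation losses. The sketch below is a simplified version under stated assumptions: validation and the patience check both happen once, at the end of each epoch (the loop above checks patience after every minibatch), and the function name and loss values are made up for illustration.

```python
def run_early_stopping(val_losses, n_train_batches, patience=10000,
                       patience_increase=2, improvement_threshold=0.995):
    """Replay the patience logic over per-epoch validation losses.

    Returns (epochs_run, best_loss, best_iter).
    """
    best_loss = float('inf')
    best_iter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        it = epoch * n_train_batches - 1   # index of the epoch's last minibatch
        if loss < best_loss:
            # Only a sufficiently large improvement extends the patience.
            if loss < best_loss * improvement_threshold:
                patience = max(patience, it * patience_increase)
            best_loss, best_iter = loss, it
        if patience <= it:
            return epoch, best_loss, best_iter   # stopped early
    return len(val_losses), best_loss, best_iter


result = run_early_stopping([0.10, 0.05, 0.05, 0.05, 0.05],
                            n_train_batches=100, patience=150)
```

In this run the loss improves in epochs 1 and 2, which pushes patience to 398 iterations; with no further improvement, training stops at the end of epoch 4, so `result` is `(4, 0.05, 199)`.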

Convolutional Neural Networks (LeNet)

The Convolution Operator

ConvOp is the main workhorse for implementing a convolutional layer in Theano. ConvOp is used by conv2d, which takes two symbolic inputs:

• a 4D tensor corresponding to a mini-batch of input images. The shape of the tensor is as follows: [mini-batch size, number of input feature maps, image height, image width].

• a 4D tensor corresponding to the weight matrix. The shape of the tensor is: [number of feature maps at layer m, number of feature maps at layer m-1, filter height, filter width]
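To see how these two 4D shapes combine, here is a naive NumPy sketch of a "valid" 2D convolution over a mini-batch. It is an illustration of the shape bookkeeping, not Theano's implementation. The kernel flip follows the mathematical definition of convolution; whether a particular library flips its filters is something to check in its documentation. The function name and the small all-ones inputs are assumptions.

```python
import numpy as np


def conv2d_valid(images, filters):
    """Naive 'valid' 2D convolution.

    images:  (batch, in_maps, height, width)
    filters: (out_maps, in_maps, f_height, f_width)
    returns: (batch, out_maps, height - f_height + 1, width - f_width + 1)
    """
    n, c, h, w = images.shape
    m, c2, fh, fw = filters.shape
    assert c == c2, "input feature map counts must match"
    flipped = filters[:, :, ::-1, ::-1]       # true convolution flips the kernel
    out = np.zeros((n, m, h - fh + 1, w - fw + 1))
    for i in range(h - fh + 1):
        for j in range(w - fw + 1):
            patch = images[:, :, i:i + fh, j:j + fw]      # (n, c, fh, fw)
            # Sum over input maps and kernel positions for every
            # (image, filter) pair, giving an (n, m) block.
            out[:, :, i, j] = np.tensordot(
                patch, flipped, axes=([1, 2, 3], [1, 2, 3]))
    return out


images = np.ones((1, 3, 6, 8))     # one small 3-channel image
filters = np.ones((2, 3, 3, 3))    # two 3x3 filters over 3 input maps
out = conv2d_valid(images, filters)
```

The output has shape (1, 2, 4, 6), and every entry equals 27 because each output value sums 3 input maps times 3x3 kernel positions of ones.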

Below is the Theano code for implementing a convolutional layer similar to the one of Figure 1. The input consists of 3 feature maps (an RGB color image) of size 120x160. We use two convolutional filters with 9x9 receptive fields.

[python]

import theano
from theano import tensor as T
from theano.tensor.nnet import conv2d

import numpy

rng = numpy.random.RandomState(23455)

# instantiate 4D tensor for input
input = T.tensor4(name='input')

# initialize shared variable for weights.
w_shp = (2, 3, 9, 9)
w_bound = numpy.sqrt(3 * 9 * 9)
W = theano.shared(
    numpy.asarray(
        rng.uniform(
            low=-1.0 / w_bound,
            high=1.0 / w_bound,
            size=w_shp),
        dtype=input.dtype),
    name='W')

# initialize shared variable for bias (1D tensor) with random values
# IMPORTANT: biases are usually initialized to zero. However in this
# particular application, we simply apply the convolutional layer to
# an image without learning the parameters. We therefore initialize
# them to random values to "simulate" learning.
b_shp = (2,)
b = theano.shared(numpy.asarray(
        rng.uniform(low=-.5, high=.5, size=b_shp),
        dtype=input.dtype), name='b')

# build symbolic expression that computes the convolution of input with
# filters in w
conv_out = conv2d(input, W)

# build symbolic expression to add bias and apply activation function,
# i.e. produce neural net layer output
# A few words on ``dimshuffle`` :
#   ``dimshuffle`` is a powerful tool in reshaping a tensor;
#   what it allows you to do is to shuffle dimension around
#   but also to insert new ones along which the tensor will be
#   broadcastable;
#   dimshuffle('x', 2, 'x', 0, 1)
#   This will work on 3d tensors with no broadcastable
#   dimensions. The first dimension will be broadcastable,
#   then we will have the third dimension of the input tensor as
#   the second of the resulting tensor, etc. If the tensor has
#   shape (20, 30, 40), the resulting tensor will have dimensions
#   (1, 40, 1, 20, 30). (AxBxC tensor is mapped to 1xCx1xAxB tensor)
#   More examples:
#    dimshuffle('x') -> make a 0d (scalar) into a 1d vector
#    dimshuffle(0, 1) -> identity
#    dimshuffle(1, 0) -> inverts the first and second dimensions
#    dimshuffle('x', 0) -> make a row out of a 1d vector (N to 1xN)
#    dimshuffle(0, 'x') -> make a column out of a 1d vector (N to Nx1)
#    dimshuffle(2, 0, 1) -> AxBxC to CxAxB
#    dimshuffle(0, 'x', 1) -> AxB to Ax1xB
#    dimshuffle(1, 'x', 0) -> AxB to Bx1xA
output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))

# create theano function to compute filtered images
f = theano.function([input], output)

MaxPooling
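As a plain NumPy reference for what non-overlapping max pooling computes, the sketch below pools the last two axes of an array and shows the effect of an `ignore_border` flag on a 5x5 input. The function is a hypothetical stand-in written for illustration, and its handling of the border (incomplete windows are dropped when the flag is True and pooled over the remaining values when it is False) is my reading of the flag's meaning rather than the library's code.

```python
import numpy as np


def max_pool_2d(x, pool=(2, 2), ignore_border=True):
    """Non-overlapping max pooling over the last two axes of x."""
    ph, pw = pool
    h, w = x.shape[-2:]
    if ignore_border:
        out_h, out_w = h // ph, w // pw            # drop incomplete windows
    else:
        out_h, out_w = -(-h // ph), -(-w // pw)    # ceiling division keeps them
    out = np.empty(x.shape[:-2] + (out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Slicing past the edge simply yields a smaller window.
            window = x[..., i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            out[..., i, j] = window.max(axis=(-2, -1))
    return out


x = np.arange(25, dtype=float).reshape(1, 1, 5, 5)
cropped = max_pool_2d(x, ignore_border=True)    # shape (1, 1, 2, 2)
kept = max_pool_2d(x, ignore_border=False)      # shape (1, 1, 3, 3)
```

For the 5x5 input holding 0 to 24, `cropped[0, 0]` is `[[6, 8], [16, 18]]`, while `kept[0, 0]` also includes the last row and column: `[[6, 8, 9], [16, 18, 19], [21, 23, 24]]`.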

[python]

from theano.tensor.signal import pool

input = T.dtensor4('input')
maxpool_shape = (2, 2)
pool_out = pool.pool_2d(input, maxpool_shape, ignore_border=True)
f = theano.function([input], pool_out)

invals = numpy.random.RandomState(1).rand(3, 2, 5, 5)
print 'With ignore_border set to True:'
print 'invals[0, 0, :, :] =\n', invals[0, 0, :, :]
print 'output[0, 0, :, :] =\n', f(invals)[0, 0, :, :]

pool_out = pool.pool_2d(input, maxpool_shape, ignore_border=False)
f = theano.function([input], pool_out)
print 'With ignore_border set to False:'
print 'invals[1, 0, :, :] =\n ', invals[1, 0, :, :]
print 'output[1, 0, :, :] =\n ', f(invals)[1, 0, :, :]

The Full Model: LeNet

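Note that the term "convolution" can correspond to different mathematical operations. One of them is the convolution used in the original LeNet model: in that work, each output feature map is connected only to a subset of the input feature maps.

Here we use the variant in which every output feature map is connected to every input feature map, so this model differs slightly from the original LeNet paper. One reason to use the subset-connection scheme would be to reduce the amount of computation, but modern hardware makes the full connection pattern just as fast. Another reason would be to slightly reduce the number of free parameters, but other regularization techniques are available for that.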

[python]

class LeNetConvPoolLayer(object):
    """Pool Layer of a convolutional network """

    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
        """
        Allocate a LeNetConvPoolLayer with shared variable internal parameters.

        :param rng: a random number generator used to initialize weights

        :param input: symbolic image tensor, of shape image_shape

        :type filter_shape: tuple or list of length 4
        :param filter_shape: (number of filters, num input feature maps,
                              filter height, filter width)

        :type image_shape: tuple or list of length 4
        :param image_shape: (batch size, num input feature maps,
                             image height, image width)

        :type poolsize: tuple or list of length 2
        :param poolsize: the downsampling (pooling) factor (#rows, #cols)
        """

        assert image_shape[1] == filter_shape[1]
        self.input = input

        # there are "num input feature maps * filter height * filter width"
        # inputs to each hidden unit
        fan_in = numpy.prod(filter_shape[1:])
        # each unit in the lower layer receives a gradient from:
        # "num output feature maps * filter height * filter width" /
        #   pooling size
        fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) //
                   numpy.prod(poolsize))
        # initialize weights with random weights
        W_bound = numpy.sqrt(6. / (fan_in + fan_out))
        self.W = theano.shared(
            numpy.asarray(
                rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
                dtype=theano.config.floatX
            ),
            borrow=True
        )

        # the bias is a 1D tensor -- one bias per output feature map
        b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, borrow=True)

        # convolve input feature maps with filters
        conv_out = conv2d(
            input=input,
            filters=self.W,
            filter_shape=filter_shape,
            input_shape=image_shape
        )

        # pool each feature map individually, using maxpooling
        pooled_out = pool.pool_2d(
            input=conv_out,
            ds=poolsize,
            ignore_border=True
        )

        # add the bias term. Since the bias is a vector (1D array), we first
        # reshape it to a tensor of shape (1, n_filters, 1, 1). Each bias will
        # thus be broadcasted across mini-batches and feature map
        # width & height
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))

        # store parameters of this layer
        self.params = [self.W, self.b]

        # keep track of model input
        self.input = input

Note that when the weights are initialized, the fan-in is determined by the size of the receptive field and the number of input feature maps.
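A small worked example of this fan-in and fan-out rule, using only the standard library. The filter shape (20, 1, 5, 5) and pool size (2, 2) are the first-layer values used in this article; the helper name is hypothetical.

```python
import math


def conv_pool_w_bound(filter_shape, poolsize):
    """Return (fan_in, fan_out, W_bound) for a convolution + pooling layer."""
    n_filters, n_in_maps, fh, fw = filter_shape
    # Inputs feeding each hidden unit: input maps * receptive field size
    fan_in = n_in_maps * fh * fw
    # Units each lower-layer unit sends gradient to, reduced by pooling
    fan_out = n_filters * fh * fw // (poolsize[0] * poolsize[1])
    return fan_in, fan_out, math.sqrt(6.0 / (fan_in + fan_out))


fan_in, fan_out, bound = conv_pool_w_bound((20, 1, 5, 5), (2, 2))
```

For the first layer this gives a fan-in of 25, a fan-out of 125, and a bound of sqrt(6 / 150) = 0.2, so the weights are drawn uniformly from [-0.2, 0.2].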

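Finally, using the LogisticRegression class defined in "Classifying MNIST digits using Logistic Regression" and the HiddenLayer class defined in "Multilayer Perceptron", we can instantiate the network as follows.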

[python]

x = T.matrix('x')   # the data is presented as rasterized images
y = T.ivector('y')  # the labels are presented as 1D vector of
                    # [int] labels

######################
# BUILD ACTUAL MODEL #
######################
print('... building the model')

# Reshape matrix of rasterized images of shape (batch_size, 28 * 28)
# to a 4D tensor, compatible with our LeNetConvPoolLayer
# (28, 28) is the size of MNIST images.
layer0_input = x.reshape((batch_size, 1, 28, 28))

# Construct the first convolutional pooling layer:
# filtering reduces the image size to (28-5+1 , 28-5+1) = (24, 24)
# maxpooling reduces this further to (24/2, 24/2) = (12, 12)
# 4D output tensor is thus of shape (batch_size, nkerns[0], 12, 12)
layer0 = LeNetConvPoolLayer(
    rng,
    input=layer0_input,
    image_shape=(batch_size, 1, 28, 28),
    filter_shape=(nkerns[0], 1, 5, 5),
    poolsize=(2, 2)
)

# Construct the second convolutional pooling layer
# filtering reduces the image size to (12-5+1, 12-5+1) = (8, 8)
# maxpooling reduces this further to (8/2, 8/2) = (4, 4)
# 4D output tensor is thus of shape (batch_size, nkerns[1], 4, 4)
layer1 = LeNetConvPoolLayer(
    rng,
    input=layer0.output,
    image_shape=(batch_size, nkerns[0], 12, 12),
    filter_shape=(nkerns[1], nkerns[0], 5, 5),
    poolsize=(2, 2)
)

# the HiddenLayer being fully-connected, it operates on 2D matrices of
# shape (batch_size, num_pixels) (i.e matrix of rasterized images).
# This will generate a matrix of shape (batch_size, nkerns[1] * 4 * 4),
# or (500, 50 * 4 * 4) = (500, 800) with the default values.
layer2_input = layer1.output.flatten(2)

# construct a fully-connected sigmoidal layer
layer2 = HiddenLayer(
    rng,
    input=layer2_input,
    n_in=nkerns[1] * 4 * 4,
    n_out=500,
    activation=T.tanh
)

# classify the values of the fully-connected sigmoidal layer
layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

# the cost we minimize during training is the NLL of the model
cost = layer3.negative_log_likelihood(y)

# create a function to compute the mistakes that are made by the model
test_model = theano.function(
    [index],
    layer3.errors(y),
    givens={
        x: test_set_x[index * batch_size: (index + 1) * batch_size],
        y: test_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

validate_model = theano.function(
    [index],
    layer3.errors(y),
    givens={
        x: valid_set_x[index * batch_size: (index + 1) * batch_size],
        y: valid_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

# create a list of all model parameters to be fit by gradient descent
params = layer3.params + layer2.params + layer1.params + layer0.params

# create a list of gradients for all model parameters
grads = T.grad(cost, params)

# train_model is a function that updates the model parameters by
# SGD Since this model has many parameters, it would be tedious to
# manually create an update rule for each model parameter. We thus
# create the updates list by automatically looping over all
# (params[i], grads[i]) pairs.
updates = [
    (param_i, param_i - learning_rate * grad_i)
    for param_i, grad_i in zip(params, grads)
]

train_model = theano.function(
    [index],
    cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

Tips and Tricks
