Abstract: The most practical way to truly master an algorithm is to write it out completely by hand.

LSTM (Long Short-Term Memory) is a special kind of recurrent neural network whose neurons retain historical memory. It addresses the problem that statistical approaches to natural language processing can only consider the most recent n words and ignore everything earlier. Typical uses: word representations (embeddings), sequence-to-sequence learning (predicting an output sentence from an input sentence), machine translation, speech recognition, and more.

What follows first is an LSTM-style binary adder in a little over 100 lines of raw Python, following https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/ (Chinese translation: http://blog.csdn.net/zzukun/article/details/49968129):

```python
import copy, numpy as np
np.random.seed(0)
```
The imports bring in numpy for matrix operations; the fixed random seed makes runs reproducible.

```python
def sigmoid(x):
    output = 1/(1+np.exp(-x))
    return output
```

This declares the sigmoid activation function, a basic building block of neural networks. Common activation functions include sigmoid, tanh, and relu; sigmoid maps to [0, 1] and tanh maps to [-1, 1]. Here x is a vector and the returned output is a vector.

```python
def sigmoid_output_to_derivative(output):
    return output*(1-output)
```

This declares the derivative of sigmoid, expressed in terms of its output.

The idea of the adder: binary addition adds bit by bit and carries a one whenever a column overflows. For training we generate random samples c = a + b: a and b are the inputs, and predicting c is the LSTM's whole job. Training learns the transformation matrices and weights that map the binary forms of a and b to c.

```python
int2binary = {}
```

This dictionary caches the binary representation of every integer, so it can be looked up instead of recomputed each time.

```python
binary_dim = 8
largest_number = pow(2,binary_dim)
```

The binary width is 8 bits, so there are 2^8 = 256 representable integers, 0 through 255 (largest_number = 256).

```python
binary = np.unpackbits(
    np.array([range(largest_number)],dtype=np.uint8).T,axis=1)
for i in range(largest_number):
    int2binary[i] = binary[i]
```

This precomputes the integer-to-binary lookup table.
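As a quick illustration (not part of the original listing), each entry of the table is the 8-bit, most-significant-bit-first representation of the integer:

```python
# Illustrative only: rows from the precomputed table.
print int2binary[9]    # [0 0 0 0 1 0 0 1]
print int2binary[67]   # [0 1 0 0 0 0 1 1]
print int2binary[76]   # [0 1 0 0 1 1 0 0]  (9 + 67 = 76)
```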
```python
alpha = 0.1
input_dim = 2
hidden_dim = 16
output_dim = 1
```

Parameter setup: alpha is the learning rate; input_dim is the input-layer dimension (2, since each step feeds in one bit of a and one bit of b); hidden_dim is the hidden-layer dimension (16 hidden neurons); output_dim is the output-layer dimension (1, one bit of c). Accordingly, the input-to-hidden weight matrix is 2x16, the hidden-to-output matrix is 16x1, and the hidden-to-hidden matrix is 16x16.

```python
synapse_0 = 2*np.random.random((input_dim,hidden_dim)) - 1
synapse_1 = 2*np.random.random((hidden_dim,output_dim)) - 1
synapse_h = 2*np.random.random((hidden_dim,hidden_dim)) - 1
```

np.random.random generates floats in [0, 1); the 2x-1 transform rescales them to [-1, 1).
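A quick shape check (illustrative, not in the original) confirms the dimensions just described:

```python
print synapse_0.shape  # (2, 16)  input -> hidden
print synapse_1.shape  # (16, 1)  hidden -> output
print synapse_h.shape  # (16, 16) hidden -> hidden
```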
```python
synapse_0_update = np.zeros_like(synapse_0)
synapse_1_update = np.zeros_like(synapse_1)
synapse_h_update = np.zeros_like(synapse_h)
```

These three matrices accumulate the update deltas for the corresponding weight matrices.
```python
for j in range(10000):  # 10000 training iterations

    # generate a random sample: c = a + b, keeping both the binary
    # arrays (a, b, c) and the integer forms (a_int, b_int, c_int)
    a_int = np.random.randint(largest_number/2)
    a = int2binary[a_int]
    b_int = np.random.randint(largest_number/2)
    b = int2binary[b_int]
    c_int = a_int + b_int
    c = int2binary[c_int]

    # d holds the model's prediction of c
    d = np.zeros_like(c)

    # overall error, used to observe how well the model is doing
    overallError = 0

    # residuals (deltas) of the second (output) layer; for the derivation
    # of the output-layer delta formula see
    # http://deeplearning.stanford.edu/wiki/index.php/%E5%8F%8D%E5%90%91%E4%BC%A0%E5%AF%BC%E7%AE%97%E6%B3%95
    layer_2_deltas = list()

    # outputs of the first (hidden) layer, seeded with zeros
    # to act as the "previous" time step's value
    layer_1_values = list()
    layer_1_values.append(np.zeros(hidden_dim))

    # forward pass over each binary bit, lowest bit first; binary addition
    # carries state (the carry bit), which is exactly what the LSTM's
    # long short-term memory suits -- each sample's 8 bits form a time series
    for position in range(binary_dim):
        # X and y are the sample's input and output at bit `position`;
        # X has two values per sample, the bits of a and b at that position
        X = np.array([[a[binary_dim - position - 1],b[binary_dim - position - 1]]])
        y = np.array([[c[binary_dim - position - 1]]]).T

        # hidden layer: C_t = sigma(W_0·X_t + W_h·C_(t-1))
        layer_1 = sigmoid(np.dot(X,synapse_0) + np.dot(layer_1_values[-1],synapse_h))
        # output layer: C_2 = sigma(W_1·C_1)
        layer_2 = sigmoid(np.dot(layer_1,synapse_1))

        # error between prediction and truth; backpropagate its delta
        layer_2_error = y - layer_2
        layer_2_deltas.append((layer_2_error)*sigmoid_output_to_derivative(layer_2))
        # accumulate the total error, for display and observation
        overallError += np.abs(layer_2_error[0])

        # store the predicted bit and this step's hidden-layer output
        d[binary_dim - position - 1] = np.round(layer_2[0][0])
        layer_1_values.append(copy.deepcopy(layer_1))

    # hidden-layer delta from the next time step; starts empty
    future_layer_1_delta = np.zeros(hidden_dim)

    # backward pass: walk the time steps in reverse, starting from
    # the highest bit, updating level by level against the time order
    for position in range(binary_dim):
        X = np.array([[a[position],b[position]]])
        layer_1 = layer_1_values[-position-1]        # this step's hidden output
        prev_layer_1 = layer_1_values[-position-2]   # previous step's hidden output
        layer_2_delta = layer_2_deltas[-position-1]  # this step's output-layer delta

        # standard backpropagation, plus the hidden-layer delta
        # arriving from the next time step
        layer_1_delta = (future_layer_1_delta.dot(synapse_h.T) +
            layer_2_delta.dot(synapse_1.T)) * sigmoid_output_to_derivative(layer_1)

        # accumulate weight updates: the gradient w.r.t. a weight matrix
        # is the layer's output (transposed) dotted with the next layer's delta
        synapse_1_update += np.atleast_2d(layer_1).T.dot(layer_2_delta)
        # hidden-to-hidden update: previous step's hidden output
        # dotted with this step's delta
        synapse_h_update += np.atleast_2d(prev_layer_1).T.dot(layer_1_delta)
        # input-layer weight update
        synapse_0_update += X.T.dot(layer_1_delta)

        # remember this step's hidden-layer delta for the next iteration
        future_layer_1_delta = layer_1_delta

    # apply the accumulated updates, then zero the accumulators
    synapse_0 += synapse_0_update * alpha
    synapse_1 += synapse_1_update * alpha
    synapse_h += synapse_h_update * alpha
    synapse_0_update *= 0
    synapse_1_update *= 0
    synapse_h_update *= 0

    # every 1000 samples, print the running error so convergence
    # can be watched while the script runs
    if(j % 1000 == 0):
        print "Error:" + str(overallError)
        print "Pred:" + str(d)
        print "True:" + str(c)
        out = 0
        for index,x in enumerate(reversed(d)):
            out += x*pow(2,index)
        print str(a_int) + " + " + str(b_int) + " = " + str(out)
        print "------------"
```

This is the simplest LSTM-style implementation: it ignores bias terms and has only two input neurons.
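Once trained, a forward-only pass can add two fresh numbers. This is a sketch, not in the original post: `predict_sum` is a hypothetical helper that reuses the functions and trained weights defined above.

```python
def predict_sum(a_int, b_int):
    # hypothetical helper: forward pass only, using the trained weights
    a, b = int2binary[a_int], int2binary[b_int]
    d = np.zeros(binary_dim)
    prev_hidden = np.zeros(hidden_dim)
    for position in range(binary_dim):
        X = np.array([[a[binary_dim - position - 1], b[binary_dim - position - 1]]])
        hidden = sigmoid(np.dot(X, synapse_0) + np.dot(prev_hidden, synapse_h))
        out = sigmoid(np.dot(hidden, synapse_1))
        d[binary_dim - position - 1] = np.round(out[0][0])
        prev_hidden = hidden
    # convert the predicted bits back to an integer
    return sum(int(x) * pow(2, i) for i, x in enumerate(reversed(d)))

print predict_sum(11, 42)  # should print 53 once the network has converged
```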
Now a complete LSTM implementation in Python, faithfully following the paper (the "great intro paper"). Code source: https://github.com/nicodjimenez/lstm; the author's explanation: http://nicodjimenez.github.io/2014/08/08/lstm.html; for the step-by-step picture, refer to the figures at http://colah.github.io/posts/2015-08-Understanding-LSTMs/.

```python
import random
import numpy as np
import math

def sigmoid(x):
    return 1. / (1 + np.exp(-x))
```

This declares the sigmoid function.

```python
def rand_arr(a, b, *args):
    np.random.seed(0)
    return np.random.rand(*args) * (b - a) + a
```

This generates a random matrix with values in [a, b); the shape is given by args.

```python
class LstmParam:
    def __init__(self, mem_cell_ct, x_dim):
        self.mem_cell_ct = mem_cell_ct
        self.x_dim = x_dim
        concat_len = x_dim + mem_cell_ct
        # weight matrices
        self.wg = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wi = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wf = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wo = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        # bias terms
        self.bg = rand_arr(-0.1, 0.1, mem_cell_ct)
        self.bi = rand_arr(-0.1, 0.1, mem_cell_ct)
        self.bf = rand_arr(-0.1, 0.1, mem_cell_ct)
        self.bo = rand_arr(-0.1, 0.1, mem_cell_ct)
        # diffs (derivative of loss function w.r.t. all parameters)
        self.wg_diff = np.zeros((mem_cell_ct, concat_len))
        self.wi_diff = np.zeros((mem_cell_ct, concat_len))
        self.wf_diff = np.zeros((mem_cell_ct, concat_len))
        self.wo_diff = np.zeros((mem_cell_ct, concat_len))
        self.bg_diff = np.zeros(mem_cell_ct)
        self.bi_diff = np.zeros(mem_cell_ct)
        self.bf_diff = np.zeros(mem_cell_ct)
        self.bo_diff = np.zeros(mem_cell_ct)
```

LstmParam holds the shared parameters: mem_cell_ct is the number of LSTM memory cells, x_dim is the dimension of the input data, and concat_len is their sum. wg is the weight matrix of the input node (the cell candidate), wi of the input gate, wf of the forget gate, and wo of the output gate; bg, bi, bf, bo are the corresponding biases; wg_diff, wi_diff, wf_diff, wo_diff and bg_diff, bi_diff, bf_diff, bo_diff are the corresponding loss gradients, initialized to zero with matching shapes.

```python
    def apply_diff(self, lr = 1):
        self.wg -= lr * self.wg_diff
        self.wi -= lr * self.wi_diff
        self.wf -= lr * self.wf_diff
        self.wo -= lr * self.wo_diff
        self.bg -= lr * self.bg_diff
        self.bi -= lr * self.bi_diff
        self.bf -= lr * self.bf_diff
        self.bo -= lr * self.bo_diff
        # reset diffs to zero
        self.wg_diff = np.zeros_like(self.wg)
        self.wi_diff = np.zeros_like(self.wi)
        self.wf_diff = np.zeros_like(self.wf)
        self.wo_diff = np.zeros_like(self.wo)
        self.bg_diff = np.zeros_like(self.bg)
        self.bi_diff = np.zeros_like(self.bi)
        self.bf_diff = np.zeros_like(self.bf)
        self.bo_diff = np.zeros_like(self.bo)
```

apply_diff defines the weight update: subtract the gradients, then zero them out.

```python
class LstmState:
    def __init__(self, mem_cell_ct, x_dim):
        self.g = np.zeros(mem_cell_ct)
        self.i = np.zeros(mem_cell_ct)
        self.f = np.zeros(mem_cell_ct)
        self.o = np.zeros(mem_cell_ct)
        self.s = np.zeros(mem_cell_ct)
        self.h = np.zeros(mem_cell_ct)
        self.bottom_diff_h = np.zeros_like(self.h)
        self.bottom_diff_s = np.zeros_like(self.s)
        self.bottom_diff_x = np.zeros(x_dim)
```

LstmState stores the LSTM cell's state: g, i, f, o, s, h, where s is the internal state (the memory) and h is the hidden-layer output.

```python
class LstmNode:
    def __init__(self, lstm_param, lstm_state):
        # store reference to parameters and to activations
        self.state = lstm_state
        self.param = lstm_param
        # non-recurrent input to node
        self.x = None
        # non-recurrent input concatenated with recurrent input
        self.xc = None
```

An LstmNode corresponds to one input sample (one time step): x is the input sample, and xc is x concatenated with the recurrent input via hstack (hstack concatenates matrices horizontally, vstack vertically).

```python
    def bottom_data_is(self, x, s_prev = None, h_prev = None):
        # if this is the first lstm node in the network
        if s_prev is None: s_prev = np.zeros_like(self.state.s)
        if h_prev is None: h_prev = np.zeros_like(self.state.h)
        # save data for use in backprop
        self.s_prev = s_prev
        self.h_prev = h_prev

        # concatenate x(t) and h(t-1)
        xc = np.hstack((x, h_prev))
        self.state.g = np.tanh(np.dot(self.param.wg, xc) + self.param.bg)
        self.state.i = sigmoid(np.dot(self.param.wi, xc) + self.param.bi)
        self.state.f = sigmoid(np.dot(self.param.wf, xc) + self.param.bf)
        self.state.o = sigmoid(np.dot(self.param.wo, xc) + self.param.bo)
        self.state.s = self.state.g * self.state.i + s_prev * self.state.f
        self.state.h = self.state.s * self.state.o
        self.x = x
        self.xc = xc
```
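For illustration (hypothetical sizes, not from the original code), hstack joins the step's input x and the previous hidden output h_prev into one vector of length concat_len:

```python
import numpy as np
x = np.zeros(50)         # hypothetical x_dim = 50
h_prev = np.zeros(100)   # hypothetical mem_cell_ct = 100
xc = np.hstack((x, h_prev))
print xc.shape           # (150,) == x_dim + mem_cell_ct
```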
bottom and top name the two directions of data flow: samples enter from the bottom, and errors propagate back down from the top. bottom_data_is is the input pass: it concatenates x with the previous output and computes g, i, f, o using the usual wx + b form, with tanh and sigmoid as the activations.

Each time step runs four neural-network layers (four activations). The leftmost, the forget gate, acts directly on the memory C. The second, the input gate, lets the input sample influence the memory C in a learned "proportion"; that proportion comes from the third layer (tanh), whose [-1, 1] range allows both positive and negative influence. The last, the output gate, produces each step's output, which depends on the input sample x, the previous step's output, and the memory C. The design mimics the memory function of a biological neuron.

```python
    def top_diff_is(self, top_diff_h, top_diff_s):
        # notice that top_diff_s is carried along the constant error carousel
        ds = self.state.o * top_diff_h + top_diff_s
        do = self.state.s * top_diff_h
        di = self.state.g * ds
        dg = self.state.i * ds
        df = self.s_prev * ds

        # diffs w.r.t. vector inside sigma / tanh function
        di_input = (1. - self.state.i) * self.state.i * di
        df_input = (1. - self.state.f) * self.state.f * df
        do_input = (1. - self.state.o) * self.state.o * do
        dg_input = (1. - self.state.g ** 2) * dg

        # diffs w.r.t. inputs
        self.param.wi_diff += np.outer(di_input, self.xc)
        self.param.wf_diff += np.outer(df_input, self.xc)
        self.param.wo_diff += np.outer(do_input, self.xc)
        self.param.wg_diff += np.outer(dg_input, self.xc)
        self.param.bi_diff += di_input
        self.param.bf_diff += df_input
        self.param.bo_diff += do_input
        self.param.bg_diff += dg_input

        # compute bottom diff
        dxc = np.zeros_like(self.xc)
        dxc += np.dot(self.param.wi.T, di_input)
        dxc += np.dot(self.param.wf.T, df_input)
        dxc += np.dot(self.param.wo.T, do_input)
        dxc += np.dot(self.param.wg.T, dg_input)

        # save bottom diffs
        self.state.bottom_diff_s = ds * self.state.f
        self.state.bottom_diff_x = dxc[:self.param.x_dim]
        self.state.bottom_diff_h = dxc[self.param.x_dim:]
```

Backpropagation is the core of the whole training process. Suppose at time t the LSTM outputs the prediction h(t) while the true value is y(t); the difference between them is the loss. Take the loss function to be l(t) = f(h(t), y(t)) = ||h(t) - y(t)||^2, the squared Euclidean distance, and the overall loss to be L = ∑ l(t) for t from 1 to T, where T is the maximum length of the sequence.

The ultimate goal is to minimize L by gradient descent: find weights w at which L stops changing under small perturbations of w, i.e. the gradient dL/dw is zero (a local optimum). dL/dw says how much L changes per unit change of w; dh(t)/dw says how much h(t) changes per unit change of w; dL/dh(t) says how much L changes per unit change of h(t). The product (dL/dh_i(t)) * (dh_i(t)/dw) is the contribution of memory cell i at time step t; summing over all i from 1 to M (the number of memory cells) and all t from 1 to T gives the total dL/dw.

For dL/dh(t), note that a unit change of h(t) only affects the local losses from t to T, so define L(t) = ∑ l(s) for s from t to T. Then L(t) = l(t) + L(t+1), and dL(t)/dh(t) = dl(t)/dh(t) + dL(t+1)/dh(t): each step's derivative follows from the next step's, so we compute the derivative at time T first and work backwards. At time T, dL(T)/dh(T) = dl(T)/dh(T).
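The derivation above can be restated compactly (same symbols as in the text):

```latex
l(t) = \lVert h(t) - y(t) \rVert^2, \qquad
L = \sum_{t=1}^{T} l(t), \qquad
L(t) := \sum_{s=t}^{T} l(s) = l(t) + L(t+1)

\frac{dL(t)}{dh(t)} = \frac{dl(t)}{dh(t)} + \frac{dL(t+1)}{dh(t)},
\qquad
\frac{dL(T)}{dh(T)} = \frac{dl(T)}{dh(T)}
```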
```python
class LstmNetwork():
    def __init__(self, lstm_param):
        self.lstm_param = lstm_param
        self.lstm_node_list = []
        # input sequence
        self.x_list = []

    def y_list_is(self, y_list, loss_layer):
        """
        Updates diffs by setting target sequence
        with corresponding loss layer.
        Will *NOT* update parameters.  To update parameters,
        call self.lstm_param.apply_diff()
        """
        assert len(y_list) == len(self.x_list)
        idx = len(self.x_list) - 1
        # first node only gets diffs from label ...
        loss = loss_layer.loss(self.lstm_node_list[idx].state.h, y_list[idx])
        diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
        # here s is not affecting loss due to h(t+1), hence we set equal to zero
        diff_s = np.zeros(self.lstm_param.mem_cell_ct)
        self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
        idx -= 1

        ### ... following nodes also get diffs from next nodes, hence we add diffs to diff_h
        ### we also propagate error along constant error carousel using diff_s
        while idx >= 0:
            loss += loss_layer.loss(self.lstm_node_list[idx].state.h, y_list[idx])
            diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
            diff_h += self.lstm_node_list[idx + 1].state.bottom_diff_h
            diff_s = self.lstm_node_list[idx + 1].state.bottom_diff_s
            self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
            idx -= 1

        return loss
```

diff_h is the numeric computation of dL(t)/dh(t): how much the loss L changes per unit change of the prediction. idx walks from T back towards 1; at each step diff_h is loss_layer.bottom_diff plus the next step's bottom_diff_h (on the first pass, at T, there is no bottom_diff_h to add).

loss_layer.bottom_diff:

```python
    def bottom_diff(self, pred, label):
        diff = np.zeros_like(pred)
        diff[0] = 2 * (pred[0] - label)
        return diff
```

Since l(t) = f(h(t), y(t)) = ||h(t) - y(t)||^2, the derivative is l'(t) = 2 * (h(t) - y(t)).

When s(t) changes, it changes L(t) through two routes: s(t) influences both h(t) and h(t+1); h(t+1) does not affect l(t). So dL(t)/ds(t) is pushed back step by step from t+1 to t via (dL(t)/dh(t)) * (dh(t)/ds(t)). In the cell, self.state.h = self.state.s * self.state.o, i.e. h(t) = s(t) * o(t), hence dh(t)/ds(t) = o(t); dL(t)/dh(t) is top_diff_h.

On the naming in top_diff_is: "Bottom means input to the layer, top means output of the layer. Caffe also uses this terminology." That is, bottom is a network layer's input and top its output, consistent with Caffe.

In def top_diff_is(self, top_diff_h, top_diff_s): top_diff_h is dL(t)/dh(t) at the current step t, and top_diff_s is the memory cell's dL(t)/ds(t) carried back from step t+1.

```python
        ds = self.state.o * top_diff_h + top_diff_s
        do = self.state.s * top_diff_h
        di = self.state.g * ds
        dg = self.state.i * ds
        df = self.s_prev * ds
```

The d prefix means the derivative of the loss L with respect to that quantity. ds computes the current step's dL(t)/ds(t) from the formula above. do computes dL(t)/do(t): since h(t) = s(t) * o(t), dh(t)/do(t) = s(t), so dL(t)/do(t) = (dL(t)/dh(t)) * (dh(t)/do(t)) = top_diff_h * s(t). di computes dL(t)/di(t): since s(t) = f(t) * s(t-1) + i(t) * g(t), dL(t)/di(t) = (dL(t)/ds(t)) * (ds(t)/di(t)) = ds * g(t). dg computes dL(t)/dg(t) = (dL(t)/ds(t)) * (ds(t)/dg(t)) = ds * i(t). df computes dL(t)/df(t) = (dL(t)/ds(t)) * (ds(t)/df(t)) = ds * s(t-1).

```python
        di_input = (1. - self.state.i) * self.state.i * di
        df_input = (1. - self.state.f) * self.state.f * df
        do_input = (1. - self.state.o) * self.state.o * do
        dg_input = (1. - self.state.g ** 2) * dg
```

These use the derivative of sigmoid, s * (1 - s), and the derivative of tanh, 1 - tanh^2.
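A quick numerical check of those two derivative identities (illustrative only, not part of the library):

```python
import numpy as np
z, eps = 0.3, 1e-6
s = 1. / (1 + np.exp(-z))
# finite difference vs. closed form for sigmoid
print (1. / (1 + np.exp(-(z + eps))) - s) / eps, s * (1 - s)
# finite difference vs. closed form for tanh
print (np.tanh(z + eps) - np.tanh(z)) / eps, 1 - np.tanh(z) ** 2
```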
Taking di_input as the example: (1. - self.state.i) * self.state.i is the sigmoid derivative, i.e. how much neuron i's output changes per unit change of its input; multiplying by di gives how much the loss L(t) changes per unit change of that input, dL(t)/d i_input(t).

```python
        self.param.wi_diff += np.outer(di_input, self.xc)
        self.param.wf_diff += np.outer(df_input, self.xc)
        self.param.wo_diff += np.outer(do_input, self.xc)
        self.param.wg_diff += np.outer(dg_input, self.xc)
        self.param.bi_diff += di_input
        self.param.bf_diff += df_input
        self.param.bo_diff += do_input
        self.param.bg_diff += dg_input
```

The w_diff terms are the weight-matrix gradients and the b_diff terms the bias gradients, used later for the update.

```python
        dxc = np.zeros_like(self.xc)
        dxc += np.dot(self.param.wi.T, di_input)
        dxc += np.dot(self.param.wf.T, df_input)
        dxc += np.dot(self.param.wo.T, do_input)
        dxc += np.dot(self.param.wg.T, dg_input)
```

This accumulates the diff of the input x: x contributes in four places, so the four diffs are summed to form its total diff.

```python
        self.state.bottom_diff_s = ds * self.state.f
        self.state.bottom_diff_x = dxc[:self.param.x_dim]
        self.state.bottom_diff_h = dxc[self.param.x_dim:]
```

bottom_diff_s captures that a change of s at step t-1 shows up at step t scaled by f. Since dxc is the horizontally joined matrix of x and h, its diff is split back into bottom_diff_x and bottom_diff_h.

```python
    def x_list_clear(self):
        self.x_list = []

    def x_list_add(self, x):
        self.x_list.append(x)
        if len(self.x_list) > len(self.lstm_node_list):
            # need to add new lstm node, create new state mem
            lstm_state = LstmState(self.lstm_param.mem_cell_ct, self.lstm_param.x_dim)
            self.lstm_node_list.append(LstmNode(self.lstm_param, lstm_state))

        # get index of most recent x input
        idx = len(self.x_list) - 1
        if idx == 0:
            # no recurrent inputs yet
            self.lstm_node_list[idx].bottom_data_is(x)
        else:
            s_prev = self.lstm_node_list[idx - 1].state.s
            h_prev = self.lstm_node_list[idx - 1].state.h
            self.lstm_node_list[idx].bottom_data_is(x, s_prev, h_prev)
```

x_list_add feeds one training input x into the network.

```python
def example_0():
    # learns to repeat simple sequence from random inputs
    np.random.seed(0)

    # parameters for input data dimension and lstm cell count
    mem_cell_ct = 100
    x_dim = 50
    concat_len = x_dim + mem_cell_ct
    lstm_param = LstmParam(mem_cell_ct, x_dim)
    lstm_net = LstmNetwork(lstm_param)
    y_list = [-0.5, 0.2, 0.1, -0.5]
    input_val_arr = [np.random.random(x_dim) for _ in y_list]

    for cur_iter in range(100):
        print "cur iter: ", cur_iter
        for ind in range(len(y_list)):
            lstm_net.x_list_add(input_val_arr[ind])
            print "y_pred[%d] : %f" % (ind, lstm_net.lstm_node_list[ind].state.h[0])

        loss = lstm_net.y_list_is(y_list, ToyLossLayer)
        print "loss: ", loss
        lstm_param.apply_diff(lr=0.1)
        lstm_net.x_list_clear()
```

This initializes LstmParam with 100 memory cells and input dimension 50, initializes the LstmNetwork training model, generates four groups of 50 random numbers with [-0.5, 0.2, 0.1, -0.5] as their y values, and trains: each pass feeds 50 random numbers plus one y value, iterating 100 times.

Now let the LSTM take a run of consecutive primes and predict the next one. A small test: generate the primes below 100, repeatedly take a window of 50 primes as x and the 51st as y, build 10 such samples, and train for 10000 iterations; the mean squared error falls from 0.17973 to 1.05172e-06, nearly perfect:
```python
import numpy as np
import sys

from lstm import LstmParam, LstmNetwork

class ToyLossLayer:
    """
    Computes square loss with first element of hidden layer array.
    """
    @classmethod
    def loss(self, pred, label):
        return (pred[0] - label) ** 2

    @classmethod
    def bottom_diff(self, pred, label):
        diff = np.zeros_like(pred)
        diff[0] = 2 * (pred[0] - label)
        return diff

class Primes:
    def __init__(self):
        self.primes = list()
        for i in range(2, 100):
            is_prime = True
            for j in range(2, i-1):
                if i % j == 0:
                    is_prime = False
            if is_prime:
                self.primes.append(i)
        self.primes_count = len(self.primes)

    def get_sample(self, x_dim, y_dim, index):
        result = np.zeros((x_dim+y_dim))
        for i in range(index, index + x_dim + y_dim):
            result[i-index] = self.primes[i%self.primes_count]/100.0
        return result

def example_0():
    mem_cell_ct = 100
    x_dim = 50
    concat_len = x_dim + mem_cell_ct
    lstm_param = LstmParam(mem_cell_ct, x_dim)
    lstm_net = LstmNetwork(lstm_param)

    primes = Primes()
    x_list = []
    y_list = []
    for i in range(0, 10):
        sample = primes.get_sample(x_dim, 1, i)
        x = sample[0:x_dim]
        y = sample[x_dim:x_dim+1].tolist()[0]
        x_list.append(x)
        y_list.append(y)

    for cur_iter in range(10000):
        if cur_iter % 1000 == 0:
            print "y_list=", y_list
        for ind in range(len(y_list)):
            lstm_net.x_list_add(x_list[ind])
            if cur_iter % 1000 == 0:
                print "y_pred[%d] : %f" % (ind, lstm_net.lstm_node_list[ind].state.h[0])

        loss = lstm_net.y_list_is(y_list, ToyLossLayer)
        if cur_iter % 1000 == 0:
            print "loss: ", loss
        lstm_param.apply_diff(lr=0.01)
        lstm_net.x_list_clear()

if __name__ == "__main__":
    example_0()
```

The prime list is divided by 100 throughout because this code requires training values smaller than 1.

Torch is a deep learning framework. A quick comparison of the popular options: (1) TensorFlow, pushed by Google, currently the hottest, suitable for both small experiments and large-scale computation, Python-based; downsides: relatively hard to pick up, average speed. (2) Torch, pushed by Facebook, used for small experiments, many open-source applications, Lua-based, quick to pick up, well documented online; downside: Lua is a relatively niche language. (3) MXNet, pushed by Amazon, mainly for large-scale computation, based on Python and R; downside: fewer open-source projects. (4) Caffe, from Berkeley, used for large-scale computation, based on C++ and Python; downside: not very convenient to develop with. (5) Theano, average speed, Python-based, well regarded.

There are quite a few LSTM implementations for Torch on GitHub. To install Torch on a Mac (see https://github.com/torch/torch7/wiki/Cheatsheet#installing-and-running-torch):
```sh
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh
```

If qt fails to install, install it separately:

```sh
brew install cartr/qt4/qt
```

After installation, manually add this line to ~/.bash_profile:

```sh
. ~/torch/install/bin/torch-activate
```

Then run `source ~/.bash_profile`, after which the `th` command starts Torch.

To install iTorch, first install its dependencies, then build it:

```sh
brew install zeromq
brew install openssl
luarocks install luacrypto OPENSSL_DIR=/usr/local/opt/openssl/
git clone https://github.com/facebook/iTorch.git
cd iTorch
luarocks make
```
Now image recognition with a convolutional neural network. Create pattern_recognition.lua:
```lua
require 'nn'
require 'paths'

if (not paths.filep("cifar10torchsmall.zip")) then
    os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
    os.execute('unzip cifar10torchsmall.zip')
end
trainset = torch.load('cifar10-train.t7')
testset = torch.load('cifar10-test.t7')
classes = {'airplane', 'automobile', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck'}

setmetatable(trainset,
    {__index = function(t, i)
                   return {t.data[i], t.label[i]}
               end}
);
trainset.data = trainset.data:double() -- convert the data from a ByteTensor to a DoubleTensor.

function trainset:size()
    return self.data:size(1)
end

mean = {} -- store the mean, to normalize the test set in the future
stdv = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
    mean[i] = trainset.data[{ {}, {i}, {}, {} }]:mean() -- mean estimation
    print('Channel ' .. i .. ', Mean: ' .. mean[i])
    trainset.data[{ {}, {i}, {}, {} }]:add(-mean[i]) -- mean subtraction

    stdv[i] = trainset.data[{ {}, {i}, {}, {} }]:std() -- std estimation
    print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
    trainset.data[{ {}, {i}, {}, {} }]:div(stdv[i]) -- std scaling
end

net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 6, 5, 5)) -- 3 input image channels, 6 output channels, 5x5 convolution kernel
net:add(nn.ReLU())                         -- non-linearity
net:add(nn.SpatialMaxPooling(2,2,2,2))     -- A max-pooling operation that looks at 2x2 windows and finds the max.
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net:add(nn.ReLU())                         -- non-linearity
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5))                   -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
net:add(nn.Linear(16*5*5, 120))            -- fully connected layer (matrix multiplication between input and weights)
net:add(nn.ReLU())                         -- non-linearity
net:add(nn.Linear(120, 84))
net:add(nn.ReLU())                         -- non-linearity
net:add(nn.Linear(84, 10))                 -- 10 is the number of outputs of the network (here, 10 classes)
net:add(nn.LogSoftMax())                   -- converts the output to a log-probability. Useful for classification problems

criterion = nn.ClassNLLCriterion()
trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.001
trainer.maxIteration = 5
trainer:train(trainset)

testset.data = testset.data:double() -- convert from Byte tensor to Double tensor
for i=1,3 do -- over each image channel
    testset.data[{ {}, {i}, {}, {} }]:add(-mean[i]) -- mean subtraction
    testset.data[{ {}, {i}, {}, {} }]:div(stdv[i])  -- std scaling
end

predicted = net:forward(testset.data[100])
print(classes[testset.label[100]])
print(predicted:exp())
for i=1,predicted:size(1) do
    print(classes[i], predicted[i])
end

correct = 0
for i=1,10000 do
    local groundtruth = testset.label[i]
    local prediction = net:forward(testset.data[i])
    local confidences, indices = torch.sort(prediction, true) -- true means sort in descending order
    if groundtruth == indices[1] then
        correct = correct + 1
    end
end
print(correct, 100*correct/10000 .. '%')

class_performance = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
for i=1,10000 do
    local groundtruth = testset.label[i]
    local prediction = net:forward(testset.data[i])
    local confidences, indices = torch.sort(prediction, true) -- true means sort in descending order
    if groundtruth == indices[1] then
        class_performance[groundtruth] = class_performance[groundtruth] + 1
    end
end
for i=1,#classes do
    print(classes[i], 100*class_performance[i]/1000 .. '%')
end
```
Run it with `th pattern_recognition.lua`.

The script first downloads cifar10torchsmall.zip, which contains 50000 labeled training images and 10000 labeled test images across 10 classes (airplane, automobile, and so on). It binds __index and size methods to trainset for compatibility with nn.Sequential (for the binding syntax see the Lua tutorial at http://tylerneylon.com/a/learn-lua/), then normalizes the data into a DoubleTensor with zero mean and unit variance per channel. It initializes a convolutional network with two convolution layers, two pooling layers, one fully connected layer, and a softmax layer, trains with a learning rate of 0.001 for 5 iterations, then predicts on test image number 100 and prints the overall accuracy as well as the per-class accuracy. See also https://github.com/soumith/cvpr2015/blob/master/Deep%20Learning%20with%20Torch.ipynb.

Torch supports GPU computation conveniently, with small modifications to the code.
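A minimal sketch of those modifications (assuming the cunn package is installed; this snippet is not part of the tutorial file above):

```lua
require 'cunn'
net = net:cuda()                      -- move the model to the GPU
criterion = criterion:cuda()          -- move the loss to the GPU
trainset.data = trainset.data:cuda()  -- move the training data to the GPU
```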
The popular seq2seq models are basically encoder-decoder models built from LSTMs; most open-source implementations use one-hot embeddings, which carry far less information than word vectors. So let's build a chatbot-style model from word2vec word vectors plus a seq2seq model with a single LSTM cell.

First, download the text of the novel Legend of Zhen Huan (《甄嬛传》): any "甄嬛传 txt" search result will do. Convert the file to utf-8 encoding and replace the Windows carriage returns with \n for easier processing.

Next, segment the text into words with word_segment.py, downloadable from https://github.com/warmheartli/ChatBotCourse/blob/master/word_segment.py:

```sh
python ./word_segment.py zhenhuanzhuan.txt zhenhuanzhuan.segment
```

Then generate the word vectors with word2vec (source: https://github.com/warmheartli/ChatBotCourse/tree/master/word2vec; just run make to build it):

```sh
./word2vec -train ./zhenhuanzhuan.segment -output vectors.bin -cbow 1 -size 200 -window 8 -negative 25 -hs 0 -sample 1e-4 -threads 20 -binary 1 -iter 15
```
This produces vectors.bin, a word-vector file generated from the original text of the novel.

The training code:

```python
# -*- coding: utf-8 -*-

import sys
import math
import tflearn
import chardet
import numpy as np
import struct

seq = []

max_w = 50
float_size = 4
word_vector_dict = {}

def load_vectors(input):
    """Load word vectors from vectors.bin into word_vector_dict:
    the key is a word, the value is its 200-dimensional vector
    """
    print "begin load vectors"

    input_file = open(input, "rb")

    # read the vocabulary size and vector dimension
    words_and_size = input_file.readline()
    words_and_size = words_and_size.strip()
    words = long(words_and_size.split(' ')[0])
    size = long(words_and_size.split(' ')[1])
    print "words =", words
    print "size =", size

    for b in range(0, words):
        a = 0
        word = ''
        # read one word
        while True:
            c = input_file.read(1)
            word = word + c
            if False == c or c == ' ':
                break
            if a < max_w and c != '\n':
                a = a + 1
        word = word.strip()

        vector = []
        for index in range(0, size):
            m = input_file.read(float_size)
            (weight,) = struct.unpack('f', m)
            vector.append(weight)

        # store the word and its vector in the dict
        word_vector_dict[word.decode('utf-8')] = vector

    input_file.close()
    print "load vectors finish"

def init_seq():
    """Read the segmented text file and load the whole word sequence
    """
    file_object = open('zhenhuanzhuan.segment', 'r')
    vocab_dict = {}
    while True:
        line = file_object.readline()
        if line:
            for word in line.decode('utf-8').split(' '):
                if word_vector_dict.has_key(word):
                    seq.append(word_vector_dict[word])
        else:
            break
    file_object.close()

def vector_sqrtlen(vector):
    len = 0
    for item in vector:
        len += item * item
    len = math.sqrt(len)
    return len

def vector_cosine(v1, v2):
    if len(v1) != len(v2):
        sys.exit(1)
    sqrtlen1 = vector_sqrtlen(v1)
    sqrtlen2 = vector_sqrtlen(v2)
    value = 0
    for item1, item2 in zip(v1, v2):
        value += item1 * item2
    return value / (sqrtlen1*sqrtlen2)

def vector2word(vector):
    max_cos = -10000
    match_word = ''
    for word in word_vector_dict:
        v = word_vector_dict[word]
        cosine = vector_cosine(vector, v)
        if cosine > max_cos:
            max_cos = cosine
            match_word = word
    return (match_word, max_cos)

def main():
    load_vectors("./vectors.bin")
    init_seq()
    xlist = []
    ylist = []
    test_X = None
    #for i in range(len(seq)-100):
    for i in range(10):
        sequence = seq[i:i+20]
        xlist.append(sequence)
        ylist.append(seq[i+20])
        if test_X is None:
            test_X = np.array(sequence)
            (match_word, max_cos) = vector2word(seq[i+20])
            print "right answer=", match_word, max_cos

    X = np.array(xlist)
    Y = np.array(ylist)
    net = tflearn.input_data([None, 20, 200])
    net = tflearn.lstm(net, 200)
    net = tflearn.fully_connected(net, 200, activation='linear')
    net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1,
                             loss='mean_square')
    model = tflearn.DNN(net)
    model.fit(X, Y, n_epoch=500, batch_size=10, snapshot_epoch=False, show_metric=True)
    model.save("model")
    predict = model.predict([test_X])
    #print predict
    #for v in test_X:
    #    print vector2word(v)
    (match_word, max_cos) = vector2word(predict[0])
    print "predict=", match_word, max_cos

main()
```

load_vectors loads the word vectors from vectors.bin; init_seq loads the segmented text of the novel into one long sequence; vector2word finds the word closest to a given vector. The model itself has a single LSTM cell.

After 500 epochs of training, the mean square loss drops to 0.33673, and the next character is predicted with a cosine similarity of 0.941794432002.

With a powerful GPU you could tune the parameters, train on the whole text, and modify the predict part to keep emitting the next character: the model would automatically produce prose in the novel's distinctive style.

This is built on tflearn. The official tflearn examples include a seq2seq implementation that directly calls tensorflow/python/ops/seq2seq.py; it is based on one-hot embeddings, which will surely not work as well as word vectors.

Author: 利炳根.