  • PyTorch LSTM image classification

    2019-07-07 14:05:52
    • A question: the official PyTorch docs say the LSTM input has shape (seq_len, batch, input_size), yet this example feeds images.reshape(-1, sequence_length, input_size), i.e. (batch, seq_len, input_size). Did the docs get it wrong? No: (seq_len, batch, input_size) is only the default layout, and the model below constructs its LSTM with batch_first=True, which switches the expected input to (batch, seq_len, input_size).
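The shape question can be checked directly. A short standalone sketch, using the same hyper-parameter values as the example below, showing that `batch_first=True` makes `nn.LSTM` accept batch-first input:

```python
import torch
import torch.nn as nn

# batch_first=True (as in the model below) expects (batch, seq_len, input_size);
# the default batch_first=False expects (seq_len, batch, input_size), as the docs state.
lstm = nn.LSTM(input_size=28, hidden_size=128, num_layers=2, batch_first=True)

x = torch.randn(100, 28, 28)  # (batch, seq_len, input_size)
out, (hn, cn) = lstm(x)
print(out.shape)  # torch.Size([100, 28, 128]) -- the output is batch-first too
```

Note that the hidden/cell states stay layout-independent: `hn` has shape (num_layers, batch, hidden_size) regardless of `batch_first`.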
    import torch 
    import torch.nn as nn
    import torchvision
    import torchvision.transforms as transforms
    
    
    # Device configuration
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    
    # Hyper-parameters
    sequence_length = 28
    input_size = 28
    hidden_size = 128
    num_layers = 2
    num_classes = 10
    batch_size = 100
    num_epochs = 2
    learning_rate = 0.01
    
    # MNIST dataset
    train_dataset = torchvision.datasets.MNIST(root='../../data/',
                                               train=True, 
                                               transform=transforms.ToTensor(),
                                               download=True)
    
    test_dataset = torchvision.datasets.MNIST(root='../../data/',
                                              train=False, 
                                              transform=transforms.ToTensor())
    
    # Data loader
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                               batch_size=batch_size, 
                                               shuffle=True)
    
    test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                              batch_size=batch_size, 
                                              shuffle=False)
    
    # Recurrent neural network (many-to-one)
    class RNN(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers, num_classes):
            super(RNN, self).__init__()
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            
            self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            self.fc = nn.Linear(hidden_size, num_classes)
        
        def forward(self, x):
            # Initialize the hidden state and the cell state; they usually have the same dimensions
            h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)  # x.size(0) is the batch size
            c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
            
            # Forward propagate LSTM
            out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size)
            
            # Decode the hidden state of the last time step
            out = self.fc(out[:, -1, :])
            return out
    
    model = RNN(input_size, hidden_size, num_layers, num_classes).to(device)
    
    
    # Loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    
    # Train the model
    total_step = len(train_loader)
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(train_loader):
            images = images.reshape(-1, sequence_length, input_size).to(device)
            labels = labels.to(device)
            print('size',images.shape)
            # Forward pass
            outputs = model(images)
            print(outputs.size())
            loss = criterion(outputs, labels)
            
            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            if (i+1) % 100 == 0:
                print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                       .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
    
    # Test the model
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in test_loader:
            images = images.reshape(-1, sequence_length, input_size).to(device)
            labels = labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
        print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total)) 
    
    # Save the model checkpoint
    torch.save(model.state_dict(), 'model.ckpt')
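To reuse the saved checkpoint later, the usual pattern is to rebuild the model and restore its state_dict. A minimal sketch of the round-trip, using a stand-in nn.Linear module in place of the RNN class above (the pattern is identical):

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in model; substitute RNN(input_size, hidden_size, num_layers, num_classes).
model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), 'model.ckpt')
torch.save(model.state_dict(), path)

# Later: rebuild the same architecture, then load the weights into it.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path))
restored.eval()  # switch off dropout/batch-norm training behavior before inference
```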
    
  • LSTM handwritten digit recognition: classifying MNIST with a TensorFlow RNN

    I recently needed deep learning for a paper and realized that after a year away from TensorFlow I had forgotten most of it. This post is my notes on the video course by Cao Jian of PKU's School of Software and Microelectronics.

    Basics

    In TensorFlow, data is represented by tensors, the neural network is built as a computation graph, and a session executes the graph, optimizing the weights on its edges to obtain the model.
    Tensor: an n-dimensional array. Rank 0 is a scalar, rank 1 a vector, rank 2 a matrix, and rank n a general tensor.
    Computation graph: describes the network's computation; it only builds the computation, it does not run it.
    The simplest possible computation graph:

    import tensorflow as tf
    a = tf.constant([1.0, 2.0])
    b = tf.constant([3.0, 4.0])
    result = a + b
    print(result)
    

    Result:

    Tensor("add:0", shape=(2,), dtype=float32)
    

    This shows that the graph only describes the computation; it does not produce the result.

    Session: to actually evaluate the graph, you need a session:

    with tf.Session() as sess:
    	print(sess.run(result))
    

    Parameters
    Variable is a TensorFlow variable; the weights W on the graph's edges are variables, given random initial values.

    # random_normal generates a matrix drawn from a normal distribution; alternatives include
    # truncated_normal() -> normal distribution with large outliers re-drawn, and random_uniform -> uniform distribution
    # [2,3] means a matrix of two rows and three columns
    # stddev sets the standard deviation to 2
    # mean sets the mean to 0
    # seed is the random seed; without it, the generated numbers differ on every run
    w = tf.Variable(tf.random_normal([2,3], stddev=2, mean=0, seed=1))
    
    tf.zeros([3,2], tf.int32)  # 3x2 array of zeros
    tf.ones([3,2], tf.int32)   # 3x2 array of ones
    tf.fill([3,2], 6)          # 3x2 array filled with 6
    tf.constant([3,2,1])       # the tensor [3,2,1] directly; TF operations need TF's own data format
    

    Placeholders
    Training may feed many batches of inputs; use a placeholder to receive the input x

    x = tf.placeholder(tf.float32, shape=(1,2))  # a 1x2 placeholder
    

    Global variable initialization
    Variable only specifies how to initialize; since the graph itself performs no computation, nothing is actually initialized yet. To initialize, run:

    with tf.Session() as sess:
    	init_op = tf.global_variables_initializer()
    	sess.run(init_op)
    

    A simple training run

    A neural-network implementation roughly breaks down into these steps:

    1. Prepare the dataset and extract features to feed the network
    2. Build the NN structure from input to output (build the graph, run the session, forward pass)
    3. Feed lots of feature data to the NN and iteratively optimize its parameters (backpropagation, training)
    4. Use the trained model for prediction and classification

    Here is a simple example:

    import tensorflow as tf
    import numpy as np
    BATCH_SIZE = 8
    seed = 23455
    
    # Prepare the dataset
    rng = np.random.RandomState(seed)
    X = rng.rand(32, 2)  # a random 32x2 matrix
    Y = [[int(x0+x1<1)] for (x0,x1) in X]
    print(Y)
    
    # Build the network structure
    x = tf.placeholder(tf.float32, shape=(None, 2))
    y_ = tf.placeholder(tf.float32, shape=(None, 1))
    
    w1 = tf.Variable(tf.random_normal([2,3], stddev=1, seed=1))
    w2 = tf.Variable(tf.random_normal([3,1], stddev=1, seed=1))
    
    a = tf.matmul(x, w1)
    y = tf.matmul(a, w2)
    loss = tf.reduce_mean(tf.square(y-y_))
    train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)
    #train_step = tf.train.MomentumOptimizer(0.001).minimize(loss)
    #train_step = tf.train.AdamOptimizer(0.001).minimize(loss)
    
    # Train the model
    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        STEPS = 3000
        for i in range(STEPS):
            start = (i*BATCH_SIZE) % 32
            end = start + BATCH_SIZE
            sess.run(train_step, feed_dict={x: X[start: end], y_ : Y[start: end]})
            if i % 500 == 0:
                total_loss = sess.run(loss, feed_dict={x:X, y_:Y})
                print("After %d training step(s), loss on all data is %g" % (i, total_loss))
    print("end")
    

    Optimizing the network

    Commonly used activation functions:

    tf.nn.relu()
    tf.nn.sigmoid()
    tf.nn.tanh()
    

    Commonly used loss functions:

    # Mean squared error
    loss_mse = tf.reduce_mean(tf.square(y_ - y))
    
    # Cross entropy measures the distance between two probability distributions; larger means farther apart
    # Values of y below 1e-12 are clipped to 1e-12, and values above 1.0 to 1.0
    # (this assumes the inputs lie in [0, 1]; they are probabilities, after all)
    ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))
    
    # If the outputs are not in [0, 1], plain cross entropy does not apply; pass them through softmax first
    # tf.argmax() behaves like np.argmax(), except it must be run in a session to yield a result
    # With axis=0 it returns the index of the max element per column; with axis=1, per row
    # (labels are usually one-hot, so the position of the 1 in each row is the true class)
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, axis=1))
    cem = tf.reduce_mean(ce)
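The clipping comment above can be illustrated numerically. A small NumPy sketch (my own, not from the course) of the same clipped cross entropy, with hypothetical one-hot labels:

```python
import numpy as np

def clipped_cross_entropy(y_true, y_pred, eps=1e-12):
    # Mirror tf.clip_by_value(y, 1e-12, 1.0): log() never sees 0 or values above 1
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(y_true * np.log(y_pred))

y_true = np.array([[0., 1.], [1., 0.]])     # one-hot labels
y_pred = np.array([[0.1, 0.9], [0.8, 0.2]])
ce = clipped_cross_entropy(y_true, y_pred)
print(round(ce, 4))  # 0.0821

# Clipping is what keeps a hard 0 prediction from producing -inf
print(clipped_cross_entropy(np.array([[1.]]), np.array([[0.]])))  # finite, roughly -log(1e-12)
```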
    

    Custom loss functions
    The loss below charges COST*(y-y_) when y is greater than y_, and PROFIT*(y_-y) otherwise:

    loss = tf.reduce_sum(tf.where(tf.greater(y, y_), COST*(y-y_), PROFIT*(y_ - y)))
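A NumPy sketch of the same asymmetric loss (COST and PROFIT values are hypothetical), to show which branch fires:

```python
import numpy as np

COST, PROFIT = 1.0, 9.0          # hypothetical per-unit cost and lost profit
y  = np.array([1.2, 0.8])        # predictions
y_ = np.array([1.0, 1.0])        # targets

# Over-prediction (y > y_) is charged COST; under-prediction is charged PROFIT
loss = np.sum(np.where(y > y_, COST * (y - y_), PROFIT * (y_ - y)))
print(round(loss, 6))  # 2.0
```

The first element over-predicts by 0.2 (charged 0.2 * COST), the second under-predicts by 0.2 (charged 0.2 * PROFIT).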
    

    Learning rate:
    A fixed learning rate was used above; here is the exponentially decaying variant:

    learning_rate = tf.train.exponential_decay(
    	LEARNING_RATE_BASE,	# base learning rate
    	global_step,	# number of steps run so far
    	LEARNING_RATE_STEP,	# how often to decay, usually total_samples / batch_size
    	LEARNING_RATE_DECAY,	# decay rate, in (0, 1)
    	staircase=True
    )
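The rule behind this call is base_lr * decay_rate ** (global_step / decay_steps); with staircase=True the exponent is floored to an integer, so the rate drops in discrete steps. A plain-Python sketch (my own) of that rule:

```python
def exponential_decay(base_lr, global_step, decay_steps, decay_rate, staircase=True):
    exponent = global_step / decay_steps
    if staircase:
        exponent = global_step // decay_steps  # integer division: rate drops in discrete steps
    return base_lr * decay_rate ** exponent

# With decay_steps=100, the staircase rate is constant within each 100-step window
lr = exponential_decay(0.1, 250, 100, 0.9)                 # 0.1 * 0.9**2
lr_smooth = exponential_decay(0.1, 250, 100, 0.9, False)   # 0.1 * 0.9**2.5, slightly smaller
```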
    

    Moving averages
    Also called shadow values: they track the average of each parameter over its recent history, improving the model's generalization (as if each parameter had a shadow attached: when the parameter moves, the shadow follows slowly)

    shadow = decay * shadow + (1 - decay) * parameter
    decay = min{MOVING_AVERAGE_DECAY, (1 + step) / (10 + step)}
    

    MOVING_AVERAGE_DECAY is a hyperparameter (usually a fairly large value such as 0.99).
    Using moving averages in TensorFlow:

    # global_step: current step count
    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    
    # ema_op = ema.apply([])
    ema_op = ema.apply(tf.trainable_variables())	# trainable_variables() returns all trainable parameters as a list
    with tf.control_dependencies([train_step, ema_op]):
    	train_op = tf.no_op(name='train')
    	
    # Look up the moving average of a given parameter
    ema.average(some_variable)
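The shadow-update rule above, written out in plain Python (a sketch of the arithmetic only, not of the TF API; MOVING_AVERAGE_DECAY = 0.99 is the hyperparameter value mentioned in the text):

```python
MOVING_AVERAGE_DECAY = 0.99

def ema_decay(step):
    # Early in training the ramp (1+step)/(10+step) wins, so the shadow tracks fast;
    # later the cap MOVING_AVERAGE_DECAY wins, so the shadow follows slowly.
    return min(MOVING_AVERAGE_DECAY, (1 + step) / (10 + step))

shadow = 0.0
for step, param in enumerate([1.0, 1.0, 1.0, 1.0]):
    d = ema_decay(step)
    shadow = d * shadow + (1 - d) * param
print(shadow)  # creeps toward the parameter value 1.0
```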
    

    Regularization:

    loss(w) = tf.contrib.layers.l1_regularizer(REGULARIZER)(w)	# L1 regularization term
    loss(w) = tf.contrib.layers.l2_regularizer(REGULARIZER)(w)	# L2 regularization term
    
    # Add the term to the matching slot of the 'losses' collection
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    loss = cem + tf.add_n(tf.get_collection('losses'))  # total loss = cross entropy + regularization terms
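A NumPy sketch of the same bookkeeping, assuming the l2_regularizer convention scale * sum(w**2) / 2 (the quantity tf.nn.l2_loss computes, times the scale); the weight matrix and cross-entropy value are hypothetical:

```python
import numpy as np

def l2_regularizer(scale):
    # Assumed convention: scale * sum(w**2) / 2, matching tf.nn.l2_loss
    return lambda w: scale * np.sum(np.square(w)) / 2.0

losses = []                              # stands in for the 'losses' collection
w = np.array([[1., 2.], [3., 4.]])       # hypothetical weight matrix, sum of squares = 30
losses.append(l2_regularizer(0.1)(w))

cem = 0.25                               # hypothetical cross-entropy value
total_loss = cem + np.sum(losses)        # cross entropy + all regularization terms
print(round(total_loss, 6))  # 0.25 + 0.1 * 30 / 2 = 1.75
```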
    

    A modular network layout

    Forward pass (forward.py):

    def forward(x, regularizer):
    	w=
    	b=
    	y=
    	return y
    	
    def get_weight(shape, regularizer):
    	w = tf.Variable( )  # set the initial value
    	tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))  # add w's regularization loss to the total losses
    	return w
    	
    def get_bias(shape):
    	b = tf.Variable( )	# set the initial value
    	return b
    

    Backward pass (backward.py)

    def backward():
    	x = tf.placeholder( )
    	y_ = tf.placeholder( )
    	y = forward.forward(x, REGULARIZER)
    	global_step = tf.Variable(0, trainable=False)
    	loss = # any of the losses above: cross entropy, mean squared error, ...
    	loss = loss + tf.add_n(tf.get_collection('losses'))	# add the regularization terms
    	learning_rate = # a fixed value, or the exponential decay described earlier
    

    That is roughly the whole workflow; I did not watch the rest of the videos.

    LSTM handwritten digit recognition

    # Classify the MNIST dataset with an RNN
    # Covers the basic RNN structure and implementation
    
    import tensorflow as tf
    import numpy as np
    from tensorflow.examples.tutorials.mnist import input_data
    
    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
    
    lr = 0.001
    training_iters = 100000
    batch_size = 128
    
    # 28x28 images: each row is one 28-dim input vector, 28 rows per image
    n_input = 28
    n_step = 28
    n_hidden_unis = 128
    n_classes = 10
    
    x = tf.placeholder(tf.float32, [None, n_step, n_input])
    y = tf.placeholder(tf.float32, [None, n_classes])
    
    weights = {
        'in' : tf.Variable(tf.random_normal([n_input, n_hidden_unis])),
        'out' : tf.Variable(tf.random_normal([n_hidden_unis, n_classes]))
    }
    
    biases = {
        'in' : tf.Variable(tf.constant(0.1, shape=[n_hidden_unis,])),
        'out' : tf.Variable(tf.constant(0.1, shape=[n_classes,]))
    }
    
    def RNN(X, weights, biases):
        # X: [batch_size, 28, 28] -> [batch_size*28, 28]
        X = tf.reshape(X, [-1, n_input])
        X_in = tf.matmul(X, weights['in']) + biases['in']
    
        # X_in: [batch_size*28, 128] -> [batch_size, 28, 128]
        X_in = tf.reshape(X_in, [-1, n_step, n_hidden_unis])
    
        # cell: forget_bias=1.0 means we do not want to forget earlier inputs
        lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden_unis, forget_bias=1.0, state_is_tuple=True)
        _init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
    
        # time_major=False: time_step is not the first dimension
        outputs, state = tf.nn.dynamic_rnn(lstm_cell, X_in, initial_state=_init_state, time_major=False)
    
        # hidden layer for output as the final result
        # tf.transpose(outputs, [1,0,2]) swaps the first two dimensions, giving [n_step, batch_size, n_hidden_unis]
        outputs = tf.unstack(tf.transpose(outputs, [1, 0, 2]))
        result = tf.matmul(outputs[-1], weights['out']) + biases['out']
        # result = tf.matmul(state[1], weights['out']) + biases['out']  # equivalent in this example
        return result
    
    
    
    pred = RNN(x, weights, biases)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    train_op = tf.train.AdamOptimizer(lr).minimize(cost)
    
    correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))    # compare one-hot predictions with labels
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))    # tf.cast converts the dtype
    
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        step = 0
        while step * batch_size < training_iters:
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            batch_xs = batch_xs.reshape([batch_size, n_step, n_input])
            sess.run([train_op], feed_dict={
                x: batch_xs,
                y: batch_ys
            })
            if step % 20 == 0:
                print(sess.run(accuracy, feed_dict={
                    x: batch_xs,
                    y: batch_ys
                }))
    
            step += 1
    
  • Encoder+Decoder+LSTM image prediction

    2019-09-23 20:49:25

    Code: https://github.com/wdf19961118/LSTM 

    Problem statement:

    This builds an image-sequence predictor on a convolutional recurrent network. The input is 16 consecutive image frames, each of size (3,128,128). A convolutional Encoder extracts features from each of the 16 images, the feature sequence is fed into a recurrent network (LSTM), and a Decoder then deconvolves the result back into a tensor of the original image size (3,128,128); in effect, the 17th frame is generated from the first 16. The actual 17th frame of the sequence serves as the label for computing the loss.

    Data preprocessing:

    1. The goal is a txt file in which each line records the paths of 17 consecutive frames

    2. How do we generate that txt file?

    How the dataset is stored:

    1) Folder names are sequential numbers: 0, 1, 2, ...

    2) Each folder holds the decomposed frames of one video, named as follows:

    3. Code:

    import os
    
    dir = '/home/lab226/wdf/imgsrc'
    fp = open('./img_path.txt', 'w+')
    imgfile_list = os.listdir('/home/lab226/wdf/imgsrc')
    # Sort the folder list by the numeric value of the folder name
    imgfile_list.sort(key=lambda x: int(x))
    #print(img_list)
    seqsize = 17
    for imgfile in imgfile_list:
        filepath = os.path.join(dir, imgfile)
        img_list = os.listdir(filepath)
        # This sort matters: we need the frames in order, but the filesystem does not
        # store them in the numeric order we expect
        img_list.sort(key=lambda x: int(x[:-4]))
        # Slide a window over the sequence with a stride of 8
        for i in range(0, len(img_list)-seqsize, 8):
            for j in range(i, i+seqsize):
                 img = img_list[j]
                 path = os.path.join(filepath, img)
                 if j == i+seqsize-1:
                    fp.write(path+'\n')
                 else:
                    fp.write(path+' ')
    fp.close()
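The nested loop above is a sliding window of length 17 (16 inputs plus 1 label) advanced by a stride of 8. A standalone sketch of the same indexing over a hypothetical 40-frame folder:

```python
def sliding_windows(frames, seqsize=17, step=8):
    # Same bounds as the loop above: windows start at 0, 8, 16, ...
    # as long as a full window of seqsize frames still fits
    return [frames[i:i + seqsize] for i in range(0, len(frames) - seqsize, step)]

frames = ['%d.jpg' % n for n in range(40)]   # hypothetical frame names
windows = sliding_windows(frames)
print(len(windows))                 # 3 windows, starting at 0, 8, 16
print(windows[0][0], windows[0][-1])  # first input and the label of window 0
```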

     

    Data loading:

    I wrote my own SeqDataset, overriding the Dataset class's __getitem__() so that each iteration returns 16 consecutive images plus the 17th image as the label. Full code:

    class SeqDataset(Dataset):
        def __init__(self, txt, transform=None, target_transform=None, loader=default_loader):
            fh = open(txt, 'r')
            imgseqs = []
            for line in fh:
                line = line.strip('\n')
                line = line.rstrip()
                imgseqs.append(line)
            self.num_samples = len(imgseqs)
            self.imgseqs = imgseqs
            self.transform = transform
            self.target_transform = target_transform
            self.loader = loader
    
        def __getitem__(self, index):
            current_index = np.random.choice(range(0, self.num_samples))
            imgs_path = self.imgseqs[current_index].split()
            current_imgs = []
            current_imgs_path = imgs_path[:len(imgs_path)-1]
            current_label_path = imgs_path[len(imgs_path)-1]
            current_label = self.loader(current_label_path)
    
    
    
            for frame in current_imgs_path:
                img = self.loader(frame)
                if self.transform is not None:
                    img = self.transform(img)
                current_imgs.append(img)
            current_label = self.transform(current_label)
            #print(current_label.shape)
            batch_cur_imgs = np.stack(current_imgs, axis=0)  
            return batch_cur_imgs, current_label
    
    
    transform_list = [
            transforms.ToTensor()
            ]
    
    data_transforms = transforms.Compose( transform_list )
    
    train_data = SeqDataset(txt='./img_path.txt',transform=data_transforms)
    train_loader = DataLoader(train_data, shuffle=True, num_workers=20,batch_size=BATCH_SIZE)

     

    Model:

    The model has two parts: an Encoder+LSTM and a Decoder.

    Code:

    class EncoderMUG2d_LSTM(nn.Module):
        def __init__(self, input_nc=3, encode_dim=1024, lstm_hidden_size=1024, seq_len=SEQ_SIZE, num_lstm_layers=1, bidirectional=False):
            super(EncoderMUG2d_LSTM, self).__init__()
            self.seq_len = seq_len
            self.num_directions = 2 if bidirectional else 1
            self.num_lstm_layers = num_lstm_layers
            self.lstm_hidden_size = lstm_hidden_size
            #3*128*128
            self.encoder = nn.Sequential(
                nn.Conv2d(input_nc, 32, 4,2,1), # 32*64*64
                nn.BatchNorm2d(32),
                nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(32, 64, 4, 2, 1), # 64*32*32
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2, inplace=True),

                nn.Conv2d(64, 128, 4, 2, 1), # 128*16*16
                nn.BatchNorm2d(128),
                nn.LeakyReLU(0.2, inplace=True),
    
                nn.Conv2d(128, 256, 4, 2, 1), # 256*8*8
                nn.BatchNorm2d(256),
                nn.LeakyReLU(0.2, inplace=True),
    
                nn.Conv2d(256, 512, 4, 2, 1), # 512*4*4
                nn.BatchNorm2d(512),
                nn.LeakyReLU(0.2, inplace=True),
    
                nn.Conv2d(512, 512, 4, 2, 1),  # 512*2*2 
                nn.BatchNorm2d(512),
                nn.LeakyReLU(0.2, inplace=True),
    
                nn.Conv2d(512, 1024, 4, 2, 1),  # 1024*1*1
                nn.BatchNorm2d(1024),
                nn.LeakyReLU(0.2, inplace=True),
    
            )
    
            self.fc = nn.Linear(1024, encode_dim)
        self.lstm = nn.LSTM(encode_dim, lstm_hidden_size, num_lstm_layers, batch_first=True)
    
        def init_hidden(self, x):
            batch_size = x.size(0)
            h = x.data.new(
                    self.num_directions * self.num_lstm_layers, batch_size, self.lstm_hidden_size).zero_()
            c = x.data.new(
                    self.num_directions * self.num_lstm_layers, batch_size, self.lstm_hidden_size).zero_()
            return Variable(h), Variable(c)
    
    
        def forward(self, x):
            #x.shape [batchsize,seqsize,3,128,128]
            B = x.size(0)
            x = x.view(B * SEQ_SIZE, 3, 128, 128) #x.shape[batchsize*seqsize,3,128,128]
            # [batchsize*seqsize, 3, 128, 128] -> [batchsize*seqsize, 1024,1,1]
            x = self.encoder(x)
            #[batchsize * seqsize, 1024, 1, 1]-> [batchsize*seqsize, 1024]
            x = x.view(-1, 1024)
            # [batchsize * seqsize, 1024]
            x = self.fc(x)
            # [batchsize , seqsize ,1024]
            x = x.view(-1, SEQ_SIZE, x.size(1))
            h0, c0 = self.init_hidden(x)
            output, (hn,cn) = self.lstm(x,(h0,c0))
            return hn
    
    class DecoderMUG2d(nn.Module):
        def __init__(self, output_nc=3, encode_dim=1024):  # output size: 3x128x128
            super(DecoderMUG2d, self).__init__()
    
            self.project = nn.Sequential(
                nn.Linear(encode_dim, 1024*1*1),
                nn.ReLU(inplace=True)
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(1024, 512, 4), # 512*4*4
                nn.BatchNorm2d(512),
                nn.ReLU(True),
    
                nn.ConvTranspose2d(512, 256, 4, stride=2), # 256*10*10
                nn.BatchNorm2d(256),
                nn.ReLU(True),
    
                nn.ConvTranspose2d(256, 128, 4), # 128*13*13
                nn.BatchNorm2d(128),
                nn.ReLU(True),
    
                nn.ConvTranspose2d(128, 64, 4,stride=2),  # 64*28*28
                nn.BatchNorm2d(64),
                nn.ReLU(True),
    
                nn.ConvTranspose2d(64, 32, 4),  # 32*31*31
                nn.BatchNorm2d(32),
                nn.ReLU(True),
    
                nn.ConvTranspose2d(32, 16, 4,stride=2),  # 16*64*64
                nn.BatchNorm2d(16),
                nn.ReLU(True),
    
                nn.ConvTranspose2d(16, output_nc, 4, stride=2, padding=1),  # 3*128*128
                nn.Sigmoid(),
            )
        def forward(self, x):
            x = self.project(x)
            x = x.view(-1, 1024, 1, 1)
            decode = self.decoder(x)
            return decode
    
    class net(nn.Module):
        def __init__(self):
            super(net,self).__init__()
            self.n1 = EncoderMUG2d_LSTM()
            self.n2 = DecoderMUG2d()
    
        def forward(self, x):
            output = self.n1(x)
            output = self.n2(output) #B*3*128*128
            return output
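The shape comments on the encoder can be sanity-checked: every Conv2d(kernel=4, stride=2, padding=1) halves the spatial size, so seven of them take 128 down to 1. A standalone sketch rebuilding just the conv stack (channel list copied from the code above) and checking the output shape:

```python
import torch
import torch.nn as nn

# Channel progression copied from the encoder above
channels = [3, 32, 64, 128, 256, 512, 512, 1024]
layers = []
for cin, cout in zip(channels[:-1], channels[1:]):
    # Conv2d(4, 2, 1): out = (in + 2*1 - 4) // 2 + 1 = in // 2
    layers += [nn.Conv2d(cin, cout, 4, 2, 1), nn.BatchNorm2d(cout), nn.LeakyReLU(0.2)]
encoder = nn.Sequential(*layers)

x = torch.randn(2, 3, 128, 128)   # a hypothetical batch of 2 frames
out = encoder(x)
print(out.shape)  # torch.Size([2, 1024, 1, 1])
```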

     

     

    Training:

    if __name__ == '__main__':
        model = net()
        if torch.cuda.is_available():
            model.cuda()
        optimizer = optim.Adam(model.parameters(), lr=learning_rate)
        loss_func = nn.MSELoss()
    
        inputs, label = next(iter(train_loader))
        
        for epoch in range(10):
            print('epoch {}'.format(epoch + 1))
            train_loss = 0.
            train_acc = 0.
            #count = 1
            for batch_x, batch_y in train_loader:
                inputs, label = Variable(batch_x).cuda(), Variable(batch_y).cuda()
                output = model(inputs)
                loss = loss_func(output, label)/label.shape[0]
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            
            print('epoch: {}, Loss: {:.4f}'.format(epoch + 1, loss.data.cpu().numpy()))
    
        if (epoch + 1) % 5 == 0:  # every 5 epochs, save the decoded image and the original
                pic = to_img(output.cpu().data)
                img = to_img(label.cpu().data)
                if not os.path.exists('./conv_autoencoder'):
                    os.mkdir('./conv_autoencoder')
                save_image(pic, './conv_autoencoder/decode_image_{}.png'.format(epoch + 1))
                save_image(img, './conv_autoencoder/raw_image_{}.png'.format(epoch + 1))
            #count = count +1
    
        torch.save(model.state_dict(), PATH_SAVE)
    

     

     

  • An LSTM architecture, based on the Oxford LSTM tutorial, for evaluating image sequences. For each property we have multiple images and one label, corresponding to a sale-price decile or some other related task. The architecture analyzes each image and then outputs a class label at the end, as shown in the schematic.
  • LSTM-based EEG classification for motor-imagery tasks
  • Describing images by feeding structured words to an LSTM
  • LSTM image classification: 85% accuracy from a single-layer, 7 MB model, compared against a CNN baseline

    To explore how other kinds of networks perform on image classification, I tried an LSTM, and along the way give a brief take on recurrent networks. Final result: a 7 MB model at 85% accuracy with a single-layer network. Compare the CNN I built earlier (7 MB model, 95% accuracy, but with an overfitting problem), written up at https://blog.csdn.net/qq_36187544/article/details/90669462 (source code attached)

    Contents

    Project source (Baidu Cloud)

    A rough take on recurrent networks

    Tuning

    TensorBoard

    Source code


    Project source (Baidu Cloud)

    Note: the images have all been preprocessed to a uniform size, otherwise the code errors out! See the CNN post linked above for the image-processing file paths.

    Link: https://pan.baidu.com/s/1h0pKo5-p-JDPtM-iUs84_Q 
    Extraction code: j44p 

    models, logs: folders for model files and log files; both are empty, included so the program can run as-is
    data: the data folder (see the figure above), split into 7 classes with images under each; to avoid leaking the dataset, only one sample image is kept under lh1 so you can see what the data looks like
    setting.py: the configuration file
    rnn_train.py: the training script, the main file

    A rough take on recurrent networks

    A quick search turns up plenty of detailed LSTM/RNN write-ups, so just the short version here:

    An RNN is, bluntly, sequence processing: for a 28x28 image it builds 28 cells, and the final output outputs[28] just needs a little post-processing:

    So for RGB color images, look at the code first and then the reasoning. The network-graph part of the code:

    def rnn_graph(x, rnn_size, out_size, width, height, channel):
        '''
        Computation graph for the recurrent network
        :param x: input data
        :param rnn_size:
        :param out_size:
        :param width:
        :param height:
        :return:
        '''
        # Weights and bias
        w = weight_variable([rnn_size, out_size])
        b = bias_variable([out_size])
        # LSTM
        # rnn_size is BasicLSTMCell's num_units, the dimension of the output vector
        lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_size)
        # transpose turns shape (?,32,448,3) into (32,?,448,3): ? is the batch size, 32 the height, 448 the width, 3 the channels (color image)
        # The plan: 32 identical steps, each with an input of size 448*3; this is fast and matches the intuitive reading order
        x = tf.transpose(x, [1,0,2,3])
        # reshape with -1 infers the dimension; each image row becomes one flattened vector
        x = tf.reshape(x, [-1, channel*width])
        # split cuts along dimension 0 by default, into height pieces; this regroups the vectors by row index
        x = tf.split(x, height)
        # The RNN produces one output per input step; we only need the last one
        outputs, status = tf.nn.static_rnn(lstm_cell, x, dtype=tf.float32)
        y_conv = tf.add(tf.matmul(outputs[-1], w), b)
        return y_conv

    Why feed the network data in (32,?,448,3) format: it yields 32 cells, each step seeing 448*3 values, i.e. one horizontal strip of the 3-channel image!

    To use vertical strips instead, change it as follows (this makes the network much larger):

    # x = tf.transpose(x, [1,0,2,3])
    # x = tf.reshape(x, [-1, channel*width])
    # x = tf.split(x, height)
    x = tf.transpose(x, [2,0,1,3])
    x = tf.reshape(x, [-1, channel*height])
    x = tf.split(x, width)

    The same idea applies if you use 3 cells instead, feeding each color plane as one input.
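The transpose/reshape/split pipeline is easiest to follow as shapes. A NumPy sketch (hypothetical batch of 5) of the horizontal-strip variant:

```python
import numpy as np

batch, height, width, channel = 5, 32, 448, 3
x = np.random.rand(batch, height, width, channel)

x = x.transpose(1, 0, 2, 3)          # (32, 5, 448, 3): height moves to the front
x = x.reshape(-1, channel * width)   # (32*5, 1344): each row-strip flattened
steps = np.split(x, height)          # 32 time steps, each of shape (5, 1344)

print(len(steps), steps[0].shape)
```

Each of the 32 arrays is what one RNN cell sees: the same image row across the whole batch.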


    Tuning

    1. batch size matters a lot; convergence needs a suitable batch size: https://blog.csdn.net/qq_36187544/article/details/90478051

    2. Learning rate:

    3. Sequence length? A core idea of RNNs is that successive steps are related, so does slicing a rectangular image into horizontal strips versus vertical strips make a difference? It turned out basically the same... so use the smaller sequence, which trains faster

    4. num_units in the RNN: larger values learn more features and improve accuracy, much like widening a feed-forward network

    5. I did not try deeper networks; a single layer tested at 85% accuracy


    TensorBoard

    Data-flow graph:

    Loss and accuracy:


    Source code

    rnn_train.py:

    import os
    import tensorflow as tf
    from time import time
    import numpy as np
    from LSTM.setting import batch_size, width, height, rnn_size, out_size, channel, learning_rate, num_epoch
    
    '''
    Main training routine
    tensorboard --logdir=D:\python\LSTM\logs
    '''
    
    def weight_variable(shape, w_alpha=0.01):
        '''
        Randomly generate weights with some noise
        :param shape: weight shape
        :param w_alpha: noise scale
        :return:
        '''
        initial = w_alpha * tf.random_normal(shape)
        return tf.Variable(initial)
    def bias_variable(shape, b_alpha=0.1):
        '''
        Randomly generate biases with some noise
        :param shape: bias shape
        :param b_alpha: noise scale
        :return:
        '''
        initial = b_alpha * tf.random_normal(shape)
        return tf.Variable(initial)
    def rnn_graph(x, rnn_size, out_size, width, height, channel):
        '''
        Computation graph for the recurrent network
        :param x: input data
        :param rnn_size:
        :param out_size:
        :param width:
        :param height:
        :return:
        '''
        # Weights and bias
        w = weight_variable([rnn_size, out_size])
        b = bias_variable([out_size])
        # LSTM
        # rnn_size is BasicLSTMCell's num_units, the dimension of the output vector
        lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_size)
        # transpose turns shape (?,32,448,3) into (32,?,448,3): ? is the batch size, 32 the height, 448 the width, 3 the channels (color image)
        # The plan: 32 identical steps, each with an input of size 448*3; this is fast and matches the intuitive reading order
        x = tf.transpose(x, [1,0,2,3])
        # reshape with -1 infers the dimension; each image row becomes one flattened vector
        x = tf.reshape(x, [-1, channel*width])
        # split cuts along dimension 0 by default, into height pieces; this regroups the vectors by row index
        x = tf.split(x, height)
        # The RNN produces one output per input step; we only need the last one
        outputs, status = tf.nn.static_rnn(lstm_cell, x, dtype=tf.float32)
        y_conv = tf.add(tf.matmul(outputs[-1], w), b)
        return y_conv
    
    def accuracy_graph(y, y_conv):
        '''
        Accuracy computation graph
        :param y:
        :param y_conv:
        :return:
        '''
        correct = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
        return accuracy
    
    def get_batch(image_list,label_list,img_width,img_height,batch_size,capacity,channel):
        '''
        Load batched images and labels from path lists
        :param image_list: list of image paths
        :param label_list: list of labels
        :param img_width: image width
        :param img_height: image height
        :param batch_size:
        :param capacity:
        :return:
        '''
        image = tf.cast(image_list,tf.string)
        label = tf.cast(label_list,tf.int32)
        input_queue = tf.train.slice_input_producer([image,label],shuffle=True)
        label = input_queue[1]
        image_contents = tf.read_file(input_queue[0])
    
        image = tf.image.decode_jpeg(image_contents,channels=channel)
        image = tf.cast(image,tf.float32)
        if channel==3:
            image -= [42.79902,42.79902,42.79902] # subtract the channel mean
        elif channel == 1:
            image -= 42.79902  # subtract the mean
        image.set_shape((img_height,img_width,channel))
        image_batch,label_batch = tf.train.batch([image,label],batch_size=batch_size,num_threads=64,capacity=capacity)
        label_batch = tf.reshape(label_batch,[batch_size])
    
        return image_batch,label_batch
    
    def get_file(file_dir):
        '''
        Collect image paths and labels from a directory tree
        :param file_dir: root directory
        :return:
        '''
        images = []
        for root,sub_folders,files in os.walk(file_dir):
            for name in files:
                images.append(os.path.join(root,name))
        labels = []
        for label_name in images:
            letter = label_name.split("\\")[-2]
            if letter =="lh1":labels.append(0)
            elif letter =="lh2":labels.append(1)
            elif letter == "lh3":labels.append(2)
            elif letter == "lh4":labels.append(3)
            elif letter == "lh5":labels.append(4)
            elif letter == "lh6":labels.append(5)
            elif letter == "lh7":
                labels.append(6)
    
        print("check for get_file:",images[0],"label is ",labels[0])
        #shuffle
        temp = np.array([images,labels])
        temp = temp.transpose()
        np.random.shuffle(temp)
        image_list = list(temp[:,0])
        label_list = list(temp[:,1])
        label_list = [int(float(i)) for i in label_list]
        return image_list,label_list
    
    # Rebuild labels in one-hot format
    def onehot(labels):
        n_sample = len(labels)
        n_class = 7  # max(labels) + 1
        onehot_labels = np.zeros((n_sample,n_class))
        onehot_labels[np.arange(n_sample),labels] = 1
        return onehot_labels
    
    if __name__ == '__main__':
        startTime = time()
        # Placeholders sized to the images
        x = tf.placeholder(tf.float32, [None, height, width, channel])
        y = tf.placeholder(tf.float32)
        # The RNN model
        y_conv = rnn_graph(x, rnn_size, out_size, width, height, channel)
        # One-hot decoding
        y_conv_prediction = tf.argmax(y_conv, 1)
        y_real = tf.argmax(y, 1)
        # Optimization graph
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_conv, labels=y))
        optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
        # Accuracy
        accuracy = accuracy_graph(y, y_conv)
        # Training images
        xs, ys = get_file('./data/train1')  # image path list and label list
        image_batch, label_batch = get_batch(xs, ys, img_width=width, img_height=height, batch_size=batch_size, capacity=256,channel=channel)
        # Validation set
        xs_val, ys_val = get_file('./data/test1')  # image path list and label list
        image_val_batch, label_val_batch = get_batch(xs_val, ys_val, img_width=width, img_height=height,batch_size=455, capacity=256,channel=channel)
        # Start the session and begin training
        sess = tf.Session()
        sess.run(tf.global_variables_initializer())
        saver = tf.train.Saver()
    
        # Start the input threads, managed by a coordinator
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord, sess=sess)
        # Logging
        summary_writer = tf.summary.FileWriter('./logs/', graph=sess.graph, flush_secs=15)
        summary_writer2 = tf.summary.FileWriter('./logs/plot2/', flush_secs=15)
        tf.summary.scalar(name='loss_func', tensor=loss)
        tf.summary.scalar(name='accuracy', tensor=accuracy)
        merged_summary_op = tf.summary.merge_all()
    
        step = 0
        acc_rate = 0.98
        epoch_start_time = time()
        for i in range(num_epoch):
            batch_x, batch_y = sess.run([image_batch, label_batch])
            batch_y = onehot(batch_y)
    
            merged_summary,_,loss_show = sess.run([merged_summary_op,optimizer,loss], feed_dict={x: batch_x, y: batch_y})
            summary_writer.add_summary(merged_summary, global_step=i)
    
            if i % (int(7000//batch_size)) == 0:
                batch_x_test, batch_y_test = sess.run([image_val_batch, label_val_batch])
                batch_y_test = onehot(batch_y_test)
                batch_x_test = batch_x_test.reshape([-1, height, width, channel])
                merged_summary_val,acc,prediction_val_out,real_val_out,loss_show = sess.run([merged_summary_op,accuracy,y_conv_prediction,y_real,loss],feed_dict={x: batch_x_test, y: batch_y_test})
                summary_writer2.add_summary(merged_summary_val, global_step=i)
    
                # per-class accuracy on the validation batch
                # (assumes every class appears at least once in the batch)
                num_val_classes = 7
                right = [0] * num_val_classes
                wrong = [0] * num_val_classes
                for ii in range(len(prediction_val_out)):
                    cls = real_val_out[ii]
                    if prediction_val_out[ii] == cls:
                        right[cls] += 1
                    else:
                        wrong[cls] += 1
                per_class = [right[c] / (right[c] + wrong[c]) for c in range(num_val_classes)]
                print(step, "per-class accuracy:", per_class)
                print(step, "mean per-class accuracy:", sum(per_class) / num_val_classes)
    
    
                epoch_end_time = time()
                print("takes time:", (epoch_end_time - epoch_start_time), ' step:', step, ' accuracy:', acc, " loss_fun:", loss_show)
                epoch_start_time = epoch_end_time
                # accuracy target reached: save the model and stop
                if acc >= acc_rate:
                    model_path = os.path.join(os.getcwd(), 'models', str(acc_rate) + "LSTM.model")
                    saver.save(sess, model_path, global_step=step)
                    break
                if step % 10 == 0 and step != 0:
                    model_path = os.path.join(os.getcwd(), 'models', str(acc_rate) + "LSTM" + str(step) + ".model")
                    print(model_path)
                    saver.save(sess, model_path, global_step=step)
                step += 1
    
        duration = time() - startTime
        print("total takes time:",duration)
        summary_writer.close()
    
        coord.request_stop()  # signal the input threads to stop
        coord.join(threads)   # wait until all threads have shut down
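The per-class accuracy bookkeeping in the validation step above can be factored into a small, framework-independent helper. A minimal sketch using only the standard library (the function name and the `None`-for-empty-class convention are my own, not from the original script):

```python
from collections import Counter

def per_class_accuracy(predictions, labels, num_classes=7):
    """Return a list of per-class accuracies; a class that never
    appears in `labels` yields None instead of dividing by zero."""
    right = Counter()
    total = Counter()
    for p, y in zip(predictions, labels):
        total[y] += 1
        if p == y:
            right[y] += 1
    return [right[c] / total[c] if total[c] else None
            for c in range(num_classes)]

preds  = [0, 1, 1, 2, 2, 3, 4, 5, 6]
labels = [0, 1, 2, 2, 2, 3, 4, 5, 6]
print(per_class_accuracy(preds, labels))
# class 2 has one miss out of three samples; all others are perfect
```

Guarding the empty-class case matters here because the validation batch is a fixed 455 samples, so a rare class could be absent from a given draw.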
    
    
    

     
