This article is translated from http://www.asimovinstitute.org/neural-network-zoo/
Some of the code in it comes from the Keras open-source community.
This article may not be used for any commercial purpose. If you wish to repost it, please contact the author, SCP-173; we reserve the right to pursue legal action against unauthorised copying or reposting.
Keras Chinese documentation
New neural network architectures come out all the time, and keeping up with them all can be difficult. Just working out what all the acronyms stand for (DCIGN, BiLSTM, DCGAN, anyone?) can feel hopeless at first.
So I decided to put together a cheat sheet. Most of the architectures drawn in it are neural networks, though a few are entirely different beasts. And although all of them are presented as novel and unique, once I started drawing them the underlying relations between the architectures gradually became clear.
One problem with drawing them as node diagrams is that it doesn't really show how they are used. For example, a VAE looks almost the same as an AE, but the two are trained very differently, and the trained networks are used even more differently: a VAE is a generator into which you feed noise to get a new sample, whereas an AE simply maps its input to the closest training sample it "remembers". Note that this overview does not make clear how each node type works internally (that is a story for another day).
Compiling a complete list is next to impossible, since new architectures appear all the time. Even for published architectures, deliberately tracking them all down is hard, and some inevitably get missed. So while this chart should give you some insight, please don't assume it covers everything.
For every architecture depicted in the chart I have added a very, very brief description; hopefully some of them are useful.
# MLP model
from keras.models import Model
from keras.layers import Input, Dense

def mlp(nb_input, hidden_layers):
    # nb_input is the dimension of the input vector;
    # hidden_layers is a list like [200, 200, 30]
    # The structure is described as <input-->[200]-->[200]-->[30]-->output>
    nb_hidden = len(hidden_layers)
    input = Input(shape=(nb_input,))
    mod = input
    for i in range(nb_hidden):
        mod = Dense(hidden_layers[i])(mod)
    model = Model(input=input, output=mod)
    return model
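A minimal usage sketch of the helper above (the 784-dimensional input and the layer widths are illustrative assumptions; note the Dense layers are linear here, so add activations if you need a non-linear MLP):

model = mlp(784, [200, 200, 30])
model.compile(optimizer='adam', loss='mse')
model.summary()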
# RBM model
from __future__ import division
import numpy as np
from keras import initializations, regularizers, constraints
from keras import backend as K
from keras.layers.core import Layer, Dense
from .backend import random_binomial
import theano
import theano.tensor as T
class RBM(Layer):
    """
    Restricted Boltzmann Machine (RBM).
    """
    # keras.core.Layer part (modified from keras.core.Dense)
    # ------------------------------------------------------
    def __init__(self, input_dim, hidden_dim, init='glorot_uniform', weights=None, name=None,
                 W_regularizer=None, bx_regularizer=None, bh_regularizer=None, #activity_regularizer=None,
                 W_constraint=None, bx_constraint=None, bh_constraint=None):
        super(RBM, self).__init__()
        self.init = initializations.get(init)
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim

        self.input = K.placeholder(ndim=2)
        self.W = self.init((self.input_dim, self.hidden_dim))
        self.bx = K.zeros((self.input_dim))
        self.bh = K.zeros((self.hidden_dim))

        self.params = [self.W, self.bx, self.bh]

        self.regularizers = []
        self.W_regularizer = regularizers.get(W_regularizer)
        if self.W_regularizer:
            self.W_regularizer.set_param(self.W)
            self.regularizers.append(self.W_regularizer)

        self.bx_regularizer = regularizers.get(bx_regularizer)
        if self.bx_regularizer:
            self.bx_regularizer.set_param(self.bx)
            self.regularizers.append(self.bx_regularizer)

        self.bh_regularizer = regularizers.get(bh_regularizer)
        if self.bh_regularizer:
            self.bh_regularizer.set_param(self.bh)
            self.regularizers.append(self.bh_regularizer)

        self.W_constraint = constraints.get(W_constraint)
        self.bx_constraint = constraints.get(bx_constraint)
        self.bh_constraint = constraints.get(bh_constraint)
        self.constraints = [self.W_constraint, self.bx_constraint, self.bh_constraint]

        if weights is not None:
            self.set_weights(weights)
        if name is not None:
            self.set_name(name)

    def set_name(self, name):
        self.W.name = '%s_W' % name
        self.bx.name = '%s_bx' % name
        self.bh.name = '%s_bh' % name

    @property
    def nb_input(self):
        return 1

    @property
    def nb_output(self):
        return 0  # RBM has no output, use get_h_given_x_layer(), get_x_given_h_layer() instead

    def get_input(self, train=False):
        return self.input

    def get_output(self, train=False):
        return None  # RBM has no output, use get_h_given_x_layer(), get_x_given_h_layer() instead

    def get_config(self):
        return {"name": self.__class__.__name__,
                "input_dim": self.input_dim,
                "hidden_dim": self.hidden_dim,
                "init": self.init.__name__,
                "W_regularizer": self.W_regularizer.get_config() if self.W_regularizer else None,
                "bx_regularizer": self.bx_regularizer.get_config() if self.bx_regularizer else None,
                "bh_regularizer": self.bh_regularizer.get_config() if self.bh_regularizer else None,
                #"activity_regularizer": self.activity_regularizer.get_config() if self.activity_regularizer else None,
                "W_constraint": self.W_constraint.get_config() if self.W_constraint else None,
                "bx_constraint": self.bx_constraint.get_config() if self.bx_constraint else None,
                "bh_constraint": self.bh_constraint.get_config() if self.bh_constraint else None}
    # persistence, copied from keras.models.Sequential
    def save_weights(self, filepath, overwrite=False):
        # Save weights to HDF5
        import h5py
        import os.path
        # if file exists and should not be overwritten
        if not overwrite and os.path.isfile(filepath):
            import sys
            get_input = input
            if sys.version_info[:2] <= (2, 7):
                get_input = raw_input
            overwrite = get_input('[WARNING] %s already exists - overwrite? [y/n]' % (filepath))
            while overwrite not in ['y', 'n']:
                overwrite = get_input('Enter "y" (overwrite) or "n" (cancel).')
            if overwrite == 'n':
                return
            print('[TIP] Next time specify overwrite=True in save_weights!')
        f = h5py.File(filepath, 'w')
        weights = self.get_weights()
        f.attrs['nb_params'] = len(weights)
        for n, param in enumerate(weights):
            param_name = 'param_{}'.format(n)
            param_dset = f.create_dataset(param_name, param.shape, dtype=param.dtype)
            param_dset[:] = param
        f.flush()
        f.close()
    # -------------
    # RBM internals
    # -------------
    def free_energy(self, x):
        """
        Compute free energy for Bernoulli RBM, given visible units.

        The marginal probability p(x) = sum_h 1/Z exp(-E(x, h)) can be re-arranged to the form
        p(x) = 1/Z exp(-F(x)), where the free energy F(x) = -sum_j=1^H log(1 + exp(x^T W[:,j] + bh_j)) - bx^T x,
        in case of the Bernoulli RBM energy function.
        """
        wx_b = K.dot(x, self.W) + self.bh
        hidden_term = K.sum(K.log(1 + K.exp(wx_b)), axis=1)
        vbias_term = K.dot(x, self.bx)
        return -hidden_term - vbias_term

    def sample_h_given_x(self, x):
        """
        Draw sample from p(h|x).

        For Bernoulli RBM the conditional probability distribution can be derived to be
        p(h_j=1|x) = sigmoid(x^T W[:,j] + bh_j).
        """
        h_pre = K.dot(x, self.W) + self.bh  # pre-sigmoid (used in cross-entropy error calculation for better numerical stability)
        h_sigm = K.sigmoid(h_pre)           # mean of Bernoulli distribution ('p', prob. of variable taking value 1), sometimes called mean-field value
        h_samp = random_binomial(shape=h_sigm.shape, n=1, p=h_sigm)
        # random sample
        #   \hat{h} = 1, if p(h=1|x) > uniform(0, 1)
        #             0, otherwise
        # pre and sigm are returned to compute cross-entropy
        return h_samp, h_pre, h_sigm

    def sample_x_given_h(self, h):
        """
        Draw sample from p(x|h).

        For Bernoulli RBM the conditional probability distribution can be derived to be
        p(x_i=1|h) = sigmoid(W[i,:] h + bx_i).
        """
        x_pre = K.dot(h, self.W.T) + self.bx  # pre-sigmoid (used in cross-entropy error calculation for better numerical stability)
        x_sigm = K.sigmoid(x_pre)             # mean of Bernoulli distribution ('p', prob. of variable taking value 1), sometimes called mean-field value
        x_samp = random_binomial(shape=x_sigm.shape, n=1, p=x_sigm)
        # random sample
        #   \hat{x} = 1, if p(x=1|h) > uniform(0, 1)
        #             0, otherwise
        # pre and sigm are returned to compute cross-entropy
        return x_samp, x_pre, x_sigm

    def gibbs_xhx(self, x0):
        """
        Perform one step of Gibbs sampling, starting from visible sample.

        h1 ~ p(h|x0)
        x1 ~ p(x|h1)
        """
        h1, h1_pre, h1_sigm = self.sample_h_given_x(x0)
        x1, x1_pre, x1_sigm = self.sample_x_given_h(h1)
        # pre and sigm are returned to compute cross-entropy
        return x1, x1_pre, x1_sigm

    def mcmc_chain(self, x, nb_gibbs_steps):
        """
        Perform Markov Chain Monte Carlo, run k steps of Gibbs sampling,
        starting from visible data, return point estimate at end of chain.

        x0 (data) -> h1 -> x1 -> ... -> xk (reconstruction, negative sample)
        """
        xi = x
        for i in range(nb_gibbs_steps):
            xi, xi_pre, xi_sigm = self.gibbs_xhx(xi)
        x_rec, x_rec_pre, x_rec_sigm = xi, xi_pre, xi_sigm

        x_rec = theano.gradient.disconnected_grad(x_rec)
        # avoid back-propagating the gradient through the Gibbs sampling;
        # this is similar to T.grad(.., consider_constant=[chain_end]),
        # however, as grad() is called in keras.optimizers.Optimizer,
        # we do it here instead to avoid having to change Keras' code
        return x_rec, x_rec_pre, x_rec_sigm
    def contrastive_divergence_loss(self, nb_gibbs_steps=1):
        """
        Compute contrastive divergence loss with k steps of Gibbs sampling (CD-k).

        Result is a Theano expression with the form loss = f(x).
        """
        def loss(x):
            x_rec, _, _ = self.mcmc_chain(x, nb_gibbs_steps)
            cd = K.mean(self.free_energy(x)) - K.mean(self.free_energy(x_rec))
            return cd
        return loss

    def reconstruction_loss(self, nb_gibbs_steps=1):
        """
        Compute binary cross-entropy between the binary input data and the reconstruction generated by the model.

        Result is a Theano expression with the form loss = f(x).

        Useful as a rough indication of training progress (see Hinton2010).
        Summed over feature dimensions, mean over samples.
        """
        def loss(x):
            _, pre, _ = self.mcmc_chain(x, nb_gibbs_steps)
            # NOTE:
            #   when computing log(sigmoid(x)) and log(1 - sigmoid(x)) of the cross-entropy,
            #   if x is a very large negative value, sigmoid(x) will be 0 and log(0) will be nan or -inf;
            #   if x is a very large positive value, sigmoid(x) will be 1 and log(1-1) will be nan or -inf.
            #   Theano automatically rewrites this kind of expression using log(sigmoid(x)) = -softplus(-x), which
            #   is more stable numerically.
            #   However, as the sigmoid() used in the reconstruction is inside a scan() operation, Theano
            #   doesn't 'see' it and cannot perform the rewrite; as a work-around we use the pre-sigmoid value
            #   generated inside the scan() and apply the sigmoid here.
            #
            # NOTE:
            #   not sure how important this is; in most cases it seems to work fine using just T.nnet.binary_crossentropy();
            #   for instance, keras.objectives.binary_crossentropy() simply clips the value entering the log(), and
            #   this is only used for monitoring, not for calculating the gradient.
            cross_entropy_loss = -T.mean(T.sum(x*T.log(T.nnet.sigmoid(pre)) + (1 - x)*T.log(1 - T.nnet.sigmoid(pre)), axis=1))
            return cross_entropy_loss
        return loss

    def free_energy_gap(self, x_train, x_test):
        """
        Computes the free energy gap between train and test set, F(x_test) - F(x_train).

        To detect overfitting, we cannot directly monitor whether the probability of held-out data is
        starting to decrease, because of the intractable partition function.
        We can however compute the ratio p(x_train)/p(x_test), because there the partition functions cancel out.
        This ratio should be close to 1; if it is > 1, the model may be overfitting.

        The ratio can be computed as
            r = p(x_train)/p(x_test) = exp(-F(x_train) + F(x_test)).
        Alternatively, we compute the free energy gap
            gap = F(x_test) - F(x_train),
        where F(.) indicates the mean free energy of the test data and of a representative subset of
        the training data respectively.
        The gap should normally be around 0; when it starts to grow, the model may be overfitting.
        However, even when the gap is growing, the probability of the training data may be growing even faster,
        so the probability of the test data may still be improving.

        See: Hinton, "A Practical Guide to Training Restricted Boltzmann Machines", UTML TR 2010-003, 2010, section 6.
        """
        return T.mean(self.free_energy(x_train)) - T.mean(self.free_energy(x_test))
    def get_h_given_x_layer(self, as_initial_layer=False):
        """
        Generates a new Dense layer that computes the mean of the Bernoulli distribution p(h|x), i.e. p(h=1|x).
        """
        if as_initial_layer:
            layer = Dense(input_dim=self.input_dim, output_dim=self.hidden_dim, activation='sigmoid',
                          weights=[self.W.get_value(), self.bh.get_value()])
        else:
            layer = Dense(output_dim=self.hidden_dim, activation='sigmoid',
                          weights=[self.W.get_value(), self.bh.get_value()])
        return layer

    def get_x_given_h_layer(self, as_initial_layer=False):
        """
        Generates a new Dense layer that computes the mean of the Bernoulli distribution p(x|h), i.e. p(x=1|h).
        """
        if as_initial_layer:
            layer = Dense(input_dim=self.hidden_dim, output_dim=self.input_dim, activation='sigmoid',
                          weights=[self.W.get_value().T, self.bx.get_value()])
        else:
            layer = Dense(output_dim=self.input_dim, activation='sigmoid',
                          weights=[self.W.get_value().T, self.bx.get_value()])
        return layer
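A minimal usage sketch of the RBM layer above (the sizes are illustrative; in the original code base the contrastive-divergence loss is plugged into a small unsupervised training wrapper, which is not shown here). After training, the learned p(h|x) mapping can be exported as an ordinary Dense layer, e.g. as the first layer of a feed-forward stack:

rbm = RBM(input_dim=784, hidden_dim=500)
cd_loss = rbm.contrastive_divergence_loss(nb_gibbs_steps=1)   # CD-1 training objective, a function of the input
h_layer = rbm.get_h_given_x_layer(as_initial_layer=True)      # Dense(784 -> 500) with sigmoid activation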
# AE model
import keras.backend as K
from keras.layers import Input, Dense, Lambda, Dropout
from keras.layers.noise import GaussianNoise
from keras.models import Model
from keras import regularizers
import numpy as np

def noise_output_shape(input_shape):
    return tuple(input_shape)

def gaussian_noise(x, mean=0.0, std=0.1, random_state=1234):
    return x + K.random_normal(K.shape(x), mean=mean, std=std, seed=random_state)

def AutoEncoder(input_dim, encoding_dim, add_noise=None, dropout_proba=None, l1=1e-4):
    model_input = Input(shape=(input_dim,))
    if add_noise is not None:
        x = Lambda(add_noise, output_shape=noise_output_shape)(model_input)
    else:
        x = model_input
    if l1 is not None:
        encoded = Dense(encoding_dim, activation='relu',
                        activity_regularizer=regularizers.activity_l1(l1))(x)
    else:
        encoded = Dense(encoding_dim, activation='relu')(x)
    if dropout_proba:
        encoded = Dropout(dropout_proba)(encoded)
    decoded = Dense(input_dim, activation='sigmoid')(encoded)
    AE = Model(input=model_input, output=decoded)
    AE.compile(optimizer='adadelta',
               loss='binary_crossentropy',
               metrics=['accuracy'])
    return AE
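A minimal usage sketch of the AutoEncoder factory above, configured as a denoising, sparse autoencoder for 784-dimensional inputs (the training array below is a random placeholder; substitute your own data in [0, 1], e.g. flattened MNIST digits):

import numpy as np
x_train = np.random.rand(1000, 784)   # placeholder data
ae = AutoEncoder(input_dim=784, encoding_dim=32,
                 add_noise=gaussian_noise, dropout_proba=0.2, l1=1e-4)
ae.fit(x_train, x_train, nb_epoch=20, batch_size=128)   # the target is the clean input itself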
from __future__ import division
from keras.layers import Input, Dense, Activation, Merge
from keras.models import Model, Sequential
import keras.backend as K
from probability_distributions import GaussianDistribution, BernoulliDistribution, CategoricalDistribution
from custom_batchnormalization import CustomBatchNormalization
class VAE(object):
    def __init__(self, in_dim=50, cat_dim=10, hid_dim=300, z_dim=50, alpha=0):
        self.in_dim = in_dim
        self.cat_dim = cat_dim
        self.hid_dim = hid_dim
        self.z_dim = z_dim
        self.alpha = alpha

        self.x_l = Input((self.in_dim, ))
        self.x_u = Input((self.in_dim, ))
        self.y_l = Input((self.cat_dim, ))
        y_u0 = Input((self.cat_dim, ))
        y_u1 = Input((self.cat_dim, ))
        y_u2 = Input((self.cat_dim, ))
        y_u3 = Input((self.cat_dim, ))
        y_u4 = Input((self.cat_dim, ))
        y_u5 = Input((self.cat_dim, ))
        y_u6 = Input((self.cat_dim, ))
        y_u7 = Input((self.cat_dim, ))
        y_u8 = Input((self.cat_dim, ))
        y_u9 = Input((self.cat_dim, ))
        self.y_u = [y_u0, y_u1, y_u2, y_u3, y_u4, y_u5, y_u6, y_u7, y_u8, y_u9]
        self.z = Input((self.z_dim, ))

        ###############
        # q(z | x, y) #
        ###############
        x_branch = Sequential()
        x_branch.add(Dense(self.hid_dim, input_dim=self.in_dim))
        x_branch.add(CustomBatchNormalization())
        x_branch.add(Activation('softplus'))
        y_branch = Sequential()
        y_branch.add(Dense(self.hid_dim, input_dim=self.cat_dim))
        y_branch.add(CustomBatchNormalization())
        y_branch.add(Activation('softplus'))
        merged = Sequential([Merge([x_branch, y_branch], mode='concat')])
        merged.add(Dense(self.hid_dim))
        merged.add(CustomBatchNormalization())
        merged.add(Activation('softplus'))

        mean = Sequential([merged])
        mean.add(Dense(self.hid_dim))
        mean.add(CustomBatchNormalization())
        mean.add(Activation('softplus'))
        mean.add(Dense(self.z_dim))
        var = Sequential([merged])
        var.add(Dense(self.hid_dim))
        var.add(CustomBatchNormalization())
        var.add(Activation('softplus'))
        var.add(Dense(self.z_dim, activation='softplus'))
        self.q_z_xy = GaussianDistribution(self.z, givens=[self.x_l, self.y_l], mean_model=mean, var_model=var)

        ###############
        # p(x | y, z) #
        ###############
        y_branch = Sequential()
        y_branch.add(Dense(self.hid_dim, input_dim=self.cat_dim))
        y_branch.add(CustomBatchNormalization())
        y_branch.add(Activation('softplus'))
        z_branch = Sequential()
        z_branch.add(Dense(self.hid_dim, input_dim=self.z_dim))
        z_branch.add(CustomBatchNormalization())
        z_branch.add(Activation('softplus'))
        merged = Sequential([Merge([y_branch, z_branch], mode='concat')])
        merged.add(Dense(self.hid_dim))
        merged.add(CustomBatchNormalization())
        merged.add(Activation('softplus'))

        mean = Sequential([merged])
        mean.add(Dense(self.hid_dim))
        mean.add(CustomBatchNormalization())
        mean.add(Activation('softplus'))
        mean.add(Dense(self.in_dim))
        var = Sequential([merged])
        var.add(Dense(self.hid_dim))
        var.add(CustomBatchNormalization())
        var.add(Activation('softplus'))
        var.add(Dense(self.in_dim, activation='softplus'))
        self.p_x_yz = GaussianDistribution(self.x_l, givens=[self.y_l, self.z], mean_model=mean, var_model=var)

        ########
        # p(y) #
        ########
        self.p_y = CategoricalDistribution(self.y_l)

        ############
        # q(y | x) #
        ############
        inference = Sequential()
        inference.add(Dense(self.hid_dim, input_dim=self.in_dim))
        inference.add(CustomBatchNormalization())
        inference.add(Activation('softplus'))
        inference.add(Dense(self.hid_dim))
        inference.add(CustomBatchNormalization())
        inference.add(Activation('softplus'))
        inference.add(Dense(self.cat_dim, activation='softmax'))
        self.q_y_x = CategoricalDistribution(self.y_l, givens=[self.x_l], model=inference)

        ##########################
        # sample and reconstruct #
        ##########################
        self.sampling_z = self.q_z_xy.sampling(givens=[self.x_l, self.y_l])
        self.reconstruct_x_l = self.p_x_yz.sampling(givens=[self.y_l, self.sampling_z])
    def _KL(self, mean, var):
        return -1/2*K.mean(K.sum(1 + K.log(K.clip(var, K.epsilon(), 1/K.epsilon())) - mean**2 - var, axis=1))
    def label_cost(self, y_true, y_false):
        ###########
        # Labeled #
        ###########
        self.mean, self.var = self.q_z_xy.get_params(givens=[self.x_l, self.y_l])
        KL = self._KL(self.mean, self.var)
        logliklihood = -self.p_x_yz.logliklihood(self.x_l, givens=[self.y_l, self.sampling_z])-self.p_y.logliklihood(self.y_l)
        L = KL+logliklihood
        L = L+self.alpha*self.q_y_x.logliklihood(self.y_l, givens=[self.x_l])
        return L

    def cost(self, y_true, y_false):
        ###########
        # Labeled #
        ###########
        self.mean, self.var = self.q_z_xy.get_params(givens=[self.x_l, self.y_l])
        KL = self._KL(self.mean, self.var)
        logliklihood = -self.p_x_yz.logliklihood(self.x_l, givens=[self.y_l, self.sampling_z])-self.p_y.logliklihood(self.y_l)
        L = KL+logliklihood
        L = L+self.alpha*self.q_y_x.logliklihood(self.y_l, givens=[self.x_l])

        #############
        # UnLabeled #
        #############
        U = 0
        # marginalization
        for y in self.y_u:
            mean, var = self.q_z_xy.get_params(givens=[self.x_u, y])
            sampling_z = self.q_z_xy.sampling(givens=[self.x_u, y])
            U += self.q_y_x.prob(y, givens=[self.x_u])*(-self.p_x_yz.logliklihood(self.x_u, givens=[y, sampling_z])
                                                        -self.p_y.logliklihood(y)
                                                        +self._KL(mean, var)
                                                        +self.q_y_x.logliklihood(y, givens=[self.x_u]))
        return U+L

    def label_training_model(self):
        model = Model(input=[self.x_l, self.y_l], output=self.reconstruct_x_l)
        return model

    def training_model(self):
        model = Model(input=[self.x_l, self.y_l, self.x_u]+self.y_u, output=self.reconstruct_x_l)
        return model

    def encoder(self):
        model = Model(input=[self.x_l, self.y_l], output=self.mean)
        return model

    def decoder(self):
        decode = self.p_x_yz.sampling(givens=[self.y_l, self.z])
        model = Model(input=[self.y_l, self.z], output=decode)
        return model

    def classifier(self):
        inference = self.q_y_x.get_params(givens=[self.x_l])
        model = Model(input=self.x_l, output=inference)
        return model
# CNN model
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D

def CNN(nb_filters, kernel_size, input_shape, pool_size, nb_classes):
    model = Sequential()
    model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                            border_mode='valid',
                            input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=pool_size))
    model.add(Flatten())
    model.add(Dense(128))
    model.add(Activation('relu'))
    model.add(Dense(nb_classes))
    model.add(Activation('softmax'))
    return model
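A minimal usage sketch for MNIST-sized inputs (Theano dimension ordering (channels, rows, cols) is assumed, matching the transposed-convolution examples below):

model = CNN(nb_filters=32, kernel_size=(3, 3), input_shape=(1, 28, 28),
            pool_size=(2, 2), nb_classes=10)
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])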
from keras.models import Sequential
from keras.layers import Deconvolution2D

# apply a 3x3 transposed convolution with stride 1x1 and 3 output filters on a 12x12 image:
model = Sequential()
model.add(Deconvolution2D(3, 3, 3, output_shape=(None, 3, 14, 14), border_mode='valid', input_shape=(3, 12, 12)))
# output_shape will be (None, 3, 14, 14)

# apply a 3x3 transposed convolution with stride 2x2 and 3 output filters on a 12x12 image:
model = Sequential()
model.add(Deconvolution2D(3, 3, 3, output_shape=(None, 3, 25, 25), subsample=(2, 2), border_mode='valid', input_shape=(3, 12, 12)))
model.summary()
# output_shape will be (None, 3, 25, 25)
To some extent the name of these networks is misleading: they are really VAEs, but with a CNN for the encoder and a DNN for the decoder. These networks try to model "features" in the encoding as probabilities, so that, having only ever seen cats and dogs in separate photos, the network can learn to generate a picture containing both a cat and a dog. Similarly, you can hand it a photo of a cat and a dog together and ask it to remove the dog, if you really dislike that dog. Demos have shown that these networks can also learn to model complex transformations on images, such as changing the light source of a 3D object. These networks tend to be trained with back-propagation.
GANs are made up of paired networks, working two by two: any two networks can form a GAN (though usually a combination of FFs and CNNs), with one network tasked with generating content and the other with judging it. The discriminating network receives either training data or content produced by the generative network; how well it predicts the data source is then used as part of the error for the generative network. This sets up a kind of competition: the discriminator gets better and better at telling generated data from real data, while the generator learns to become less predictable to the discriminator. This works well in part because even quite complex noise-like patterns are eventually predictable, whereas content that shares features with the real data is much harder to learn to tell apart. GANs are hard to train, not only because you are training two networks (each with its own problems), but because their dynamics also need to be kept in balance.
# Build Generative model ...
from keras.models import Model
from keras.layers import Input, Dense, Activation, Reshape, Flatten, Dropout
from keras.layers import Convolution2D, UpSampling2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.normalization import BatchNormalization
from keras.optimizers import Adam

# These were defined elsewhere in the original notebook; reasonable placeholders are assumed here:
shp = (1, 28, 28)        # shape of the real images fed to the discriminator
dropout_rate = 0.25
opt = Adam(lr=1e-4)      # generator optimizer
dopt = Adam(lr=1e-3)     # discriminator optimizer

nch = 200
g_input = Input(shape=[100])
H = Dense(nch*14*14, init='glorot_normal')(g_input)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = Reshape([nch, 14, 14])(H)
H = UpSampling2D(size=(2, 2))(H)
H = Convolution2D(nch//2, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = Convolution2D(nch//4, 3, 3, border_mode='same', init='glorot_uniform')(H)
H = BatchNormalization(mode=2)(H)
H = Activation('relu')(H)
H = Convolution2D(1, 1, 1, border_mode='same', init='glorot_uniform')(H)
g_V = Activation('sigmoid')(H)
generator = Model(g_input, g_V)
generator.compile(loss='binary_crossentropy', optimizer=opt)
generator.summary()

# Build Discriminative model ...
d_input = Input(shape=shp)
H = Convolution2D(256, 5, 5, subsample=(2, 2), border_mode='same')(d_input)
H = LeakyReLU(0.2)(H)
H = Dropout(dropout_rate)(H)
H = Convolution2D(512, 5, 5, subsample=(2, 2), border_mode='same')(H)
H = LeakyReLU(0.2)(H)
H = Dropout(dropout_rate)(H)
H = Flatten()(H)
H = Dense(256)(H)
H = LeakyReLU(0.2)(H)
H = Dropout(dropout_rate)(H)
d_V = Dense(2, activation='softmax')(H)
discriminator = Model(d_input, d_V)
discriminator.compile(loss='categorical_crossentropy', optimizer=dopt)
discriminator.summary()
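To actually train the pair, the two models are usually chained into a single stacked model: the discriminator is frozen, noise is fed to the generator, and the generator is updated through the (frozen) discriminator's judgement. A minimal sketch, assuming the generator, discriminator and the optimizer opt defined above:

from keras.models import Model
from keras.layers import Input

def make_trainable(net, val):
    # freeze/unfreeze every layer of a model
    net.trainable = val
    for l in net.layers:
        l.trainable = val

make_trainable(discriminator, False)              # freeze D while G is being updated
gan_input = Input(shape=[100])                    # same noise dimension as g_input
gan_output = discriminator(generator(gan_input))
GAN = Model(gan_input, gan_output)
GAN.compile(loss='categorical_crossentropy', optimizer=opt)
GAN.summary()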
RNNs are FFNNs with a twist of time: they are not stateless; they have connections between passes, connections through time. Neurons are fed information not just from the previous layer but also from themselves in the previous pass, which means the order in which you feed the input and train the network matters. One big problem with RNNs is the vanishing (or exploding) gradient problem: depending on the activation function used, information rapidly gets lost over time, just as very deep FFNNs lose information with depth. Intuitively this doesn't seem like a big deal, because these are only weights and not neuron states; but the weights through time are where the information from the past is actually stored, and if a weight reaches 0 or 1,000,000, the previous state won't carry much information. RNNs can be used in many fields, because most forms of data that don't really have an inherent timeline (i.e. unlike sound or video) can still be represented as a sequence. In general, recurrent networks are a good choice for advancing or completing information.
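A minimal Keras sketch of a recurrent network for a sequence task (the vocabulary size, sequence length and the binary target are illustrative assumptions):

from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, Dense

vocab_size, maxlen = 10000, 80
model = Sequential()
model.add(Embedding(vocab_size, 128, input_length=maxlen))   # map word ids to dense vectors
model.add(SimpleRNN(64))                                      # the hidden state is fed back in at every time step
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])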
LSTMs try to combat the vanishing/exploding gradient problem by introducing gates and an explicitly defined memory cell. The idea is mostly inspired by circuitry rather than biology. Each neuron has a memory cell and three gates: input, output and forget. The job of these gates is to safeguard the information by stopping or allowing its flow. The input gate determines how much of the information from the previous layer gets stored in the cell. The output gate does the job on the other end and determines how much of this cell's state the next layer gets to see. The forget gate seems like an odd inclusion at first, but sometimes it's good to forget: if the network is learning a book and a new chapter begins, it may be necessary to forget some of the characters from the previous chapters. LSTMs have been shown to be able to learn complex sequences, such as writing like Shakespeare or composing primitive music. Note that each of these gates has a weight to the cell in the previous neuron, so they typically require more resources to run.
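The same toy setup with an LSTM layer; the input, output and forget gates are handled inside the layer, so the calling code barely changes:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(10000, 128, input_length=80))
model.add(LSTM(64, dropout_W=0.2, dropout_U=0.2))   # Keras 1 style dropout on input and recurrent connections
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')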
GRUs are a lightweight variation on LSTMs. They have one gate fewer and are wired slightly differently: instead of input, output and forget gates, they have an update gate, which determines both how much information to keep from the last state and how much information to let in from the previous layer. The reset gate works much like the forget gate of an LSTM, but is placed slightly differently. GRUs always send out their full state, so they have no output gate. In most cases they behave very similarly to LSTMs; the biggest difference is that GRUs are slightly faster and easier to run (though also slightly less expressive). In practice these tend to cancel each other out: when you need a larger network to regain expressive power, the performance benefit disappears again. In cases where the extra expressiveness is not needed, GRUs can outperform LSTMs.
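In Keras a GRU is a drop-in replacement for an LSTM; only the layer class changes, and the layer ends up with fewer parameters:

from keras.models import Sequential
from keras.layers import Embedding, GRU, Dense

model = Sequential()
model.add(Embedding(10000, 128, input_length=80))
model.add(GRU(64))                                   # update/reset gates only; no separate output gate
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')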
NTMs can be understood as an abstraction of LSTMs and an attempt to un-black-box neural networks (that is, to give us some insight into what is actually going on inside). Instead of coding the memory cell directly into a neuron, the memory is kept separate. The idea is to combine the efficiency and permanence of regular digital storage with the efficiency and expressive power of neural networks: there is a content-addressable memory bank from which the network can read and to which it can write directly. The "Turing" in Neural Turing Machine comes from being Turing complete: the ability to read, write and change state based on what it reads means it can represent anything a universal Turing machine can represent.
DRNs are very deep FFNNs with extra connections that pass the input from one layer to a layer several steps downstream (usually 2 to 5 layers). Instead of trying to find a solution that maps some input to some output across, say, a 5-layer network, the network learns to map some input to some output plus that input. Essentially, it adds an identity function to the solution, carrying the old input along as fresh input to a later layer. Results have shown that these networks are very good at learning patterns at depths of more than 150 layers, far more than the usual 2 to 5. It has also been argued, however, that these networks are in essence just RNNs without the explicit time-based construction, and they are often compared to LSTMs without gates.
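A minimal sketch of the residual idea in the Keras 1 functional API (layer widths are illustrative): each block learns a correction that is added back onto its own input via an identity shortcut.

from keras.models import Model
from keras.layers import Input, Dense, merge

def residual_block(x, width):
    h = Dense(width, activation='relu')(x)
    h = Dense(width)(h)
    return merge([h, x], mode='sum')     # output = f(input) + input

inp = Input(shape=(64,))
h = Dense(64, activation='relu')(inp)
for _ in range(3):
    h = residual_block(h, 64)
out = Dense(10, activation='softmax')(h)
model = Model(input=inp, output=out)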
ESNs are yet another different type of network. They differ from the others in having random connections between neurons (i.e. not neatly organised into layers), and they are trained differently: instead of feeding input and back-propagating the error, we feed the input, push it forward, update the neurons for a while, and then observe the output over time. The input and output layers play a slightly unconventional role: the input layer is used to prime the network, and the output layer acts as an observer of the activation patterns that unfold over time. During training, only the connections between the observer and the hidden units are changed.
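A very small echo state network sketch in plain numpy (the reservoir size, scaling and ridge penalty are illustrative assumptions): the input and reservoir weights stay fixed and random, and only the linear readout is fitted on the observed reservoir states.

import numpy as np

rng = np.random.RandomState(42)
n_in, n_res = 1, 200
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))        # keep the spectral radius below 1 ("echo state" property)

def run_reservoir(u_seq):
    x = np.zeros(n_res)
    states = []
    for u in u_seq:                              # prime the network by pushing the input through it
        x = np.tanh(W_in.dot(np.atleast_1d(u)) + W.dot(x))
        states.append(x.copy())
    return np.array(states)

u_train = np.sin(np.linspace(0, 20, 500))        # toy signal
y_train = np.roll(u_train, -1)                   # task: predict the next value
X = run_reservoir(u_train)
ridge = 1e-6
W_out = np.linalg.solve(X.T.dot(X) + ridge*np.eye(n_res), X.T.dot(y_train))   # train only the readout
y_pred = X.dot(W_out)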
SVMs find optimal solutions to classification problems. Classically they could only classify linearly separable data, say deciding which images show Garfield and which show Snoopy, with no other outcome possible. During training, an SVM can be thought of as plotting all the data (Garfields and Snoopys) on a (2D) graph and figuring out how to draw a line between the data points. This line separates the data so that all the Snoopys are on one side and all the Garfields on the other. The line is moved to the optimal position by maximising the margins between the data points and the line on both sides. Classifying new data is done by plotting a point on the graph and simply checking which side of the line it falls on. Using the kernel trick, SVMs can be taught to classify n-dimensional data; this amounts to plotting points in, say, a 3D plot, allowing the model to distinguish between Snoopy, Garfield AND Simon's Cat, or even more cartoon characters.
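SVMs are not neural networks, so there is no Keras example; below is a small scikit-learn sketch (scikit-learn is an assumption here, it is not used elsewhere in this article) of the idea above, fitting a maximum-margin separator between two toy 2D clusters:

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
garfield = rng.randn(50, 2) + [2, 2]        # toy "Garfield" points
snoopy = rng.randn(50, 2) - [2, 2]          # toy "Snoopy" points
X = np.vstack([garfield, snoopy])
y = np.array([0]*50 + [1]*50)

clf = SVC(kernel='linear')                  # kernel='rbf' etc. gives the kernel trick for non-linear data
clf.fit(X, y)
print(clf.predict([[1.5, 2.0], [-3.0, -1.0]]))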
You can ask for help in the Keras Google group; if you live in mainland China, you will need to arrange your own way across the firewall.
You can also ask questions in the Github issues. Please make sure you have read our guidelines before asking.
You are also welcome to join our QQ group 119427073 for discussion (lurkers and spammers will be removed; when joining, state your company/school and position/year).
This article may not be used for any commercial purpose. If you wish to repost it, please contact the author; we reserve the right to pursue legal action against unauthorised copying or reposting.
Author: SCP-173
E-mail: scp173.cool@gmail.com
If you need timely, one-on-one help, you can add the author on WeChat: SCP-173-cool (a tip at your discretion is appreciated).