TensorFlow's softmax

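The first time I ran into this function I was completely baffled: why is a perfectly good softmax layer pulled out on its own instead of living inside inference? Below, going by TensorFlow's official API documentation, let's talk about this long, ugly function.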

The first thing I did was copy the official API documentation over here, so that it is easy to refer to later:

Computes softmax cross entropy between logits and labels.
Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.
NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.
If using exclusive labels (wherein one and only one class is true at a time), see sparse_softmax_cross_entropy_with_logits.
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
Logits and labels must have the same shape [batch_size, num_classes] and the same dtype (either float32 or float64).
Args:
logits: Unscaled log probabilities.
labels: Each row labels[i] must be a valid probability distribution.
name: A name for the operation (optional).
Returns:
A 1-D Tensor of length batch_size of the same type as logits with the softmax cross entropy loss.
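The WARNING in the documentation above is easy to check numerically. What follows is only a pure-Python sketch of the math, not TensorFlow's implementation: the helper names `softmax` and `softmax_cross_entropy` and the sample values are made up for illustration. It shows that passing already-softmaxed values into a loss that applies softmax internally gives a different (wrong) number.

```python
import math

def softmax(row):
    """Softmax of one row, shifted by the max for numerical stability."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits_row, labels_row):
    """Hypothetical stand-in: applies softmax internally, as the docs describe."""
    probs = softmax(logits_row)
    return -sum(t * math.log(p) for t, p in zip(labels_row, probs))

logits = [2.0, 1.0, 0.1]   # unscaled scores (illustrative values)
labels = [1.0, 0.0, 0.0]   # one-hot label, a valid probability distribution

# Intended use: feed the unscaled logits.
correct_loss = softmax_cross_entropy(logits, labels)

# Mistake: feed probabilities, which then get squashed by softmax a second time.
wrong_loss = softmax_cross_entropy(softmax(logits), labels)

print(correct_loss, wrong_loss)
```

The two printed values differ, which is the "incorrect results" the documentation warns about.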

Next comes my own understanding.

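First, the input logits. Its shape is [batch_size, num_classes]. Generally speaking, it is z, the input to the network's final softmax layer, that is, the raw scores before softmax is applied.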

The other input is labels. Its shape is also [batch_size, num_classes], and it is the output we expect from the network.

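What this function does is compute the cross entropy for a network whose last layer is a softmax layer. TensorFlow simply puts the softmax computation and the cross entropy computation together in a single function to make the program run faster. In the documentation's own words: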

it performs a softmax on logits internally for efficiency.
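To make the "softmax plus cross entropy in one function" idea concrete, here is a minimal pure-Python sketch. It is an illustration under assumptions, not TensorFlow's actual code: the names `two_step_loss`, `fused_loss` and `batch_loss` are hypothetical, and the log-sum-exp shortcut is just one common way to compute the same quantity straight from the logits.

```python
import math

def softmax(logits_row):
    """Softmax of one row of logits, shifted by the max for numerical stability."""
    m = max(logits_row)
    exps = [math.exp(v - m) for v in logits_row]
    total = sum(exps)
    return [e / total for e in exps]

def two_step_loss(logits_row, labels_row):
    """Softmax first, then cross entropy: -sum(labels * log(softmax(logits)))."""
    probs = softmax(logits_row)
    return -sum(t * math.log(p) for t, p in zip(labels_row, probs))

def fused_loss(logits_row, labels_row):
    """Same quantity computed directly from the logits via log-sum-exp."""
    m = max(logits_row)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits_row))
    # log(softmax(z)_i) = z_i - log_sum_exp, so no probabilities are materialized.
    return sum(t * (log_sum_exp - v) for t, v in zip(labels_row, logits_row))

def batch_loss(logits, labels):
    """One loss per row, mirroring the 1-D result of length batch_size."""
    return [fused_loss(z, t) for z, t in zip(logits, labels)]

logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]   # shape [batch_size, num_classes]
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # each row is a valid distribution
losses = batch_loss(logits, labels)
print(losses)
```

Both routes give the same per-example loss; the fused version just skips the intermediate probabilities, which is the kind of shortcut a combined op can take.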

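When I first saw this function, my first reaction was: if softmax has been pulled out and computed separately, isn't the original network's inference incomplete? After all, the softmax output of the last layer is never computed inside inference. Thinking it over, though, this does not seem to affect the correctness of the final result or the accuracy computation, because the last-layer computation y = softmax(z) does not change the ordering of the output values. The main reason is that softmax preserves order: it exponentiates every z_i (exp is strictly increasing) and divides all of them by the same positive sum, so the ranking of z and the ranking of y are identical. Enough talk; to make this easier to understand, here is a figure!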

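Sharp-eyed readers will recognize where this figure comes from: that book which leaves you feeling enlightened once you have read it, a genuinely good book.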

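The figure has 4 inputs whose order from largest to smallest is 3-4-2-1. The softmax has 4 outputs, and their order is still 3-4-2-1; they have merely been turned into probabilities. Now let's look at how accuracy is usually computed: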

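# z: logits from inference; y here is the one-hot labels, not softmax(z)
correct_pred = tf.equal(tf.argmax(z, 1), tf.argmax(y, 1))
# fraction of examples whose predicted class matches the labeled class
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))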

From the way it is computed, as long as the ordering of the outputs is unchanged, the accuracy computation is not affected at all!
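The argument above can be checked with a small pure-Python sketch. Everything in it is an assumption made for illustration: the helper names `argmax`, `ranking` and `accuracy` are hypothetical, and the sample numbers are invented (the first row is merely chosen to have the same 3-4-2-1 ordering as the figure, not copied from it).

```python
import math

def softmax(row):
    """Softmax of one row, shifted by the max for numerical stability."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(row):
    """Index of the largest entry in a row."""
    return max(range(len(row)), key=lambda i: row[i])

def ranking(row):
    """Indices sorted from largest value to smallest."""
    return sorted(range(len(row)), key=lambda i: row[i], reverse=True)

def accuracy(predictions, labels):
    """Fraction of rows where the predicted class equals the labeled class."""
    hits = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(hits) / len(hits)

# Made-up logits; the first row has descending order 3-4-2-1 (1-based positions).
z = [[1.0, 2.0, 5.0, 3.0],
     [0.3, 2.0, 0.1, 0.4],
     [1.5, 0.1, 0.2, 0.9]]
labels = [[0, 0, 1, 0],
          [0, 0, 0, 1],
          [1, 0, 0, 0]]

probs = [softmax(row) for row in z]

acc_from_logits = accuracy(z, labels)      # argmax taken on raw logits
acc_from_probs = accuracy(probs, labels)   # argmax taken after softmax
print(acc_from_logits, acc_from_probs)
```

Because each row keeps its ranking after softmax, the two accuracy values coincide, so leaving softmax out of inference does not change the accuracy.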

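Finally, to sum up my personal understanding, which may not be entirely right: TensorFlow puts softmax and cross entropy into a single function in order to speed up the computation. This leaves many newcomers to tf scratching their heads, but Google's engineers are not going to design their software with "how comfortable the user feels" as the top priority.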
