Softmax Summary

Softmax

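Along with the SVM, the Softmax classifier is one of the most commonly used classifiers.

It generalizes the binary logistic regression classifier to multiple classes.

It takes the score of each class and returns normalized class probabilities.

The scores are interpreted as unnormalized log probabilities for each class.

The hinge loss (used in the SVM) is replaced with the cross-entropy loss.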

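$$L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right) \quad \text{or equivalently} \quad L_i = -f_{y_i} + \log \sum_j e^{f_j}$$

(The two expressions are the same: the loss for the i-th example.)

Here $f_j$ is the score of the j-th class.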

Softmax function:

$$f_j(z) = \frac{e^{z_j}}{\sum_k e^{z_k}}$$

(Taking -log of this gives the cross-entropy loss.)

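From a probabilistic point of view, it can be interpreted as follows.

$$P(y_i \mid x_i; W) = \frac{e^{f_{y_i}}}{\sum_j e^{f_j}}$$

: the normalized probability assigned to the label $y_i$, given $x_i$ and parameterized by W.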
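The definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `softmax` and `cross_entropy_loss`, the score values, and the choice of class 2 as the correct label are all assumptions made for the example. It also checks that the two forms of the loss (minus the log of the normalized probability, and minus the correct-class score plus the log of the summed exponentials) agree.

```python
import numpy as np

def softmax(f):
    """Map a 1-D array of class scores to probabilities that sum to 1."""
    exps = np.exp(f)
    return exps / np.sum(exps)

def cross_entropy_loss(f, y):
    """Loss for one example: -log of the probability given to the correct class y."""
    return -np.log(softmax(f)[y])

# Hypothetical scores for 3 classes; class 2 is assumed to be the correct label.
scores = np.array([1.0, -2.0, 0.5])
y = 2

probs = softmax(scores)               # normalized class probabilities
loss = cross_entropy_loss(scores, y)  # first form of the loss

# Second form of the loss: -f_y + log(sum_j exp(f_j))
loss_alt = -scores[y] + np.log(np.sum(np.exp(scores)))

print(probs, loss, loss_alt)
```

The scores here are small, so the naive `np.exp` call is safe; large scores need the shift described in the numeric stability section.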

For reference, the information-theory view is as follows.

The cross-entropy between a “true” distribution p and an estimated distribution q is defined as:

$$H(p, q) = -\sum_x p(x) \log q(x)$$

The Softmax classifier is hence minimizing the cross-entropy between the estimated class probabilities ($q = e^{f_{y_i}} / \sum_j e^{f_j}$ as seen above) and the “true” distribution, which in this interpretation is the distribution where all probability mass is on the correct class (i.e. $p = [0, \ldots, 1, \ldots, 0]$ contains a single 1 at the $y_i$-th position).
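To make the information-theory view concrete, the sketch below computes the cross-entropy between a one-hot "true" distribution and the softmax output, and compares it with the negative log probability of the correct class. The helper name `cross_entropy`, the scores, and the label index are hypothetical values chosen for illustration.

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log q(x); terms with p(x) == 0 contribute nothing."""
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

scores = np.array([1.0, -2.0, 0.5])   # hypothetical class scores
y = 2                                 # assumed index of the correct class

# Estimated distribution q: softmax of the scores.
q = np.exp(scores) / np.sum(np.exp(scores))

# "True" distribution p: all probability mass on the correct class.
p = np.zeros_like(q)
p[y] = 1.0

h = cross_entropy(p, q)
softmax_loss = -np.log(q[y])

print(h, softmax_loss)  # the two values coincide for a one-hot p
```

Because p is one-hot, every term of the sum vanishes except the one at the correct class, which is why the cross-entropy reduces to the per-example Softmax loss.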

Practical issues: Numeric stability

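When actually coding the softmax function, the intermediate terms $e^{f_{y_i}}$ and $\sum_j e^{f_j}$ may be very large due to the exponentials.

Dividing large numbers can be numerically unstable, so a normalization trick is used: multiply the numerator and the denominator by a constant C and push it into the exponent.

$$\frac{e^{f_{y_i}}}{\sum_j e^{f_j}} = \frac{C e^{f_{y_i}}}{C \sum_j e^{f_j}} = \frac{e^{f_{y_i} + \log C}}{\sum_j e^{f_j + \log C}}$$

The value of C can be chosen freely, since it cancels out and does not change the result.

(A common choice for C is to set $\log C = -\max_j f_j$.)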

import numpy as np

f = np.array([123, 456, 789]) # example with 3 classes and each having large scores

p = np.exp(f) / np.sum(np.exp(f)) # Bad: Numeric problem, potential blowup

# instead: first shift the values of f so that the highest number is 0:

f -= np.max(f) # f becomes [-666, -333, 0]

p = np.exp(f) / np.sum(np.exp(f)) # safe to do, gives the correct answer
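The shift can also be wrapped in a small helper so that it is applied every time. The sketch below is one possible way to do this. The name `stable_softmax` is an assumption for illustration, and unlike the in-place `f -= np.max(f)` above it leaves the caller's array unmodified. It also checks that the shift does not change the result for scores small enough that the naive formula works.

```python
import numpy as np

def stable_softmax(f):
    """Softmax with the max-shift trick (log C = -max_j f_j)."""
    shifted = f - np.max(f)   # largest entry becomes 0, so exp() cannot overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# For small scores the naive formula is fine, and both versions agree.
small = np.array([1.0, 2.0, 3.0])
naive = np.exp(small) / np.sum(np.exp(small))
same = np.allclose(stable_softmax(small), naive)

# For large scores the naive formula overflows, but the shifted one stays finite.
large = np.array([123.0, 456.0, 789.0])
p_large = stable_softmax(large)

print(same, p_large)
```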

Don't confuse the terminology:

The SVM uses the hinge loss (max-margin loss), while the Softmax classifier uses the cross-entropy loss.

The Softmax classifier takes its name from the softmax function, which normalizes the scores into values between 0 and 1 that sum to 1; applying the cross-entropy loss on top of it is what makes the Softmax classifier.
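To see the two losses side by side, the sketch below evaluates both on the same score vector. The hinge loss formula is not written out in these notes, so the multiclass form used here (sum of max(0, f_j - f_y + delta) over the incorrect classes, with delta = 1) is an assumption, as are the function names, the scores, and the label.

```python
import numpy as np

def svm_hinge_loss(f, y, delta=1.0):
    """Multiclass hinge loss: sum over j != y of max(0, f_j - f_y + delta)."""
    margins = np.maximum(0.0, f - f[y] + delta)
    margins[y] = 0.0          # the correct class does not contribute
    return np.sum(margins)

def softmax_cross_entropy_loss(f, y):
    """Cross-entropy loss -log(softmax(f)[y]), computed with the max-shift trick."""
    shifted = f - np.max(f)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[y]

scores = np.array([-2.85, 0.86, 0.28])   # hypothetical scores for 3 classes
y = 2                                    # assumed index of the correct class

hinge = svm_hinge_loss(scores, y)             # about 1.58
xent = softmax_cross_entropy_loss(scores, y)  # about 1.04

print(hinge, xent)
```

The two numbers are not directly comparable with each other. Each is only meaningful relative to other values of the same loss on the same data.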