Stanford University - CS231n (Convolutional Neural Networks for Visual Recognition)
Stanford University CS231n: Deep Learning for Computer Vision
✔ reference
YouTube
CS231n Lecture 2: Image classification pipeline - YouTube
Lecture 1 | Introduction to Convolutional Neural Networks for Visual Recognition - YouTube
Doc
https://yganalyst.github.io/dl/cs231n_1
https://yerimoh.github.io/DL206/
https://biology-statistics-programming.tistory.com/53
https://velog.io/@cha-suyeon/CS231n-Lecture-9-%EA%B0%95%EC%9D%98-%EC%9A%94%EC%95%BD
https://velog.io/@fbdp1202/CS231n-%EC%A0%95%EB%A6%AC-9.-CNN-Architectures-rp6rx3zy
https://velog.io/@cha-suyeon/CS231n-4%EA%B0%95-%EC%A0%95%EB%A6%AC-Introduction-to-Neural-Networks
Multi-layer Perceptron
(Before) Linear score function $f = Wx$
(Now) 2-layer Neural Network $f = W_2 \max(0, W_1 x)$, where $\max(0, \cdot)$ is the ReLU nonlinearity
“Neural Network” is a very broad term;
these are more accurately called "fully-connected networks" or sometimes "multi-layer perceptrons" (MLP)
Non-parametric approach: Nearest Neighbors (one class → one classifier/template)
Parametric approach: Neural Network (one class → multiple classifiers/templates)
Q. What if we try to build a neural network without a nonlinear activation function?
A. The layers collapse, since $W_2(W_1x) = (W_2W_1)x$, so we end up with a linear classifier again.
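A minimal NumPy sketch (dimensions are arbitrary, CIFAR-10-like, chosen here for illustration) of why the nonlinearity matters: with ReLU we get a genuine 2-layer network, and without it the two matrices collapse into a single linear map.

import numpy as np

np.random.seed(0)
D, H, C = 3072, 100, 10                      # input dim, hidden dim, number of classes
x = np.random.randn(D)
W1 = 0.01 * np.random.randn(H, D)
W2 = 0.01 * np.random.randn(C, H)

scores_2layer = W2 @ np.maximum(0, W1 @ x)   # f = W2 max(0, W1 x)

# Without the ReLU the two layers collapse into one linear classifier:
W_collapsed = W2 @ W1                        # a single C x D matrix
print(np.allclose(W2 @ (W1 @ x), W_collapsed @ x))   # True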
Activation functions
Common activation functions:
- Sigmoid
- tanh
- ReLU (default)
- Leaky ReLU
- Maxout
- ELU
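For reference, a minimal NumPy sketch of the listed activations (Maxout is shown as the max over two linear pre-activations; the slope/scale values are common defaults assumed here, not taken from the lecture).

import numpy as np

def sigmoid(x):            return 1.0 / (1.0 + np.exp(-x))
def tanh(x):               return np.tanh(x)
def relu(x):               return np.maximum(0, x)             # the default choice
def leaky_relu(x, a=0.01): return np.where(x > 0, x, a * x)
def elu(x, a=1.0):         return np.where(x > 0, x, a * (np.exp(x) - 1))
def maxout(s1, s2):        return np.maximum(s1, s2)           # max of two linear pre-activations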
Neural networks
All layers are connected to each other (fully-connected), and a single layer amounts to a single matrix operation.
Do not use the size of the neural network as a regularizer.
→ To prevent overfitting, do not shrink the network; increase the regularization strength instead. In other words, as long as regularization is handled well, a bigger neural network is better.
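A minimal sketch of the knob being referred to (the helper name and default value are illustrative, not from the lecture code): keep the network large and tune the L2 regularization strength instead.

import numpy as np

def l2_regularized_loss(data_loss, weights, reg=1e-3):
    # reg (lambda) is what you turn up to fight overfitting,
    # rather than shrinking the network
    return data_loss + reg * sum(np.sum(W * W) for W in weights)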
import numpy as np

class Neuron:
    def neuron_tick(self, inputs):
        cell_body_sum = np.sum(inputs * self.weights) + self.bias   # weighted sum w·x + b
        firing_rate = 1.0 / (1.0 + np.exp(-cell_body_sum))          # sigmoid activation function
        return firing_rate
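A hypothetical usage of the class above (the weight and bias values are made up for illustration):

n = Neuron()
n.weights = np.array([0.5, -0.3, 0.8])
n.bias = 0.1
print(n.neuron_tick(np.array([1.0, 2.0, 3.0])))   # a single firing rate in (0, 1)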
Be very careful with your brain analogies!
We should be wary of claiming that artificial neural networks are similar to our actual brains.
Biological Neurons:
- Many different types
- Dendrites can perform complex non-linear computations
- Synapses are not a single weight but a complex non-linear dynamical system
Backpropagation
How to compute gradients?
Idea #1: Derive $\nabla_W L$ on paper
- Problem 1: Very tedious: lots of matrix calculus, need lots of paper (a lot of calculation)
- Problem 2: What if we want to change the loss? E.g. use softmax instead of SVM? → Need to re-derive from scratch =(
- Problem 3: Not feasible for very complex models!
Idea #2 (Better Idea): Computational graphs + Backpropagation
- Upstream gradient: the gradient of the loss with respect to the node's output.
- Local gradient: the gradient computed within the node itself (the node's output with respect to its inputs).
- Downstream gradient: the gradient of the loss with respect to the node's inputs (downstream = local × upstream).
Backpropagation computation example
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def f(w0, x0, w1, x1, w2):
    # forward pass: compute output
    s0 = w0 * x0
    s1 = w1 * x1
    s2 = s0 + s1
    s3 = s2 + w2
    L = sigmoid(s3)

    # backward pass: compute grads
    grad_L = 1.0
    grad_s3 = grad_L * (1 - L) * L   # sigmoid local gradient: dsigmoid/dx = (1 - sigmoid) * sigmoid
    grad_w2 = grad_s3                # add gate: distributes the upstream gradient
    grad_s2 = grad_s3                # add gate
    grad_s0 = grad_s2                # add gate
    grad_s1 = grad_s2                # add gate
    grad_w1 = grad_s1 * x1           # mul gate: multiplies by the other input
    grad_x1 = grad_s1 * w1           # mul gate
    grad_w0 = grad_s0 * x0           # mul gate
    grad_x0 = grad_s0 * w0           # mul gate
    return L, (grad_w0, grad_x0, grad_w1, grad_x1, grad_w2)
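As a quick sanity check (input values are arbitrary and illustrative), the analytic gradient for w0 returned above can be compared against a centered numerical difference:

vals = (2.0, -1.0, -3.0, -2.0, -3.0)              # (w0, x0, w1, x1, w2)
L, grads = f(*vals)
eps = 1e-5
L_plus, _ = f(vals[0] + eps, *vals[1:])
L_minus, _ = f(vals[0] - eps, *vals[1:])
print(grads[0], (L_plus - L_minus) / (2 * eps))   # the two numbers should agree closely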
Patterns in gradient flow
import torch

class Multiply(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, y):
        ctx.save_for_backward(x, y)   # need to cache some values for use in backward
        z = x * y
        return z

    @staticmethod
    def backward(ctx, grad_z):        # grad_z: upstream gradient
        x, y = ctx.saved_tensors
        grad_x = y * grad_z           # dL/dx = dz/dx * dL/dz
        grad_y = x * grad_z           # dL/dy = dz/dy * dL/dz
        return grad_x, grad_y
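A hypothetical usage of the Multiply wrapper sketched above (the class name and scalar inputs are assumptions for illustration): the gradients produced by the custom backward match what we would compute by hand.

x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(4.0, requires_grad=True)
z = Multiply.apply(x, y)   # calls forward
z.backward()               # upstream gradient defaults to 1.0 for a scalar output
print(x.grad, y.grad)      # tensor(4.), tensor(3.)  -> dL/dx = y, dL/dy = x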
What about vector-valued functions?
Q. If the input is a 4096-dim vector and the output is a 4096-dim vector, what is the size of the Jacobian matrix?
A. 4096 x 4096
For a batched linear layer $y = xW$ (with $x \in \mathbb{R}^{N \times D}$, $W \in \mathbb{R}^{D \times M}$, so $y \in \mathbb{R}^{N \times M}$):
Q. What parts of $y$ are affected by one element of $x$?
A. $x_{n,d}$ affects the whole row $y_n$.
Q. how much does $x_{n,d}$ affect $y_{n,m}$ ?
A. $w_{d,m}$
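A small NumPy check of the two answers above (shapes are small and arbitrary): perturbing $x_{n,d}$ changes only row $y_n$, and the per-element sensitivity is $w_{d,m}$.

import numpy as np

N, D, M = 2, 3, 4
x = np.random.randn(N, D)
W = np.random.randn(D, M)
y = x @ W                                    # y[n, m] = sum_d x[n, d] * W[d, m]

n, d, eps = 0, 1, 1e-6
x2 = x.copy()
x2[n, d] += eps                              # perturb a single input element
dy = (x2 @ W - y) / eps
print(np.allclose(dy[n], W[d], atol=1e-4))   # True: dy[n, m] / dx[n, d] = W[d, m]
print(np.allclose(dy[1 - n], 0.0))           # True: other rows are unaffected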
Summary
(Fully-connected) Neural Networks are stacks of linear functions and nonlinear activation functions;
they have much more representational power than linear classifiers
- backpropagation = recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates
- implementations maintain a graph structure, where the nodes implement the forward() / backward() API
- forward: compute result of an operation and save any intermediates needed for gradient computation in memory
- backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs