
Open Lecture Review/Deep Learning for Computer Vision

Lecture 14: Visualizing and Understanding

Images passed through four different first layers

① L2 Nearest neighbors in feature/pixel space

② Dimensionality Reduction: 4096 dimensions -> 2 dimensions : PCA, t-SNE, UMAP

 

 

Maximally Activating Patches

Run many images through the network -> record values of a chosen channel -> visualize the image patches that correspond to maximal activations
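A toy sketch of this procedure, assuming a single hand-picked 3x3 "channel" implemented as plain cross-correlation in NumPy (a real pipeline would record activations from a trained CNN and crop the receptive field of each maximal activation):

```python
import numpy as np

def top_patches(images, conv_filter, k=3):
    """Return the k image patches with the highest activation of one
    conv channel (given by conv_filter) across a batch of grayscale images."""
    fh, fw = conv_filter.shape
    scored = []
    for idx, img in enumerate(images):
        h, w = img.shape
        # valid cross-correlation: activation map of the chosen channel
        for i in range(h - fh + 1):
            for j in range(w - fw + 1):
                act = float((img[i:i+fh, j:j+fw] * conv_filter).sum())
                scored.append((act, idx, i, j))
    scored.sort(reverse=True)
    # crop the patch corresponding to each of the top-k activations
    return [images[idx][i:i+fh, j:j+fw] for _, idx, i, j in scored[:k]]

rng = np.random.default_rng(0)
imgs = rng.normal(size=(5, 16, 16))
filt = np.ones((3, 3))          # a channel that responds to bright 3x3 regions
patches = top_patches(imgs, filt)
```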

 

Which Pixels Matter? Saliency via Occlusion

Mask part of the image before feeding it to the CNN, and check how much the predicted probabilities change

* computationally expensive
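A minimal occlusion sketch; `score_fn` below is a hypothetical stand-in for a CNN's class score, and the patch size and stride are illustrative:

```python
import numpy as np

def occlusion_map(image, score_fn, patch=4, stride=4):
    """Slide a gray square over the image and record how much the
    class score drops when each region is occluded."""
    h, w = image.shape
    base = score_fn(image)
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for r, i in enumerate(range(0, h - patch + 1, stride)):
        for c, j in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[i:i+patch, j:j+patch] = image.mean()  # gray out one region
            heat[r, c] = base - score_fn(occluded)         # big drop => important pixels
    return heat

# stand-in "model": score is the brightness of the top-left corner
score = lambda img: float(img[:4, :4].sum())
img = np.zeros((16, 16)); img[:4, :4] = 1.0
heat = occlusion_map(img, score)
```

The quadratic number of forward passes is exactly why the note above flags this as computationally expensive.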

 

Which Pixels Matter? Saliency via Backprop

Compute the gradient of the (unnormalized) class score with respect to the image pixels, take the absolute value, and take the max over the RGB channels.
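In practice the gradient comes from autograd (e.g., calling backward on the class score with respect to the input image). The sketch below instead uses a toy linear scorer, so the gradient is analytic, and then applies the abs-and-max-over-channels step from the lecture:

```python
import numpy as np

def saliency_map(image, weights):
    """Saliency for a toy linear classifier: score = <weights, image>.
    The gradient of the score w.r.t. each pixel is just the weight itself;
    take abs and max over the 3 color channels."""
    grad = weights                      # d(score)/d(image) for a linear model
    return np.abs(grad).max(axis=0)     # (3, H, W) -> (H, W)

rng = np.random.default_rng(0)
img = rng.normal(size=(3, 8, 8))
w = rng.normal(size=(3, 8, 8))
sal = saliency_map(img, w)
```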

+ Saliency Maps : segmentation without supervision (extracting only the object, without the background)

Use GrabCut on saliency map

 

Intermediate Features via (guided) backprop

Pick a single intermediate neuron and compute the gradient of its value with respect to the image pixels.

Images come out nicer if you only backprop positive gradients through each ReLU (guided backprop)

 

 

Visualizing CNN Features: Gradient Ascent

(Guided) backprop finds the parts of an input image that a neuron responds to, while gradient ascent synthesizes an artificial image that maximally activates the neuron.

 

 

 

1. Initialize image to zeros

Repeat:
2. Forward image to compute current scores
3. Backprop to get gradient of neuron value with respect to image pixels
4. Make a small update to the image
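The loop above can be sketched with a toy linear "neuron" whose gradient is known in closed form (a real implementation would backprop through a CNN); the L2 penalty is the simple image regularizer:

```python
import numpy as np

def gradient_ascent_image(weight, steps=100, lr=0.1, l2=0.01):
    """Synthesize an 'image' maximizing a toy linear neuron score = <weight, img>,
    with an L2 penalty on the image."""
    img = np.zeros_like(weight)                 # 1. initialize image to zeros
    for _ in range(steps):
        score = float((weight * img).sum())     # 2. forward pass (toy linear neuron)
        grad = weight - 2 * l2 * img            # 3. gradient of (score - l2*||img||^2)
        img += lr * grad                        # 4. small update on the image
    return img, score

w = np.array([1.0, -2.0, 0.5])
img, score = gradient_ascent_image(w)
```

The synthesized image grows along the neuron's weight direction until the L2 penalty balances the score gradient.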

 

Better regularization : penalize the L2 norm of the image; in addition, periodically during optimization:

① Gaussian blur image

② Clip pixels with small values to 0
③ Clip pixels with small gradients to 0

 

+ Adding "multi-faceted" visualization gives even nicer results 

(+ more careful regularization, center-bias)

 

 

Adversarial Examples

1. Start from an arbitrary image
2. Pick an arbitrary category
3. Modify the image (via gradient ascent) to maximize the class score
4. Stop when the network is fooled 
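A sketch of the fooling loop for a toy linear classifier (logits = W @ x), where the gradient of the target class score with respect to the input is just the corresponding weight row; the class count, input dimension, and step size are illustrative:

```python
import numpy as np

def make_adversarial(image, W, target, lr=0.1, max_steps=100):
    """Gradient ascent on a target class score of a toy linear classifier
    until the network predicts the chosen category."""
    x = image.copy()
    for _ in range(max_steps):
        if int(np.argmax(W @ x)) == target:     # 4. stop when the network is fooled
            return x
        x += lr * W[target]                     # 3. d(logit_target)/dx = W[target]
    return x

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 50))                    # 5 classes, 50 "pixels"
img = rng.normal(size=50)                       # 1. start from an arbitrary image
target = (int(np.argmax(W @ img)) + 1) % 5      # 2. pick a different category
adv = make_adversarial(img, W, target)
```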

 

Feature Inversion

Given a CNN feature vector for an image, find a new image that: 

- Matches the given feature vector

- "looks natural" (image prior regularization)

ex) reconstructing from different layers of VGG-16

Moving to the right (reconstructing from later layers), more information is lost.

 

DeepDream : Amplify Existing Features

Rather than synthesizing an image to maximize a specific neuron,

instead try to amplify the neuron activations at some layer in the network

Choose an image and a layer in a CNN; repeat:
1. Forward: compute activations at chosen layer
2. Set gradient of chosen layer equal to its activation

3. Backward: Compute gradient on image
4. Update image
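These four steps can be sketched with a toy linear "layer"; setting the layer's gradient equal to its own activation is equivalent to ascending ||a||² / 2, which is what amplifies whatever the layer already responds to:

```python
import numpy as np

def deepdream_step(img, W, lr=0.01):
    """One DeepDream update for a toy linear 'layer' a = W @ img."""
    a = W @ img                  # 1. forward: activations at the chosen layer
    grad_a = a                   # 2. gradient of chosen layer set to its activation
    grad_img = W.T @ grad_a      # 3. backward: gradient on the image
    return img + lr * grad_img   # 4. update image

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
img = rng.normal(size=6)
before = float(np.sum((W @ img) ** 2))
for _ in range(10):
    img = deepdream_step(img, W)
after = float(np.sum((W @ img) ** 2))
```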

 

lower layer -> higher layer : the patterns become more visible and more concrete.

 

 

Texture Synthesis

Given a sample patch of some texture, can we generate a bigger image of the same texture?

Nearest Neighbor

Generate pixels one at a time in scanline order;

form neighborhood of already generated pixels and copy nearest neighbor from input

Gram Matrix

- Each layer of CNN gives C x H x W tensor of features; H x W grid of C-dimensional vectors.

- Outer product of two C-dimensional vectors gives C x C matrix of elementwise products.

- Average over all HW pairs gives Gram Matrix of shape C x C giving unnormalized covariance

- Efficient to compute: reshape features from C x H x W to a matrix F of shape C x HW; then G = FFᵀ
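A direct NumPy version of the efficient computation described above (dividing by HW performs the averaging over all spatial positions):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a C x H x W feature tensor: reshape to F (C x HW),
    then average F @ F.T over the HW spatial positions."""
    c, h, w = features.shape
    F = features.reshape(c, h * w)
    return (F @ F.T) / (h * w)          # C x C unnormalized covariance

feats = np.random.default_rng(0).normal(size=(8, 5, 5))
G = gram_matrix(feats)
```

The result is symmetric by construction, since it averages outer products of feature vectors with themselves.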

 

Neural Texture Synthesis

Reconstructing texture from higher layers recovers larger features from the input texture.

 

Texture synthesis ~ Feature reconstruction

 

 

Neural Style Transfer

(-> Using instance Normalization)

content image 1 + style image 2 -> new image
gradient-based optimization through a pre-trained network
different loss weights & resizing the style image change the result

+ Mix style from multiple images by taking a weighted average of Gram matrices
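A sketch of the combined objective, with raw tensors standing in for CNN feature maps (real style transfer evaluates these losses on VGG features at several layers, and the alpha/beta weights here are illustrative):

```python
import numpy as np

def style_transfer_loss(gen, content, style, alpha=1.0, beta=10.0):
    """Combined loss: feature (content) reconstruction + Gram-matrix (style)
    reconstruction, on C x H x W tensors standing in for CNN features."""
    def gram(f):
        c, h, w = f.shape
        F = f.reshape(c, h * w)
        return (F @ F.T) / (h * w)
    content_loss = float(np.sum((gen - content) ** 2))          # match content features
    style_loss = float(np.sum((gram(gen) - gram(style)) ** 2))  # match style statistics
    return alpha * content_loss + beta * style_loss

rng = np.random.default_rng(0)
content = rng.normal(size=(4, 6, 6))
style = rng.normal(size=(4, 6, 6))
loss = style_transfer_loss(content.copy(), content, style)
```

Mixing several styles amounts to replacing `gram(style)` with a weighted average of the styles' Gram matrices.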

 

Problem : Style transfer requires many forward/backward passes through VGG; very slow!

Solution : Train another neural network to perform style transfer for us!

(1) Train a feedforward network for each style
(2) Use pretrained CNN to compute same losses as before
(3) After training, stylize images using a single forward pass

 

+ One network, Many Styles

Use the same network for multiple styles using conditional instance normalization:

learn separate scale and shift parameters per style.
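A minimal NumPy sketch of conditional instance normalization, assuming learned per-style parameter tables `gammas` and `betas` (hypothetical names); choosing `style_id` selects which style's scale and shift are applied:

```python
import numpy as np

def conditional_instance_norm(x, gammas, betas, style_id, eps=1e-5):
    """Instance norm with per-style scale/shift: one network, many styles.
    x: (C, H, W) feature map; gammas/betas: (num_styles, C) learned tables."""
    mu = x.mean(axis=(1, 2), keepdims=True)          # per-channel, per-instance stats
    var = x.var(axis=(1, 2), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)            # normalize each channel
    g = gammas[style_id][:, None, None]              # pick this style's parameters
    b = betas[style_id][:, None, None]
    return g * x_hat + b

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))
gammas = rng.normal(size=(3, 4)); betas = rng.normal(size=(3, 4))
out = conditional_instance_norm(x, gammas, betas, style_id=1)
```

After normalization each channel has zero mean, so the per-channel output means equal the chosen style's shift parameters.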

 

 


Summary

Many methods for understanding CNN representations

Activations: Nearest neighbors, dimensionality reduction, maximal patches, occlusion, CAM

Gradients: Grad-CAM, Saliency maps, class visualization, fooling images, feature inversion

Fun: DeepDream, Style Transfer