CS131 Notebook
A personal notebook for CS131: Computer Vision, Chapters 1-4 (covering lecture PDFs 2, 4, 5, 6).
For more posts, or just for fun, see xiaoxin83121.
Chapter 1: Pixels and Filters
Color: the result of the interaction between physical light in the environment and our visual system.
It is a **psychological property of our visual experience, NOT a physical property**.
Color Space
- RGB: a cube-shaped color space
- HSV (Hue, Saturation, Value): a cone-shaped color space
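import cv2

img = cv2.imread("./test.jpg")  # OpenCV loads images in BGR order: channel 0 = B, 1 = G, 2 = R
img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
img_grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_b = img[..., 0]
img_g = img[..., 1]
img_r = img[..., 2]
cv2.imshow("Image_B", img_b)
cv2.imshow("Image_G", img_g)
cv2.imshow("Image_R", img_r)
cv2.imshow("Image_BGR", img)
cv2.waitKey(0)  # imshow windows only render while waitKey runs; wait for a key press
cv2.destroyAllWindows()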
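The two cvtColor calls above can be checked pixel by pixel without OpenCV. Below is a minimal sketch using Python's standard colorsys module. The helper names bgr_to_hsv_8bit and bgr_to_grey are hypothetical. It assumes OpenCV's 8-bit HSV convention (hue stored as degrees / 2 so it fits in 0..179, S and V scaled to 0..255) and the BT.601 luma weights 0.299 R + 0.587 G + 0.114 B. OpenCV's own fixed-point rounding may differ from this by one grey level.

```python
import colorsys


def bgr_to_hsv_8bit(b, g, r):
    """Convert one 8-bit BGR pixel to OpenCV-style 8-bit HSV.

    Hue is stored as degrees / 2 (0..179) so it fits in one byte;
    saturation and value are scaled to 0..255.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # all in [0, 1]
    return round(h * 180) % 180, round(s * 255), round(v * 255)


def bgr_to_grey(b, g, r):
    """Approximate cv2.COLOR_BGR2GRAY with Y = 0.299 R + 0.587 G + 0.114 B."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


print(bgr_to_hsv_8bit(0, 0, 255))   # pure red   -> (0, 255, 255)
print(bgr_to_hsv_8bit(0, 255, 0))   # pure green -> (60, 255, 255)
print(bgr_to_grey(0, 0, 255))       # -> 76
```

Note the argument order (b, g, r) mirrors OpenCV's channel order, which is the most common source of swapped-color bugs.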
Image sampling and quantization
- Resolution: a sampling parameter, defined in dots per inch (DPI) or as an equivalent spatial pixel density.
- An image contains a discrete number of pixels: a single matrix, or a set of matrices (one per channel: R, G, B, etc.).
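Sampling and quantization can be seen directly on a NumPy array. This is a minimal sketch under stated assumptions. Sampling is shown as keeping every k-th pixel. Quantization is shown as mapping 8-bit intensities onto fewer evenly spaced grey levels, assuming the number of levels divides 256. The function names are hypothetical.

```python
import numpy as np


def downsample(img, k):
    """Spatial sampling: keep every k-th pixel along each axis."""
    return img[::k, ::k]


def quantize(img, levels):
    """Intensity quantization of an 8-bit image to `levels` grey levels.

    Assumes `levels` divides 256 (2, 4, 8, ...); each value is mapped to
    the lower edge of its bin.
    """
    step = 256 // levels
    return (img // step) * step


img = np.arange(16, dtype=np.uint8).reshape(4, 4) * 16  # values 0, 16, ..., 240
print(downsample(img, 2))
print(quantize(np.array([0, 63, 64, 200, 255], dtype=np.uint8), 4))
```

Lowering the resolution removes spatial detail, while lowering the number of levels produces visible "banding" in smooth gradients.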
Image histograms
A histogram gives the frequency of each brightness (intensity) value in an image.
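import cv2
import numpy as np

img = cv2.imread('./test.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h = np.zeros(256)  # one bin per 8-bit intensity 0..255 (np.zeros(255) would overflow at value 255)
for row in range(img.shape[0]):
    for col in range(img.shape[1]):
        h[img[row, col]] += 1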
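The double loop above is easy to read but slow in pure Python. A minimal vectorized sketch of the same count uses NumPy's np.bincount. The helper name histogram is hypothetical. OpenCV also offers cv2.calcHist([img], [0], None, [256], [0, 256]) for the same job.

```python
import numpy as np


def histogram(grey):
    """Count how many pixels take each 8-bit intensity value 0..255."""
    return np.bincount(grey.ravel(), minlength=256)


grey = np.array([[0, 0, 255], [3, 3, 3]], dtype=np.uint8)
h = histogram(grey)
print(h[0], h[3], h[255], h.sum())
```

minlength=256 guarantees a full-length histogram even when the brightest values never occur.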
Linear systems (filters)
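Filtering: forming a new image whose pixel values are transformed from the original pixel values (a linear transformation between two matrices).
For details, see the description of linear shift-invariant (LSI) systems in signals and systems; the excerpt here is taken from my personal notes.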
import time

import cv2
import numpy as np


def convolution_filter(stride=1):
    # manually convolve and see how the image changes
    img = cv2.imread('./test_high.jpg')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # generate a 3x3 mean (box) kernel
    kernel = np.ones((3, 3)) / 9
    # kernel = np.array([[0, 1, 2], [0, 1, 2], [0, 1, 2]]) / 9
    # generate the output feature map (same size; the 1-pixel border stays 0)
    output = np.zeros_like(img)
    time_before = time.time()
    for row in range(0, img.shape[0] - 2, stride):
        for col in range(0, img.shape[1] - 2, stride):
            # element-wise product + sum is cross-correlation; it equals
            # convolution for symmetric kernels such as the box kernel
            mat = img[row:row + 3, col:col + 3] * kernel
            val = np.sum(mat)
            try:
                # output is uint8, so clip to the valid range before storing
                output[row // stride + 1, col // stride + 1] = np.clip(val, 0, 255)
            except IndexError:
                print('error happened when row={},col={}'.format(row, col))
    time_after = time.time()
    print("time consume={}".format(time_after - time_before))
    cv2.imshow("origin", img)
    cv2.imshow("convolution", output)
    cv2.waitKey(0)


convolution_filter()
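The filtered image (right) is slightly blurred. In my tests, the [[0,1,2],[0,1,2],[0,1,2]]/9 kernel did not look very different, and changing the stride did not noticeably change the overall appearance. The difference between the original image and the convolution result (the same idea as the Difference of Gaussians, DoG, used in Chapter 4) is shown below: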
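The per-pixel loop can be replaced by summing shifted copies of the image, which also makes the difference image a one-liner. This is a minimal NumPy sketch, assuming an odd kernel size k >= 3 and keeping only the "valid" region, so the output is k - 1 pixels smaller than the input. The function names are hypothetical. A true DoG subtracts two Gaussian-blurred images, not an image and its box-blurred copy.

```python
import numpy as np


def box_blur(img, k=3):
    """Mean filter over k x k windows (valid region only)."""
    img = img.astype(float)
    H, W = img.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for dr in range(k):
        for dc in range(k):
            out += img[dr:dr + H - k + 1, dc:dc + W - k + 1]
    return out / (k * k)


def difference_image(img, k=3):
    """Original minus its blurred copy, cropped to the same valid region."""
    r = k // 2
    return img[r:-r, r:-r].astype(float) - box_blur(img, k)


img = np.zeros((5, 5))
img[2, 2] = 9.0  # a single bright pixel
print(box_blur(img))
print(difference_image(img))
```

Flat regions vanish in the difference image, while isolated bright pixels and edges remain, which is why such differences are useful detectors.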
Chapter 2: Edges
Edges typically occur on the boundary between two different regions in an image.
Edge detection
Goal: identify sudden changes (discontinuities) in an image. A good edge detector offers good detection, good localization, and a single response per edge.
Image Gradient
- 1D discrete derivative filters
Each vector below lists the weights applied to $[f(x+1), f(x), f(x-1)]$:
backward: $f(x)-f(x-1)\approx f'(x)$, with the vector [0, 1, -1]
forward: $f(x+1)-f(x)\approx f'(x)$, with [1, -1, 0]
central: $\frac{f(x+1)-f(x-1)}{2}\approx f'(x)$, with [1, 0, -1] (up to the factor $\frac{1}{2}$)
- 2D discrete derivative filters:
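The left kernel computes the gradient along the x-axis; the result is shown below:
The right kernel computes the gradient along the y-axis; the result is shown below:
Other kernels, such as the Sobel kernel $\begin{bmatrix}1 & 0 & -1\\2 & 0 & -2\\1 & 0 & -1\end{bmatrix}=\begin{bmatrix}1\\2\\1\end{bmatrix}\begin{bmatrix}1 & 0 & -1\end{bmatrix}$, combine a smoothing kernel ($[1, 2, 1]^{T}$, a small approximation of a Gaussian) with a derivative kernel ($[1, 0, -1]$), so they approximate the x, y derivatives of a Gaussian.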
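The 1D and 2D filters above can be checked numerically. Below is a minimal NumPy sketch with hypothetical function names. The 1D differences are evaluated on interior samples only. The Sobel gradients are computed separably: first the $[1, 0, -1]$ derivative in the document's $[f(x+1), f(x), f(x-1)]$ convention, then $[1, 2, 1]$ smoothing in the other direction, keeping only the valid region.

```python
import numpy as np


def derivatives_1d(f):
    """Backward, forward and central differences at interior samples 1..n-2."""
    f = np.asarray(f, dtype=float)
    backward = f[1:-1] - f[:-2]        # f(x) - f(x-1)
    forward = f[2:] - f[1:-1]          # f(x+1) - f(x)
    central = (f[2:] - f[:-2]) / 2     # (f(x+1) - f(x-1)) / 2
    return backward, forward, central


def sobel_gradients(img):
    """Separable Sobel: derivative f(x+1) - f(x-1), then [1, 2, 1] smoothing."""
    img = np.asarray(img, dtype=float)
    dx = img[:, 2:] - img[:, :-2]                   # derivative along x (columns)
    gx = dx[:-2] + 2 * dx[1:-1] + dx[2:]            # smooth along y (rows)
    dy = img[2:, :] - img[:-2, :]                   # derivative along y (rows)
    gy = dy[:, :-2] + 2 * dy[:, 1:-1] + dy[:, 2:]   # smooth along x (columns)
    return gx, gy


b, f, c = derivatives_1d([0, 1, 4, 9, 16])  # samples of x**2, true f'(x) = 2x
print(b, f, c)
gx, gy = sobel_gradients(np.tile(np.arange(5.0), (5, 1)))  # ramp along x
print(gx, gy)
```

On $x^2$ the backward and forward differences are off by one in opposite directions, while the central difference recovers $2x$ exactly, which is why it is usually preferred.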
Canny edge detection
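- Filter the image with the x and y derivatives of a Gaussian
- Find the magnitude and orientation of the gradient
- Non-maximum suppression (single response): keep only pixels that are local maxima along the gradient direction
- Hysteresis thresholding: define a low and a high threshold; pixels above the high threshold are strong edges, and pixels between the two thresholds are kept only if they connect to a strong edge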
edge_output = cv2.Canny(img, 50, 100)  # OpenCV's ready-made implementation; 50 and 100 are the low and high hysteresis thresholds
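To see what cv2.Canny does internally, here is a simplified NumPy sketch of the steps listed above. It is not OpenCV's implementation. Assumptions: the Gaussian smoothing step is skipped for brevity, gradients use central differences, orientation is quantized to 4 directions, and hysteresis grows strong edges through 8-connected weak pixels. The function name canny_sketch is hypothetical.

```python
import numpy as np


def canny_sketch(img, low, high):
    """Simplified Canny: gradients, NMS, hysteresis (no Gaussian smoothing)."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    # 1. derivatives by central differences (border pixels left at zero)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2
    gy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2
    # 2. magnitude and orientation, folded into [0, 180) degrees
    mag = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180
    # 3. non-maximum suppression along the quantized gradient direction
    nms = np.zeros_like(mag)
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            a = angle[r, c]
            if a < 22.5 or a >= 157.5:   # ~horizontal gradient
                n1, n2 = mag[r, c - 1], mag[r, c + 1]
            elif a < 67.5:               # ~45 deg (down-right, rows grow downward)
                n1, n2 = mag[r - 1, c - 1], mag[r + 1, c + 1]
            elif a < 112.5:              # ~vertical gradient
                n1, n2 = mag[r - 1, c], mag[r + 1, c]
            else:                        # ~135 deg (down-left)
                n1, n2 = mag[r - 1, c + 1], mag[r + 1, c - 1]
            if mag[r, c] >= n1 and mag[r, c] >= n2:
                nms[r, c] = mag[r, c]
    # 4. hysteresis: strong pixels seed edges; weak pixels join if 8-connected
    strong = nms >= high
    weak = (nms >= low) & ~strong
    edges = strong.copy()
    while True:
        padded = np.pad(edges, 1)
        near_edge = np.zeros_like(edges)
        for dr in range(3):
            for dc in range(3):
                near_edge |= padded[dr:dr + H, dc:dc + W]
        grown = edges | (weak & near_edge)
        if np.array_equal(grown, edges):
            return edges
        edges = grown


row = [0, 0, 0, 0, 50, 100, 100, 100, 100, 100]  # a soft vertical step edge
img = np.tile(np.array(row, dtype=float), (10, 1))
print(canny_sketch(img, 10, 40).astype(int))
```

The gradient magnitude is nonzero in three columns around the step, but NMS keeps only the central one, which is the "single response" criterion in action.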
Hough transform
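$y=ax+b$ is the simplest form of a line in image ($x$-$y$) space, with fixed $a$ and $b$.
Rearranging gives $b=-xa+y$: with fixed $x$ and $y$, each image point becomes a line in parameter ($a$-$b$) space.
For two points $(x_{1}, y_{1})$ and $(x_{2},y_{2})$, there are two lines in $a$-$b$ space that intersect at $(a^{\prime},b^{\prime})$. Mapping this back to $x$-$y$ space determines the line $y=a^{\prime}x+b^{\prime}$ through both points. With many edge points, each point votes for every parameter cell its line passes through, and cells with many votes correspond to lines in the image. Because $a$ is unbounded for vertical lines, OpenCV uses the polar form $\rho = x\cos\theta + y\sin\theta$ instead, which is why cv2.HoughLinesP takes a $\rho$ resolution and a $\theta$ resolution.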
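import cv2
import numpy as np

img = cv2.imread('./test_ver_hor.jpg')
img_grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(img_grey, 50, 100)
# rho resolution 1 pixel, theta resolution 1 degree, at least 30 votes per line
lines = cv2.HoughLinesP(edges, 1, np.pi/180, 30, minLineLength=60, maxLineGap=5)
if lines is not None:  # HoughLinesP returns None when no line is found
    # draw on the BGR image: a single-channel grey image cannot show red lines
    for x1, y1, x2, y2 in lines[:, 0, :]:
        cv2.line(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 1)
cv2.imshow("Hough", img)
cv2.waitKey(0)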
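The voting idea can be made concrete with a small accumulator in $(\rho, \theta)$ space. This is a minimal NumPy sketch, not how OpenCV implements HoughLinesP (which is a probabilistic variant). Assumptions: edge points are given as (x, y) pairs, $\theta$ is sampled in whole degrees, and $\rho$ is rounded to whole pixels. The function name hough_accumulator is hypothetical.

```python
import numpy as np


def hough_accumulator(points, height, width, n_theta=180):
    """Vote in (rho, theta) space, rho = x*cos(theta) + y*sin(theta)."""
    diag = int(np.ceil(np.hypot(height, width)))  # largest possible |rho|
    thetas = np.deg2rad(np.arange(n_theta))       # 0, 1, ..., n_theta-1 degrees
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((len(rhos), n_theta), dtype=int)
    cols = np.arange(n_theta)
    for x, y in points:
        # each point votes once per theta, for the rho of the line through it
        rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rho + diag, cols] += 1
    return acc, rhos, thetas


# edge points on the horizontal line y = 3 inside a 100 x 100 image
acc, rhos, thetas = hough_accumulator([(x, 3) for x in range(100)], 100, 100)
r, t = np.unravel_index(np.argmax(acc), acc.shape)
print(rhos[r], t, acc[r, t])
```

The strongest cell is $\rho = 3$, $\theta = 90°$ with all 100 votes, i.e. the line $y = 3$; a real detector would then pick every cell above a vote threshold, as the 30 in the OpenCV call does.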
Chapter 3: RANSAC and Feature Detectors
RANSAC: RANdom SAmple Consensus
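inliers: sample points that lie within the fitting region of a model;
outliers: sample points outside the fitting region, i.e. noise that interferes with the fit;
Core idea: repeatedly draw random samples to generate a candidate model, and keep the model whose fitting region contains the most inliers (a voting strategy). E.g., to fit a line, randomly pick two points to define a line; all points within distance d of that line are inliers.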
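The line-fitting example above translates almost directly into code. This is a minimal NumPy sketch, assuming the points are an (N, 2) float array. The iteration count, the threshold d = 1.0, the fixed random seed, and the final least-squares refit with np.polyfit are my own choices, not part of the notes. The refit also assumes the line is not vertical. The function name ransac_line is hypothetical.

```python
import numpy as np


def ransac_line(points, n_iters=100, d=1.0, seed=0):
    """Fit a line y = slope * x + intercept with RANSAC, then refit on inliers."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # randomly pick two distinct points; they define a candidate line
        i, j = rng.choice(len(points), size=2, replace=False)
        p1, p2 = points[i], points[j]
        direction = p2 - p1
        normal = np.array([-direction[1], direction[0]]) / np.linalg.norm(direction)
        # perpendicular distance of every point to the candidate line
        dist = np.abs((points - p1) @ normal)
        mask = dist <= d
        if mask.sum() > best_mask.sum():  # vote: keep the line with most inliers
            best_mask = mask
    slope, intercept = np.polyfit(points[best_mask, 0], points[best_mask, 1], 1)
    return slope, intercept, best_mask


xs = np.arange(20.0)
inliers = np.stack([xs, 2 * xs + 1], axis=1)  # points on y = 2x + 1
outliers = np.array([[0, 50], [5, -40], [10, 100], [3, 30], [8, -20]], dtype=float)
pts = np.vstack([inliers, outliers])
slope, intercept, mask = ransac_line(pts)
print(slope, intercept, mask.sum())
```

A plain least-squares fit on all 25 points would be pulled far off by the outliers; RANSAC ignores them because no line through them collects many votes.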
Local invariant features
Closely related to SIFT; continued in Chapter 4.
Harris Detector
Let $u$ and $v$ be the shift in the $x$ and $y$ directions; the intensity difference can then be quantified as $I(x+u, y+v) - I(x, y)$.
This gives the Harris detector formulation: $E(u,v)=\sum_{x,y}w(x,y)[I(x+u, y+v) - I(x, y)]^{2}$
$w$ is the window function; two common choices are shown below: a box window (1 inside the window, 0 outside) or a Gaussian.
By a first-order Taylor expansion, $I(x+u, y+v) - I(x, y) \approx I_{x}u+I_{y}v$.
So the Harris formulation can be written as:

$$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} \left( \sum_{x,y} w(x, y) \begin{bmatrix} I_{x}^{2} & I_{x}I_{y} \\ I_{x}I_{y} & I_{y}^{2} \end{bmatrix} \right) \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$

where

$$M = \sum_{x,y} w(x, y) \begin{bmatrix} I_{x}^{2} & I_{x}I_{y} \\ I_{x}I_{y} & I_{y}^{2} \end{bmatrix}$$

Since $M$ is symmetric, it can be diagonalized by a rotation matrix $R$:

$$M = R^{-1} \begin{bmatrix} \lambda_{1} & 0 \\ 0 & \lambda_{2} \end{bmatrix} R$$
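After this transformation, how fast $E(u, v)$ changes as $(u, v)$ varies is determined by the eigenvalues $\lambda_{1}$ and $\lambda_{2}$ of $M$: if $\lambda_{1}$ is large, $E(u, v)$ changes quickly along the corresponding eigenvector direction, and likewise for $\lambda_{2}$.
Therefore, if $\lambda_{1} \gg \lambda_{2}$ or $\lambda_{2} \gg \lambda_{1}$, the point is classified as an edge;
if $\lambda_{1} \approx \lambda_{2}$ and both are large, it is classified as a corner (if both are small, it is a flat region).
Define the corner response $\theta=\det(M)-\alpha\,\mathrm{trace}(M)^{2}=\lambda_{1}\lambda_{2}-\alpha(\lambda_{1}+\lambda_{2})^{2}$, with $\alpha$ between $0.04$ and $0.06$. A large positive $\theta$ indicates a corner, a negative $\theta$ an edge, and a small $|\theta|$ a flat region.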
Wonderful! Now let's summarize Harris detection:
- Compute the image derivatives $I_{x}$ and $I_{y}$.
- Compute the products of derivatives $I_{x}^{2}$, $I_{y}^{2}$ and $I_{x}I_{y}$.
- Apply a Gaussian filter (or another window function) to these products.
- Compute $\theta$ and keep pixels whose response exceeds a threshold as corner candidates.
- Apply non-maximum suppression.
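The first four steps can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions. It uses a 3 x 3 box window instead of a Gaussian (the notes allow "other window functions") and np.gradient's central differences for $I_x$, $I_y$. It sets $\alpha = 0.05$ and omits thresholding and non-maximum suppression. The function names are hypothetical.

```python
import numpy as np


def box_window_sum(a, r=1):
    """Sum of `a` over a (2r+1) x (2r+1) box window around each pixel (zero padding)."""
    H, W = a.shape
    p = np.pad(a, r)
    out = np.zeros_like(a)
    for dr in range(2 * r + 1):
        for dc in range(2 * r + 1):
            out += p[dr:dr + H, dc:dc + W]
    return out


def harris_response(img, alpha=0.05, r=1):
    """theta = det(M) - alpha * trace(M)^2 at every pixel, with a box window."""
    img = np.asarray(img, dtype=float)
    Iy, Ix = np.gradient(img)          # np.gradient returns d/d(row) first, then d/d(col)
    Sxx = box_window_sum(Ix * Ix, r)   # entries of M, summed over the window
    Syy = box_window_sum(Iy * Iy, r)
    Sxy = box_window_sum(Ix * Iy, r)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - alpha * trace ** 2


img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0                  # bright square with a corner at (5, 5)
R = harris_response(img)
print(R[5, 5], R[5, 10], R[10, 10])    # corner, edge, flat
```

The three printed values follow the sign rule above: positive at the square's corner, negative on its top edge, and zero inside the flat interior.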
Chapter 4: Feature Descriptors
After extracting key points in Chapter 3, another question arises: what happens if the scale or orientation changes? How can we match the same key points in another image taken at a different scale and orientation?
Scale-invariant detection, two common approaches:
- Harris-Laplacian: find local maxima of the Harris corner detector in space and of the Laplacian in scale.
- SIFT: find local maxima of the Difference of Gaussians (DoG) filter in both space and scale.

The SIFT descriptor is then built as follows:
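- Rotate the image gradients relative to the key point's orientation $\theta$ so the descriptor is rotation invariant. The gradient orientation at each pixel is computed as below, where $L(x,y)$ is the pixel value of the Gaussian-smoothed image:
$\theta(x,y)=\tan^{-1}\left(\frac{L(x, y+1)-L(x, y-1)}{L(x+1,y)-L(x-1,y)}\right)$
- Split the region around the key point into a $4 \times 4$ array of cells, and build a histogram with 8 orientation bins in each cell.
- Concatenate the histograms into a $4 \times 4 \times 8 = 128$-dimensional vector and use it for matching.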
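A heavily simplified sketch of the $4 \times 4 \times 8$ layout follows. It is not Lowe's full SIFT. Assumptions: the input is a 16 x 16 Gaussian-smoothed patch centred on a key point, and the dominant orientation is given. Only the gradient orientations are rotated; real SIFT also rotates the sampling grid and adds Gaussian weighting and interpolation between bins. np.arctan2 is used in place of $\tan^{-1}$ so the full 360° range is kept. The function name sift_like_descriptor is hypothetical.

```python
import numpy as np


def sift_like_descriptor(patch, dominant_theta=0.0):
    """4x4 cells x 8 orientation bins = 128-D vector from a 16x16 patch."""
    L = np.asarray(patch, dtype=float)
    assert L.shape == (16, 16)
    dy, dx = np.gradient(L)                 # d/d(row) first, then d/d(col)
    mag = np.hypot(dx, dy)
    # orientation relative to the key point's dominant orientation, in [0, 2*pi)
    theta = (np.arctan2(dy, dx) - dominant_theta) % (2 * np.pi)
    bins = np.floor(theta / (2 * np.pi / 8)).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            # each pixel adds its gradient magnitude to its cell's orientation bin
            desc[r // 4, c // 4, bins[r, c]] += mag[r, c]
    desc = desc.ravel()
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc  # unit length: robust to contrast changes


horizontal = np.tile(np.arange(16.0), (16, 1))  # intensity ramp along x
vertical = horizontal.T                          # the same ramp rotated by 90 degrees
d_h = sift_like_descriptor(horizontal)
d_v = sift_like_descriptor(vertical, dominant_theta=np.pi / 2)
print(d_h.shape, np.allclose(d_h, d_v))
```

Because orientations are measured relative to the dominant orientation, the rotated ramp produces the same 128-D vector as the original, which is the property that makes matching across rotations possible.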
HOG: Histogram of Oriented Gradients
For more (resizing, segmentation, and clustering), see cs131_2.