Notes from the Taiwan Data Science Conference Series Events
- http://dsc.kktix.cc/events/video-signal
- http://datasci.tw/event/vision_and_learning/
- Slides: http://www.slideshare.net/tw_dsconf/ss-61255961
【Interpretation】Introduction to Computer Vision
Real Cases in Computer Vision
- Character Recognition (LeNet)
- Microsoft PhotoSynth
- Video Reenactment
- Auto Driving
- Autonomous Cars - NVIDIA Drive PX2
- Object class recognition
- Semantic Segmentation
- Identify where the car is able to drive
- Radar
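- Scans the surroundings with lasers (strictly speaking, laser scanning is LiDAR) and can recognize things very quickly, but it has little to do with computer vision.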
- Electronic rear-view mirror
- Solves the blind-spot problem
- Grandma rides a Tesla
- Trip Wire
- Loitering
- People Count
- Speed Test
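- Computes speed directly from video instead of with Doppler radar.
- If the camera gets knocked slightly askew, the readings become inaccurate, so now everyone knows how to dodge this kind of speed check (XDD)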
- Smart Daily
- Recognizes faces in surveillance camera footage to clock employees in.
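- Smart Fast Forward (a Skywatch product)
- Uses image recognition to find the periods when people appear in a farmhouse's surveillance footage, mainly to check regularly whether someone remembered to spray pesticide.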
- Structure from motion
- 3D Reconstruction
- Person tracking
- Face detection
Relationship to Data Science?
- Rich info, lots of data (in terms of bits)
- Unstructured, usually without much context / semantics
- Difficult to process and query
- We are generating them every day
- Turning this data into a searchable part of human history is a challenge.
A Brief History of Computer Vision
- 1966, Marvin Minsky
- 50 years later, we still have not fully solved this problem.
- 1960's: Interpretation of Synthetic Worlds
- Larry Roberts (Father of Computer Vision)
- 1970's: Some progress on interpreting selected images
- 1980's: AI Winter ... back to basics
- 1984: Perceptual Organization and Visual Recognition, David Lowe
- Blending
- Shape from shading
- Uses trigonometric functions to find the angle of reflected light and build a model
- Edge Detection
- From Science to Engineering
- 1990's: structure, segmentation and face recognition
- 2000's: more object classes, computational photography, video processing
- Cameras that can refocus a photo after it has been taken
- Texture Synthesis
- 2010's: Deep Learning is Back!!
- AlexNet NIPS 2012
- DeepFace CVPR 2014
- DeepPose CVPR 2014
- Show, Attend and Tell ICML 2015
Basic parts of Computer Vision
Reference Books
- "Multiple View Geometry in Computer Vision", Richard Hartley and Andrew Zisserman
- A good book to get started on camera geometry
- More math-heavy, but very old school
- "Computer Vision: Algorithms and Applications", Richard Szeliski
- More balanced mix between math and application
- Freely available online.
Image Formation and 2D Image Processing
- Image formation
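- How a camera forms an image: light from every scene point scatters in all directions, so a bare sensor records no image; a pinhole acting as a barrier lets through only a narrow bundle of rays from each point, so an image forms.
- Drawbacks
- Too little light gets through, so the image is dark.
- If the pinhole is too large, the image becomes blurry, so a lens is added to help.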
- Circle of Confusion
- Visible bokeh means your lens has a wide enough aperture, which means you are rich. XDD
- Modeling Projection
- The coordinate system
- Homogeneous Coordinates
- 3D to 2D
- 4D to 3D (the same projection expressed in homogeneous coordinates)
- Projection equations
- Camera parameters
- Extrinsic parameters (extrinsics)
- Intrinsic parameters (intrinsics)
- http://ai.stanford.edu/~saumitro/projektiv/ lets you see how the extrinsic and intrinsic parameters actually affect the image
- Distortion
- Types
- Pincushion Distortion
- Barrel Distortion
- Camera Calibration
- When to use it: removing distortion, changing the viewing angle of a photo, and recognizing objects seen from a rotated viewpoint (low-level projection)
- Tilt-shift
- Digital Color Images
- Bayer Filter
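- Humans are more sensitive to green and less sensitive to blue, which is why a Bayer filter has twice as many green sites as red or blue ones.
- A color photo is three grayscale channels (R, G, B) stacked together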
- Many early algorithms use greyscale instead of color images, Why?
- Early on, there were only grayscale photos
- Color values can be biased and inconsistent
- Image Filtering
- Sliding Window
- Sharpening filter (Unsharp Mask)
- Vertical Edge
- Horizontal Edge
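The projection equations and the image-filtering items above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the talk's code: an ideal pinhole camera with no lens distortion (`K` holds the intrinsics, `R` and `t` the extrinsics), a 3x3 box blur inside the unsharp mask, and simple [-1, 0, 1] derivative kernels for the edge filters. All function names are hypothetical.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Pinhole projection x ~ K [R | t] X for an (N, 3) array of world points."""
    n = points_3d.shape[0]
    X_h = np.hstack([points_3d, np.ones((n, 1))])   # 3D -> 4D homogeneous
    Rt = np.hstack([R, t.reshape(3, 1)])            # 3x4 extrinsic matrix
    x_h = (K @ Rt @ X_h.T).T                        # 4D -> 3D homogeneous, shape (N, 3)
    return x_h[:, :2] / x_h[:, 2:3]                 # divide by depth -> (N, 2) pixels

def filter2d(image, kernel):
    """Slide `kernel` over `image` (correlation, same-size output, edge-replicated border)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out

# Derivative kernels: the first responds to left/right intensity changes (vertical edges),
# its transpose to top/bottom changes (horizontal edges).
VERTICAL_EDGE = np.array([[-1.0, 0.0, 1.0]] * 3)
HORIZONTAL_EDGE = VERTICAL_EDGE.T

def unsharp_mask(image, amount=1.0):
    """Sharpen by adding back the detail that a 3x3 box blur removes."""
    image = image.astype(float)
    blurred = filter2d(image, np.full((3, 3), 1.0 / 9.0))
    return image + amount * (image - blurred)
```

For example, with a focal length of 100 px and principal point (50, 50), the point (1, 2, 10) seen by a camera at the origin (R = I, t = 0) projects to pixel (60, 70).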
Epipolar geometry and stereo matching
- Recovering structure from a single view
- Intrinsic ambiguity of the mapping from 3D to image (2D)
- A single 2D image cannot directly determine an object's distance or depth; you need two eyes (two views) and triangulation.
- Epipolar geometry
- Parallel Images Plane
- Forward translation
- Epipolar line
- The "Vertigo" Effect
- Epipolar Constraint (F)
- Estimating F
- The Eight-Point Algorithm
- The fundamental matrix is very important!
- Rectification
- Your basic stereo algorithm
- Triangulation
- Depth Map Results
- Active stereo with structured light
- Data Acquisition
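The eight-point algorithm for estimating F can be sketched as follows. This assumes the normalized variant (Hartley's coordinate normalization), chosen here because the unnormalized version is numerically fragile. It takes N >= 8 matched pixel coordinates, solves the linear system by SVD, enforces rank 2, and undoes the normalization. There is no RANSAC step, so the matches are assumed to be outlier-free; function names are hypothetical.

```python
import numpy as np

def normalize(pts):
    """Hartley normalization: move the centroid to the origin, mean distance to sqrt(2)."""
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2) / mean_dist
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1]])
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ pts_h.T).T, T

def eight_point(x1, x2):
    """Estimate F with x2_h^T F x1_h = 0 from N >= 8 correspondences ((N, 2) arrays)."""
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each correspondence gives one linear equation in the 9 entries of F (row-major).
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1)),
    ])
    # Least-squares solution: right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 so that all epipolar lines pass through a single epipole.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    # Undo the normalization: p2^T F' p1 = x2^T (T2^T F' T1) x1.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)
```

With noisy real matches, the estimate is usually wrapped in RANSAC and refined by minimizing a geometric error.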
Structure from motion and tracking
- Finding Path Through the World's Photos
- Pose Estimation
- Structure from motion
- Tracking
- Find feature points, track them, then solve for the structure.
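Once a feature point has been tracked into two frames, the simplest form of "solving for the structure" is linear (DLT) triangulation. The sketch below assumes both 3x4 camera matrices are already known. A full structure-from-motion pipeline would also estimate the cameras and refine everything with bundle adjustment. The function name is hypothetical.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point.

    P1, P2: 3x4 camera matrices; x1, x2: the tracked pixel (u, v) in each view.
    From x ~ P X we get u * (P[2] . X) - (P[0] . X) = 0, and likewise for v."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                  # homogeneous 4-vector minimizing |A X| with |X| = 1
    return X[:3] / X[3]
```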
Stitching and computational photography
How to combine a set of photos into one large photo
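Stitching usually starts from a homography that maps one photo's pixels onto another's. Below is a minimal direct linear transform (DLT) sketch, assuming at least four exact point matches are already given. A real stitcher would detect and match features, normalize coordinates, and reject outliers with RANSAC. Function names are hypothetical.

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT estimate of the 3x3 H with dst ~ H src, from N >= 4 (x, y) -> (u, v) matches."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each match gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)        # null vector of the system = H up to scale
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (N, 2) points through H, including the homogeneous divide."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]
```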
- Image Mosaics
- Recognizing Panoramas
- De-Ghosting
- Cutout-based de-ghosting
- Cutout-based compositing
- Photomontage
- Can merge several photos, each with different people blinking, into one photo in which nobody's eyes are closed.
- Poisson Image Editing
- Poisson Equation: differentiate, differentiate, then integrate
- Photo compositing effects
- Seamless Poisson cloning
- Face Cloning
- Texture Swapping
- Interactive Mobile Panorama
- High Dynamic Range Imaging (HDR)
- The real world is high dynamic range
- Typical cameras have limited dynamic range
- Solution: Merge multiple exposures
- Varying Exposure
- Tone Mapping
- Simple Global Operator
- Interactive Local Adjustment of Tonal Values
- Tonal Manipulation
- Constraint Propagation
- Touch-Tone: Point-and-Swipe Image Editing
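Merging multiple exposures and then tone mapping can be sketched in NumPy. The following are assumptions, not details from the talk: a linear camera response (real pipelines first recover the camera's response curve), a hat-shaped per-pixel weight, and Reinhard's L / (1 + L) as the "simple global operator". L / (1 + L) is a common choice, but the slide may have used a different curve. Function names are hypothetical.

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Merge differently exposed shots (pixel values in [0, 1]) into relative radiance.

    Assumes a linear camera response (radiance ~ pixel / exposure time) and uses a
    hat-shaped weight that trusts mid-tones and ignores clipped black/white pixels."""
    num = np.zeros(np.shape(images[0]), dtype=float)
    den = np.zeros(np.shape(images[0]), dtype=float)
    for img, dt in zip(images, exposure_times):
        img = np.asarray(img, dtype=float)
        w = 1.0 - np.abs(2.0 * img - 1.0)   # 0 at 0.0 and 1.0, peak of 1 at 0.5
        num += w * img / dt
        den += w
    return num / np.maximum(den, 1e-8)      # pixels clipped in every shot end up 0

def tone_map_global(radiance):
    """Simple global operator L / (1 + L): squeezes any radiance into [0, 1)."""
    return radiance / (1.0 + radiance)
```

Dark values pass through almost unchanged (L / (1 + L) is close to L for small L), while very bright values are compressed toward 1.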
Visual Recognition and Query
- 1989
- MNIST, Backpropagation applied to handwritten zip code recognition
- Character Recognition (LeNet)
- 1998, Neural Network-Based Face Detection
- 1999, SIFT (Scale Invariant Feature Transform)
- Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.
- No more sliding windows (interest points)
- Better features (use more computation)
- The feature points it finds are blobs rather than edges or corners.
- Better Descriptor
- Image gradients => Keypoint descriptor
- Truncated normalization (globally)
- The normalized descriptor lies on a high-dimensional sphere
- What worked
- Object instance recognition
- Panorama
- What failed?
- Could not recognize object categories
- 2001, Rapid Object Detection using a Boosted Cascade of Simple Features, Viola and Jones
- Why did it work?
- Simple Features (Haar wavelets)
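- Assumes light always comes from above and looks directly for the shadows around the eyes and nose; plausible candidates are kept and implausible ones are discarded immediately, which makes it very fast.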
- Why did it fail?
- Does not work on faces seen from the side
- 2003, Constellation model (redux) (related to SIFT)
- Object Class Recognition by Unsupervised Scale-Invariant Learning
- 2005, HOG (Histograms of oriented gradients) (related to SIFT)
- Normalize locally not globally
- Why worked?
- Hard negative mining
- Computers are fast enough
- What failed?
- Cannot recognize people in motion; they have to be standing.
- 2007, Pascal VOC
- The PASCAL Visual Object Classes (VOC) Challenge
- Only 20 classes
- 2008, DPM (Deformable parts model)
- Object Detection with Discriminatively Trained Part Based Model
- Star-structure
- 2009, Caltech Pedestrian
- 2009, ImageNet
- ImageNet, A Large-Scale Hierarchical Image Database
- 2010, SUN
- SUN Database: Large-scale Scene Recognition from Abbey to Zoo
- MS COCO
- over 77,000 worker hours (8+ years)
- 2012 DNNs
- GPUs + Data
- Classification vs Detection
- Detection needs to know the position of the target object
- CNN, RNN
- Why it fails
- If the object cannot be localized, it is hard to make a judgment
- Neural Networks are easily fooled
- Can misclassify noise that looks completely unrelated as certain objects
- Neural Networks are easily fooled: High Confidence Predictions for Unrecognizable Images
- PANDA: Pose Aligned Networks for Deep Attribute Modeling
- DeepFace: Closing the Gap to Human-Level Performance in Face Verification
- Additional Challenges
- Detection in context (with common sense)
- Add common-sense judgments, e.g., people do not normally fly in the sky, and so on
- Model awareness
- Training time (when dataset is incrementally updated)
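- Every company throws huge numbers of computers at the computation, which is not only time-consuming but also wastes a lot of electricity.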
- More science?
- Right now it is mostly massive trial and error to find methods, without much system or science behind it.
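Much of the Viola-Jones speed comes from computing Haar-like features on an integral image. After one pass over the image, the sum over any rectangle costs four lookups. This is a minimal sketch, assuming a grayscale NumPy array. Only a single two-rectangle feature is shown; the AdaBoost feature selection and the attentional cascade from the paper are not. Function names are hypothetical.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x]; a leading zero row/column avoids boundary cases."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=float)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y, x, h, w):
    """Sum of the h x w box with top-left corner (y, x): four lookups, O(1)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_vertical(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: bottom half minus top half.
    A large positive value means the top is darker than the bottom,
    e.g. the shadowed eye region above the brighter cheeks."""
    half = h // 2
    top = box_sum(ii, y, x, half, w)
    bottom = box_sum(ii, y + half, x, half, w)
    return bottom - top
```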
【Indexing】Multimedia Retrieval
Search By Image Examples
- Still very much an open problem
- Most commercial applications use a mixture of algorithms
- No single algorithm solves this problem completely
- Google Goggles in action
- Text => OCR
- Landmarks, Books, Artwork, Wine, Logos => SIFT
- Contact Info
- TinEye
- Search by image
- Instance Recognition
- Search Structure
- Possible Solutions
- Find approximate words
- Approximate nearest neighbour (ANN)
- High dimensionality makes it slow
- Find a lower-dimensional space to split the data
- Working with lower-dimensional (e.g., 2D) data is less accurate, but faster.
- Scalable Recognition with a Vocabulary Tree
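The vocabulary-tree idea (hierarchical k-means, as in the paper listed above) can be sketched as follows. Assumptions: tiny `branch` and `depth` defaults for illustration, deterministic farthest-point initialization instead of random seeding, and no inverted file or tf-idf scoring, which a full retrieval system would add on top. Function names are hypothetical.

```python
import numpy as np

def kmeans(data, k, iters=10):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [data[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(data - c, axis=1) for c in centers], axis=0)
        centers.append(data[dist.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.linalg.norm(data[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    labels = np.linalg.norm(data[:, None] - centers[None], axis=2).argmin(axis=1)
    return centers, labels

def build_tree(data, branch=2, depth=3):
    """Hierarchical k-means: split into `branch` clusters, then recurse into each one."""
    if depth == 0 or len(data) < branch:
        return None
    centers, labels = kmeans(data, branch)
    children = [build_tree(data[labels == j], branch, depth - 1) for j in range(branch)]
    return {"centers": centers, "children": children}

def quantize(tree, descriptor):
    """Walk down the tree, choosing the nearest center at each level.
    Cost is about branch * depth distance computations instead of one per leaf."""
    path, node = [], tree
    while node is not None:
        j = int(np.linalg.norm(node["centers"] - descriptor, axis=1).argmin())
        path.append(j)
        node = node["children"][j]
    return tuple(path)   # the path identifies the "visual word"
```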
【Acceleration】GPUs and Deep Learning (GPU and Computation)
Parallel Processing and GPU
Parallel Computing Goals
- To solve your problem in less time
- Process in parallel
- In order to parallelize a problem
- Look at where the dependencies are, and determine how handling them affects the algorithm.
Types of Parallelism
- Multiple Programs
- Multi-tasking
- Multi-threading
- Single Program
- Instruction-level parallelism
- Multiple instructions in a serial program get executed simultaneously
- Data-level parallelism
- Single Instruction, Multiple Data processing model (SIMD)
- Amdahl's Law
- Named after computer architect Gene Amdahl
- Speedup of a parallel computer is limited by the amount of serial work
- Resource Management
- Dining philosophers problem
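Amdahl's law can be written as S(n) = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the original run time and n is the number of processors. A small helper (hypothetical name) makes the ceiling set by the serial part concrete:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n).

    The serial part (1 - p) caps the speedup at 1 / (1 - p), however large n gets."""
    p = parallel_fraction
    if not 0.0 <= p <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    return 1.0 / ((1.0 - p) + p / n_processors)
```

With p = 0.95, even 1024 processors give only about a 19.6x speedup, and no number of processors can exceed 1 / (1 - p) = 20x.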
GPU Applications
- Real-time rendering, e.g., games
- Movie effects
GPUs Today
- GPUs are becoming more programmable
- GPUs now support 32/64-bit floating-point numbers
- GPUs have higher memory bandwidth than CPUs
NVIDIA CUDA
- Compute Unified Device Architecture
- CUDA Workflow
- Get a CUDA-enabled GPU
- Write C/C++-like code (*.cu)
- Compile with the CUDA compiler (nvcc)
- Generates PTX code ("Parallel Thread Execution")
- Applications auto-magically run on GPUs
- Many, many parallel threads
- The CUDA driver translates PTX code into hardware instructions.
- CUDA Overview
A nice series of CUDA tutorials I collected while learning CUDA: Nice Series of CUDA Tutorials on ptt.cc
Frameworks and Libraries
- MATLAB
- BLAS Library (Basic Linear Algebra Subprograms)
- Dates back to the Fortran era
- Processor vendors implement their own BLAS libraries
- e.g., Intel MKL (Math Kernel Library)
- cuBLAS - CUDA version, very fast
- NVIDIA Thrust Library
- A little like C++ STL library for CUDA
- Very few lines of code for vector manipulation
- Fast implementation of parallel primitives
- reduce
- map-reduce (transform_reduce)
- scan
- sort
- NVIDIA cuDNN
- Deep Neural Network Library for CUDA
- TensorFlow, Caffe, Microsoft CNTK
- Deep Learning Getting-Started Advice
- Borrow (steal if you must) a modern GPU
- Use Caffe for your deep learning projects
- Browse through the Caffe Model Zoo and try out the existing (pre-trained) models (AlexNet, R-CNN and GoogLeNet)
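Thrust's `thrust::reduce` and `thrust::inclusive_scan` run primitives like these on the GPU. To see why they parallelize well, here is a sequential Python simulation of two common data-parallel patterns: a tree reduction and a Hillis-Steele scan. It only illustrates the round structure. It is not Thrust's implementation, and `op` is assumed to be associative.

```python
import operator

def tree_reduce(values, op=operator.add):
    """Pairwise (tree-shaped) reduction in about log2(n) rounds.
    All combinations inside one round are independent, so a GPU can do them in parallel."""
    vals = list(values)
    if not vals:
        raise ValueError("cannot reduce an empty sequence")
    while len(vals) > 1:
        nxt = [op(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:            # an odd leftover element carries over to the next round
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

def inclusive_scan(values, op=operator.add):
    """Hillis-Steele inclusive scan: in each round, element i combines with element i - step."""
    vals = list(values)
    step = 1
    while step < len(vals):
        # Every new value depends only on the previous round, so one round = one parallel step.
        vals = [vals[i] if i < step else op(vals[i - step], vals[i]) for i in range(len(vals))]
        step *= 2
    return vals
```

For example, `inclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])` returns the running sums `[3, 4, 11, 11, 15, 16, 22, 25]` in three rounds.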
Computer Vision Implementation Demos
Introducing OpenCV through the official tutorials
- Core functionality
- Image processing
- Demos
Python, OpenCV, Numpy
- Canny Edge Detection
- Detect unique edges
- Both strong and weak edges produce a peak after differentiation
- Edge Voting (uses 2 thresholds)
- Strong edge: Always accept.
- Weak edge: Accept when connected.
- It is the basis of many later algorithms
- Histogram
- Demo
- OpenCV QR Drive
- QR code Marker Detection
- 1:1:3:1:1 black-white markers at the corners
- How to detect 11311?
- Only need to use raster scan
- Use Otsu algorithm
- Thresholding: leave only white and black
- A binarization algorithm that minimizes the weighted intra-class variance, suited to bimodal distributions.
- Detect the brightest points
- Dilate
- Equality check
- Threshold
- Dilation and thresholding
- Make a little PiBorg that chases an NT$1,000 bill.
- The PiBorg (RPi + Motor) aka DoodleBorg
- Camshift (WACV 98)
- Meanshift
- OpenCV: Meanshift and Camshift
- Camshift adds a scale option (an adaptive window size) on top of Meanshift
- Conclusion
- Basic OpenCV functionalities
- OpenCV and image processing
- OpenCV and detection
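The QR marker steps above (Otsu binarization, then a raster scan for 1:1:3:1:1 runs) can be sketched in NumPy as follows. Assumptions: an 8-bit grayscale input, a horizontal scan only (a real detector would also confirm the pattern vertically and diagonally), and a hypothetical `tolerance` parameter that sets how far each run may deviate from the ideal ratio. Within OpenCV itself, Otsu's method is available through `cv2.threshold` with the `cv2.THRESH_OTSU` flag. Function names are hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method on a uint8 image: pick t maximizing the between-class variance,
    which is equivalent to minimizing the weighted intra-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    count0 = np.cumsum(hist)                   # pixels with value <= t
    total = count0[-1]
    w0 = count0 / total
    w1 = (total - count0) / total
    mu = np.cumsum(hist * levels) / total      # partial first moment up to t
    mu_total = mu[-1]
    between = np.zeros(256)
    ok = (w0 > 0) & (w1 > 0)
    between[ok] = (mu_total * w0[ok] - mu[ok]) ** 2 / (w0[ok] * w1[ok])
    return int(between.argmax())              # value > t -> white, <= t -> black

def finder_pattern_centres(row, tolerance=0.5):
    """Raster-scan one binarized row (0 = black, 1 = white) for black/white/black/white/black
    runs in a 1:1:3:1:1 ratio; return the x-center of each match."""
    runs = []                                  # [value, length] pairs
    for v in (int(p) for p in row):
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    starts = np.cumsum([0] + [n for _, n in runs])   # start column of each run
    ratio = np.array([1, 1, 3, 1, 1], dtype=float)
    hits = []
    for i in range(len(runs) - 4):
        window = runs[i:i + 5]
        if [v for v, _ in window] != [0, 1, 0, 1, 0]:
            continue
        lengths = np.array([n for _, n in window], dtype=float)
        expected = ratio * lengths.sum() / ratio.sum()   # 7 modules in total
        if np.all(np.abs(lengths - expected) <= tolerance * expected):
            hits.append(float(starts[i + 2] + lengths[2] / 2.0))
    return hits
```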