This page introduces a standard implementation of the Bag of Words model for image classification and object recognition.
The released software is a very early version of our algorithm. It is just a basic Bag of Words model, which includes:
- SIFT feature extraction via VLFeat
- hierarchical k-means clustering
- vector-quantization coding (hard voting)
- average pooling
- linear SVM learning
and one trick that gives a small improvement in performance: add a flipped copy of each training image, which doubles the training set.
With this setup, the performance is about 45%.
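To make the coding, pooling, and flipping steps concrete, here is a minimal dependency-free sketch. It is not the released code. The function names `hard_vote_histogram` and `flip_augment` are hypothetical, the codebook is searched by brute force rather than through a hierarchical k-means tree, and the flip is assumed to be a left-right mirror.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_vote_histogram(descriptors, codebook):
    """Hard-vote each descriptor to its nearest codeword, then average-pool.

    descriptors: list of local feature vectors (e.g. 128-d SIFT).
    codebook:    list of cluster centres of the same dimension.
    Returns one value per codeword; the values sum to 1 when at least
    one descriptor is given.
    """
    # Hard voting: each descriptor contributes to exactly one codeword.
    # Assumption: flat brute-force search, not a hierarchical tree lookup.
    votes = Counter(
        min(range(len(codebook)), key=lambda k: squared_distance(d, codebook[k]))
        for d in descriptors
    )
    # Average pooling: divide the vote counts by the number of descriptors.
    n = max(len(descriptors), 1)  # avoid division by zero for empty images
    return [votes[k] / n for k in range(len(codebook))]


def flip_augment(images, labels):
    """Double a training set by appending mirrored copies of every image.

    images: list of images, each a list of pixel rows.
    Assumption: "flip" means a left-right mirror.
    """
    flipped = [[row[::-1] for row in image] for image in images]
    return images + flipped, labels + labels
```

The resulting histograms would then be fed to a linear SVM, with the flipped copies encoded in the same way as the originals.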
Many techniques can improve the algorithm, including:
- combining more low-level features: HOG, LBP, color-HOG/SIFT/LBP, etc.
- a large-scale codebook, up to 65536 or 131072 codewords
- more complex coding algorithms: soft voting, LLC, LCC, super-vector, Fisher kernel, etc.
- nonlinear kernel transformation, additive kernel mapping and embedding, multiple kernel learning
- dimensionality reduction
- large-scale learning algorithms, such as LIBLINEAR
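Two of the items above are easy to illustrate in a few lines. The sketch below shows one common form of soft voting (a softmax over negative squared distances) and the explicit feature map of the Hellinger kernel, which is one example of an additive kernel mapping. Both are illustrations under assumptions, not part of the released software: the names `soft_vote_histogram` and `hellinger_map` and the smoothing parameter `beta` are hypothetical.

```python
import math


def soft_vote_histogram(descriptors, codebook, beta=1.0):
    """Soft voting: spread each descriptor over all codewords, then average-pool.

    The weight of codeword j for descriptor d is proportional to
    exp(-beta * ||d - c_j||^2). A large beta approaches hard voting.
    """
    k = len(codebook)
    pooled = [0.0] * k
    for d in descriptors:
        d2 = [sum((x - y) ** 2 for x, y in zip(d, c)) for c in codebook]
        nearest = min(d2)  # subtract the minimum for numerical stability
        weights = [math.exp(-beta * (v - nearest)) for v in d2]
        total = sum(weights)
        for j in range(k):
            pooled[j] += weights[j] / total
    n = max(len(descriptors), 1)  # avoid division by zero for empty images
    return [p / n for p in pooled]


def hellinger_map(hist):
    """Explicit feature map for the Hellinger kernel.

    After L1 normalisation, taking the element-wise square root makes a
    plain dot product equal to sum_i sqrt(x_i * y_i), so a linear SVM on
    the mapped histograms behaves like a Hellinger-kernel SVM.
    """
    total = sum(hist)
    if total <= 0:
        return [0.0] * len(hist)
    return [math.sqrt(h / total) for h in hist]
```

The mapped histograms can still be trained with a linear solver, which is why this kind of mapping combines well with large codebooks.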
Last, but most important, is context, which is usually just a combination with the detection score. I suggest the Deformable Part-Based Model, which has been released for evaluation by Felzenszwalb. Unfortunately, our detection algorithm cannot be made public right now because of copyright and licensing issues.
I am trying to release several versions of our algorithm to show each step, so this will be completed in the future.
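As a rough illustration of combining classification with a detection score, the sketch below uses a simple weighted sum per class. This is only one possible combination and is not our detection pipeline: the function name `fuse_scores` and the `weight` parameter are hypothetical, and both score lists are assumed to be on a comparable scale already.

```python
def fuse_scores(cls_scores, det_scores, weight=0.5):
    """Late fusion of per-class classification and detection scores.

    cls_scores: one classifier score per class for an image.
    det_scores: one detector score per class for the same image
                (assumption: e.g. the strongest detection of that class).
    weight:     share given to the detection score, between 0 and 1.
    """
    if len(cls_scores) != len(det_scores):
        raise ValueError("score lists must have one entry per class")
    return [
        (1.0 - weight) * c + weight * d
        for c, d in zip(cls_scores, det_scores)
    ]
```

In practice the weight would be chosen on a validation set.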