version 1

This page introduces a standard implementation of the Bag of Words model for image classification and object recognition.

The released software is a very early version of our algorithm: just a basic Bag of Words model, including:

SIFT feature extraction via VLFeat
hierarchical k-means clustering
vector-quantization coding (hard voting)
average pooling
linear SVM learning

It also includes a trick that gives a small improvement in performance: flip each training image, which doubles the training set.
With this, the performance is about 45%.
There are many techniques to improve the algorithm, including:
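To make the coding, pooling, and flipping steps concrete, here is a minimal sketch in plain Python. It is not the released code: the function names, the toy two-dimensional descriptors, and the choice of a left-right flip are assumptions made for illustration, and real SIFT descriptors would be 128-dimensional and extracted with VLFeat.

```python
import math

def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to `descriptor` (squared Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for index, codeword in enumerate(codebook):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def bow_histogram(descriptors, codebook):
    """Hard voting (one vote per descriptor) followed by average pooling."""
    histogram = [0.0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_codeword(descriptor, codebook)] += 1.0
    count = max(len(descriptors), 1)  # avoid dividing by zero for empty images
    return [votes / count for votes in histogram]

def flip_horizontal(image):
    """Mirror an image stored as a list of pixel rows (assumed left-right flip)."""
    return [list(reversed(row)) for row in image]

def augment_with_flips(images, labels):
    """Double the training set by appending a flipped copy of every image."""
    flipped = [flip_horizontal(image) for image in images]
    return images + flipped, labels + labels

# Toy example: a 2-word codebook and four 2-D descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.0, 0.8], [0.2, 0.1]]
print(bow_histogram(descriptors, codebook))  # [0.5, 0.5]
```

In a full pipeline the flipped images would go through feature extraction and coding like any other training image, and the pooled histograms would be passed to the linear SVM.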

combinations of more low-level features: HOG, LBP, color-HOG/SIFT/LBP, etc.
large-scale codebooks, up to 65536 or 131072 codewords
more complex coding algorithms: soft voting, LLC, LCC, super-vector, Fisher kernel, etc.
nonlinear kernel transformation, additive kernel mapping and embedding, multi-kernel learning
dimensionality reduction
large-scale learning algorithms, such as LIBLINEAR
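As an illustration of one item on this list, soft voting can replace hard voting by letting each descriptor spread its vote over all codewords according to distance. The sketch below is one common formulation (exponentially decaying weights, normalized to sum to one) and not necessarily the variant we use; the function names and the smoothing parameter `beta` are assumptions.

```python
import math

def soft_vote(descriptor, codebook, beta=1.0):
    """Weights over codewords proportional to exp(-beta * squared distance)."""
    sq_dists = [
        sum((d - c) ** 2 for d, c in zip(descriptor, codeword))
        for codeword in codebook
    ]
    nearest = min(sq_dists)
    # Subtracting the smallest distance keeps exp() numerically stable
    # and does not change the normalized weights.
    weights = [math.exp(-beta * (dist - nearest)) for dist in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def soft_bow_histogram(descriptors, codebook, beta=1.0):
    """Average-pool the soft votes of all descriptors in an image."""
    pooled = [0.0] * len(codebook)
    for descriptor in descriptors:
        for index, weight in enumerate(soft_vote(descriptor, codebook, beta)):
            pooled[index] += weight
    count = max(len(descriptors), 1)
    return [value / count for value in pooled]
```

A large `beta` pushes almost all of the weight onto the nearest codeword, so the result approaches hard voting; a small `beta` spreads the vote more evenly.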

The last, but most important, one is context, which is usually just a combination with the detection score. I suggest the Deformable Part-Based Model, which has been released for evaluation by Felzenszwalb. Unfortunately, our detection algorithm cannot be made public right now due to copyright and license issues.
I'm trying to release several versions of our algorithm to show each step, so this page will be completed in the future.
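One simple way to combine a classification score with detection scores is a weighted sum with the strongest detection in the image. The sketch below shows only that idea; it is not our method, and the function name, the `weight` value, and the fallback of 0.0 when there are no detections are all hypothetical choices.

```python
def combine_scores(classification_score, detection_scores, weight=0.5):
    """Add the best detection score, scaled by `weight`, to the classifier score.

    `weight` is a hypothetical mixing parameter that would normally be tuned
    on validation data; images without detections contribute 0.0 (an assumption).
    """
    best_detection = max(detection_scores, default=0.0)
    return classification_score + weight * best_detection
```

For example, `combine_scores(1.0, [0.2, 0.8])` adds half of the best detection score (0.8) to the classification score.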

Image Classification and Object Recognition