Driven by big data and deep convolutional neural networks (CNNs), the performance of face recognition is
becoming comparable to that of humans. Using private large scale training datasets, several groups have achieved very
high accuracy on LFW, ranging from 97% to 99%. While there are many open source implementations of CNNs, no
large scale face dataset is publicly available. The current situation in the field of face recognition
is that data matters more than algorithms. To solve this problem, we propose a semi-automatic
way to collect face images from the Internet and build a large scale dataset containing 10,575 subjects
and 494,414 images, called CASIA-WebFace. To the best of our knowledge, this dataset ranks second in size
in the literature, smaller only than Facebook's private dataset (SFC). We encourage researchers developing data-hungry
methods to train on this dataset and report performance on LFW.
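LFW results such as the 97% to 99% figures above are usually verification accuracies over the 6,000 image pairs of the standard protocol, averaged over 10 folds. The sketch below shows one common way to compute that number from learned features; it is not the authors' evaluation code. It assumes features have already been extracted for both images of every pair, that pairs are ordered as in the official pairs list (so contiguous splits correspond to the 10 folds), and that cosine similarity with a threshold tuned on the nine remaining folds is used. The function names are hypothetical.

```python
import numpy as np

def cosine_scores(feats_a, feats_b):
    """Cosine similarity between row i of feats_a and row i of feats_b."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

def best_threshold(scores, labels):
    """Pick the observed score that maximises accuracy when used as the threshold."""
    candidates = np.unique(scores)
    accs = [np.mean((scores >= t) == labels) for t in candidates]
    return candidates[int(np.argmax(accs))]

def lfw_accuracy(scores, labels, n_folds=10):
    """Mean and std of per-fold accuracy; each fold's threshold is tuned on the other folds."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)  # True = same person
    folds = np.array_split(np.arange(len(scores)), n_folds)
    fold_accs = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        t = best_threshold(scores[train_idx], labels[train_idx])
        fold_accs.append(np.mean((scores[test_idx] >= t) == labels[test_idx]))
    return float(np.mean(fold_accs)), float(np.std(fold_accs))
```

In practice one would call `cosine_scores` on the features of the first and second image of each pair, then pass the scores and the same/different labels to `lfw_accuracy`.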
The statistics of the proposed CASIA-WebFace dataset are shown in Table 1. Apart from Facebook's SFC dataset,
CASIA-WebFace is the largest of the datasets listed. Because of user privacy concerns, SFC may never be released
to the research community. The features of Microsoft's WDRef dataset have been publicly available since 2012, but
features alone are too inflexible for advanced research. Among the datasets listed in the table, CASIA-WebFace+LFW is the
most suitable combination for large scale face recognition in the wild. If you feel that accuracy on LFW has
been saturated by current state-of-the-art methods, BLUFR is a more challenging protocol for reporting your results.
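BLUFR evaluates verification at low false accept rates rather than with a single accuracy figure. The sketch below illustrates the core quantity, the true accept rate (TAR, also called verification rate) at a fixed false accept rate (FAR). It is a simplified illustration, not the official BLUFR toolkit: it assumes you already have similarity scores for genuine (same person) and impostor (different person) pairs, and the function name and the default FAR of 0.1% are illustrative choices.

```python
import numpy as np

def tar_at_far(genuine, impostor, far=1e-3):
    """True accept rate at a target false accept rate.

    A pair is accepted when its score is strictly greater than the threshold.
    The threshold is the (k+1)-th highest impostor score, with k = floor(far * N),
    so at most k impostor pairs are accepted (exactly k when there are no ties).
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.sort(np.asarray(impostor, dtype=float))[::-1]  # descending order
    k = int(np.floor(far * len(impostor)))
    # If every impostor may be accepted, accept everything.
    threshold = impostor[k] if k < len(impostor) else -np.inf
    tar = float(np.mean(genuine > threshold))
    return tar, float(threshold)
```

Picking the threshold directly from the sorted impostor scores avoids sweeping a grid of thresholds and guarantees the FAR constraint is never exceeded on the given impostor set.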
Table 1. Information about CASIA-WebFace and comparison with other large
scale face datasets.
||Public (feature only)
||Public (partially annotated)
Publication and Results:
To illustrate the quality of CASIA-WebFace, we train a deep CNN on it and compare its accuracy with state-of-the-art methods such as
DeepFace and DeepID2. Please refer to the following technical report for details.
♦ Dong Yi, Zhen Lei, Shengcai Liao and Stan Z. Li, "Learning Face Representation from Scratch".
arXiv preprint arXiv:1411.7923, 2014.
The above reference should be cited in all documents and papers that report experimental results based on the CASIA WebFace database.
To apply for the database, please follow the steps below:
- Download and print the document "Agreement for using CASIA WebFace database"
- Sign the agreement (the agreement must be signed by the director or a delegate of the university department; applications from individuals are not accepted)
- Send the agreement to firstname.lastname@example.org
- If your application is approved, check your email after one day for a login account
and password for our website
- Download the CASIA WebFace database
from our website with the authorized account within 48
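After downloading, a quick sanity check is to count subjects and images and compare them with the published figures (10,575 subjects and 494,414 images). The sketch below assumes the archive unpacks into one sub-directory per subject containing that subject's image files; this layout, the accepted file extensions, and the function name are assumptions, so adjust them to the actual package.

```python
from pathlib import Path

# Assumed image file extensions; adjust if the package uses others.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def dataset_stats(root):
    """Count subjects and images, assuming one sub-directory per subject."""
    counts = {}
    for subject_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Count only regular files whose extension looks like an image.
        counts[subject_dir.name] = sum(
            1 for f in subject_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTS
        )
    n_subjects = len(counts)
    n_images = sum(counts.values())
    return n_subjects, n_images, counts
```

For a complete copy, `dataset_stats` called on the (hypothetical) unpacked root directory should report 10,575 subjects and 494,414 images, i.e. about 46.75 images per subject on average; the per-subject `counts` dictionary also makes it easy to spot empty or truncated folders.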
Copyright Note and Contacts:
The database is released for research and educational purposes. We hold no liability for any undesirable consequences of
using the database. All rights of the CASIA WebFace database are reserved.
LFW, http://vis-www.cs.umass.edu/lfw/
D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun. "Bayesian face revisited: A joint formulation". In ECCV 2012,
pages 566–579. Springer, 2012.
Y. Sun, X. Wang, and X. Tang. "Deep learning face representation by joint identification-verification". arXiv preprint
Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. "DeepFace: Closing the gap to human-level performance in face verification".
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1701–1708. IEEE, 2014.
CARC, http://bcsiriuschen.github.io/CARC/