Welcome to Multi-Attribute Labelled Faces (MALF), a large dataset designed for fine-grained evaluation of face detection in the wild. This website provides:

  • descriptions of the dataset, annotations and evaluation rules;
  • how to download the dataset for evaluation;
  • how to submit your results to this website for others to compare with;
  • performance comparison of all algorithms.

    Descriptions


    The dataset contains 5,250 images with 11,931 annotated faces collected from the Internet.


    Each face contains the following annotations:

  • square bounding box;
  • pose deformation level of yaw, pitch and roll (small, medium, large);
  • 'ignore' flag for faces that are smaller than 20x20 pixels or extremely difficult to recognize (838 faces in total, accounting for ~7%);
  • other facial attributes: gender (female, male, unknown), isWearingGlasses, isOccluded and isExaggeratedExpression.

    For more details about the dataset and its annotation statistics, please refer to our evaluation paper.

    Go to examples for a quick look at the dataset.
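The per-face annotation fields listed above can be modeled in memory as follows. This is a minimal Python sketch; the class and field names are illustrative assumptions, not the official annotation file format:

```python
from dataclasses import dataclass

# Hypothetical in-memory representation of one MALF face annotation.
# Field names are illustrative; consult the downloaded annotation
# files for the actual on-disk format.
@dataclass
class FaceAnnotation:
    x: int                 # top-left corner of the square bounding box
    y: int
    size: int              # side length (bounding boxes are square)
    yaw: str               # pose deformation level: 'small' | 'medium' | 'large'
    pitch: str
    roll: str
    ignore: bool           # True for faces < 20x20 px or extremely hard to recognize
    gender: str            # 'female' | 'male' | 'unknown'
    wearing_glasses: bool
    occluded: bool
    exaggerated_expression: bool

face = FaceAnnotation(x=120, y=80, size=64,
                      yaw='small', pitch='medium', roll='small',
                      ignore=False, gender='female',
                      wearing_glasses=False, occluded=True,
                      exaggerated_expression=False)
```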


    The dataset has been split into two parts: 5,000 test images for evaluation, and 250 example images with annotations for fine-tuning the algorithm and/or adjusting the output bounding-box style.

    Evaluation procedures

  • Download the test images (and example images if you need);
  • Run your algorithm on the whole test set, and apply all post-processing (such as NMS and bounding-box adjustment) properly;
  • Submit the detection results in correct format;
  • Receive the evaluation result within two working days; it is then your decision whether to publish your results on the Results page;
  • (Optional) Use the curve data of various algorithms on the Results page in your work.
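Step 2 above mentions non-maximum suppression (NMS) as part of post-processing. As a reference point, here is a minimal greedy NMS sketch with boxes as `(x1, y1, x2, y2)` tuples; this is illustrative only, not the required post-processing code, and the 0.5 suppression threshold is an assumption you should tune for your detector:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(detections, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    detections: list of (box, score) tuples. Keeps the highest-scoring
    box and suppresses any remaining box overlapping it above iou_thresh."""
    dets = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```

For example, two heavily overlapping detections of the same face collapse to the higher-scoring one, while a distinct face elsewhere in the image is kept.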

    Performance measurement

  • Detection results are evaluated following the same rules as the PASCAL VOC Challenge, with an IoU threshold of 0.5;
  • Each algorithm is evaluated in the following aspects: performance on the whole test set, single-attribute-specific performance, and performance on two pre-defined subsets, 'easy' and 'hard';
  • Performance is measured via TPR-FPPI curve and mean-recall rate.

    For more details about the evaluation protocol and rules, please refer to our evaluation paper.
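As an illustration of the matching rule described above (not the official evaluation code), the following sketch applies PASCAL-VOC-style greedy matching at IoU 0.5 and reports a single TPR/FPPI operating point:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def evaluate(per_image, iou_thresh=0.5):
    """PASCAL-VOC-style greedy matching at a fixed IoU threshold.

    per_image: list of (detections, ground_truths) pairs, where detections
    are (box, score) tuples and ground_truths are plain boxes. Each
    detection, taken in descending score order, may claim at most one
    still-unmatched ground truth with IoU >= iou_thresh.
    Returns (TPR, FPPI) over the whole image set."""
    tp = fp = n_gt = 0
    for dets, gts in per_image:
        n_gt += len(gts)
        matched = [False] * len(gts)
        for box, _score in sorted(dets, key=lambda d: d[1], reverse=True):
            best, best_iou = -1, iou_thresh
            for i, g in enumerate(gts):
                ov = iou(box, g)
                if not matched[i] and ov >= best_iou:
                    best, best_iou = i, ov
            if best >= 0:
                matched[best] = True   # true positive
                tp += 1
            else:
                fp += 1                # unmatched detection: false positive
    return tp / n_gt if n_gt else 0.0, fp / len(per_image)
```

A full TPR-FPPI curve would repeat this computation while sweeping a score threshold over the detections; this sketch shows only one point of that curve.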


    If you use our dataset or evaluation results in your work, please cite our evaluation paper:

    Bin Yang*, Junjie Yan*, Zhen Lei and Stan Z. Li.
    Fine-grained Evaluation on Face Detection in the Wild.
    Proceedings of the 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2015.

    BibTeX entry:

    @inproceedings{yang2015fine,
      title={Fine-grained Evaluation on Face Detection in the Wild},
      author={Yang, Bin and Yan, Junjie and Lei, Zhen and Li, Stan Z.},
      booktitle={Automatic Face and Gesture Recognition (FG), 11th IEEE International Conference and Workshops on},
      year={2015}
    }


    Team

    Junjie Yan [page]
    Bin Yang [page]
    Zhen Lei [page]
    Stan Z. Li (Advisor) [page]


    News

    29/9/15: We added fine-grained results for different views; check them out on the Results page!

    13/5/15: MALF is now a public benchmark. New submissions are always welcome!

    13/3/15: Curves data available!

    29/1/15: FG2015 Evaluation finished! 21 state-of-the-art algorithms are evaluated!

    10/9/14: Dataset upgraded!

    13/4/14: This face detection evaluation is part of the evaluations in FG2015.

    Contact us

    Zhen Lei: zlei[at]nlpr.ia.ac.cn
    Bin Yang: yb.derek[at]gmail.com
    Junjie Yan: yanjjie[at]gmail.com