jimbojumbo/face-dataset-cleaner

face-dataset-cleaner

An implementation for cleaning large-scale public face datasets.

This is an unofficial implementation based on the paper "A Community Detection Approach to Cleaning Extremely Large Face Database".

To run the experiment, first prepare embedding files for your face dataset and for LFW using a pre-trained face recognition network.

Use lfw_far_thresholding.py to determine, from the LFW embeddings, the similarity threshold used when comparing different face images.
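As a rough illustration of this step, the sketch below picks a similarity threshold from impostor (different-identity) pair scores so that a target false accept rate is not exceeded. It is not taken from lfw_far_thresholding.py: the function names `cosine_similarity` and `threshold_at_far`, the use of cosine similarity, and the "accept when score > threshold" rule are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def threshold_at_far(impostor_scores, target_far):
    """Return a threshold t such that the fraction of impostor scores
    strictly greater than t is at most target_far.

    Assumption: a pair is accepted as "same person" when score > t.
    """
    scores = sorted(impostor_scores, reverse=True)
    # Number of false accepts we are allowed to tolerate.
    allowed = int(target_far * len(scores))
    if allowed >= len(scores):
        # Every impostor pair may be accepted, so any threshold works.
        return float("-inf")
    # At most `allowed` scores lie strictly above this value.
    return scores[allowed]
```

In practice the impostor scores would come from LFW pairs of different identities, for example `cosine_similarity(emb_a, emb_b)` for each such pair, with a small target rate chosen to suit the dataset.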

Then run dataset_adjacency_build.py to save the image-pair similarity information in CSV files. dataset_cleaner.py then reads these files to build the graphs and perform small-community cleaning.
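The two stages described above can be sketched as follows. This is a simplified stand-in, not the code in dataset_adjacency_build.py or dataset_cleaner.py: the CSV layout (image A, image B, similarity), the helper names, and the `min_size` parameter are hypothetical, the embeddings are assumed to be L2-normalised so that a dot product equals cosine similarity, and plain connected components (via union-find) are used in place of the community detection algorithm from the paper.

```python
import csv
from collections import defaultdict


def build_edges(embeddings, threshold):
    """Return (name_a, name_b, similarity) for every pair above the threshold.

    embeddings: dict mapping image name -> L2-normalised list of floats.
    """
    names = sorted(embeddings)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = sum(x * y for x, y in zip(embeddings[a], embeddings[b]))
            if sim > threshold:
                edges.append((a, b, sim))
    return edges


def save_edges(edges, csv_path):
    """Write one edge per row: image A, image B, similarity."""
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(edges)


def load_edges(csv_path):
    """Read back the rows written by save_edges."""
    with open(csv_path, newline="") as f:
        return [(a, b, float(s)) for a, b, s in csv.reader(f)]


def keep_large_groups(names, edges, min_size):
    """Group images by connected component and drop groups below min_size."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the root, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, _ in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in names:
        groups[find(n)].append(n)
    return sorted(n for g in groups.values() if len(g) >= min_size for n in g)
```

The all-pairs loop in `build_edges` is quadratic in the number of images, so for a dataset of realistic size it would likely be run per identity folder rather than over the whole dataset at once.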

A small tool is also provided to move the original images into a separate folder according to the clean data list.
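A minimal sketch of such a tool is shown below. It assumes the clean data list is a text file with one image path per line, relative to the dataset root; that format, the function name `move_clean_images`, and its parameters are assumptions for illustration and may differ from the tool in this repository.

```python
import shutil
from pathlib import Path


def move_clean_images(list_path, src_root, dst_root):
    """Move every image named in the clean list from src_root to dst_root.

    Returns the relative paths that were actually moved. Entries that do
    not exist under src_root are skipped.
    """
    moved = []
    for line in Path(list_path).read_text().splitlines():
        rel = line.strip()
        if not rel:
            continue  # ignore blank lines
        src = Path(src_root) / rel
        if not src.is_file():
            continue  # listed image is missing from the source folder
        dst = Path(dst_root) / rel
        # Recreate the identity sub-folder structure under dst_root.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved.append(rel)
    return moved
```

Replacing `shutil.move` with `shutil.copy2` would leave the original dataset untouched, which may be preferable while the cleaning result is still being checked.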

A first version of the cleaned VGGFace2 training and testing image lists can be downloaded from Google Drive.
