[vip-help] Inception Score parameters doubt

Guo, Jiaqi guo498 at purdue.edu
Mon Jan 29 13:56:27 EST 2024


Hi Kris,

In addition to Yue's comment, the PyTorch implementation of FID with inception_v3 requires the input to be "mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]." So yes, you need to transform the input image to the range [0, 1] and then normalize using the ImageNet mean and std.
In fact, this is inherently consistent with the TensorFlow implementation, which requires the input to be in the range [-1, 1]: if you look into the PyTorch implementation (https://github.com/pytorch/vision/blob/3c254fb7af5f8af252c24e89949c54a3461ff0be/torchvision/models/inception.py#L191), you will find it first performs a de-normalization step using the ImageNet parameters, which maps the input back to [-1, 1].
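To make that consistency concrete, here is a small pure-Python sketch (no torch needed) of the two steps: the ImageNet normalization you apply as preprocessing, and the rescaling inception_v3 does internally when transform_input=True (mirroring the arithmetic in torchvision's _transform_input). Composed, they send any pixel p in [0, 1] to 2*p - 1, i.e. the TensorFlow [-1, 1] convention:

```python
# ImageNet channel statistics used for torchvision-style normalization.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def imagenet_normalize(pixel, ch):
    """Normalize a pixel in [0, 1], as transforms.Normalize would."""
    return (pixel - IMAGENET_MEAN[ch]) / IMAGENET_STD[ch]

def inception_transform_input(x, ch):
    """Per-channel rescaling done inside inception_v3 when
    transform_input=True (see torchvision/models/inception.py)."""
    return x * (IMAGENET_STD[ch] / 0.5) + (IMAGENET_MEAN[ch] - 0.5) / 0.5

# Composing the two maps any pixel p in [0, 1] to 2*p - 1, landing in [-1, 1].
for ch in range(3):
    for p in [0.0, 0.25, 0.8, 1.0]:
        out = inception_transform_input(imagenet_normalize(p, ch), ch)
        assert abs(out - (2 * p - 1)) < 1e-9
```

So the ImageNet normalization is not "lost": the model undoes it internally and ends up with exactly the [-1, 1] input the TensorFlow version expects.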

Best,
Jiaqi


________________________________
From: vip-help <vip-help-bounces at ecn.purdue.edu> on behalf of Han, Yue <han380 at purdue.edu>
Sent: Monday, January 29, 2024 12:55 PM
To: Gurung, Kris <kgurung at purdue.edu>
Cc: vip-help at ecn.purdue.edu <vip-help at ecn.purdue.edu>
Subject: Re: [vip-help] Inception Score parameters doubt

Hi Kris,

Yes. FID is a metric that evaluates the quality and diversity of your generated images against your real training images in feature space, so the pre-trained inception_v3 here is used purely as a feature extractor. Since ImageNet is a diverse dataset containing a wide variety of images, it is very common to use inception_v3 pre-trained on ImageNet as the feature extractor, unless a paper specifically indicates it uses different pre-trained weights to calculate FID.

There is a paper comparing pre-trained vs. random weights for calculating FID that you might find interesting:
https://ieeexplore.ieee.org/document/9745214

The answer to the second question is yes as well: if you want to implement the standard FID calculation (which is the default in PyTorch, TensorFlow, etc. and can be compared against others' work), use weights pre-trained on ImageNet and normalize using the ImageNet statistics.

If you want an FID score specific to your own work, or want to explore whether it differs from the standard FID calculation, you can use your own statistics for the FID calculation.


Best,
*****
Yue Han
Ph.D. Candidate
Research Assistant
Video and Image Processing Laboratory (VIPER)
School of Electrical and Computer Engineering
Purdue University
465 Northwestern Avenue
West Lafayette, IN 47907-2035
U.S.A.
Telephone: +1 765 637 3729
email: han380 at purdue.edu
URL: https://lorenz.ecn.purdue.edu/~han380/
*****

On Jan 29, 2024, at 11:41, Gurung, Kris <kgurung at purdue.edu> wrote:

Hi,

I am trying to incorporate FID and Inception Score for quantitative evaluation of the generated pictures, and was wondering if the Inception_v3 model could be used for the CelebA dataset as well, since it was initially trained on the ImageNet dataset. If it can be used, what mean and standard deviation would be used for normalization of the images? I could only find the following normalization parameters for ImageNet.
<image.png>
Any advice regarding better evaluation metrics, if any, would also be greatly appreciated!

Regards,
Kris
--
vip-help mailing list
vip-help at ecn.purdue.edu<mailto:vip-help at ecn.purdue.edu>
https://engineering.purdue.edu/ECN/mailman/listinfo/vip-help
