
About input size #21

Closed
chenhanch opened this issue Jun 27, 2024 · 4 comments

Comments

@chenhanch

This is great work, and it is a valuable contribution to the development of the IML community.

However, I have noticed some confusion regarding the input image sizes used for different methods in this project. For instance, in TruFor, the input image is randomly cropped to 512 × 512 during training, while the entire image, regardless of size, is used during testing. This is why the experimental results in Table 3 differ from those in the original paper.
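To make the two input regimes described above concrete, here is a minimal sketch in plain Python. It only computes crop boxes and does not touch real image data; the helper names (`random_crop_box`, `full_image_box`, `pixel_count`) and the 1920 × 1080 example size are hypothetical and are not taken from TruFor or from this project.

```python
import random


def random_crop_box(width, height, size=512, rng=random):
    """Return (left, top, right, bottom) for a random size x size crop.

    Hypothetical helper for illustration; assumes the image is at least
    `size` pixels along each side.
    """
    if width < size or height < size:
        raise ValueError("image is smaller than the crop size")
    left = rng.randint(0, width - size)   # randint is inclusive on both ends
    top = rng.randint(0, height - size)
    return (left, top, left + size, top + size)


def full_image_box(width, height):
    """Return the box covering the whole image (no crop, no resize)."""
    return (0, 0, width, height)


def pixel_count(box):
    """Number of pixels the model sees for a given box."""
    left, top, right, bottom = box
    return (right - left) * (bottom - top)


rng = random.Random(0)  # seeded only to make the sketch repeatable
train_box = random_crop_box(1920, 1080, 512, rng)  # training-style input
test_box = full_image_box(1920, 1080)              # testing-style input

print("train:", train_box, pixel_count(train_box))
print("test: ", test_box, pixel_count(test_box))
```

Under these assumptions the training input always has 512 × 512 pixels, while the test input scales with the original image, so the two regimes present the model with very different amounts of context.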

Additionally, it seems that there is no standardization in current research on image manipulation detection regarding whether input images should be resized and what size they should be resized to. As a result, most comparisons are unfair, and the experimental conclusions are not reliable. This is a matter that deserves serious attention.

@SunnyHaze
Contributor

SunnyHaze commented Jun 28, 2024

Hi, Han Chen!

Thank you for recognizing and engaging with our work!


> However, I have noticed that there is some confusion regarding the input image sizes for different methods in this project. For instance, in TruFor, the input image is randomly cropped to 512 × 512 during training, while the entire image, regardless of size, is used during testing. This is why the experimental results in Table 3 are different from those in the original paper.

This is a good question. In our experiments with TruFor, we indeed resized images to 512 × 512 for both training and testing, which differs from the original paper. We chose this approach for computational efficiency: although TruFor supports multi-resolution input, its FLOPs increase significantly with larger image sizes.

| Method | Paper | Infer. Time (sec) / Image | Params. (M) | 512×512 FLOPs (G) | 1024×1024 FLOPs (G) |
| --- | --- | --- | --- | --- | --- |
| MVSS-Net | ICCV21 & TPAMI22 | 2.929 | 147 | 167 | 683 |
| PSCC-Net | TCSVT22 | 0.072 | 3 | 120 | 416 |
| HiFi-Net | CVPR23 | 1.512 | 7 | 404 | 3470 |
| TruFor | CVPR23 | 1.231 | 68 | 231 | 1016 |
| IML-ViT | ArXiv | 0.094 | 91 | 136 | 576 |
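As a quick check on how cost scales with resolution, the FLOPs columns of the table above can be turned into per-model ratios. The sketch below uses only the numbers from that table; the dictionary layout and the `scaling_ratio` helper are illustrative names, not part of the project's code.

```python
# (512x512 GFLOPs, 1024x1024 GFLOPs), copied from the table above.
flops = {
    "MVSS-Net": (167, 683),
    "PSCC-Net": (120, 416),
    "HiFi-Net": (404, 3470),
    "TruFor": (231, 1016),
    "IML-ViT": (136, 576),
}


def scaling_ratio(flops_512, flops_1024):
    """How many times more FLOPs a 1024x1024 input costs than a 512x512 one."""
    return flops_1024 / flops_512


ratios = {name: scaling_ratio(*pair) for name, pair in flops.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Going from 512 × 512 to 1024 × 1024 quadruples the pixel count. By these numbers TruFor's cost grows by about 4.4×, slightly more than the pixel count, and HiFi-Net's by more than 8×.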

> Additionally, it seems that there is no standardization in current research on image manipulation detection regarding whether input images should be resized and what size they should be resized to. As a result, most comparisons are unfair, and the experimental conclusions are not reliable. This is a matter that deserves serious attention.

Regarding your second point, the lack of standardization in image resizing is indeed a chaotic aspect of the current research landscape. Highlighting and addressing this issue is one of the primary reasons we wrote this paper. By explicitly stating the resolutions used in different studies, we hope to bring attention to this problem and help the field achieve more reliable and consistent conclusions.

@Knightzjz
Contributor

Thanks for the information, Han Chen. We have recently received similar questions and suggestions. Please refer to Issue #22 for our detailed explanation.

@chchshshhh

Thank you very much for your outstanding contributions in the field of image manipulation detection. Regarding the input image size, would you consider using a unified experimental setup across all models by resizing the images to 512×512 for the experiments? I believe this would be the fairest way to make comparisons.

@SunnyHaze
Contributor

> Thank you very much for your outstanding contributions in the field of image manipulation detection. Regarding the input image size, would you consider using a unified experimental setup across all models by resizing the images to 512×512 for the experiments? I believe this would be the fairest way to make comparisons.

Thanks for your interest in our project. However, each network has its own appropriate resolution and design, and their computational complexity is within the same order of magnitude. Therefore, our strategy is to "stay true to the original work", implementing each model as closely as possible to the design presented in its original paper.
