Intraday.my - identification of cover duplication



Overall:

On this site, every image with a srcset has a URL to its original version, plus URLs with concrete sizes at the end. Since the cover doesn't always use the same size version as its duplicate in the article, I first resolve every image to its original version with a regex (this is also a must for improving quality). Then, if there is an image in the body with the EXACT SAME URL as the cover, I remove the cover, because

"The cover must not be set if: The cover image is duplicated in the article".

Two images with the exact same URL can't be different.
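
The cover rule above can be sketched in a few lines of Python. This is only an illustration of the logic, not the template itself: the function name choose_cover and its parameters are hypothetical, and it assumes every URL has already been resolved to its original version.

```python
from typing import Iterable, Optional


def choose_cover(cover_url: Optional[str], body_urls: Iterable[str]) -> Optional[str]:
    """Return the cover URL, or None when the cover must not be set.

    Assumption: all URLs were already resolved to their original versions,
    so an exact string match means the very same image.
    """
    if cover_url is None:
        return None
    # Exact comparison only: a differing extension or name keeps the cover.
    return None if cover_url in set(body_urls) else cover_url
```

Comparing exact strings keeps the check conservative: a cover is only dropped when the body provably contains the same file.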


Regex:

The template replaces the size-mark only when it sits directly before the file extension, and only for images that have a srcset. If an img has no size-mark at the end of its URL, or has no srcset, nothing is replaced. If it has a srcset and a size-mark at the end, it must have an original version too, so the replacement cannot lead to an invalid URL.
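
A minimal Python sketch of this replacement rule follows. The pattern and the names SIZE_MARK and original_url are my own illustration, not the regex used in the template, and it assumes the URL ends with the file extension (no query string).

```python
import re

# Hypothetical pattern: a "-WIDTHxHEIGHT" size-mark directly before the
# file extension at the very end of the URL.
SIZE_MARK = re.compile(r"-\d+x\d+(?=\.[A-Za-z0-9]+$)")


def original_url(src: str, has_srcset: bool) -> str:
    """Return the original-version URL of an image.

    Images without a srcset are left untouched, because for them a
    trailing size-like mark may be part of the real file name.
    """
    if not has_srcset:
        return src
    # Only the mark right before the extension is removed; size-like
    # text elsewhere in the file name is kept.
    return SIZE_MARK.sub("", src, count=1)
```

The lookahead anchors the match to the end of the URL, which is why a size-like string in the middle of a file name is never touched.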


Potentially difficult cases:

Image that has a size-like mark inside its file name, in addition to the size-mark before the file extension:

https://intraday.my/wp-content/uploads/2019/01/intraday-scrape-1200x628px-750x430.png

"1200x628" doesn't get replaced, only the dynamic "750x430" before .png:

https://intraday.my/wp-content/uploads/2019/01/intraday-scrape-1200x628px.png

So it can't cause an invalid URL.


Image whose file name ends with something that only looks like a dynamic size-mark:

https://intraday.my/wp-content/uploads/2018/06/Were-hiring-a-Content-Creator-Follow-the-link-in-our-profile-to-learn-more.-contentmarketing-writer-640x260.jpg

Deleting it would lead to an invalid URL:

https://intraday.my/wp-content/uploads/2018/06/Were-hiring-a-Content-Creator-Follow-the-link-in-our-profile-to-learn-more.-contentmarketing-writer.jpg

But since it doesn't have an original version, it doesn't have a srcset, so it won't be replaced.

No srcset


Different images having nearly identical image URLs, where only the file-extension is different:

https://intraday.my/wp-content/uploads/2018/10/youtube.jpg

https://intraday.my/wp-content/uploads/2018/10/youtube-300x205.png

The second one has a srcset and a size-mark, so the template gets its original size:

https://intraday.my/wp-content/uploads/2018/10/youtube.png

Only the size-mark gets removed; the file extension still differs from the other one, so the images aren't the same and the cover doesn't get removed. See:

https://instantview.telegram.org/contest/intraday.my/template269/?url=https%3A%2F%2Fintraday.my%2Fyoutube-pula-tumbang-tidak-dapat-diakses-di-seluruh-dunia%2F


Result: works reliably

Another example article, besides the Analisis ones, with a duplicated image:

https://intraday.my/kisah-trader-misteri-jepun-jana-keuntungan-34-juta-ketika-ramai-orang-panik/

In my template:

https://instantview.telegram.org/contest/intraday.my/template269/?url=https%3A%2F%2Fintraday.my%2Fkisah-trader-misteri-jepun-jana-keuntungan-34-juta-ketika-ramai-orang-panik%2F


