An answer to an issue on alwatanvoice.com

Amir

As I said in this issue

https://instantview.telegram.org/contest/alwatanvoice.com/template14/issue2/

alwatanvoice structures its articles very unpredictably. Let me show some cases.

The description candidate is in the 4th div//span.

Above link: https://pulpit.alwatanvoice.com/articles/2019/03/12/487230.html


The description candidate is in the body div after the first two <br> tags, along with the site name "donia al watan", and all of the article text is in this //div too.


Above link: https://english.alwatanvoice.com/news/2017/08/12/1074939.html


All of the text is in the body //div, so there is no reliable way to catch the first paragraph.

Above link: https://www.alwatanvoice.com/arabic/news/2019/03/11/1224229.html

All of the text is in the body //div. The first line repeats the title, the next line has a date that differs from the published date, and the line after that has the author's name. Only then comes a description candidate, but there is no reliable way to separate it from the rest of the article.


Above link: https://www.alwatanvoice.com/arabic/news/2019/03/06/1222872.html


These are just a few examples. There are a lot of different cases, and I don't think there is a reliable way to set a correct description. For now, the meta description seems to be the best choice even if it duplicates the title; maybe someday they will decide to use a better one.
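To make the fallback idea concrete, here is a minimal Python sketch (not Instant View rules) of picking a description from the page metadata only. The class and function names are hypothetical, and preferring og:description over the plain meta description is my assumption, not something the site or the checklist prescribes.

```python
from html.parser import HTMLParser


class MetaDescription(HTMLParser):
    """Hypothetical helper: remembers the first non-empty description metas."""

    def __init__(self):
        super().__init__()
        self.og = None    # <meta property="og:description" content="...">
        self.meta = None  # <meta name="description" content="...">

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = (attributes.get("content") or "").strip()
        if not content:
            return
        if attributes.get("property") == "og:description" and self.og is None:
            self.og = content
        elif attributes.get("name") == "description" and self.meta is None:
            self.meta = content


def pick_description(html):
    """Return the OpenGraph description, else the meta description, else None."""
    parser = MetaDescription()
    parser.feed(html)
    parser.close()
    return parser.og or parser.meta
```

The point of the sketch is that the body //div is never consulted, so the unpredictable article layouts shown above cannot leak the whole text into the description.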


Now let's talk about your template #16.

The checklist says:

  • If a short description exists in the source, it must be used for the link preview (OpenGraph descriptions, lead sections, etc.).

Descriptions are short and identifiable in the source, but your template sets all of the article text as the description, which I think is wrong.


One more point about your solution: I actually know a little Arabic, and the meaning of words can change very easily in Arabic.

An example

Here the first line is "ولاء العاني", but the description reads "ولاء العانيهل باتت ", which is totally meaningless. Your template joins words together in an RTL language, which is absolutely wrong.
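The joining problem can be reproduced with a small Python sketch (again, not Instant View rules). I am assuming the two lines are separated by a <br> in the source; the markup in the usage example and the names `TextCollector` and `flatten` are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers text nodes, emitting a separator at <br>."""

    def __init__(self, br_separator):
        super().__init__()
        self.br_separator = br_separator
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <br/> via handle_startendtag.
        if tag == "br":
            self.parts.append(self.br_separator)

    def handle_data(self, data):
        self.parts.append(data)


def flatten(html, br_separator=" "):
    """Flatten markup to one line, collapsing runs of whitespace."""
    collector = TextCollector(br_separator)
    collector.feed(html)
    collector.close()
    return " ".join("".join(collector.parts).split())


sample = "<div>ولاء العاني<br>هل باتت</div>"  # assumed markup
glued = flatten(sample, br_separator="")  # words run together
spaced = flatten(sample)                  # line break kept as a space
```

With an empty separator the last word of the first line is glued to the first word of the next one, which is the broken description quoted above. Treating each line break as a space keeps the words apart.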