appeal

undrfined

This is an appeal for this issue: https://instantview.telegram.org/contest/mskagency.ru/template20/issue3/

tl;dr: it's possible to get the date working without hardcoding it. Since the date is completely missing on all pages from the current year, it makes no sense to decline issues like this one.

The problem here is that the website doesn't add the year to the date if the article was published in the current year.

So there's no date at all for ALL the pages from the current year in that template, which is totally unacceptable for a news website, isn't it? Even the checklist says that:

Also, if the article was published on the current day, the website will not display the month and day either.

But it is actually possible to get the date working!

It is possible to take the current year and day from the main page of the website using @inline:

This block with the current year is ALWAYS there, no matter what day or hour it is.

Another way is to use relative dates for @datetime. Here I explain both methods:

1. @datetime method

The website hides the year if the article is viewed in the same year it was posted, and hides the day/month if it is viewed on the same day it was posted. So if you check an article posted on 22 March on that same day, you'll see just the time, without the year or day/month.

We can use "0 years ago" to avoid that. @datetime will automatically select the current day, month and year if we add this string to the date:

@append("0 years ago"): $article/p[has-class("Date")]
@html_to_dom: $$
$dom: $@
# Swap day and month so @datetime can parse the date easily
@replace("\\s(\\d+)\\.(\\d+)\\s", " $2/$1 ")
# +3 because of Moscow time
@datetime(+3): $dom
published_date: $@
# 08:00 will become 22 Mar 2019, 05:00 (-3 because of timezones, but the date is the same as the published date of the article)
# 21.03 12:04 will become 21 Mar 2019, 09:04 (again -3 because of timezones)
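
To double-check what the @replace step above does, here is a plain Python sketch of the same substitution. Python is used only for illustration (this is not IV rules code), the helper name swap_day_month is made up, and the sample strings assume the Date paragraph text starts with whitespace, which the pattern needs in order to match:

```python
import re

# Same pattern as in the @replace rule: whitespace, day, dot, month, whitespace.
DAY_MONTH = re.compile(r"\s(\d+)\.(\d+)\s")


def swap_day_month(text: str) -> str:
    """Turn ' 21.03 ' into ' 03/21 ' (hypothetical helper, illustration only)."""
    return DAY_MONTH.sub(r" \2/\1 ", text)


# Day and month present: they get swapped.
print(repr(swap_day_month(" 21.03 12:04 0 years ago")))
# Time only: nothing to swap, the string is left alone.
print(repr(swap_day_month(" 08:00 0 years ago")))
```

In this sketch a full date such as " 13.07.2017 20:09" is also left untouched, because the second dot prevents the pattern from matching.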

2. @inline method

If the current year is not the same as the article's published year, the published date on the original website will include the year, day and month (the format will be like 13.07.2017 20:09).
If the current year is the same as the article's published year, the website will omit the year (like 28.02 17:48). In this case we'll take the year from the main page (since it's the same as the article's published year). So we'll add the year and the result will be (28.02 17:48 2019). And even if this page is cached again next year, in 2020, the website will then include the year in the original page (like 28.02.2019 17:48). So no problem here.
If the current day is the same as the article's published day, the website will omit the day and the month (like 01:16). In this case we'll repeat everything from the previous step, but also add the day and month. So our final date will become something like 21.03.2019 01:16. And of course on the next day the article's date will include the day and month, so again no problem here.
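
The three cases above boil down to filling in whatever the article page leaves out with the date taken from the main page. Here is a minimal Python sketch of that logic, only to illustrate the idea (the real template does this with IV rules; complete_date and its arguments are hypothetical names, and today stands for the date read from the main page):

```python
import re
from datetime import date, datetime


def complete_date(raw: str, today: date) -> datetime:
    """Rebuild a full timestamp from one of the three formats the site emits."""
    raw = raw.strip()

    # Case 1: older article, everything is present (13.07.2017 20:09).
    full = re.fullmatch(r"(\d{2})\.(\d{2})\.(\d{4}) (\d{2}):(\d{2})", raw)
    if full:
        day, month, year, hour, minute = map(int, full.groups())
        return datetime(year, month, day, hour, minute)

    # Case 2: same year, the year is omitted (28.02 17:48).
    no_year = re.fullmatch(r"(\d{2})\.(\d{2}) (\d{2}):(\d{2})", raw)
    if no_year:
        day, month, hour, minute = map(int, no_year.groups())
        return datetime(today.year, month, day, hour, minute)

    # Case 3: same day, only the time is left (01:16).
    time_only = re.fullmatch(r"(\d{2}):(\d{2})", raw)
    if time_only:
        hour, minute = map(int, time_only.groups())
        return datetime(today.year, today.month, today.day, hour, minute)

    raise ValueError(f"unexpected date format: {raw!r}")


main_page_date = date(2019, 3, 21)  # assumed value read from the main page
for sample in ("13.07.2017 20:09", "28.02 17:48", "01:16"):
    print(sample, "->", complete_date(sample, main_page_date))
```

Because the site itself starts printing the missing parts once the day or year changes, a later re-cache lands in one of the earlier cases and still produces the same timestamp.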

So it's definitely possible to get the date from there, and it works perfectly even with caching in mind. Here's how I ended up implementing it. Kind of shitty code, but hey, it works.

# Find published dates that lack the year or the day/month and tag them with a magic string
@replace("(\\d{2}\\.\\d{2}|^)\\s*(\\d{2}:\\d{2})", "UNKNOWN $2 $1."): $article/p[has-class("Date")]/text()
# If there's no year OR no day/month in the published date field, then:
@if( "$article/p[has-class(\"Date\")][contains(text(),\"UNKNOWN\")]" ) {
 # Inline the main page that contains info about current date
 @append(<iframe>, src, "http://mskagency.ru"): //head
 @inline: $@
 $inlined: $@
 # Get the first header with the current date
 h: $inlined//div[has-class("news-list-all")]//h3[1]
  
 # some clone magic
 @clone: $article/p[has-class("Date")]
 $cloned: $@
 # Check if there's day and month in our date
 @match("UNKNOWN \\d+:\\d+ \\."): $cloned
 $match_result: $@
  
 # If there's NO day and month, we'll take current day, month and year. The website removes day and month if the article was posted at the same day as viewed and hides year if article was posted in the same year as viewed. So here we have first case, no day and month:
 @if( $match_result ) {
  # remove clones
  @remove: $cloned
  # Append current day, month and year to our article published date.
  @append_to($article/p[has-class("Date")]): $h
  
  # Before parsing it was just 01:16, without day and month.
  # @debug: $article/p[has-class("Date")] will output something like that:
  # <p class="Date">UNKNOWN 01:16 .<h3 data-date="21.03.2019"><strong/> 21 марта 2019</h3></p>
  # after simplifying it will look like that:
  # <p class="Date">UNKNOWN 01:16 . 21 марта 2019</p>
  # and then @datetime will take care of it.
 }
  
 # If there's day and month but NO year we'll just take one from main page.
 @if_not( $match_result ) {
  # remove clones
  @remove: $cloned
  # get only the year, cause the day and month of article's published date is already known.
  @match("\\d+\\.\\d+\\.(\\d+)", 1): $h/@data-date
  # some black magic
  @append(@data-date): $h
   
  # append current year to the article.
  @append_to($article/p[has-class("Date")]): $@

  # Before parsing it was just 22.02 16:51, without the current year
  # @debug: $article/p[has-class("Date")] will output something like this:
  # <p class="Date">UNKNOWN 16:51 22.02.2019</p>
 }
  
 # Remove the unknown magic and parse the date, nothing interesting here actually
 @replace("UNKNOWN ", ""): $article/p[has-class("Date")]/text()
 @remove: $inlined
 @html_to_dom: $article/p[has-class("Date")]
 @datetime(0, "ru-RU", "HH:mm dd.MM.y"): $@
 published_date: $@
}
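# Dates that already include the year (like 13.07.2017 20:09) need no fixing and are parsed directly
@datetime(0, "ru-RU", "dd.MM.y HH:mm"): $article/p[has-class("Date")]
published_date: $@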
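
The first @replace in the rules above is what decides which dates get the UNKNOWN marker. Here is a small Python sketch to check that pattern against the three date formats. It assumes the IV regex engine treats this pattern the same way Python's re module does, and mark_incomplete is a made-up helper name:

```python
import re

# Same pattern as in the rules: either "dd.MM" or the start of the string,
# followed by optional whitespace and a "HH:mm" time.
INCOMPLETE = re.compile(r"(\d{2}\.\d{2}|^)\s*(\d{2}:\d{2})")


def mark_incomplete(text: str) -> str:
    """Tag dates that lack the year or the day/month (illustration only)."""
    return INCOMPLETE.sub(r"UNKNOWN \2 \1.", text)


print(mark_incomplete("28.02 17:48"))       # year missing
print(mark_incomplete("01:16"))             # day, month and year missing
print(mark_incomplete("13.07.2017 20:09"))  # full date
```

In this sketch the first two inputs come out as "UNKNOWN 17:48 28.02." and "UNKNOWN 01:16 .", while the full date stays as it is, so only incomplete dates go through the @inline branch.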

So...

It's actually possible to parse the date here. And it's quite important for a news website like this one, so the issue https://instantview.telegram.org/contest/mskagency.ru/template20/issue3/ is valid.
