Some Protests over a Contest
H. Mirzababaei

What you are reading is a critical report of my observations as one of the participants in Telegram's educational quiz contest, and I hope it will have a positive effect on the continuation of the evaluation process and on the point of view of the contest organizers. I will discuss the controversial aspects of the evaluation process based on what I have done and observed. It is therefore also a kind of objection and appeal against the comment and score I received on my quiz, which I hope the esteemed judge(s) will take notice of.
Introduction:
This contest is exciting and reads like a magic-token story. There is very little accurate, official, and transparent information about it, and what exists reaches users and participants only in drips. Meanwhile, UNOFFICIAL resources provide the most detailed information about the contest through informal channels. Everyone gets to know, and no one has to be held responsible! Win-win!
One of the few pieces of official information released was that the results would be announced at an unspecified date in June 2020. That announcement never materialized, and of course no alternative date was ever given. It is reasonable to infer that Telegram did not anticipate this volume of participation, and that the contest's widespread popularity has led to a shortage of human resources.
But the main criticism remains the evaluation criteria. The competition was designed with simple rules, yet it is being judged by not-so-simple inspections! Telegram may, to some extent, be right: perhaps they intended to derive a comprehensive evaluation manifesto inductively, by modeling it on the submitted quizzes. Nonetheless, judging by the comments issued so far, the model they have arrived at is not a particularly fair one.
A logical overview:
We can divide the set of criteria that could rationally and logically exist into two categories: measures that were predictable for quiz designers and measures that were not. Each category, in turn, took one of two forms in the evaluation process: either applied by the judges or not applied. By this simple reasoning, there are four conceivable types of criteria in general:
1. Predictable for designers and used by judges
2. Unpredictable for designers and used by judges
3. Predictable for designers and not used by judges
4. Unpredictable for designers and not used by judges
Analytical Review:
The last type (No. 4) is meaningless and cannot be a matter of examination, since both of its descriptions are negative. Also, we do not have enough evidence to be certain about what goes on behind the scenes. So, based on what we do have in hand, and by analyzing the judges' most frequent comments, I will examine the other three types of measures separately to determine the extent to which fair patterns have been taken into account.
The first family of measures includes criteria that Telegram explicitly listed as the basic rules of the contest and that have been applied exactly as stated, such as the minimum of 30 questions per quiz. In other words, these measures coincide with the published rules, and failure to comply with them leads directly to the participant's disqualification. Although the seventh rule, i.e., advertising, is not highly predictable, it has an acceptable justification. In general, this family of measures seems relatively fair.
On the other hand, based on the contest's general objectives, other predictable measures are identifiable, and they do seem to have been taken into account in some way. The judges use these measures, but in an irregular and ambiguous manner, and these undefined, inconsistently applied measures lead to injustice in some cases. The most frequent ones are:
- Educational explanations: This criterion is relatively clear.
- Well-used and self-made media: The "good use" of media is a relatively transparent and predictable measure, but the "self-made" part is not. Had I known about it, it would have been straightforward to produce the media I wanted myself. As for the first part, good use of media, I applied it very precisely and creatively (e.g., question #3 of my quiz): only about one-third of the participants answered it correctly, precisely because of that appropriate and creative use. Yet I was not credited for this; why? Is there any clarification? Could it somehow depend on the judge's taste?
- Provides a valuable service, teaches something useful for real life: Given its generality and the fact that it is inherently good, this can count as a predictable and reasonable measure, although its exact dimensions are still not clear to us.
- Educational pre-poll messages: This measure is also predictable, given that the pre-poll message forms the first impression of the quiz and welcomes the participant. My submitted quiz has a very informative and attractive pre-poll message, developed with references to other sources such as here & here. This extra information and media give the user a detailed explanation, with specific tags. Why did the esteemed judge not consider it, and why was I not credited for this measure in his comment?
- A unique idea for an original test: Naturally, unique ideas are precious in any competition and can be predicted as an evaluation criterion. I explicitly used a genuinely creative idea in my quiz for my field of expertise (the PTE exam). The opening sentence of my pre-poll message describes my quiz as unique, even before I knew for certain that this would be a measure. This unusual question format has drawn the admiration of well-known instructors in the field. Designing quiz items out of the nature of another exam's questions is precisely such a unique idea, yet my judge did not credit me for this element in his comment; why?
- An impressive approach to formatting: This criterion is also one of those general terms expected to be positive in any design, whether of quizzes or anything else, but what exactly is its definition? In my test, with more than 200 real participants, I felt that participants were genuinely excited about the questions and the thorough reminders of the tips behind them. My test has even sparked a practice trend in which colleagues have gone on to design several training quizzes of their own. Why did my judge not see it that way? What exactly is the definition of an "impressive approach," and how am I supposed to demonstrate it?
The second category includes measures that were not predictable for designers; otherwise, someone who can meet the other standards would most likely have handled these rules as well. The "self-made media" requirement, mentioned earlier, was the first example.
Another example is the step-by-step approach to teaching. Again, the definition of "step-by-step" is not very clear, and anyone could simulate it simply by rearranging the order of quiz questions. First, such an indicator was inconceivable to me as a contest criterion; second, it happened naturally anyway, because I, as the designer, arranged my questions with other educational purposes in mind. It seems the evaluation even wants to dictate the direction of the steps! Such ambiguity certainly does not make for fair judgment. In the end, my step-by-step approach was not credited as a step-by-step approach because our steps pointed in different directions!
It seems that all the unpredictable criteria could be grouped under the heading of "creativity and innovation." I created a real competition with my quiz: I ran it and even paid for a prize out of my own pocket so that my exam would be put to practical educational use. All records are available here. Was such an innovation seen and given a positive rating? Or is this kind of creativity worthless and meaningless to the evaluation?
The third category includes criteria that one would expect to be evaluation criteria but that the judges did not consider. In other words, the character and connotations of the contest make them predictable for designers, yet in the comments issued so far, not even a trace of their positive or negative effect is identifiable. Some of them are:
- Test level: If such a classification did not affect the competition, why does it exist? And if it does have an impact, how?
- The number of test participants: The actual number of participants carries an essential message about a quiz's effectiveness and its success in attracting an audience. Why is there no trace of this indicator's positive or negative effect in the judges' comments?
- Participants' average time: This indicator is also important, because it reflects how deeply participants engage with the quiz questions. Like the previous ones, it has not been considered.
- Quiz title: If creativity is a positive dimension of the test as a whole, why was creativity not considered in the test's naming? If the pre-poll message matters, then the title holds a special place, too.
- The number of questions: The number of quiz questions is a measurable and predictable indicator that seems not to have been taken into account. In my case, I met the minimum standard, but the question remains: did it affect the evaluation or not?
Conclusion:
Finally, I hope it is clear that this critical view is entirely intended to help improve the fairness of the evaluation, and perhaps even a little more official information and clarification from Telegram would clear up many of these issues. In any case, Telegram has already accepted the fact of the delay; if a little more delay would make the evaluation process fairer, it should adopt exactly that policy.
Note 1: I appeal to my esteemed judge(s) to re-inspect the parts mentioned above for which I claim credit. I hope to receive more positive comments and a revised, improved score.
Note 2: Special thanks to the admin of this page for providing the most trustworthy UNOFFICIAL information about the contest.