Monday, 24 September 2012

Confidence based assessment with MCQs. Is it worth using?

Confidence assessment in MCQs is an innovative method of accounting for the degree of belief a respondent has in their answer and of minimising guessing, which can skew scores upwards. I've been a fan of the approach and have used it in some of my module assessment, as it rewards those who are knowledgeable AND confident in that knowledge. Details of the method can be found in this paper on formative and summative confidence based assessment by Gardner-Medwin and Gahan from UCL. Essentially, respondents pick their chosen answer and indicate their level of confidence (say 1, 2 or 3, where 1 is low and 3 is high). If they get the answer correct then they score according to their confidence level, so correct and very confident = 3, while correct but not at all confident (possibly guessing) = 1. However, if they get it wrong AND are confident in their answer to some degree then a penalty is applied, so very confident but wrong will receive a negative score.
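As a rough sketch of how a single response could be marked under this kind of scheme: the marks for correct answers (1, 2, 3) come from the description above, but the penalty sizes below are assumed purely for illustration, and the name `cbm_mark` is hypothetical.

```python
# Marks for a correct answer at each confidence level (1 = low, 3 = high),
# as described in the post.
MARK_IF_CORRECT = {1: 1, 2: 2, 3: 3}

# ASSUMPTION: illustrative penalty sizes only. The post says wrong answers
# given with some confidence are penalised, but these values are not from it.
PENALTY_IF_WRONG = {1: 0, 2: -2, 3: -6}


def cbm_mark(is_correct: bool, confidence: int) -> int:
    """Return the mark for one MCQ response under confidence based marking."""
    if confidence not in MARK_IF_CORRECT:
        raise ValueError("confidence must be 1, 2 or 3")
    if is_correct:
        return MARK_IF_CORRECT[confidence]
    return PENALTY_IF_WRONG[confidence]
```

With these assumed penalties, a low-confidence wrong answer costs nothing, while a very confident wrong answer costs more than a very confident correct answer gains.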

So why is this relevant to my research? Well, the main outcome measure is a 26-item MCQ assessing spatial cognition, and I want to minimise the impact of guessing on the results and evaluate whether participants' confidence in their knowledge is enhanced following the intervention. But ... is it worth doing? For one thing it complicates the procedure for participants and, in itself, influences how they might respond. Furthermore, I recall a discussion with my supervisor some time ago where she argued that it wasn't really necessary. I remember arguing the case FOR it but coming away thinking that I really should investigate further and compare early results with and without confidence assessment to see if it WAS having any impact. It is essential that the main outcome measure in an RCT is as reliable and valid as possible.

So, as I didn't use confidence assessment in the pilot, my plan was to use data collected over the next fortnight to establish the impact of confidence assessment by doing 2 separate analyses and comparing the results. I then remembered that about 12 months ago I collected data to test the VR model, pilot revisions to the outcome measures/questionnaires and check their test-retest reliability. I already had data I could use. It took a little while to find it among the 3 USB drives and 2 PCs where all my stuff seems to be (dis)organised! Note to self - must spend a day getting this all organised properly and backed up too.

The marking scheme for the confidence based assessment I used was as follows:

Confidence level | Mark if correct | Penalty if wrong

The chart below shows the knowledge enhancement (difference between pre and post intervention MCQ scores) for 2 separate analyses for the 20 participants. Both are normalised to a percentage score. Blue bars represent analysis WITH confidence assessment. Red bars represent the analysis WITHOUT it applied. Basically I adjusted the scores for this latter analysis by scoring correct answers as 1 point and incorrect answers as 0 points.
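The two analyses could be sketched along the following lines. The penalty values, the choice to normalise against the maximum possible score, and all function names are assumptions made for illustration, not details taken from the study.

```python
from typing import Callable, List, Tuple

# One response = (answered correctly?, confidence level 1-3)
Response = Tuple[bool, int]

MARKS = {1: 1, 2: 2, 3: 3}        # marks if correct, as described in the post
PENALTIES = {1: 0, 2: -2, 3: -6}  # ASSUMPTION: illustrative penalties only


def percent_with_confidence(responses: List[Response]) -> float:
    """Confidence based score as a percentage of the maximum possible score."""
    total = sum(MARKS[c] if ok else PENALTIES[c] for ok, c in responses)
    return 100.0 * total / (max(MARKS.values()) * len(responses))


def percent_without_confidence(responses: List[Response]) -> float:
    """Plain score: 1 point per correct answer, 0 per incorrect answer."""
    return 100.0 * sum(1 for ok, _ in responses if ok) / len(responses)


def enhancement(pre: List[Response], post: List[Response],
                scorer: Callable[[List[Response]], float]) -> float:
    """Knowledge enhancement = post-test percentage minus pre-test percentage."""
    return scorer(post) - scorer(pre)


# Hypothetical four-item example for one participant (not study data).
pre = [(True, 1), (False, 2), (True, 3), (False, 1)]
post = [(True, 3), (True, 2), (True, 3), (False, 1)]
with_cbm = enhancement(pre, post, percent_with_confidence)
without_cbm = enhancement(pre, post, percent_without_confidence)
```

Running both scorers over the same responses gives the paired values that the blue and red bars would represent for each participant.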

As you can see, it appears that confidence based assessment has virtually no impact on the scoring in all but 2 of the participants. Sadly I have no additional data that allows me to explore WHY confidence in knowledge was enhanced significantly in those 2 participants. I did a quick and dirty paired t-test to compare the datasets and this confirmed no significant difference between the 2 analyses (p = 0.45).
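For readers who want to see what a paired t-test computes, here is a minimal standard-library sketch. The numbers are made up for illustration and are not the study data, and `paired_t` is a hypothetical helper name.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical enhancement scores (%) for four participants, NOT study data.
with_cbm = [10.0, 12.0, 8.0, 15.0]
without_cbm = [9.0, 12.0, 9.0, 13.0]
t_stat, df = paired_t(with_cbm, without_cbm)
```

Turning the t statistic into a p-value needs the t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel(with_cbm, without_cbm)` returns both the statistic and the p-value directly.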

The MCQ inventory also has 3 different categories of items so I thought it would be important to compare improvement scores in these too. Once again, no statistically significant difference in any of them. However, it was interesting to note that, irrespective of confidence based assessment, there was a significant correlation between score improvement and MCQ item difficulty. Basically, a bigger score improvement was seen in those items with the lowest mean score on the pre-test. This is something I'll explore further in the main bulk of data collection/analysis.
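The difficulty/improvement relationship could be checked with a Pearson correlation, sketched below. Here item difficulty is assumed to be 1 minus the proportion of participants answering correctly on the pre-test, and all values are hypothetical rather than study data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-item values, NOT study data.
pre_test_mean = [0.2, 0.4, 0.6, 0.8]   # proportion correct on the pre-test
improvement = [0.5, 0.4, 0.2, 0.1]     # post-test minus pre-test proportion

# ASSUMPTION: difficulty defined as 1 - proportion correct on the pre-test.
difficulty = [1.0 - m for m in pre_test_mean]
r = pearson_r(difficulty, improvement)
```

A value of `r` near +1 would mean that harder items (lower pre-test means) showed the larger improvements.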

One final point to note is that this analysis was based ONLY on the results from participants using the VR intervention and NOT the control group intervention. It is, I suppose, possible that there may have been differences in confidence pre and post in the control group.

Conclusions (tentative)
  1. Incorporating confidence-based assessment in the MCQ measure does not appear to influence the difference between pre and post-intervention scores.
  2. This finding is consistent for overall knowledge enhancement AND knowledge enhancement in sub-categories of MCQ items.
  3. Incorporating confidence-based assessment in the main study is not justified.
  4. There appears to be a positive correlation between increasing MCQ item difficulty and degree of knowledge improvement.

I admit to being a little surprised by this analysis. I had really believed that I would see significant improvements in scores as a result of increased confidence in knowledge and that this would justify the use of confidence-based assessment. However, this is clearly not the case, and at least I won't be wasting my own or the participants' time by incorporating it.

1 comment:

  1. Quick response from Heidi! Text of email exchange below

    Thanks for looking at that so quickly H. Appreciated!

    I don't think I AM worried about guessing so much now. There are other methods but they aren't very sophisticated and are all variations on a 'negative marking' theme whereby incorrect answers are penalised but students are given an option to 'NOT' answer the question if they think they might be guessing and don't want to risk such a penalty.

    The only possible valid alternative is to minimise the *probability* of skewing scores upwards by including more answer options for each MCQ item. Clearly with the current 4 options for each question there is a 25% chance of getting it correct with just a wild guess. Could reduce that to 20% with 5 options, 17% with 6 etc. However, I'm not keen to do this for 2 reasons: firstly, the MCQ becomes more unwieldy and confusing for participants if there are lots of options to choose from and, secondly, I'm naturally lazy and unwilling to spend the rest of today and tomorrow adding additional option choices ;-)
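The arithmetic behind those percentages can be sketched as follows, assuming a purely random guess among equally attractive options on every item (real guessing is rarely this uniform). The function names are hypothetical.

```python
N_ITEMS = 26  # number of MCQ items in the outcome measure, from the post


def chance_correct(n_options: int) -> float:
    """Probability that one blind guess among n_options is correct."""
    if n_options < 2:
        raise ValueError("an MCQ item needs at least 2 options")
    return 1.0 / n_options


def expected_lucky_marks(n_options: int, n_items: int = N_ITEMS) -> float:
    """Expected number of correct answers if every item were a blind guess."""
    return n_items * chance_correct(n_options)


for k in (4, 5, 6):
    print(f"{k} options: {chance_correct(k):.0%} per item, "
          f"about {expected_lucky_marks(k):.1f} of {N_ITEMS} items by luck")
```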

    As another way of comparing differences in *possible* guessing between the pre and post tests I looked at the number of answers with a '1' score - i.e. correct answer but low confidence. This does NOT indicate a guess of course, but as there is no penalty for an incorrect answer with a confidence level of 1 these answers probably do include a number of guesses, so this is probably the best estimator of guessing I can go with. 12.9% of all 520 answers were '1's in the pre-test compared with 10.4% in the post-test, so a small reduction but not statistically significant.
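One way such a comparison could be run is a two-proportion z-test, sketched below. The test actually used is not stated, so this is only an illustration. The counts 67 and 54 are inferred from the rounded percentages (12.9% and 10.4% of 520), and the test treats the two sets of answers as independent, which is only an approximation when the same participants sat both tests.

```python
import math
from typing import Tuple


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMPTION: counts inferred from the rounded percentages in the text.
z, p = two_proportion_z(67, 520, 54, 520)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Under these assumptions the p-value is well above 0.05, in line with the conclusion that the reduction is not statistically significant.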

    So, on reflection I have come round to the idea that incorporating an additional method to minimise the impact of guessing and/or assist evaluation of confidence in knowledge is unnecessary.

    From: Probst, Heidi
    Sent: 24 September 2012 15:09
    To: Appleyard, Robert M
    Subject: Blog reply

    Hi, I think this was a useful task to undertake. Are there other options for accounting for guessing with MCQs? I vaguely recall some methods. If you are worried about it, could you also ask those students you interview whether they guessed at all, and how many of their answers were guessed, to give you some deeper understanding, maybe?
    I tried commenting again on the blog; for some reason it doesn't like me on the iPad.

    Dr Heidi Probst PhD, MA, FCR
    Sent from my iPad