Monday, 24 September 2012

Confidence based assessment with MCQs. Is it worth using?

Confidence assessment in MCQs is an innovative method of accounting for the degree of belief in a respondent's answer and minimising guessing, which can skew scores upwards. I've been a fan of the approach and have used it in some of my module assessment as it rewards those who are knowledgeable AND confident in that knowledge. Details of the method can be found in this paper on formative and summative confidence based assessment by Gardner-Medwin and Gahan from UCL. Essentially, respondents pick their chosen answer and indicate their level of confidence (say 1, 2 or 3 where 1 is low and 3 is high). If they get the answer correct then they score according to their confidence level, so correct and v. confident = 3; while correct but not at all confident (possibly guessing) = 1. However, if they get it wrong AND are confident in their answer to some degree then a penalty is applied, so v. confident but wrong will be a minus score.
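For clarity, the scoring rule can be sketched in a few lines of Python. This is just an illustration of the marking scheme described above (the function name and the exact penalty values 0/-1/-2 match the scheme I used, shown in the table further down):

```python
def confidence_score(correct, confidence):
    """Score one MCQ response under confidence-based marking.

    confidence: 1 (low), 2 (medium) or 3 (high).
    Correct answers score their confidence level; wrong answers
    incur a penalty that grows with confidence (0, -1, -2).
    """
    if confidence not in (1, 2, 3):
        raise ValueError("confidence must be 1, 2 or 3")
    if correct:
        return confidence
    return -(confidence - 1)

print(confidence_score(True, 3))   # correct and v. confident -> 3
print(confidence_score(False, 3))  # wrong but v. confident  -> -2
```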

So why is this relevant to my research? Well, the main outcome measure is a 26 item MCQ assessing spatial cognition and I want to minimise the impact of guessing on the results and evaluate whether participants' confidence in their knowledge is enhanced following the intervention. But ... is it worth doing? For one thing it complicates the procedure for participants and, in itself, influences how they might respond. Furthermore, I recall a discussion with my supervisor some time ago where she argued that it wasn't really necessary. I remember arguing the case FOR it but coming away thinking that I really should investigate further and compare early results with and without confidence assessment to see if it WAS having any impact. It is essential that the main outcome measure in an RCT is as reliable and valid as possible.

So, as I didn't use confidence assessment in the pilot, my plan was to use data collected over the next fortnight to establish the impact of confidence assessment by doing 2 separate analyses and comparing the results. I then remembered that about 12 months ago I collected data to test the VR model, pilot revisions in the outcome measures/questionnaires and check their test-retest reliability. I already had data I could use. It took a little while to find it amongst the 3 USB drives and 2 PCs where all my stuff seems to be (dis)organised! Note to self - must spend a day getting this all organised properly and backed up too.

The marking scheme for the confidence based assessment I used was as follows:

Confidence level     1    2    3
Mark if correct      1    2    3
Penalty if wrong     0   -1   -2


Results
The chart below shows the knowledge enhancement (difference between pre and post intervention MCQ scores) for 2 separate analyses for the 20 participants. Both are normalised to a percentage score. Blue bars represent analysis WITH confidence assessment. Red bars represent the analysis WITHOUT it applied. Basically I adjusted the scores for this latter analysis by scoring correct answers as 1 point and incorrect answers as 0 points.


As you can see it appears that confidence based assessment has virtually no impact on the scoring in all but 2 of the participants. Sadly I have no additional data that allows me to explore WHY confidence in knowledge was enhanced significantly in those 2 participants. I did a quick and dirty paired t-test to compare the datasets and this confirmed no significant difference between the 2 analyses (p = 0.45).
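The paired t statistic itself is straightforward to compute. Here's a minimal pure-Python sketch of the test I ran; the data below are made-up illustrative values, NOT my actual per-participant improvement scores:

```python
import math

def paired_t(a, b):
    """t statistic for a paired t-test (n - 1 degrees of freedom)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the paired differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

# Illustrative normalised improvement (%) scored WITH and WITHOUT
# confidence-based assessment -- not the real study data.
with_conf    = [12.0, 8.5, 15.0, 9.0, 11.5]
without_conf = [11.5, 8.0, 15.5, 9.5, 11.0]
print(f"t = {paired_t(with_conf, without_conf):.2f}")
```

The p-value would then be looked up against the t distribution with n - 1 degrees of freedom (or computed with a stats package such as SciPy's `ttest_rel`).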

The MCQ inventory also has 3 different categories of items so I thought it would be important to compare improvement scores in these too. Once again, no statistically significant difference in any of them. However, it was interesting to note that, irrespective of confidence based assessment, there was a significant correlation between score improvement and MCQ item difficulty. Basically, a bigger score improvement was seen in those items with the lowest mean score on the pre-test. This is something I'll explore further in the main bulk of data collection/analysis.
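The correlation I describe above is easy to check with a plain Pearson coefficient. A minimal sketch, with illustrative numbers only (harder items, i.e. those with lower pre-test mean scores, showing bigger improvements gives a negative r against the pre-test mean):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative: mean pre-test score per item vs. mean improvement.
pre_test_mean = [30.0, 50.0, 70.0]
improvement   = [25.0, 15.0,  5.0]
print(pearson_r(pre_test_mean, improvement))  # perfectly negative here: -1.0
```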

One final point to note is that this analysis was based ONLY on the results from participants using the VR intervention and NOT the control group intervention. It is, I suppose, possible that there may have been differences in confidence pre and post with this group.

Conclusions (tentative)
  1. Incorporating confidence-based assessment in the MCQ measure does not appear to influence the difference between pre and post-intervention scores.
  2. This finding is consistent for overall knowledge enhancement AND knowledge enhancement in sub-categories of MCQ items.
  3. Incorporating confidence-based assessment in the main study is not justified.
  4. There appears to be a positive correlation between increasing MCQ item difficulty and degree of knowledge improvement.
I admit to being a little surprised by this analysis. I had really believed that I would have seen significant improvements in scores as a result of increased confidence in knowledge and that this would have justified the use of confidence-based assessment. However, this is clearly not the case and at least I won't be wasting my time, or my participants', by incorporating it.

Friday, 21 September 2012

Why do a PhD?

Every doctoral student must wonder why they are bothering doing a PhD. And if not then why not? I sometimes lie awake pondering why the hell I'm putting myself through this.

If you aren't sure then this probably isn't far off the truth:




Randomisation methods

I'm not going to discuss the rationale for randomisation in trials in any detail here. Basically it will prevent selection bias, provide (in theory) comparable groups and maximise the precision of the estimates of intervention effects. What I have been more interested in is 'which allocation procedure should I be using?'

Essentially the options are:
  • Simple randomisation
  • Block randomisation
  • Stratified randomisation
  • Minimisation
A concise summary and explanation of these different methods are provided by the CONSORT Group and can be found here.

In the pilot study I used simple randomisation which was easy to implement but limited in the sense that comparable study arms cannot be guaranteed with relatively small numbers of participants. In the pilot there were just 32 participants and so the probability of significant chance imbalances across the 2 arms was high. This is also likely in the main study, where n=130 (based on a sample size calculation following the pilot, and still not considered a large RCT), and is further complicated by the fact that there are a number of important variables that are likely to be correlated with the study's main endpoints. These include gender, age, spatial ability, preferred learning style and stage of training programme. Basically it is important to ensure that these factors are as equally balanced as possible across the control and intervention groups.

This leaves stratified randomisation or minimisation as potential options. Stratification, with restricted (block) randomisation, is common in very large multi-centre clinical trials. Unfortunately, where there are a number of strata and overall sample size is small, it can lead to sparse/empty data in 'cells' and defeat the purpose of using stratification in the first place. In my study there are a number of important strata (as identified above) and the International Conference on Harmonisation suggests that no more than 3 or 4 strata should be used in a clinical trial, so this method is probably unsuitable.

This leaves minimisation (as originally presented by Pocock and Simon in 1975). Minimisation is a dynamic process that ensures balance between groups for a number of important variables. After true random allocation of the first participant, subsequent ones are allocated such that the imbalance of identified factors is minimised between the 2 groups. Ideally a random component is also introduced at this point, with heavy weighting in favour of the allocation that minimises imbalance (e.g. a probability of 0.8). The general procedure for allocating intervention group is summarised within this PPT on randomisation methods (slides 32-36).

In my study there will likely be 5 stratification factors as earlier highlighted: gender, age, spatial ability, preferred learning style and stage of training programme. Each of these factors will have levels associated with them - e.g. spatial ability would be categorised as high, middle or low and stage of programme would be 1, 2 or 3.
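To make the procedure concrete, here is a minimal sketch of Pocock-Simon minimisation for two arms using the five factors above. It uses the simple "range" measure of imbalance (count in one arm minus the other, per factor level); Pocock and Simon allow weighted sums and other measures, so treat this as an illustration rather than a definitive implementation. The factor names are my own labels:

```python
import random

FACTORS = ["gender", "age_band", "spatial_ability", "learning_style", "stage"]

def minimise(counts, participant, p=0.8, rng=random):
    """Allocate a new participant to arm 'A' or 'B'.

    counts[arm][factor][level] = number of participants with that
    factor level already allocated to that arm.
    With probability p, pick the arm that minimises total imbalance;
    otherwise pick the other arm (the random component).
    """
    imbalance = {}
    for arm in ("A", "B"):
        other = "B" if arm == "A" else "A"
        total = 0
        for f in FACTORS:
            level = participant[f]
            # Imbalance for this factor level if we allocate to `arm`.
            total += abs((counts[arm][f].get(level, 0) + 1)
                         - counts[other][f].get(level, 0))
        imbalance[arm] = total
    if imbalance["A"] == imbalance["B"]:
        return rng.choice("AB")          # tie: allocate at random
    preferred = min(imbalance, key=imbalance.get)
    other = "B" if preferred == "A" else "A"
    return preferred if rng.random() < p else other

# Usage: arm A is already heavy with this participant's profile,
# so with p=1.0 (no random component) arm B must be chosen.
counts = {"A": {f: {"x": 5} for f in FACTORS},
          "B": {f: {"x": 0} for f in FACTORS}}
participant = {f: "x" for f in FACTORS}
print(minimise(counts, participant, p=1.0))  # B
```

In practice the counts table would be updated after every allocation, and the first participant would be allocated by simple randomisation before minimisation takes over.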

Minimisation is not strictly randomisation, although it does incorporate a random element. Nevertheless, it is accepted as a suitable alternative and some have argued it is superior (e.g. Treasure and MacRae 1998).

Minimisation IS relatively complex to manage and administer compared to other methods, but given its advantages in terms of reducing chance imbalances of important factors across my groups in what is a moderately sized study, it should be the allocation procedure I adopt.

Thursday, 20 September 2012

Impact on knowledge retention?

Mark Collins made an interesting comment at yesterday's research meeting. Something along the lines of 'would I be testing their spatial knowledge at a subsequent time point in order to identify any differences in knowledge retention?'

It's a good point and not something I had really considered before. For one thing I hadn't constructed any research questions related to this point. The ethical issues associated with the design of my study also kind of preclude longitudinal testing to assess knowledge retention, as participants need to be offered experience in the alternative arm (intervention or control) after the post-test in case there is any difference in knowledge enhancement. If this wasn't offered then one group may be advantaged and attain better module scores as a result. Possibly. So, it would therefore be impossible to assess differences in knowledge retention between the study arms as participants will have had exposure to both intervention and control.

It may still be an interesting phenomenon to explore in a subsequent project, although a number of other studies have found NO significant differences in knowledge retention between VR and conventional teaching methods, e.g. see here (with regard to engineering) or here (Hall 1998). There may be a difference in knowledge retention, however, where the relevant knowledge depends on spatial 3D cognition (e.g. navigation), so this could be something to consider further in the future. For the time being though I don't think it has a place in this work.

Monday, 3 September 2012

Getting (re)started


Let's be clear. I'm not at the start of this PhD journey. For various reasons my progress had stalled but the award of a Research Excellence Fellowship at Sheffield Hallam University has given me the opportunity to get this beast under control, and a 6 month, full time secondment to complete the research. I am grateful for this and for the first time in, well, months if not years, I feel motivated by it. I can give this research my full attention.

Many of my colleagues probably see this as a chance to avoid real work for 6 months. Indeed a number of friends and family see this as an opportunity for me to get my golf handicap down further! Bloody cheek. This is NOT my view. I have a lot to do in 6 months but I'm not particularly in awe of any of it so just need to knuckle down and get the job done. Here goes.