Wednesday, 10 October 2012

More interviewing. More thoughts.

Three more interviews today with participants from both the intervention and control groups. Haven't transcribed yet (obviously) but some preliminary thoughts:

  • Knowing they were going to be assessed afterwards was probably the main focus for all of them and this strongly influenced their approach to how they used the VRE/plastic model.
  • They adopted subtly different approaches to how they used it but it was systematic in all cases and focused around the forthcoming assessment of spatial knowledge. I will look at whether specific learning strategies inherent in these participants correlate with this assessment focus. That's a good feature of the Vermunt ILS: you can look at sub-scale scores!
  • First thing that they all did was use the model to strengthen their knowledge of anatomy (not necessarily spatial knowledge) based on the pre-intervention MCQ - i.e. what they thought they didn't know.
  • After identifying structures they seemed to use the model to see how structures related to each other spatially. This was done differently by different participants. One (control group) went systematically sup to inf, ant to post and medial to lateral, self-testing as they went. There was relatively little rotation of the model during this. One (intervention) rotated the model freely to see how structures related to each other. They commented that this was really helpful to them. This participant had a relatively low spatial ability score compared to the other interviewees today. I wonder if this is important in relation to how they manipulated the model? The final interviewee (intervention) did not rotate the model at all while learning spatial relationships. He commented that he had a clear 3D mental image of the brain and thus didn't need to rotate it. His spatial ability score was relatively high. Again this is potentially of interest.
  • The self-testing theme definitely seems to be emerging. They all commented that they consciously did this.

Tuesday, 9 October 2012

Thoughts on interviewing in the research

Have done a couple of the post-experience interviews now and 3 more tomorrow (10/10/12). Was a bit uncertain about how these would go as it isn't something I feel particularly comfortable with. I'm definitely someone with more of a quantitative bent. So what are my initial thoughts? What follows here is all a bit 'stream of consciousness' but I suppose it's good to get these thoughts down now:
  • Videoing the participants' interaction with the model (control or intervention) is definitely a good idea. It avoids problems with recall, allows them to articulate their thought processes while watching themselves and provides a great basis for the early, unstructured part of the interview.
  • It might be possible to analyse and code the videos themselves. I'm not sure this is worthwhile but I'll definitely reflect a bit more on this.
  • Starting off by getting the participants to view the video and articulate what they were thinking and doing seems to have worked/is working well. It has already raised a couple of issues that I definitely didn't expect and may not have got without this approach - e.g. using self-testing and seeking feedback on knowledge, adopting a systematic approach to the self-directed tutorial knowing that they would be tested again afterwards.
  • With regard to this last point, I wonder if this is a positive or negative issue? Does the fact that they might be preparing themselves for 'a test' influence how they interpret and use the model? Might this be an example of some sort of 'Hawthorne effect' - i.e. the participants are adjusting their behaviour in relation to how they use the model because they know they are involved in research and will be tested afterwards? Is it the 'post-test' that is foremost in their mind rather than learning about the spatial relationships of the anatomical structures? I'm not sure you can separate learning and assessment anyway so maybe it isn't important.
  • Is there a danger that I will place too much emphasis on these unexpected issues at this early stage and identify them as important themes? I should test these out in subsequent interviews but maybe it is too early to do that. Purposive sampling at the moment. Theoretical sampling later?
  • Trying to get my head around Grounded Theory (GT) is difficult. The language used in relation to it seems unnecessarily complex and, in many cases, not intuitive at all. It is driving me nuts.
  • A discussion with a colleague about GT has helped somewhat and I have a clearer idea about how to go about coding the transcriptions and managing the data.
  • I need to start transcribing SOON while it is fresh in my mind. I intend to transcribe at least the first few myself to help with some immersion in the data but I know it is time consuming and mind numbingly boring so have been putting it off. I have a 'free' day this week and I WILL get on and do it. Honestly.
  • I'm not sure exactly how to 'link' the qualitative data with the quantitative data yet - e.g. emerging themes/categories with characteristics of the participants such as spatial ability/learning styles/baseline knowledge etc. I think I may start by mapping these on flip chart paper but not sure. Will need to discuss further with supervisor.
  • How many interviews??? At the moment it's early days and I'm pretty much interviewing all participants who volunteer but as the number increases I will be able to monitor whether all important characteristics within participants are being included - e.g. good and poor spatial ability, those with very strong/divergent learning styles, male/female, range of ages etc. and continue the purposive sampling until I start to see saturation of themes.
  • Wonder if it might be a good idea to use a focus group or two after this to test out emerging propositions?
Give me statistics any day of the week!

Tuesday, 2 October 2012

Potential selection bias problems

Oh bugger. I've stumbled across a problem that I hadn't anticipated. One that will introduce selection bias if I'm not careful.

The baseline assessment of spatial ability is being done via 2 tests: the Vandenberg and Kuse Mental Rotation Test and the CEEB Mental Cutting Test. Both are reliable, valid and well suited to the study. Example problems from both tests are here:

The top one is an example from the MRT (A) battery. The bottom one is an example from the MCT.

I got a couple of large groups of participants to do these tests immediately after consent to participate. The rationale for asking them to do this was as follows:
  1. They need to be timed and invigilated.
  2. I need the data to add to the minimisation algorithm before allocation.
  3. It saves a lot of time on the main data collection days that they attend.
Both tests are quite 'tough' and yesterday I noted about 5 participants giving up after a couple of minutes because they couldn't do them (easily?) and then deciding not to participate in the research because of this. Clearly these are people who are particularly low on the spatial ability (SA) continuum and ones who I would ideally like to include in the research. Through trying to be organised I have introduced some selection bias. I'm not sure that it would be ethical to follow these individuals up and encourage them to have another go and participate.

Possible solutions?
  • Do these tests after other aspects of data collection. This would mean using simple randomisation instead of minimisation (actually someone from the research centre wondered why I wasn't doing this anyway, suggesting that it would be perfectly acceptable), but I could still end up with attrition later in the study if participants give up on the SA tests later rather than sooner.
  • Explain to other prospective participants that the SA tests are quite tough and that low scores are not something to worry about. Essentially try to encourage them to give them a go and continue with participation.
  • Accept that attrition will occur as a result of this, accept it and discuss the limitations in the thesis.
Not sure what to do yet.

Monday, 24 September 2012

Confidence-based assessment with MCQs. Is it worth using?

Confidence-based assessment in MCQs is an innovative method of accounting for the degree of belief in a respondent's answer and minimising guessing, which can skew scores upwards. I've been a fan of the approach and have used it in some of my module assessment as it rewards those who are knowledgeable AND confident in that knowledge. Details of the method can be found in this paper on formative and summative confidence-based assessment by Gardner-Medwin and Gahan from UCL. Essentially, respondents pick their chosen answer and indicate their level of confidence (say 1, 2 or 3, where 1 is low and 3 is high). If they get the answer correct then they score according to their confidence level, so correct and v. confident = 3, while correct but not at all confident (possibly guessing) = 1. However, if they get it wrong AND are confident in their answer to some degree then a penalty is applied, so v. confident but wrong earns a minus score.

So why is this relevant to my research? Well, the main outcome measure is a 26 item MCQ assessing spatial cognition and I want to minimise the impact of guessing on the results and evaluate whether participants' confidence in their knowledge is enhanced following the intervention. But ... is it worth doing? For one thing it complicates the procedure for participants and, in itself, influences how they might respond. Furthermore, I recall a discussion with my supervisor some time ago where she argued that it wasn't really necessary. I remember arguing the case FOR it but coming away thinking that I really should investigate further and compare early results with and without confidence assessment to see if it WAS having any impact. It is essential that the main outcome measure in an RCT is as reliable and valid as possible.

So, as I didn't use confidence assessment in the pilot, my plan was to use data collected over the next fortnight to establish the impact of confidence assessment by doing 2 separate analyses and comparing the results. I then remembered that about 12 months ago I collected data to test the VR model, pilot revisions in the outcome measures/questionnaires and check their test-retest reliability. I already had data I could use. It took a little while to find it amongst the 3 USB drives and 2 PCs where all my stuff seems to be (dis)organised! Note to self - must spend a day getting this all organised properly and backed up too.

The marking scheme for the confidence-based assessment I used had three columns: the confidence level, the mark awarded if correct, and the penalty applied if wrong.
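As a sanity check, the scoring rule can be sketched in a few lines of Python. The marks for correct answers (equal to the confidence level) come straight from the description above; the penalty values are an ASSUMPTION borrowed from Gardner-Medwin's LAPT scheme (0, -2, -6) and may not match the ones I actually used.

```python
# Confidence-based MCQ scoring sketch. Marks for correct answers follow
# the scheme described above; penalties are ASSUMED from Gardner-Medwin's
# LAPT scheme and may differ from the scheme actually applied here.
MARK_IF_CORRECT = {1: 1, 2: 2, 3: 3}
PENALTY_IF_WRONG = {1: 0, 2: -2, 3: -6}  # assumption, not from the post

def score_item(correct: bool, confidence: int) -> int:
    """Score one MCQ item given correctness and confidence (1-3)."""
    if confidence not in (1, 2, 3):
        raise ValueError("confidence must be 1, 2 or 3")
    return MARK_IF_CORRECT[confidence] if correct else PENALTY_IF_WRONG[confidence]

def score_paper(responses):
    """Total score for a list of (correct, confidence) pairs."""
    return sum(score_item(c, conf) for c, conf in responses)
```

Stripping the confidence element for the comparison analysis is then just a matter of scoring each response as 1 if correct, 0 if not.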

The chart below shows the knowledge enhancement (difference between pre and post intervention MCQ scores) for 2 separate analyses for the 20 participants. Both are normalised to a percentage score. Blue bars represent analysis WITH confidence assessment. Red bars represent the analysis WITHOUT it applied. Basically I adjusted the scores for this latter analysis by scoring correct answers as 1 point and incorrect answers as 0 points.

As you can see, it appears that confidence-based assessment has virtually no impact on the scoring in all but 2 of the participants. Sadly I have no additional data that allows me to explore WHY confidence in knowledge was enhanced significantly in those 2 participants. I did a quick and dirty paired t-test to compare the datasets and this confirmed no significant difference between the 2 analyses (p = 0.45).
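For reference, the paired t statistic behind that quick-and-dirty test is simple enough to compute by hand. This sketch uses made-up improvement scores, NOT the actual study data:

```python
import math

def paired_t(xs, ys):
    """Paired t statistic: mean of the pairwise differences divided by
    the standard error of those differences."""
    n = len(xs)
    diffs = [x - y for x, y in zip(xs, ys)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n)

# Hypothetical improvement scores (NOT the study data):
with_conf = [12.0, 8.5, 15.0, 10.0, 9.5]
without_conf = [11.5, 9.0, 14.5, 10.5, 9.0]
t_stat = paired_t(with_conf, without_conf)
# Compare |t| against the critical value for n-1 degrees of freedom
# (2.776 at alpha = 0.05 with 4 df), or use scipy.stats.ttest_rel
# for an exact p-value.
```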

The MCQ inventory also has 3 different categories of items so I thought it would be important to compare improvement scores in these too. Once again, no statistically significant difference in any of them. However, it was interesting to note that, irrespective of confidence based assessment, there was a significant correlation between score improvement and MCQ item difficulty. Basically, a bigger score improvement was seen in those items with the lowest mean score on the pre-test. This is something I'll explore further in the main bulk of data collection/analysis.

One final point to note is that this analysis was based ONLY on the results from participants using the VR intervention and NOT the control group intervention. It is, I suppose, possible that there may have been differences in confidence pre and post with this group.

Conclusions (tentative)
  1. Incorporating confidence-based assessment in the MCQ measure does not appear to influence the difference between pre and post-intervention scores.
  2. This finding is consistent for overall knowledge enhancement AND knowledge enhancement in sub-categories of MCQ items.
  3. Incorporating confidence-based assessment in the main study is not justified.
  4. There appears to be a positive correlation between increasing MCQ item difficulty and degree of knowledge improvement.
I admit to being a little surprised by this analysis. I had really believed that I would see significant improvements in scores as a result of increased confidence in knowledge and that this would justify the use of confidence-based assessment. However, this is clearly not the case and at least I won't be wasting my own and participants' time by incorporating it.

Friday, 21 September 2012

Why do a PhD?

Every doctoral student must wonder why they are bothering doing a PhD. And if not then why not? I sometimes lie awake pondering why the hell I'm putting myself through this.

If you aren't sure then this probably isn't far off the truth:

Randomisation methods

I'm not going to discuss the rationale for randomisation in trials in any detail here. Basically it will prevent selection bias, provide (in theory) comparable groups and maximise the precision of the estimates of intervention effects. What I have been more interested in is 'which allocation procedure should I be using?'

Essentially the options are:
  • Simple randomisation
  • Block randomisation
  • Stratified randomisation
  • Minimisation
A concise summary and explanation of these different methods are provided by the CONSORT Group and can be found here.

In the pilot study I used simple randomisation, which was easy to implement but limited in the sense that comparable study arms cannot be guaranteed with relatively small numbers of participants. In the pilot there were just 32 participants and so the probability of significant chance imbalances across the 2 arms was high. This is also likely in the main study where n=130 (based on the sample size calculation following the pilot, and still not considered a large RCT) and is further compounded by the fact that there are a number of important variables that are likely to be correlated with the study's main endpoints. These include gender, age, spatial ability, preferred learning style and stage of training programme. Basically it is important to ensure that these factors are as equally balanced as possible across the control and intervention groups.
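To put a number on that 'probability of chance imbalance', the exact binomial calculation is only a few lines. This is an illustrative sketch, not part of the study's analysis:

```python
from math import comb

def prob_imbalance_at_least(n, k):
    """Exact probability that simple 1:1 randomisation of n participants
    produces two arms whose sizes differ by at least k."""
    # With a arm-A allocations out of n, the arm sizes differ by |2a - n|.
    favourable = sum(comb(n, a) for a in range(n + 1) if abs(2 * a - n) >= k)
    return favourable / 2 ** n
```

For n = 32, the arms differ by 6 or more participants roughly 38% of the time - which is the kind of imbalance the pilot was exposed to, before even considering imbalance on prognostic factors like spatial ability.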

This leaves stratified randomisation or minimisation as potential options.  Stratification, with restricted, block randomisation, is common in very large multi-centre clinical trials.  Unfortunately, where there are a number of strata and overall sample size is small it can lead to sparse/empty data in 'cells' and defeat the purpose of using stratification in the first place.  In my study there are a number of important strata (as identified above) and the International Conference on Harmonization suggests that no more than 3 or 4 strata should be used in a clinical trial, so this method is probably unsuitable.

This leaves minimisation (as originally presented by Pocock and Simon in 1975).  Minimisation is a dynamic process that ensures balance between groups for a number of important variables.  After true random allocation of the first participant, subsequent ones are allocated such that the imbalance of identified factors is minimised between the 2 groups.  Ideally a random component is also introduced at this point with heavy weighting in favour of the allocated intervention (e.g. with a probability of 0.8). The general procedure for allocating intervention group is summarised within this PPT on randomisation methods (slides 32-36).

In my study there will likely be 5 stratification factors as earlier highlighted: gender, age, spatial ability, preferred learning style and stage of training programme. Each of these factors will have levels associated with them - e.g. spatial ability would be categorised as high, middle or low and stage of programme would be 1, 2 or 3.
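The allocation step described above can be sketched as follows. This is an illustrative Pocock-Simon style implementation using a simple count-difference measure of imbalance with equal factor weights - an assumption on my part, not necessarily how the linked PPT (or my final procedure) defines it:

```python
import random

def minimise_allocate(new_p, allocated, factors, p=0.8, rng=random):
    """Allocate a new participant to arm 'A' or 'B' by minimisation.

    new_p maps factor name -> level (e.g. {'spatial_ability': 'low'});
    allocated is a list of (participant_dict, arm) pairs already assigned.
    Imbalance is the unweighted count difference summed over the new
    participant's own factor levels (a common, simple variant)."""
    def imbalance_if(arm):
        total = 0
        for f in factors:
            level = new_p[f]
            counts = {'A': 0, 'B': 0}
            for person, a in allocated:
                if person[f] == level:
                    counts[a] += 1
            counts[arm] += 1  # hypothetically add the new participant here
            total += abs(counts['A'] - counts['B'])
        return total

    if not allocated:  # first participant: truly random allocation
        return rng.choice(['A', 'B'])
    imb_a, imb_b = imbalance_if('A'), imbalance_if('B')
    if imb_a == imb_b:
        return rng.choice(['A', 'B'])
    preferred = 'A' if imb_a < imb_b else 'B'
    # random component: favour the minimising arm with probability p (e.g. 0.8)
    return preferred if rng.random() < p else ('B' if preferred == 'A' else 'A')
```

Setting p below 1 keeps a random element in every allocation, which guards against the procedure becoming fully deterministic (and hence predictable).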

Minimisation is not strictly randomisation although it does incorporate randomisation within it. Nevertheless it is accepted as a suitable alternative and some have argued it is superior (e.g. Treasure and MacRae 1998).

Minimisation IS relatively complex to manage and administer compared to other methods but, given its advantages in terms of reducing chance imbalances of important factors across my groups in what is a moderately sized study, it should be the allocation procedure I adopt.

Thursday, 20 September 2012

Impact on knowledge retention?

Mark Collins made an interesting comment at yesterday's research meeting. Something along the lines of 'would I be testing their spatial knowledge at a subsequent time point in order to identify any differences in knowledge retention?'

It's a good point and not something I had really considered before. For one thing I hadn't constructed any research questions related to this point. The ethical issues associated with the design of my study also kind of preclude longitudinal testing to assess knowledge retention, as participants need to be offered experience in the alternative arm (intervention or control) after the post-test in case there is any difference in knowledge enhancement. If this wasn't offered then one group may be advantaged and attain better module scores as a result. Possibly. So, it would therefore be impossible to assess differences in knowledge retention between the study arms as participants will have had exposure to both intervention and control.

It may still be an interesting phenomenon to explore in a subsequent project, although a number of other studies have found NO significant differences in knowledge retention between VR and conventional teaching methods - e.g. see here (with regard to engineering) or here (Hall 1998). There may be a difference in knowledge retention, however, where the relevant knowledge depends on spatial 3D cognition (e.g. navigation), so it could be something to consider further in the future. For the time being though I don't think it has a place in this work.