Quick TJ 2018
The basis of designing outcome measures for any assessment of treatment to improve muscle function after nerve injury necessitates measures that are sensitive to change and a knowledge of what amount of change (of that measure) is identified as significant by the patient. It has been identified that “heterogeneous selection and measurement of outcomes in clinical trials further impairs the ability to synthesise results across studies in systematic reviews” (Clarke 2008). In this manner ‘agreement towards the standardisation of outcomes for clinical trials has been proposed as a solution to the problems of inappropriate and non-uniform outcome selection’.
It is thus important to know what the accepted standard is and what the attitudes of experts in the field are as to what measures could be undertaken to improve upon these standards.
1 Research Questions
- What is used in common practice to assess motor function?
- What should the ideal clinical assessment of motor function look like?
- From the discussion in modern outcomes science and clinical practice it appears it will likely have to be multifaceted and combine some aspect of subjective and objective assessment. What do leading clinicians think?
The aims were to:
- Characterise what the current opinion regarding motor assessment was across a group of clinical nerve injury experts.
- Identify what assessment methods were used currently and what expert opinion regarded as the strengths and weaknesses of these current methods.
- Identify whether any maximal force assessments use a measurement tool other than MRC.
- Assess the experts’ opinions on what an MCID for re-innervated elbow flexion force would be.
- Identify what the expectations were for outcomes in each expert’s practice from an Oberlin transfer and how this relates to the literature.
- Identify which aspects beyond peak force were considered to be useful to assess and which are being collected.
- Assess the degree to which Patient-related outcomes were being deployed and which ones.
- Attempt to gain consensus on what data should be collected in the future.
- Establish and contact a group of experts who would engage with a Delphi process.
- Design and administer an online questionnaire to assess expert opinion.
- Respond to the themes of the primary responses to further clarify the opinion of the panel to the future direction of motor assessment.
A set of questions was designed by the author to invite responses towards the aims of the study. A combination of direct yes/no, multiple choice, open-ended answers and Likert questions were used. See below for a list of questions.
A web-based questionnaire was loaded with these questions and offered up to receive responses. (Figure 5.1 shows a screen grab of the on-line questionnaire)
Experts were identified by extending invitations to all delegates at the 22nd Sunderland Society group meeting in Frankfurt, Germany (December 4-6 2016) and the Anglo-Scandinavian nerve injury and plexus meeting (Stockholm, Sweden, May 19-20 2017). At each of these two international meetings the author delivered a presentation on the above material (and the findings of chapter six). The audience of leading international Attending/Consultant surgeons and leading therapists was asked to fill in an online survey hosted on Google Docs (Google corp., Mountain View, California, United States). Across both meetings there were 70 delegates (15 delegates in Sweden and 55 in Frankfurt). The questionnaire received 18 respondents, all Attending/Consultant surgeons, from the US, Canada, Sweden, Netherlands, Finland, Norway, Germany, India, Scotland and England.
N=18 respondents engaged with the first round. The results are below in 5. The second round method is detailed in 6.
Figure 5.1: A screen grab from the Google Forms data collection page
The following questions were posed
Q1. Do you routinely use the MRC (0-5) Muscle Power grading for muscle recovery after denervation/re-innervation?
Q2. Do you find the MRC Grading system for force:
just something I do as a matter of course
nearly no use
I only use it for research purposes
Q3. What percentage of your elbow flexor restoration nerve transfers do you think attain MRC grade 4?
Q4. What percentage attain less than MRC Grade 4? (0-3)
Q5. What percentage of the outcomes do you think you would grade MRC Grade 5?
Q6. What other methods of assessing muscle power do you use routinely?
Q7. What force of elbow flexion (on average) do you think your patients attain from an Oberlin nerve transfer?
<1kg Force 5Kg Force 10Kg Force
1Kg Force 6Kg Force 11Kg Force
2Kg Force 7Kg Force 12Kg Force
3Kg Force 8Kg Force 13Kg Force
4Kg Force 9Kg Force 14Kg Force
> 15Kg Force
Q8. What percentage do you think this in relation to the normal side?
Q9. Given a population of Oberlin transfers which has a mean Force of 7Kgs what do you think would be the necessary increase in force to be clinically relevant? (the Minimal Clinical Important Difference). As represented by difference to the right shifted red curve below.
Q10. Following on from this question what do you think the minimal assessable improvement in this cohort would be (the smallest difference we could reliably assess)?
Q11. Which of the following factors do you consider useful in assessing muscle recovery from denervation?
– select one, none or many of the following: maximal contractile force, sustainability of force, fatigability of effort, grade-ability of recruitment of force, control of other joints around the assessed muscle (e.g. shoulder ER when assessing elbow flexion), co-contraction, proprioception, pain, sensory alteration.
Q12. Do you find the results of any patient related outcomes (PROMs) useful in assessing your outcomes from nerve transfers? Yes/ no
Q13. If you do use PROMS – which ones? [Free text]
Following the first round the results were interpreted and the second round of questions was developed.
The answers were as follows
5.5.1 Q1. Do you routinely use the MRC (0-5) Muscle Power grading for muscle recovery after denervation/re-innervation? (18 responses)
A1. 100% yes.
The use of the MRC muscle grading system is universal across all the clinics and countries sampled. This (as discussed in chapter four) is a situation which is well understood by clinicians; it has become the universal method applied in clinical discussions and published literature. It is a well-used tool whose weaknesses are well known but which is still favoured despite these; it has stood the test of time. Its longevity and universality have made it part of the lexicon of motor assessment; however, it is this dominance that, it could be argued, is holding back the development of novel tools of assessment.
5.5.2 Q2. Do you find the MRC Grading system for power (18 responses).
A2. Very useful (3) 17%
Quite useful (11) 61%
Just something I do as a matter of course (1) 6%
Nearly no use (1) 6%
I only use it for research purposes (1) 6%
Other (1) 6%
“Expected by others, gives an impression easily understood by other doctors, but woefully unrelated to holistic function and heavily skewed towards minor increases in non-functionally relevant strength”
Pie chart of responses to question 2 “Do you find the MRC Grading system for power:”
5.5.3 Q3. What percentage of your elbow flexor restoration nerve transfers do you think attain MRC grade 4? (18 responses)
|% attain MRC Grade 4||90||85||80||75||70||65||60||55||50||45||40|
Figure 5.4: Histogram chart of responses to question 3 “What percentage of your elbow flexor restoration nerve transfers do you think attain MRC grade 4?”
Thus 2/3rds of respondents believe over 80% of nerve transfers restore elbow flexion to MRC grade 4 and 89% believe over 70% of their cases regain this level.
5.5.4 Q4. What percentage attain less than MRC Grade 4? (0-3) (9 responses)
|% attain less than MRC Grade 4||0||5||10||15||20||25||30||35||40||45||50||55||60|
Histogram chart of responses to question 4 “What percentage attain less than MRC Grade 4? (0-3)”
This is not the exact inverse that one would have expected from question three. It does show that the range of opinion is between 15-30 percent of nerve transfers that do not attain MRC grade IV (the ability to lift weight against gravity).
5.5.5 Q5. What percentage of the outcomes do you think you would grade MRC Grade 5? (9 responses).
0% (13) 72%
shouldn’t use it (1) 5%
5% (2) 11%
10% (1) 5%
Figure 5.6: Bar chart of responses to question 5 “What percentage of the outcomes do you think you would grade MRC Grade 5?”
Over 2/3rds of experts are of the opinion that re-innervated muscle never gains MRC grade V (normal) peak force, and a further respondent was of the opinion that MRC V should not be (on principle of definition) assigned to a re-innervated muscle.
If we were to define ‘normality’ as lying within 2SD of the mean then, from our figures (normal arms 20.65KgF, SD 6.85), normal could be defined as attaining 7KgF and above. It is clear, however, that even though our respondents recognise (Q7) that the mean outcome for this population is over this (7.11KgF), they do not consider this to be gradable as normal, and thus perhaps are subconsciously considering other aspects of a muscle’s function when they consider that no one will attain a Grade 5 MRC (‘normal’). This thought is highlighted by one respondent, who states that it ‘should not’ be used, by which, the author presumes, they mean that even though peak power may reach a level considered in the range of normal, the recovery should not be labelled as normal for other reasons.
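The threshold arithmetic in this paragraph can be checked with a minimal sketch. It assumes that "normal" is taken as the mean plus or minus two standard deviations of the uninjured-arm figures quoted above; the variable names are illustrative only and are not part of the original analysis.

```python
# Figures quoted in the text for uninjured arms (KgF).
normal_mean = 20.65
normal_sd = 6.85

# Assumption: "normal" spans the mean plus or minus two standard deviations.
lower_bound = normal_mean - 2 * normal_sd
upper_bound = normal_mean + 2 * normal_sd

# Mean of the experts' Q7 estimates for an Oberlin transfer (KgF).
expected_oberlin_mean = 7.11

within_normal = lower_bound <= expected_oberlin_mean <= upper_bound

print(f"Normal range under this assumption: {lower_bound:.2f} to {upper_bound:.2f} KgF")
print("Expected Oberlin mean falls inside that range:", within_normal)
```

Under this assumption the lower edge of the range sits just below 7KgF, which is why the expected mean of 7.11KgF only narrowly qualifies on peak force alone.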
5.5.6 Q6. What other methods of assessing muscle power do you use routinely?
(15 responses- including multiple responses to this question).
Calibration by weight lifted (discrete functional assessments) (5)
Assessment of functional tasks (3)
Calibration by weight lifted in adults particularly (1)
Active range of movement (1)
Verbal assessment of fatigue (2)
There were numerous other assessments of peak force; pinch, grip, calibration by weights lifted and dynamometer. These are all continuous measurements of force (other than weights lifted- a discrete method of measurement). Only 3 respondents assessed any other feature of muscle function (2 cast a vote for: verbal assessment of fatigue and 1 for: active range).
5.5.7 Q7. What force of elbow flexion (on average) do you think your patients attain from an Oberlin nerve transfer? (17 responses).
Figure 5.7: Bar chart of responses to question 7 “What force of elbow flexion (on average) do you think your patients attain from an Oberlin nerve transfer?”
The mode response is 5Kg (mean for the distribution of these responses is 7.11KgF range 2-10).
5.5.8 Q8. What percentage do you think this in relation to the normal side?
(17 responses) 1 answer – ‘no clue’.
|Percentage of normal side||<10||10||20||30||40||50||60||70||80||90||100|
The mean here is 23% (mode 20%) of normal force. Thus, extrapolating from the force that the respondents gave as the presumed mean (7.11KgF), this would give an estimate of the expected normal mean of approximately 30KgF.
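The extrapolation above is a single division, sketched here for transparency. The variable names are hypothetical; the two inputs are the mean responses to Q7 and Q8 as reported in the text.

```python
# Mean expected Oberlin outcome from Q7 (KgF).
expected_force_kgf = 7.11

# Mean response to Q8: expected force as a fraction of the normal side.
fraction_of_normal = 0.23

# If 7.11 KgF is 23% of normal, the implied normal-side mean is force / fraction.
implied_normal_kgf = expected_force_kgf / fraction_of_normal

print(f"Implied normal-side mean: {implied_normal_kgf:.1f} KgF")
```

The result is an estimate of the respondents' implicit assumption, not a measured value. Using the modal response of 20% instead of the mean would give a somewhat higher figure.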
5.5.9 Q9. Given a population of Oberlin transfers which has a mean Force of 7Kgs what do you think would be the necessary increase in force to be clinically relevant? (the Minimal Clinical Important Difference) (13 responses).
The author recognises the low rate of response to this question (13/18). The question may be poorly worded or the concept a difficult one, but the concept was defined in the lecture given by the author to each of the invited audiences.
These responses give a median response for MCID of 3.57KgF and a mode response of 2Kg. Given that the group defined the mean of outcomes as 7.11Kgs, this is an MCID of 50%; taking the mode (2Kg), this is an MCID of 29%.
5.5.10 Q10. Following on from this question what do you think the minimal assessable improvement (MAI) in this cohort would be (the smallest difference we could reliably assess)?
(15 responses) of these 2 stated do not know.
Thus the expected MAI is 0.96Kgs; with our respondents presuming a mean outcome of 7.11KgF, the MAI is 14% of the mean.
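The percentages quoted for the MCID (Q9) and the MAI (Q10) are each a force difference expressed relative to a baseline mean. The sketch below uses a hypothetical helper, `percent_of_mean`, and shows the effect of the choice of baseline: the respondents' own mean estimate (7.11KgF) or the 7Kg stated in the wording of Q9. Which baseline was used for each published figure is an assumption here.

```python
def percent_of_mean(difference_kgf: float, baseline_kgf: float) -> float:
    """Express a force difference as a percentage of a baseline mean force."""
    return 100 * difference_kgf / baseline_kgf

# Baselines: respondents' mean estimate (Q7) and the mean stated in Q9.
baselines = {"7.11 KgF (Q7 mean)": 7.11, "7 Kg (stated in Q9)": 7.0}

# Values reported in the text (KgF).
reported = {
    "MCID, central response": 3.57,
    "MCID, modal response": 2.0,
    "MAI": 0.96,
}

for label, value in reported.items():
    for name, baseline in baselines.items():
        print(f"{label}: {percent_of_mean(value, baseline):.1f}% of {name}")
```

The two baselines differ by under 2%, so the choice shifts each percentage by about a point at most; this matters only when comparing rounded figures.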
5.5.11 Q11. Which of the following factors do you consider useful in assessing muscle recovery from denervation (18 responses)?
maximal contractile force (9)
sustainability/ fatigability of force (5)
grade-ability of recruitment of force (0)
control of other joints around the assessed joint (0)
sensory alteration (0)
“All of the above, but the link only permits selection of one. The relative significance of each varies between muscle/joint movements” (1)
“Range of movement, force, co-contraction “(1)
“Most of the above” (1)
Figure 5.8: Bar chart of responses to question 11 “Which of the following factors do you consider useful in assessing muscle recovery from denervation?”
The responses show that 50% consider the assessment of maximal contractile force to be useful. Sustainability has been rated as useful by 22% of respondents. The free text answers give further insight that a multifaceted assessment is most useful.
There is an appreciation of the need for a more global assessment of re-innervated muscle function. Factors such as fatigue and the ability to maintain contractile force, control the joints and effect proprioception are important, as well as the standard assessment of maximal volitional force.
5.5.12 Q12. Do you find the results of any patient related outcomes (PROMs) useful in assessing your outcomes from nerve transfers? (14 responses)
“Unfortunately we don´t use it routinely, but YES”.
“Probably should use”
“Yes, but we do not use it.”
5.5.13 Q13. If you do use PROMS (patient related outcomes) – which ones? (11 respondents some with multiple responses)
Quick Dash (1)
Michigan Hand (1)
PROMs often assess function or quality of life.
The Disabilities of the Arm, Shoulder and Hand (DASH) score is a validated measure (Hudak et al. 1996) of upper limb function and is utilised widely for a functional assessment of upper limb global activity; it is mentioned by ¾ of those who responded that they had identified PROM outcome measures to use.
The Short Form 36 was developed as a global assessment of quality of life. It is validated in the UK population (Hudak et al. 1996) and around the world. It assesses physical and mental well-being and function.
5.5.14 Q14. Thank you for your support in this project. Do you have any further comment to make?
“Laudable intention, very complex topic though.”
“MRC grade 5 represents normal power. This is never achieved after nerve repair.”
“We currently do not use PROMS on this population, but we are searching for PROMS that could be relevant for these patients. On obstetric brachial plexus palsy patients, we use Brief pain inventory, and plan to get the BPOM translated to Norwegian. We also plan to use the EQ-5D on the obstetric population.”
“Fatigability of effort is almost as important as maximal contractile force.”
“Difficult questions Tom. We do not have the answers to all your questions but have tried to fill in the boxes as good as possible.”
5.5.15 First stage conclusions:
- This body of peers, active in the field of nerve injury treatment, demonstrate the spread of opinion around motor outcomes from nerve transfer.
- They demonstrate that there has been widespread acceptance of the method of assessing maximal volitional force as a motor outcome.
- MRC grade is used universally; however, it is only considered very useful by 17%, the majority (61%) consider it quite useful and some consider it of little use.
- Assessment of force within MRC grade 4 is used by 11/18 of the group. With 5/18 using a continuous assessment of force (HHD) and 6/18 using discrete weights.
- The group considers that the mean expected outcome from an Oberlin nerve transfer is 7.11KgF which they consider would as a mean represent 23% of normal.
These data show that the Delphi group considers maximal volitional force (MVF) an entry-level motor assessment, perhaps best considered as a threshold assessment in recovery: once some useful level of force has been attained, other features should be assessed along with it. (It makes no sense to assess sustainability if there is no force to sustain and no point in assessing features of a function which has no application). To further characterise the agreement on the relative importance of these differing recognised aspects of the re-innervated motor pattern, further questions were asked of the group.
5.6 Second stage method
The same clinicians that responded to the first round were contacted again by email. They were thanked for their engagement and given the raw anonymised data of the first round findings. They were again asked to complete an online questionnaire. The following questions were designed by the author to identify the group’s opinion on best practice.
Second round Questions:
Do you agree to the following statements (1 – not at all to 5 – completely agree).
- Peak force assessment is an essential part of assessing re-innervated muscle function
- An assessment of fatigability would be useful as part of assessing re-innervated muscle function
- An assessment of grade-ability of recruitment of force would be useful as part of assessing re-innervated muscle function
- An assessment of muscular pain would be useful as part of assessing re-innervated muscle function
- An assessment of afferent (proprioception, muscle spindle function, etc) function would be useful as part of assessing re-innervated muscle function
- As a global patient-reported outcome (PRO) would you consider PGIC (see below) would be useful as part of assessing the outcome of a nerve transfer.
Patient related global impression of change (PGIC) is a simple assessment whereby (at a set time post-operatively) the patient is asked what their impression of change from that operation has been:
The above questions were again presented via Google forms online.
5.7 Second round data
From the invited group of 18, n=10 second round answers were received.
They were as follows.
5.7.1 Q1. Peak force assessment is an essential part of assessing re-innervated muscle function (1-not at all to 5- completely agree)
|Likert score||1||2||3||4||5|
|# of responses||0||0||2||4||4|
There is agreement to strong agreement here (with no disagreement): peak force is considered an essential part of assessing re-innervated muscle function.
5.7.2 Q2. An assessment of fatigue-ability would be useful as part of assessing re-innervated muscle function (1-not at all to 5- completely agree)
|Likert score||1||2||3||4||5|
|# of responses||0||0||0||5||5|
Greater agreement here with a tighter spread. All respondents agreed at 4/5 or 5/5 that an assessment of fatigability would be a useful part of assessing re-innervated muscle function.
5.7.3 Q3. An assessment of grade-ability of recruitment of force would be useful as part of assessing re-innervated muscle function (1-not at all to 5- completely agree)
|Likert score||1||2||3||4||5|
|# of responses||0||0||4||4||2|
There is agreement on assessing grade-ability of recruitment of force, but it is less strong than that seen for peak force and fatigue.
5.7.4 Q4. An assessment of muscular pain would be useful as part of assessing re-innervated muscle function (1-not at all to 5- completely agree)
|Likert score||1||2||3||4||5|
|# of responses||0||4||3||3||0|
A balance of mild agreement and disagreement here for muscular pain being useful.
5.7.5 Q5. An assessment of afferent (proprioception, muscle spindle function, etc) function would be useful as part of assessing re-innervated muscle function
|Likert score||1||2||3||4||5|
|# of responses||0||3||3||2||2|
5.7.6 Q6. As a global Patient reported outcome (PRO) would you consider PGIC (see below) would be useful as part of assessing the outcome of a nerve transfer.
|Likert score||1||2||3||4||5|
|# of responses||0||2||0||5||4|
Thus, in rank order of agreement on what should make up an assessment of re-innervated muscle function: fatigue, patient-reported outcome and peak force all attained greater than 4/5 Likert agreement, with assessment of recruitment, proprioception and pain attaining less than 4/5 agreement.
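One way to derive such a ranking from the response counts is sketched below. It takes the counts for statements 1 to 5 as printed above and computes, for each, the mean Likert score and the share of responses at 4 or 5. The function name `summarise` and the item labels are illustrative, and this scoring approach is an assumption rather than the method confirmed in the text.

```python
# Second-round counts for Likert scores 1..5, statements 1 to 5 as printed.
counts = {
    "peak force": [0, 0, 2, 4, 4],
    "fatigability": [0, 0, 0, 5, 5],
    "grade-ability of recruitment": [0, 0, 4, 4, 2],
    "muscular pain": [0, 4, 3, 3, 0],
    "afferent function": [0, 3, 3, 2, 2],
}

def summarise(row):
    """Return (mean Likert score, percentage of responses scoring 4 or 5)."""
    n = sum(row)
    mean = sum(score * k for score, k in enumerate(row, start=1)) / n
    agree = 100 * (row[3] + row[4]) / n
    return mean, agree

summary = {item: summarise(row) for item, row in counts.items()}

# Rank the statements from highest to lowest mean agreement.
ranked = sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True)
for item, (mean, agree) in ranked:
    print(f"{item}: mean {mean:.1f}/5, {agree:.0f}% at 4 or 5")
```

Reporting both the mean and the proportion at 4 or 5 is a deliberate choice: the mean gives the rank order, while the proportion corresponds to the "agreed at 4/5 or 5/5" percentages used in the discussion.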
As part of this review of muscle function assessment, the views of a group of experts were sampled from larger expert groups, and the experts were questioned on what and how motor function should be assessed following nerve transfers. Through the two-stage Delphi there has been an attempt to come to a consensus on how to assess re-innervated muscle function. The Delphi technique is a widely used and accepted method for gathering data from respondents within their domain of expertise. The technique is designed as a group communication process which aims to achieve a convergence of opinion on a specific issue. Participants in a Delphi study do not interact directly with each other, so situations where the group is dominated by the views of certain individuals can be avoided.
Delphi groups have been used widely in clinical medicine and surgery to inform scoping exercises for research questions (Schneider 2016), gain consensus for treatment recommendations (Van Vliet et al 2016) and to identify which parameters to measure in clinical trials (Sinha 2011). There has been discussion (Powell 2003) of the merits and pitfalls of this method of harnessing the opinions of an often diverse group of experts on practice-related problems. Particularly challenging is the level of evidence it is considered to provide inhabiting, as it does, the ground between opinion-based and evidence-based research (Powell 2003).
To place this work in nerve injury within the context of the current body of knowledge, we reviewed the English literature using the search string “Delphi” AND “nerve injury” OR “nerve”. There were only two published full studies using a Delphi approach in clinical aspects of nerve injury. One poster reference was also identified.
Schmid & Coppieters (2011) invited 50 experts to discuss ‘double crush’ nerve pathology and garnered 17 responders, of whom 16 completed their three-stage process towards delineating the mechanism(s) underlying the pathology. In a Delphi study on sensory therapy (Jerosch-Herold 2011), the opinions of 70 hand therapists were invited and, of the 10 responders, 7 responded to all three rounds. A poster presentation (Dy et al 2017) assessed which portions of a nerve decompression operation were deemed to be critical by a three-stage Delphi with 10 respondents; the authors did not explicitly identify the method used to identify the group, nor did they characterise the group's experience.
The first round of this Delphi process demonstrated that the MRC grade was universally deployed [A1] to assess peak force and that this was the top outcome assessed [A6]. There is, however, a section of responders [A2] who see the MRC grade as insufficient or inadequate: only 17% rated the MRC grade very useful, with 22% rating the score as not useful (“nearly no use”, “just something I do as a matter of course” or “woefully unrelated to holistic function”). Having exposed the lack of support for a wider utility for the MRC grade, it is still true that, of the factors considered useful in assessing motor function [A11], peak contractile force represents the majority of responses (50%). Thus the author assesses that the Delphi group, whilst deeming peak force important, see MRC grade assessment as inadequate for this role. The validity and utility of a continuous measure of force in motor assessment has been supported widely in the literature [Quick 2016]. This is shown [A6] in the current support for continuous measurements of force (pinch measurement, dynamometry, grip assessment).
Moving assessment beyond that of peak force; there is a wider appreciation of motor assessment apparent [A11] with sustainability of force, fatigue, proprioception and range of movement all featuring in responses. There were no positive responses for grade-ability of force being a useful aspect of assessment however.
Interestingly the Delphi group strongly favour assessment of the efferent functions of re-innervated muscle – the only afferent assessment selected, as being potentially useful in clinical assessment, is that of proprioception (with only one response). There were no respondents who selected pain or sensory alteration as important. Contrast this with the findings from the data on the subjective patient experience in chapter nine. There is evidence of a disconnection here between the professionals’ impression of what it is important to assess and the patients’ voice on this matter.
Moving to subjective assessments and PROMs: following the increasing popularity of such methods worldwide (Weldring & Smith 2011), the Delphi group showed that these were used by 14 of the respondents (all of those that replied to Q13). Of these, two used a quality of life (QoL) outcome, the Short Form 36 (SF36). The others used functional assessments: the Disabilities of the Arm, Shoulder and Hand (DASH) or a validated shortened version of this. One stated they deployed an assessment of activities of daily living (ADL). This was an area where 90% (n=9/10) of respondents in round 2 agreed to a Likert level of 4/5 or 5/5 that a specific subjective outcome of change (PGIC) assessment would be useful.
The second round again demonstrated the importance to the expert group of peak force with 80% agreeing 4/5 or 5/5 that it is an essential part of assessment of a re-innervated muscle. 100% agreed (at a 4 or 5/5 level) that an assessment of fatigue-ability would be useful as part of assessing re-innervated muscle function. Grade-ability of function as an important outcome attracted only 60% agreeing 4 or 5/5 (the others showing neither agreement nor disagreement). On muscular pain assessment there was even less support (30% with 4 or 5/5) with 40% disagreeing. Support was at similar proportions for assessing the afferent pathways.
These are novel data and there is no specific literature with which to compare them. They show, however, that this sample of international experts considers peak force to be one part of the assessment in clinical studies of muscle re-innervation (and not the part which met with most agreement on its importance either). Historically such studies have focused on peak force to the exclusion of any other assessment (Figure 7.9).
A significant limitation of this study is the potential for selection bias. The specialists were approached to engage via two international meetings; the catchment was over 70 such specialists. The Delphi group we questioned was made up of 18 respondents, self-selected from the larger group to whom the invitation was extended. Groups of this size are, however, common in Delphi processes. Murphy et al (1998) report that reliability is compromised by fewer than six participants, whereas groups in excess of 12 do not increase the reliability of judgments; thus the sample size itself is not in question, but the method of selection is open to bias. The concern is therefore posited that the responses may not be representative of the larger community. The author intends to address this concern by publishing these data for peer review and to prompt wider discussion. The group was noted to be internationally heterogeneous, with responses from a number of differing countries. There is implicit response bias in a group like this, but to recruit those not engaged in this area would lead to a low response rate. Even this self-selecting group shows a drop-off between stages one and two (from 18 to 10). It could, however, be considered that, given the open invitation, the Delphi group were a self-selecting group of those interested in and engaged with the subject of motor assessment (the free comments support this assertion) and thus not representative of the wider community. It is not possible to assess how the sample size and drop-off have influenced the results; the intention to publish the work for wider consumption and discussion will inform this and allow future projects to benefit from this input. As there is no other literature regarding consensus in this area, these data may attract interest and comment.
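The scale of the self-selection and drop-off can be put in figures with a short sketch. It assumes the 70 delegates given in the method section as the invited population; the variable names are illustrative.

```python
# Counts taken from the method and results sections.
invited_delegates = 70      # delegates across both meetings (assumed denominator)
round_one_respondents = 18
round_two_respondents = 10

# Proportion of invited delegates who answered the first questionnaire.
round_one_response_rate = 100 * round_one_respondents / invited_delegates

# Proportion of first-round respondents who also completed the second round.
round_two_retention = 100 * round_two_respondents / round_one_respondents

print(f"Round one response rate: {round_one_response_rate:.1f}%")
print(f"Round two retention: {round_two_retention:.1f}%")
```

Both figures describe participation only; they say nothing about whether those who did not respond would have answered differently.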
The assumptions the author draws regarding the prevailing practice across those in practice are valid only if the practice of the experts is seen to be representative of that of their peers. The utility of a Delphi process is that it allows opinion to be canvassed. The validity of using such a process to survey practice may not hold, as individual practices (rather than the underlying intellectual process or reasoning which Delphi processes were set up to survey) may have regional reasons why they do not align. A formal site-by-site survey would be the most appropriate manner to establish the practices in use but would be impractical to establish in any manner other than a process similar to this. The questions, though, would be posed as a survey of clinic site or unit practice rather than a question on individual practice.
As set out in the aims; this study has demonstrated the opinions of working expert clinicians in the field on the subject of re-innervated muscle assessment.
Maximal volitional force remains the clinicians’ primary outcome measure. The assessment methods used currently are mainly the MRC grade. The current state of muscle re-innervation assessment is that MRC grades of force are used as the universal outcome measure, although most recognise this has shortfalls (with only 17% grading it as a very useful tool).
Experts expect 7.1KgF peak force from their Oberlin transfers. This is representative of what the published literature states (Bhandari 2011, Carlsen 2011, Martins 2013, Quick 2016).
There has been some consensus reached through this process on what data should be collected in the future: peak force is considered to be one of the most important reported outcomes, but there is a recognition that fatigue and patient related outcomes should form part of any assessment.