Wednesday, May 2, 2012

Pulmonary JC 4.30.12

Azithromycin to Prevent Exacerbations of COPD 
Click here to listen to a recording of the presentation.

Dr. Kim Suh presented a randomized trial of azithromycin (250mg daily) for prevention of exacerbations in COPD.  The authors hypothesized that azithromycin would reduce the rate of exacerbations, while potentially increasing colonization with macrolide-resistant organisms and increasing the incidence of hearing loss.

Major exclusion criteria were asthma, resting tachycardia, a prolonged QTc at baseline or use of drugs known to prolong the QT interval (except amiodarone), and pre-existing hearing impairment.

This well-designed study demonstrated a reduction in median time to first exacerbation and the overall incidence of exacerbations, while documenting a 5% absolute increase in the rate of hearing decrements and a 40% absolute increase in the rate of nasopharyngeal colonization with respiratory pathogens resistant to macrolide antibiotics.

In the discussion, Dr. Boesch pointed out that one would really like to see the results broken down according to a standardized rating of disease severity, such as the GOLD system of staging, in order to see where the most benefit might be expected.  Dr. Subramanian and Dr. Schub agreed that, while this study should not revolutionize practice, it does represent a potential rationale for deploying azithromycin in patients with particularly intractable disease.

From the point of view of EBM, it is worth pointing out that this article includes a misleading presentation of the number needed to treat (NNT). (Click here for a recording of some remarks on this at Journal Club).

Albert et al. report an impressive NNT of 2.86 “to prevent one acute exacerbation of COPD.”  Conventionally, the NNT is calculated as the reciprocal of the absolute difference in the risk of a dichotomized outcome (such as mortality) between the treatment and control groups of a clinical trial.  However, what Albert et al. present is the reciprocal of the absolute difference in the rate of a recurring event, i.e. COPD exacerbations.  This gives the “event-based” NNT, rather than the conventional NNT.

As Aaron and Ferguson have recently pointed out, the "event-based" NNT (first described by Halpin in 2005) is a theoretically fraught statistic with unclear clinical application.  Because it incorporates multiple instances of the same event happening to the same person, it is generally smaller than the conventionally calculated NNT, and it tends to obscure heterogeneities in risk.

To illustrate the first point, if we calculate a conventional NNT for Albert's study (using the data in Figure 2), we get a value of 9.1 to keep one patient exacerbation-free for the entire study period, which is considerably less impressive than the figure they present.  The second point is best illustrated by a simple thought-experiment: in a study with ten patients in each of the treatment and control groups, where 20 exacerbations occurred in one individual in the control group and 2 occurred in one individual in the treatment group, the "event-based" NNT would be 0.56.  This might lead one to believe that only half a patient needs to be treated to prevent one exacerbation, and that the benefit would apply to the entire group; the former is impossible and the latter clearly wrong.
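To make the arithmetic behind that thought-experiment explicit, here is a minimal sketch in Python of the two calculations.  The function names and layout are mine, not the authors', and the only numbers used are the hypothetical ones above.

```python
# Minimal sketch: conventional vs. "event-based" NNT,
# using the numbers from the thought-experiment above.

def conventional_nnt(risk_control, risk_treatment):
    """NNT = 1 / absolute difference in the risk of a dichotomized
    outcome (e.g. 'had at least one exacerbation')."""
    arr = risk_control - risk_treatment
    return float('inf') if arr == 0 else 1 / arr

def event_based_nnt(rate_control, rate_treatment):
    """Halpin-style NNT = 1 / absolute difference in the *rate* of a
    recurring event (events per patient over the study period)."""
    return 1 / (rate_control - rate_treatment)

# Ten patients per arm; 20 exacerbations in one control patient,
# 2 exacerbations in one treated patient.
n = 10
print(event_based_nnt(20 / n, 2 / n))   # ~0.56 "patients"
print(conventional_nnt(1 / n, 1 / n))   # inf -- the proportion of patients
                                        # with >= 1 exacerbation is identical
```

Note that the conventional NNT here is undefined (infinite), because the same proportion of patients in each arm had at least one exacerbation, whereas the event-based version produces a number that looks spectacular but has no clinical interpretation.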

This is the third time this year that we have discovered a major error either in calculation or interpretation of NNTs in a journal club article - the first being in ICU journal club, the second in Cardiology journal club.  Such a high prevalence in such a small sample should remind us that it's worth checking other people's work before making major treatment decisions based on their calculations.  For those who are interested, here is an excellent article from CMAJ on the limitations of the NNT.

A Tyrosine Kinase Inhibitor in Idiopathic Pulmonary Fibrosis
(Sorry, no recording).

Dr. Boesch presented a randomized trial examining the safety and efficacy of a tyrosine kinase inhibitor in the treatment of idiopathic pulmonary fibrosis (IPF).

Patients were allocated to one of four doses of the study drug, BIBF 1120, or placebo, and treated for 12 months.  The primary endpoint was the annual rate of decline in FVC.  There was no significant difference between any of the treatment groups and the placebo group.  A difference of 0.13 L/yr is reported between the high-dose treatment group and the placebo group, but the confidence intervals for the actual values overlap one another; given that the trial was appropriately powered to detect a difference of 0.1 L/yr or greater, this strongly suggests that BIBF 1120 (as used in the trial) does not work.

Whatever its other notable features, this article serves as a cautionary reminder of just how intimate the relationship between academic medicine and the drug industry actually is.  (Click here for a recording of some comments on this at Journal Club).  This study was funded by Boehringer Ingelheim, and all the data analysis was carried out in their facilities by their house statisticians.  The manuscript was written by "medical writers" from Fleishman-Hillard, an international marketing company whose website promises to "identify and effectively target key members of the medical community, strategically position your products in the face of increased regulation, [and] create and build trust with increasingly knowledgeable consumers [emphasis added]."

The authors try to attenuate the appearance of industry influence by noting that "the steering committee made the decision to submit the manuscript for publication," but, if one looks at Appendix G and then reviews the disclosure forms submitted by the authors, one finds that all the members of the steering committee were paid for their services by Boehringer Ingelheim.  In fact, all of the authors were either paid by Boehringer Ingelheim or, as in the case of Brun, Gupta, Juhel and Kluglich, were full-time employees of the company.

So, what we have here is a negative trial paid for by a drug company, for which all the math was done by the drug company's statisticians, and which was written up by a marketing firm.  Given this, it may be more appropriate to regard this article as a paid advertisement for BIBF 1120 than as a piece of clinical research.

t-PA and/or DNA-ase in Pleural Infection
(Sorry, no recording).

Dr. Patel presented a trial of tissue plasminogen activator and DNA-ase in pleural infection which, among other things, stands in refreshing contrast to the paper by Richeldi et al.  The investigators note the receipt of an unrestricted educational grant from the pharmaceutical company Roche UK, but clearly document that the company had no involvement in the actual study.  Moreover, their disclosure forms are completely clean.

Rahman et al. randomized patients with pleural infection to treatment with either intrapleural t-PA, DNA-ase, both, or placebo.  They used chest X-rays taken at day 1 and day 7 (or the last available X-ray) to determine the primary endpoint, which was the change in area of pleural opacity. 

This trial was well-conducted and very well-reported, but it does have some limitations.  First, it seems odd to use a surrogate endpoint rather than a clinically relevant one (e.g. length of stay) as the primary outcome.  Second, the method of determining that endpoint is questionable, given that the authors themselves quote an accuracy of only 71% for chest X-ray in determining pleural effusion volume when compared to CT.

Given these limitations, they did demonstrate a 7.9% greater reduction in pleural opacity (with a rather wide confidence interval of 2.4% - 13.4%) in patients treated with the combination of t-PA and DNA-ase.  Neither agent alone was effective in reducing pleural opacity area relative to placebo.

Tuesday, April 3, 2012

Infectious Diseases JC 3.28.12

Dr. DiRocco: Reducing Nosocomial Colonization by Resistant Organisms in the ICU
Click here to listen to a recording of Dr. DiRocco's presentation.

Dr. DiRocco presented a cluster-randomized trial of an intervention designed to reduce the incidence of nosocomially acquired MRSA and VRE colonization and infection in adult ICUs.    

Patients staying for three days or more who were not known to be colonized with either bacterium were eligible for enrollment.  All patients were tested for colonization within two days of admission by surveillance culture.  Patients admitted to intervention ICUs were subject to universal gloving until the results of the admitting surveillance cultures were established.  If colonization was detected, patients were thereafter assigned to full contact precautions.  If not, they were assigned to usual care.  In control ICUs, surveillance cultures were sent but the results were not reported to the providers; detection of VRE/MRSA colonization was left to whatever the local procedure happened to be.

The primary outcome was the ICU-level incidence of new events of colonization/infection (no attempt was made to differentiate for the purposes of the study) with MRSA or VRE per 1000 ICU patient-days at risk.  There was no significant difference in the primary outcome between control and intervention ICUs.

Dr. McCabe made a number of interesting points about this study.  First, the rates of adherence to all levels of precaution documented in the internationally famous research institutions where the study was conducted are abysmal.  Given that the Hawthorne effect was presumably operative in this unblinded study, the real rates are probably worse.  Second, because the intervention used an outside reference lab for surveillance cultures, leading to very long median times to detection of MRSA and VRE, it fell short of what is practically possible in most community hospitals (including ours).  Finally, on a related note, not much attention is given to describing "usual practice."  In our hospital, "usual practice" is actually more comprehensive than the "intervention" trialled here.  For a brief discussion of cluster-randomization, see our other blog.

Dr. Badran: Early vs. Late ART in Suspected TB
Click here to listen to a recording of the presentation.
 
Dr. Badran presented an unblinded, randomized trial comparing early (i.e. within 2 weeks) with later (8-12 weeks) initiation of ART in HIV-infected patients with CD4+ T-cell counts of less than 250 per cubic millimeter.  The study was motivated by ambivalence in current expert opinion: on the one hand, there is concern that early initiation may increase the risk of interactions between antituberculous and antiretroviral drugs and may precipitate IRIS; on the other hand, later initiation of ART has been shown to correlate with worse HIV-related outcomes.

The primary endpoint was survival to 48 weeks without a previously undiagnosed AIDS-defining illness.  The investigators also monitored rates of TB-associated IRIS, which were confirmed by a blinded reviewer.  No significant difference in the primary outcome was observed.  The rate of TB-associated IRIS was higher in the early-initiation group, although this was not correlated with a higher incidence of adverse outcomes, at least during the follow-up period.

The authors interpreted their results as indicating that early initiation of ART is desirable and should be widely implemented.

Dr. McCabe pointed out that this is an instance of a bizarre surrogate endpoint, since anything from oral candidiasis to death qualified as an instance of the primary endpoint.  However, as he pointed out, the point of treating AIDS is to prevent death, not candidiasis.  The incentive to broaden one's endpoint in this way is that it increases the event rate and so reduces the sample size needed to detect a difference; but in this case, it also massively dilutes the study's ability to tell us what we want to know, which is whether early ART kills people, not whether it makes them get thrush.

A final interesting point has to do with one of the study's methodological choices.  The investigators included subjects with confirmed TB as well as those with only suspected TB.  Their rationale was that "in clinical practice, the decision to start or delay ART must often be made before there is a definitive diagnosis of tuberculosis."  To use some jargon, they were conducting a "strategy study," designed to evaluate the overall approach empirically, rather than an "efficacy study," which would have been designed to isolate the mechanics of cause and effect.

The problem with this, however, is that no individual patient actually has some intermediate probability of having tuberculosis.  All of them either have tuberculosis or do not have it.  What we want to know as clinicians is not whether ART is good or bad for somebody who may or may not have TB, but how it affects people who actually do have it.  The inclusion of people who didn't have it in the first place undoubtedly dilutes any malign effects early ART may have had on those who did.


Dr. Chow: Looking at the Recent Outbreak of E. Coli-Related HUS in Germany  
Click here to listen to a recording of the presentation. 

Dr. Chow presented an article which used three epidemiological methods to retrospectively analyze the recent outbreak of Escherichia coli serotype O104:H4 in Germany, which caused a disproportionate number of cases of hemolytic-uremic syndrome.  While the epidemic was initially attributed to Spanish producers of cucumbers and tomatoes (causing significant temporary damage to the Spanish economy) the investigators were able to trace the microbe to a single sprout producer in Lower Saxony.

Dr. McCabe pointed out that this was a very alarming development in the world of medical microbiology.  The organism responsible was not of the serotype normally associated with HUS, it was enteroaggregative rather than enterohaemorrhagic, and it expressed an extended-spectrum beta-lactamase.  Further outbreaks of this or similar organisms could represent a major threat to public health.

In this paper, the authors used the epidemiological data they collected to produce both a case-control and a cohort study.  Both are types of observational study, as opposed to experimental studies.  Both allow the investigator to determine the strength of association between an exposure (or risk factor) and an outcome; however, there are important differences in how they're conducted and in what, exactly, they allow one to say, which need to be well-understood by anybody reading them.  If you need further convincing that this is important, you might check out these two articles, which illustrate the frankly terrifying extent of misunderstandings in the peer-reviewed literature.  A brief explanation of case-control and cohort studies, and a recording of the explanation we gave at journal club, are available on the chiefs' notes blog.
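To make one of those differences concrete, here is a toy illustration (my own made-up numbers, not data from the paper) of why a cohort study can estimate a risk ratio directly, while a case-control study, which samples on the outcome, can only estimate an odds ratio.

```python
# Toy 2x2 table (hypothetical numbers, not from the paper):
#                 outcome    no outcome
# exposed            a            b
# unexposed          c            d
a, b, c, d = 30, 70, 10, 90

# Cohort logic: we know the denominators of exposed and unexposed,
# so risks (and a risk ratio) can be computed directly.
risk_exposed   = a / (a + b)                   # 0.30
risk_unexposed = c / (c + d)                   # 0.10
risk_ratio     = risk_exposed / risk_unexposed # 3.0

# Case-control logic: the outcome columns are fixed by the sampling
# design, so risks are not estimable; only the odds ratio is.
odds_ratio = (a * d) / (b * c)                 # ~3.86

print(risk_ratio, odds_ratio)
```

The odds ratio approximates the risk ratio only when the outcome is rare, which is exactly the sort of distinction that gets lost when the two designs are conflated.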




 

Friday, March 2, 2012

GI JC 3.1.12

Dr. Villanueva: Sarcopenia as a Predictor of Mortality in Patients Being Evaluated for Liver Transplantation

Click to listen to a recording of the presentation.
Dr. Villanueva presented a recent article from Clinical Gastroenterology and Hepatology which evaluated the finding of sarcopenia (reduced skeletal muscle mass) as a predictor of mortality in cirrhotics being evaluated for transplantation.  The investigators reviewed CT scans collected as part of evaluation for transplant and scored patients based on the presence or absence of sarcopenia, which they defined as skeletal muscle mass at the L3 level more than two standard deviations below that of normal young healthy adults.

[Figure: Two patients with identical BMI; the patient on the left is sarcopenic.]
The investigators found sarcopenia to be an independent predictor of mortality in cirrhotic patients being evaluated for transplantation.  Interestingly, sarcopenia correlated rather poorly with the established prognostic scoring systems widely used in advanced cirrhosis, the Child-Pugh and MELD scores.

The investigators report their results in the form of a receiver operating characteristic (ROC) plot.  ROC plots are commonly used to depict the characteristics of tests which produce a continuous range of results (as opposed to being unequivocally positive or negative).  An ROC curve plots the true positive rate (i.e. the sensitivity) on the Y axis and the false positive rate (i.e. the complement of the specificity) on the X axis.  You can see fairly easily that the ideal test would have a curve which goes straight up from the origin to 100% sensitivity and stays the same over all false positive rates, since this would mean that there exists a decision threshold which has 100% sensitivity and a false positive rate of 0%.  In reality, this rarely happens and most ROC curves are messier.  However, once you understand this it's easier to see that the closer the area under the ROC curve is to 1.0, the better the test is.
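For readers who want to see the mechanics, here is a minimal sketch of how an ROC curve and its area are computed by sweeping a decision threshold over a continuous test result.  The marker values are invented for illustration; nothing here comes from the paper.

```python
# Minimal ROC sketch: sweep a decision threshold over a continuous
# marker and record (false positive rate, true positive rate) pairs.

scores = [0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]   # e.g. a prognostic score
labels = [0,   0,   1,   0,   1,   1,   1  ]   # 1 = died, 0 = survived

def roc_points(scores, labels):
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    # One candidate threshold per distinct score, plus an extreme
    # threshold that classifies everything as negative.
    for threshold in sorted(set(scores)) + [float('inf')]:
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))   # (FPR, TPR)
    return sorted(points)

def auc(points):
    # Trapezoidal area under the (FPR, TPR) curve.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

pts = roc_points(scores, labels)
print(pts)
print(auc(pts))   # closer to 1.0 = better discrimination
```

Each threshold yields one (false positive rate, sensitivity) point; joining the points gives the curve, and the area under it is the familiar AUC.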

[Figure: ROC curve for MELD, Child-Pugh, and sarcopenia.]


Dr. Dehghan: Acid-Suppressive Medication Use and the Risk of Nosocomial Gastrointestinal Tract Bleeding

Click to listen to a recording of the presentation.
Dr. Dehghan presented a retrospective study which used a propensity-score matching analysis (of which more below) to evaluate the relationship between receiving acid-suppressive therapy during hospitalization and developing nosocomial GI bleed among non-ICU patients admitted for something other than GI bleeding.

Their analysis suggested that (assuming that their retrospective statistical model is accurate) the size of any benefit associated with acid-suppression in non-ICU patients without known GI bleeding is likely to be very small.

They expressed their results in terms of an extrapolated number-needed-to-treat.  The NNT has been discussed elsewhere, and here I only want to point out why it's not really a legitimate way to report the results of this kind of study.  The NNT is a very intuitive and easily calculated metric, but it does have a few preconditions.  First, it is an expression of causal effect size, and therefore can only really be used to express the results of trials which establish causality.  Since this is a retrospective, observational study, it's only really capable of establishing a correlation between exposure and outcome, not a causal relationship.  Second, the NNT is inherently time-dependent.  Consider: let's say you wanted to find out how many people you need to deprive of food in order to kill one person by starvation.  At day 2, the NNT would be infinite, since nobody dies of starvation within 48 hours.  At 360 days it would be one, since nobody can survive that long without food.  The NNT reported by the authors of this study isn't associated (or associable) with a defined length of therapy, and for that reason it's unclear how it can be interpreted.




Dr. Tran: Impact of Nasogastric Lavage on Outcomes in Acute GI Bleeding

Click to listen to a recording of the presentation.
Dr. Tran presented a study of the association between diagnostic nasogastric lavage in acute upper GI bleeding (UGIB) and various clinical outcomes, including 30-day mortality, length of stay, transfusion requirements, surgery, and time to endoscopy.  This retrospective, observational study used the same statistical method (propensity score matching) as Dr. Dehghan's study, although Huang et al. give a better explanation of how it works.

The investigators in this study observed little correlation between the exposure (nasogastric lavage) and any of their outcomes except time to endoscopy.  As with Dr. Dehghan's study, they concluded that while "kick starting" definitive therapy might be a good thing, the effect of nasogastric lavage on key clinical outcomes is probably small.

Propensity score matching is an ingenious statistical technique which is designed to approximate the conditions of a prospective, randomized controlled trial (the gold standard of medical research) using retrospectively collected observational data.  This is how it works: you take a bunch of people with an exposure (usually to a medical therapy) and you try to figure out what all their conceivable risk factors for that exposure might be.  In this case, GI bleeding would be considered a risk factor for nasogastric lavage.  Then, you use your table of risk factors to develop a score which quantifies the degree of risk a patient has for the exposure - so, for example, somebody with a history of peptic ulcer disease, tachycardia, hypotension and recent aspirin use would have a higher score than somebody with none of those things.  This is the "propensity score" because it describes any given patient's "propensity" to get the exposure.

Then, you identify a separate cohort of patients who did not have the exposure, and you match pairs based on their propensity scores.  Now, presto, you have two groups who are matched in terms of the baseline characteristics you measured, one of which had the exposure and one of which didn't.  You look for relationships between your exposure and the outcomes you're interested in, analyzing your data essentially as though you were analyzing data from a prospective, randomized controlled trial.
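As a concrete (and very simplified) illustration of those two steps, here is a sketch in Python.  The toy data, the use of scikit-learn's logistic regression, and the greedy matching rule are all my own choices for illustration, not anything taken from the studies discussed above.

```python
# Sketch of propensity-score matching: (1) model each patient's
# probability of receiving the exposure from baseline covariates,
# (2) pair exposed with unexposed patients who have similar scores.
# Variable names and data are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
covariates = rng.normal(size=(n, 4))    # e.g. age, heart rate, BP, aspirin use
exposed = rng.integers(0, 2, size=n)    # 1 = got nasogastric lavage (toy data)

# Step 1: the propensity score is the fitted probability of exposure.
model = LogisticRegression().fit(covariates, exposed)
propensity = model.predict_proba(covariates)[:, 1]

# Step 2: greedy 1:1 nearest-neighbour matching on the propensity score.
treated = np.where(exposed == 1)[0]
controls = list(np.where(exposed == 0)[0])
pairs = []
for i in treated:
    j = min(controls, key=lambda c: abs(propensity[c] - propensity[i]))
    pairs.append((i, j))
    controls.remove(j)                  # match each control at most once
    if not controls:
        break

# The matched pairs can now be compared on the outcome of interest,
# much as one would compare the arms of a randomized trial.
print(len(pairs), "matched pairs")
```

Real implementations add refinements such as caliper widths and checks of covariate balance after matching, but the basic logic is the same.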

Propensity score matching is attractive mainly because RCTs are difficult, expensive, time-consuming, and often impossible to conduct.  It may allow better-informed conclusions to be drawn from epidemiological databases than prior observational methodologies have permitted.

However, it has some significant disadvantages and at least one major epistemological problem.  One disadvantage is that it only works (even in theory) if your propensity score is comprehensive.  If it doesn't cover a major risk factor for your exposure, then your groups may not actually be comparable in terms of their baseline characteristics and there may be confounding.  Another way of saying this is that whereas true randomization balances all variables, whether you know what they are or not, propensity score matching only balances measured variables.  Another disadvantage, which is common to all retrospective analyses, is that the quality of your information and the means for collecting it aren't determined a priori (even if your conceptual model is), and so are subject to various forms of recall bias.

The major epistemological problem with propensity score matching is that it assumes that doctors behave probabilistically.  That is, it assumes that when presented with two patients who are exactly the same, there is a quantifiable percentage probability of your behaving in one way or another.  Take the case of a patient with a propensity score for nasogastric lavage of 0.7.  To understand propensity score matching, you have to believe that you, the treating physician, would order NGL 70% of the time if presented with an infinite series of exactly identical patients.  This flies in the face of one's experience of actually making medical decisions, which is that they're usually motivated by an infinite variety of specific factors which interact with one another in ways unique to each particular case. 

It is important to understand the basic methods of propensity score matching, since as EMRs become more widespread this sort of data-mining will become an increasingly easy and popular way to try to answer clinical questions without going through the enormous amount of effort and expense entailed in RCTs.  If you understand the basic assumptions, you can tell whether or not the method is appropriate and whether it's being used appropriately.