Highland Journal Club: GI JC 3.1.12

Dr. Villanueva: Sarcopenia as a Predictor of Mortality in Patients Being Evaluated for Liver Transplantation

Dr. Villanueva presented a recent article from Clinical Gastroenterology and Hepatology which evaluated the finding of sarcopenia (reduced skeletal muscle mass) as a predictor of mortality in cirrhotics being evaluated for transplantation. The investigators reviewed CT scans collected as part of evaluation for transplant and scored patients based on the presence or absence of sarcopenia, which they defined as skeletal muscle mass at the L3 level more than two standard deviations below normal young healthy adult levels.

Two patients with identical BMI. The patient on the left is sarcopenic.

The investigators found sarcopenia to be an independent predictor of mortality in cirrhotic patients being evaluated for transplantation. Interestingly, sarcopenia correlated rather poorly with the established prognostic scoring systems widely used in advanced cirrhosis, the Child-Pugh and MELD scores.

The investigators report their results in the form of a receiver operating characteristic (ROC) plot. ROC plots are commonly used to depict the characteristics of tests which produce a continuous range of results (as opposed to being unequivocally positive or negative). An ROC curve plots the true positive rate (i.e. the sensitivity) on the Y axis and the false positive rate (i.e. the complement of the specificity) on the X axis. You can see fairly easily that the ideal test would have a curve which goes straight up from the origin to 100% sensitivity and stays the same over all false positive rates, since this would mean that there exists a decision threshold which has 100% sensitivity and a false positive rate of 0%. In reality, this rarely happens and most ROC curves are messier. However, once you understand this it's easier to see that the closer the area under the ROC curve is to 1.0, the better the test is.
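To make the construction concrete, here is a minimal sketch of how an ROC curve and its area can be computed by hand. The scores and outcomes are made-up illustrative numbers, not data from the paper, and the function names are my own.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the decision threshold step by step; a patient "tests positive"
    # when their score is at or above the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical outcomes: 1 = died, 0 = survived.
labels = [1, 1, 1, 0, 0, 0]
# A test that ranks every death above every survivor.
perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
# A messier test that ranks one survivor above two of the deaths.
noisy_scores = [0.9, 0.4, 0.7, 0.8, 0.2, 0.1]

print(round(auc(roc_points(perfect_scores, labels)), 3))
print(round(auc(roc_points(noisy_scores, labels)), 3))
```

The first test traces the ideal curve (straight up, then straight across) and has an area of 1.0. The second has an area of 7/9, which is also the fraction of death/survivor pairs in which the patient who died had the higher score.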

ROC Curve for MELD, Child-Pugh, and sarcopenia

Dr. Dehghan: Acid-Suppressive Medication Use and the Risk of Nosocomial Gastrointestinal Tract Bleeding

Dr. Dehghan presented a retrospective study which used a propensity-score matching analysis (of which more below) to evaluate the relationship between receiving acid-suppressive therapy during hospitalization and developing a nosocomial GI bleed among non-ICU patients admitted for something other than GI bleeding.

Their analysis suggested that (assuming that their retrospective statistical model is accurate) the size of any benefit associated with acid-suppression in non-ICU patients without known GI bleeding is likely to be very small.

They expressed their results in terms of an extrapolated number-needed-to-treat. The NNT has been discussed elsewhere, and here I only want to point out why it's not really a legitimate way to report the results of this kind of study. The NNT is a very intuitive and easily calculated metric, but it does have a few preconditions. First, it is an expression of causal effect size, and therefore can only really be used to express the results of trials which establish causality. Since this is a retrospective, observational study, it's only really capable of establishing a correlation between exposure and outcome, not a causal relationship. Second, the NNT is inherently time-dependent. Consider an example: say you wanted to find out how many people you need to deprive of food in order to kill one person by starvation. At day 2, the NNT would be infinite, since nobody dies of starvation within 48 hours. At 360 days it would be one, since nobody can survive a year without food. The NNT reported by the authors of this study isn't associated (or associable) with a defined length of therapy, and for that reason it's unclear how it can be interpreted.
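The arithmetic behind this is simple enough to sketch. The NNT is the reciprocal of the absolute difference in risk between two groups, with both risks measured over the same follow-up period. The function name and the example risks below are illustrative assumptions, not figures from the study.

```python
import math


def number_needed(risk_without, risk_with):
    """Return 1 / absolute risk difference.

    Both risks must be measured over the same follow-up period,
    which is why the result only means something for a stated duration.
    """
    difference = abs(risk_without - risk_with)
    return math.inf if difference == 0 else 1 / difference


# The starvation example: risk of death if fed vs. if deprived of food.
print(number_needed(0.0, 0.0))  # day 2: nobody has died in either group
print(number_needed(0.0, 1.0))  # day 360: everyone deprived of food has died

# A hypothetical therapy that cuts a 10% risk to 5% over a fixed period.
print(number_needed(0.10, 0.05))
```

The same intervention gives an infinite value at day 2 and a value of 1 at day 360, so a number reported without its time frame can't be compared with anything.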

Dr. Tran: Impact of Nasogastric Lavage on Outcomes in Acute GI Bleeding

Dr. Tran presented a study of the association between diagnostic nasogastric lavage in acute upper GI bleeding (UGIB) and various clinical outcomes, including 30-day mortality, length of stay, transfusion requirements, surgery, and time to endoscopy. This retrospective, observational study used the same statistical method (propensity score matching) as the study Dr. Dehghan presented, although Huang et al. give a better explanation of how it works.

The investigators in this study observed little correlation between the exposure (nasogastric lavage) and any of their outcomes except time to endoscopy. As in the study Dr. Dehghan presented, they concluded that while "kick starting" definitive therapy might be a good thing, the effect size associated with nasogastric lavage in terms of improving key clinical outcomes is probably small.

Propensity score matching is an ingenious statistical technique which is designed to approximate the conditions of a prospective, randomized controlled trial (the gold standard of medical research) using retrospectively collected observational data. This is how it works: you take a bunch of people with an exposure (usually to a medical therapy) and you try to figure out what all their conceivable risk factors for that exposure might be. In this case, GI bleeding would be considered a risk factor for nasogastric lavage. Then, you use your table of risk factors to develop a score which quantifies the degree of risk a patient has for the exposure - so, for example, somebody with a history of peptic ulcer disease, tachycardia, hypotension and recent aspirin use would have a higher score than somebody with none of those things. This is the "propensity score" because it describes any given patient's "propensity" to get the exposure.

Then, you identify a separate cohort of patients who did not have the exposure, and you match pairs based on their propensity scores. Now, presto, you have two groups who are matched in terms of the baseline characteristics you measured, one of which had the exposure and one of which didn't. You look for relationships between your exposure and the outcomes you're interested in, analyzing your data essentially as though you were analyzing data from a prospective, randomized controlled trial.
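The matching step can be sketched in a few lines. This is only an illustration under stated assumptions: the propensity scores are taken as already estimated (typically from a model such as logistic regression on the measured risk factors), the patient IDs and scores are invented, and greedy one-to-one matching within a fixed "caliper" is just one common variant, not necessarily what either paper did.

```python
def match_on_propensity(exposed, unexposed, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    exposed, unexposed: dicts mapping a patient ID to a propensity score.
    caliper: largest score difference allowed within a matched pair
             (0.05 is an arbitrary illustrative choice).
    Returns a list of (exposed_id, unexposed_id) pairs.
    """
    available = dict(unexposed)  # copy, so the caller's dict is not modified
    pairs = []
    for patient, score in sorted(exposed.items(), key=lambda item: item[1]):
        if not available:
            break
        # Find the unexposed patient whose score is closest to this one.
        nearest = min(available, key=lambda other: abs(available[other] - score))
        if abs(available[nearest] - score) <= caliper:
            pairs.append((patient, nearest))
            del available[nearest]  # each control can be used only once
    return pairs


# Hypothetical propensity scores for receiving nasogastric lavage.
got_lavage = {"A": 0.72, "B": 0.31, "C": 0.90}
no_lavage = {"W": 0.70, "X": 0.33, "Y": 0.10, "Z": 0.52}

pairs = match_on_propensity(got_lavage, no_lavage)
print(pairs)
```

Here patients A and B each find a close counterpart, while patient C (score 0.90) has no unexposed patient within the caliper and drops out of the analysis. Outcomes are then compared only within the matched pairs.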

Propensity score matching is attractive mainly because RCTs are difficult, expensive, time consuming, and often impossible to conduct. It may represent a new way to draw better-informed conclusions from epidemiological databases than prior observational methodologies allowed.

However, it has some significant disadvantages and at least one major epistemological problem. One disadvantage is that it only works (even in theory) if your propensity score is comprehensive. If it doesn't cover a major risk factor for your exposure, then your groups may not actually be comparable in terms of their baseline characteristics and there may be confounding. Another way of saying this is that whereas true randomization balances all variables on average, whether you know what they are or not, propensity score matching only balances measured variables. Another disadvantage, which is common to all retrospective analyses, is that the quality of your information and the means for collecting it aren't determined a priori (even if your conceptual model is), and so are subject to various forms of information bias, such as recall bias.

The major epistemological problem with propensity score matching is that it assumes that doctors behave probabilistically. That is, it assumes that when presented with two patients who are exactly the same, there is a quantifiable percentage probability of your behaving in one way or another. Take the case of a patient with a propensity score for nasogastric lavage of 0.7. To understand propensity score matching, you have to believe that you, the treating physician, would order NGL 70% of the time if presented with an infinite series of exactly identical patients. This flies in the face of one's experience of actually making medical decisions, which is that they're usually motivated by an infinite variety of specific factors which interact with one another in ways unique to each particular case.

It is important to understand the basic methods of propensity score matching, since as EMRs become more widespread this sort of data-mining will become an increasingly easy and popular way to try to answer clinical questions without going through the enormous amount of effort and expense entailed in RCTs. If you understand the basic assumptions, you can tell whether or not the method is appropriate and whether it's being used appropriately.

Friday, March 2, 2012