Tuesday, December 21, 2010

Why medical testing is never a simple decision

by Marya Zilberberg, Healthcare, etc.


A couple of days ago, Archives of Internal Medicine published a case report online. Now, it is rather unusual for a high-impact journal to publish even a case series, let alone a case report. Yet this one was published to highlight the journal's theme of "less is more" in medicine. This motif was announced by Rita Redberg many months ago, when she solicited papers to shed light on the potential harms that we perpetrate in healthcare with errors of commission.

The case in question is one of a middle-aged woman presenting to the emergency room with vague symptoms of chest pain. Although from reading the paper it becomes clear that the pain is highly unlikely to represent heart disease, the doctors caring for the patient elect to do a non-invasive CT angiography test, just to "reassure" the patient, as the authors put it. Well, lo and behold, the test comes back positive, and the woman goes for an invasive cardiac catheterization, where, though no disease is found, she suffers a very rare but devastating tear of one of the arteries in her heart. As you can imagine, she gets very ill, requires bypass surgery and ultimately an urgent heart transplant. Yup, from healthy to a heart transplant patient in just a few weeks. Nice, huh?

The case illustrates the pitfalls of getting a seemingly innocuous test for what appears to be a humanistic reason -- patient reassurance. Yet, look at the tsunami of harm that followed this one decision. But what is done is done. The big question is, can cases like this be prevented in the future? And if so, how? I will submit to you that Bayesian approaches to testing can and should reduce such complications. Here is how.

First, what is Bayesian thinking? Bayesian thinking, formalized mathematically through Bayes' theorem, refers to taking the probability that the disease is present into account when interpreting subsequent test results. What does this mean? Well, let us take the much-embattled example of mammography and put some numbers to the probabilities. Let us assume that an otherwise healthy woman between 40 and 50 years of age has a 1% chance of having breast cancer (that is 1 out of every 100 such women, or 100 out of 10,000 undergoing screening). Now, let's say that a screening mammogram is able to pick up 80% of all cancers that are actually there (true positives; this is the test's sensitivity), meaning that 20% go unnoticed by this technology. So, among the 100 women with actual breast cancer of the 10,000 women screened, 80 will be diagnosed as having cancer, while 20 will be missed. OK so far? Let's go on. Let us also assume that, in a certain fraction of the screenings, mammography will merely imagine that a cancer is present, when in fact there is no cancer. Let us say that this happens in about 10% of women who do not have cancer (a false-positive rate of 10%, or, equivalently, a specificity of 90%). So, going back to the 10,000 women we are screening, of the 9,900 who do NOT have cancer (remember that only 100 can have a true cancer), 10%, or 990 individuals, will still be diagnosed as having cancer. So, tallying up all of the positive mammograms (80 true plus 990 false), we are now faced with 1,070 women diagnosed with breast cancer. But of course, of these women only 80 actually have the cancer, so what's the deal? Well, we have arrived at the very important idea of the value of a positive test: this roughly tells us how sure we should be that a positive test actually means that the disease is present. It is a simple ratio of the real positives (true positives, in our case the 80 women with true cancer) to all of the positives obtained with the test (in our case 1,070). This is called the positive predictive value of a test, and in our mammography example for women between the ages of 40 and 50 it turns out to be 80/1,070, or about 7.5%.
So, what this means is that over 90% of the positive mammograms in this population will turn out to be false positives.
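The same positive predictive value can be written out with Bayes' theorem. This worked equation simply plugs in the illustrative numbers above (1% prevalence, 80% sensitivity, 10% false-positive rate); it adds no new data.

```latex
\[
\mathrm{PPV} = P(\text{cancer} \mid +)
= \frac{P(+ \mid \text{cancer})\,P(\text{cancer})}
       {P(+ \mid \text{cancer})\,P(\text{cancer}) + P(+ \mid \text{no cancer})\,P(\text{no cancer})}
= \frac{0.80 \times 0.01}{0.80 \times 0.01 + 0.10 \times 0.99}
= \frac{0.008}{0.107} \approx 0.075
\]
```

Multiplying the numerator and denominator by 10,000 recovers the head count: 80 true positives out of 1,070 positive mammograms.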

Now, let us look at the flip side of this equation, or the value of a negative test. Of the 8,930 negative mammograms (the 10,000 screened minus the 1,070 positives), only 20 will be false negatives (remember that in our case mammography will only pick up 80 out of 100 true cancers). This means that the other 8,910 negative results are true negatives, making the value of a negative test, or negative predictive value, 8,910/8,930 = 99.8%, or just fantastic! So, if the test is negative, we can be pretty darn sure that there is no cancer. However, if the test is positive, while cancer is present in 80 women, 990 others will undergo unnecessary further testing. And for every subsequent test a similar calculus applies, since all tests are fallible.
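The bookkeeping in the last two paragraphs can be reproduced with a few lines of code. This is a minimal sketch, assuming the post's illustrative figures (10,000 women, 1% prevalence, 80% sensitivity, 10% false-positive rate); the names `ScreeningTable` and `expected_counts` are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class ScreeningTable:
    """Expected counts in the 2x2 table of disease status vs. test result."""
    true_pos: float
    false_neg: float
    false_pos: float
    true_neg: float

    @property
    def ppv(self) -> float:
        # Positive predictive value: true positives / all positives.
        return self.true_pos / (self.true_pos + self.false_pos)

    @property
    def npv(self) -> float:
        # Negative predictive value: true negatives / all negatives.
        return self.true_neg / (self.true_neg + self.false_neg)


def expected_counts(n: int, prevalence: float, sensitivity: float,
                    false_positive_rate: float) -> ScreeningTable:
    """Split n screened people into the four expected outcome groups."""
    diseased = n * prevalence
    healthy = n - diseased
    true_pos = diseased * sensitivity          # cancers the test picks up
    false_pos = healthy * false_positive_rate  # healthy women flagged anyway
    return ScreeningTable(
        true_pos=true_pos,
        false_neg=diseased - true_pos,         # cancers the test misses
        false_pos=false_pos,
        true_neg=healthy - false_pos,
    )


# Illustrative numbers from the post (hypothetical, not clinical estimates).
table = expected_counts(10_000, prevalence=0.01, sensitivity=0.80,
                        false_positive_rate=0.10)
print(f"TP={table.true_pos:.0f} FN={table.false_neg:.0f} "
      f"FP={table.false_pos:.0f} TN={table.true_neg:.0f}")
print(f"PPV={table.ppv:.1%} NPV={table.npv:.1%}")
# prints:
# TP=80 FN=20 FP=990 TN=8910
# PPV=7.5% NPV=99.8%
```

Working in expected head counts rather than raw probabilities mirrors the way the post reasons, which makes each of the four groups easy to check by hand.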

Let's do one more maneuver. Let's say that now we have a population of 10,000 women who have a 10% chance of having breast cancer (as is the case with an older population). The sensitivity (80%) and specificity (90%) of mammography do not change, yet the positive and negative predictive values do. So, among these 10,000 women, 1,000 are expected to have cancer, of which 800 will be picked up on mammography. Among the 9,000 without cancer, a mammogram will "find" a cancer in 900. So, the total positive mammograms add up to 1,700, of which nearly 50% are true positives (800/1,700 = 47.1%). Interestingly, the negative predictive value does not change a whole lot (8,100/[8,100 + 200] = 97.6%, still quite acceptably high). So, while among younger women at a lower risk for breast cancer a positive mammogram indicates the presence of disease in only about 7.5% of cases, for older women it is correct nearly half the time.
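For the higher-risk population, the same Bayes' theorem arithmetic gives both predictive values directly. Here p is the pre-test probability (0.10), and the post's sensitivity (0.80) and specificity (0.90) are unchanged; the formulas are the standard ones, and the numbers are the post's illustrative figures.

```latex
\[
\mathrm{PPV} = \frac{\mathrm{sens}\cdot p}{\mathrm{sens}\cdot p + (1-\mathrm{spec})(1-p)}
= \frac{0.80 \times 0.10}{0.80 \times 0.10 + 0.10 \times 0.90}
= \frac{0.08}{0.17} \approx 0.471
\]
\[
\mathrm{NPV} = \frac{\mathrm{spec}\cdot(1-p)}{\mathrm{spec}\cdot(1-p) + (1-\mathrm{sens})\,p}
= \frac{0.90 \times 0.90}{0.90 \times 0.90 + 0.20 \times 0.10}
= \frac{0.81}{0.83} \approx 0.976
\]
```

These match the head counts: 800/1,700 for the positive predictive value and 8,100/8,300 for the negative one.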

These two examples illustrate how exquisitely sensitive the interpretation of any test result is to the pre-test probability that the patient has the disease. Applying this to the woman in the case report in the Archives, some back-of-the-napkin calculations based on the numbers in the report suggest that, while a negative CT angiogram would indeed have been reassuring, a positive one would only create confusion, as it, in fact, did.
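To see that sensitivity to pre-test probability across more than two points, one can sweep the pre-test probability while holding the test's characteristics fixed. This is a sketch only: it does not use the CT angiography figures from the case report, which are not reproduced here, but reuses the mammography numbers (80% sensitivity, 90% specificity) purely as stand-ins. The function name `predictive_values` is hypothetical.

```python
def predictive_values(pretest: float, sensitivity: float = 0.80,
                      specificity: float = 0.90) -> tuple[float, float]:
    """Return (PPV, NPV) for a given pre-test probability via Bayes' theorem."""
    tp = sensitivity * pretest                 # diseased and test positive
    fp = (1 - specificity) * (1 - pretest)     # healthy but test positive
    fn = (1 - sensitivity) * pretest           # diseased but test negative
    tn = specificity * (1 - pretest)           # healthy and test negative
    return tp / (tp + fp), tn / (tn + fn)


# Same test, increasingly likely disease (stand-in numbers, not CT data).
for pretest in (0.01, 0.05, 0.10, 0.25, 0.50):
    ppv, npv = predictive_values(pretest)
    print(f"pre-test {pretest:>4.0%}: PPV {ppv:6.1%}  NPV {npv:6.1%}")
```

With these stand-in numbers the positive predictive value climbs from about 7.5% at a 1% pre-test probability to about 47% at 10% and about 89% at 50%, while the negative predictive value slips from about 99.8% to about 82%.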

To be sure, if we had a perfect test, or one that picked up disease 100% of the time when it was present and did not mislabel people without the disease as having it, we would not need to apply this type of Bayesian accounting. However, to the best of my knowledge, no such test exists in today's clinical practice. Therefore, engaging in explicit calculations of what results can be expected in a particular patient from a particular test before ordering such a test can save a lot of headaches, and perhaps even lives. In fact, I do hope that the developers of our new electronic medical environments are giving this serious thought, as these simple algorithms should be built into all decision support systems. Bayes' theorem is an idea whose time has surely come.
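The post does not specify what such a decision-support algorithm would look like. One minimal sketch uses the odds form of Bayes' theorem: convert the pre-test probability to odds, multiply by the test's likelihood ratio, and convert back. The resulting post-test probability can then serve as the pre-test probability for the next test, which is the "similar calculus" that applies to every subsequent test. The function name `post_test_probability` is hypothetical, the mammography figures are reused as stand-ins, and chaining two tests this way assumes they are conditionally independent given disease status, which repeated testing of the same patient often is not.

```python
def post_test_probability(pretest: float, sensitivity: float,
                          specificity: float, positive: bool) -> float:
    """Update a pre-test probability with one test result (odds form of Bayes' theorem)."""
    if positive:
        likelihood_ratio = sensitivity / (1 - specificity)   # LR+
    else:
        likelihood_ratio = (1 - sensitivity) / specificity   # LR-
    pre_odds = pretest / (1 - pretest)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical chain: 1% pre-test probability, then two positive results from
# tests ASSUMED conditionally independent, each 80% sensitive, 90% specific.
p = 0.01
for step in (1, 2):
    p = post_test_probability(p, sensitivity=0.80, specificity=0.90, positive=True)
    print(f"after positive test {step}: {p:.1%}")
# prints:
# after positive test 1: 7.5%
# after positive test 2: 39.3%
```

The first step reproduces the 7.5% positive predictive value from the mammography example; a single negative result at the same 1% pre-test probability would instead lower the probability of cancer to about 0.2%.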

5 comments:

Tim Richardson, PT said...

My "aha" moment came when I realized that conditions are expressed at different rates among my patients and that published estimates of these "base rates" are the starting point for all of my clinical decisions.

Disease prevalence was not a concept taught in physical therapy school in 1992.

While we don't want the government making top-down medical decisions, too many of my colleagues don't want to follow evidence-based "rules" either, because then "why would we need the therapist?"

According to McGee:
"...the diagnostic power of any sign depends in part on our ideas about disease prevalence, which in turn depend on our own personal interviewing skills and clinical experience."

Thank you for your well-written post and for bringing this article to our attention.

Tim

H G Stern said...

just to "reassure" the patient, as the authors put it.

I call BS.

The likely real culprit: CYA (aka Defensive Medicine).

Anonymous said...

Pre-test probability is impossible to determine in the cohort described in the article, because presenting signs and symptoms of cardiac disease in females aged 45-60 are extremely difficult to apply to individual patients. Women are much more likely to suffer sudden death, and much more likely to have "atypical" symptoms secondary to coronary artery stenosis. The basic premise of the article is false.

Tim Richardson, PT said...

No.

Bayesian thinking applied to coronary artery stenosis is a very difficult problem since "intermediate" probabilities applied to females from 40 to 60 years of age with "atypical" chest pain vary from 10% to 90%.

The range is too wide to be clinically useful.

The example given may not lend itself to illustrating "best practice" use of Bayesian probabilities, that is, use with high-volume, high-risk, high-cost conditions for which clinical uncertainty exists.

The example of older women with a well-known 10% pre-test probability of breast cancer being tested with mammography is a better example.

Tim

A VT Hokie said...

In 1975 Galen and Gambino published "Beyond Normality: The Predictive Value and Efficiency of Medical Diagnoses". It was popular in clinical pathology and diagnostic lab services because it helped us focus our resources on tests that provided value and conserved the scarce resources in the laboratory. The reaction of the clinicians was quite varied, but in my personal experience it was mostly ignored. I used this in a family practice clinic in 2003 to explain why we would not benefit patients by performing troponin assays on site, especially since any suspected MI was transferred to the ED regardless of the test outcome. After I left the organization, the assay was added to its test panel.

It's nice to see their work in use; nicer still would be recognition of their contribution. In times of scarce resources (any time in healthcare) it is worth knowing and applying.