Saturday, July 05, 2008

Junk science: How to lie with numbers

Epidemiology just sounds like the sort of thing you don't want to criticize in public without sounding like a fool. Who are you, mere mortal, to doubt the findings of someone with a title as impressive and as likely to cause your tongue to twist into knots as an epidemiologist? It's seven syllables long, for crying out loud. Anyone with a title that long must know what they are talking about, right?

That, at least, is what epidemiologists would like you to think. So, let's start by demystifying this mystical art. Although it is a branch of medicine, fear not. It ain't brain surgery. Yes, it does involve statistics, but we won't go so far into it that you'll have to do any math.

According to the American Heritage Dictionary, epidemiology is

The branch of medicine that deals with the study of the causes, distribution, and control of disease in populations

To put that in simple terms, they try to figure out why it is that some people get sick with certain diseases and others don't. Cancer is a big one. We don't know, publicly at least, what the causes of cancer are. Of course, medical doctors and cancer researchers have to say something when people ask the question. Shrugging their shoulders and saying, “damned if I know” isn't exactly a recipe for research grants flowing in.

The answers they struggle to give border on the downright funny, at times. For example, Cancer Research UK has a page on their website with the question, “What causes cancer?” right at the top. Their introduction to the answers reads, “This page tells you about what actually causes cancer,” then goes on to say, “There is no single cause for any one type of cancer.”

Hmmm. That sounds a bit like a dodge. Perhaps further down the page they'll get into the specifics. After all, if I asked a doctor “What causes the common cold?” she'd be able to tell me about rhinoviruses getting into the cells in my nose and my body's immune system reacting to the irritation with inflammation and increased mucus production. Solid information.

Not so with the cancer researchers. All they have, with the possible exception of some viral links to certain types of cancer, are epidemiological studies that say there is a statistically calculated risk of getting certain types of cancer if you engage in certain types of activities. A calculated risk, not a cause-and-effect relationship. They aren't even willing to pin the calculated risks down. Hence the statement, “There is no single cause for any one type of cancer.” Keep that statement in mind, by the way. It was published by serious cancer researchers with some serious funding. It will be important a bit later on.

In other words, you won't find anyone able to tell you that certain toxins, radiation, or what have you cause specific changes in cells that get them growing out of control. The only thing they can tell you is that, statistically, you may be more likely to get these types of cancer if you do these other types of things.

Now, how do you figure out just what these cancer-causing things might be? You can't do the typical scientific thing, with a double-blind study and a control group. That would require asking people to come in and be exposed to what you believe to be toxins to see if they get sick. The problem there is, unless you happen to work for the Nazi party or the CIA, a positive result in your study will land you in jail for reckless endangerment. Instead, you create an epidemiological study.

The best way to get a handle on exactly what one of these studies is, is to do your own. It's easy and it's fun, so dive right in. Please note, though, that we aren't going to do our instructional study on a disease, so it technically isn't an epidemiological study. That word stems from the word “epidemic” and you probably don't have any epidemics running rampant around your computer at the moment to work with. We are just going to do a simple statistical study so that we can understand the concepts involved. Ready? This won't hurt a bit.

Find a die, by which I mean the singular for dice, not the outcome of an epidemic. You just want a simple six-sided die, not one of those bizarre sphere approximations from Dungeons and Dragons. Roll it and it stops with some number face up. Let's say the number is three. Roll it again. Let's say the number is five. Roll it one more time. This time it lands on six.

At this point, do you have enough information to make any predictions about how often those numbers might come up in some arbitrary number of rolls? If you answered no, you are doing well so far. See, I told you this would be easy.

Take a closer look at the die. The numbers are represented by little divots carved into the sides of the die. The number one has only one divot taken out, while on the exact opposite side of the die there are six divots taken out. This might get you wondering: is the one side of the die heavier than the six side, since there is more material removed from the six side? How would you find out? If you don't have the equipment available to suspend the die and see if it is balanced no matter how you turn it, you'll just have to do a whole lot of rolling of that die to see if the number one shows up more often.

With this, we have introduced two very important factors in an epidemiological study: is the effect being studied physically plausible (in an actual epidemiological study this would read “biologically plausible”), and are there confounders? In this particular case, the difference we noted between the one side and the six side serves in both capacities, depending on how you look at it.

In our first series of rolls, which we stopped after three trials, you likely assumed that given enough rolls you would see a pretty even distribution of numbers come up, right? That's the theory behind dice, at least. The rolls are random.

If that is what you are seeking to show, that the rolls are random, the difference between the various faces of the die is a confounder. Simply put, confounders are the variations that can throw the results of your study off. They confound your study. If, on the other hand, what you are trying to discover is whether the die will land with the six side up more often than the one side and about how much more often, the difference between the sides is a test of plausibility. It seems to make sense that the one side would be a little heavier and so would end up face down (with the six side then face up) a little more often.

So you start to roll. You roll the die until your arm hurts. You stop and you tally up the number of rolls you've done so far. Is it enough? You don't know how much heavier one side might be than the other, so how do you figure out how many rolls you need to see the expected result? For instance, if the one side were twice as heavy as the six side, you would expect the one side to land face down quite a bit. You may well be convinced of the effect after only 100 rolls of the die or so. But that isn't the case. If the die you are using has a one side that is, indeed, heavier than the six side, it is by a very minuscule amount. So, how many rolls?

This introduces another important concept in epidemiological studies: what is your sample size? Or, how much data do you actually have to work with? That amount of data will be one factor in determining the confidence interval of your study.
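If you'd rather not roll until your arm hurts, you can let a computer do it. Here's a minimal sketch in Python; the bias number is pure assumption (I made it up just to have a slightly lopsided die to play with), and the point is simply to see how many rolls it takes before a tiny bias climbs out of the noise.

  import random

  def roll_lopsided_die(bias=0.002):
      # A hypothetical die: six comes up slightly more often than a fair
      # 1/6 and one slightly less often. The bias value is made up.
      weights = [1/6 - bias, 1/6, 1/6, 1/6, 1/6, 1/6 + bias]
      return random.choices([1, 2, 3, 4, 5, 6], weights=weights)[0]

  def tally(rolls):
      counts = {face: 0 for face in range(1, 7)}
      for _ in range(rolls):
          counts[roll_lopsided_die()] += 1
      return counts

  # With 100 rolls the made-up bias is invisible in the noise;
  # with a million, the six face finally starts to pull ahead of the one.
  print(tally(100))
  print(tally(1_000_000))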

Understanding confidence intervals is simple. If you've ever heard the results of a poll given in a news broadcast, you've heard a confidence interval given. Typically, in political polls, the confidence interval is plus or minus 3%. That simply means that the numbers given are in the middle of a range of probable actual values that extends three percentage points above and below what is being reported. So, if they say that candidate A is leading candidate B by 51% to 49%, plus or minus 3% (the confidence interval), what they are actually saying is that it is perfectly possible that candidate B is actually leading, since the difference between them is only 2%.

What they never ever tell you is the confidence level used to calculate that range. They leave it to you to assume that they are 100% sure that the actual values are within 3% of what they are reporting, but that confidence level is impossible. The gold standard for confidence level, at least in epidemiology, is 95%. That means that when they give the range of possible values, the confidence interval, they are 95% sure that the real numbers are somewhere in that range. Put another way, they are saying that there is a 5% chance that the actual values could be something else entirely.

When you think about it, leaving out the confidence level is a pretty big omission. As far as you know, that confidence interval of plus or minus 3% represents a confidence level of 50%, which is to say there is only a 50/50 chance that the actual numbers are within 3% of reported, which is to say that the entire poll is really just a roll of the dice...give or take a few percent.
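For the curious, that "plus or minus" figure comes out of a simple formula, and the confidence level is baked right into it. Here's a rough sketch; the sample size of 1,067 is my own assumption (it happens to be about what gives a 3% margin at 95% confidence), not a number from any real poll. Notice how quietly dropping the confidence level to 50% makes the same data sound about three times as precise.

  import math

  def margin_of_error(p, n, z):
      # Normal-approximation margin for a sampled proportion p with
      # sample size n, at the z-score for the chosen confidence level.
      return z * math.sqrt(p * (1 - p) / n)

  n = 1067   # assumed sample size; roughly the classic +/- 3% poll
  p = 0.51   # candidate A's reported share

  print(margin_of_error(p, n, 1.960))  # 95% confidence: about 0.03
  print(margin_of_error(p, n, 0.674))  # 50% confidence: about 0.01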

Now, get one more die. That should be easy to come by since they generally come in pairs. I mean, a lot of people don't even realize there is a singular form of the word dice. But we don't want matching dice. We want dice that have some obvious dissimilarity. We'll go with size. We want to know if size really does matter.

Imagine we have one normal-sized die, the kind you'd find in any casino, and one of those smaller dice you might find in a travel-sized game of some sort. What we are going to study is whether size has an effect on how often the number six is rolled. You can already guess how we are going to do it. We are going to roll the dice a whole lot of times and tally our results. Since I'm not about to sit here and roll dice until my arm hurts, I'll make up some numbers.

Number of rolls: 100
Large die sixes: 15
Small die sixes: 18

Already, given the data we have, it wouldn't take an epidemiologist to figure out that it is looking like you want a small die if you want to roll a six, or a larger die if you don't. To put it in epidemiological terms, we have the data to calculate the relative risk or risk ratio of rolling a six with a particular size of die.

The math is simple. First, we need to determine the risk of rolling a six for each die. Since we rolled each die 100 times, the risk for the large die is 15/100, or .15 (15%). Apply the same math to the small die and you get 18%. The relative risk (how much riskier a six is with a small die than with a large die) is .18/.15, which equals 1.2.

So, in a study that was looking to see if you were at greater risk of rolling a six with a small die rather than a large die, an epidemiologist would state that there is a relative risk (RR) of 1.2, or a 20% greater chance of rolling a six with a small die. Because we were looking at the risk of throwing sixes with the small die, the small die risk was the top number in the division. If we were looking for the large die relative risk, we would have calculated the RR as .15/.18 = .83. That is a relative risk below 1, which means that you have a lower risk of throwing a six with a large die. A relative risk of 1.0 would mean there was no difference between the two at all.
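If you want to check that arithmetic, here's a minimal sketch that does nothing beyond the division we just did by hand, using the made-up tallies above.

  def relative_risk(events_a, total_a, events_b, total_b):
      # Risk in each group is just events divided by total rolls;
      # the relative risk is one risk divided by the other.
      risk_a = events_a / total_a
      risk_b = events_b / total_b
      return risk_a / risk_b

  # Small die on top: 18 sixes in 100 rolls versus 15 in 100 rolls.
  print(relative_risk(18, 100, 15, 100))  # 1.2
  # Large die on top instead:
  print(relative_risk(15, 100, 18, 100))  # roughly 0.83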

Frankly, 20% sounds like pretty good odds, doesn't it? Yet, if you'll remember the part about confidence intervals (how much plus and minus and how sure are we that somewhere in there is the actual risk) you'll realize that we don't have nearly enough information to draw any conclusions. For instance, if we say we want a confidence interval at a 95% confidence level, we might find that our confidence interval tells us that the actual number is somewhere between, say, 0.9 and 1.5.

That means that, despite the reported number of RR = 1.2 (a 20% greater chance with the small die), the study really shows that we are 95% confident that the small die will throw somewhere between 10% fewer and 50% more sixes than the large die.

Did you catch that? Our study shows absolutely nothing at all, despite reported numbers of a 20% greater chance of throwing a six with a small die. The only conclusion that can be drawn from our study is this: a small die may or may not land with the number six face up more often than a large die. That's all we have.
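For completeness, here is one common way of putting a 95% confidence interval around a relative risk, the log-normal approximation. Run on our made-up tallies it gives an even wider interval than the illustrative 0.9 to 1.5 above, but the feature that matters is the same: the interval straddles 1.0, so the study shows nothing.

  import math

  def rr_confidence_interval(events_a, total_a, events_b, total_b, z=1.96):
      # Log-normal approximation: work on the log scale, then exponentiate.
      rr = (events_a / total_a) / (events_b / total_b)
      se = math.sqrt(1/events_a - 1/total_a + 1/events_b - 1/total_b)
      low = math.exp(math.log(rr) - z * se)
      high = math.exp(math.log(rr) + z * se)
      return rr, low, high

  # Small die: 18 sixes in 100 rolls; large die: 15 sixes in 100 rolls.
  print(rr_confidence_interval(18, 100, 15, 100))
  # Roughly (1.2, 0.64, 2.25): the interval includes 1.0, so no conclusion.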

But is this relative risk number overblown?

Relative risk is an interesting beast. Pharmaceutical companies just love to put numbers in terms of relative risk because the numbers are larger than absolute risk (AR). Those of you who feel comfortable with numbers and have been following along closely probably already noticed that the 20% increase we reported seems grossly inflated. The fact is, according to the numbers, you can expect to roll only three more sixes per hundred throws with a small die than with a large one. Without resorting to a calculator, I'd estimate that to be somewhere around 3%.

In other words, there is an increase in sixes of 3% (three out of one hundred rolls) with the small die. That means that in order to see the effect, statistically speaking that is, we'd need to roll around 33 times. If our game includes fewer rolls than that, we are probably not going to see any benefit in sixes from rolling the small die. In epidemiological circles, those 33 rolls are called the number needed to treat, or NNT.
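The same point in a couple of lines: the absolute difference and the "number needed to roll" (borrowing the NNT idea) fall straight out of the tallies.

  def absolute_difference_and_nnt(events_a, total_a, events_b, total_b):
      # Absolute difference between the two risks, and how many trials
      # it takes for that difference to add up to one whole extra event.
      diff = events_a / total_a - events_b / total_b
      return diff, 1 / diff

  # Small die versus large die: 18 versus 15 sixes per 100 rolls.
  print(absolute_difference_and_nnt(18, 100, 15, 100))
  # Roughly (0.03, 33.3): a 3-point difference, one extra six every 33 rolls or so.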

For a quick practical example, let's say a pharmaceutical company has put out a new blood pressure medicine called Premazine. [Author's note: Premazine is not a real drug. Any resemblance to a real drug is purely coincidental. The numbers quoted are for the purpose of illustration only and are not meant to represent and/or imply actual studies of the effectiveness of any real medication.] They gave this medicine to 500 people and gave a placebo to another 500. Let's say that out of the group that got the medicine, 25 showed a significant reduction in blood pressure. In the group that got the placebo, 15 showed a reduction in blood pressure.

That means that the medication seemed to have a positive effect on 5% of those who took it. Hardly a number that will get bottles of the stuff moving off the shelf, especially when you hear all of the potential side effects!

To get a number that will move some product, they will use the relative risk or, in this case, its flip side, the relative risk reduction. Out of the medicated group, 5% (.05) saw a reduction in blood pressure, while 3% (.03) of the placebo group saw a reduction just from thinking they took the medication. Just put the medicine number on top of the division and you get .05 / .03, a whopping 1.67, or 67%! Now there is a number a marketing department can sink its teeth into. The sales pitch will probably read something like this:

Clinical trials have shown that taking Premazine resulted in as much as a 67% reduction in high blood pressure over taking no medication at all.

Boy howdy, that sure sounds impressive. If you didn't know anything at all about statistics, you might even be impressed enough to beg your doctor to prescribe Premazine before you keel over and die. But before you go running to your doctor, stop and think about what that 67% really represents. Out of the 1,000 people in the study, there was only a 2% difference between how many got better taking the drug and how many got better on the placebo. That means that the number of people who have to take the drug before the statistics kick in and one of them gets a reduction in blood pressure he wouldn't have gotten anyway is 50 (1 divided by .02), our number needed to treat (NNT).

That's right. A 67% reduction in this case really means that out of 50 people taking the drug, only one will see a positive effect he wouldn't have seen anyway (statistically speaking), while the other 49 will see nothing but side effects, not the least of which is the lightening of their wallets.
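To tie the marketing number and the honest numbers together, here's a sketch run on the fictional Premazine figures. The same made-up data produce the 67% relative figure, the 2-point absolute figure, and the NNT of 50.

  def trial_summary(improved_drug, n_drug, improved_placebo, n_placebo):
      risk_drug = improved_drug / n_drug            # 25/500 = 0.05
      risk_placebo = improved_placebo / n_placebo   # 15/500 = 0.03
      relative = risk_drug / risk_placebo           # 1.67 -> "67% better"
      absolute = risk_drug - risk_placebo           # 0.02 -> 2 points
      nnt = 1 / absolute                            # 50 people per extra benefit
      return relative, absolute, nnt

  # The fictional Premazine trial from above.
  print(trial_summary(25, 500, 15, 500))  # roughly (1.67, 0.02, 50.0)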

Alright, now you know enough about epidemiological studies to look at some numbers and actually make sense of them. I'm sure quite a few people reading this never thought they could say that! Let's try out our new skills.

Oh wait, there is one more thing you should know. Despite some truly horrific studies by epidemiologists over the years, please don't think that all epidemiological studies are junk science (they aren't) or that these people have no standards. They do. They also have rules of thumb they go by to determine whether a study is likely to have any validity at all, and those rules will be helpful to us later on.

  1. A relative risk below 2 is unlikely to show a valid result, regardless of the confidence interval. A relative risk of at least 3 is preferred; 4 is best. Many peer-reviewed journals, like the New England Journal of Medicine, would not typically publish a study without a relative risk of at least 3.

  2. Confidence level should be 95%. A lower confidence level is a sign that someone is cooking the numbers to prove something that isn't there.

  3. If the confidence interval includes 1.0, as our dice study did (0.9 – 1.5), there is no statistical significance at all. We've got nothing.
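If it helps, those three rules of thumb boil down to a few lines of code. The thresholds below are exactly the ones listed above, nothing more; "passing" here only means a study clears the rough screen, not that it's good science.

  def passes_rules_of_thumb(rr, ci_low, ci_high, confidence_level):
      # Rule 2: the interval should be quoted at a 95% confidence level.
      if confidence_level < 0.95:
          return False
      # Rule 3: an interval that includes 1.0 shows nothing at all.
      if ci_low <= 1.0 <= ci_high:
          return False
      # Rule 1: a relative risk below 2 is unlikely to mean much.
      return rr >= 2

  # Our dice study: RR 1.2 with the illustrative interval of 0.9 to 1.5.
  print(passes_rules_of_thumb(1.2, 0.9, 1.5, 0.95))  # False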

Just to recap before moving on, here are the terms we have learned:

  • Biological plausibility

  • Confounders

  • Confidence interval / confidence level

  • Relative risk or risk ratio

  • Absolute risk

  • Number needed to treat

Next episode: The case against smoking (and mirrors)