Tuesday, October 5, 2010

Telephone surveys simply explained

[Comic: telephone surveys, simply explained]

Of course, this simplifies the issue, so let's honor it with a bit more detail.

(Note: I'm neither a specialist on statistics nor on telephone surveys. I'm only applying some common sense here.)

Sources of errors and biases


By and large, there are two kinds of errors that can distort survey results:
  • Errors due to parts of the population not being reachable by phone ("selection bias").
  • Errors due to people not answering honestly ("measurement error").


Selection bias

There are various reasons why people can't be reached. Some don't have phones at all. This includes relatively isolated parts of the population, such as indigenous tribes and Amish people, but also homeless people, children, and people with alternative lifestyles who don't use phones by choice. Others just don't pick up when they see unknown or suppressed numbers.

Most telephone surveys deal with this by simply extrapolating -- that is, by hoping that those who could not be reached would have answered similarly to those who did pick up. How well this approach works depends largely on how strongly the question is correlated with reachability. Ideally, there is no correlation at all.

For example, if the question is "Do you prefer strawberry or vanilla ice cream?", then the answer is quite likely not strongly linked to the possession of a phone. The percentage of homeless people who prefer vanilla is probably similar to the percentage amongst millionaires.

Asking "Do you own a phone?", on the other hand, is akin to asking "Are you asleep?". You will not get a [credible] "no" for an answer.

Most questions fall somewhere in the grey area between these extremes. For example, "Did you book a holiday on the Internet this year?" will yield distorted results, because a certain percentage of the people who did are probably away on that very holiday while the survey is being conducted.

Similarly, political opinion polls are likely to be distorted, because preference for a particular party is strongly tied to social group -- and so is reachability by phone. A party whose program favors young, educated people, for example, might underperform in such surveys, because young, educated people have mobile phones with caller ID and know how to put anonymous calls onto an ignore list.

Other kinds of surveys have similar issues with selection bias. For example, a survey conducted in a shopping mall by randomly approaching people might not exclude people without phones, but it preselects for people who go to shopping malls, and it additionally preselects against assertive people, who are more likely to refuse to take part in the survey.

Finally, the real selection-bias nightmare is surveys where people can sign themselves up to participate. This is something that many non-profit organizations suffer from. Not having the money for a professional survey, they often send out "Please take our latest survey" emails to friends and mailing-list subscribers -- a group whose opinion is usually very far from the population average on the topic at hand. It's a bit like asking your five best friends whether they like you, and then extrapolating that all the world loves you: good for your self-esteem, but not very realistic.

Measurement error

Measurement errors are simpler to explain: some people just don't answer honestly or correctly. Again, how much of a problem this is depends on the question. For some questions, people have no incentive to lie -- take, for example, the already mentioned vanilla-or-strawberry ice cream preference.

There are other questions, however, where people are much more likely to lie. "Do you cheat on your wife/husband?" is a classic one. But also certain political parties are generally underrated in pre-election polls because people are too embarrassed to admit that they vote for them. For example, we all know that nobody would ever vote for the FPÖ (except for those 10-20% that regularly do so at the elections, but miraculously never show up in any polls).

Besides intentional lying, there are also questions where people simply don't know the correct answer. Smokers, for example, tend to underestimate how much money they really spend on cigarettes, and few people can really tell how many hours per day they spend on the Internet or watching TV.

And sometimes people just don't understand the question. Ask enough people whether they have ever seen a phishing attack, and you will find some who have never heard of "phishing" before, hear "fishing" instead, and answer "no" because no, they have never seen a fish attack anyone.

Handling errors and biases


To handle these errors and biases, surveys can, for example, do the following things:

Estimating known biases

In some cases, comparing previous surveys to real data can indicate what biases to expect. For example, by comparing pre-election polls with election results, it's possible to see patterns in how real results differ from predictions. Once it is known that the aforementioned FPÖ is generally underestimated in surveys, it is possible to estimate how much this bias distorts the result and try to calculate it away.
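
To make that concrete, here is a minimal sketch of such a correction -- every number is made up for illustration, and real pollsters use considerably more sophisticated models:

```python
# A minimal bias-correction sketch; every number here is made up.
# Each pair is (polled percentage, actual election percentage) for a
# party that polls have historically underestimated.
history = [(7.0, 10.7), (12.0, 14.5), (9.5, 11.8)]

# Average how far past polls fell short of the real results.
bias = sum(actual - polled for polled, actual in history) / len(history)

new_poll = 8.0  # latest raw poll result, in percent
print(f"Estimated bias: {bias:+.1f} points")
print(f"Bias-corrected estimate: {new_poll + bias:.1f}%")
```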

Weighting sample to match demographics

As mentioned before, some groups of the population are underrepresented in the sample because they are less likely than others to be reachable by phone. Surveys that also ask for age, gender, and similar attributes can weight the answers so that the overall result better matches the distribution in the population. For example, if it is known that 30% of the population is older than 50, but only 10% of the people who took part in the survey are, then the answers of those 10% get more weight.
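
As a toy illustration, each respondent can be weighted by population share divided by sample share of their demographic group (the shares and answers below are hypothetical):

```python
# A toy post-stratification sketch; shares and answers are hypothetical.
population_share = {"over_50": 0.30, "under_50": 0.70}

# (age group, prefers vanilla?) for 10 hypothetical respondents:
# the over-50 group is underrepresented at 10% instead of 30%.
answers = [("over_50", True)] + [("under_50", True)] * 8 + [("under_50", False)]

n = len(answers)
sample_share = {g: sum(1 for grp, _ in answers if grp == g) / n
                for g in population_share}

# Underrepresented groups get weight > 1, overrepresented ones < 1.
weight = {g: population_share[g] / sample_share[g] for g in population_share}

weighted_yes = sum(weight[g] for g, vanilla in answers if vanilla)
total_weight = sum(weight[g] for g, _ in answers)
print(f"Weighted vanilla preference: {weighted_yes / total_weight:.1%}")
```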

Stratified random selection

When the proper distribution amongst certain population groups is crucial, the survey participants can be selected randomly per group instead of by randomly calling arbitrary phone numbers, and if necessary they can even be surveyed by different means. For example, when it's important that homeless people are not underrepresented -- say, because they are an important customer group for a discount store -- a survey might randomly select 200 phone numbers and in addition perform 100 random in-person interviews at a homeless shelter. Such a survey will still have other selection biases, but it can avoid those known to matter for this particular survey.
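
The selection step itself is simple; a minimal sketch, assuming we have a (hypothetical) phone-number frame and a (hypothetical) list of shelter visitors to draw from:

```python
# A stratified-selection sketch; both sampling frames are hypothetical.
import random

phone_frame = [f"+43 1 {i:07d}" for i in range(100_000)]  # imaginary numbers
shelter_frame = [f"visitor-{i}" for i in range(500)]      # imaginary visitors

phone_sample = random.sample(phone_frame, 200)      # to be surveyed by phone
shelter_sample = random.sample(shelter_frame, 100)  # to be surveyed in person

print(len(phone_sample), "phone interviews,",
      len(shelter_sample), "in-person interviews")
```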

Estimating result confidence


Even if there were no sampling biases and no measurement errors, there would still remain the problem that only a small fraction of the population was asked. So how much can asking 10 people really tell us about the average person?

Let's look at the simple example from above in detail. We want to know whether Austrians prefer strawberry or vanilla ice cream. We randomly choose 10 phone numbers and call them. 1 person likes strawberry and 9 prefer vanilla. For certain newspapers this would be enough evidence to claim that 90% of Austrians prefer vanilla ice cream. But what do we really know? The only thing we know for sure at this point is that 9 of the 8,000,000 Austrians like vanilla ice cream. Or, more precisely, that 9 Austrians say that they like vanilla ice cream.

The simple truth is that after calling n people, all we know for sure is how these n people answered.

Calling another n could change the whole result. The next 10 people might all like strawberry, and suddenly the preference for vanilla plummets from 90% down to 45%. And this is where probability comes in.

It is, theoretically, possible that we have accidentally called the only 9 Austrians who like vanilla ice cream. It is possible that all the other 7,999,991 Austrians hate it, and that the Austrian preference for vanilla ice cream is thus 0.0001125%. But how likely is it that with only 10 phone calls we really managed to reach these 9? Right. It's about as likely as calling 10 random phone numbers and reaching 9 attractive, single lottery millionaires, which is less likely than winning the lottery yourself, which is less likely than being struck by lightning.
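
For the curious, that worst-case scenario can be checked with a quick hypergeometric calculation (a back-of-the-envelope sketch, not anything a survey would actually run):

```python
# Chance of reaching exactly the 9 vanilla fans (plus 1 other person)
# with 10 random calls out of 8,000,000 Austrians.
from math import comb

N, K, n, k = 8_000_000, 9, 10, 9  # population, fans, calls, fans reached
p = comb(K, k) * comb(N - K, n - k) / comb(N, n)
print(f"Probability: {p:.2e}")    # on the order of 1e-56 -- astronomically unlikely
```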

I will spare you the mathematics, but the most likely explanation for the 9 out of 10 vanilla answers is that 90% of the Austrians prefer vanilla.
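
For those who don't want to be spared entirely, here is a small sketch that compares the binomial likelihood of observing "9 vanilla out of 10" under a few candidate population shares -- 90% does come out on top:

```python
# Compare how likely "9 vanilla answers out of 10 calls" is under a few
# assumed population shares; a share of 90% maximizes the likelihood.
from math import comb

k, n = 9, 10
for share in (0.5, 0.7, 0.9, 0.99):
    likelihood = comb(n, k) * share**k * (1 - share) ** (n - k)
    print(f"P(9/10 | share={share:.0%}) = {likelihood:.4f}")
```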

Since we only called 10 people, though, the result is not very reliable. If we call 10,000 people and 9,000 of them say "vanilla", we would still guess that 90% of Austrians prefer vanilla, but we would be much more confident about our estimate.

Professional surveys will therefore state a margin of error, which expresses how reliable the results are: it gives the range within which the real result lies with a high probability, usually 95% or 99%. And here the problems start again, because it's not possible to calculate that range without making even more assumptions. For example, do you assume that every result between 0 and 8,000,000 Austrians preferring vanilla is equally likely, or do you assume that roughly half of the Austrians preferring vanilla and the other half strawberry is much more likely than nobody liking strawberry?

The truth is that even the estimate can only be estimated. It can be estimated relatively well, though, and it always holds that the larger the sample size, the more reliable the result. So both with 9 out of 10 and with 9,000 out of 10,000 answers in favor of vanilla ice cream, we estimate the real result to be "around" 90%, but with different confidence: with 9 out of 10 answers for vanilla, the real result lies with a probability of roughly 99% between 50% and 100%. With 9,000 out of 10,000 answers, it lies with a probability of roughly 99% between 88% and 92%.
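
One common way to compute such a range is the Wilson score interval -- my choice for illustration here, not necessarily what any given pollster uses. A small sketch that roughly reproduces the numbers above:

```python
# Wilson score interval at 99% confidence (z ~ 2.576); one of several
# ways to compute such a range, chosen here purely for illustration.
from math import sqrt

def wilson_interval(successes, n, z=2.576):
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin

for k, n in [(9, 10), (9000, 10000)]:
    low, high = wilson_interval(k, n)
    print(f"{k}/{n}: roughly {low:.0%} to {high:.0%} at 99% confidence")
```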

Conclusion


Surveys can have their uses, but they aren't the absolute truth and should be taken with a grain of salt. At the end of the day, the only thing they tell us for sure is how the people who were called have answered.

-- Birgit

9 comments:

  1. "But also certain political parties are generally underrated in pre-election polls because people are too embarrassed to admit that they vote for them. For example, we all know that nobody would ever vote for the FPÖ (except for those 10-20% that regularly do so at the elections, but miraculously never show up in any polls)."

    Actually, that hasn't been true for a long, long time.

  2. It's exaggerated, yes, but even at the recent elections in Styria the FPÖ was around 7-10% in most pre-election surveys, and ended up with 10.66%.

  3. Nope, the voting percentage was centered around 8–10% for the FPÖ, and the trend was clearly towards the higher end of that interval as the campaign drew to a close; see http://neuwal.com/index.php/wahlumfragen/wahlumfragen-steiermark/ .

  4. That "trend" is well within the confidence interval of those surveys. But well. If it makes you happy, I will admit that the FPÖ might have been a bad example -- obviously their voters at least own up when asked by professional pollsters. I haven't personally met any confessing FPÖ voters yet, though.

    All that aside: Of all the things you could possibly have said about this posting -- like "Cool, you draw comic-like things now?" or a simple "Great article!" --, you go after the one tiny little bad example? ;)

  5. If you start throwing around the confidence interval, you should also have realized that the result is well within the confidence interval of the last polls. ;p

    I have met quite a few FPÖ voters, though not of my own volition (when I was doing my Zivildienst at the Sozialamt, one of the trainees was quite clearly an FPÖ supporter). Yeah, the reverse Bradley effect/Shy Tory effect seems to have vanished for the FPÖ quite a few years ago, probably around 2000 if not before that.

    Besides that: Cool comic and great article. ;)

  6. I'm sure we could continue this discussion ad nauseam if I now pointed out that 1) the confidence interval of the _sum_ of those surveys is a lot smaller, so the final result is not within it, whereas 2) the "trend" is within the margin of error of the individual surveys, not to mention that 3) two of the surveys that show this alleged trend were performed on the same day.

    My latest reference point is the NRW 1999, and sadly the effect wasn't visible even back then, otherwise I would already have rubbed your nose in it. ;)

    Thanks, I'm feeling better now. ;)

  7. I just wrote you a lengthy reply, but the fucking blog software ate it, so let's just leave it at "you're still wrong". ;)

  8. I'll save the time it would take to write an equally long reply, and just say "and so are you". ;)

  9. This comment has been removed by a blog administrator.
