question everything: October 2010

Monday, October 18, 2010

Pac-Man

Proudly presenting: a little short film in which I, by pure chance, got the opportunity to "play" along (in the truest sense of the word):

More information and a rather nice making-of report can be found here:
http://www.notsonoisy.com/pac-man/

All in all, a Saturday morning well spent. :-)

kR Birgit

Saturday, October 9, 2010

Series title	Better known as...
Der Bulle von Tölz	Mama!
Close to home	Finding the murderer is only the beginning -- getting him convicted in court is the real challenge!
The Closer	Affected prima donna who, in between some contrived private problems, happens to solve the occasional case.
Cold Case	Didn't we already question him and let him go twice today?
Columbo	Just one more thing...
Criminal Intent	Bad cop -- completely unhinged, rule-ignoring and, on top of that, unable-to-act cop?
Criminal Minds	Isomorphic to Navy CIS, with Dr. Reid instead of McGee and Garcia instead of Abby.
CSI	A collection of the wishful dreams of a scientist or forensic investigator.
CSI: Miami	Horatio is always right.
CSI: New York	Yet another knock-off of CSI.
Dexter	A brutal murderer is on the loose in this city! I mean, a second brutal murderer...
Doppelter Einsatz	Blondie and the femme fatale
Ein Fall für zwei	Matula gets beaten up at least once per episode, and the prime suspect defended by his employer is always innocent.
House	The disease could actually have been cured with steroids; only our treatment attempts nearly killed the patient.
Law & Order: Special Victims Unit	Half of the perpetrators can't be convicted because of mental illness, and the other half get a far too lenient plea bargain.
The Mentalist	Oh, by the way, a psychopath brutally murdered my wife and my children. And how was your day?
Monk	Here's what happened: the pens aren't lined up exactly. Wipe!
Mord ist ihr Hobby	Oh, how dreadful! In one of my mystery novels, there...
Navy CIS	No matter what happens, Abby finds it exciting, and McGee is the scapegoat for everything.
Navy CIS: L.A.	Could be cut down to the scenes with Hetty without losing anything essential.
Numb3rs	Particularly intelligent criminals always take the most random of all random routes.
Pfarrer Braun	Why does Geiger always get transferred at the same time and to the same place?
Die Rosenheim-Cops	Whatever happens in an episode -- it's directly connected to the case.
SOKO Kitzbühel	Papa!
Tatort	Chance of catching one of the interesting teams: 2/17
Vier Frauen und ein Todesfall	Well, I don't reckon that was an accident.
Wilsberg	I only borrowed your car and wrecked it, don't get so worked up...
Without a trace	Likeable pair of cops who single-handedly make up for the lack of human flaws in all the other series.

The Collatz Rule (for informatics professors): "Before telling your students about the Collatz Conjecture, make sure you don't need the computer lab for anything else within the next two months."

-- Birgit

Always ready

Let everyone, champions of emancipation and their opponents alike, including all their female counterparts, savor the following slogan of a women's(!) magazine:

fem.
frauen können immer (roughly: "women always can")
[1]

-- Birgit

[1] http://www.fem.com/index.html

Tuesday, October 5, 2010

Telephone surveys simply explained

Of course, this oversimplifies the issue, so let's do it justice with a bit more detail.

(Note: I'm neither a specialist on statistics nor on telephone surveys. I'm only applying some common sense here.)

Sources of errors and biases

By and large there are two kinds of errors that can distort survey results:

Errors due to parts of the population not being reachable by phone ("selection bias").
Errors due to people not answering honestly ("measurement error").

Selection bias

There are various reasons why people can't be reached. Some don't have phones. This includes relatively isolated parts of the population, such as indigenous tribes and Amish people, but also homeless people, children, and people with alternative lifestyles who don't use phones by choice. Others just don't pick up when they see unknown or suppressed numbers.

Most telephone surveys deal with this by simply extrapolating, that is, hoping that those who could not be reached would have answered similarly to those who did pick up. How well this works depends largely on how strongly the answer to the question is correlated with reachability. Ideally, there is no correlation at all.

For example, if the question is "Do you prefer strawberry or vanilla ice cream?", the answer is quite likely not strongly linked to owning a phone. The percentage of homeless people who prefer vanilla is probably similar to the percentage among millionaires.

Asking "Do you own a phone?", on the other hand, is akin to asking "Are you asleep?". You will not get a [credible] "no" for an answer.

Most questions are somewhere in the grey area between these extremes. For example, "Did you book a holiday on the Internet this year?" yields distorted results, because a certain percentage of those who did are probably on that very holiday when the survey is conducted.

Similarly, political opinion polls are likely to be distorted, because preference for a particular party is strongly tied to social group -- and so is reachability by phone. A party whose program favors young, educated people, for example, might underperform in such surveys, because young, educated people have mobile phones with caller ID and know how to put anonymous calls on an ignore list.
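A small simulation can illustrate how the correlation between answer and reachability drives this bias. This is a hypothetical toy model, not taken from any real survey: each called person answers "yes" with probability `share_yes` and picks up the phone with probability `reach_yes` or `reach_no`, depending on their answer. The function name and all numbers are made up for illustration.

```python
import random

def simulate_phone_survey(share_yes, reach_yes, reach_no, calls, seed=1):
    """Hypothetical toy model: estimate the share of "yes" answers from a phone
    survey in which the chance of picking up may depend on the answer itself."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    answers = []
    for _ in range(calls):
        says_yes = rng.random() < share_yes  # a random person from the population
        pickup_probability = reach_yes if says_yes else reach_no
        if rng.random() < pickup_probability:  # only people who pick up are counted
            answers.append(says_yes)
    return sum(answers) / len(answers)

# Uncorrelated: everybody picks up half of the time, so the estimate lands near the true 30%.
print(simulate_phone_survey(0.3, reach_yes=0.5, reach_no=0.5, calls=100_000))
# Correlated: "yes" people pick up far less often, so the estimate lands near
# 0.3 * 0.2 / (0.3 * 0.2 + 0.7 * 0.6) = 0.125, although the true share is still 30%.
print(simulate_phone_survey(0.3, reach_yes=0.2, reach_no=0.6, calls=100_000))
```

Making more calls does not help in the correlated case: the estimate converges, but to the wrong value. That is why extrapolation only works when the question is unrelated to reachability.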

Other kinds of surveys have similar issues with selection bias. For example, a survey conducted in a shopping mall by randomly approaching people might not exclude people without phones, but it preselects for "people who go to shopping malls", and it is additionally biased against assertive people, who are more likely to refuse to take part.

Finally, the real selection-bias nightmare is surveys in which people can sign themselves up to participate. Many non-profit organizations suffer from this. Lacking the money for a professional survey, they often send out "Please take our latest survey" emails to friends and mailing list subscribers, a group whose opinions on the topic at hand are usually very far from average. It's a bit like asking your five best friends whether they like you, and then concluding that the whole world loves you: good for your self-esteem, but not very realistic.

Measurement error

Measurement errors are simpler to explain: Some people just don't answer honestly or correctly. Again, how much of a problem this is depends on the question. For some questions, people don't have any incentive to lie. Take for example the already mentioned vanilla or strawberry ice cream preference.

There are other questions, however, where people are much more likely to lie. "Do you cheat on your wife/husband?" is a classic. Certain political parties are also generally underrated in pre-election polls because people are too embarrassed to admit that they vote for them. For example, we all know that nobody would ever vote for the FPÖ (except for the 10-20% who regularly do so at elections, but miraculously never show up in any polls).

Besides intentional lying, there are also questions where people simply don't know the correct answer. Smokers, for example, tend to underestimate how much money they really spend on cigarettes, and few people can really tell how many hours per day they spend on the Internet or watching TV.

And sometimes people just don't understand the question. Ask enough people whether they have ever seen a phishing attack, and you will find some who have never heard of "phishing" before, hear "fishing" instead, and answer "no" because no, they have never seen a fish attack anyone.

Handling errors and biases

To handle these errors and biases, surveys can, for example, do the following:

Estimating known biases

In some cases, comparing previous surveys with real data can indicate what biases to expect. For example, by comparing pre-election polls with election results, it's possible to see patterns in how real results differ from predictions. Once it is known that the aforementioned FPÖ is generally underestimated in surveys, it is possible to estimate how much the bias distorts the result and try to calculate it away.
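One simple way to "calculate the bias away" is to add the average historical polling error to a new poll result. This is a minimal sketch assuming a constant, additive bias; the function names and all numbers are hypothetical (not real FPÖ polling data), and real pollsters typically use more elaborate models.

```python
def average_poll_error(past_polls, past_results):
    """Average difference (in percentage points) between election results and polls."""
    differences = [result - poll for poll, result in zip(past_polls, past_results)]
    return sum(differences) / len(differences)

def corrected_poll(current_poll, past_polls, past_results):
    """Shift the current poll by the average historical underestimation."""
    return current_poll + average_poll_error(past_polls, past_results)

# Hypothetical: the party was polled at 12%, 15% and 17%, but got 14%, 18% and 21%.
print(corrected_poll(16, [12, 15, 17], [14, 18, 21]))  # 19.0
```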

Weighting sample to match demographics

As mentioned before, some groups of the population are underrepresented in these samples because they are less likely to be reachable by phone than others. Surveys that also ask for age, gender and similar attributes can weight the answers so that the overall result better matches the distribution in the population. For example, if it is known that 30% of the population are older than 50, but only 10% of the survey participants are, then each answer from that group gets three times the weight (30% / 10%), and the answers from everybody else get correspondingly less (70% / 90%).
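Weighting by demographics can be sketched as follows. Each answer is weighted by the ratio of its group's share in the population to its share in the sample. The group labels and answers are hypothetical; only the 30%/10% split comes from the example above.

```python
from collections import Counter

def weighted_share(sample, population_shares):
    """sample: list of (group, answered_yes) pairs.
    population_shares: group -> share of that group in the whole population."""
    n = len(sample)
    sample_counts = Counter(group for group, _ in sample)
    # Weight per group = population share / sample share.
    weights = {group: population_shares[group] / (count / n)
               for group, count in sample_counts.items()}
    total_weight = sum(weights[group] for group, _ in sample)
    yes_weight = sum(weights[group] for group, yes in sample if yes)
    return yes_weight / total_weight

# Hypothetical sample of 100: only 10 people over 50 (8 say yes), 90 others (18 say yes).
sample = ([("over 50", True)] * 8 + [("over 50", False)] * 2
          + [("50 or younger", True)] * 18 + [("50 or younger", False)] * 72)
shares = {"over 50": 0.3, "50 or younger": 0.7}
print(weighted_share(sample, shares))  # about 0.38, while the unweighted share is 0.26
```

The weighted result equals 0.3 × 80% + 0.7 × 20%, that is, each group's own answer rate counted with the group's real share of the population.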

Stratified random selection

When a proper distribution among certain population groups is crucial, the survey participants can be selected randomly per group instead of by randomly calling phone numbers, and if necessary they can even be surveyed by different means. For example, when it's important that homeless people are not underrepresented (for a discount store, say, they might be an important customer group), a survey might randomly select 200 phone numbers and additionally conduct 100 random in-person interviews at a homeless shelter. This will have other selection biases, but it can avoid those that are known to be important to that particular survey.
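With stratified selection, each group is surveyed separately, and the group results are then combined according to each group's share of the target population rather than by simply pooling all answers. In this sketch, the 200/100 split comes from the example above; the population shares (90% reachable by phone, 10% homeless) and the answer counts are hypothetical.

```python
def stratified_estimate(strata):
    """strata: list of (population_share, yes_answers, sample_size), one per group."""
    if abs(sum(share for share, _, _ in strata) - 1.0) > 1e-9:
        raise ValueError("population shares must add up to 1")
    # Each group's own result, weighted by how large the group is in the population.
    return sum(share * yes / size for share, yes, size in strata)

strata = [
    (0.9, 60, 200),  # phone survey: 60 of 200 say yes
    (0.1, 70, 100),  # homeless shelter: 70 of 100 say yes
]
print(stratified_estimate(strata))  # about 0.34
```

Simply pooling all 300 answers would give 130/300, about 43%, because the shelter group makes up a third of the sample but, by assumption, only a tenth of the population.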

Estimating result confidence

Even if there were no sampling biases and no measurement errors, there would still remain the problem that only a small fraction of the population was asked. So how much can asking 10 people really tell us about the average person?

Let's look at the simple example from above in detail. We want to know whether Austrians prefer strawberry or vanilla ice cream. We randomly choose 10 phone numbers and call them. One person likes strawberry and 9 prefer vanilla. For certain newspapers, this would be enough evidence for claiming that 90% of Austrians prefer vanilla ice cream. But what do we really know? The only thing we know for sure at this point is that 9 of the 8000000 Austrians like vanilla ice cream. Or, more precisely, that 9 Austrians say that they like vanilla ice cream.

The simple truth is that after calling n people, all we know for sure is how these n people answered.

Calling another n people could change the whole result. The next 10 people might all like strawberry, and suddenly the preference for vanilla plummets from 90% to 45%. And this is where probability comes in.

It is theoretically possible that we have accidentally called the only 9 Austrians who like vanilla ice cream. It is possible that all the other 7999991 Austrians hate it, and that the Austrian preference for vanilla ice cream is thus a mere 0.0001125%. But how likely is it that with only 10 phone calls we really managed to reach these 9? Right. It's about as likely as calling 10 random phone numbers and reaching 9 attractive, single lottery millionaires, which is less likely than winning the lottery yourself, which in turn is less likely than being struck by lightning.
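How unlikely this scenario is can be computed with the hypergeometric distribution. The sketch assumes, purely for illustration, that the 10 calls reach 10 distinct Austrians chosen uniformly at random from 8000000, exactly 9 of whom like vanilla. The function name is hypothetical.

```python
from math import comb

def prob_exact_hits(population, special, calls, hits):
    """Hypergeometric probability that `calls` distinct, randomly chosen people
    include exactly `hits` of the `special` ones."""
    favourable = comb(special, hits) * comb(population - special, calls - hits)
    return favourable / comb(population, calls)  # exact integers, one final division

# All 9 vanilla lovers among 10 random calls out of 8000000 Austrians.
p = prob_exact_hits(8_000_000, 9, 10, 9)
print(p)  # on the order of 1e-56
```

Python's integers are arbitrary-precision, so the huge binomial coefficients are computed exactly and only the final ratio is rounded to a float.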

I will spare you the mathematics, but the most likely explanation for the 9 out of 10 vanilla answers is that 90% of the Austrians prefer vanilla.
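The mathematics behind this statement is the maximum likelihood principle: among all possible shares p of vanilla fans, pick the one under which the observed 9 out of 10 answers are most probable. A brute-force sketch, assuming independent random calls so that a binomial model applies:

```python
from math import comb

def likelihood(p, yes, n):
    """Binomial probability of observing `yes` vanilla answers out of `n` calls
    if the true share of vanilla fans is `p`."""
    return comb(n, yes) * p**yes * (1 - p)**(n - yes)

# Try every share from 0% to 100% in steps of 0.1 percentage points.
candidates = [i / 1000 for i in range(1001)]
best = max(candidates, key=lambda p: likelihood(p, 9, 10))
print(best)  # 0.9
```

Setting the derivative of the likelihood to zero shows that the maximum always lies at yes / n, so the answer is also 90% for 9000 out of 10000. This brute-force version is only meant for small samples; for 10000 calls, the numbers overflow or underflow ordinary floating point, and one would compare log-likelihoods instead.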

Since we only called 10 people, though, the result is not very reliable. If we call 10000 people and 9000 of them say "vanilla", we would still guess that 90% of Austrians prefer vanilla, but we would be much more confident about our estimate.

Professional surveys therefore state a margin of error, which shows how reliable the results are by giving the range within which the real result lies with high probability, usually 95% or 99%. And here the problems start again, because it's not possible to calculate that range without making even more assumptions. For example, do you assume that any result between 0 and 8000000 Austrians preferring vanilla is equally likely, or do you assume that a roughly even split between vanilla and strawberry is much more likely than nobody liking strawberry?
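The influence of such assumptions can be made concrete with a Bayesian calculation. This sketch is one possible way to formalize the two assumptions, not how any particular survey institute does it: a uniform Beta(1, 1) prior stands for "every share is equally likely", and a Beta(50, 50) prior (an arbitrary, hypothetical choice) stands for "a roughly even split is much more likely". With a binomial model, the posterior is again a Beta distribution, whose mean is easy to compute.

```python
def posterior_mean(yes, n, prior_a, prior_b):
    """Mean of the Beta(prior_a + yes, prior_b + n - yes) posterior for the share
    of vanilla fans, after `yes` vanilla answers out of `n` (Beta-binomial model)."""
    return (prior_a + yes) / (prior_a + prior_b + n)

for yes, n in [(9, 10), (9000, 10000)]:
    uniform = posterior_mean(yes, n, 1, 1)       # "any result is equally likely"
    even_split = posterior_mean(yes, n, 50, 50)  # "about half/half is more likely"
    print(f"{yes}/{n}: uniform prior {uniform:.3f}, even-split prior {even_split:.3f}")
```

With 10 answers, the two assumptions lead to very different estimates (about 83% versus 54%); with 10000 answers, they almost agree (about 90.0% versus 89.6%).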

The truth is that even the estimate can only be estimated. It can be estimated relatively well, though, and it always holds that the larger the sample size, the higher the reliability of the result. So both with 9 out of 10 and with 9000 out of 10000 answers in favor of vanilla ice cream, we estimate the real result to be "around" 90%, but with different confidence levels: with 9 out of 10 answers for vanilla, the real result lies, with a probability of roughly 99%, between 50% and 100%. With 9000 out of 10000 answers, it lies, with a probability of roughly 99%, between about 89% and 91%.
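The post does not say which method produces such ranges. One common choice is the Wilson score interval; the sketch below implements it with Python's standard library (`statistics.NormalDist` requires Python 3.8+). Other methods, such as the exact Clopper-Pearson interval, give somewhat different ranges, especially for small samples.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(yes, n, confidence=0.99):
    """Wilson score interval for a proportion, from `yes` answers out of `n`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 2.576 for 99%
    p_hat = yes / n
    denominator = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denominator
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denominator
    return center - half_width, center + half_width

for yes, n in [(9, 10), (9000, 10000)]:
    low, high = wilson_interval(yes, n)
    print(f"{yes}/{n}: {low:.1%} to {high:.1%}")
```

For 9 out of 10, this gives roughly 49% to 99%, and for 9000 out of 10000 roughly 89.2% to 90.7%.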

Conclusion

Surveys can have their uses, but they aren't the absolute truth and should be taken with a grain of salt. At the end of the day, the only thing they tell us for sure is how the people who were called have answered.

-- Birgit

Hahaha

... or was it "Hahahahaha"?

kR Birgit

Monday, October 4, 2010

srorriM

Insight of the day: "Mirrors don't flip left and right, you know. They flip up and down, but most mirrors are turned sideways because otherwise it looks weird."

kR Birgit

Note

All articles in this blog are personal opinions and do not claim complete compliance with the principles of journalistic diligence. Information about current or historical topics is whatever I picked up while listening, sometimes rather inattentively, to news and history lessons, and occasionally read up on Wikipedia, and may therefore be incorrect at times.

Articles that are available in both German and English are not necessarily literal translations, but they will in general convey the same meaning.
