Friday, December 04, 2015

Common Sense solution to the Monty Hall question - a "hotly debated" Math (Probability) topic

The Monty Hall problem keeps popping back up recently, usually with a catchy headline like "even geniuses/PhDs got it wrong".

Some people made their names by sounding as if they were smarter than the smartest mathematicians, claiming "they were all wrong" (e.g. here: The Time Everyone "Corrected" the World's Smartest Woman, and here).

Is she really "smarter than the smartest"? I would call it an intellectual scam. Here is why
(according to Wikipedia):
"The behavior of the host is key to the 2/3 solution. Ambiguities in the "Parade" version do not explicitly define the protocol of the host. However, Marilyn vos Savant's solution (vos Savant 1990a) printed alongside Whitaker's question implies "

So her solution implied something that the question did not explicitly state.

OK, according to Wikipedia, she later clarified her solution in a follow-up column (which made her look like the one who was right). However, the damage was already done: most people had argued against what she "implied" about her originally posted question, which is still available on her website:
Suppose you’re on a game show, and you’re given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door, say #1, and the host, who knows what’s behind the doors, opens another door, say #3, which has a goat. He says to you, "Do you want to pick door #2?" Is it to your advantage to switch your choice of doors?
Craig F. Whitaker
Columbia, Maryland

Taken literally, without assuming any information beyond the text, this particular question should be answered "I don't know", as most of the mathematicians with PhDs argued (see the host behavior section of the Wikipedia page). But Marilyn's solution has its own merit.

What does that mean? Why did most mathematicians get it "wrong"?
Well, there is a thing called a mathematical model: basically, the process of expressing a question as mathematical equations. Here is what went wrong. Marilyn built a mathematical model that "implied" the player KNOWS the host will always open a door with a goat. The other PhDs did not build this into their model.

That's why Paul Erdös kept asking for a "common sense" solution. He was able to figure it out, but unfortunately, for whatever reason, Andrew Vazsonyi refused to understand it. Instead, he stuck with his decision-tree approach, which means he stuck with his mathematical model without asking whether that model correctly reflected the actual question.

So those "smarter than the smartest" people have been assuming information that was not explicitly provided in the original text, and they even built computer simulations with this assumption baked in!

Since Paul Erdös' common-sense solution was lost on Andrew Vazsonyi, here is my common-sense explanation, and why I call Marilyn's posts an intellectual scam.

Here is why "smart people" get it "wrong" (that is, understand it differently from Marilyn):
There are 3 doors, and one of them has a car behind it. You don't know which one, so you choose one at random, which gives you a 1/3 chance of winning the car (no dispute here!). Now the host opens a door, and there are the following possibilities:
1) you selected the car; the host opens a door to a goat (either one) - 1/3 chance
2) you selected one of the 2 goats; the host opens the other (only remaining) goat and leaves the car closed - 1/3 chance
3) you selected one of the 2 goats; the host opens the car, proving you picked a goat, and leaves the other goat closed - 1/3 chance

OK, here comes the interesting part:
According to the question text, the host opens another door and shows you a goat. Was that by chance? IF he opened the goat door by chance, which means you were lucky not to be in scenario 3) above, then the 2 remaining options are 50:50 and the PhDs are correct: switching (which wins in scenario 2) or staying (which wins in scenario 1) makes no difference.
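Here is a quick way to check the "by chance" reading. The Python sketch below enumerates every case exactly, assuming (my assumption; the question text doesn't say) that you pick door 0 and the host opens one of the other two doors uniformly at random, without looking. We then keep only the games where a goat was revealed, i.e. we drop scenario 3).

```python
from fractions import Fraction

def random_host_outcomes():
    """Enumerate every equally likely (car, host_door) pair when the player
    picks door 0 and the host opens door 1 or 2 uniformly at random,
    without using any knowledge of where the car is."""
    pick = 0
    p_goat_shown = Fraction(0)
    p_stay_wins_and_goat = Fraction(0)
    p_switch_wins_and_goat = Fraction(0)
    for car in range(3):                    # car placed uniformly: 1/3 each
        for host in (1, 2):                 # host picks blindly: 1/2 each
            p = Fraction(1, 3) * Fraction(1, 2)
            if host == car:                 # scenario 3): the car is revealed
                continue
            p_goat_shown += p
            if car == pick:                 # scenario 1): staying wins
                p_stay_wins_and_goat += p
            else:                           # scenario 2): switching wins
                p_switch_wins_and_goat += p
    # Condition on "a goat was shown" by dividing by its probability.
    return (p_goat_shown,
            p_stay_wins_and_goat / p_goat_shown,
            p_switch_wins_and_goat / p_goat_shown)

goat_shown, stay, switch = random_host_outcomes()
print(goat_shown, stay, switch)  # 2/3 1/2 1/2
```

Under this blind-host assumption, a goat shows up 2/3 of the time, and once it has, staying and switching each win half the time.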

Then how could Marilyn prove she was correct? Well, she proved her solution (switch) is the best solution for the math model she built. HOWEVER, that model is NOT an exact reflection of the actual text (here, in the explanation of her solution, she added "...and the host always opens a loser. "). This is the key to her "scam": this "always" was not given in the original question. Mathematically, it means she added conditions and restrictions and changed the actual model.

What she added, "the host always opens a loser", basically eliminates possibility 3) above!!! (Right! It means that between 2) and 3), the host will always use his knowledge to rule out 3) and leave you with scenario 2) only.) IF you KNOW he is forced by rule to do this (help you eliminate 3), then switching is the best strategy: in the 2/3 of cases where your first choice was wrong, the host is forced to take away the half of the possibilities where he would have revealed the car (scenario 3) before he asks whether you want to switch. So Marilyn's solution to her mathematical model is correct, and she can prove it, even with computer simulations. However, this rule was not given in the original question text.

So here is a common-sense description of a properly constructed version of the question that Marilyn solved:
a) a car is randomly placed behind 1 of 3 doors, the other two hide a goat each;
b) you have zero information to begin with other than a);
c) you choose one of the 3 doors; this is round 1;
d) the host must then open one of the other two doors, with a goat behind it (he is forced by rule to do so: he is not allowed to open the door with the car behind it, he cannot skip opening a door either, and you KNOW this rule); this is round 2;
e) you are then asked "switch or stay", this is round 3.
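Rules a) to e) are precise enough to simulate directly. Here is a minimal Monte Carlo sketch in Python; the function names, the number of games and the random seed are arbitrary choices for illustration. The key line is the host's move in round 2: he may only choose among goat doors you did not pick.

```python
import random

def play_forced_host(rng: random.Random, switch: bool) -> bool:
    """One game under rules a)-e): the host MUST open a goat door
    the player did not pick. Returns True if the player wins the car."""
    car = rng.randrange(3)                    # a) car placed at random
    pick = rng.randrange(3)                   # c) round 1: player picks
    # d) round 2: host opens a goat door other than the player's pick
    goat_doors = [d for d in range(3) if d != pick and d != car]
    opened = rng.choice(goat_doors)
    if switch:                                # e) round 3: switch or stay
        pick = next(d for d in range(3) if d not in (pick, opened))
    return pick == car

def win_rate(switch: bool, games: int = 100_000, seed: int = 1) -> float:
    rng = random.Random(seed)
    wins = sum(play_forced_host(rng, switch) for _ in range(games))
    return wins / games

print(f"stay:   {win_rate(switch=False):.3f}")   # close to 1/3
print(f"switch: {win_rate(switch=True):.3f}")    # close to 2/3
```

Note that the 2/3 comes from the `goat_doors` line in round 2: the simulation only "proves" what that rule encodes.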

And here is the common-sense solution:
1) For round 1 above, do you agree you have a 1/3 chance of winning and a 2/3 chance of losing? (Of course; no one denies that, right?)
2) Now comes the interesting part. Because "the host always opens a loser", IF your first selection was a loser, the host is FORCED to open the other loser. (By now both you and the host are holding a loser door, but there are only 2 losers, so the leftover door must be the winner.) It does not matter which loser you first selected: by the rule, the host has to eliminate the other loser anyway, so whichever of the 2 losers you pick, he must remove the other one and leave you with the winner.
3) So now, if I ask "how likely is it that your first-round choice was correct?", the answer is of course 1/3; "how likely is it that your first-round choice was wrong?", of course 2/3.
4) Because the host is forced by rule to eliminate one loser for you, if you were wrong to begin with, the only door left is the winner. So you should always choose "switch", which amounts to betting "I was wrong in the first round".

You ask why? Still don't get it? As mentioned before, the host is forced by rule to eliminate one of the 3 possibilities (the 1/3 chance where he opens the door showing the car). If you know this rule, here are the possibilities (let's say you select door #1):
1) the car is behind door #1; the host can freely open either #2 or #3, so he is not helping you - 1/3 chance;
2) the car is behind door #2; the host is FORCED by rule to open #3, so if you switch, you get #2, the winner - 1/3 chance;
3) the car is behind door #3; the host is FORCED by rule to open #2, so if you switch, you get #3, the winner - 1/3 chance;

See, if you stay, your only hope is that it turns out to be case 1), a 1/3 chance; however, if it is either 2) or 3) above, the host is forced to point you to the winner by opening the other loser. So if you switch, you win in two of these 1/3 cases: 2/3 in total.
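The three cases in the list above can also be counted exactly instead of simulated. This Python sketch uses the same setup (you pick door #1, and the host must open a goat door you didn't pick) with exact fractions; it is just an illustration of that list, under the "forced host" rule.

```python
from fractions import Fraction

DOORS = (1, 2, 3)
pick = 1  # "let's say you select door #1"

p_stay = Fraction(0)
p_switch = Fraction(0)
for car in DOORS:                  # cases 1), 2), 3): each has probability 1/3
    p = Fraction(1, 3)
    # Doors the rule allows the host to open: not your pick, not the car.
    allowed = [d for d in DOORS if d not in (pick, car)]
    # In case 1) he has two options; either way switching lands on a goat,
    # so taking the first one does not change the result.
    opened = allowed[0]
    switch_to = next(d for d in DOORS if d not in (pick, opened))
    if car == pick:
        p_stay += p
    if switch_to == car:
        p_switch += p
    print(f"car #{car}: host may open {allowed}, switching goes to #{switch_to}")

print("stay wins:", p_stay)      # 1/3
print("switch wins:", p_switch)  # 2/3
```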

This ONLY works the way Marilyn solved it IF the host is forced by rule to always open a loser, and you KNOW this rule. Her mathematical model assumed this very specifically, and so did her computer simulations: because the computer model was programmed to "always open a loser", the simulation results supported her claim.

Why is it a scam? Because the original question did NOT specify that "the host has to open a door with a goat", and adding that rule in essence turns round 3 into a different question. Most mathematicians and PhDs actually used common sense and thought about it like this (without assuming the player knows a rule that forces the host to always open a goat door): if you do NOT KNOW that opening a goat door was a rule of the game (instead, you think you were lucky that he did not open the door with the car and prove you wrong in round 2), then there is no reason to switch in round 3.

There has been comprehensive discussion of host behavior and why he opens a door with a goat. For example, one variation says the host only opens a goat door if he knows your first pick was correct; otherwise he skips round 2 and asks whether you want to switch without opening any door. In that variation, whenever he does open a goat door, switching is a sure losing strategy. Of course, Marilyn's math model was not built this way.
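For contrast, here is that variation as a small exact enumeration in Python, assuming the car is placed uniformly at random and you pick door 0. `variant_host` is a hypothetical name for this host's rule, not anything from the sources above.

```python
from fractions import Fraction

def variant_host(pick: int, car: int):
    """Hypothetical host from the variation: he opens a goat door only
    when the player's first pick is the car; otherwise he opens nothing.
    Returns the opened door, or None when round 2 is skipped."""
    if pick == car:
        return next(d for d in range(3) if d != pick)   # any goat door
    return None

pick = 0
p_opened = Fraction(0)
p_switch_wins_and_opened = Fraction(0)
for car in range(3):                      # car placed uniformly at random
    p = Fraction(1, 3)
    opened = variant_host(pick, car)
    if opened is None:                    # round 2 skipped
        continue
    p_opened += p
    switch_to = next(d for d in range(3) if d not in (pick, opened))
    if switch_to == car:
        p_switch_wins_and_opened += p

print("P(host opens a goat door):", p_opened)                            # 1/3
print("P(switch wins | door opened):", p_switch_wins_and_opened / p_opened)  # 0
```

Same doors, same goat revealed, opposite advice: the right answer depends entirely on the host's rule.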

Thursday, June 10, 2010

The missing "I" in IT (1)

Various articles have pointed out the diminished importance of the CIO in organizations. It should not have been a surprise.

People should have seen this coming ever since the tech bubble of 2000. Why?
I call it the missing "I" in IT, or, as I sometimes say, "IT is about I before T". In more vivid language: the dog should wag the tail, not the other way around.

Apparently, most CIOs focused too much on the "T" instead of the "I", and hence depreciated their own value. (The tail wagged the dog.)

OK, let's do a quick quiz: in fewer than 10 words, can you explain what "IT" is? "Computer and related stuff" is the most common answer. Next? "Computer and related stuff used in doing business": at least business is mentioned.

However, both answers are in the "T before I" camp, not the "I" camp. Computers and related stuff are the technology part of IT that "can be used" to process information. But computers can also be used to fry an egg, not just to process data. (For those of you too young to know: when Intel first released its 80486 CPU, CPU fans were not around yet, and the CPU generated so much heat that a video was posted of people frying an egg on it.)

IT is about information. One of the oldest and still most widely used IT innovations is pen and paper. Seriously: it was high tech when it was invented, and it is still a very important tool we use to this day, although we don't call it high tech anymore.

Now, what does low-tech pen and paper have to do with our discussion of IT? It was an invention for recording information. Think about it: what a pen does to paper is not make it more valuable in weight or any other physical measure; instead, it makes marks that we now call information.

I always encourage people to think about computers as pen and paper, because
IT should be about using technology to collect and process information,
not about how to sharpen your pencil more reliably.

If a CIO's whole focus is "I make sure my computers run!", the CIO adds barely any value to a company. Think about it: if a company has pencils that are sharp all the time but barely anyone who is literate, what can those pencils do to help the business? Unfortunately, sharpening pencils that are barely used happens all too often, and it depreciates the value of the CIO in the business world.

Running the most reliable gadgets that do little to enhance information processing within an organization is a big trap, and many CIOs who fell into it lost their prestigious status: they became just glorified pencil sharpeners.

IT should be about efficiency, not technology. Management has always been about information: collecting information about how things work, processing information to understand how things work, analyzing information to support decision making, sending out information to execute, presenting information to manage public relations, etc.

Information is control: the better and more comprehensively information flows within an organization, the better control the leadership has. Of course technology helps here. That is actually the core value of IT: helping information flow within an organization more effectively and efficiently with all the available technologies.

A classic example is inventory levels. Let's say a warehouse retailer carries 15 days' worth of inventory because it counts inventory weekly (7 days), it takes 3 days for the vendor to ship the products, and it needs a 5-day buffer because it is not sure how fast each item sells.

Now, with an inventory management solution, they no longer need to count inventory every week: every item in and out is tracked, so they can save 6 days of the 7-day counting cycle. So this inventory system saves 6 of the 15 days of inventory for this retailer. If the retailer is large, cutting inventory by 40% (6/15) can mean tens of millions.
Now, how about a POS system that tracks every item sold? That will probably shrink the 5-day buffer to no more than 3 days (the ordering and shipping time).
A forecasting system that predicts sales and inventory levels, enabling pre-orders of merchandise before it runs short, will cut another 5 days off the cycle, because no buffer is needed anymore and orders can be placed early to cover the shipping time.

So the retailer can maintain 1 day of inventory; this is what we now call JIT (just-in-time). It was revolutionary at the time, and it was enabled and enhanced by information technologies. The companies that invented this architecture knew what IT is. How many CIOs have done anything remotely similar to such an innovation?
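The inventory arithmetic can be written out step by step. This Python sketch uses one reading of the example (my assumption): the 15 days are 7 days of counting cycle + 3 days of shipping + 5 days of buffer, and the final 1 day is what is left of the counting cycle once buffer and shipping no longer add stock on hand. The dictionary keys are made-up labels, and the intermediate POS estimate is not modeled separately.

```python
# Days of inventory in the retailer example, split into its three parts.
# The component names are hypothetical labels for the numbers in the text.
baseline = {"count_cycle": 7, "shipping": 3, "buffer": 5}
total = sum(baseline.values())                        # 15 days

# Inventory system: every in/out is tracked, so the weekly count
# shrinks to about a day, saving 6 of the 7 days.
after_tracking = dict(baseline, count_cycle=1)
saved = total - sum(after_tracking.values())          # 6 days
print(f"saved {saved} of {total} days ({saved / total:.0%})")  # 40%

# POS + forecasting: no buffer is needed, and orders are placed early
# enough to cover shipping, so neither adds stock on hand.
jit = dict(after_tracking, shipping=0, buffer=0)
print("days of inventory left:", sum(jit.values()))  # 1
```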

Next we will talk about

  • outsourcing

Cloud-enabled Storage - Hype and opportunities in Cloud computing

Cloud-enabled storage, the yet-to-be-born technology that lets customers "own" their data yet use computing power on the "Cloud", will be the next big thing.

When SAN and NAS were first introduced, they were considered unnecessary innovations that did nothing but extract money from customers' pockets. Gradually, however, they took over enterprise storage because of their ability to separate data storage from data processing (the servers, etc.). This proved to be a key capability customers need.

Moving on to the next big thing, the cloud: how can someone capitalize on it?

One key trap for Cloud computing is the ownership and control of data. Many consumers are willing (knowingly or not) to trade ownership of their data (i.e. privacy) for free services. So the Googles and Yahoos of the world are all offering "the future of computing - Cloud" with minimal resistance.

The story is different in enterprise computing, though. Given the number of recent incidents related to Cloud computing (privacy breaches, data leaks), there is enough doubt in the enterprise market that the risk of handing core business data to someone on the Cloud far outweighs the benefit. After all, enterprises rely on data to survive and make money, unlike consumers, who are just consuming convenient data services.

So, what is the next big thing? Since we have already separated data storage from data processing with SAN, NAS, etc., it is logical to keep data storage in house and consume data processing as a service (utility computing is not a new idea at all).

Of course, the current protocols used to access data will not be sufficient for this kind of reverse hosting of data. New protocols and ideas need to be invented.

One probable candidate is the in-memory database, which hosts the database in memory and only needs burst network bandwidth to load the data initially.
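To make the in-memory idea concrete, here is a tiny Python sketch that uses SQLite's `":memory:"` mode from the standard `sqlite3` module as a stand-in for an in-memory database. The table, columns and rows are hypothetical; in the architecture described here, the snapshot would arrive from the data owner over the network in one burst, which this sketch does not model.

```python
import sqlite3

# Hypothetical snapshot shipped from the data owner in one burst.
snapshot = [(1, "widget", 120), (2, "gadget", 45), (3, "gizmo", 0)]

# The processor keeps the whole working set in memory; ":memory:" tells
# SQLite not to create any file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", snapshot)  # one-time load

# After the initial load, queries run locally with no further network traffic.
out_of_stock = conn.execute("SELECT name FROM items WHERE qty = 0").fetchall()
print(out_of_stock)  # [('gizmo',)]
conn.close()
```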

Another approach is to let customers run database services in house and run application services on the Cloud.

On-the-fly data compression will again be a topic: the data flowing through the network pipe between the data owner and the data processor should be compressed and encrypted. We know database access is a great candidate for data compression because of the sparsity of the data retrieved.
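A minimal Python sketch of the compression step, using the standard `zlib` module on a hypothetical sparse query result (lots of repeated, empty or default values). Encryption is deliberately left out, since the standard library has no general-purpose cipher; if you add one, apply it after compression, because well-encrypted data looks random and will not compress.

```python
import json
import zlib

# Hypothetical query result: many rows, mostly empty/default columns,
# which is the kind of sparse data that compresses well.
rows = [{"id": i, "region": "NA", "discount": 0, "note": ""} for i in range(1000)]
payload = json.dumps(rows).encode("utf-8")

# Data owner side: compress before the bytes leave the building.
compressed = zlib.compress(payload, level=6)
# (An encryption step would go here, after compression.)

# Data processor side: reverse the steps.
restored = json.loads(zlib.decompress(compressed).decode("utf-8"))
print(len(payload), "->", len(compressed), "bytes")
```

The exact ratio depends on the data: repetitive, sparse rows like these shrink a lot, while already-random data barely shrinks at all.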

Splitting data storage from data processing is the key to the success of Cloud computing in the enterprise world. This won't be easy, and it will not be driven by companies like Google or Facebook, whose business model is to exploit customer data. New start-ups, or EMC and NetApp, may have a chance.

I will be working on an architecture framework that enables secure and efficient data flow between data storage and data processing.

Data Storage <=> Data Processor <=====> Data Access

The new architecture goes beyond just transferring data. The application architectures we use today are designed around proximity between data and processor, and today's remote-access architectures are more or less patches on that combined data-processor model.

The new model needs to be built on the assumption that Data Storage can be far away from the Data Processor. This assumption is the ultimate enabler for the Cloud in the enterprise.

If IBM catches this departing train, the days when most people used a handful of super-fast computers MAY come back.
This time, though, the supercomputers will not store your data, so they are not as scary as the last time they appeared.

Of course, part of the architecture is to ensure the data processor cannot reconstruct the data it processes. Better yet, the processor partition used to process one customer's data should be isolated and inaccessible to anyone else.

Sounds like an interesting idea, doesn't it?