« August 2004 | Main | October 2004 »

September 23, 2004

Oregon Visit

Over the long Labor Day weekend, Elizabeth and I were the well-treated guests of Mary Sillick (Elizabeth's aunt), in Eugene, Oregon. It was a relaxing trip, with visits around the University of Oregon and Eugene, the local pools, and Mary's comfortable backyard. The highlight of the trip was an overnight camping trip to Lake Waldo, about 60 miles southeast of Eugene. I've attached some pictures from that trip.

There seems to be similar themes to camping. First and foremost, it is a good chance to bond with your fellow campers, as it requires working to common goals and experiencing the same events. There seems to be a level of mild discomfort, as one does not have all of the luxuries of modern life. (The discomforts mostly include cold and lack of bathrooms and showers.) It also seems like it is easier to lay aside the abstract struggles of day-to-day life, as there is physical work to do and (of course) amazing sights to see in nature. Camping is not hard, nor does it have to be out in the bare-bones deep wilderness, but it does have its attractions. It's not for every day, but I've never regretted any of my camping trips.

Our tent set up, in the Shadow Bay campground of Lake Waldo.
Elizabeth and Aunt Mary preparing the campsite meal. Somehow, campsite coffee always tastes better.
Exploring the lake shore, with Waldo the dog. Waldo is a very sturdy and intelligent standard poodle.
A canoe on the lake. The lake had amazingly clear water. I don't know the people in the canoe, but I thought it was a great shot. Unfortunately, I was a bit quick on the draw, and ended up with a crooked horizon. On top of that, one of the canoers said to the other: "Did that guy just take our picture?"
Mary and Waldo playing in the water.
More explorations of the environs. I thought that we were very fortunate with our campsite, as it was close to the shore, bathrooms, and the trails around the area. Not bad, for a Labor Day without reservations.
As the sun set, the site became cold. In my hierarchy of values, that is a small price to pay for the amazing view of the stars.
A view of the lake, from the shore nearest our campsite.
Waldo waits patiently for the ladies to awaken.
A good-looking couple.
Another nice picture of us, courtesy of Aunt Mary.

September 21, 2004

Hypergeometric Samples

Once a month, a random sample (of size n) is chosen from a population (of size M). This population might be of student-athletes for NCAA drug-testing, or taxpayers for audit, or some other similar scenario. How likely is it that there will be x repeats from one month to the next month? This is a useful question to ask, as it can help point out systematic problems in the sampling process. Student-athletes, in the hypothetical situation, might be annoyed that they are called for drug-testing many times in a row. Or it might point to the unjust use of the tax audit process for political reasons (assuming, of course, that the tax system could be just at all).

Let's start by trying to characterize this problem. It is a discrete problem, as we are not looking at fractional people. The population size is finite (say, M=700). The sample size may be significant (say, n=40). The individual objects do not matter (i.e. we are concerned if a particular person is chosen twice in a row, but rather that ANY person is chosen twice). We are selecting objects without replacement from the population (i.e. nobody can be chosen twice within a month).

This class of problem actually has a closed-form analytic solution, using the hypergeometric distribution. Consider the second month: There are 40 objects (n) that had been chosen in the previous month and might potentially be chosen this month. (I think of them as red marbles, with the rest of the population being blue marbles. But maybe I'm just weird.) The first time we choose one of the objects, the probability of choosing one of the original 40 is 40/700. The next time we choose an object, the probability has changed, because we have already pulled an object from the population. (If the first selection was one of the last month's group, then the probability that the next would be is 39/699.) And so on, through, the 40th trial. If you multiply out those combinations, then you will have the hypergeometric distribution function (shown below, image from the Mathworks website).

It is also possible to simulate the problem, using a Monte-Carlo approach. This requires many many random trials, but can be done fairly easily using a computer. I used one million trials, and the results match closely to the analytical solution. I programmed it with C# on Microsoft's Visual Studio. It took less than five seconds to run, on my Pentium 4 2.8 GHz laptop. (I didn't even bother to use a release build, which may make things a bit faster.) The (trivial) source code is available here. (Some people have done some tests of the .Net random number generator, available here.)

The results can be seen the graph below (n=40, M=700). The average number of repeats per month is 2.2, with the probability of 0.1% for eight or more repeats in a given month. A good practice would be to record the number of repeats each month. After several months, the recorded results should look similar to the graph, unless the selection process is not random (or some other assumption, such as the population or sample size, are violated).

Thanks to Dave Loeffelholz and Dr. Richard Stern for the insights into the problem.