Archive for the ‘Straight up Statistics’ Category

Price Per Square Foot: Has Redwood City Hit Bottom?

Tuesday, June 30th, 2009

One of the themes we harp on here at Cirios is how home price trends are becoming increasingly localized as individual real estate markets grope for a bottom.

Some areas, particularly those hardest hit by foreclosures, have seen fantastic price declines. Others, mainly the high end, are only recently feeling the ill effects of job losses and our slumping economy. And while this dynamic makes for a tricky housing market, it also breeds opportunity for those savvy enough to identify which markets will be the first to stabilize and eventually rebound.

Take a look at the 2 graphs below showing price per square foot in Redwood City, CA.

The first shows all residential sales in Redwood City. The steep declines in 94601 and 94602 not only show falling prices but also illustrate how areas with high foreclosure rates, like these 2 zip codes, are seeing steeper price declines than areas that are holding up better. (For more on this subject, read about how foreclosure sales affect price data.)

The second graph shows 1 segment of the market, homes with living areas of 1,200 - 2,500 square feet. While a somewhat arbitrary cutoff, the idea is to pick like homes within each area to try and compare apples to apples.
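For the curious, here is a minimal sketch of how that kind of like-for-like filter could be applied before computing price per square foot. The sales records are made up for illustration, and using the median per zip code is an assumption; the post does not say which summary statistic the graphs use.

```python
from statistics import median

# Hypothetical sales records: (zip_code, sale_price, living_area_sqft).
# These numbers are invented for illustration only.
sales = [
    ("94061", 900_000, 1_800),
    ("94061", 1_500_000, 3_200),  # outside the size band, excluded below
    ("94063", 450_000, 1_250),
    ("94063", 300_000, 900),      # outside the size band, excluded below
    ("94063", 520_000, 1_300),
]


def median_ppsf_by_zip(records, min_sqft=1_200, max_sqft=2_500):
    """Median price per square foot per zip code, for homes inside the size band."""
    by_zip = {}
    for zip_code, price, sqft in records:
        if min_sqft <= sqft <= max_sqft:
            by_zip.setdefault(zip_code, []).append(price / sqft)
    return {zip_code: median(values) for zip_code, values in by_zip.items()}


result = median_ppsf_by_zip(sales)
for zip_code, ppsf in sorted(result.items()):
    print(f"{zip_code}: ${ppsf:,.0f} per sq ft")
```

Changing `min_sqft` and `max_sqft` changes which homes count, which is exactly why two graphs of the same town can tell different stories.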

What we see in the second graph is that the “spread” between zip codes (that is, the premium you pay to live in 94063 vs. 94601 or 94602) remained essentially the same throughout the boom and into the bust. Meanwhile, the first graph shows that price per square foot in 94063 (the least desirable part of town) almost touched 94061 (the most desirable part) right around the peak of the bubble.

So what does all this mean? 2 takeaways: First, be skeptical when you look at housing market data, since very small changes in data collection can lead to quite different results. Second, real estate always has been, is, and always will be local. No national, statewide or even citywide trends can capture what’s going on at the street level.

Want to see this analysis for your town? Contact Cirios Real Estate today!

Price per Square Foot: Orinda vs. Lafayette

Thursday, June 18th, 2009

Looking for great schools, rolling hills, big lots and a quick BART ride to the city?

Orinda and Lafayette, two of the most desirable towns in the East Bay, have all this and much more. Home prices remain in a downward trend, to be sure, but inherent desirability and established neighborhoods should keep these two from falling off a cliff, even as their high-end brethren around the Bay Area feel the pain of a tight jumbo loan market.

The graph below shows price per square foot for homes with 1200-2500 square feet, pretty middle of the road for these areas. Once you get above 2500 sqft, you start losing your marginal price per square foot and comparisons with smaller homes start to lose meaning.

What does that mean? Read more about Price per Square Foot here.

Want to find out more about these two towns? Contact Cirios Real Estate today!


Straight Up Statistics - Deconstructing the Average

Tuesday, May 26th, 2009

As more statistics come out on a daily basis that are supposed to tell us that the recession is over and we’ve hit the bottom, it’s more important than ever to be aware of the nuances that come with these statistics.

Government officials, bankers, retailers and snake oil salesmen alike throw out statistical arguments at the drop of a hat, telling you why their pitch is the only one worth listening to because they have the data to back it up. But before accepting what you hear or read at face value just because some nameless research institute did a study, stop for a minute to ponder the complexities of even the most seemingly innocuous of statistics: The average.

Let’s first assume some particular data being quoted were reliably gathered and analyzed (this is almost never a safe assumption, but that’s a topic for another day), then examine how the average and another so-called “descriptive statistic,” the median, are used in the data reports we see every day.

While on the surface it may seem that these two statistical measures could be interchangeable (indeed they are often used interchangeably with no explanation), they tell us very different things about the data they describe.

The median of a given group of data is its middle value. For instance, if your dataset has five data points and you lined them all up from smallest to largest, the third value would be your median. On the other hand, the average, or mean, of a dataset is determined by summing all values and dividing by the number of data points.

For example, suppose you are looking at real estate sales in a certain area within a certain time frame and you had the following 5 values: $300,000, $320,000, $320,000, $450,000, and $1,200,000. The median of this set is $320,000 (the middle value). The average is $518,000 (2,590,000 / 5). As you can see, even in this simple example, the two descriptive statistics are significantly different.
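If you want to check the arithmetic yourself, a few lines of Python will do it. This is just a sketch using the five example prices above and the standard library's `statistics` module.

```python
from statistics import mean, median

# The five example sale prices from the paragraph above.
prices = [300_000, 320_000, 320_000, 450_000, 1_200_000]

median_price = median(prices)   # middle value of the sorted list
average_price = mean(prices)    # sum of the values divided by how many there are

print(f"Median:  ${median_price:,.0f}")
print(f"Average: ${average_price:,.0f}")
```

Try swapping the $1,200,000 sale for a $500,000 one: the median does not budge, while the average drops sharply.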

Real estate sales are often represented by the median value. The reasons for this are varied, but center around the fact that a few sales at extremely high levels (like that $2 million house on the top of the hill) can easily skew the average of a dataset towards those properties, even though most homes in the area are selling at lower prices.

For example, in Temecula, CA, where most homes sell at modest levels (by California standards) but some homes sell for significantly more, the average sale price in 2008 was about $435,000. The median price, on the other hand, was around $359,000. That’s a difference of over 20%.

Contrast that with areas where home prices are more homogeneous, like Daly City, CA, where the average and median values are more closely in line. In 2008, the average sale price for Daly City was around $562,000 while the median was about $558,000 - a much smaller spread (<1%).
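The spreads quoted for Temecula and Daly City can be reproduced with a one-line formula. The sketch below assumes the spread is measured relative to the median; measured against the average instead, the Temecula figure would come out closer to 17%, so the choice of base matters.

```python
def relative_spread(average, median):
    """Gap between average and median, as a fraction of the median (an assumed base)."""
    return (average - median) / median


# Approximate 2008 figures quoted in the text.
temecula = relative_spread(average=435_000, median=359_000)
daly_city = relative_spread(average=562_000, median=558_000)

print(f"Temecula:  {temecula:.1%}")
print(f"Daly City: {daly_city:.1%}")
```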

So which is better? Average or median? As can be seen from the examples above, neither.

Both display different aspects of the same set of data points. In Temecula, where median and average wildly diverge, using the average skews the data towards a much higher level. An individual from out of state looking to buy there might incorrectly assume they couldn’t afford to do so. On the other hand, solely looking at the median leaves out the fact that there are million dollar plus estates in Temecula available to buyers looking for that sort of thing.

When the National Association of Realtors releases their monthly sales statistics — which is the real estate pricing data carried by most major news outlets — they present sales price data as both median and average values. These values are used to track sales prices over time to identify trends in sales activity nationwide and regionally. While both median and average values are freely available to anyone with internet access, the median values are often the ones quoted in the popular press.

By focusing exclusively on median values, however, one can miss interesting trends.

For example, on a nationwide level and in three of the four regions identified, median and average home sale prices have been tracking at around the same relative spread since 2005. In the West region, however, the median sales price has been falling faster than the average price.

This widening gap helps tell the story of what’s been happening in Western real estate markets in the past few years. In most markets, high-priced homes have retained their value better than homes that are closer to, or below, the median. Since so many lower end homes are being sold, many after foreclosure, the sheer volume of these transactions is dragging down the median figures. The average, on the other hand, is propped up by the few expensive homes still being sold.
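To see how a wave of low-end sales can pull the median down faster than the average, consider this toy example. All prices are invented for illustration; they are not NAR data.

```python
from statistics import mean, median

# Hypothetical sales in one market before a wave of foreclosures.
before = [200_000, 300_000, 400_000, 500_000, 1_500_000]

# Same market afterward: several cheap foreclosure sales are added,
# while the one expensive home still sells at its old price.
after = [150_000, 150_000, 180_000, 200_000,
         300_000, 400_000, 500_000, 1_500_000]


def pct_change(old, new):
    """Fractional change from old to new."""
    return (new - old) / old


median_change = pct_change(median(before), median(after))
mean_change = pct_change(mean(before), mean(after))

print(f"Median change:  {median_change:.1%}")
print(f"Average change: {mean_change:.1%}")
```

Both measures fall, but the median falls further because it tracks where the bulk of the transactions sit, while the single high-end sale keeps the average afloat.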

This analysis then raises the question: why does the trend only exist in the West? As other regions decline, can we expect the same pattern to play out? Why are higher priced homes holding up better? If expensive homes begin to lose their value, what would that do to the median and average sales prices? What does the data look like on a city or zip code level?

It’s easy to see that just by comparing the median and average sales price trends, much insight — or at the very least another list of questions — can be gained.

I could go on all day about the wealth of information that such a seemingly simple statistic as the average can provide those with the patience and curiosity to “drill down” past the headlines. But my point is simply this: Pay attention! Don’t let the evening news or your favorite web news source gloss over the statistics to prove whatever skewed point they want to make that day. Spend the time to think critically about the information or you run the risk of being fleeced regularly for the rest of your life.

At the very least, pay close attention to the source of any information you are receiving, particularly when that information comes in the form of a statistic. If you are being presented with a descriptive statistic like an average or a median, notice which one you are being given and pause for a second to think about why they used one and not the other.

Furthermore, if you notice that a single set of data is being described interchangeably by median and average, this should throw up a huge red flag as to the reliability of the information and its source.

Obama’s Mortgage Solution: What’s In It For Me?

Tuesday, February 24th, 2009

By AUSTIN NELSON

There is considerable controversy as to the wisdom of the new measures introduced by the Obama Administration to stabilize the housing market: Will they work? What does it even mean for something like this to work?

While there are strong arguments on both sides, let’s look specifically at who Obama’s plan will definitely help and how that could in turn help the economy.

According to the White House’s official release on the Homeowner Affordability and Stability Plan (HASP), upwards of 9 million homeowners will be helped in their struggle to stay afloat. Even assuming that the administration is inflating these numbers a little, that’s still a lot of families. Each of these families could potentially be given a lifeline, a way to stave off the foreclosure of their homes.

In a plan estimated to cost $275 billion, HASP aims to achieve the lofty goal of slowing foreclosures by:

1. Reducing and subsidizing monthly payments for troubled borrowers
2. Incentivizing servicers and banks to modify loans
3. Instituting clear and consistent guidelines for loan modifications

The argument has been made that the plan rewards those who made poor financial decisions at the expense of those who did not. In some ways, this is true, but there could be effects of these measures beyond the families who are directly helped.

Most importantly, slowing foreclosures can prevent the downward spiral of home values that results when a number of homes get foreclosed within a single neighborhood. In fact, the White House claims that “the average homeowner could see his or her home value stabilized against declines in price by as much as $6,000 dollars.” While the exact modeling used to figure out such a specific number is unclear, the fact remains that preventing foreclosures will stabilize prices, particularly in neighborhoods with high rates of foreclosure.

Notice that I said that staving off foreclosures will STABILIZE prices, not that it would put an end to price declines. The underlying forces involved in the current home price correction go well beyond foreclosure activity. Prices will correct (indeed, they must correct before the economy improves), and no foreclosure prevention plan can stop those fundamentals. The key is to make sure that the market doesn’t over-correct and cause unnecessary damage to the economy as a whole.

In a pattern we here at Cirios have seen many times over, a flood of foreclosures can cripple a neighborhood in a matter of weeks. The greatly increased supply caused by newly foreclosed properties coming onto the market results in price declines in the entire neighborhood. Additionally, foreclosed homes often sit on the market for months, largely because they are improperly priced and the bureaucracy involved in their sale is staggering. While on the market, these homes gradually fall into disrepair, decreasing the value of every home on the block simply by their ugly presence. The resulting decrease in home values leads to more homeowners going underwater and in turn even more foreclosures. And the spiral continues, feeding back on itself. By slowing the flow of foreclosures, it is theoretically possible to stabilize this cycle and remove the feedback mechanism.

The bubble that formed from 2001-2006 in the residential real estate market was unprecedented in its scope and magnitude. At the national level, median home prices climbed to more than 30% beyond historical trends. In many areas that number was twice that much.

As you can see in the graph below, a previous bubble (blue arrow) in the late 1980s (a time period where prices climbed above historic trends) was followed by a prolonged trough (red arrow) where prices fell below the trend. The same could be said for the late 1970s, but the bubble was much less severe.

In fact, the size of these “bubbles” and the length of following “troughs” have increased substantially. If the same pattern were to follow the currently deflating bubble, we should expect to see a trough that lasts on the order of fifteen years. With the current plummeting trajectory of home prices, that trough could be even deeper than the historical pattern would predict.

Source: Economagic, analysis by Cirios Real Estate

On the right side of the graph, I’ve placed a few projections of trajectories for housing prices. One represents a deep trough, which would result from a large “overshoot” in housing price declines. The other represents a “soft landing” for home prices which could result from breaking the foreclosure spiral. Note that the difference between the two projections is two-fold: depth and duration.

In other words, how bad will it get and for how long.

The gap between the trajectories is a 12% difference in low price and a five-year lag in home prices’ return to historical trends. In the interest of scientific rigor, I have to say that there is no factual basis for either one of these scenarios. I have not run any models or even evaluated any data in a quantifiable way. But what I am trying to show is that the difference between a scenario where the foreclosure-fueled home price spiral continues and one where it is attenuated could have drastic consequences for real estate markets and the economy as a whole.

Our fictional 12% difference in home price means well over $1 trillion in lost home equity. A five-year lag in housing recovery means five more years of expensive and destructive foreclosures. The drag that both of these factors would place on the economy would certainly slow any eventual economic recovery we could hope for.

Only time will tell if HASP will have the desired effect on the housing market. As I’ve said, it certainly won’t be a magic bullet to “solve” the economic problems that currently face us. At best it only addresses a symptom and not the disease. But spiraling home prices are a symptom that we cannot afford to ignore. That HASP simultaneously provides a positive solution to a lingering problem while directly helping millions of families most strongly affected by the economic downturn is reason for praise.

That it helps a select few more directly than others is unarguable, but the overall effect on the housing market and the economy should be positive. Whether it is the best possible plan or merely the result of political expedience is a matter for debate, as are the moral implications that such a socialistic policy represents. But now is the time for action, and this plan strikes a powerful blow.

Straight up Statistics: How Random is Your Sample?

Monday, February 16th, 2009

By AUSTIN NELSON

How many times have you read a statistic like this one: “57% of Americans believe that the economy will do X in the next Y years”? Ever wondered how in the heck they can say something like that? Do they poll every American and ask them what they think?

The answer, of course, is no.

Through a series of statistical tricks, it is usually perfectly acceptable to make statements like the one above using only a small sample size. For national poll results, a sample of 1,000 to 2,000 people usually suffices. Scientifically speaking, much can be inferred from such a sample, and its accuracy can be evaluated mathematically (hence the ubiquitous +/- 3% info that follows all poll data).
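Where does that +/- 3% come from? A rough sketch, assuming a truly random sample, a 95% confidence level (z of about 1.96), and the worst-case split of 50/50, uses the standard margin-of-error formula for a proportion. The function name and defaults here are illustrative choices, not any particular pollster's method.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z_score=1.96):
    """Approximate margin of error for a polled proportion.

    Assumes a simple random sample; z_score=1.96 corresponds to roughly
    95% confidence, and proportion=0.5 is the worst (widest) case.
    """
    return z_score * math.sqrt(proportion * (1 - proportion) / sample_size)


moe_1000 = margin_of_error(1_000)
moe_2000 = margin_of_error(2_000)

print(f"n = 1,000: +/- {moe_1000:.1%}")
print(f"n = 2,000: +/- {moe_2000:.1%}")
```

Note that doubling the sample from 1,000 to 2,000 people only trims the margin by about a point, which is why pollsters rarely bother going much bigger.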

The goal of polling any sample is to get enough people to gather a representative group, without going overboard and designing polls that would take months to perform. The problem — and this is where one can get into trouble with polls and other studies using small samples to evaluate a large population — is in the process of sampling itself. For instance, in “nationwide” polls like one that could get the datum described above, polls are often conducted by phone, with pollsters “randomly” selecting people to poll and collecting the results.

But what does “random” mean? Presumably, these pollsters have a method akin to pulling a name out of a hat filled with every name in the phone book. There are a few problems with this assumption of “randomness.”  The first is that not everyone is in the phone book. The second, and more significant problem with this methodology, is that it takes a very specific kind of person to a) actually answer the phone when someone calls from a number they don’t know and b) actually stay on the line when the person on the other end announces they just need “a few minutes.” By choosing to interview people by phone, the pollsters have actually thrown random out the window and left a large portion of America out of their study entirely.
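A bit of arithmetic shows why this matters. In the sketch below, every number is hypothetical: suppose 30% of people will answer a call from an unknown number, and that group happens to hold an opinion more often than everyone else does.

```python
# Hypothetical population, split by willingness to answer an unknown caller.
# All shares and opinion rates are invented for illustration.
share_answerers = 0.30          # fraction of people who pick up and stay on the line
share_non_answerers = 0.70      # everyone else, never reached by the poll

opinion_among_answerers = 0.70      # fraction of answerers who hold opinion X
opinion_among_non_answerers = 0.50  # fraction of non-answerers who hold opinion X

# What the whole population actually thinks.
true_share = (share_answerers * opinion_among_answerers
              + share_non_answerers * opinion_among_non_answerers)

# What a phone poll sees: only the people who answer.
polled_share = opinion_among_answerers

bias = polled_share - true_share

print(f"True share:   {true_share:.0%}")
print(f"Polled share: {polled_share:.0%}")
print(f"Bias:         {bias:+.0%}")
```

No amount of extra phone calls fixes this gap, because the margin of error only describes random noise, not who was left out of the hat in the first place.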

Now, I’m not saying that every national poll is worthless. Far from it. In fact, polls can be very informative as to trends in public opinion because you can accurately compare the results of the same poll over time. But it should never be assumed, not even for one moment, that if the poll says 57% of Americans do whatever, that 57% of Americans in real life actually do that (even including the stated error range).

Sampling is a big issue in any area of scientific inquiry. The assumptions that underlie any statistical analysis are very specific as to the requirements for sampling. Outside of the physics laboratory, these assumptions are almost never met. However, through careful design and data acquisition, one can make a reasonable stab at satisfying their requirements.

One good example of this is the Case-Shiller home price index, or CSHPI. As described previously by Andrew Jeffery, the index uses paired-sale comparisons to evaluate current trends in housing markets. Their methodology is opaquely complex but freely available for the world to see.

Some argue against the method, saying that by only sampling homes that have repeat sales within a given time period, you leave out a huge chunk of homes whose sales could give you insight into home values in their area. This is true, and the CSHPI is far from perfect as a result, but there is simply no way one can achieve perfection in an undertaking like modeling home prices.

The important thing to keep in mind is that with the CSHPI, you know exactly what you are getting, and what you’re not.

Is the index a perfect indicator for what is going on in Brentwood, CA or Mesa, AZ? Absolutely not, and anyone who tells you otherwise is selling you something you don’t want to buy. But it is a painstakingly accurate and admirably well-designed method for tracking trends on a large scale level. That it leaves a large chunk of the market out of its samples is an inevitable aspect of proper experimental design.

Only by controlling as many variables as possible (in this case, by only comparing one house to itself rather than every other house that has sold within a given time frame) can one hope to do any meaningful analysis of a market as complex as residential real estate.
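To make the paired-sale idea concrete, here is a deliberately simplified sketch. It is not the Case-Shiller methodology, which is far more involved; it only illustrates the core idea of comparing each house to itself. The sale pairs are invented, and averaging the ratios with a geometric mean is an assumption made for this example.

```python
import math

# Hypothetical repeat sales: (earlier_price, later_price) for the SAME house.
# Homes that sold only once in the window never enter the sample.
repeat_sales = [
    (400_000, 360_000),
    (250_000, 200_000),
    (800_000, 760_000),
]


def paired_sale_change(pairs):
    """Geometric mean of later/earlier price ratios across repeat sales."""
    log_ratios = [math.log(later / earlier) for earlier, later in pairs]
    return math.exp(sum(log_ratios) / len(log_ratios))


index_ratio = paired_sale_change(repeat_sales)
print(f"Aggregate change across repeat sales: {index_ratio - 1:+.1%}")
```

Because each ratio compares a house with itself, differences in size, location and condition between houses largely cancel out. The aggregate still says nothing reliable about any single property: the three example homes moved by quite different amounts.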

Because of the way it is constructed, the index itself is really only valuable as a tracker of large scale trends. If the index goes down by 10%, you can’t reasonably say that any given property has declined by 10% or even use it to reliably estimate the price change of a specific property. But if the index has shown a 25% drop from its peak (as it has), you can reliably infer that things are not going well in US housing. By tracking the rate of that decline or the difference in trends between the individual indices of one metro area versus another (there are indices available for 20 metropolitan areas), one can gain valuable and reliable insight about the performance of those markets and make inferences about future trends.

In conclusion, sampling is one of the most important but least appreciated aspects of modern data analysis. In order to correctly interpret any given data, it is absolutely essential to know how that data was sampled and how that sample fits into the area of study. Be wary of data where the data collection and analysis methodology are not freely available. And understand that where samples are involved, usually the most valuable way to use that data is to monitor changes over time rather than making inferences about how any given time period’s data relates to whatever phenomenon you are interested in.

This is especially true when it comes to home values, where there is absolutely no single data model that can tell you how much your house is worth or how much to pay for that new house you’ve got your eye on. However, there is enough data currently available that with careful scrutiny (and the help of trained professionals like the friendly folks at Cirios Real Estate) you can confidently make those assessments.

Straight Up Statistics: Deconstructing the Average

Thursday, January 15th, 2009

By AUSTIN NELSON

In today’s fast-paced, data-driven world, it’s easy to get lost in the morass of statistics flashing across our TVs and computer screens at a sometimes maddening pace.

Government officials, bankers, retailers and snake oil salesmen alike throw out statistical arguments at the drop of a hat, telling you why their pitch is the only one worth listening to because they have the data to back it up. But before accepting what you hear or read at face value just because some nameless research institute did a study, stop for a minute to ponder the complexities of even the most seemingly innocuous of statistics: The average.

Let’s first assume some particular data being quoted were reliably gathered and analyzed (this is almost never a safe assumption, but that’s a topic for another day), then examine how the average and another so-called “descriptive statistic,” the median, are used in the data reports we see every day.

While on the surface it may seem that these two statistical measures could be interchangeable (indeed they are often used interchangeably with no explanation), they tell us very different things about the data they describe.

The median of a given group of data is its middle value. For instance, if your dataset has five data points and you lined them all up from smallest to largest, the third value would be your median. On the other hand, the average, or mean, of a dataset is determined by summing all values and dividing by the number of data points.

For example, suppose you are looking at real estate sales in a certain area within a certain time frame and you had the following 5 values: $300,000, $320,000, $320,000, $450,000, and $1,200,000. The median of this set is $320,000 (the middle value). The average is $518,000 (2,590,000 / 5). As you can see, even in this simple example, the two descriptive statistics are significantly different.

Real estate sales are often represented by the median value. The reasons for this are varied, but center around the fact that a few sales at extremely high levels (like that $2 million house on the top of the hill) can easily skew the average of a dataset towards those properties, even though most homes in the area are selling at lower prices.

For example, in Temecula, CA, where most homes sell at modest levels (by California standards) but some homes sell for significantly more, the average sale price in 2008 was about $435,000. The median price, on the other hand, was around $359,000. That’s a difference of over 20%.

Contrast that with areas where home prices are more homogeneous, like Daly City, CA, where the average and median values are more closely in line. In 2008, the average sale price for Daly City was around $562,000 while the median was about $558,000 - a much smaller spread (<1%).

So which is better? Average or median? As can be seen from the examples above, neither.

Both display different aspects of the same set of data points. In Temecula, where median and average wildly diverge, using the average skews the data towards a much higher level. An individual from out of state looking to buy there might incorrectly assume they couldn’t afford to do so. On the other hand, solely looking at the median leaves out the fact that there are million dollar plus estates in Temecula available to buyers looking for that sort of thing.

When the National Association of Realtors releases their monthly sales statistics — which is the real estate pricing data carried by most major news outlets — they present sales price data as both median and average values. These values are used to track sales prices over time to identify trends in sales activity nationwide and regionally. While both median and average values are freely available to anyone with internet access, the median values are often the ones quoted in the popular press.

By focusing exclusively on median values, however, one can miss interesting trends.

For example, on a nationwide level and in three of the four regions identified, median and average home sale prices have been tracking at around the same relative spread since 2005. In the West region, however, the median sales price has been falling faster than the average price.

This widening gap helps tell the story of what’s been happening in Western real estate markets in the past few years. In most markets, high-priced homes have retained their value better than homes that are closer to, or below, the median. Since so many lower end homes are being sold, many after foreclosure, the sheer volume of these transactions is dragging down the median figures. The average, on the other hand, is propped up by the few expensive homes still being sold.

This analysis then raises the question: why does the trend only exist in the West? As other regions decline, can we expect the same pattern to play out? Why are higher priced homes holding up better? If expensive homes begin to lose their value, what would that do to the median and average sales prices? What does the data look like on a city or zip code level?

It’s easy to see that just by comparing the median and average sales price trends, much insight — or at the very least another list of questions — can be gained.

I could go on all day about the wealth of information that such a seemingly simple statistic as the average can provide those with the patience and curiosity to “drill down” past the headlines. But my point is simply this: Pay attention! Don’t let the evening news or your favorite web news source gloss over the statistics to prove whatever skewed point they want to make that day. Spend the time to think critically about the information or you run the risk of being fleeced regularly for the rest of your life.

At the very least, pay close attention to the source of any information you are receiving, particularly when that information comes in the form of a statistic. If you are being presented with a descriptive statistic like an average or a median, notice which one you are being given and pause for a second to think about why they used one and not the other.

Furthermore, if you notice that a single set of data is being described interchangeably by median and average, this should throw up a huge red flag as to the reliability of the information and its source.

Straight Up Statistics: The Magic of Seasonal Adjustments

Tuesday, December 23rd, 2008

By AUSTIN NELSON

Have you ever wondered what the heck it means when you read that economic data is “seasonally adjusted?” How can non-seasonally adjusted data show one trend while seasonally adjusted data shows something completely different? Which dataset is the most reliable?

The in-depth answer to these questions requires a PhD in statistical analysis. For those of us who don’t know a kernel regression from a Henderson 13-term moving average filter, the short answer is that seasonal adjustment is a process by which consistent seasonal effects are removed from a time series of data. And yes, you can trust them. Well … sort of.

The effect of seasonal adjustment can be most easily explained through an example. Suppose you are looking at a series of data measuring gasoline consumption in the United States to identify trends related to the price of a gallon of gas. A logical hypothesis is that when gas gets more expensive, people drive less.

In examining this dataset, however, we would expect to see increased consumption in the summer months when everyone hits the road for their vacations. Gas prices often rise during the summer when that additional demand constricts supply, so if you were looking at data from a single year without considering seasonal effects, you might wrongly conclude that people actually consume more gasoline when prices rise.

In fact, much of the increase in fuel consumption during summer months has nothing to do with fuel prices, so the seasonal effects need to be removed from the series before any meaningful analysis of consumption versus price can be undertaken.

Looking at non-seasonally adjusted figures by themselves is a bit like saying pumpkin sales spike in October, without mentioning Halloween.

So how does one “remove” seasonal effects from a dataset? By examining several years of data, patterns in the movement of the data can be identified that happen over and over again in the same way each year. From these patterns, statisticians create (through a variety of near-magical statistical techniques) a “filter” that allows them to subtract the seasonal effects from the dataset of interest, theoretically leaving only non-seasonal effects, like that of price on gas consumption.
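Stripped of the magic, the basic idea can be sketched in a few lines. The example below uses made-up quarterly gasoline consumption figures and a very crude additive adjustment: estimate each quarter's typical deviation from the overall average, then subtract it. Official methods are far more sophisticated; this is only meant to show the shape of the technique.

```python
# Hypothetical quarterly gasoline consumption index, three years (Q1..Q4 each).
# Summer (Q3) is always high; the underlying level creeps up 2 points per year.
consumption = [
    90, 100, 120, 90,    # year 1
    92, 102, 122, 92,    # year 2
    94, 104, 124, 94,    # year 3
]


def seasonal_effects(series, period):
    """Average deviation of each season from the overall mean of the series."""
    overall_mean = sum(series) / len(series)
    effects = []
    for position in range(period):
        same_season = series[position::period]  # e.g. every Q1 across the years
        effects.append(sum(same_season) / len(same_season) - overall_mean)
    return effects


def seasonally_adjust(series, period):
    """Subtract each season's typical effect from the matching observations."""
    effects = seasonal_effects(series, period)
    return [value - effects[i % period] for i, value in enumerate(series)]


effects = seasonal_effects(consumption, period=4)
adjusted = seasonally_adjust(consumption, period=4)

print("Seasonal effects by quarter:", effects)
print("Adjusted series:", adjusted)
```

With the summer bump removed, what remains in this toy series is the slow year-over-year rise, which is the part you would actually want to compare against prices.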

And VOILA! you have seasonally adjusted your data. The same techniques are applied all the time to financial and economic datasets, so much so that most people accept this “seasonal adjustment” without thinking twice about it.

Our advice is to think twice about it - especially with housing data.

One of the most common patterns in home buying is that sales tend to slow during the winter months. This makes sense, since moving in the winter sucks, and it’s easier to move kids from school district to school district over the summer. Now, housing economists (particularly our friends at the National Association of Realtors) are adept at spinning even the worst reports in a positive light.

Data released today showed abysmal existing home sales in November, which should come as no surprise to anyone who’s opened a newspaper in the past couple months. Nevertheless, the Realtors managed to find a silver lining. Chief economist Lawrence Yun “[hopes] the home sales impact from the stock market crash turns out to be short-lived, as was the case in 1987 and 2001.” If data don’t improve this winter, look for Yun and his crew to start blaming bad weather, snow and a whole host of things that make conditions look better than they are.

The lesson: Never accept data or data analysis at face value.

As my grandfather always said, there are lies, damn lies, and statistics. Unless you can understand how a particular piece of data is derived and can trust the collection and analysis methods that went into its creation, it is as informative as a two-year-old’s fingerpainting.

Seasonal adjustment is no different. Even though almost none of us can understand the mathematical techniques and statistical assumptions that go into the production of official economic figures, you can still look critically at datasets to determine whether they make sense.

In many cases, non-seasonally adjusted data is available along with seasonally adjusted data. Compare the two. Do the changes make sense?

For instance, if all of a sudden non-seasonally adjusted home sales are on par with activity over the summer, one could logically conclude the efforts to unfreeze the mortgage and credit markets may be working. If data bumps along about the same as last year, well, they better get a bigger bailout.

Also think about the source of the data. Does the source have a reason to overly stress or even inappropriately apply a seasonal adjustment to suit their needs? If so, you probably shouldn’t be trusting any data that comes from that source, seasonally adjusted or not. The data source should also have citations for the methods used to complete the adjustment. Even if you don’t know what the citation means, there are those that do and the information should be available to those experts to review.

All this being said, in most cases seasonal adjustment is a completely legitimate analytic technique. Government data has standardized techniques for seasonal adjustment that are well accepted and continually scrutinized. And while many take issue with the government’s collection techniques and even the way they count, say, unemployment, rarely are seasonal adjustments accused of being used to fudge official numbers. Most institutions that put out data reports on a regular basis are also very open about their techniques: These are the ones that can be trusted.

To conclude, with all the sources of data that are available in today’s information age it is becoming increasingly important to develop a healthy skepticism for any particular piece of information. Data is only as reliable as its source and its application.