Monday, March 20, 2006

The TownScores™ Method


After spending hours poring over several statistical resources on metropolitan areas, I wanted to make the data more easily scannable. The data were displayed in various tables, maps, graphs and charts. Units were in dollars, percentages, per capita ratios, and some scoring systems. For some scoring systems, a higher number is better, and for others (usually ranks, not scores), lower is better.

To digest the plethora of resources, human readers must hold complex rules in their minds while scanning thousands of data points. The rules might be stated as "for air quality scores, higher is better", "bigger climate scores are better", "a lower cost of living index is better", and so on. IMHO, this slows down the process of comparing different cities.

What I have set out to do with TownScores™ is make a unified scoring system. Because higher scores are always "better", the system is easily scannable. This involves some subjective hand-waving, and a rather naive but pragmatic version of data manipulation. Data expressions are subjective in that they contain assumptions about what is "better". While these assumptions will not hold true for all readers, at least they are revealed explicitly. A reader who disagrees with the assumptions would need to fall back on the old way — that of holding complex rules in memory.


The data manipulation involves inverting some data measures, such that higher numbers are "better". For example, a source might give two climate metrics, annual inches of snowfall and annual sunny days:

City Name    Snowfall (in.)    Sunny Days/Year
---------    --------------    ---------------
Seattle      10.4              152
Austin       0.5               228

Were I to graph these numbers, readers would need to look at the bar chart and remember that "more sunny days are better" and "fewer inches of snow are better". Simple enough, until you combine seven towns and six metrics, for 42 data points and six rules. The problem quickly becomes unwieldy for the rapid data consumer.

Here's my workaround, applied to our two example metrics. To add these to a data view, I make my assumptions, adjust units if necessary, and then invert one data expression while leaving the other alone.

Assumptions

  1. Sunny days are better than cloudy days
  2. Snow-free weather is better than snowy weather

Clearly, these assumptions are debatable. However, making them allows me to re-scale the data view as follows.

One graph receives the label "Percentage of sunny days." The label on the snowfall graph becomes "Freedom from Snow". The maximum possible "Sunny Day" count is 365. So, Seattle's 152 days, divided by 365, times 100, give us about 42%. Austin gets 62% Sunny Days.
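The sunny-day arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation described above; the names `DAYS_PER_YEAR` and `percent_sunny` are my own and not part of any TownScores™ tooling.

```python
DAYS_PER_YEAR = 365  # maximum possible sunny-day count used in the text


def percent_sunny(sunny_days: float) -> float:
    """Return sunny days as a percentage of the year."""
    return sunny_days / DAYS_PER_YEAR * 100


# Figures from the example table: Seattle 152 days, Austin 228 days.
for city, days in [("Seattle", 152), ("Austin", 228)]:
    print(f"{city}: {percent_sunny(days):.0f}% sunny days")
```

Because the metric already points in the "higher is better" direction, no inversion is needed here, only a change of units.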

Freedom from Snow requires further manipulation, since it is expressed in inches, of which there could be any number. I set a minimum of zero inches and a maximum of 109 inches, the amount experienced by Syracuse, NY. For Seattle, that gives:

Percentage((max - actual) / max)    or
Percentage((109 - 10.4) / 109)      or
about 90.5%

Now, I know that "90% Snow-Free" sounds like nonsense. Consider, though, its value in comparison. Austin is 99.5% snow-free, whereas Syracuse is 0% snow-free. Which would you rather have?
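Here is a minimal sketch of that inversion in Python, assuming the 109-inch maximum chosen above. The function name `freedom_from_snow` is hypothetical, and the clamping step is my own addition: the text does not say what should happen to a town that gets more snow than the chosen maximum.

```python
MAX_SNOWFALL_IN = 109.0  # Syracuse, NY: the maximum chosen in the text


def freedom_from_snow(snowfall_in: float, maximum: float = MAX_SNOWFALL_IN) -> float:
    """Invert annual snowfall so that a higher percentage means less snow."""
    # Clamping is an added assumption: it keeps a town snowier than the
    # chosen maximum at 0% instead of letting the score go negative.
    clamped = min(max(snowfall_in, 0.0), maximum)
    return (maximum - clamped) / maximum * 100


# Snowfall figures (inches per year) taken from the examples in the text.
for city, inches in [("Seattle", 10.4), ("Austin", 0.5), ("Syracuse", 109.0)]:
    print(f"{city}: {freedom_from_snow(inches):.1f}% snow-free")
```

The same pattern should work for any "lower is better" measure, provided a sensible maximum can be agreed on. The choice of maximum is itself one of the subjective assumptions.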

Give a similar treatment to "days of rain", add in more cities, and we can build a bar graph that lends itself to rapid scanning:

Freedom From Snow
  Seattle       90%
  Austin        100%
  Atlanta       99%
  Miami         100%
  Tucson        98%
  San Diego     99%
  US Average    78%

Dry Days / Year
  Seattle       58%
  Austin        77%
  Atlanta       69%
  Miami         100%
  Tucson        98%
  San Diego     99%
  US Average    78%

Days of Sun / Year
  Seattle       42%
  Austin        62%
  Atlanta       59%
  Miami         68%
  Tucson        78%
  San Diego     73%
  US Average    56%
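Since every score now runs from 0 to 100 with higher being better, a bar chart needs no per-metric rules. The following is a rough sketch of a plain-text chart built from the Days of Sun figures above. The helper names `bar` and `render_chart` and the 50-character bar width are illustrative choices of mine, not how the original graphs were produced.

```python
# "Days of Sun / Year" percentages as listed above.
SUN_SCORES = {
    "Seattle": 42,
    "Austin": 62,
    "Atlanta": 59,
    "Miami": 68,
    "Tucson": 78,
    "San Diego": 73,
    "US Average": 56,
}

BAR_WIDTH = 50  # characters used for a 100% score (arbitrary choice)


def bar(percent: float, width: int = BAR_WIDTH) -> str:
    """Return a run of '#' characters proportional to a 0-100 score."""
    return "#" * round(percent / 100 * width)


def render_chart(title: str, scores: dict[str, float]) -> str:
    """Render one bar per town; a longer bar always means a better score."""
    label_width = max(len(name) for name in scores)
    lines = [title]
    for name, percent in scores.items():
        lines.append(
            f"{name:<{label_width}}  {bar(percent):<{BAR_WIDTH}} {percent:>3.0f}%"
        )
    return "\n".join(lines)


print(render_chart("Days of Sun / Year", SUN_SCORES))
```

Swapping in the Freedom From Snow or Dry Days figures requires no change to the rendering code, which is the point of putting every metric on the same scale.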

For further examples, read the full TownScores™ City Climate Comparison
