Lies, Damn Lies and Server Statistics | January 14, 2004

Every now and again I have a cursory look at my server stats. I quite like seeing who's been linking to me, a pastime often referred to as log surfing. It's interesting to know that www.mezzoblue.com sent me the most requests in Dec (2,995). What's more interesting is that a service I've never heard of, called Stumbled Upon, was my second biggest referrer that month (1,599).

It's also quite fun to see what search terms have been bringing people to my site. For instance, the most popular search engine query seemed to be "os x p2p", with 145 requests. I got quite a few requests for "Andy Budd" (58), "Macro Photography" (48) and "css sites" (35). Stranger still were the 49 search queries for "fighting techniques". Hmm, maybe I do know Kung Fu?

I also find OS stats quite interesting. For instance one in four visitors to my site are using a Mac. Now I do occasionally post articles about OS X, but I wouldn't say this site had a heavy Mac focus, so it's surprising that it gets such a high percentage of Mac users. As my blog is quite web standards focused, I wonder if there is a correlation between standards and Mac use? I have to admit that most of the web standards people I know are Mac users.

The thing that most people are interested in when they talk about site stats is visitor figures. Ironically, however, these are some of the hardest stats to check accurately. I've never seen any blogs talking about visitor stats, so I hope I'm not treading on any unwritten rules here.

In Dec my bandwidth usage was 3.68GB. This seems quite a lot for a small site, but I imagine a large proportion of this comes from my photo gallery, which is obviously quite KB heavy. This is backed up by the fact that calls to my photo gallery image directory accounted for 45.96% of my bandwidth last month.

As far as pages go, it seems that quite a few people are subscribed to my RSS feed. My RDF page got 61,971 requests last month, accounting for 9.56% of my total bandwidth. My blog index page had 17,933 requests, whereas my photo gallery page came in at 2,314.

In total, my server dealt with 680,200 successful requests in Dec, serving up 35,117 pages to 16,664 distinct hosts. However, in all honesty, I have no idea what all of this means. Without some kind of yardstick, it's really difficult to put server stats in perspective.

Posted at January 14, 2004 7:57 PM

Comments

David House said on January 14, 2004 9:25 PM

They sound good though :)

Scrivs said on January 14, 2004 9:39 PM

Stats have always been the thing that intrigues me the most. I guess that’s why I wrote an entry about it not too long ago. I will admit though that 3.68 GB is a lot of bandwidth. I can see how that is attributed to your photo gallery.

Another confusing thing is how everyone seems to use different terms for different stats. Visitors, unique visitors, page views, hits, requests, distinct hosts. It can all get pretty daunting.

May I ask what you are using to check your stats?

Roger said on January 14, 2004 10:15 PM

My site doesn’t get quite the amount of traffic that you have here, but the statistics are similar. Of those visiting my site, almost 20% are on a Mac. I was thinking about this the other day, and I think you’re right in assuming that many web standards people are Mac users.

Zelnox said on January 15, 2004 12:06 AM

Like Scrivs, I’m curious also. Andy, would you mind sharing with your fans what service you use to track all these statistics, please?

qid said on January 15, 2004 1:40 AM

You were asking for a yardstick, so here goes… Your blog index page got 17933 requests in December, mine got 344. I have exactly two readers (my dad and one of my friends online), so using a simple ratio you have perhaps 100 consistent readers. Interestingly enough, when I compare my total hits in December (13380) to your total hits (680200), the ratio is nearly identical (2.5% difference). I have no idea what that means, but it’s probably something interesting.
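
For anyone who wants to check that yardstick, the arithmetic can be sketched in a few lines of Python. The figures are the ones quoted in the comment above. The idea that regular readers scale linearly with index page requests is the comment's own assumption, not a measured fact, and the variable names are made up for illustration.

```python
# December figures quoted in the comment above.
andy_index, qid_index = 17933, 344
andy_total, qid_total = 680200, 13380
qid_readers = 2  # qid's own count of regular readers

index_ratio = andy_index / qid_index  # index page requests, Andy vs qid
total_ratio = andy_total / qid_total  # total requests, Andy vs qid

# Assumption: regular readers scale linearly with index page requests.
estimated_readers = qid_readers * index_ratio

# Gap between the two ratios, as a percentage of the larger one.
gap_percent = abs(index_ratio - total_ratio) / max(index_ratio, total_ratio) * 100

print(f"index ratio: {index_ratio:.1f}")
print(f"total ratio: {total_ratio:.1f}")
print(f"estimated readers: {estimated_readers:.0f}")
print(f"gap between ratios: {gap_percent:.1f}%")
```

Both ratios come out a little over 50, which is where the "perhaps 100 consistent readers" estimate and the roughly 2.5% difference come from.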

Scrivs said on January 15, 2004 1:52 AM

Well if Andy is using AWStats I guess I will share my stats, otherwise it would be pointless comparing Apples to Oranges.

Shaun Inman said on January 15, 2004 2:36 AM

I think the Mac/Standards correlation has to do with the shared dispositions of Mac enthusiasts, an appreciation of the finer details. It’s reflected in Apple hardware, the Mac OS and Apple peripherals. It’s also reflected in light structural source and whiz-bang CSS. Sure, you can use any old WYSIWYG to churn out a bunch of web pages that will look fine in IE/NS 4.x but there’s still an art in producing code that looks good and validates. And that’s what Mac users typically are, artists.

Not that you can’t be an artist if you use a generic PC, but the Mac platform clearly has a higher percentage of creative professionals, aided in part by its low market share.

Quasi said on January 15, 2004 3:03 AM

John Gruber of www.daringfireball.net tends to post his stats too in case you were wondering.

Dris said on January 15, 2004 3:28 AM

I have mentioned my stats from time to time in my blog, but not often. I simply open them to the public (a practice I probably won’t continue for long, for several reasons).

The main reason I like stats, besides gauging the growth of regular readers, is that I get to see where people are coming in from. It’s difficult to gain regular readers (outside those whom you know personally) when the web is so full of good content. Seeing where people come from helps tell me a lot about my visitors, including their interests and what they’re looking for.

Most of my users come from the CSS Zen Garden, and from comments I post in other people’s blogs. That by itself can be a treasure trove of information if interpreted right.

Joel said on January 15, 2004 4:34 AM

If you have a diversified site with different subsites then I tend to think “page views” are the most valuable statistic. For my site my “page views” are on average five times my “unique visitors”, but the occasional single visitor stays hours and pulls up 80-odd pages. So for me the number of page views seems to reflect the busyness of the site better than any other stat, since if I went on unique visitors their activity on the site would get lost. Then again, it’s not completely accurate, as the extent to which files such as PDFs are downloaded must be important too. Some pages also receive much less traffic than others. Overall though, I think it’s a good idea to settle on one stat to watch, such as page views.

Dris said on January 15, 2004 5:53 AM

By the way, I checked out that Stumbled Upon toolbar. It’s really great! Thanks for pointing that out.

Scrivs said on January 15, 2004 11:44 AM

Well if we are going on page views then it looks like this month I am on target for about 340,000-350,000, which is getting to where I never thought I would be. I am very interested to see how my stats look at the end of the year. Also is there a “goal” that everyone sets that makes them feel they have done a successful job with their site? I think I passed my goal when I jumped over 1,000 page views :)

Joel said on January 15, 2004 12:25 PM

Ah but Scrivs I’ve certainly noticed that I have to refresh your page every time I go to it in order to see if there has been any new content. If everyone does that a number of times a day it will give you a false picture of number of page views.

As a matter of course I place these metatags on my pages:

<meta http-equiv="expires" content="-1">
<meta http-equiv="pragma" content="no-cache">

This ensures the pages are served up afresh each time and there is no need for the visitor to refresh just to check whether it’s a cached page they’re viewing, which of course would make the page view stat pretty meaningless.

You looked like you were getting a bit excited there Scrivs…

Scrivs said on January 15, 2004 1:34 PM

Some good info there Joel, but I don’t understand why you would have to click refresh on my pages unless you kept your browser opened to my site. I am sure I am missing something here so maybe you could help me out.

Patrick Griffiths said on January 15, 2004 4:57 PM

I keep track of visits and page impressions regularly, just as an indicator to see how well my site is doing, as well as checking referrals which often throw up some interesting sites.

I’m no server expert, but these are a few points of interest, which I think I’m right in saying:

Unless you have some kind of cookie thing going on, there is no way of accurately knowing how many people visit a site in a given period. As far as I know, a person with a regular dial-up account may have a completely different IP address every time they log on. So the ‘unique visitors’ figure that a stats package comes up with is often inaccurate.

Another thing that seems to confuse many people is the term ‘hits’. Many seem to think that one hit means one visit, rather than every instance of a file pulled off the server. I’ve come across excited people a number of times who will say ‘my website got x hits the other day’, thinking that they had x number of visitors.
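
To make the difference between hits, pages and hosts concrete, here is a minimal Python sketch. The log lines are made up for illustration, in Apache's Common Log Format. The list of file extensions treated as supporting files, rather than pages, is an assumption that real stats packages make configurable. Note that "distinct hosts" counts IP addresses, so, as described above, it is only a rough stand-in for people.

```python
import re

# Made-up sample lines in Apache's Common Log Format (not real traffic).
SAMPLE_LOG = """\
10.0.0.1 - - [14/Jan/2004:19:57:00 +0000] "GET /index.html HTTP/1.1" 200 5120
10.0.0.1 - - [14/Jan/2004:19:57:01 +0000] "GET /css/main.css HTTP/1.1" 200 2048
10.0.0.1 - - [14/Jan/2004:19:57:01 +0000] "GET /images/logo.gif HTTP/1.1" 200 4096
10.0.0.2 - - [14/Jan/2004:20:10:12 +0000] "GET /index.html HTTP/1.1" 200 5120
10.0.0.2 - - [14/Jan/2004:20:10:13 +0000] "GET /images/logo.gif HTTP/1.1" 200 4096
10.0.0.2 - - [14/Jan/2004:20:11:40 +0000] "GET /archives/ HTTP/1.1" 200 8192
"""

# host, identity, user, [timestamp], "method path protocol", status, size
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)$')

# Assumption: requests for these extensions are supporting files, not pages.
ASSET_EXTENSIONS = (".css", ".js", ".gif", ".jpg", ".jpeg", ".png", ".ico")


def summarise(log_text):
    """Count successful hits, page requests and distinct hosts in a log."""
    hits = 0
    pages = 0
    hosts = set()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not look like log entries
        host, _method, path, status, _size = match.groups()
        if not status.startswith("2"):
            continue  # only count successful (2xx) requests
        hits += 1  # every file pulled off the server is one hit
        hosts.add(host)
        if not path.lower().endswith(ASSET_EXTENSIONS):
            pages += 1  # only page-like requests count as page views
    return {"hits": hits, "pages": pages, "distinct_hosts": len(hosts)}


summary = summarise(SAMPLE_LOG)
print(summary)
```

In this sample, two visits generate six hits but only three page views, which is why quoting "hits" as visitors overstates things so badly.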

Andy Budd said on January 15, 2004 6:47 PM

The hosting company I use run Analog to provide the stats. However the package we tend to use at work is Webalizer.

On occasion we’ve had clients wanting advanced stats reporting so they can look at user paths and stuff. However unless you really know what you’re doing and have a real need for this info, it’s pretty overkill.

I’ve not found any good open source stats programs that provide this high level of reporting. There are quite a few commercial ones that use a mixture of server logs alongside the use of cookies, but these tend to be pretty pricey, often in the region of $1000+.

Justin said on January 15, 2004 7:20 PM

About 2 weeks ago Scrivs posted a message mentioning Awstats (http://www.9rules.com/cgi-bin/mt/mt-tb.cgi/96). I’d never heard about it, so I checked it out.

I’ve been looking for an OSS stats program that was powerful like those expensive commercial ones. For the last week I’ve been using it to keep an eye on the stats for the company I work for and it’s been great. I’m impressed by how customizable it is, including the ability to have it generate custom reports, graphs, etc. I still want a bit more power, but I suspect that I can get what I want by making my own reports in the config files. It’s not perfect, but IMHO, it’s the closest I’ve seen.

For me, having the ability to use JS to get user agent stats is great. Being able to break down both search phrases and search words will also be helpful for our SEO efforts. Being a CGI also makes it extremely easy to set up and customize for a client, should they want more than what their ISP supplies.

seriocomic said on January 15, 2004 11:27 PM

I use Power Phlogger by http://www.phpee.com/ - it’s a free PHP logging and tracking script that you can host yourself (and use to track your customers’ sites as well). I had been struggling to find a PHP replacement for AWStats until I found it. The amount of detail you can pull from it should satisfy any user.

Tyrone said on January 16, 2004 11:36 PM

It seems that everyone here is using a log analyzer script actually installed on the webserver. I’ve tried using these but they seem to be too server intensive. Simply changing an include/exclude filter will force the entire cache (assuming there is one) to be refreshed.

I eventually settled on doing log analysis the old-fashioned way: having Apache rotate and gzip the log files daily at midnight (format access_log_YYYY_MM_DD.gz) and using a PC-based log analyzer to do the dirty work. BTW, I use http://www.weblogexpert.com because, unlike WebTrends, it doesn’t require you to decompress the log files onto your hard drive.
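
Reading a gzipped log without unpacking it to disk first is easy enough to sketch in Python. This is only an illustration of the idea, not how any particular analyzer works. The file name follows the access_log_YYYY_MM_DD.gz pattern mentioned above, the two log lines are made up, and `count_requests` is a hypothetical helper name.

```python
import gzip
import os
import tempfile


def count_requests(gz_path):
    """Count non-empty lines in a gzipped log, streaming it line by line."""
    total = 0
    # Mode "rt" decompresses on the fly as text; no unpacked copy hits the disk.
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as log:
        for line in log:
            if line.strip():
                total += 1
    return total


# Demo: write a tiny made-up rotated log, then read it back compressed.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "access_log_2004_01_16.gz")
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.write('10.0.0.1 - - [16/Jan/2004:00:00:01 +0000] "GET / HTTP/1.1" 200 512\n')
        out.write('10.0.0.2 - - [16/Jan/2004:00:00:05 +0000] "GET /feed.rdf HTTP/1.1" 200 2048\n')
    request_count = count_requests(path)

print(request_count)
```

Because the file is streamed one line at a time, memory use stays flat however large the day's log is.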

BTW, I wonder how many of you go to any lengths to exclude illegitimate traffic: search engines, offline browsers, your own IP address (assuming it’s dedicated, like cable/DSL), test folders/directories, Code Red worm requests, etc.
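
As a rough idea of what such exclusion filters look like, here is a small Python sketch that works on already-parsed hits. Every value in it is an illustrative assumption: the IP addresses come from ranges reserved for documentation, the user agent markers and directory prefixes are examples rather than a complete list, and `is_legitimate` is a hypothetical helper name. The one thing taken from real traffic is that the Code Red worm requests /default.ida.

```python
# All values below are illustrative assumptions, not a complete or authoritative list.
OWN_IPS = {"192.0.2.10"}                         # your own fixed address
BOT_MARKERS = ("googlebot", "slurp", "msnbot")   # substrings in crawler user agents
OFFLINE_MARKERS = ("httrack", "wget")            # offline browsers and downloaders
EXCLUDED_PREFIXES = ("/test/", "/dev/")          # test folders/directories
WORM_PREFIXES = ("/default.ida",)                # path requested by the Code Red worm


def is_legitimate(hit):
    """Return True if a parsed hit should be kept in the stats."""
    agent = hit["agent"].lower()
    path = hit["path"].lower()
    if hit["host"] in OWN_IPS:
        return False
    if any(marker in agent for marker in BOT_MARKERS + OFFLINE_MARKERS):
        return False
    if path.startswith(EXCLUDED_PREFIXES + WORM_PREFIXES):
        return False
    return True


# Made-up hits for demonstration only.
hits = [
    {"host": "198.51.100.7", "path": "/index.html", "agent": "Mozilla/5.0 (Macintosh)"},
    {"host": "192.0.2.10", "path": "/index.html", "agent": "Mozilla/5.0 (Macintosh)"},
    {"host": "198.51.100.8", "path": "/archives/", "agent": "Googlebot/2.1"},
    {"host": "198.51.100.9", "path": "/default.ida?NNNN", "agent": "-"},
    {"host": "198.51.100.10", "path": "/test/new-layout.html", "agent": "Mozilla/4.0"},
    {"host": "198.51.100.11", "path": "/photos/", "agent": "Wget/1.9"},
]

kept = [hit for hit in hits if is_legitimate(hit)]
print(f"{len(kept)} of {len(hits)} hits kept")
```

Only one of the six sample hits survives the filters, which is a fair illustration of how far raw totals can sit from real readership.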

Sometimes I like to remove all exclusion filters so I can live in a temporary delusionary state of denial regarding my traffic stats.