Web Site Administrivia
[I've been doing a lot of work on my web site, and thought I'd share some of the problems and products I've been working on during this process. Some of this stuff may be old hat to a few of you, but there's some new stuff here as well.—ed.]
Framed No More
When my web site first came on-line, I made heavy use of frames. However, as time has passed, I've come to the realization that frames just don't work, although not for the reasons you might expect.
The biggest problem with frames is that they aren't compatible with many HTTP clients. Duh. I'm not talking about browsers, though. Most late-model browsers are frames-compliant (and most users are running late-model browsers). The HTTP clients I'm referring to are some of the search engine spiders that don't deal with frames appropriately. Some of them, such as Lycos' spider, summarize Web pages with "this site uses frames but your browser doesn't support them" error messages. That doesn't quite convey the focused positioning statements I've been working on for so long. Conversely, AltaVista's bot understands frames too well, showing all of the child documents found in a framed page. This isn't what I wanted either.
Yeah, there are ways around this. By embedding content directly inside the <NOFRAMES> element you can support these agents, but then you have to maintain the same content in two places in order to support both the frames-aware and the frames-ignorant clients. Managing frequently-changing content in two locations is a recipe for trouble, and is more work than I care to do.
Another option is to use custom-generated pages whenever a robot hits my site. This happens more often than you might think. Some sites check to see if the HTTP client is a search engine and then redirect it to a custom-made page. Again, designing and building this kind of setup is much more work than I care to do, considering the number of pages on my site.
Frames are a problem for standard Web browsers as well, of course. I would like to be able to determine if a browser is frames-compliant, and if so, build a frame-based document that provides navigational controls and other fixed elements appropriately. If the browser isn't frames-aware, I'd merge all the documents together and send down one big, monolithic file. Server-side development platforms like Allaire's Cold Fusion let you do this, but the end result isn't workable, for reasons outside of Cold Fusion's control.
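The detection half of this scheme is simple enough to sketch. Here's a rough Python illustration of the User-Agent dispatch; the robot list and page names are invented placeholders for illustration, not anything a particular server ships with:

```python
# Sketch of serving a flat page to frames-ignorant clients and the
# frameset to everyone else. The substrings below are assumed examples
# of spider User-Agent strings, not an authoritative list.
ROBOT_AGENTS = ("Scooter", "Lycos", "Slurp", "InfoSeek", "ArchitextSpider")

def pick_variant(user_agent):
    """Return which page variant to serve for a given User-Agent string."""
    ua = user_agent.lower()
    if any(bot.lower() in ua for bot in ROBOT_AGENTS):
        return "monolithic.html"   # spiders get one big, flat document
    return "frameset.html"         # everyone else gets the framed version

print(pick_variant("Scooter/1.0 (altavista)"))            # monolithic.html
print(pick_variant("Mozilla/4.0 (compatible; MSIE 4.0)")) # frameset.html
```

The dispatch itself is trivial; as the next paragraph explains, it's the lack of client state that makes the whole model fall apart.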
In order for this type of model to work, the server has to be able to communicate with the client, asking for its current state ("are frames already loaded?", etc.). Without this knowledge, the server just keeps re-framing new documents. Cookies, URL parameters and other technologies just don't work here due to the dynamic nature of web linking. Since a user can come into a page directly or indirectly, there's no way to force a "start" and "end" to their session. Sites that use these techniques routinely fail to function properly.
Overall, dynamic web documents just aren't going to work until we have a stateful protocol that allows the server to communicate with the client on a continual basis. HTTP 1.1 only addresses part of this problem with the concept of "connection management," although it is still very much a work-in-progress with no real, usable implementations. Even then, it will take years before enough clients support the technology to make it functionally usable. In the meantime, I've eliminated frames from my site. I've been beaten on this.
Search Engine Madness
After converting all of my Web pages to monolithic form, I went about the task of updating the various search engines with pointers to the new pages. Before doing this, I made sure to incorporate the appropriate META tags, specifically making use of the DESCRIPTION and KEYWORDS tags.
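Before resubmitting, it's worth verifying that every page actually carries those tags. Here's a small, self-contained sketch using Python's standard HTML parser; the sample page and its DESCRIPTION/KEYWORDS content are made up for illustration:

```python
# Pull the name/content pairs out of a page's META tags so you can
# confirm DESCRIPTION and KEYWORDS are present before resubmitting.
from html.parser import HTMLParser

class MetaGrabber(HTMLParser):
    """Collect name/content pairs from <META> tags."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "meta":
            d = dict(attrs)
            if "name" in d:
                self.meta[d["name"].lower()] = d.get("content", "")

page = """<html><head>
<title>Sample Page</title>
<meta name="DESCRIPTION" content="Why tomorrow's spam worries me more than today's.">
<meta name="KEYWORDS" content="spam, e-mail, advertising">
</head><body>...</body></html>"""

grabber = MetaGrabber()
grabber.feed(page)
print(grabber.meta["description"])
print(grabber.meta["keywords"])
```

Run something like this over every file in the document tree and flag any page where the two keys come back missing.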
After playing with the various search engines and directories for a couple of months, I've made some interesting discoveries. First of all, some sites are really good about indexing your pages quickly. AltaVista, InfoSeek and HotBot are the best at this, typically responding to new submissions within a couple of days.
Meanwhile, Excite, Lycos and Northern Light are the worst: none of these engines has re-visited my site in over a month, even after multiple requests to do so. Lycos only re-visited after I reported a bug in their service and then demanded that they reindex my server as payback. Excite hasn't even been accepting new submissions for at least six weeks, although they've been promising a new search bot would be available in "two weeks." This reminds me of the running "two weeks" estimate in "The Money Pit" with Tom Hanks and Shelley Long.
AltaVista seems to do the best job of completely scanning a remote site. Once you submit the base URL, AltaVista will seek and index all of your pages within a few days. With InfoSeek and HotBot, you will need to provide each Web page URL individually in order to get the site completely indexed quickly.
Another important aspect here is the elimination of old pages. Since my older documents used frames, there were many orphaned files in the conversion to monolithic pages. Almost none of these sites provide an easy, direct method for removing old pages from their databases, instead suggesting that they will re-walk your site within a couple of weeks, and any dead files will be removed at that time. This doesn't always work. Currently, InfoSeek is the only site that lets you delete a page from their database. I would really like to see this feature added to the other services.
From a webmaster perspective, AltaVista, InfoSeek and HotBot are all great places to go for an up-to-date search of the Web. My personal favorite is InfoSeek, simply because they have what I consider to be the most usable interface.
Excite, Lycos and Northern Light are the worst of the bunch. If you're using any of these services as a starting point for Web searches, I guarantee that you are getting bad or out-dated results. WebCrawler used to be great, but they recently switched to using Excite's technology and as a result have dropped in value tremendously. Whereas they used to be a regular visitor, my site doesn't even show up in their searches anymore.
What about Yahoo? Yahoo sucks. Because they rely on humans to add and update entries in their database, the process of getting listed in their system is an exercise in futility. Once something does get added, you'd better not ever change it 'cause Yahoo probably won't update their entry.
This isn't just my opinion, either. A recent survey found that this inability to communicate with Yahoo was a major problem with webmasters everywhere. Even the folks who did somehow manage to get their sites listed complained about the process. I don't even bother searching Yahoo anymore. It seems like the only relevant links I get back are on the AltaVista sub-page, so why not just go straight to AltaVista instead of supporting Yahoo's madness?
How I Became A Porn King
AltaVista, InfoSeek and HotBot all use the DESCRIPTION meta tag for the page summaries they store in their databases. This really makes it easy for a user to see what the page is about without having to come visit the site. Some engines don't use the DESCRIPTION tag, instead building a summary from the first few lines of text on the page. Since my site uses a common menu in the top right-hand corner, the menu elements are interpreted as that text. On Lycos, for example, all of my pages share the same description text.
Because of their heavy use of meta tags, AltaVista, InfoSeek and HotBot are all fairly similar in the type of matches that they return. My logs show that each of these search engines generate a substantial amount of traffic that's appropriate to the content I'm providing.
The most common hits are for topics I've written about extensively over the years, like DHCP and directory services. There are also the odd hits for product reviews I've written ("RadioLAN" and "Sonic Interpol" are the leaders). And of course there are hits for things that don't make any sense at all, like "National Directory of Education Programs inGeron." Yeah, I can see "Directory" matching, but the overall phrase definitely shouldn't have returned a pointer to my site. I feel particularly bad about this one because this visitor spent a good fifteen minutes poking around my site, searching for "Page 115 August" and the like. Poor fella.
And then there are the hits I don't want at all, but because of some poor wording on my part, I'm going to be getting them for a long time coming. In "A Call to Arms" I wrote that "the future problem with spam is not the 'meet horney women' junk we get today, but the 'new from McDonalds' ads we will get tomorrow." This seemed a fine way to summarize the essay, so I used that in the DESCRIPTION meta tag for that page. Now my log is littered with hits from folks looking to "meet horney women." On both InfoSeek and HotBot, searching for "meet horney women" returns only my page. <groan>
So, META tags work, but sometimes they work too well. If you use them - and I strongly suggest that you do - make sure they contain exactly what you want people to see.
For those of you who are interested in finding out more about search engines and how they work, I highly recommend that you subscribe to Danny Sullivan's Search Engine Report newsletter (subscribe at http://searchenginewatch.com/sereport/list.html). Although it's quite large (it comes in two e-mail messages), it's an easy read and extremely informative. It's a "must" for any webmaster looking to leverage search engines to the hilt.
Server Swapping

I've recently switched from running Apache on RedHat Linux to Netscape's FastTrack server on NT. The motivation for this was to make full use of some NT-centric products, such as Allaire's Cold Fusion and some log-file analyzers.
I had looked at Cold Fusion a while back and was somewhat unimpressed, but this latest release truly dominates. It's got integrated support for ODBC, LDAP, SMTP and POP3 protocols, among many others, allowing me to build a very smart and highly-integrated web server. It also comes with a slimmed-down version of the Verity search engine, which is capable of indexing HTML files as well as database contents. Because of the advanced integration features, I'm now able to offer this newsletter in an HTML format for those of you with modern day e-mail clients. If you want to get the HTML version, go change your profile.
Regarding the log-file analysis products, it is my learned opinion that most of these tools are little more than watered-down report writers. If you really want to analyze your site traffic, dump the logs into a database, and then use real data-analysis tools like Seagate Crystal Reports to extract what you want. This model allows you to conduct complex queries on the fly, whereas the log-file analyzers only generate static summaries.
For example, using ODBC-based logs and Crystal Reports, I can find "the top ten pages for the top ten visitors for last month," which is impossible to do with WebTrends or the other file-centric products. The most difficult aspect of this process is getting the data into ODBC on a timely basis. Microsoft's IIS does this directly already, allowing you to log all traffic into an ODBC database (instead of a text file). I would request that Netscape, O'Reilly and the other Web server vendors add this feature to their products as well.
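To make the idea concrete, here's a minimal sketch of that kind of query, using SQLite as a stand-in database; the schema, table name, and sample hit records are all invented, and a real setup would pull from the ODBC-logged traffic instead:

```python
# Sketch of the "logs into a database" approach: load hit records into
# a database and ask ad-hoc questions with plain SQL, rather than being
# limited to a report writer's canned summaries.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (visitor TEXT, page TEXT, when_ts TEXT)")
sample = [
    ("10.0.0.1", "/dhcp.html",  "1998-03-01"),
    ("10.0.0.1", "/index.html", "1998-03-02"),
    ("10.0.0.2", "/dhcp.html",  "1998-03-02"),
    ("10.0.0.2", "/dhcp.html",  "1998-03-03"),
]
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", sample)

# Page counts broken out per top-ten visitor -- the cross-cut question
# a static log-file summary can't answer.
rows = conn.execute("""
    SELECT visitor, page, COUNT(*) AS n
    FROM hits
    WHERE visitor IN (SELECT visitor FROM hits
                      GROUP BY visitor ORDER BY COUNT(*) DESC LIMIT 10)
    GROUP BY visitor, page
    ORDER BY visitor, n DESC
""").fetchall()
for visitor, page, n in rows:
    print(visitor, page, n)
```

Once the hits live in a table, "top ten pages for the top ten visitors last month" is just one more WHERE clause on the date column, which is exactly the flexibility the file-centric analyzers lack.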
Finally, on another note, I'm in the process of converting my ISDN circuit to DSL, which has recently become available in my area. I'll be sure to let you know how this goes.