IIS Log Search Term Extractor

Description

A Visual Basic utility that quickly processes a series of website log files and extracts the exact search terms visitors used to find your site. Some log analysis packages only offer the top X results, or the top X from a selection of search engines. This utility instead generates a file containing a complete record of all search terms and referral sources found in the website logs.

The basic idea is to give a detailed view of where your site's incoming traffic comes from and which search terms generate that traffic. It avoids the superficial approach of some log analysis packages, where the minor trends are lost in favor of neatly graphed major trends. Having access to the raw data is also a big advantage, because it has many secondary uses.

The utility starts by assembling a list of known search engines. Specifically, it records their URLs and how each one passes on the user's search terms (query terms). It builds this list by parsing an "Analog-style" list of search engines, or even a full Analog configuration script. Analog is a good choice because its list of search engines is updated frequently.
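A minimal sketch of the parsing step follows. It is written in Python rather than the utility's own Visual Basic. It assumes Analog's `SEARCHENGINE <url-pattern> <variable>[,<variable>...]` syntax, where `*` in the pattern matches any run of characters. Every other configuration command is ignored, so the same function can read either SearchQuery.txt or a full Analog configuration file. The names `SearchEngine`, `wildcard_to_regex` and `parse_search_engines` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List


def wildcard_to_regex(pattern):
    """Translate an Analog-style wildcard ('*' = any run of characters) to a regex."""
    pieces = pattern.lower().split("*")
    return re.compile(".*".join(re.escape(piece) for piece in pieces))


@dataclass
class SearchEngine:
    pattern: str                                        # e.g. "http://*google.*/*"
    variables: List[str] = field(default_factory=list)  # query keys holding the terms

    def __post_init__(self):
        # Compile once; every referrer in the logs is tested against this
        self._regex = wildcard_to_regex(self.pattern)

    def matches(self, url):
        # Case-insensitive match against the whole referring URL
        return self._regex.fullmatch(url.lower()) is not None


def parse_search_engines(lines):
    """Collect SEARCHENGINE definitions, skipping comments and other commands."""
    engines = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split()
        if parts[0].upper() != "SEARCHENGINE" or len(parts) < 3:
            continue  # some other Analog command, e.g. LOGFILE
        pattern = parts[1].strip('"')
        variables = [v for v in parts[2].strip('"').split(",") if v]
        engines.append(SearchEngine(pattern, variables))
    return engines
```

Skipping unknown commands, rather than rejecting them, is what allows a complete configuration script to be used as the source list.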

Once this list has been loaded, the logs are processed. Each referring URL is matched against the list of search engines. If a match is found, the utility attempts to extract the search terms (query terms) that brought this particular visitor to your site. As data is gathered, it is written to a variety of reports according to your preferences. These range from the core report, which can be imported into a database package for statistical analysis, to reports listing referrers that might be search engines but are not in your list. Finally, there is an option to create a list of all the remaining referring pages.
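The matching and extraction step might look something like the following Python sketch. This is not the utility's own code. `engines` is assumed to be a list of `(pattern, variables)` pairs built from the search engine definitions, and the first matching engine wins. A referrer that matches an engine but carries none of its variables is returned with `None` terms, and the caller decides which report it belongs in. `wildcard_match` and `extract_terms` are hypothetical names.

```python
import re
from urllib.parse import parse_qs, urlsplit


def wildcard_match(pattern, url):
    """Case-insensitive Analog-style wildcard match ('*' = any run of characters)."""
    regex = ".*".join(re.escape(piece) for piece in pattern.lower().split("*"))
    return re.fullmatch(regex, url.lower()) is not None


def extract_terms(referrer, engines):
    """Return (pattern, terms) for a known engine, with terms None if none of its
    variables hold a value. Return None when no engine matches the referrer."""
    for pattern, variables in engines:
        if not wildcard_match(pattern, referrer):
            continue
        query = parse_qs(urlsplit(referrer).query)  # decodes '+' and %xx escapes
        for name in variables:                       # try each known variable in order
            for value in query.get(name, []):
                if value.strip():
                    return pattern, value.strip()
        return pattern, None                         # known engine, no usable terms
    return None                                      # not a known search engine
```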

Requirements

Single Compressed Download

Installation & Setup

  1. Unpack the zip file into a single directory.
  2. Download the latest SearchQuery.txt file.

User Guide

Point the utility at your log files (using wildcards if multiple files are required), either by entering the path manually or by using the built-in selection dialog. Next, point the utility at your Analog configuration file or Analog search query file in the same way. Finally, enter your base URL so the application can ignore internal referrals.
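The sketch below shows one way to read referrers from the selected logs. It assumes the logs are in IIS's W3C extended format, where a `#Fields:` directive names the columns and the referrer is stored in the `cs(Referer)` field. The sketch expands a wildcard path with Python's `glob`, skips empty (`-`) referrers, and treats a referrer as internal when its host name equals the base URL's host. Comparing host names rather than string prefixes is an assumption, and `iter_referrers` is a hypothetical name.

```python
import glob
from urllib.parse import urlsplit


def iter_referrers(log_pattern, base_url):
    """Yield external referrer URLs from W3C extended logs matching a wildcard."""
    own_host = (urlsplit(base_url).hostname or "").lower()
    for path in sorted(glob.glob(log_pattern)):
        referer_index = None  # column position, taken from the #Fields directive
        with open(path, encoding="utf-8", errors="replace") as log:
            for line in log:
                line = line.rstrip("\r\n")
                if line.startswith("#Fields:"):
                    fields = line[len("#Fields:"):].split()
                    referer_index = fields.index("cs(Referer)") if "cs(Referer)" in fields else None
                    continue
                if line.startswith("#") or referer_index is None:
                    continue  # other directives, or no referrer column logged
                values = line.split(" ")
                if referer_index >= len(values):
                    continue  # truncated line
                referrer = values[referer_index]
                if referrer == "-":
                    continue  # no referrer recorded
                if (urlsplit(referrer).hostname or "").lower() == own_host:
                    continue  # internal referral from your own site
                yield referrer
```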

That's all the set-up you need, so now would be a good time to save those settings if you plan on using them again. You can start the process either from the menu (choose "Run" from the "File" menu) or with the keyboard shortcut CTRL+R.

The first time you run the application, the defaults put it in batch mode. This means it runs without requiring any attention and alerts you when it has finished. It can generate three files, which appear in the same directory as your log files:
  1. SE Queries - Search Terms.txt: contains the raw search term data in tab-delimited format. The columns should be self-explanatory, because they all have headers.
  2. SE Queries - Incoming links.txt: contains any referring URLs that don't appear to contain search terms the process can identify. These will mostly be links from other sites to yours (incoming links, or backlinks), so this report also shows where your non-search traffic is coming from.
By default the "Incoming links" file isn't produced. To produce it, tick "Report incoming links?" in the Options menu.
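Here is a hedged Python sketch of the report-writing step. Rows that carry search terms go to the tab-delimited search term file. The remaining referrers go to the incoming links file, but only when that option is switched on. The file names are taken from the list above. The column headers, the `(referrer, engine, terms)` row shape and the name `write_reports` are assumptions, because the utility's actual column layout isn't documented here.

```python
import csv
import os
from contextlib import ExitStack


def write_reports(rows, directory, report_incoming_links=False):
    """rows: iterable of (referrer, engine, terms); engine/terms are None if unmatched."""
    with ExitStack() as stack:  # closes whichever report files were opened
        def open_report(name, header):
            handle = stack.enter_context(
                open(os.path.join(directory, name), "w", newline="", encoding="utf-8"))
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerow(header)  # hypothetical column headers
            return writer

        terms_out = open_report("SE Queries - Search Terms.txt",
                                ["Search engine", "Search terms", "Referrer"])
        links_out = (open_report("SE Queries - Incoming links.txt", ["Referrer"])
                     if report_incoming_links else None)

        for referrer, engine, terms in rows:
            if terms:
                terms_out.writerow([engine, terms, referrer])
            elif links_out is not None:
                links_out.writerow([referrer])  # no identifiable search terms
```

Tab-delimited output with a header row is what lets the core report be imported directly into a database or spreadsheet package.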

Analog.cfg or SearchQuery.txt?

This is really up to you. Personally, I would suggest getting hold of the latest SearchQuery.txt file. It contains a recent list of search engine definitions and will give you the best results, and most Analog users will already have a recent version of it.

Interactive updates?

This is a built-in "learning" process. It lets you improve the application's match rate by extending the existing search engine definitions. For example, you can add new search engine URLs or update the list of variables a search engine is known to store its search terms in.
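Interactive mode could be sketched along these lines. The names are hypothetical, and the utility's real dialog isn't described here. `candidate_variables` lists the query-string pairs of an unrecognised referrer so the user can pick the one that holds the search term. `learn` adds a new engine or appends a variable to an existing one. `to_searchengine_lines` writes the result back out, assuming the updated list is saved in Analog's `SEARCHENGINE pattern var1,var2` form.

```python
from urllib.parse import parse_qsl, urlsplit


def candidate_variables(referrer):
    """List (name, value) query pairs the user can choose the search-term variable from."""
    return [(name, value) for name, value in parse_qsl(urlsplit(referrer).query)
            if value.strip()]


def learn(engines, pattern, variable):
    """Add a new engine pattern, or append a variable to an existing engine."""
    variables = engines.setdefault(pattern, [])
    if variable not in variables:
        variables.append(variable)


def to_searchengine_lines(engines):
    """Serialise the engine dictionary back into SEARCHENGINE lines."""
    return [f"SEARCHENGINE {pattern} {','.join(variables)}"
            for pattern, variables in engines.items()]
```

Because nothing here is specific to search engines, any site that leaves the search term in its result URLs can be taught to the list in the same way.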

A side effect is that the application can extract search terms from more than just search engines. Any site that returns search results to the user but leaves the search term in the URL is a viable source. It also means a slightly out-of-date search engine list isn't a major issue. If you start getting traffic from a new engine, you can use interactive mode to update your own search engine list.
Evolved Code: ASP, SQL & VB meet the internet.