Using URL rewrites
Aside from the normal functions you'd expect from a modern 404 script, such as responding to requests for missing resources with 301 and 302 redirects, 410s for removed resources and so on, it's also possible to do basic semi-transparent URL rewriting without adding any extra software to your host.
Think of it as a poor man's mod_rewrite or .htaccess if you were using Apache, but since you're probably using IIS the closest analogy would be ISAPI_rewrite.
URL rewriting via an ASP script does work; I've used it as the core of at least one website. It's important to realise, though, that script-based URL rewrites aren't always the best solution, and they aren't the only solution either.
Script-based URL rewriting adds an extra layer of complexity to a website, on top of the existing programming. For some websites that means the end result is messy and carries additional overhead, rather than being the simple, streamlined solution most people want.
If you're looking at building a serious commercial ASP-driven website or a rewrite-heavy ASP-based website then you should at least consider ISAPI_rewrite, as you can have clean URL rewrites out of the box without the risk of adding extra complexity to the code. There are two versions of the product available: a free version with some features disabled, and a paid version with all features enabled.
Enabling the rewrite function
The last few versions of our custom 404 error handler have supported simple URL rewriting, provided you read the fine print / script comments and were willing to write a little code to make the magic happen.
Newer versions have the rewrite call built in as standard, but as the majority of users probably won't have a use for it the call has been left commented out in the source of custom404.asp, line 33:
    'Run custom hardcoded rewrites
    If Redirect_Rewrite( sSrcPage ) Then
        Response.End
    End If
Once that block of code has been uncommented you're halfway there; you just need to add some substance to Redirect_Rewrite( ... ) so it can perform rewrites for you.
Building your rewrite function
Below you'll find a very simple example of a short Redirect_Rewrite( ... ) function written as a proof of concept. It works like this: Redirect_Rewrite( ... ) intercepts the 404 and checks the URL; if the URL matches, the function dissects it into component parts which can then be used to query your main database.
The example we've supplied is based around a real-life question about how you could use this script to create a relatively "clean" URL for a product catalog while still remaining on basic hosting which didn't include ISAPI_rewrite at the time.
Function Redirect_Rewrite( ByVal sSourcePage )
    'Trigger function for inline rewrites if required
    'Demo code assumes that pages are named /products/123456.htm or
    ' /products/item123456.htm and parses them accordingly; this
    ' would allow you to hook into a database and generate the
    ' appropriate page for that item number.
    Const csDirName = "/products/"
    Const csPrefix = "item"

    Redirect_Rewrite = False

    'Normalise the input
    sSourcePage = Trim( LCase( sSourcePage ) )

    If Left( sSourcePage, Len( csDirName ) ) = csDirName Then
        'Trim the directory
        sSourcePage = Right( sSourcePage, Len( sSourcePage ) - Len( csDirName ) )

        'Trim the suffix
        If Right( sSourcePage, 4 ) = ".htm" Then
            sSourcePage = Left( sSourcePage, Len( sSourcePage ) - 4 )
        ElseIf Right( sSourcePage, 5 ) = ".html" Then
            sSourcePage = Left( sSourcePage, Len( sSourcePage ) - 5 )
        End If

        'Trim any prefix (compare against the constant's length, not a hardcoded 4)
        If Left( sSourcePage, Len( csPrefix ) ) = csPrefix Then
            sSourcePage = Right( sSourcePage, Len( sSourcePage ) - Len( csPrefix ) )
        End If

        'Create page for output; the code comes straight from the URL,
        ' so HTML-encode it before echoing it back
        Response.Status = "200 OK"
        Response.Write "-- your header code --"
        Response.Write "Your product code was '" & Server.HTMLEncode( sSourcePage ) & "'."
        Response.Write "-- your footer code --"

        Redirect_Rewrite = True
    End If
End Function
As you can see it's pretty basic; the function returns either True or False depending on the path of the resource requested. The "trigger" path is defined in a constant (csDirName) to make any changes easier.
It's important to note that all the processing for the rewritten page needs to occur inside the main 404 script. If you start trying to get fancy and use an intermediate page (even via Server.Execute or Server.Transfer) then IIS will generate a redirect, making the URL rewriting apparent. As long as you deal with the processing in a single pass, the rewritten URL will be totally transparent to both users and search engines.
N.B. File extensions can be important: by default IIS will generate a redirect to your 404 script for files with .asp extensions, as noted below. If you have access to the server then the fix is listed on the main page.
For the sample we chose a directory called "products" directly off the website's root folder, because at the time we wanted to demonstrate a way to give a selection of products clean URLs (in other words, URLs with no querystrings) despite the pages being dynamic and database-driven.
Next we defined approximately how our URLs would appear; in this case we chose a simple example.com/products/123456.htm or example.com/products/item123456.htm. These aren't brilliant, but would do the job well enough if the product selection didn't require product names in the URL. The prefix of "item" is optional, but some might prefer it to a straight "numbers only" page name.
Despite these being on an ASP-enabled website we elected to use an .htm extension as a shortcut to overcome an issue with IIS redirecting when a 404 is generated by a file with an .asp extension. We could also have gone with an extensionless URL, but generally that tends to confuse people when dealing with short URLs.
Basic logic
The core of the function (the part that extracts data from the URL) can be written to suit your own needs. Our example met the requirements we were given, but yours could be written to deal with something totally different - don't feel obliged to follow our example to the letter. If it helps, here's a quick run-down of what our example code was doing...
First of all we check whether the page which triggered the 404 matches our "trigger" path. If it doesn't, we neatly exit the function with the result left at its default of False.
If the requested page's path does start with our "trigger" path then we strip the trigger from that string. If you look at the URL structure we chose you'll see that should leave us with just a page name.
The page name includes all the data we need to locate a product cleanly; we just need to tidy it up a little first. If there's a suffix of .htm or .html we trim that off, and then finally we check for our prefix and, if it exists, trim that off as well.
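If you'd like to try the trimming steps on their own, here's a minimal sketch that pulls them out into a separate function which only returns the product code and writes nothing to the response. The name ExtractProductCode is our own invention for this illustration and isn't part of custom404.asp; the constants mirror the ones in the example above.

```vbscript
'Hypothetical helper (not part of custom404.asp): returns the product
' code from a path, or "" if the path is outside the trigger directory.
Function ExtractProductCode( ByVal sPath )
    Const csDirName = "/products/"
    Const csPrefix = "item"

    ExtractProductCode = ""

    'Normalise the input
    sPath = Trim( LCase( sPath ) )

    'No match on the trigger path, so leave the default result
    If Left( sPath, Len( csDirName ) ) <> csDirName Then Exit Function

    'Trim the directory
    sPath = Mid( sPath, Len( csDirName ) + 1 )

    'Trim the suffix
    If Right( sPath, 4 ) = ".htm" Then
        sPath = Left( sPath, Len( sPath ) - 4 )
    ElseIf Right( sPath, 5 ) = ".html" Then
        sPath = Left( sPath, Len( sPath ) - 5 )
    End If

    'Trim any prefix
    If Left( sPath, Len( csPrefix ) ) = csPrefix Then
        sPath = Mid( sPath, Len( csPrefix ) + 1 )
    End If

    ExtractProductCode = sPath
End Function

'ExtractProductCode( "/products/123456.htm" )      returns "123456"
'ExtractProductCode( "/Products/Item123456.html" ) returns "123456"
'ExtractProductCode( "/about/contact.htm" )        returns ""
```

Keeping the parsing separate from the output like this makes it easier to check the URL handling before you wire in any database code.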
Hopefully that should leave us with something we can use to pick a single product out of the database and display its details. If we find a product then we can eventually return True and the remaining logic will cleanly exit the script; if we fail to find a product then the return can be left at the default value and the request can be re-processed by the regular 404 logic.
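As a rough sketch of that last step, the code below only returns True when a matching product is actually found. Everything database-related here is an assumption made for illustration: the connection string constant gsConnString, a table called Products with a numeric ProductID column and a text ProductName column, and the helper names IsProductCode and ShowProduct are all hypothetical, so adjust them to suit your own schema.

```vbscript
'Assumption: replace with a real ADO connection string for your database
Const gsConnString = "-- your connection string --"

'Hypothetical helper: True only if sCode is 1 to 9 digits, which keeps
' junk out of the query and guarantees CLng() won't overflow.
Function IsProductCode( ByVal sCode )
    Dim oRe
    Set oRe = New RegExp
    oRe.Pattern = "^\d{1,9}$"
    IsProductCode = oRe.Test( sCode )
    Set oRe = Nothing
End Function

'Hypothetical helper: writes the product page and returns True if the
' product exists, otherwise writes nothing and returns False.
Function ShowProduct( ByVal sCode )
    Dim oConn, oCmd, oRS

    ShowProduct = False
    If Not IsProductCode( sCode ) Then Exit Function

    Set oConn = Server.CreateObject( "ADODB.Connection" )
    oConn.Open gsConnString

    'Parameterised query, so the URL never ends up inside the SQL text
    Set oCmd = Server.CreateObject( "ADODB.Command" )
    Set oCmd.ActiveConnection = oConn
    oCmd.CommandType = 1 'adCmdText
    oCmd.CommandText = "SELECT ProductName FROM Products WHERE ProductID = ?"
    oCmd.Parameters.Append oCmd.CreateParameter( "id", 3, 1, , CLng( sCode ) ) '3 = adInteger, 1 = adParamInput

    Set oRS = oCmd.Execute
    If Not oRS.EOF Then
        Response.Status = "200 OK"
        Response.Write "-- your header code --"
        Response.Write "<h1>" & Server.HTMLEncode( oRS( "ProductName" ) & "" ) & "</h1>"
        Response.Write "-- your footer code --"
        ShowProduct = True
    End If

    oRS.Close
    oConn.Close
    Set oRS = Nothing
    Set oCmd = Nothing
    Set oConn = Nothing
End Function
```

Inside Redirect_Rewrite( ... ) you could then swap the demo Response.Write lines for a single Redirect_Rewrite = ShowProduct( sSourcePage ), so that an unknown product code falls through to the regular 404 logic.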
Conclusion
An advanced topic, I'll admit, but it was an interesting piece of coding to create, and reviewing the whole process after completion gave the odd insight. If you have any questions about points I've raised here then feel free to drop me a line.