Trivial uses of Telnet - HTTP
Continued from page 1 - "Getting started, a simple guide to two more practical utilities - telnet & netcat."
HTTP - Browsing page source
Webpages are served up using a protocol called Hypertext Transfer Protocol, or HTTP for short. Standards
suggest that port 80 should be where the HTTP service listens, although it is trivial for the administrator
to use another port. For the purposes of our little exercise we are going to look at very simple ways to get
the webserver to serve up the front page of its site. There are a myriad of things you can do, but we are
going to keep it pretty trivial.
Start off by connecting your chosen tool to port 80 of a website (www.example.com for the purposes
of this demonstration). Then type the following and press return twice when you are done.
If you are using telnet you need to get it right the first time, otherwise it will not work correctly.
GET / HTTP/1.0
What we have just asked for is for the system to send us the document (GET) which exists as the
front page of the website (/ or root document), and we specify that we just want a simple request
without all the extra fuss which I will explain later (HTTP/1.0).
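If you would rather script this than type it by hand, the same request can be sketched in a few lines of Python. The function names build_simple_request and send_request are made up for this illustration, and www.example.com is just the demonstration host used above, so treat this as one possible way of doing it rather than the definitive one.

```python
import socket


def build_simple_request(path="/"):
    """Return the bytes of a bare HTTP/1.0 GET request for `path`."""
    # The request line followed by an empty line: this is the
    # "press return twice" part when typing by hand.
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


def send_request(host, request, port=80, timeout=10):
    """Connect to host:port, send `request` and return the raw reply bytes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

With network access, print(send_request("www.example.com", build_simple_request()).decode("latin-1")) should show the headers and the page in much the same way telnet does.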
In response the server replied with the following (this is just an example so yours will be different,
plus I have tweaked this to make it simpler so some of the numbers will not be accurate):
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Cache-Control: no-cache
Expires: Mon, 24 Dec 2001 00:49:17 GMT
Content-Location: http://10.0.0.100/index.html
Date: Mon, 24 Dec 2001 00:49:17 GMT
Content-Type: text/html
Accept-Ranges: bytes
Last-Modified: Mon, 24 Dec 2001 00:27:03 GMT
ETag: "60fa8cb5118cc11:adc"
Content-Length: 103

<HTML>
<HEAD>
<TITLE>Demo page</TITLE>
</HEAD>
<BODY>
This is just a test.
</BODY>
</HTML>
When the webserver is done telling us about this page it closes the connection since we have the data, and it
has other requests to process - when this happens we get this message:
Connection to host lost.
So what have we learnt from this? Well, everything before the blank line is called the header and everything
below it is called the content. Normally you only see the content, since your browser filters out the headers
for you. Headers, however, do contain rather a lot of information...
- The site appears to have a standard front page (denoted by the 200 status). A status of 3XX
  implies you have to go to a second page to find what you are looking for, a status of 4XX implies
  that there was a problem getting this page (either it was missing or you aren't allowed access to it
  at this time, etc.) and lastly a status of 5XX would mean that the server had problems processing
  the request.
- The site appears to be running Microsoft's IIS version 5 (this is only found on Windows 2000), so I know
  what webserver and operating system they appear to be running. If you haven't guessed, you get this
  from the Server header, which says Microsoft-IIS/5.0.
- We have a Content-Location header, which can often give away information such as the internal addresses
  of machines, full paths to documents on the website and a multitude of other things.
- We have a Last-Modified header, which does what you expect it to: it gives the last date and time
  that this page was modified. This is sometimes very useful to see how frequently a website is really
  updated.
The actual content side of it will generally only give away the errors of the designers, but looking out for
and identifying those is an entire lesson in itself.
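To make the header/content split and the status ranges a bit more concrete, here is a rough Python sketch that pulls a raw reply apart. The names split_response and describe_status are invented for this example, and the sample reply is a cut-down version of the one shown earlier, so take it as an illustration only.

```python
def split_response(raw):
    """Split a raw HTTP reply into (status_code, headers, body)."""
    # Everything before the first blank line is header, the rest is content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    status_code = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, header values may contain colons.
        name, _, value = line.partition(":")
        # Simplification: a repeated header overwrites the earlier one.
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def describe_status(code):
    """Give a rough meaning for a status code, following the ranges above."""
    meanings = {
        2: "success, the page was served",
        3: "redirect, look at another location",
        4: "problem with the request (missing page, no access, ...)",
        5: "the server had trouble processing the request",
    }
    return meanings.get(code // 100, "other")


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Microsoft-IIS/5.0\r\n"
    b"Last-Modified: Mon, 24 Dec 2001 00:27:03 GMT\r\n"
    b"\r\n"
    b"<HTML>This is just a test.</HTML>"
)
code, headers, body = split_response(sample)
print(code, describe_status(code))
print(headers["server"])
print(headers["last-modified"])
```

Header names are lower-cased here only so that lookups do not depend on how a particular server chooses to capitalise them.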
HTTP/1.1 vs HTTP/1.0
HTTP/1.1 is a widely used extension to HTTP/1.0 which allows the client more control over
the content it is being delivered. Like most protocols it is over-engineered, so much so that there
are features built into it that are rarely ever used. They were nice ideas, but very few people
implement them, for example only giving out certain types of content if the client can accept them.
The smallest request you can expect to send and still get a valid response back is the following:
GET / HTTP/1.1
Host: www.example.com
Connection: Close
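As a small sketch, the three lines above can be put together in Python like this. The function name build_minimal_request is made up for the example; the important parts are the CRLF at the end of every line and the empty line that finishes the request.

```python
def build_minimal_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.1 GET request for `host`."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",       # tells the server which site we want
        "Connection: Close",   # ask the server to hang up after replying
    ]
    # Each line ends in CRLF and an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_minimal_request("www.example.com").decode("ascii"))
```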
However, in practice you are more likely to send a replica of the request data a real browser would send,
since it fools the website into thinking a browser is visiting the site, and also makes sure that if for
some reason the website needs the usual set of headers it has them.
GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: www.example.com
Connection: Close
The main difference is the Host: request header, as this allows website hosting companies to
put more than one website on an address and have them all accessible (referred to as a virtual server).
If you try to access a virtual server without the Host: line you will not get the site you expect!
Just in case you were curious, the request headers used in that example are:
- Accept - gives a list of the types of data that you are in theory willing to accept.
- Accept-Language - gives a list of the languages that you are in theory willing to accept.
- Accept-Encoding - gives a list of the encoding methods that you are in theory willing to accept.
- User-Agent - a string that describes the type of browser you are using.
- Host - the hostname this request is destined for.
- Connection - specifies what to do with the connection once this request is done (Close asks the server to close it).
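If you want to send the browser-like request from a script, one way to sketch it in Python is to keep the headers in a list and join them up. BROWSER_HEADERS and build_browser_request are names invented for this illustration, and the header values are simply copied from the example request above.

```python
# Header values copied from the example request; swap in your own as needed.
BROWSER_HEADERS = [
    ("Accept", "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*"),
    ("Accept-Language", "en-us"),
    ("Accept-Encoding", "gzip, deflate"),
    ("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"),
]


def build_browser_request(host, path="/", extra_headers=BROWSER_HEADERS):
    """Return the bytes of an HTTP/1.1 GET request dressed up like a browser."""
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in extra_headers]
    lines += [f"Host: {host}", "Connection: Close"]
    # CRLF after every line, then an empty line to end the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_browser_request("www.example.com").decode("ascii"))
```

One thing to keep in mind: if you send Accept-Encoding: gzip, deflate the server is allowed to reply with a compressed body, which will look like garbage in a terminal, so you may prefer to leave that header out when reading replies by eye.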
The choice of which version to use comes down to how much effort you want to put into the task. 1.0 is
simple to the point that it is noticeably unrealistic, whereas 1.1 is more complex but more believable,
since it is what a modern browser would use and so will not look out of place. It is also useful to remember
that a plain 1.0 request cannot reach a virtual server, since the Host header came in under the 1.1
specification (although in practice many servers will honour a Host: line sent with a 1.0 request).
Continued on page 3 - "Popular ways to hide who you really are on the web - proxy servers and wingates."