Learning to Analyze Web Traffic using Wireshark

July 8, 2009 at 10:22 am | Posted in Wireshark Experience | 5 Comments

This article assumes that you are familiar with networking basics such as MAC addressing, TCP/IP, web browsing, and other internet-related activities.

The first packet we capture with Wireshark after typing www.google.com into the browser is shown below. (You can either download my capture file here, or just install and run Wireshark yourself and open google.com.)

For beginners who have never used Wireshark before, let us get familiar with this free and amazing tool!

Look at the first row, which says: Frame 1 (680 bytes on wire, 680 bytes captured). What does it mean? It means that this first packet is 680 bytes long and that we captured all of it. Is it possible not to get the whole packet? Of course! Sometimes we only want to record part of each packet to save disk space; we can set that limit in Capture > Options. You can expand (+) this first row to see further details, such as the arrival time of the packet.
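The two numbers on the Frame row correspond to two fields stored with every packet in a capture file: how many bytes were saved and how long the packet was on the wire. Here is a minimal Python sketch of that idea. It assumes the classic libpcap per-packet record header (four 32-bit fields) in little-endian order, and the timestamps and the 96-byte limit are made up for illustration.

```python
import struct

# Classic pcap per-packet record header: ts_sec, ts_usec, incl_len, orig_len.
# Little-endian is an assumption; the real byte order depends on the file's magic number.
RECORD_HEADER = struct.Struct("<IIII")

def describe_record(header_bytes):
    """Return (captured_len, wire_len, truncated) for one record header."""
    _ts_sec, _ts_usec, incl_len, orig_len = RECORD_HEADER.unpack(header_bytes)
    return incl_len, orig_len, incl_len < orig_len

# Hypothetical headers: timestamps are zero, lengths follow Frame 1 of the article.
full = RECORD_HEADER.pack(0, 0, 680, 680)     # whole packet saved
limited = RECORD_HEADER.pack(0, 0, 96, 680)   # only the first 96 bytes saved

print(describe_record(full))
print(describe_record(limited))
```

When the captured length is smaller than the wire length, the packet was cut short by the capture limit.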

Now check the second row, which is Ethernet II, Src : ….. , Dst : …… These Src and Dst are the MAC addresses of our computer and our gateway. In my capture file I am using dial-up, but if you capture in an Ethernet environment, it will reveal the MAC addresses of your LAN adapter and your gateway.

The third row is Internet Protocol, Src: 10.199.7.15, Dst: 216.239.61.104. In this capture, my dial-up IP address is 10.199.7.15 (a private, non-public IP address), and the google.com web server that serves me is 216.239.61.104 (it could be different if you try this yourself, because a web site may be served by many load-balanced servers). Inside this IP information we can see that the packet is IP version 4 with a header length of 20 bytes; the Differentiated Services Field, which is used for Quality of Service; the Flags, which say whether the packet may be fragmented and whether more fragments follow; and the header checksum, a simple checksum over the IP header. We are not going to go into detail on DSCP and Flags in this article.
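To make the "version 4, header length 20 bytes" and "private vs. public" remarks concrete, here is a small Python sketch. It assumes the first byte of the IP header is 0x45, which is what a version 4 header of 20 bytes looks like; the function names are my own.

```python
import ipaddress

def version_and_header_length(first_byte):
    """Split the first byte of an IPv4 header into (version, header length in bytes)."""
    version = first_byte >> 4        # high 4 bits
    ihl_words = first_byte & 0x0F    # low 4 bits, counted in 32-bit words
    return version, ihl_words * 4

def is_private(addr):
    """True if the address belongs to a private (non-public) range."""
    return ipaddress.ip_address(addr).is_private

# 0x45 is assumed here: version 4, header of 5 words with no IP options.
print(version_and_header_length(0x45))

for addr in ("10.199.7.15", "216.239.61.104"):
    print(addr, "private" if is_private(addr) else "public")
```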

The fourth row is Transmission Control Protocol, with Source Port: 1048 and Destination Port: 80. Why 1048? This number is chosen automatically by our OS TCP/IP stack, and it is important for keeping track of this specific stream. Meanwhile, as you might already know, destination port 80 is the HTTP port. The sequence number is 1, which is a relative number: Wireshark displays sequence numbers relative to the start of the stream so they are easier to read. The real sequence number is 'aa 84 ff ef'; in my example we can see it in the raw bytes by clicking on the 'sequence number' row. The next sequence number is 627, because this packet carries 626 bytes of TCP data (1 + 626 = 627). It means that our computer expects to see 627 as the acknowledgement number in the reply from google.com.
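The arithmetic behind 626 and 627 can be checked with a few lines of Python. This is only a sketch: the 14-byte Ethernet II header and the 20-byte IP header come from the capture, while the 20-byte TCP header (no TCP options) is my assumption.

```python
ETHERNET_HEADER = 14   # bytes, Ethernet II
IP_HEADER = 20         # bytes, as shown in the IP row of the capture
TCP_HEADER = 20        # bytes, assumed: a TCP header without options

def tcp_payload_length(frame_length):
    """Bytes of TCP data left after removing the three headers."""
    return frame_length - ETHERNET_HEADER - IP_HEADER - TCP_HEADER

def next_sequence(seq, payload_length):
    """Sequence numbers are 32-bit values, so they wrap around at 2**32."""
    return (seq + payload_length) % 2**32

payload = tcp_payload_length(680)                    # frame 1 is 680 bytes
relative_next = next_sequence(1, payload)            # relative numbering
absolute_next = next_sequence(0xAA84FFEF, payload)   # real number from the raw bytes

print(payload, relative_next, hex(absolute_next))
```

The relative and the real numbers move by the same amount; Wireshark only shifts the starting point.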

This is exactly what happens. In the second packet, which is the direct reply from Google, the acknowledgement number is 627. Notice as well that all source and destination parameters have been reversed: the source IP is now 216.239.61.104 and the destination IP is 10.199.7.15, and the source port is now 80 and the destination port is 1048. The combination of source/destination IP and source/destination port is very important, because in a crowded environment these are the parameters we filter on when we want to view only one specific packet stream.
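Because the request and the reply carry the same four values with source and destination swapped, a stream can be identified by a key that ignores direction. The Python sketch below shows one way to do that, and also builds a display filter string in the "ip.addr eq ... and tcp.port eq ..." form. The function names are hypothetical, not part of Wireshark.

```python
def flow_key(src_ip, src_port, dst_ip, dst_port):
    """Direction-independent key: both directions of a connection give the same value."""
    return tuple(sorted([(src_ip, src_port), (dst_ip, dst_port)]))

def stream_filter(src_ip, src_port, dst_ip, dst_port):
    """Build a display filter string that matches both directions of the stream."""
    return (f"(ip.addr eq {src_ip} and ip.addr eq {dst_ip}) and "
            f"(tcp.port eq {src_port} and tcp.port eq {dst_port})")

request = ("10.199.7.15", 1048, "216.239.61.104", 80)   # packet 1
reply = ("216.239.61.104", 80, "10.199.7.15", 1048)     # packet 2

print(flow_key(*request) == flow_key(*reply))
print(stream_filter(*request))
```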

The first four rows carry the information from layer 2 to layer 4 of the OSI model (Data Link Layer = MAC address, Network Layer = IP address, Transport Layer = TCP source/destination port). Next, we will examine the following rows, which belong to the application layer: the HTTP packet itself.

Let's go back to the first packet to see what our browser actually sends to the web server when we simply type www.google.com.

On the screen we can see that the first line is "GET / HTTP/1.1", followed by "Host: www.google.com", followed by "User-Agent", which identifies our browser, and then a few more parameters. The detailed meaning of these parameters can be found in the HTTP specification, which of course can be searched via Google itself :). Also take a look at the Cookie part, where you can find "PREF=ID=….". This is the cookie that our browser sends along so that the web server can recognize us.
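What the browser sends is plain text, so it is easy to take apart. The following Python sketch splits a shortened copy of the request into its request line and headers. The parser is a simplified illustration of my own (it does not handle repeated or folded headers), not how Wireshark dissects HTTP.

```python
# Shortened copy of the request from packet 1 (only a few headers kept).
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: www.google.com\r\n"
    "Accept-Language: en-us,en;q=0.5\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)

def parse_request(raw):
    """Split an HTTP/1.x request into (method, path, version, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers

method, path, version, headers = parse_request(RAW_REQUEST)
print(method, path, version)
print(headers["host"])
```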

The easiest way to see this HTTP exchange is to follow the TCP stream. On packet number 1, right-click and choose Follow TCP Stream. The result is as follows:

GET / HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: PREF=ID=36f9f4513ed9f844:TM=1213713072:LM=1213713072:S=qmtKBbuJgUlqUL_L; NID=14=IUsnRSc2pC91Ni0cDivO7Rdpg0HP5ous5nqj-yj_aWKOceNGeQKq5Ll86SaDdKbmsdpGLEAv3NqzDqgHDj7FME9HjkCye_G0S3MmzHlb7CNqmROIvX1IBF47zxFxt3Xq

HTTP/1.1 302 Found
Location: http://www.google.co.id/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Date: Sat, 13 Sep 2008 23:48:11 GMT
Server: gws
Content-Length: 221

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>The document has moved<A HREF="http://www.google.co.id/">here</A>.
</BODY></HTML>

The second part above, starting with HTTP/1.1 302 Found (shown in blue in Wireshark), is the reply from the server. Even without knowing much about HTML, we can understand that the reply tells our browser that the document has moved and redirects it to www.google.co.id. When we use 'Follow TCP Stream', Wireshark automatically creates a filter, which can be seen in the filter field: (ip.addr eq 10.199.7.15 and ip.addr eq 216.239.61.104) and (tcp.port eq 1048 and tcp.port eq 80). Since we no longer need this filter, just press the Clear button to remove it and show all packets again.
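The browser finds the new address by reading the status code and the Location header of the reply. A minimal Python sketch of that step, using a shortened copy of the server's reply; the function name is hypothetical and the parsing is deliberately simple.

```python
# Shortened copy of the server's reply (body left out).
RAW_RESPONSE = (
    "HTTP/1.1 302 Found\r\n"
    "Location: http://www.google.co.id/\r\n"
    "Cache-Control: private\r\n"
    "Content-Length: 221\r\n"
    "\r\n"
)

def redirect_target(raw):
    """Return the Location URL if the response is a 3xx redirect, else None."""
    lines = raw.split("\r\n\r\n", 1)[0].split("\r\n")
    status_code = int(lines[0].split(" ", 2)[1])
    if not 300 <= status_code < 400:
        return None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            return value.strip()
    return None

print(redirect_target(RAW_RESPONSE))
```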

The journey continues with packet number 4, where our browser issues another HTTP GET. This time the destination host is www.google.co.id, and the TCP source port is 1053, with destination port 80 (HTTP). Please take note of this '1053' number, since we will track the

Packet number 5 is just our PC's acknowledgement of the previous packet, so we will ignore it. The same goes for packet number 7, which is the 'reassembled PDU' packet generated by Wireshark.

The continuation of my www.google.co.id request is found in packet number 6, which is the ACK from the server, followed by packet number 8, which is another ACK generated by my PC. The real HTML data starts at packet number 9, which is sent by the server itself.

When my browser receives this packet, it analyzes the content and finds out that it needs to download some more files to complete the page. We can see this in the next packet, where my browser issues another request: GET /intl/id_id/images/logo.gif HTTP/1.1. This request uses TCP source port 1050 and, of course, TCP destination port 80 (HTTP).

So right now I have two open source ports: 1053 for the HTML file of www.google.co.id, and 1050 for logo.gif. Now, let's assume I want to save that logo.gif. There is a very easy way to do this; Wireshark is indeed very cool. Go to File, Export, Objects, HTTP. From there, I just click logo.gif and save it. That's it... so simple!
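Conceptually, saving an object means taking the reassembled HTTP reply and keeping only the part after the headers. The Python sketch below illustrates that idea on a made-up reply. It is not Wireshark's implementation, and it ignores chunked transfer and gzip compression.

```python
def extract_body(raw_response):
    """Return the body bytes of a reassembled HTTP response, honouring Content-Length."""
    head, _, body = raw_response.partition(b"\r\n\r\n")
    length = None
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    return body if length is None else body[:length]

# Hypothetical reply: a real logo.gif would be much larger than these 6 bytes.
fake_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: image/gif\r\n"
    b"Content-Length: 6\r\n"
    b"\r\n"
    b"GIF89a"
)

image_bytes = extract_body(fake_response)
print(image_bytes)
```

Writing image_bytes to a file opened in binary mode would give the saved object.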

Hopefully this small article gives you a glimpse of how to use Wireshark to read packets. If you don't want to capture the traffic yourself, just download my capture from the following link.

http://rapidshare.de/files/40675044/google-only.cap.html

Just let me know if the link is already dead.

5 Comments »

  1. Nice write-up, thanks, friend.

  2. [...] Here is the original post:  Learning to Analyze Web Traffic using Wireshark « Belajar Jaringan … [...]

  3. Thanks for the information, it is very useful. I'll be waiting for your return visit.

  4. Good article, thank you for sharing.

  5. The file link is dead, please help. My capture results are different from yours, and literature on how to analyze traffic using Wireshark is still scarce, so I am still feeling my way around.

