Hi. I am trying to extract all packet transitions for a particular website visit (for example: google). I am considering websites that pull content from different servers. Is it possible to filter out the whole set of packets belonging to one particular website visit when packets from multiple web visits exist in the capture? Is there any way to determine the end of a whole web page load? Please help me with this issue. asked 02 Apr '13, 03:20 Azfar
4 Answers:
One method of "binding" the individual HTTP requests to all requests needed for building a particular page is to use the HTTP "Referer:" header. Whenever you request a page, all objects that are referenced by the initial HTML page have a "Referer:" header pointing back to this page. One drawback of this method is that when you click on a link, the new "initial" HTML page will have a "Referer:" pointing to the page in which the link was clicked. But you will be able to distinguish this page by the time gap between the other objects and this new HTML object. Also, you will see other objects with "Referer:" headers pointing back to this new URL. Example: You open http://www.example.com/index.html:
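The requests would look something like this (the object names style.css and logo.png are illustrative, not from an actual trace; the point is that each embedded object's request carries a Referer pointing back to index.html):

```
GET /index.html HTTP/1.1
Host: www.example.com

GET /style.css HTTP/1.1
Host: www.example.com
Referer: http://www.example.com/index.html

GET /logo.png HTTP/1.1
Host: www.example.com
Referer: http://www.example.com/index.html
```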
You click on the link to http://www.example.com/nl/contact.html:
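A sketch of what you would then see (the contact.css object is an illustrative assumption): the new page's request refers back to the page the link was on, while its own embedded objects refer back to the new page:

```
GET /nl/contact.html HTTP/1.1
Host: www.example.com
Referer: http://www.example.com/index.html

GET /nl/contact.css HTTP/1.1
Host: www.example.com
Referer: http://www.example.com/nl/contact.html
```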
So, if you want to filter on all HTTP requests involved in opening http://www.example.com/index.html, you can use the following filter:
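A display filter along these lines should work (a sketch; verify the exact syntax against your Wireshark version — http.request.uri and http.referer are the standard HTTP dissector fields):

```
http.request.uri == "/index.html" || http.referer == "http://www.example.com/index.html"
```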
(And take into account that you will get some extra HTTP requests when you click a link on the page.) You can also have a look at the HttpFox plugin for Firefox to follow each request and its headers. Firebug is another Firefox extension that might help; it will show you exactly which elements were used in loading a page, including the timings. answered 02 Apr '13, 13:22 SYN-bit ♦♦
If you need to do this with Perl, you had better look at one of the tools listed on the following page (especially Chaosreader), instead of using Wireshark/tshark:
Regards answered 03 Apr '13, 04:01 Kurt Knochner ♦ edited 03 Apr '13, 04:06
You might have to be a little clearer in your question; however, if you are asking "how can I see all the traffic associated with a particular web page", that is a little difficult to do directly with Wireshark. Obviously, if a particular page starts with, say, a GET for "index.html", then in order to determine the subsequent GET fetches you basically need to parse the result and act like a browser: run the HTML and JavaScript, look in your cache, and so forth. Your best bet is either to make sure nothing else is running on your client apart from your web browser, or to use tools like Firebug or the Chrome debugger to determine what the various requests are. You can then filter these with "http.request" filters in Wireshark. answered 02 Apr '13, 04:31 martyvis

Thanks for your answer, but this would not solve my problem, because I need to find a way to tell my Perl code "this packet is the most probable last HTTP response", then take the whole bunch of packets for further analysis and name the page accordingly. (03 Apr '13, 01:42) Azfar
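As a rough illustration of what "most probable last HTTP response" could mean in code: one common heuristic is to treat the first sufficiently long quiet gap after a burst of responses as the end of the page load. A minimal Python sketch of that heuristic (the 1-second idle threshold and the example timestamps are assumptions for illustration, not something from this thread):

```python
def last_response_index(timestamps, idle_gap=1.0):
    """Given sorted HTTP response timestamps (in seconds), return the
    index of the last response before the first idle gap longer than
    idle_gap. This approximates the end of a single page load."""
    for i in range(len(timestamps) - 1):
        if timestamps[i + 1] - timestamps[i] > idle_gap:
            return i
    return len(timestamps) - 1  # no gap found: last response ends the load

# Illustrative: a burst of responses, a pause, then a new page load starts.
ts = [0.00, 0.12, 0.30, 0.45, 2.80, 2.95]
print(last_response_index(ts))  # prints 3
```

The threshold needs tuning: too small and a slow server splits one page load in two; too large and two quick page visits merge into one.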
Well, that's a hard problem, unless you are "sitting" in the browser. Just by looking at the network traffic, it is hard to identify and map/match all HTTP requests that are the result of a single "page load" of the browser. The reason: HTTP is stateless. Each page load can trigger further requests, and none of those new requests contains any kind of information that they belong to the same "page load".

Imagine this: you load cnn.com and there is an image embedded that is hosted on apple.com (an iCar ad). There is no way to map the access to apple.com to the "page load" of cnn.com. The user could simply have accessed that image manually, possibly even in a second instance of the browser.

So, to answer your question: No, there is no reliable way to map/match all subsequent HTTP requests that are triggered by a page load, as there is no common criterion to identify those requests.

You could approximate it like this: you monitor every HTTP request. Then you parse the content of the HTML code (and JavaScript code!). After that, you know the URLs that are linked in that first HTML document. Every subsequent request from the same IP, with the same "User-Agent" (same browser), within a defined time frame (a few hundred milliseconds), is treated as a result of the first "page load". This would work, however only with some uncertainty, as the user could have manually loaded any of the subsequent URLs in a second browser instance. BTW: this kind of approximation is not possible with Wireshark itself, unless you add some code to do it.
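The approximation described above can be sketched in a few lines of Python (the field names, the example requests, and the 0.5 s window are illustrative assumptions; in practice the input would come from parsing a capture, and you would additionally check the Referer header):

```python
def group_page_loads(requests, window=0.5):
    """Group HTTP requests into probable "page loads".

    `requests` is a list of dicts with keys "ts" (seconds), "src_ip" and
    "user_agent" (illustrative field names). A request from the same
    client IP and User-Agent within `window` seconds of that client's
    previous request is treated as part of the same page load -- an
    approximation only, for the reasons given above.
    """
    loads = []      # each element is a list of requests (one page load)
    last_seen = {}  # (src_ip, user_agent) -> (last timestamp, load index)
    for req in sorted(requests, key=lambda r: r["ts"]):
        key = (req["src_ip"], req["user_agent"])
        if key in last_seen and req["ts"] - last_seen[key][0] <= window:
            idx = last_seen[key][1]  # continue the current page load
            loads[idx].append(req)
        else:
            loads.append([req])      # quiet gap (or new client): new load
            idx = len(loads) - 1
        last_seen[key] = (req["ts"], idx)
    return loads

reqs = [
    {"ts": 0.00, "src_ip": "10.0.0.5", "user_agent": "Firefox"},
    {"ts": 0.20, "src_ip": "10.0.0.5", "user_agent": "Firefox"},
    {"ts": 5.00, "src_ip": "10.0.0.5", "user_agent": "Firefox"},
]
print([len(g) for g in group_page_loads(reqs)])  # prints [2, 1]
```

The same uncertainty applies as described above: a second browser instance with the same User-Agent on the same client is indistinguishable from the original page load.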
There is no reliable method, for the reasons I explained above. Regards answered 02 Apr '13, 09:08 Kurt Knochner ♦ edited 03 Apr '13, 03:29

Thanks Kurt. (03 Apr '13, 01:44) Azfar

BTW: The Referer header is just one further criterion for the approximation approach. The problems I mentioned are the same with or without the Referer header, as you will never know for sure whether a request was triggered by a certain "page load" or by an independent action of the user (a second browser instance, or loading similar parts of the web site, which triggers the loading of the same CSS/images/JavaScript/etc.). If you can live with the uncertainty of that approximation, it's the best option you have, unless you are "sitting" in the browser (as a plugin). (03 Apr '13, 03:24) Kurt Knochner ♦

@Kurt, yes, even when using the Referer header to link requests, it is still an approximation. I might have made that clearer in my answer :-) (03 Apr '13, 03:47) SYN-bit ♦♦

never mind ;-) (08 Apr '13, 07:57) Kurt Knochner ♦
Thanks, sounds good. But could you tell me how I can create the Referer?
The "Referer:" header is added by your browser; you don't have to create it yourself. Have a look at the headers of the HTTP requests in your trace file and you will see the "Referer:" header :-)