Hi all, I have gone through packet-tcp.c, but I am not sure which section deals with extracting the URL, if one exists, from TCP packets. I have the whole packet payload, but I need this specific function. asked 12 Aug '13, 06:09 newbie14
3 Answers:
There are no URLs in TCP itself. URLs are a concept of HTTP. You need to look in packet-http.c.
UPDATE:
O.K. in that case, Wireshark is kind of 'overkill' for you. Please have a look at the following tools (and their code): ngrep, xplico, tcpextract, etc.
Regards
answered 12 Aug '13, 06:10 Kurt Knochner ♦ edited 13 Aug '13, 04:38
Hi newbie, open the Wireshark app on your laptop and make sure your laptop/PC is connected to the internet. Then turn on packet capture on the interface card from Wireshark. Open a browser, type a URL and browse. Stop the packet capture. Open the pcap file and type "http" in the display filter; you should then see the HTTP packets. answered 12 Aug '13, 21:08 pundalik
The problem is that I am not going to use Wireshark for the capture; I am using another tool called pf_ring. I have captured most of the data already; what I still need is the URL from the payload. (12 Aug '13, 21:10) newbie14
So you have your own application running on top of a modified libpcap using pfring? If so, you will have to reinvent part of Wireshark/tcpdump to parse what you need from all the protocol layers up to and including HTTP. This is not really a Wireshark question. (12 Aug '13, 21:49) Anders ♦
Yes, I am using pfring to capture the packets. I can determine the ports, no issue with that; I just need the URL parser now, which I think is available in Wireshark, so is there any point in me reinventing the wheel? (12 Aug '13, 22:57) newbie14
There is no "URL parser" in Wireshark. There is an HTTP parser in Wireshark, which is in (12 Aug '13, 23:47) Guy Harris ♦♦
@Guy I appreciate your explanation. So what I should do is reinvent the wheel, because I need to store the data in a database, which is not feasible via Wireshark, right? Yes, I have looked at tvbuff, which I still don't understand as I am new to it. (13 Aug '13, 01:20) newbie14
As @Guy has said, a lot of work is done by different parts of Wireshark before the URL is extracted. If you're just interested in the URLs, and you assume that each HTTP request starts a new TCP packet (which is usually true, but the nature of TCP does not make this a necessity), and you assume that the requested URL fits in one TCP segment (which is not true for networks with small MTUs and large request URLs), then you can skip all reassembly and just parse each TCP packet on its own.
When parsing the payload, look for a pattern like "<method> <url> HTTP/<version>" at the start of each TCP payload, where <method> can be "GET", "POST", "HEAD", etc. Look for the methods in which you're interested. The <url> will normally start with "/" (requests sent to a proxy carry an absolute URL instead) and will not contain spaces. Finally, <version> will be "1.0" or "1.1" currently. In short, anything between "GET " and " HTTP/1.", "POST " and " HTTP/1." or "HEAD " and " HTTP/1." (watch the spaces) will be your URL and should be quite easy to extract.
Downsides of this method:
So depending on how fool-proof your tool must be, this could be a simple solution to your problem... If you need 100% exact results, there is nothing you can do but follow all the steps that Wireshark takes (TCP reassembly and fully parsing the reassembled TCP stream to determine exactly where a new request starts). answered 13 Aug '13, 04:08 SYN-bit ♦♦
@SYN-bit so, if I understand your answer correctly (please correct me if not), what I need to do now, if it is a TCP packet, is transform the hex payload into human-readable form and then look for anything between "GET " and " HTTP/1.", "POST " and " HTTP/1." or "HEAD " and " HTTP/1.". Is that correct? (13 Aug '13, 10:54) newbie14
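The pattern match described in this answer could be sketched in C roughly as follows. This is only a minimal sketch under the same assumptions as the answer (the request starts a TCP payload and the whole request line fits in one segment, so no reassembly). The names `find_bytes` and `extract_request_url` are made up for illustration and are not Wireshark or PF_RING functions. Note that no "hex to text" conversion is needed: the payload is already a byte buffer, and hex is only how a tool prints it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the string `needle` inside a byte buffer that is
 * not NUL-terminated. Returns a pointer to the match, or NULL. */
static const unsigned char *find_bytes(const unsigned char *hay, size_t hay_len,
                                       const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || hay_len < n)
        return NULL;
    for (size_t i = 0; i + n <= hay_len; i++) {
        if (memcmp(hay + i, needle, n) == 0)
            return hay + i;
    }
    return NULL;
}

/* Hypothetical function: if the TCP payload starts with "GET ", "POST " or
 * "HEAD ", copy the text between the method and " HTTP/1." into `out` as a
 * NUL-terminated string. Returns the URL length, or -1 if the payload does
 * not look like a request line or the URL does not fit into `out`. */
int extract_request_url(const unsigned char *payload, size_t len,
                        char *out, size_t out_size)
{
    static const char *methods[] = { "GET ", "POST ", "HEAD " };

    for (size_t m = 0; m < sizeof methods / sizeof methods[0]; m++) {
        size_t mlen = strlen(methods[m]);
        if (len <= mlen || memcmp(payload, methods[m], mlen) != 0)
            continue;

        const unsigned char *url = payload + mlen;
        size_t rest = len - mlen;

        /* Restrict the search to the first line of the payload. */
        const unsigned char *eol = memchr(url, '\n', rest);
        if (eol != NULL)
            rest = (size_t)(eol - url);

        const unsigned char *end = find_bytes(url, rest, " HTTP/1.");
        if (end == NULL)
            return -1;

        size_t url_len = (size_t)(end - url);
        if (url_len == 0 || url_len >= out_size)
            return -1;

        memcpy(out, url, url_len);
        out[url_len] = '\0';
        return (int)url_len;
    }
    return -1;
}
```

You would call `extract_request_url()` with the pointer and length of the TCP payload (after the IP and TCP headers) and a `char` buffer of your own. A return value of -1 just means "no request line found in this segment", which will also happen for requests split across segments.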
I am lost here. Say I have a TCP packet: how do I decide whether it will have a URL or not? I will go through packet-http.c, but where do I start? Where does each of these packet-*** files start from? Sorry, I am very new.
Take a look at the function basic_request_dissector() and what is called therein.
BTW: Why do you need 'that specific function'? Maybe there is a better solution to your problem.
Within your own Wireshark dissector, or in general?
I have captured packets using another tool, and I want to read the URL in those packets, for which I have the whole payload. So what is your best suggestion? I know there are solutions available, so there is no point in reinventing the wheel. I have looked at basic_request_dissector(), but I am not too sure about the parameters passed into it.
Do you have those packets in a pcap file or are you interested in a fly-by analysis?
If it is a pcap file, you can still run tshark and print the payload of the packets, then use some Perl/Python scripts to search for URLs in the output with regular expressions.
I am capturing the packets via pf_ring, so it is all in hex format. In my case I am interested in fly-by analysis and in storing the results in a database later. The issue is that I can get all the IP layer details, but the tool does not go any further than that, so I need to dissect the further layers myself based on the layer types. Any idea? I would prefer everything to be in C, as the capture engine is all in C too.
You first decide whether it's traffic for a protocol such as HTTP that has URLs; if not, it doesn't have a URL. Wireshark decides whether traffic is HTTP based on the TCP port it's going to or from; ports such as 80 and 8080 are assumed to be HTTP.
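As a rough illustration of that first step, a port check in C might look like the sketch below. The function name `looks_like_http_port` is hypothetical, the port list only contains the two ports mentioned above (your traffic may use others), and the ports are assumed to be in host byte order already (i.e. after `ntohs()`).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: only the two ports named in the comment above; extend as needed. */
static const uint16_t http_ports[] = { 80, 8080 };

/* Hypothetical helper: true if either end of the TCP connection uses a port
 * that we treat as HTTP. Ports must be in host byte order. */
bool looks_like_http_port(uint16_t src_port, uint16_t dst_port)
{
    for (size_t i = 0; i < sizeof http_ports / sizeof http_ports[0]; i++) {
        if (src_port == http_ports[i] || dst_port == http_ports[i])
            return true;
    }
    return false;
}
```

A port match is only a heuristic: it says the segment is probably HTTP, so the payload still has to be checked before a URL is taken from it.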
Then you have to parse the HTTP data to see whether it contains an HTTP request or response and, if it does, extract the request URL from requests and other URLs from responses (e.g., a 301 Moved Permanently response has the URL to which the item has moved).
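For the response side, here is a minimal sketch of pulling the redirect target out of a `Location` header. It assumes the status line and the headers all arrive in one TCP payload (no reassembly), and the names `has_prefix_nocase` and `extract_location` are invented for this example; this is not how packet-http.c does it.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive prefix test on a byte buffer. */
static int has_prefix_nocase(const unsigned char *s, size_t len, const char *prefix)
{
    size_t n = strlen(prefix);
    if (len < n)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (tolower(s[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical function: if the payload starts with an HTTP/1.x status line,
 * walk the header lines and copy the value of the Location header into `out`.
 * Returns the value length, or -1 if there is no such header in this payload
 * or the value does not fit into `out`. */
int extract_location(const unsigned char *p, size_t len, char *out, size_t out_size)
{
    if (len < 7 || memcmp(p, "HTTP/1.", 7) != 0)
        return -1;                          /* not a response */

    size_t pos = 0;
    while (pos < len) {
        const unsigned char *line = p + pos;
        const unsigned char *nl = memchr(line, '\n', len - pos);
        size_t line_len = nl ? (size_t)(nl - line) : len - pos;
        size_t next = pos + line_len + (nl ? 1 : 0);

        if (line_len > 0 && line[line_len - 1] == '\r')
            line_len--;                     /* strip the CR of CRLF */
        if (line_len == 0)
            break;                          /* blank line: end of headers */

        if (has_prefix_nocase(line, line_len, "location:")) {
            const unsigned char *v = line + 9;   /* 9 == strlen("location:") */
            size_t vlen = line_len - 9;
            while (vlen > 0 && (*v == ' ' || *v == '\t')) {
                v++;
                vlen--;
            }
            if (vlen == 0 || vlen >= out_size)
                return -1;
            memcpy(out, v, vlen);
            out[vlen] = '\0';
            return (int)vlen;
        }
        pos = next;
    }
    return -1;
}
```

The scan stops at the first blank line so that text in the response body is never mistaken for a header.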
@GuyHarris OK, I have captured the source and destination ports. So say either one of them is 80, then I move to the next level; with this port I know it will be HTTP traffic, right? What I need help with now is parsing the payload to capture the URL. Thank you for your insights.
See the UPDATE in my answer.
@Kurt thank you for the links; let me go through them. I think the second link looks good.