Hi people! I am finally using Wireshark with USB! Cheers! Here is my problem: I am developing a Linux USB V4L2 driver for some hardware. My driver sends some bulk messages to the hardware when the driver is probed. I see those bulk messages in Wireshark while capturing. The hardware is supposed to act in a specific way when it gets this sequence of bulk messages, but sometimes it does not, even though every time I confirm with Wireshark that the driver actually sent the messages. I am trying to understand where the problem could be happening, so bear with me. Here I list my potential problems: 1 - The hardware has a bug? I find it very hard to believe that 2 & 3 may be occurring. Can someone agree or disagree with me on my potential problems and perhaps shed light on something that I am not seeing? asked 14 Aug '16, 04:38 dawood edited 14 Aug '16, 04:46 sindy
Can you publish the capture of the "working" and "non-working" case and provide a link to them?
To answer your question - there is the kernel and the host controller (root hub) between the usbmon and the USB bus; depending on the design of the board you are using, a single chip may contain several root hubs.
Normally, you should see a "S" (submit) type URB, marked as "from host to full endpoint address (x.y.z)", carrying the bulk data you give to the root hub to deliver it to the hardware, and a "C" (complete) type URB, marked as "from endpoint address to host", as soon as the host controller has finished sending them. Due to the relatively slow timing on the USB bus, you'll often see a C immediately followed by a S for the same URB ID, but these two do not belong to the same transfer.
In any case:
Hi Sindy, thanks for your prompt reply. Here is a link to a single dump that has both "working" and "non-working".
https://drive.google.com/open?id=0B9Rd-5KDpqFJdmI5TW5haFdCazQ
First you need to filter for device 4 using this display filter: usb.device_address == 4
What I did was probe the driver (modprobe) and then remove it (rmmod), several times. Upon probing, the driver does bulk writes requesting the device to perform some operations, and then the driver does bulk reads to read some info from the device. An example of a "working" transaction is in frames 1427-1430 (Write, Success, Read, Success). An example of a "non-working" transaction is in frames 1779-1784 (Write, Success, Read, -ENOENT).
Where do I see the "S"s and the "C"s? Are you referring to Wireshark or to some file in the kernel that I can look at?
Here is one more discovery: If I reset the usb connection by writing 0 then 1 to /sys/bus/usb/devices/bus-device/authorized file, everything goes back to normal until I unload and load the driver several times again.
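The reset described above (writing 0 and then 1 to the device's authorized file) can be scripted so it is easy to repeat between modprobe/rmmod cycles. This is only a minimal sketch: the function name and the settle delay are assumptions made for illustration, and the path has to be filled in with the actual bus-device directory of the hardware.

```python
import time
from pathlib import Path


def reauthorize(authorized_path, settle_seconds=0.5):
    """Write 0 and then 1 to a USB device's sysfs 'authorized' attribute.

    authorized_path is the full path to the 'authorized' file of the device.
    settle_seconds is an assumed pause between the two writes; it is not
    taken from any documentation.
    """
    path = Path(authorized_path)
    path.write_text("0")        # deauthorize: the kernel unconfigures the device
    time.sleep(settle_seconds)
    path.write_text("1")        # authorize again: the device is reconfigured
```

It would be called as reauthorize("/sys/bus/usb/devices/<bus-device>/authorized") with the placeholder replaced by the real directory name, and it needs root privileges to write to sysfs.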
I am referring to Wireshark. In your file, e.g. in frame 1426, in the dissection pane, after clicking the USB URB tree open, you can see two lines, marked here with !!!!!! because it is impossible to use bold in code:
USB URB
[Source: 3.4.2]
[Destination: host]
URB id: 0xffff8800a2c74000
URB type: URB_COMPLETE ('C') !!!!!!
URB transfer type: URB_BULK (0x03)
Endpoint: 0x02, Direction: OUT
Device: 4
URB bus id: 3
Device setup request: not relevant ('-')
Data: not present ('>')
URB sec: 1471262056
URB usec: 943464
URB status: Success (0) !!!!!!
URB length [bytes]: 4
Data length [bytes]: 0
[Request in: 1425]
[Time from request: 0.000031000 seconds]
[bInterfaceClass: Unknown (0xffff)]
Unused Setup Header
Interval: 0
Start frame: 0
Copy of Transfer Flags: 0x00000000
Number of ISO descriptors: 0
Now if you apply the display filter
!(usb.urb_status == 0) and usb.device_address == 4 and usb.dst == host
you can see four URBs reporting a non-zero status, -ENOENT, where you've asked the hardware to send you something and that something wasn't available. I assume it is correlated with the failure case, is it?
As you describe it, I would guess that the hardware gets confused by receiving the same bulk data again without a reset, but that is no more than an educated guess. Have you checked whether the bulk transfer payload extracted from the capture is identical between the working and non-working cases?
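The same check can be run outside Wireshark on the text output of usbmon. The sketch below assumes the usbmon text format described in the kernel documentation (URB tag, timestamp, event type, address word, status word, ...) and lists the completion ("C") events for one device whose status is non-zero. The function name and the sample lines are made up for illustration and are not taken from the capture.

```python
def failed_completions(lines, device):
    """Return (urb_tag, address_word, status) for every 'C' event of the
    given device number whose status is non-zero."""
    failures = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or fields[2] != "C":
            continue                      # keep only completion events
        tag, address, status_word = fields[0], fields[3], fields[4]
        parts = address.split(":")        # e.g. "Bi:3:004:1" -> type, bus, device, endpoint
        try:
            if len(parts) != 4 or int(parts[2]) != device:
                continue
            # Interrupt/isochronous status words carry extra ':' separated fields.
            status = int(status_word.split(":")[0])
        except ValueError:
            continue                      # not a numeric field, skip the line
        if status != 0:
            failures.append((tag, address, status))
    return failures


# Hypothetical sample: one successful bulk OUT, one bulk IN ending in -ENOENT (-2).
SAMPLE = [
    "ffff8800a2c74000 100 S Bo:3:004:2 -115 4 = 01020304",
    "ffff8800a2c74000 131 C Bo:3:004:2 0 4 >",
    "ffff8800a2c74000 200 S Bi:3:004:1 -115 64 <",
    "ffff8800a2c74000 900 C Bi:3:004:1 -2 0",
]

for failure in failed_completions(SAMPLE, device=4):
    print(failure)
```

Only the completion events are inspected because a submission always carries an "in progress" status, so filtering on non-zero status without the event type would match every "S" line as well.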
Thanks for the explanation. Yes, all 4 cases are correlated.
I have looked at the bulk transfer payload in both cases. In the non-working case, the data sometimes appears, sometimes does not, and sometimes comes out of order. I cannot see any deterministic behaviour.
Regarding my assumption number 2, is it possible that the hardware is not getting the bulk write messages although they are listed in Wireshark's capture? Is there a way I can confirm that?
The device has a Windows driver, and we have never seen such an issue with it, so I am inclined to think that it may not be a hardware bug.
I'm afraid I may not have been clear enough: I had in mind comparing, between the good and bad cases, the data which your driver sends to the hardware.
If that is how you understood me, and it is the sent data in the captured URBs that sometimes appears, sometimes does not, or comes in swapped order, then it is an issue of the driver.
usbmon shows you the URBs in the order the device driver has sent them to the host controller. If they are garbled already there, the device driver is to blame.
If it is the responses of the device that come garbled although the sent data were exactly the same as in the working case, you would need a hardware monitor of the USB bus (which is a box worth hundreds of $) to be able to see what has happened to the sent data between the host controller and the wire.
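Comparing the data the driver sends in the good and bad cases can be done mechanically once the bulk OUT payloads of each run have been exported from the capture, in order, as hex strings. A minimal sketch follows; the function name and the sample payloads are hypothetical and do not come from the capture.

```python
def first_difference(good, bad):
    """Return the index of the first payload that differs between two runs,
    or None if both runs sent the same payloads in the same order."""
    for index, (a, b) in enumerate(zip(good, bad)):
        if a != b:
            return index
    if len(good) != len(bad):
        # One run is a prefix of the other: report the first missing payload.
        return min(len(good), len(bad))
    return None


# Hypothetical payloads; real ones would be exported from the two runs.
good_run = [bytes.fromhex("01020304"), bytes.fromhex("a0b0")]
same_run = [bytes.fromhex("01020304"), bytes.fromhex("a0b0")]
swapped_run = [bytes.fromhex("a0b0"), bytes.fromhex("01020304")]

print(first_difference(good_run, same_run))     # None
print(first_difference(good_run, swapped_run))  # 0
```

A result of None for the sent data points away from the driver's payloads, which leaves the responses of the device as the thing to examine.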
I see what you mean. Yes, for comparison purposes I tried to reduce the bulk payload sent from the driver as much as possible while still triggering this behavior. We compared the data sent by the driver in the good and the bad cases and found nothing suspicious; they are identical.
We also confirmed that the bulk messages sent by the driver are in order, but the USB payload that we read back from the hardware (or what appears to be coming back from the hardware) either comes out of order or does not come at all.
Ok, I see, maybe we can try to get our hands on a hardware monitor of the USB bus. Thanks a lot Sindy!
Maybe you could send the GET DESCRIPTOR each time rather than just once? In the first case in the capture, you send a GET DESCRIPTOR request, get the response, exchange a couple of bulk out and bulk in transfers, and then use SET INTERFACE to select alternate setting 0, which is the only one available. In the later cases, however, you start directly from a bulk out, so maybe the internal context of the device is different and the device treats the same bulk out data in a different way. So perhaps a GET DESCRIPTOR is necessary to put the device's state machine into the correct state?
Yes, that is a great idea. I was just looking at that. I tried to reset the connection by writing 0 and then 1 to the authorized file, and in Wireshark I see a GET DESCRIPTOR followed by a SET CONFIGURATION. So maybe that is exactly what I am going to do!
Wow man! It worked! Thanks a lot!
So if I get all that right: in the normal case the issue wouldn't have happened, but because you were rmmod'ing and insmod'ing the driver while debugging it, you confused the hardware by sending bulk outs without resetting the FSM first.
Yes, under normal circumstances it does not happen. But should the hardware get confused? Did I mess up the standard USB initialization process, and is that why the FSM was out of sync when I rmmod and insmod? Or is this a potential bug?
To me the concept used by the hardware vendor seems messy enough.
The clean way would be to have two alternate settings. Alternate setting 0, which is the default one at device startup, would be used for the exchange of bulk transfers in the initialisation phase, and alternate setting 1 would be used for the operation phase. Here, the philosophy seems to be that the init phase spans from GET DESCRIPTOR to SET INTERFACE, and the operation phase runs from SET INTERFACE onwards.
Are you reverse engineering the protocol from a Windows version? Have you captured the communication on Windows?
I see. No, I am not reverse engineering. We developed the chip ourselves, but we bought a third-party USB IP, and we developed the Windows driver in-house. I could capture the communication on Windows in the near future, though. Could this shed light on something?
If the Windows driver is an in-house development, capturing its behaviour could shed light on why you've never experienced the issue on Windows but did on Linux :-) I.e. maybe you'll find out that the driver on Windows needs revisiting as well, if a reinitialisation similar to rmmod followed by insmod is theoretically possible there.
But initially I had in mind that if you were not reverse engineering, the programmer's guide to the device (or to its software stack, I don't know what "IP" means in your "USB IP") should have told you in advance what we had to painfully discover (actually, reverse engineer).
I see, yes I agree. I should be able to obtain the dump in the near future. My explanation was not clear; USB IP = USB intellectual property core (digital circuit that takes care of USB stuff). Yes, the programmer guide should have this information listed in it. I should double check this with them. Thanks a lot though, you have helped a lot!