I don't expect anyone to be able to help me with this, but I didn't expect any help last time I posted a message yet was pleasantly surpised to get a very spot-on answer. This one is a bit tougher though. I've been working on an iSCSI server, mostly just for fun. Been chipping away at it in my spare time. I've managed to get through a lot of things. The Device is seen by Windows Server 2012. I can connect to it. I can initialize a disk. I can create partitions on the disk as far as I can tell (windows's own cache could be lying to itself about some of those things). But the actual formatting of the volume is where I run into trouble. I've implemented all forms of ModeSense, ReadCapacity, Read, Write, Data-In, and Data-out. Also implemented a task-management class which seems to be functioning as intended.... but by golly, I just have no idea why this is still failing. Some clues and relevant information: 2) Windows Server 2012 Successfully completes the "Gather information" step and the "Create new partition" step, but during the "Format Volume" Step reports "Failed to format volume - One or more parameter values passed to the method were invalid." 3) The failure occurs immediately AFTER Packet Number 6384 which I verified with breakpoints. All of that internal stuff kinda seems to be working, but the format still fails. I've been looking at this problem for quite some time and I'm out of ideas and will need to shelve it for a while. Unfortunately, I think that any one of these 6000+ packets could potentially be causing the problem. I wish windows would just tell me WHICH parameter is incorrect. asked 22 Jul '13, 15:36 ev1lr0b0t edited 22 Jul '13, 15:44 |
6 Answers:
The wireshark capture might not show it. But after some prodding and poking and creating volumes of varying sizes, I determined that my WRITE operations were not writing the proper amount of data. I discovered this by comparing the subsequent READ operation that followed the WRITE operation. The fact is that the write operation would actually only write the first block to disk. answered 24 Jul '13, 11:29 ev1lr0b0t would you mind to post one capture file with a failure and one with a successful operation? I wonder if the problem would be visible in the capture file. This would be a good chance to learn something more about iSCSI (for me) ;-) (25 Jul '13, 11:28) Kurt Knochner ♦ I got sorta past that point, but unfortunately I don't yet have 100% success. My current issue is as big of a pain in the butt as my previous issue. Part of my problem is that I cannot find any sample captures that show Read() operations that exceed the MaxRecvDataLength value negotiated during login. It kinda seems like that for some reason my login negotiation is being ignored by the client, even though it matches the sample I used practically byte for byte. When talking to the Target that I reverse engineered, the client never requests more than 128 blocks of data per read (65536 bytes) which also matches the MacRecvDataLength negotiated during login. However, when talking to MY target, it will request up to 256 blocks, and my response to this request causes the client to reset the connection with no warnings or messages logged. I respond with Data-In followed by Data-In with the Final bit set. (2 PDUs). Maybe I need to try 3 PDUs sending a separate PDU for the status.... I dunno... running out of ideas. (25 Jul '13, 11:34) ev1lr0b0t |
In frame 6385 the 'client' (172.16.0.82) sends a SCSI Mode Sense. This packet is ACKed in the next frame. However, then the 'server' (172.16.0.33) does not react for 119 seconds!! Then it ACKs a packet that is not in the capture file (see frame 6387). This is totally different to previous behavior in the capture file. If we can assume that the capture file is complete (can we?), there seems to be a bug in the iSCSI server implementation. However, I don't think you will find the reason for the problem just by looking at the capture file, as neither the inactivity for 119 seconds can be explained with the observed behavior in the capture file nor the ACK for an unseen packet. Regards answered 22 Jul '13, 15:57 Kurt Knochner ♦ edited 22 Jul '13, 16:04 The 119 second pause is the breakpoint. I didn't notice the ACK issue, I'm not sure I ever handle them... I think they're handled at the TCP layer if I'm not mistaken, and therefore I have no control over them. Whether I break on it for 119 seconds or not.. it fails in the same place in the same way. So I'm not sure that'll help me. Maybe I can find some insight with your help, however. Thanks for looking. -- Jason (22 Jul '13, 16:30) ev1lr0b0t
Ah, that's O.K. then.
Yes, and the TCP layer should not ACK something it did not receive, if that was the case !?! Anyway, I'm not sure if this could be caused by a bug in your code as well.
Can you please post a capture file without a breakpoint and 'add' some information about the frame where you think the problem shows up? (22 Jul '13, 16:38) Kurt Knochner ♦ I will, but probably won't be able to get to it until tomorrow. Thanks again. (22 Jul '13, 17:35) ev1lr0b0t 119 seconds delay would cause the initiator's upper layer timer to expire and it should perform some type of SCSI layer error recover if a frame was not received. However, if there there was some type of protocol error in the frame that was not captured then it would be normal of the initiator to close the TCP connection provided that the 2 devices were operating at error recovery level 0. NOT the iSCSI standard requires devices that only support error level 0 to close the connection when a protocol error is encountered (14 Nov '16, 03:10) LouD |
We've run into the same issue when assigning SAS disks (LUN's) to a server. Turned out that when automount is disabled (which we had because we don't want to see all SAS disks as drive letters since they're mounted in directories) this happened. So enabling automount through diskpart fixed the issue immediately. After assigning the disks to the directories we disabled automount again and all was well. answered 18 May '15, 03:44 binbash |
For what it's worth, there was a bug in version 1.0.37 (and presumably earlier) of "tgt" on linux that prevented Windows initiators from being able to format. (http://lists.wpkg.org/pipermail/stgt/2013-May/012368.html) This also seems to have caused problems with file contents for a CentOS 7 guest running with the iscsi device as a root disk in OpenStack. Other guests seem to be fine. answered 25 May '15, 10:43 cbf123 |
Hi I faced the same issue and finally I found that using Microsoft bridged NIC was behind all issues, I removed all NIC from bridge and gave dedicated IPs, this reaolved it with immediate action. Regards, Ayed answered 13 Nov '16, 06:47 Ayed |
Sorry, I am reposting this because I posted it in a wrong location 119 seconds delay would cause the initiator's upper layer timer to expire and it should perform some type of SCSI layer error recover if a frame was not received. Since there is no SCSI layer error recovery I am guessing the frame was not sent. Unless there was some type of iSCSI protocol error in the frame - then it would be normal of the initiator to close the TCP connection provided that the 2 devices were operating at error recovery level 0. NOT the iSCSI standard requires devices that only support error level 0 to close the connection when a protocol error is encountered. answered 14 Nov '16, 04:48 LouD edited 14 Nov '16, 06:34 sindy |
The posted capture file ends at frame 6442
BTW: What was your user name last time?
I fixed the errant packet numbers in an edit. The exported packet numbers were different than what was displayed on my screen. I was unable to login with the account I used before for some reason. evilrobot.
BTW thanks for looking. 6384 is the last packet before the error (a reply to a Read(10) request that is 128 blocks. --Jason