Hello!
There is just one difference: the display filter contains either "test" (English text) or "тест" (Russian text). How can I fix this? Wireshark (as opposed to tshark) handles both display filters properly.
(asked 06 May '14, 21:59 by factorial; edited 07 May '14, 07:26 by cmaynard)
One Answer:
I'm not sure if this will work or not, but you could try specifying the hex bytes instead of the text. For example, instead of "test", try:
tshark ... -Y "gsm_sms.sms_text contains 74:65:73:74"
I'm not sure if this is correct, but instead of "тест", try:
tshark ... -Y "gsm_sms.sms_text contains d1:82:d0:b5:d1:81:d1:82"
(answered 07 May '14, 07:56 by cmaynard)
Looks like a Unicode issue. Is it that the Russian text is a multibyte-character string and your cmd shell isn't handling it?
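To make "multibyte" concrete, here is a small sketch that compares the character count with the byte count of each search string. It assumes the bytes of interest are the UTF-8 encoding of the text; the function name is purely illustrative.

```python
def char_and_byte_counts(text: str, encoding: str = "utf-8") -> tuple[int, int]:
    """Return (number of characters, number of bytes after encoding)."""
    return len(text), len(text.encode(encoding))


for sample in ("test", "тест"):
    chars, nbytes = char_and_byte_counts(sample)
    # ASCII letters take one byte each; these Cyrillic letters take two each in UTF-8.
    print(f"{sample!r}: {chars} characters, {nbytes} bytes")
```

Both strings are four characters long, but only "test" is also four bytes long, which is why a tool that mishandles the encoding can end up searching for the wrong bytes.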
I didn't know about this option. How do you convert text to hex and vice versa? I'll try your advice tomorrow and report the result.
"What's the conception of conversion text to hex and vise versa?"
Sorry, but I'm not sure I understand your question.
"Sorry, but I'm not sure I understand your question."
No, it's my fault. English isn't my native language. Sorry :) I'd like to know how you obtained 74:65:73:74 from "test" and d1:82:d0:b5:d1:81:d1:82 from "тест".
"Looks like a Unicode issue, the Russian text is a multibyte-character string and your cmd shell isn't handling that."
I thought about that too, but I couldn't find a solution on the Internet. Do you have any advice?
"test" is the four ASCII characters with values 0x74, 0x65, 0x73, 0x74. Using the method described in his answer, @cmaynard found that the Russian text corresponds to the bytes he lists.
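As a sketch of the text-to-hex conversion (and back), assuming the text is encoded as UTF-8, which reproduces both byte strings used in the answer. The helper names are hypothetical.

```python
def to_hex_bytes(text: str, encoding: str = "utf-8") -> str:
    """Encode text and format each byte as two lowercase hex digits joined by ':'."""
    return ":".join(f"{b:02x}" for b in text.encode(encoding))


def from_hex_bytes(hex_string: str, encoding: str = "utf-8") -> str:
    """Reverse of to_hex_bytes: strip the ':' separators, parse the bytes, decode them."""
    return bytes.fromhex(hex_string.replace(":", "")).decode(encoding)


print(to_hex_bytes("test"))  # 74:65:73:74
print(to_hex_bytes("тест"))  # d1:82:d0:b5:d1:81:d1:82
print(from_hex_bytes("d1:82:d0:b5:d1:81:d1:82"))  # тест
```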
A further complication is how the Russian text is encoded in the packet you are looking at. Do you know the encoding used?
"Do you know the encoding used?"
UCS-2. Maybe I could use a command of the form tshark ... -Y "gsm_sms.sms_text contains ASCII="test"" or tshark ... -Y "gsm_sms.sms_text contains UCS-2="test""?
If the data in the packet is in UCS-2, then you could determine the UCS-2 code points for your Russian characters and then use those as the set of bytes to search for.
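A minimal sketch of that idea: list each character's code point, then write each code point as one 16-bit unit. It assumes big-endian byte order for the UCS-2 data; Python's "utf-16-be" codec is used because it matches UCS-2 for characters in the Basic Multilingual Plane. Note that the code point 442 becomes the two bytes 04:42, so the bytes to search for are not the same string as the code points. Function names are illustrative.

```python
def ucs2_code_points(text: str) -> str:
    """Hex code point of each character, e.g. '442' for 'т'."""
    return ":".join(f"{ord(ch):x}" for ch in text)


def ucs2_bytes(text: str) -> str:
    """Encode as UCS-2, assuming big-endian order, and format the bytes as hex."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    # For BMP-only text, UTF-16BE produces exactly the UCS-2 big-endian bytes.
    return text.encode("utf-16-be").hex(":")


print(ucs2_code_points("тест"))  # 442:435:441:442
print(ucs2_bytes("тест"))        # 04:42:04:35:04:41:04:42
```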
I've checked tshark -Y "gsm_sms.sms_text contains 74:65:73:74" and -Y "gsm_sms.sms_text contains d1:82:d0:b5:d1:81:d1:82", and both work properly. Now I understand how the ASCII code 74:65:73:74 was obtained, but I can't understand how "тест" was converted to d1:82:d0:b5:d1:81:d1:82. UCS-2 is the format used for encoding SMS messages, and I suppose that Wireshark decodes it and stores the field "gsm_sms.sms_text" in a different encoding scheme, because in UCS-2 "тест" is 442:435:441:442 and that doesn't work. Christopher, could you please explain how d1:82:d0:b5:d1:81:d1:82 was obtained from "тест"?
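To illustrate the hypothesis in the post above (decode the UCS-2 payload, then represent the text in another encoding), here is a sketch that transcodes UCS-2 bytes to UTF-8. The payload is a made-up example and big-endian order is assumed; this is not Wireshark's code. The UTF-8 result matches the filter bytes that worked, which is consistent with the hypothesis but does not prove how Wireshark stores the field.

```python
def ucs2_to_utf8(payload: bytes) -> bytes:
    """Decode UCS-2 (assumed big-endian) bytes and re-encode the text as UTF-8."""
    text = payload.decode("utf-16-be")
    return text.encode("utf-8")


# Hypothetical raw SMS text bytes for "тест" in UCS-2 big-endian.
raw = bytes.fromhex("0442043504410442")
print(raw.decode("utf-16-be"))     # тест
print(ucs2_to_utf8(raw).hex(":"))  # d1:82:d0:b5:d1:81:d1:82
```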
Answering my own question: d1:82:d0:b5:d1:81:d1:82 is what you get when "тест" is converted from the CP1251 code page to UTF-8.
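A short sketch of that conversion, assuming the text starts out as CP1251 bytes (as it would in a Russian Windows ANSI code page). In CP1251 each of these letters is a single byte; converting to UTF-8 doubles the length. The helper name is hypothetical.

```python
def cp1251_to_utf8_hex(data: bytes) -> str:
    """Decode CP1251 bytes to text, re-encode as UTF-8, and format the bytes as hex."""
    return data.decode("cp1251").encode("utf-8").hex(":")


cp1251_bytes = "тест".encode("cp1251")
print(cp1251_bytes.hex(":"))             # f2:e5:f1:f2
print(cp1251_to_utf8_hex(cp1251_bytes))  # d1:82:d0:b5:d1:81:d1:82
```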
OK, there is a workable solution: use the hex form of the text in the gsm_sms.sms_text filter. It isn't convenient, but I'll be able to write a converter to use in a script. Thanks to Christopher and Graham!
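One possible shape for such a converter, as a sketch: it builds the hex "contains" filter from plain text (assuming UTF-8, as found above) and returns a tshark argument list using the -r (read file) and -Y (display filter) options. The function names and the capture file name are hypothetical. Passing an argument list rather than a shell string sidesteps shell quoting, and since the filter is then pure ASCII, the console code page no longer matters.

```python
def contains_filter(field: str, text: str) -> str:
    """Build a display filter that searches a field for the UTF-8 bytes of text."""
    return f"{field} contains {text.encode('utf-8').hex(':')}"


def tshark_command(capture_file: str, field: str, text: str) -> list[str]:
    """Argument list for running tshark on a capture file with the hex filter."""
    return ["tshark", "-r", capture_file, "-Y", contains_filter(field, text)]


print(contains_filter("gsm_sms.sms_text", "тест"))
# gsm_sms.sms_text contains d1:82:d0:b5:d1:81:d1:82

# To actually run it (requires tshark on PATH and a real capture file):
#   import subprocess
#   subprocess.run(tshark_command("capture.pcap", "gsm_sms.sms_text", "тест"), check=True)
```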
Well, there might be an easier way, but I'm glad you've found at least one solution. Instead of searching for hex bytes, though, maybe you could switch code pages beforehand, perhaps via chcp.com or something like it? Could you post a small capture file to CloudShark, one that contains the "тест" text? I'm curious whether it'll work in my console, with code page 437.
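For comparison, the sketch below shows how the same four letters map to bytes under a few code pages. Code page 437 has no Cyrillic letters at all, while CP866 (DOS Russian) and CP1251 (Windows Russian) each use one byte per letter but different byte values; UTF-8 (code page 65001 in chcp terms) uses two. Whether tshark behaves differently after switching code pages is untested here; this only illustrates why the bytes depend on the code page.

```python
from typing import Optional


def encode_or_none(text: str, codepage: str) -> Optional[str]:
    """Hex bytes of text in the given code page, or None if it cannot be represented."""
    try:
        return text.encode(codepage).hex(":")
    except UnicodeEncodeError:
        return None


for cp in ("cp437", "cp866", "cp1251", "utf-8"):
    result = encode_or_none("тест", cp)
    print(f"{cp:>6}: {result if result is not None else 'cannot represent this text'}")
```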