This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

pyShark crashes due to no proper UTF8 encoding


Hey folks,

I need to sniff traffic in Python using pyShark on Windows. On UNIX systems I didn't have any problems.

On Windows a strange error occurs:

>>> cap = pyshark.LiveCapture(interface=nwdev)
>>> cap.sniff(timeout=10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\mu\WinPython-32bit-2.7.10.3\python-2.7.10\lib\site-packages\pyshark\capture\capture.py", line 109, in load_packets
    self.apply_on_packets(keep_packet, timeout=timeout)
  File "C:\Users\mu\WinPython-32bit-2.7.10.3\python-2.7.10\lib\site-packages\pyshark\capture\capture.py", line 201, in apply_on_packets
    return self.eventloop.run_until_complete(coro)
  File "C:\Users\mu\WinPython-32bit-2.7.10.3\python-2.7.10\lib\site-packages\trollius\base_events.py", line 300, in run_until_complete
    return future.result()
  File "C:\Users\mu\WinPython-32bit-2.7.10.3\python-2.7.10\lib\site-packages\trollius\futures.py", line 287, in result
    raise self._exception
lxml.etree.XMLSyntaxError: Input is not proper UTF-8, indicate encoding !
Bytes: 0xE4 0x69 0x73 0x63, line 6, column 69
>>>

I am using a 32-bit Python version.

I even tried the following (I know it's very dangerous)

reload(sys)
sys.setdefaultencoding('utf8')

right before my pyshark call and in the parent processes, without success.

I hope somebody is more familiar with running pyShark on Windows. Thanks in advance!

asked 06 Apr '16, 22:31

elektm

edited 06 Apr '16, 23:26

Guy Harris ♦♦

One Answer:

After putting a lot of effort into the problem and spending a lot of time searching for the error in the wrong place, I finally got it to work.

The problem seems to be that the incoming data (in XML format) is not encoded as UTF-8, and pyshark does not convert it to UTF-8 before parsing it. While debugging, it turned out that the data appeared to be encoded in 'latin-1'.
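To illustrate the diagnosis, here is a minimal sketch that takes the four bytes reported in the traceback (0xE4 0x69 0x73 0x63) and tries both encodings. The variable names are only for illustration and are not part of pyshark; the snippet is written for Python 3, while the original setup was Python 2.7.

```python
# Bytes reported by lxml in the error message: 0xE4 0x69 0x73 0x63
raw = b"\xe4isc"

# In UTF-8, 0xE4 starts a three-byte sequence and must be followed by
# continuation bytes (0x80-0xBF). 0x69 ("i") is not one, so decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In latin-1 every byte maps to exactly one character, so decoding always
# succeeds; 0xE4 is the character U+00E4 (a with diaeresis).
as_latin1 = raw.decode("latin-1")

print(utf8_ok)
print(repr(as_latin1))
```

This only shows that the bytes are consistent with latin-1 (or a similar single-byte code page) and are not valid UTF-8; it does not prove which encoding tshark actually used on that machine.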

I did the following:

  • I checked out the source from GitHub (https://github.com/KimiNewt/pyshark.git)
  • added the following line between lines 26 and 27 in src\pyshark\tshark\tshark_xml.py:

    xml_pkt = xml_pkt.decode('latin-1')

  • ran python setup.py install, and done
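The idea behind the steps above can be sketched outside of pyshark as follows. This is not pyshark's code: the function parse_packet_xml, the fallback_encoding parameter, and the sample XML are hypothetical, and the standard library's xml.etree.ElementTree stands in for lxml so the snippet runs on its own (Python 3). Unlike the one-line patch, it tries UTF-8 first and only falls back to latin-1, because decoding genuinely UTF-8 data as latin-1 would garble non-ASCII characters.

```python
import xml.etree.ElementTree as ET


def parse_packet_xml(xml_pkt, fallback_encoding="latin-1"):
    """Parse XML given as bytes, tolerating input that is not valid UTF-8.

    Hypothetical helper for illustration; not part of pyshark.
    """
    if isinstance(xml_pkt, bytes):
        try:
            text = xml_pkt.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: non-UTF-8 input is in a single-byte encoding
            # such as latin-1, as observed in the answer above.
            text = xml_pkt.decode(fallback_encoding)
        # Hand the parser clean UTF-8 bytes.
        xml_pkt = text.encode("utf-8")
    return ET.fromstring(xml_pkt)


# Made-up sample containing the byte sequence from the traceback.
sample = b'<packet><field name="example" show="\xe4isc"/></packet>'
root = parse_packet_xml(sample)
show_value = root.find("field").get("show")
print(repr(show_value))
```

Whether a fallback like this or the unconditional decode is the better fit depends on what the local tshark emits; the unconditional variant is simpler but assumes the input is never UTF-8.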

It took a lot of work and energy, but it is finally solved for me!

answered 14 Apr '16, 23:52

elektm

edited 14 Apr '16, 23:55