How to extract the payload of a packet using Pyshark - python-3.x

I am trying to read the payload of every packet in a .pcap file using Pyshark. I can open and read the file and access the packets and their other information, but I cannot find the correct attribute or method for accessing a packet's payload. Any suggestions? Is there any other way to read packet payloads from .pcap files using Python on Windows 10?
(I tried using Scapy instead of Pyshark, but apparently there is some issue with running Scapy on Windows; it does not work on my system either.)
I found these lines in various pyshark code snippets on the Internet and on StackOverflow. I tried them, but none of them work:
import pyshark
cap = pyshark.FileCapture('file.pcap')
pkt = cap[1]
#for other information
print(pkt.tcp.flags_ack) #this works
print(pkt.tcp.flags_syn) #this works
print(pkt.tcp.flags_fin) #this works
#for payload
print(pkt.tcp.data) #does not work, AttributeError
print(pkt.tcp.payload) #does not work, AttributeError
print(pkt.data.data) #does not work, AttributeError

This code prints the value associated with the field name tcp.payload.
import pyshark
# pcap_file is the path to your capture file
capture = pyshark.FileCapture(pcap_file, display_filter='tcp')
for packet in capture:
    # _all_fields maps each field name to its value; iterate over the pairs
    # so that every name is matched with its own value
    for field_name, field_value in packet.tcp._all_fields.items():
        if field_name == 'tcp.payload':
            print(f'{field_name} -- {field_value}')
# outputs
tcp.payload -- \xc2\xb7\xc2\xb7\xc2\xb7\xc2\xb7\xc2\xb7\xc2\xb7\xc2\xb7AP\xc2\xb7\xc2\xb7\xc2\xb7
...

To use that API you have to pass the appropriate parameters to the constructor of the 'FileCapture' class:
import pyshark
cap = pyshark.FileCapture('file.pcap', include_raw=True, use_json=True)
pkt = cap[1]
print(pkt.data.data) # Will work
'include_raw' is the key here, but 'use_json' is required whenever 'include_raw' is used.
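Once a payload field is accessible, it usually still has to be turned into raw bytes. The sketch below assumes the field value arrives as a hex string, either colon-separated ("48:65:6c") or plain ("48656c"); the exact rendering depends on the pyshark and tshark versions, so check what your capture actually returns. The helper name payload_to_bytes is hypothetical and not part of pyshark.

```python
def payload_to_bytes(hex_field: str) -> bytes:
    """Convert a hex string such as '48:65:6c:6c:6f' or '48656c6c6f' to bytes."""
    # Drop the colon separators (if any) before decoding the hex digits.
    return bytes.fromhex(str(hex_field).replace(":", ""))


if __name__ == "__main__":
    # Stand-in value; with pyshark this would be something like str(pkt.tcp.payload)
    print(payload_to_bytes("48:65:6c:6c:6f"))
```

Calling str() on the value first keeps the helper usable if the library hands back a field object rather than a plain string.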

Try dir(cap[0]).
This lists all the accessible attributes of a packet in your capture; check whether a payload attribute is among them.

Unable to compute checksum for igmpv3 using scapy

The following is a snippet of my code. It:
opens a pcap file called test (file: https://easyupload.io/w81oc1),
edits a value called QQIC,
creates a new pcap file.
from scapy.all import *
from scapy.utils import rdpcap
from scapy.utils import wrpcap
import scapy.contrib.igmpv3
#Read the pcap file
pkt = rdpcap("test.pcap")
#Edit the value of qqic
pkt[0]['IGMPv3mq'].qqic = 30
# Write it to the pcap file.
#wrpcap("final.pcap",pkt)
All this works fine.
However, when I check the pcap, I get an error stating that the checksum is invalid.
I can't figure out a way to recompute the checksum.
When you edit a packet (particularly an explicit packet, that is, a packet that has been read from a PCAP file or a network capture) in Scapy, you have to "delete" the fields that need to be computed again (checksum fields as here, but also sometimes length fields). For that, you can use the del statement:
from scapy.all import *
load_contrib("igmpv3")
# Read the pcap file
pkt = rdpcap("test.pcap")
# Edit the value of qqic
pkt[0]['IGMPv3mq'].qqic = 30
# Force Scapy to compute the IGMP checksum
# XXX the important line is here XXX
del pkt[0][IGMPv3].chksum
# Write it to the pcap file.
wrpcap("final.pcap", pkt)
I have also simplified the imports.
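For reference, the value that gets recomputed once chksum is deleted is the standard Internet checksum (RFC 1071): the 16-bit one's complement of the one's-complement sum of the message, calculated with the checksum field set to zero. The sketch below is a plain-Python illustration of that arithmetic, not Scapy's own implementation; the sample bytes are an IGMPv2 general query, chosen only because it is short enough to check by hand.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum of `data` (checksum field must be zeroed beforehand)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    # type=0x11, max response time=0x64, checksum=0x0000, group=0.0.0.0
    query = bytes([0x11, 0x64, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])
    print(hex(internet_checksum(query)))
```

A message whose checksum field already holds the correct value sums to zero, which is a handy way to verify a packet after editing it.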

RE: Transferring Python2 to Python3 on This Specific Line

I am trying to change this line, which comes from a Python 2 code base, so that Python 3 accepts it:
Here is the error:
TypeError: unicode strings are not supported, please encode to bytes:
'$PMTK251,9600*17\r\n'
Can anyone tell me why this happens, or how I can change the line to suit Python 3?
It is GPS code written for Python 2 that still works, but I understand that Python 2 support is ending, if it has not already ended.
So my idea was to update that line and others.
In Python 3 I get errors relating to bytes, and I have read about using (arg, newline='') when creating .csv files in Python 3.
I am still at a loss as to how to make this specific line work in Python 3.
I can offer more about the line or the rest of the source if necessary. I got this source from toptechboy.com; I do not think its author ever updated it to work w/ Python 3.
class GPS:
    def __init__(self):
        #This sets up variables for useful commands.
        #This set is used to set the rate the GPS reports
        UPDATE_10_sec = "$PMTK220,10000*2F\r\n" #Update Every 10 Seconds
        UPDATE_5_sec = "$PMTK220,5000*1B\r\n" #Update Every 5 Seconds
        UPDATE_1_sec = "$PMTK220,1000*1F\r\n" #Update Every One Second
        UPDATE_200_msec = "$PMTK220,200*2C\r\n" #Update Every 200 Milliseconds
        #This set is used to set the rate the GPS takes measurements
        MEAS_10_sec = "$PMTK300,10000,0,0,0,0*2C\r\n" #Measure every 10 seconds
        MEAS_5_sec = "$PMTK300,5000,0,0,0,0*18\r\n" #Measure every 5 seconds
        MEAS_1_sec = "$PMTK300,1000,0,0,0,0*1C\r\n" #Measure once a second
        MEAS_200_msec = "$PMTK300,200,0,0,0,0*2F\r\n" #Measure 5 times a second
        #Set the Baud Rate of GPS
        BAUD_57600 = "$PMTK251,57600*2C\r\n" #Set Baud Rate at 57600
        BAUD_9600 = "$PMTK251,9600*17\r\n" #Set 9600 Baud Rate
        #Commands for which NMEA Sentences are sent
        ser.write(BAUD_57600)
        sleep(1)
        ser.baudrate = 57600
        GPRMC_ONLY = "$PMTK314,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0*29\r\n" #Send only the GPRMC Sentence
        GPRMC_GPGGA = "$PMTK314,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0*28\r\n" #Send GPRMC AND GPGGA Sentences
        SEND_ALL = "$PMTK314,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0*28\r\n" #Send All Sentences
        SEND_NOTHING = "$PMTK314,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0*28\r\n" #Send Nothing
        ...
That is the GPS Class Mr. McWhorter wrote for a GPS Module in python2. I am trying to configure this python2 source into a workable python3 class.
I am receiving errors like "needs to be bytes" and/or "cannot use bytes here".
Anyway, if you are handy w/ Python3 and know where I am making mistakes on this source to transfer it over to Python3, please let me know. I have tried changing the source many times to accept bytes and to be read as a utf-string.
Here: Best way to convert string to bytes in Python 3? <<< This seems like the most popular topic on this subject but it does not answer my question so far (I think).
The line works once you add a b (for bytes) in front of the string literal, like so:
(b'$PMTK251,9600*17\r\n')
That should get rid of the error TypeError: unicode strings are not supported, please encode to bytes.
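An alternative to editing every literal, sketched here on the assumption that the commands stay defined as str, is to encode them at the point where they are written to the port. NMEA/PMTK sentences are plain ASCII, so encoding with 'ascii' loses nothing. The helper name to_wire is hypothetical; ser stands for the serial port object the class already uses.

```python
def to_wire(command: str) -> bytes:
    """Encode a command string for a serial write that only accepts bytes."""
    return command.encode("ascii")


BAUD_9600 = "$PMTK251,9600*17\r\n"
payload = to_wire(BAUD_9600)
print(payload)
# In the GPS class one would then call: ser.write(to_wire(BAUD_9600))
```

Encoding in one place keeps the command definitions readable and avoids sprinkling b prefixes through the class.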

requests.get(url).headers.get('content-disposition') returning NONE on PYTHON

I need to automate a process at my job (I'm an intern), and I wondered whether I could use Python for it. I'm still working out how to do it, and right now I'm trying to understand how to download a file from a web URL using Python 3. I found a guide on another website, but there's no active help there. I was told to use the requests module to download the actual file, and the re module to get the real file name.
The code was working fine, but then I tried to add some features like a GUI, and it stopped working. I removed the GUI code, and it still didn't work. Now I have no idea what to do to get the code working. Could someone please help me? Thanks :)
code:
import os
import re
import requests  # needed for requests.get below
# i have no idea of how this function works, but it gets the real file name
def getFilename(cd):
    if not cd:
        print("check 1")
        return None
    fname = re.findall('filename=(.+)', cd)
    if len(fname) == 0:
        print("check 2")
        return None
    return fname[0]
def download(url):
    # get request
    response = requests.get(url)
    # get the real file name from the Content-Disposition header (None if the header is missing)
    filename = getFilename(response.headers.get('content-disposition'))
    print(filename)
    # open in binary mode and write to file
    #open(filename, "wb").write(response.content)
download("https://pixabay.com/get/57e9d14b4957a414f6da8c7dda353678153fd9e75b50704b_1280.png?attachment=")
os.system("pause")
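When a server sends no Content-Disposition header, response.headers.get('content-disposition') returns None, which is what the "check 1" branch reports. One possible fallback, assuming the URL path itself ends in a usable file name, is to take the last path component. The sketch below is only an illustration: the helper name pick_filename is hypothetical, and it deliberately ignores the extended filename*= form of the header.

```python
import os
import re
from urllib.parse import unquote, urlparse


def pick_filename(content_disposition, url):
    """Prefer the header's filename; otherwise fall back to the URL path."""
    if content_disposition:
        # Matches filename=report.pdf as well as filename="report.pdf"
        match = re.search(r'filename="?([^";]+)"?', content_disposition)
        if match:
            return match.group(1).strip()
    # No usable header: use the last component of the URL path, if any
    path = urlparse(url).path
    return os.path.basename(unquote(path)) or None


if __name__ == "__main__":
    print(pick_filename(None, "https://example.com/get/img_1280.png?attachment="))
```

In the download function this would be called as pick_filename(response.headers.get('content-disposition'), url).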

audio file isn't being parsed with Google Speech

This question is a followup to a previous question.
The snippet of code below almost works...it runs without error yet gives back a None value for results_list. This means it is accessing the file (I think) but just can't extract anything from it.
I have a file, sample.wav, living publicly here: https://storage.googleapis.com/speech_proj_files/sample.wav
I am trying to access it by specifying source_uri='gs://speech_proj_files/sample.wav'.
I don't understand why this isn't working. I don't think it's a permissions problem. My session is instantiated fine. The code chugs for a second, yet always comes up with no result. How can I debug this?? Any advice is much appreciated.
from google.cloud import speech
speech_client = speech.Client()
audio_sample = speech_client.sample(
content=None,
source_uri='gs://speech_proj_files/sample.wav',
encoding='LINEAR16',
sample_rate_hertz= 44100)
results_list = audio_sample.async_recognize(language_code='en-US')
Ah, that's my fault from the last question. That's the async_recognize command, not the sync_recognize command.
That library has three recognize commands. sync_recognize reads the whole file and returns the results. That's probably the one you want. Remove the letter "a" and try again.
Here's an example Python program that does this: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/cloud-client/transcribe.py
FYI, here's a summary of the other types:
async_recognize starts a long-running, server-side operation to translate the whole file. You can make further calls to the server to see whether it's finished with the operation.poll() method and, when complete, can get the results via operation.results.
The third type is streaming_recognize, which sends you results continually as they are processed. This can be useful for long files where you want some results immediately, or if you're continuously uploading live audio.
I finally got something to work:
import time
from google.cloud import speech
speech_client = speech.Client()
sample = speech_client.sample(
    content=None,
    source_uri='gs://speech_proj_files/sample.wav',
    encoding='LINEAR16',
    sample_rate=44100,
)
retry_count = 100
# the language code is passed here rather than to sample()
operation = sample.async_recognize(language_code='en-US')
while retry_count > 0 and not operation.complete:
    retry_count -= 1
    time.sleep(10)
    operation.poll()  # API call
print(operation.complete)
print(operation.results[0].transcript)
print(operation.results[0].confidence)
for op in operation.results:
    print(op.transcript)
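The poll-until-complete pattern used with async_recognize can be pulled out into a small helper with a bounded number of retries. The sketch below is generic: it only assumes an object with a complete attribute and a poll() method, as described above. FakeOperation is a hypothetical stand-in so the example runs without any Google credentials; it is not part of the google-cloud-speech library.

```python
import time


def wait_until_complete(operation, retries=100, delay=10.0):
    """Poll `operation` until it completes or the retries run out.

    Returns the final value of operation.complete.
    """
    while retries > 0 and not operation.complete:
        retries -= 1
        time.sleep(delay)
        operation.poll()  # one request to the server per iteration
    return operation.complete


class FakeOperation:
    """Hypothetical stand-in that completes after a fixed number of polls."""

    def __init__(self, polls_needed):
        self.polls_needed = polls_needed
        self.complete = False

    def poll(self):
        self.polls_needed -= 1
        if self.polls_needed <= 0:
            self.complete = True


if __name__ == "__main__":
    print(wait_until_complete(FakeOperation(3), retries=5, delay=0))
```

Checking the return value before reading the results avoids indexing into an empty result list when the operation did not finish in time.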

Encoding issue with python3 and click package

When the click library detects that the runtime is Python 3 but the encoding is ASCII, it ends the Python program abruptly:
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Either switch to Python 2 or consult http://click.pocoo.org/python3/ for mitigation steps.
I found the cause in my case: when I connect to my Linux host from my Mac, Terminal.app sets the SSH session locale to my Mac's locale (es_ES.UTF-8). However, that locale is not installed on my Linux host (only en_US.utf-8 is).
I applied an initial workaround to fix it (but it had many issues; see the accepted answer):
import locale, codecs, os
# locale.getpreferredencoding() == 'ANSI_X3.4-1968'
if codecs.lookup(locale.getpreferredencoding()).name == 'ascii':
    os.environ['LANG'] = 'en_US.utf-8'
EDIT: For a better patch see my accepted answer.
All my linux hosts have installed 'en_US.utf-8' as locale (Fedora uses it as default).
My question is: is there a better (more robust) way to choose or force the locale in a Python 3 script? For instance, by setting one of the locales available on the system.
Maybe there is a different approach to fix this issue but I didn't find it.
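The check in the workaround above relies on codecs.lookup to normalise encoding names, so that aliases such as 'ANSI_X3.4-1968' are recognised as ASCII. A minimal sketch of that check in isolation follows; the helper name is_ascii_encoding is hypothetical and is not part of click or the standard library.

```python
import codecs
import locale


def is_ascii_encoding(name: str) -> bool:
    """Return True if `name` is ASCII or one of its registered aliases."""
    try:
        return codecs.lookup(name).name == "ascii"
    except LookupError:
        # Unknown encoding names are simply treated as "not ASCII"
        return False


if __name__ == "__main__":
    preferred = locale.getpreferredencoding()
    print(preferred, "is ASCII" if is_ascii_encoding(preferred) else "is not ASCII")
```

Running this in the affected SSH session shows quickly whether the environment is the one click complains about.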
If you have Python >= 3.7, you should not need to do anything. If you have Python 3.6, see the original solution below.
EDIT 2017-12-08
I've seen that there is a PEP 538 for Python 3.7 that will change the entire behavior of Python 3 encoding management during startup. I think the new approach will fix the original problem: https://www.python.org/dev/peps/pep-0538/
IMHO the changes targeted at Python 3.7 for encoding issues should have been planned years ago, but better late than never, I guess.
EDIT 2015-09-01
There is an open issue (enhancement), http://bugs.python.org/issue15216, that will make it easy to change the encoding of a created (not yet used) stream (sys.std*). But it is targeted at Python 3.7, so we'll have to wait a while.
Original solution that targets Python 3.6
NOTE: this solution should not be needed by anyone running Python >= 3.7; see PEP 538.
Well, my initial workaround had many flaws: I got past the click library's encoding check, but the encoding itself was not fixed, so I got exceptions when the input parameters or the output contained non-ASCII characters.
I had to implement a more complex method with 3 steps: set the locale, correct the encoding of stdin/stdout, and re-encode the command-line parameters. I also added a "friendly" exit if the first attempt to set the locale doesn't work as expected:
def prevent_ascii_env():
    """
    To avoid issues reading unicode chars from stdin or writing to stdout, we need to ensure that the
    python3 runtime is correctly configured. If it is not, we try to force utf-8,
    and if that isn't possible we exit with a friendlier message than the original one.
    """
    import locale, codecs, os, sys
    # locale.getpreferredencoding() == 'ANSI_X3.4-1968'
    if codecs.lookup(locale.getpreferredencoding()).name == 'ascii':
        os.environ['LANG'] = 'en_US.utf-8'
        if codecs.lookup(locale.getpreferredencoding()).name == 'ascii':
            print("The current locale is not correctly configured in your system")
            print("Please set the LANG env variable to the proper value before calling this script")
            sys.exit(-1)
        # Once we have the proper locale.getpreferredencoding() we can change the current stdin/out streams
        _, encoding = locale.getdefaultlocale()
        import io
        sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding=encoding, errors="replace", line_buffering=True)
        sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding=encoding, errors="replace", line_buffering=True)
        sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding=encoding, errors="replace", line_buffering=True)
        # And finally we need to re-encode the input parameters
        for i, p in enumerate(sys.argv):
            sys.argv[i] = os.fsencode(p).decode()
This patch solves almost all issues, but it has a caveat: shutil.get_terminal_size() raises a ValueError because sys.__stdout__ has been detached. The click library uses that function to print the help, so to fix it I had to monkey-patch click:
def wrapper_get_terminal_size():
    """
    Replace the original function termui.get_terminal_size (click lib) by a new one
    that uses a fallback if ValueError exception has been raised
    """
    from click import termui, formatting
    old_get_term_size = termui.get_terminal_size
    def _wrapped_get_terminal_size():
        try:
            return old_get_term_size()
        except ValueError:
            import os
            sz = os.get_terminal_size()
            return sz.columns, sz.lines
    termui.get_terminal_size = _wrapped_get_terminal_size
    formatting.get_terminal_size = _wrapped_get_terminal_size
With these changes all my scripts now work fine when the environment has a wrong locale configured but the system supports en_US.utf-8 (it's the Fedora default locale).
If you find any issue on this approach or have a better solution, please add a new answer.
It's an old thread, but this answer might help others, or myself, in the future. On *nix, check whether LC_ALL is set:
env | grep LC_ALL
If it is set, unset it. That's all there is to it.
unset LC_ALL
If you are running python 3.6 then you will still get this error. Here is a simple solution that the authors of click recommend:
#!/bin/bash
# before your python code executes set two environment variables
export LANG=en_US.utf8
export LC_ALL=en_US.utf8
NOTE: replace the values with whatever locale your system is configured to use.
NOTE: this solution is also given in the PEP 538 document.
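The same two variables can also be set from Python when one script launches another, by passing a modified copy of the environment to the child process. The sketch below assumes the en_US.utf8 locale is installed on the host, as in the shell example; the helper name utf8_environment is hypothetical.

```python
import os


def utf8_environment(base=None, locale_name="en_US.utf8"):
    """Return a copy of the environment with LANG and LC_ALL forced to a UTF-8 locale."""
    env = dict(os.environ if base is None else base)  # copy, do not mutate the original
    env["LANG"] = locale_name
    env["LC_ALL"] = locale_name
    return env


if __name__ == "__main__":
    env = utf8_environment()
    print(env["LANG"], env["LC_ALL"])
    # A child process could then be started with, for example:
    # subprocess.run([sys.executable, "my_click_script.py"], env=env)
```

Copying the existing environment keeps variables such as PATH intact, which a child interpreter usually still needs.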
I haven't seen this simple method mentioned (re-exec the script with the proper environment before doing anything else), so I'll add it for future travellers who are stuck on an old Python version for some reason. Add it just below the imports (it needs os and sys) so that it runs first:
# .get() avoids a KeyError when the variables are not set at all
if os.environ.get("LC_ALL") != "C.UTF-8" or os.environ.get("LANG") != "C.UTF-8":
    os.execve(sys.executable,
              [os.path.realpath(__file__)] + sys.argv,
              {"LC_ALL": "C.UTF-8", "LANG": "C.UTF-8"})
