Is it possible to extract WEP IV fields with Scapy?

Is it possible to write a query with Scapy? What I want is to extract all IVs from all packets of a pcap file. I do not want to filter packets; I want to extract the IV information, and I also need to extract the SSID at the same time because there are multiple APs.
Thanks

For the IV part you could do:
import binascii
from scapy.all import *

for pkt in PcapReader('myfile.cap'):
    try:
        if pkt[1].iv:
            print(binascii.hexlify(pkt[1].iv))
    except (IndexError, AttributeError):
        # the packet has no second layer, or that layer has no IV field
        pass
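For the SSID part, here is a minimal sketch, assuming the capture contains 802.11 beacon or probe-response frames and that addr3 holds the BSSID (for data frames this depends on the ToDS/FromDS flags): map each AP's BSSID to its SSID, then report it next to each IV.
import binascii
from scapy.all import PcapReader, Dot11, Dot11Elt, Dot11WEP

ssids = {}  # BSSID -> SSID
for pkt in PcapReader('myfile.cap'):
    if not pkt.haslayer(Dot11):
        continue
    # SSID information element (ID 0) in beacon/probe-response frames
    if pkt.haslayer(Dot11Elt) and pkt[Dot11Elt].ID == 0:
        ssids[pkt[Dot11].addr3] = pkt[Dot11Elt].info.decode(errors='replace')
    # WEP-encrypted frames carry the IV
    if pkt.haslayer(Dot11WEP):
        bssid = pkt[Dot11].addr3
        print(ssids.get(bssid, bssid), binascii.hexlify(pkt[Dot11WEP].iv))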

Related

Achieve Hexdump in python

How can I achieve the following in Python 3?
cat file | sigtool --hex-dump | head -c 2048
I'm attempting to read a .ndb database which I have created in order to check for malicious PHP files; however, I need to be able to create hex signatures of files in Python in order to check them against this database.
Try using the binascii.hexlify function:
import binascii

# get the filename from somewhere
# read the raw bytes
with open(filename, 'rb') as f:
    raw_bytes = f.read(1024)  # two hex digits per byte, so read 1024 to get 2048

# convert to hex
hex_bytes = binascii.hexlify(raw_bytes)  # this will be 2048 bytes long

# use the hex, as desired
hex_str = hex_bytes.decode('ascii')  # hexadecimal uses a subset of ASCII
print(hex_str)  # or put this in a database, or whatever
I converted the hex_bytes into a string so I could print it out, but you might omit that if you can use the bytes object returned by hexlify directly (e.g. writing it to a file in binary mode, or saving it directly into a database).
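On Python 3.5 and later, the built-in bytes.hex() method gives the same result without any import:
# equivalent using bytes.hex(); returns a str directly
with open(filename, 'rb') as f:
    hex_str = f.read(1024).hex()  # 2048 hex characters
print(hex_str)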

Unable to compute checksum for igmpv3 using scapy

Following is a snippet of my code. It opens a pcap file called test (file: https://easyupload.io/w81oc1), edits a value called QQIC, and creates a new pcap file.
from scapy.all import *
from scapy.utils import rdpcap
from scapy.utils import wrpcap
import scapy.contrib.igmpv3

# Read the pcap file
pkt = rdpcap("test.pcap")
# Edit the value of qqic
pkt[0]['IGMPv3mq'].qqic = 30
# Write it to the pcap file.
wrpcap("final.pcap", pkt)
All this works fine. However, when I check the pcap, I get an error stating that the checksum is invalid.
I can't figure out a way to recompute the checksum.
When you edit a packet (particularly an explicit packet, that is, a packet that has been read from a PCAP file or a network capture) in Scapy, you have to "delete" the fields that need to be computed again (checksum fields as here, but also sometimes length fields). For that, you can use the del statement:
from scapy.all import *
load_contrib("igmpv3")
# Read the pcap file
pkt = rdpcap("test.pcap")
# Edit the value of qqic
pkt[0]['IGMPv3mq'].qqic = 30
# Force Scapy to compute the IGMP checksum
# XXX the important line is here XXX
del pkt[0][IGMPv3].chksum
# Write it to the pcap file.
wrpcap("final.pcap", pkt)
I have also simplified the imports.
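The same pattern applies to any field Scapy computes at build time. For example, if an edit changed the packet's size, the IP length and checksum would be stale too; a hedged sketch, assuming the packet has an IP layer:
# Delete stale computed fields; Scapy recomputes them when the packet is serialized
del pkt[0][IP].len
del pkt[0][IP].chksum
wrpcap("final.pcap", pkt)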

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte while accessing csv file

I am trying to access a CSV file from an AWS S3 bucket and getting the error 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte. My code is below; I am using Python 3.7.
from io import StringIO
import boto3
import pandas as pd

s3 = boto3.client('s3', aws_access_key_id='######',
                  aws_secret_access_key='#######')
response = s3.get_object(Bucket='#####', Key='raw.csv')
# print(response)
s3_data = StringIO(response.get('Body').read().decode('utf-8'))
data = pd.read_csv(s3_data)
print(data.head())
Kindly help me out here; how can I resolve this issue?
Using gzip worked for me. (The byte 0x8b at position 1 is the second byte of the gzip magic number 1f 8b, which suggests the object is actually gzip-compressed.)
import gzip

import boto3
import pandas as pd

client = boto3.client('s3', aws_access_key_id=aws_access_key_id,
                      aws_secret_access_key=aws_secret_access_key)
csv_obj = client.get_object(Bucket=####, Key=###)
body = csv_obj['Body']
with gzip.open(body, 'rt') as gf:
    csv_file = pd.read_csv(gf)
The error you're getting means the CSV file you're getting from this S3 bucket is not encoded using UTF-8.
Unfortunately the CSV file format is quite under-specified and doesn't really carry information about the character encoding used inside the file... So either you need to know the encoding, or you can guess it, or you can try to detect it.
If you'd like to guess, popular encodings are ISO-8859-1 (also known as Latin-1) and Windows-1252 (which is roughly a superset of Latin-1). ISO-8859-1 doesn't have a character defined for 0x8b (so that's not the right encoding), but Windows-1252 uses that code to represent a left single angle quote (‹).
So maybe try .decode('windows-1252')?
If you'd like to detect it, look into the chardet Python module which, given a file or BytesIO or similar, will try to detect the encoding of the file, giving you what it thinks the correct encoding is and the degree of confidence it has in its detection of the encoding.
Finally, I suggest that, instead of using an explicit decode() and using a StringIO object for the contents of the file, store the raw bytes in an io.BytesIO and have pd.read_csv() decode the CSV by passing it an encoding argument.
import io
s3_data = io.BytesIO(response.get('Body').read())
data = pd.read_csv(s3_data, encoding='windows-1252')
As a general practice, you want to delay decoding as much as you can. In this particular case, having access to the raw bytes can be quite useful, since you can use them to write a copy to a local file (which you can then inspect with a text editor or in Excel).
Also, if you want to do detection of the encoding (using chardet, for example), you need to do so before you decode it, so again in that case you need the raw bytes, so that's yet another advantage to using the BytesIO here.
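A minimal sketch of that detection route, reusing the response object from the question and assuming the chardet package is installed (chardet.detect() returns a dict with 'encoding' and 'confidence' keys):
import io
import chardet
import pandas as pd

raw = response.get('Body').read()
guess = chardet.detect(raw)  # e.g. {'encoding': 'Windows-1252', 'confidence': 0.73, ...}
data = pd.read_csv(io.BytesIO(raw), encoding=guess['encoding'])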

How to extract the payload of a packet using Pyshark

I am trying to read the payload of all packets in a .pcap file using Pyshark. I am able to open and read the file and access the packets and their other information, but I am not able to find the correct attribute/method to access the payload of a packet. Any suggestions? Is there any other way to read packet payloads in .pcap files using Python on Windows 10?
(I tried using Scapy instead of Pyshark, but apparently there is some issue with running Scapy on Windows; it does not work on my system either.)
I found these lines in different code snippets of pyshark projects on the Internet and on StackOverflow. I tried them, but none of them work:
import pyshark
cap = pyshark.FileCapture('file.pcap')
pkt = cap[1]
#for other information
print(pkt.tcp.flags_ack) #this works
print(pkt.tcp.flags_syn) #this works
print(pkt.tcp.flags_fin) #this works
#for payload
print(pkt.tcp.data) #does not work, AttributeError
print(pkt.tcp.payload) #does not work, AttributeError
print(pkt.data.data) #does not work, AttributeError
This code will print the value associated with the field name tcp.payload.
capture = pyshark.FileCapture(pcap_file, display_filter='tcp')
for packet in capture:
    for field_name, field_value in packet.tcp._all_fields.items():
        if field_name == 'tcp.payload':
            print(f'{field_name} -- {field_value}')
# outputs
tcp.payload -- \xc2\xb7\xc2\xb7\xc2\xb7\xc2\xb7\xc2\xb7\xc2\xb7\xc2\xb7AP\xc2\xb7\xc2\xb7\xc2\xb7
tcp.payload -- 0x00001e2c
tcp.payload -- 113977858
...
In order to use that API you have to pass appropriate parameters into the constructor of the FileCapture class:
import pyshark
cap = pyshark.FileCapture('file.pcap', include_raw=True, use_json=True)
pkt = cap[1]
print(pkt.data.data) # Will work
'include_raw' is the key here, but 'use_json' is needed when 'include_raw' is used.
Try dir() on a packet, e.g. dir(cap[1]).
This will give you all accessible attributes related to your capture; look there to see if there is a payload option.
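A short sketch of that exploration approach, using the same capture as the question (pkt.layers and dir() are generic inspection tools, not payload-specific APIs):
import pyshark

cap = pyshark.FileCapture('file.pcap')
pkt = cap[1]
print(pkt.layers)    # the protocol layers pyshark parsed for this packet
print(dir(pkt.tcp))  # fields available on the TCP layer, if the packet has one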

Decoding/Encoding using sklearn load_files

I'm following the tutorial here
https://github.com/amueller/introduction_to_ml_with_python/blob/master/07-working-with-text-data.ipynb
to learn about machine learning and text.
In my case, I'm using tweets I downloaded, with positive and negative tweets in the exact same directory structure they are using (trying to learn sentiment analysis).
Here in the IPython notebook I load my data just like they do:
tweets_train = load_files('Path to my training Tweets')
And then I try to fit them with CountVectorizer:
vect = CountVectorizer().fit(tweets_train.data)
I get
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd8 in position
561: invalid continuation byte
Is this because my tweets have all sorts of non-standard text in them? I didn't do any cleanup of my tweets (I assume there are libraries that help with that, in order to make a bag of words work?)
EDIT:
Code I use (with Twython) to download tweets:
def get_tweets(user):
    twitter = Twython(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET)
    user_timeline = twitter.get_user_timeline(screen_name=user, count=1)
    lis = user_timeline[0]['id']
    lis = [lis]
    for i in range(0, 16):  # iterate through all tweets
        # tweet extract method with the last list item as the max_id
        user_timeline = twitter.get_user_timeline(screen_name=user, count=200,
                                                  include_retweets=False, max_id=lis[-1])
        for tweet in user_timeline:
            lis.append(tweet['id'])  # append tweet ids
            text = str(tweet['text']).replace("'", "")
            text_file = open(user, "a")
            text_file.write(text)
            text_file.close()
You get a UnicodeDecodeError because your files are being decoded with the wrong text encoding.
If this means nothing to you, make sure you understand the basics of Unicode and text encoding, e.g. with the official Python Unicode HOWTO.
First, you need to find out what encoding was used to store the tweets on disk.
When you saved them to text files, you used the built-in open function without specifying an encoding. This means that the system's default encoding was used. Check this, for example, in an interactive session:
>>> f = open('/tmp/foo', 'a')
>>> f
<_io.TextIOWrapper name='/tmp/foo' mode='a' encoding='UTF-8'>
Here you can see that in my local environment the default encoding is set to UTF-8. You can also directly inspect the default encoding with
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
There are other ways to find out what encoding was used for the files.
For example, the Unix tool file is pretty good at guessing the encoding of existing files, if you happen to be working on a Unix platform.
Once you think you know what encoding was used for writing the files, you can specify this in the load_files() function:
tweets_train = load_files('path to tweets', encoding='latin-1')
... in case you find out Latin-1 is the encoding that was used for the tweets; otherwise adjust accordingly.
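To avoid the problem entirely on the next download, a small sketch: pass an explicit encoding both when writing each tweet (this replaces the open/write/close lines inside the loop above) and when loading the files back, so the round trip is unambiguous:
# when saving, inside the download loop
with open(user, "a", encoding="utf-8") as text_file:
    text_file.write(text)

# when loading
tweets_train = load_files('path to tweets', encoding='utf-8')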
