python dpkt pcap how to get protocol?

I have a lab assignment and I need to find the protocol for each packet of a huge pcap file. I am going to build a dictionary to hold them all, but my first step is just to pull the information using dpkt. It looks like ip.get_proto is what I want, but I am missing something. I am reading http://www.commercialventvac.com/dpkt.html#mozTocId839997
#!/usr/bin/python
# -*- coding: utf-8 -*-
import dpkt
import socket
import sys
import datetime
import matplotlib.pyplot as ploot
import numpy as arrayNum
from collections import Counter
packets = 0
protocolDist = {}
f = open('bob.pcap')
#f = open('trace1.pcap')
pcap = dpkt.pcap.Reader(f)
print "Maj Version: " , dpkt.pcap.PCAP_VERSION_MAJOR
print "Min Version: " , dpkt.pcap.PCAP_VERSION_MINOR
print "Link Layer " , pcap.datalink()
print "Snap Len: " , pcap.snaplen
# How many packets does the trace contain? Count timestamps
# iterate through packets, we get a timestamp (ts) and packet data buffer (buf)
for ts, buf in pcap:
    packets += 1
    eth = dpkt.ethernet.Ethernet(buf)
    ip = eth.data
    # what is the timestamp of the first packet in the trace?
    if packets == 1:
        first = ts
        print "The first timestamp is %f " % (first)
        print ip.get_proto
        break
# What is the average packet rate? (packets/second)
# The last time stamp
last = ts
print "The last timestamp is %f " % (ts)
print "The total time is %f " % (last - first)
print "There are %d " % (packets)
#print "The packets/second %f " % (packets/(last-first))
# what is the protocol distribution?
# use dictionary
f.close()
sys.exit(0)

Check ip.p
It returns the IP protocol number; for example, UDP is 17.
Cheers

If you want to translate the IP protocol number, you can use
ip.get_proto(ip.p)
This helper function maps protocol numbers to dpkt protocol classes. Check out https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml for the official list of IP protocol numbers. Sometimes it's useful to get a human-readable representation; I find __name__ handy for getting the string.
proto = ip.get_proto(ip.p).__name__
print(proto)
>>> 'TCP'
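Putting the two answers together, here is a minimal sketch of the protocol-distribution dictionary the question asks for. The file name bob.pcap and the use of collections.Counter come from the question; everything else is an illustrative assumption, not the asker's final code.
import dpkt
from collections import Counter

protocol_dist = Counter()

with open('bob.pcap', 'rb') as f:            # binary mode so dpkt gets raw bytes
    pcap = dpkt.pcap.Reader(f)
    for ts, buf in pcap:
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data
        if not isinstance(ip, dpkt.ip.IP):   # only count IPv4 packets (skips ARP, IPv6, etc.)
            continue
        protocol_dist[ip.get_proto(ip.p).__name__] += 1

print(protocol_dist)                         # e.g. Counter({'TCP': 1200, 'UDP': 300})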

Related

Why is pyperclip not copying result of phone numbers to clipboard

I'm a beginner learning Python with Automate the Boring Stuff by Al Sweigart.
I'm currently on the part where he creates a program that uses regular expressions to extract emails and phone numbers from a document and paste them into another document.
Below is the script:
#! python3
import re
import pyperclip

# Create a regex for phone numbers
phoneRegex = re.compile(r'''
# 08108989212
(\d{11})    # Full phone number
''', re.VERBOSE)

# Create a regex for email addresses
emailRegex = re.compile(r'''
# some.+_thing#(\d{2,5}))?.com
[a-zA-Z0-9_.+] +    # name part
# # #symbol
[a-zA-Z0-9_.+] +    # domain name part
''', re.VERBOSE)

# Get the text off the clipboard
text = pyperclip.paste()

# TODO: Extract the email/phone from this text
extractedPhone = phoneRegex.findall(text)
extractedEmail = emailRegex.findall(text)

allPhoneNumbers = []
for allPhoneNumber in extractedPhone:
    allPhoneNumbers.append(allPhoneNumber[0])

print(extractedPhone)
print(extractedEmail)

# Copy the extracted email/phone to the clipboard
results = '\n'.join(allPhoneNumbers) + '\n' + '\n'.join(extractedEmail)
pyperclip.copy(results)
The script is expected to extract and print both phone numbers and email addresses to the terminal, which it does. It is also expected to copy the extracted phone numbers and email addresses to the clipboard automatically, so they can be pasted into another text editor or Word document.
Now the problem is that it copies only the email addresses, and the phone numbers come out as 0 when pasted.
What am I not getting right?
Please pardon the errors in my English.
Have a look at the phonenumbers library (PyPI, source):
Python version of Google's common library for parsing, formatting,
storing and validating international phone numbers.
I think you will need to use this library to format those phone numbers.
To be more specific, you'll need to install the package using:
pip install phonenumbers
The main object that the library deals with is a PhoneNumber object. You can create this from a string representing a phone number using the parse function, but you also need to specify the country that the phone number is being dialled from (unless the number is in E.164 format, which is globally unique).
>>> import phonenumbers
>>> x = phonenumbers.parse("+442083661177", None)
>>> print(x)
Country Code: 44 National Number: 2083661177 Leading Zero: False
>>> type(x)
<class 'phonenumbers.phonenumber.PhoneNumber'>
>>> y = phonenumbers.parse("020 8366 1177", "GB")
>>> print(y)
Country Code: 44 National Number: 2083661177 Leading Zero: False
>>> x == y
True
>>> z = phonenumbers.parse("00 1 650 253 2222", "GB")  # as dialled from GB, not a GB number
>>> print(z)
Country Code: 1 National Number: 6502532222 Leading Zero(s): False
More information can be found here: https://pypi.org/project/phonenumbers/
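As a small follow-on sketch, once a number is parsed it can be re-formatted; format_number and PhoneNumberFormat are part of the phonenumbers API, and the sample number is the one from the session above.
import phonenumbers
from phonenumbers import PhoneNumberFormat

n = phonenumbers.parse("+442083661177", None)
print(phonenumbers.format_number(n, PhoneNumberFormat.NATIONAL))       # 020 8366 1177
print(phonenumbers.format_number(n, PhoneNumberFormat.INTERNATIONAL))  # +44 20 8366 1177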
The problem is that you don't need this part of your code:
allPhoneNumbers = []
for allPhoneNumber in extractedPhone:
    allPhoneNumbers.append(allPhoneNumber[0])
All it does is build a list of the first character (always '0') of each extracted phone number.
Then change the result as follows:
results = '\n'.join(extractedPhone) + '\n' + '\n'.join(extractedEmail)
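For illustration, here is a quick sketch (using a made-up sample string) of why the original loop only kept '0': with a single capturing group, findall() returns plain strings, so indexing a match yields its first character.
import re

phoneRegex = re.compile(r'(\d{11})')
extractedPhone = phoneRegex.findall("Call 08108989212 or 08012345678")

print(extractedPhone)             # ['08108989212', '08012345678']
print(extractedPhone[0][0])       # '0'  <- what the original loop collected
print('\n'.join(extractedPhone))  # joining the full strings works directly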

How do I manipulate the path name so it doesn't print out the entire name

I'm new to programming. I need to index three separate txt files and do a search based on an input. When I do a print it gives me the entire path name; I would like to print just the txt file name.
I've tried using os.list in the function.
import os
import time
import string
import os.path
import sys
word_occurrences = {}

def index_text_file(txt_filename, ind_filename, delimiter_chars=",.;:!?"):
    try:
        txt_fil = open(txt_filename, "r")
        fileString = txt_fil.read()
        for word in fileString.split():
            if word in word_occurrences:
                word_occurrences[word] += 1
            else:
                word_occurrences[word] = 1
        word_keys = word_occurrences.keys()
        print("{} unique words found in".format(len(word_keys)), txt_filename)
        word_keys = word_occurrences.keys()
        sorted(word_keys)
    except IOError as ioe:  # if the file can't be opened
        sys.stderr.write("Caught IOError:" + repr(ioe) + "/n")
        sys.exit(1)

index_text_file("/Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.txt",
                "/Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.idx")
SyntaxError: invalid syntax
(base) 8c85908188d1:CODE z007881$ python3 indexed.py
9395 unique words found in /Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.txt
I would like it to say: 9395 unique words found in book3.txt
One way to do it would be to split the path on the directory separator / and pick the last element:
file_name = txt_filename.split("/")[-1]
# ...
# Then:
print("{} unique words found in".format(len(word_keys)), file_name)
# I would prefer an f-string, unless your Python version is too old:
print(f"{len(word_keys)} unique words found in {file_name}")
I strongly advise renaming txt_filename to something less misleading, such as txt_filepath, since it does not contain just a file name but a whole path (including, but not limited to, the file name).
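As an alternative sketch, os.path.basename from the standard library does the same job and handles the platform's separator for you; the path below is the one from the question.
import os

txt_filepath = "/Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.txt"
file_name = os.path.basename(txt_filepath)
print(file_name)  # book3.txt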

How to write console output on text file

I am new to programming and I've searched the web for the answer to this question and tried many possibilities without success. I have managed to connect a potentiometer to my Raspberry Pi and get values on the console, but I don't know how to save these values to a text file. This is my code:
#!/usr/bin/python
import spidev
import time
#Define Variables
delay = 0.5
ldr_channel = 0
#Create SPI
spi = spidev.SpiDev()
spi.open(0, 0)
def readadc(adcnum):
    # read SPI data from the MCP3008, 8 channels in total
    if adcnum > 7 or adcnum < 0:
        return -1
    r = spi.xfer2([1, 8 + adcnum << 4, 0])
    data = ((r[1] & 3) << 8) + r[2]
    return data

while True:
    ldr_value = readadc(ldr_channel)
    print('---------------------------------------')
    print("LDR Value: %d" % ldr_value)
    time.sleep(delay)
    file = open('data.txt', 'w')
    file.write("LDR Value: %d" % ldr_value)
    file.close()
As you can see from the code, I only get the last value into data.txt, not all the values over time. Thank you very much in advance, and I am sorry for my "noobness".
When you run a script in the terminal, you can redirect its output to a file like this:
$ python script.py > /the/path/to/your/file
Within Python, you can instead set sys.stdout to a file object; then all prints are redirected to /the/path/to/your/file:
import sys
sys.stdout = open('/the/path/to/your/file', 'w')
And do not forget to close the file at the end of your script ;)
sys.stdout.close()
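Here is a self-contained sketch of that idea. The file name data.txt comes from the question, but the short counting loop below is just a stand-in for the real sensor loop.
import sys
import time

sys.stdout = open('data.txt', 'w')   # every print() below now lands in data.txt

value = 0
while value < 5:                     # short loop for the sketch instead of `while True`
    print("LDR Value: %d" % value)
    sys.stdout.flush()               # push buffered output to disk on every pass
    time.sleep(0.5)
    value += 1

sys.stdout.close()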

How to remove/reduce noise from .wav audio file in python

I have a .wav audio file and I'm working on converting the audio to text. I need to reduce/remove noise to get a more accurate result.
Please let me know how to go about it.
import wave
import sys
import binascii
ip = wave.open('C:\\Users\\anagha\\Documents\\Python Scripts\\a1.wav', 'r')
op = wave.open('C:\\Users\\anagha\\Documents\\Python Scripts\\r_1.wav', 'w')
op.setparams(ip.getparams())
for i in range(ip.getnframes()):
    iframes = ip.readframes(1)
    amp = int(binascii.hexlify(iframes))
    if amp > 32767:
        amp = 65535 - int(binascii.hexlify(iframes))  # -ve
        print(amp)
    else:
        amp = int(binascii.hexlify(iframes))  # +ve
        print(amp)
    if amp < 2000:
        # make it zero
        final_frame = '\x00\x00'
    else:
        # Keep the frame
        final_frame = iframe
    op.writeframes(final_frame)

op.close()
ip.close()
I am getting this error:
ValueError: invalid literal for int() with base 10: b'ffff'
You're trying to convert something that is not a base-10 number into an int.
Perhaps you meant:
amp = int(len(binascii.hexlify(iframes)))
When converting a hex string without a leading '0x' to an int, specify the base explicitly, like this:
int(binascii.hexlify(iframes), 16)
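A small self-contained sketch of that fix, using a sample two-byte frame rather than real audio data: hexlify() produces hex digits, so int() needs base 16.
import binascii

iframes = b'\xff\xff'                 # sample two-byte audio frame
hex_str = binascii.hexlify(iframes)   # b'ffff'
amp = int(hex_str, 16)                # 65535; int(hex_str) alone raises ValueError
print(amp)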

Need to skip line containing "Value Error"

I'm trying to extract some legacy data from a Teradata server, but some of the records contain weird characters that don't register in Python, such as "U+ffffffc2".
Currently,
- I'm using pyodbc to extract the data from Teradata
- I place the results into a numpy array (because when I put them directly into pandas, it interprets all of the columns as a single column of type string)
- Then I turn the numpy array into a pandas dataframe to change things like Decimal("09809") and Date("2015,11,14") into [09809,"11,14,2015"]
- Then I try to write it to a file, which is where this error occurs:
ValueError: character U+ffffffc2 is not in range [U+0000; U+10ffff]
I don't have access to edit this data, so from a client perspective what can I do to skip or, preferably, remove the character before writing to the file and hitting the error?
Currently, I have a try/except block to skip queries with erroneous data, but I have to query the data in row chunks of at least 100, so if I just skip a chunk I lose 100 or more lines at a time. As I mentioned before, however, I would prefer to keep the line and just remove the character.
Here's my code. (Feel free to point out any bad practices as well!)
#Python 3.4
#Python Teradata Extraction
#Created 01/28/16 by Maz Baig

#dependencies
import pyodbc
import numpy as np
import pandas as pd
import sys
import os
import psutil
from datetime import datetime

#create a global variable for start time
start_time = datetime.now()
#create global process variable to keep track of memory usage
process = psutil.Process(os.getpid())

def ResultIter(curs, arraysize):
    #Get the specified number of rows at a time
    while True:
        results = curs.fetchmany(arraysize)
        if not results:
            break
        #for result in results:
        yield results

def WriteResult(curs, file_path, full_count):
    rate = 100
    rows_extracted = 0
    for result in ResultIter(curs, rate):
        table_matrix = np.array(result)
        #Get shape to make sure its not a 1d matrix
        rows, length = table_matrix.shape
        #if it is a 1D matrix, add a row of nothing to make sure pandas doesn't throw an error
        if rows < 2:
            dummyrow = np.zeros((1, length))
            dummyrow[:] = None
        df = pd.DataFrame(table_matrix)
        #give the user a status update
        rows_extracted = rows + rows_extracted
        StatusUpdate(rows_extracted, full_count)
        with open(file_path, 'a') as f:
            try:
                df.to_csv(file_path, sep='\u0001', encoding='latin-1', header=False, index=False)
            except ValueError:
                #pass afterwards
                print("This record was giving you issues")
                print(table_matrix)
                pass
    print('\n')
    if (rows_extracted < full_count):
        print("All of the records were not extracted")
        #print the run duration
        print("Duration: " + str(datetime.now() - start_time))
        sys.exit(3)
    f.close()

def StatusUpdate(rows_ex, full_count):
    print(" ::Rows Extracted:" + str(rows_ex) + " of " + str(full_count) + " | Memory Usage: " + str(process.memory_info().rss/78

def main(args):
    #get Username and Password
    usr = args[1]
    pwd = args[2]
    #Define Table
    view_name = args[3]
    table_name = args[4]
    run_date = args[5]
    #get the select statement as an input
    select_statement = args[6]
    if select_statement == '':
        select_statement = '*'
    #create the output filename from tablename and run date
    file_name = run_date + "_" + table_name + "_hist.dat"
    file_path = "/prod/data/cohl/rfnry/cohl_mort_loan_perfnc/temp/" + file_name
    if (not os.path.exists(file_path)):
        #create connection
        print("Logging In")
        con_str = 'DRIVER={Teradata};DBCNAME=oneview;UID=' + usr + ';PWD=' + pwd + ';QUIETMODE=YES;'
        conn = pyodbc.connect(con_str)
        print("Logged In")
        #Get number of records in the file
        count_query = 'select count (*) from ' + view_name + '.' + table_name
        count_curs = conn.cursor()
        count_curs.execute(count_query)
        full_count = count_curs.fetchone()[0]
        #Generate query to retrieve all of the table data
        query = 'select ' + select_statement + ' from ' + view_name + '.' + table_name
        #create cursor
        curs = conn.cursor()
        #execute query
        curs.execute(query)
        #save contents of the query into a matrix
        print("Writing Result Into File Now")
        WriteResult(curs, file_path, full_count)
        print("Table: " + table_name + " was successfully extracted")
        #print the script's run duration
        print("Duration: " + str(datetime.now() - start_time))
        sys.exit(0)
    else:
        print("AlreadyThere Exception\nThe file already exists at " + file_path + ". Please remove it before continuing\n")
        #print the script's run duration
        print("Duration: " + str(datetime.now() - start_time))
        sys.exit(2)

main(sys.argv)
Thanks,
Maz
If it is only these out-of-range code points that are causing the error, this may help.
One solution is to register a custom error handler using codecs.register_error, which filters out the offending positions, and then just try to decode:
import codecs

def error_handler(error):
    return '', error.end + 6

codecs.register_error('nonunicode', error_handler)

b'abc\xffffffc2def'.decode(errors='nonunicode')
# gives you 'abcdef', which is exactly what you want
You may further improve your handler to catch more complicated errors; see https://docs.python.org/3/library/exceptions.html#UnicodeError and https://docs.python.org/3/library/codecs.html#codecs.register_error for details.
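For reference, here is a self-contained sketch of the same register_error technique with bytes that can actually be constructed; the handler name and sample bytes are made up for the example, and the handler simply drops whatever span failed to decode.
import codecs

def drop_undecodable(error):
    # insert nothing and resume decoding right after the failed span
    return '', error.end

codecs.register_error('dropbad', drop_undecodable)

raw = b'abc\xc2def\xff!'                      # bytes with two invalid UTF-8 sequences
print(raw.decode('utf-8', errors='dropbad'))  # abcdef!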
