Return regular expression only one time - python-3.x

I am trying to have a regular expression print each result only one time. Are there any suggestions? I want the code to read the entire text file, but many of the dates are the same, and I want the code to return each date one time only.
code:
import re
filename = set(open('wireshark.txt', 'r'))
pattern_object = re.compile(r'(\d\d\d\d-\d\d-\d\d)')
for line in filename:
    match_object = pattern_object.search(line)
    if match_object:
        regex = match_object.group(1)
        print(regex)
Text file:
No. Time Source Destination Protocol Length Info
2 2021-02-12 13:33:12.206424 192.168.1.151 172.217.10.46 QUIC 1392 Initial, DCID=e4267bae554f387d, PKN: 1, CRYPTO, PADDING
Frame 2: 1392 bytes on wire (11136 bits), 1392 bytes captured (11136 bits) on interface \Device\NPF_{28AA034F-AC94-4D4A-9CA9-9AEA5D0EF2C1}, id 0
Ethernet II, Src: Micro-St_0e:cd:34 (00:d8:61:0e:cd:34), Dst: Verizon_fb:8b:82 (20:c0:47:fb:8b:82)
Internet Protocol Version 4, Src: 192.168.1.151, Dst: 172.217.10.46
User Datagram Protocol, Src Port: 57189, Dst Port: 443
QUIC IETF
No. Time Source Destination Protocol Length Info
3 2021-02-12 13:33:12.225610 172.217.10.46 192.168.1.151 QUIC 1392 Initial, SCID=e4267bae554f387d, PKN: 1, ACK, CRYPTO, PADDING
Frame 3: 1392 bytes on wire (11136 bits), 1392 bytes captured (11136 bits) on interface \Device\NPF_{28AA034F-AC94-4D4A-9CA9-9AEA5D0EF2C1}, id 0
Ethernet II, Src: Verizon_fb:8b:82 (20:c0:47:fb:8b:82), Dst: Micro-St_0e:cd:34 (00:d8:61:0e:cd:34)
Internet Protocol Version 4, Src: 172.217.10.46, Dst: 192.168.1.151
User Datagram Protocol, Src Port: 443, Dst Port: 57189
QUIC IETF
No. Time Source Destination Protocol Length Info
4 2021-02-12 13:33:12.225989 192.168.1.151 172.217.10.46 TLSv1.2 146 Application Data
No. Time Source Destination Protocol Length Info
4 2021-04-12 13:33:12.225989 192.168.1.151 172.217.10.46 TLSv1.2 146 Application Data
No. Time Source Destination Protocol Length Info
4 2021-06-12 13:33:12.225989 192.168.1.151 172.217.10.46 TLSv1.2 146 Application Data
No. Time Source Destination Protocol Length Info
4 2021-06-12 13:33:12.225989 192.168.1.151 172.217.10.46 TLSv1.2 146 Application Data
Code execute output:
2021-02-12
2021-02-12
2021-02-12
2021-02-12
2021-02-12
2021-02-12
2021-04-12
2021-06-12
2021-06-12
Desired code execute output:
2021-02-12
2021-04-12
2021-06-12

Here's a minimal example for how to get all the unique dates in the file.
Essentially, it's a 4 stage process:
Store the pattern to search for as a string
Open the file and get all the text
Use re.findall() to get all of the text matching the pattern
Use set() to keep only the unique matches
import re
# Make the pattern (a raw string, so the backslashes reach the regex engine unchanged)
pattern = r'(\d\d\d\d-\d\d-\d\d)'
# Open the file and read all the text into a variable
with open('wireshark.txt') as file:
    text = file.read()
# Search the text for anything matching the pattern
matches = re.findall(pattern, text)
# Print the unique matches
print(set(matches))
The key thing here is the combination of re.findall() (to search for multiple matches at once) and set() (to get rid of duplicates).
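One thing to keep in mind is that a set has no defined order, while the desired output in the question lists the dates in the order they appear. A minimal sketch of an order-preserving variant follows, assuming Python 3.7+ (where dict keys keep insertion order). The helper name unique_dates and the sample text are made up for illustration; the sample stands in for wireshark.txt.

```python
import re

# Same date shape as in the question: YYYY-MM-DD
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def unique_dates(text):
    """Return each matched date once, in order of first appearance."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(DATE_PATTERN.findall(text)))


# Hypothetical excerpt standing in for wireshark.txt
sample = """\
2 2021-02-12 13:33:12.206424 192.168.1.151 172.217.10.46 QUIC 1392
3 2021-02-12 13:33:12.225610 172.217.10.46 192.168.1.151 QUIC 1392
4 2021-04-12 13:33:12.225989 192.168.1.151 172.217.10.46 TLSv1.2 146
4 2021-06-12 13:33:12.225989 192.168.1.151 172.217.10.46 TLSv1.2 146
4 2021-06-12 13:33:12.225989 192.168.1.151 172.217.10.46 TLSv1.2 146
"""

# Print one date per line, as in the desired output
for date in unique_dates(sample):
    print(date)
```

To run it on the real capture, the sample string would be replaced by the text read from the file.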

Related

ACK packets forged issues: "This frame is a (suspected) retransmission"

I'm playing with scapy. I'm trying to forge JUST PSH/ACK and ACK packets in sequence
I coded two tools: A which sends PSH/ACK packets and then sniffs the resulting ACK, writing the sequence in a file to use it later
.....
bitack = random.randrange(1,656787969)
bitseq = random.randrange(1,4294967295)
if os.path.exists('test.txt'):
    with open('test.txt','r') as f:
        bitseq = int(f.read())
else:
    with open('test.txt','w') as f:
        f.write(str(bitseq))
.....
text = "Ok"
TSval = int(time.time())
TSecr = TSval
acker = IP(src="127.0.0.1",dst="127.0.0.1")/TCP(sport=88,dport=8888,
    flags="PA", seq=bitseq, ack=bitack, options=[('Timestamp', (TSval, TSecr))])/text
send(acker)
.....
rx = sniff(filter="host 127.0.0.1 and src port 8888", iface="lo", count=1)
seqcc = rx[0].getlayer(TCP).seq
ackcc = rx[0].getlayer(TCP).ack
with open('test.txt','w') as f:
    f.write(str(ackcc))
print("SEQFINALE=", ackcc)
B, which sends ACK packets AFTER it sniffs a PSH/ACK packet from A. I know the ACK packets contain text (in this example the same as A's), but this is what I want:
....
rx = sniff(filter="host 127.0.0.1 and dst port 8888", iface="lo", count=1)
seqcc = rx[0].getlayer(TCP).seq
print("seq:", seqcc)
ackcc = rx[0].getlayer(TCP).ack
print("ack:", ackcc)
var = rx[0][Raw].load.decode(encoding='utf-8', errors='ignore')
acker = IP(src="127.0.0.1",dst="127.0.0.1")/TCP(sport=8888,dport=88, flags="A",
seq=ackcc, ack=seqcc + int(len(var)), options=[('Timestamp', (TSval, TSecr))])/var
send(acker)
.....
Everything works fine except that Wireshark gives a warning and I don't understand why:
"Expert Info (Note/Sequence): This frame is a (suspected) retransmission"
The first two packets are perfect:
Is there any issue in how I handle the sequence number/ ack number?
This makes me crazy
It is a retransmission. Your capture shows a frame from 8888 to 88 at seq=1 with 52 bytes of data (len=52). If you ever send another frame from 8888 to 88 at seq=1, it's a retransmission. Each direction of a TCP stream has its own sequence numbers: A sends to B, and B ACKs what A sent. (In this case, there should be an ACK=53 in a frame from 88 to 8888, either alone or piggybacking data.)
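The arithmetic behind that ACK=53 can be sketched in plain Python. This is only an illustration of the bookkeeping: the names next_seq and DirectionTracker are hypothetical, Wireshark's own analysis uses more heuristics than this, and sequence-number wraparound is ignored in the comparison.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit


def next_seq(seq, payload_len):
    """Sequence number that follows a segment carrying payload_len bytes."""
    return (seq + payload_len) % SEQ_MOD


class DirectionTracker:
    """Tracks one direction of a connection, e.g. port 8888 -> port 88."""

    def __init__(self):
        self.next_expected = None  # highest "next sequence number" seen so far

    def observe(self, seq, payload_len):
        """Return True if a data segment re-uses already sent sequence space."""
        if payload_len == 0:
            return False  # pure ACKs carry no data to retransmit
        # Simplification: plain comparison, no handling of wraparound
        suspected = self.next_expected is not None and seq < self.next_expected
        end = next_seq(seq, payload_len)
        if self.next_expected is None or end > self.next_expected:
            self.next_expected = end
        return suspected


tracker = DirectionTracker()
print(tracker.observe(1, 52))   # first 52 bytes at seq=1
print(tracker.observe(1, 52))   # seq=1 again: same bytes resent
print(tracker.observe(53, 52))  # continues where the first segment ended
print(next_seq(1, 52))          # the ACK number the peer should send back
```

In the setup from the question, this suggests that each tool should keep advancing its own seq by the number of payload bytes it has sent, and use the peer's seq plus the received payload length as its ack.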

Python - parser over multiline text

My goal is to create a text parser for files containing multiline data:
Applying option loglevel (set logging level) with argument debug.
Successfully parsed a group of options.
Parsing a group of options: input url http://prod7.team.cn/test/tracks-v1a1/mono.
Successfully parsed a group of options.
Opening an input file: http://prod7.team.cn/test/tracks-v1a1/mono
[NULL # 000001e002039000] Opening 'http://prod7.team.cn/test/tracks-v1a1/mono' for reading
[http # 000001e00203a040] Setting default whitelist 'http,https,tls,rtp,tcp,udp,crypto,httpproxy'
[tcp # 000001e00203ba80] Original list of addresses:
[tcp # 000001e00203ba80] Address 92.223.97.22 port 80
[tcp # 000001e00203ba80] Interleaved list of addresses:
[tcp # 000001e00203ba80] Address 92.223.97.22 port 80
[tcp # 000001e00203ba80] Starting connection attempt to 92.223.97.22 port 80
[tcp # 000001e00203ba80] Successfully connected to 92.223.97.22 port 80
[http # 000001e00203a040] request: GET /test/tracks-v1a1/mono HTTP/1.1
User-Agent: Lavf/58.31.101
Accept: */*
Range: bytes=0-
Connection: close
Host: prod7.team.cn
Icy-MetaData: 1
Each file contains multiple sets of such information.
My target is to find every "Successfully connected" IP address, followed by the Host detail, up to the LF.
In the case mentioned a valid match should be
IP 92.223.97.22 HOST prod7.team.cn
I can easily find the IP using a regex, but I don't understand how to create a valid match, skipping further lines till "host".
UPDATE
If I use this Regex
(connected to).([0-9].(?:\.[0-9]+){3}.port.*.*)
I find:
Match 1
Full match connected to 92.223.97.22 port 80
Group 1. connected to
Group 2. 92.223.97.22 port 80
I'm receiving an error if I add .* or .host.* at the end. I'm confused about how to add another pattern to detect 'Host:' and match until the end of the row.
https://docs.python.org/3.7/library/re.html#re.MULTILINE
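You want to run your regex with the re.DOTALL flag, which lets . match line breaks, so that something like .* can capture the lines in between. The re.MULTILINE flag linked above is a useful companion: it makes ^ and $ match at the start and end of every line.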
A caveat: make sure you don't run into the start of a new match. For example, CA.*B would match CAB, CACB and CACAB. So you will most likely want to check explicitly in your regex that the .* does not overrun the beginning of another valid match.
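A small sketch of both points, using the toy strings from the caveat. In Python's re module it is the re.DOTALL flag that lets . cross a line break, and a negative lookahead is one possible way (an assumption here, not the only option) to stop the in-between part from swallowing the start of another match.

```python
import re

# Without re.DOTALL, "." stops at the newline, so nothing matches
print(re.search(r"CA.*B", "CA\nB"))
# With re.DOTALL the same pattern spans the line break
print(re.search(r"CA.*B", "CA\nB", re.DOTALL).group())

text = "CACAB"
# Greedy and lazy both start at the first "CA" and run over the second one
greedy = re.search(r"CA.*B", text).group()
lazy = re.search(r"CA.*?B", text).group()
# The lookahead refuses to step onto a new "CA", so only the inner match is found
guarded = re.search(r"CA(?:(?!CA).)*B", text).group()
print(greedy, lazy, guarded)
```

The guarded form is sometimes called a tempered dot: each character is consumed only if a new start marker does not begin at that position.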
I was able to sort it out using nested regexes:
import re
from functools import reduce
ip_list = []
regex = r'connected(.*?)Host[^\n]+$'
with open('C:\\temp\\log.txt', 'r') as log_file:
    text_as_string = log_file.read()
# Each block runs from "connected" to the end of the next "Host" line
matches = re.finditer(regex, text_as_string, re.DOTALL | re.MULTILINE)
ip = re.compile(r'(connected to).[0-9]+(?:\.[0-9]+){3}.port.*')
host = re.compile(r'Host[^\n]+$')
for match in matches:
    block = match.group()
    # Start empty so a block without a match falls back to 'NA'
    f_id = f_host = ''
    # connected IP
    for ip_match in ip.finditer(block):
        f_id = ip_match.group()
    # connected host
    for host_match in host.finditer(block):
        f_host = host_match.group()
    if f_id == '':
        f_id = 'NA'
    if f_host == '':
        f_host = 'NA'
    ip_list.append([f_id, f_host])
# Drop duplicate pairs while keeping first-seen order
unique_ip = reduce(lambda l, x: l if x in l else l + [x], ip_list, [])
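For comparison, the same pairing can probably be done with a single pattern that captures the IP and the host directly. This is a sketch under assumptions: the log lines look like the excerpt in the question, the names PAIR_PATTERN and connected_hosts are made up, and the sample string is a shortened copy of that excerpt.

```python
import re

PAIR_PATTERN = re.compile(
    # Group 1: the IPv4 address after "Successfully connected to"
    r"Successfully connected to (\d{1,3}(?:\.\d{1,3}){3}) port \d+"
    # Skip ahead lazily, but never across the start of another connection
    r"(?:(?!Successfully connected to).)*?"
    # Group 2: the value of the Host header at the start of a line
    r"^Host: (\S+)",
    re.DOTALL | re.MULTILINE,
)


def connected_hosts(log_text):
    """Return unique (ip, host) pairs in order of first appearance."""
    return list(dict.fromkeys(PAIR_PATTERN.findall(log_text)))


# Shortened copy of the log excerpt from the question
sample = """\
[tcp # 000001e00203ba80] Starting connection attempt to 92.223.97.22 port 80
[tcp # 000001e00203ba80] Successfully connected to 92.223.97.22 port 80
[http # 000001e00203a040] request: GET /test/tracks-v1a1/mono HTTP/1.1
User-Agent: Lavf/58.31.101
Accept: */*
Range: bytes=0-
Connection: close
Host: prod7.team.cn
Icy-MetaData: 1
"""

for ip, host in connected_hosts(sample):
    print("IP", ip, "HOST", host)
```

Here re.DOTALL lets the skipped part span several lines, while re.MULTILINE lets ^ anchor Host: at the beginning of its own line.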

Cannot read tls section even after calling load_layer('tls') in scapy

This question explains how to read the TLS section of a packet using scapy.
However, my program is not able to read it. All it returns is a bunch of hexadecimal characters
>>> from scapy.all import *
>>> load_layer('tls')
>>> cap = rdpcap('tls.pcap')
>>> p1=cap[0]
>>> p1
<Ether dst=14:cc:20:51:33:ea src=f4:f2:6d:93:51:f1 type=0x800 |<IP version=4 ihl=5 tos=0x0 len=146 id=62970 flags=DF frag=0 ttl=64 proto=tcp chksum=0x50a0 src=192.168.1.143 dst=54.254.250.149 |<TCP sport=49335 dport=50443 seq=549695462 ack=200962336 dataofs=5 reserved=0 flags=PA window=4380 chksum=0xb0ac urgptr=0 |<Raw load="\x17\x03\x01\x00 \xf2\x10\xfd\x95N'\xf2\xaf\x99tp\x93\xbc\xe9\x81w\x91\x1b\xe0\xc9M:\x9a!]\xb0!\xae\xd2\x86\xb0>\x17\x03\x01\x00#d>\x0b\xee\xf0\xab\xded\x02E)\x0e0\xbb\xe6\x82uU\xb22\x87\xd6\xe4n[\x1d\x18\xe8\xd6\x1c\x00N_C\xe6\xdd\xbe\x89#6p\xd9\xaf\x19\xb3s\x07H\xdeF\x88\xdar\x0f\x8a\n!4\xeb\xd3F\xefgH" |>>>>
I want to get the tls record version, tls record length and the tls record content type.
This is a screenshot of the packet opened in Wireshark.
Can someone please show me what I am doing wrong and how to read the TLS content properly?
I am using Python3.6, and thus am not able to use stable scapy-ssl_tls, which is currently limited to Python 2.
You are so close. You just need to use TLS(pkt.load).
Download a TLS Capture
For this example, use this tls capture from Wireshark's Bugzilla.
We can see that packet 4 is the TLS Client Hello:
tshark -r DNS-over-TLS.pcapng -Y "frame.number==4"
4 0.122267 133.93.28.45 → li280-151.members.linode.com TLSv1 384 Client
Hello 00:00:5e:00:01:18 ← 48:d7:05:df:86:0b
Load with Scapy
Make sure that you have the cryptography library installed, as it's required for loading TLS captures.
>>> import cryptography
>>> # No errors
Reproducing what you have so far with this capture:
>>> from scapy.all import *
>>> load_layer('tls')
>>> cap = rdpcap('DNS-over-TLS.pcapng')
>>> tls_client_hello=cap[3] # Wireshark numbers packets starting at 1, scapy at 0
>>> tls_client_hello
<Ether dst=14:cc:20:51:33:ea src=f4:f2:6d:93:51:f1 type=0x800 |<IP version=4
ihl=5 tos=0x0 len=146 id=62970 flags=DF frag=0 ttl=64 proto=tcp chksum=0x50a0
src=192.168.1.143 dst=54.254.250.149 |<TCP sport=49335 dport=50443 seq=549695462
ack=200962336 dataofs=5 reserved=0 flags=PA window=4380 chksum=0xb0ac urgptr=0 |
<Raw load="\x17\x03\x01\x00
\xf2\x10\xfd\x95N'\xf2\xaf\x99tp\x93\xbc\xe9\x81w\x91\x1b\xe0\xc9M:\x9a!]\xb0!\xa
e\xd2\x86\xb0>\x17\x03\x01\x00#d>\x0b\xee\xf0\xab\xded\x02E)\x0e0\xbb\xe6\x82uU\x
b22\x87\xd6\xe4n[\x1d\x18\xe8\xd6\x1c\x00N_C\xe6\xdd\xbe\x89#6p\xd9\xaf\x19\xb3s\
x07H\xdeF\x88\xdar\x0f\x8a\n!4\xeb\xd3F\xefgH" |>>>>
Note that the part that we want to view is called Raw load. To access this part of the packet, you use tls_client_hello.load. Keep in mind that TLS will take a bytes object that contains the data, but not an entire packet.
>>> TLS(tls_client_hello.load)
<TLS type=handshake version=TLS 1.0 len=313 iv=b'' msg=[<TLSClientHello
msgtype=client_hello msglen=309 version=TLS 1.2 gmt_unix_time=Tue, 18 May 2077
23:20:52 +0000 (3388605652)
random_bytes=d6d533aca04dca42db8b123b0a143dcd580079147122e4de095c15cf sidlen=0
sid='' cipherslen=182 ciphers=[TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
< TLS output truncated ... >
Further Reading
I highly recommend looking at the Scapy TLS Notebooks, which do a good job of documenting scapy+TLS usage.
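As a cross-check that needs no Scapy at all, the three values asked about in the question (record content type, version and length) sit in the first five bytes of a TLS record, so they can be unpacked by hand. This is a minimal sketch: parse_record_header is a hypothetical helper name, the lookup tables cover only the common values, and the sample bytes are the start of the Raw load shown in the question.

```python
import struct

# Common TLS record content types and protocol versions (not exhaustive)
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}
VERSIONS = {
    0x0300: "SSL 3.0",
    0x0301: "TLS 1.0",
    0x0302: "TLS 1.1",
    0x0303: "TLS 1.2",
}


def parse_record_header(data):
    """Decode the 5-byte TLS record header at the start of data."""
    if len(data) < 5:
        raise ValueError("need at least 5 bytes for a TLS record header")
    # 1 byte content type, 2 bytes version, 2 bytes length, network byte order
    content_type, version, length = struct.unpack("!BHH", data[:5])
    return {
        "content_type": CONTENT_TYPES.get(content_type, content_type),
        "version": VERSIONS.get(version, hex(version)),
        "length": length,
    }


# First bytes of the Raw load from the question
raw = b"\x17\x03\x01\x00 \xf2\x10\xfd\x95"
header = parse_record_header(raw)
print(header)
```

Judging by the repr printed above, the object returned by TLS(...) appears to expose the same values as its type, version and len fields.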

remove this string ^[[38;1H^[[K^[[7m71%^[[27m^[[38;1H^[[38;1H^[[K

I am trying to remove from a text file the following string as displayed by vim
^[[38;1H^[[K^[[7m71%^[[27m^[[38;1H^[[38;1H^[[K
In this text file I have entries up to 7m1000%,
meaning:
^[[38;1H^[[K^[[7m71%^[[27m^[[38;1H^[[38;1H^[[K
^[[38;1H^[[K^[[7m72%^[[27m^[[38;1H^[[38;1H^[[K
^[[38;1H^[[K^[[7m73%^[[27m^[[38;1H^[[38;1H^[[K ...
^[[38;1H^[[K^[[7m1000%^[[27m^[[38;1H^[[38;1H^[[K
I tried with cat/grep/sed..
I tried with the following script
def Process(data):
    text = data.split()[0]
    #print repr(text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    data.split()[0]= text
    return data
Producing
:python Clo.py
IP: 138.42.153.194->10.132.136.42, protocol 6, [38;1H[K[7m86%[27m[38;1H[38;1H[KTCP: sport 3389, dport 58187, seq 978549389, ack 33554488, flags 0x0018 ( ACK PSH), urgent data 0, Flow fastpath, session 911218, wqe index 487973 packet 0x0x80000000416988e6, Packet info: len 107 port 17 interface 17 vsys 0, Packet from interface 256 forwarded to DP0 for tunnel encap
would it be possible to remove ["'\x1b[38;1H\x1b[K\x1b[7m######%\x1b[27m\x1b[38;1H\x1b[38;1H\x1b[KTCP:] directly from VI?
the solution for me was
:%s/^[.*^[//g
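The same clean-up can also be sketched in Python, assuming the ^[ shown by vim is the ESC byte (\x1b) and that every unwanted piece is a terminal control sequence of the form ESC [ parameters letter. The names strip_pager_noise, PROGRESS and ANY_CSI are made up for this example, and the noisy line is rebuilt from the output shown in the question.

```python
import re

# One control sequence: ESC, "[", optional numeric parameters, final letter
CSI = r"\x1b\[[0-9;?]*[A-Za-z]"
# A progress marker such as "86%" wrapped in control sequences on both sides
PROGRESS = re.compile(rf"(?:{CSI})+\d+%(?:{CSI})+")
ANY_CSI = re.compile(CSI)


def strip_pager_noise(line):
    """Remove progress markers first, then any leftover control sequences."""
    line = PROGRESS.sub("", line)
    return ANY_CSI.sub("", line)


noisy = (
    "protocol 6, "
    "\x1b[38;1H\x1b[K\x1b[7m86%\x1b[27m\x1b[38;1H\x1b[38;1H\x1b[K"
    "TCP: sport 3389"
)
print(strip_pager_noise(noisy))
```

Removing the marker as one unit keeps ordinary percentages elsewhere in the text untouched, since they are not surrounded by escape sequences.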

Understanding the Scapy "Mac address to reach destination not found. Using broadcast." warning

If I generate an Ethernet frame without any upper-layer payload and send it at layer two with sendp(), I receive the "Mac address to reach destination not found. Using broadcast." warning, and the frame put on the wire indeed uses ff:ff:ff:ff:ff:ff as its destination MAC address. Why is this so? Shouldn't Scapy send exactly the frame I constructed?
My crafted packet can be seen below:
>>> ls(x)
dst : DestMACField = '01:00:0c:cc:cc:cc' (None)
src : SourceMACField = '00:11:22:33:44:55' (None)
type : XShortEnumField = 0 (0)
>>> sendp(x, iface="eth0")
WARNING: Mac address to reach destination not found. Using broadcast.
.
Sent 1 packets.
>>>
Most people encountering this issue are incorrectly using send() (or sr(), sr1(), srloop()) instead of sendp() (or srp(), srp1(), srploop()). For the record, the "without-p" functions like send() are for sending layer 3 packets (send(IP())) while the "with-p" variants are for sending layer 2 packets (sendp(Ether() / IP())).
If you define x like I do below and use sendp() (and not send()) and you still have this issue, you should probably try with the latest version from the project's git repository (see https://github.com/secdev/scapy).
I've tried:
>>> x = Ether(src='01:00:0c:cc:cc:cc', dst='00:11:22:33:44:55')
>>> ls(x)
dst : DestMACField = '00:11:22:33:44:55' (None)
src : SourceMACField = '01:00:0c:cc:cc:cc' (None)
type : XShortEnumField = 0 (0)
>>> sendp(x, iface='eth0')
.
Sent 1 packets.
At the same time I was running tcpdump:
# tcpdump -eni eth0 ether host 00:11:22:33:44:55
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
12:33:47.774570 01:00:0c:cc:cc:cc > 00:11:22:33:44:55, 802.3, length 14: [|llc]

Resources