Write pcap file about TCP traffic of a web-crawler - python-3.x

The URL request and sniff(count=x) don't work together. sniff(count=x) waits for x packets, and because I have to put that line before the URL request, it blocks the program: the URL request never starts, so no packet is ever sniffed.
When I opened 2 terminal windows on Ubuntu, it worked. In the first window I started Python in interactive mode and started the sniffer. Then I started the web crawler in the second window, and the sniffer in the first window received the packets correctly and printed them to the screen / wrote them to a pcap file.
The easiest way would be to write 2 scripts and start them from 2 different windows, but I want to do all the work in one script: crawling the web, sniffing the packets, and writing them to a pcap file.
Here is the code that does not work:
class spider():
    …
    def parse(self):
        a = sniff(filter="icmp and host 128.65.210.181", count=1)
        req = urllib.request.urlopen(self.next_url.replace(" ",""))
        a.nsummary()
        charset = req.info().get_content_charset()
Now the first line blocks the program, waiting for a packet to come in, which can never happen because the request is only made on the next line. Swapping the lines doesn't work either. I think the only way to solve the problem is to use parallelism, so I've also tried this:
class protocoller():
    ...
    def run(self):
        self.pkt = sniff(count=5) # and here it blocks
…
prot = protocoller()
Main.thr = threading.Thread(target=prot.run())
Main.thr.start()
I always thought that a thread runs independently of the main program, but it blocks it as if it were part of it. Any suggestions?
So what I would need is a solution in which the web-crawler and the IP/TCP protocoller based on scapy are running independently from each other.
Could the sr()-function of scapy be an alternative?
https://scapy.readthedocs.io/en/latest/usage.html
Is it possible to put the request manually in the packet and to put the received packet into the pcap-file?

Your example doesn't show what's going on in other threads so I assume you've got a second thread to do the request etc. If all that is in order the obvious error is here:
Main.thr = threading.Thread(target=prot.run())
This executes prot.run() immediately, in the calling thread, and passes its return value as the target parameter of Thread. That is why the main program blocks. It should be:
Main.thr = threading.Thread(target=prot.run)
This passes the function itself to Thread, which then calls it in the new thread.
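A minimal sketch of the difference, using only the standard library. The name fake_sniff is a hypothetical stand-in for a blocking call such as sniff(count=5); it is not part of scapy, and the 0.2 second sleep is an arbitrary placeholder for the time spent waiting for packets.

```python
import threading
import time

events = []

def fake_sniff():
    """Hypothetical stand-in for a blocking call such as sniff(count=5)."""
    time.sleep(0.2)
    events.append("sniffer finished")

# Pass the function object (no parentheses) so it runs in the new thread.
# Writing target=fake_sniff() would run it here, in the main thread,
# and hand its return value (None) to Thread.
thr = threading.Thread(target=fake_sniff)
thr.start()

# The main thread continues while the worker is still blocked.
events.append("request sent")

thr.join()
print(events)
```

In the crawler, the request would take the place of the "request sent" line. A real sniffer also needs a moment to start listening before the request goes out, so some synchronization between the two threads may be needed.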

The other answer works great.
FYI, Scapy 2.4.3 also has a native way of doing this:
https://scapy.readthedocs.io/en/latest/usage.html#asynchronous-sniffing


^HV Only returning some values

I'm trying to print some RFID tags and retrieve their TIDs to store them in my system and know which tags have been printed. Right now I'm reading the TID and sending it back to my computer (connected via USB to my ZT421 printer) with the following code:
^RFR,H,0,12,2^FN0^FS^FH_^HV0,24,,_0D_0A,L^FS
^RFW,H,2,12,1^FD17171999ABABABAAAAAAAAAB^FS
This is repeated for each tag that I'm printing. However, when printing 10 tags, I only get 9 TIDs. If after that I try to print 7 tags, I still get 9 TIDs. To be honest I'm a bit lost now, because even trying to use the code examples from the ZPL manual (I've tried the ^RI instruction also) it doesn't seem to work.
The communication with the printer is being done through the Zebra Setup Utilities direct communication tool.
I tried to retrieve each printed tag TID with:
^RFR,H,0,12,2^FN0^FS^FH_^HV0,24,,_0D_0A,L^FS
^RFW,H,2,12,1^FD17171999ABABABAAAAAAAAAB^FS
but I always get 9 TIDs.
I also tried getting the TID with the ZPL manual example for the ^RI command:
^XA
^FO20,120^A0N,60^FN0^FS
^RI0,,5^FS
^HV0,,Tag ID:^FS
^XZ
And I got absolutely nothing returned to the computer, just a message saying "Tag ID:" with no value shown.
I would really appreciate some help with this...
Thanks in advance!
I've fixed the issue, but I'm going to leave the solution here just in case someone else is facing the same problem.
I thought that maybe it wasn't a code issue, but something related to the computer-printer communication. It turned out to be the case. The Zebra Setup Utilities program has a button that says "options". If you click it, a new screen opens where you can configure how many seconds the program waits for the printer's response (in this case over USB). By default it's set to 5; I changed this value to 100, which is the maximum. This meant that instead of retrieving the TIDs of only 6-9 tags per job, I can now do it for about 100.
This is not ideal, because in my case it meant creating 25 files for the 2500 tags whose TIDs I had to print and store, but it's far better than before.
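The splitting into files can be scripted. The following is only a sketch under assumptions: it reuses the two ZPL lines from the question, assumes each label is wrapped in ^XA ... ^XZ (the question does not show the wrapper), and uses made-up EPC values. The helper name build_batches is hypothetical.

```python
def build_batches(epc_values, batch_size=100):
    """Split the EPC values into ZPL jobs of at most batch_size labels each."""
    # Read-TID line copied from the question.
    read_tid = "^RFR,H,0,12,2^FN0^FS^FH_^HV0,24,,_0D_0A,L^FS"
    jobs = []
    for start in range(0, len(epc_values), batch_size):
        labels = []
        for epc in epc_values[start:start + batch_size]:
            # Assumption: one ^XA ... ^XZ label per tag.
            labels.append("^XA\n" + read_tid + "\n^RFW,H,2,12,1^FD" + epc + "^FS\n^XZ")
        jobs.append("\n".join(labels))
    return jobs

# Made-up EPC values: 2500 distinct 24-digit hex strings.
epcs = ["%024X" % n for n in range(2500)]
jobs = build_batches(epcs)
print(len(jobs))
```

Each element of jobs can then be written to its own file and sent to the printer separately, so no single job exceeds what the 100-second wait was observed to handle.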

Detecting when a child process is waiting for stdin

I am making a terminal program that is able to run any executable (please ignore safety concerns). I need to detect when the child process is waiting for the user input (from stdin). I start the child process using:
process = subprocess.Popen(command, close_fds=False, shell=True, **file_descriptors)
I can think of 2 ways of detecting if the child process is waiting for stdin:
Writing a character then backspace and checking if the child has processed those 2 bytes. But here it says that "CMD does support the backspace key". So I need to find a character that when printed to the screen will delete what ever is in the stdin buffer in the command prompt.
The second method is to use the pywin32 library and use the WaitForInputIdle function as described here. I looked at the source code for the subprocess library and found that it uses pywin32 and it keeps a reference to the process handle. So I tried this:
win32event.WaitForInputIdle(proc._handle, 100)
But I got this error:
(1471, 'WaitForInputIdle', 'Unable to finish the requested operation because the specified process is not a GUI process.')
Also, the Windows API documentation here says: "WaitForInputIdle waits only once for a process to become idle; subsequent WaitForInputIdle calls return immediately, whether the process is idle or busy.". I think that means I can't use the function for its purpose more than once, which wouldn't solve my problem.
Edit:
This only needs to work on Windows, but later I might try to make my program compatible with Linux as well. Also, I am using pipes for stdin/stdout/stderr.
Why I need to know if the child is waiting for stdin:
Currently, when the user presses the enter key, I send all of the data, that they have written so far, to stdin and disable the user from changing it. The problem is when the child process is sleeping/calculating and the user writes some input and wants to change it before the process starts reading from stdin again.
Basically, let's take this program:
sleep(10)
input("Enter value:")
and let's say that I enter "abc\n". When using cmd, I can press backspace and delete the input while the child is still sleeping. Currently my program marks all of the text as read-only when it detects the "\n" and sends it to stdin.
from threading import Lock, Thread
from time import sleep
class STDINHandle:
    def __init__(self, read_handle, write_handle):
        self.handled_write = False
        self.working = Lock()
        self.write_handle = write_handle
        self.read_handle = read_handle
    def check_child_reading(self):
        with self.working:
            # Reset the flag
            self.handled_write = True
            # Write a character that cmd will ignore
            self.write_handle.write("\r")
            thread = Thread(target=self.try_read)
            thread.start()
            sleep(0.1)
            # We need to stop the other thread by giving it data to read
            if self.handled_write:
                # Writing only 1 "\r" fails for some reason.
                # For good measure we write 10 "\r"s
                self.write_handle.write("\r"*10)
                return True
            return False
    def try_read(self):
        # Blocks until a byte is available; clears the flag once one is read
        data = self.read_handle.read(1)
        self.handled_write = False
    def write(self, text):
        self.write_handle.write(text)
I did a bit of testing and I think cmd ignores "\r" characters. I couldn't find a case where cmd interprets it as an actual character (unlike what happened with "\b"). The idea is to send a "\r" character and test whether it stays in the pipe. If it stays in the pipe, the child hasn't processed it. If we can't read it back from the pipe, the child has processed it. But there is a problem: if the read blocks, we need to stop it, otherwise it will interfere with the next write to stdin. To do that we write more "\r"s to the pipe.
Note: I might have to change the timing on the sleep(0.1) line.
I am not sure this is a good solution but you can give it a try if interested. I just assumed that we execute the child process for its output given 2 inputs data and TIMEOUT.
process = subprocess.Popen(command, close_fds=False, shell=True, **file_descriptors)
try:
    output, _ = process.communicate(data, TIMEOUT)
except subprocess.TimeoutExpired:
    print("Timeout expired while waiting for the child process.")
    # Do whatever you want here
    return None
cmd_output = output.decode()
You can find more examples for TimeoutExpired here.
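A self-contained sketch of the timeout approach above. The wrapper name run_with_timeout and the two child commands are hypothetical and chosen only so the example runs anywhere Python is installed. Note that communicate() does not kill the child when the timeout expires, so the sketch kills it explicitly and then collects the leftover pipe data.

```python
import subprocess
import sys

def run_with_timeout(command, data, timeout):
    """Return the child's decoded stdout, or None if it did not finish in time."""
    process = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    try:
        output, _ = process.communicate(data, timeout)
    except subprocess.TimeoutExpired:
        # The child is still running: stop it and drain its pipes.
        process.kill()
        process.communicate()
        return None
    return output.decode()

# Hypothetical children: one answers at once, one sleeps past the timeout.
fast_child = [sys.executable, "-c", "print(input().upper())"]
slow_child = [sys.executable, "-c", "import time; time.sleep(30)"]

fast_result = run_with_timeout(fast_child, b"abc\n", 10)
slow_result = run_with_timeout(slow_child, b"", 0.5)
```

This tells you that the child did not finish within the timeout, not that it is specifically blocked on stdin, so it only approximates what the question asks for.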

How to get the git action on git receive-pack piped on python 3?

For example, here is a simple pipe in Python 3:
processHandler = subprocess.Popen(
    [ 'git-receive-pack', '/tmp/repo.git' ],
    stdin=sys.stdin,
    stdout=sys.stdout
)
processHandler.communicate()
processHandler.wait()
The Python script replaces the original git-receive-pack command (invoked over ssh) and calls git-receive-pack through a simple pipe. This works fine, but I need to detect the commit data, for example the branch and the actions (commit, merge, etc.).
The communication is not always separated by newlines and data can flow in both directions, so I cannot loop until I have the whole request and then pass it to the command: there can be more than one request and more than one response, so a pipe is required.
I have two problems: 1. I don't know the plaintext protocol exchanged between the client and the server application, so I don't know how to write a regular expression that extracts these values. 2. I cannot intercept the input and output of the process because it is connected directly to the system pipes; if I redirect stdout to a file, the response bytes no longer reach the client.
How can I intercept sys.stdin and sys.stdout without affecting the pipe between Python and git?
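One possible direction, sketched under assumptions rather than offered as a tested git hook: give the child its own pipes and run two relay threads that forward every chunk unchanged while keeping a copy. The names relay and run_intercepted are hypothetical, and the demo uses a stand-in child that upper-cases its input instead of git-receive-pack. The sketch only captures the raw bytes; it does not decode the git protocol.

```python
import io
import subprocess
import sys
import threading

def relay(src, dst, captured, close_dst):
    """Copy bytes from src to dst unchanged and keep a copy in captured."""
    while True:
        chunk = src.read1(65536)  # returns as soon as some bytes are available
        if not chunk:             # empty bytes means end of stream
            break
        captured.extend(chunk)
        dst.write(chunk)
        dst.flush()
    if close_dst:
        dst.close()               # lets the child see end-of-input

def run_intercepted(command, client_in, client_out):
    """Run command between client_in and client_out, recording both directions."""
    child = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    sent, received = bytearray(), bytearray()
    workers = [
        threading.Thread(target=relay, args=(client_in, child.stdin, sent, True)),
        threading.Thread(target=relay, args=(child.stdout, client_out, received, False)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    child.wait()
    return bytes(sent), bytes(received)

# Demo with a stand-in child that upper-cases whatever it reads.
demo_child = [sys.executable, "-c",
              "import sys; sys.stdout.write(sys.stdin.read().upper())"]
demo_in, demo_out = io.BytesIO(b"hello\n"), io.BytesIO()
sent, received = run_intercepted(demo_child, demo_in, demo_out)
```

In the real script the call would presumably be run_intercepted(['git-receive-pack', '/tmp/repo.git'], sys.stdin.buffer, sys.stdout.buffer). The sketch assumes the client closes its side when it is done; a child that exits first would leave the input relay waiting and could raise BrokenPipeError on write, which is not handled here.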

Multithreading crawler get slower and slower after running for some time

I wrote a multithreaded web crawler under Windows. The libraries that I used were requests and threading. I found the program became slower and slower after running for some time (about 500 pages). When I stop the program and run again, the program speeds up again. It seems that there are many pending connections, causing the slowdown. How should I manage the problem?
My code:
import queue
import threading
import requests
req = requests.Session()
urlQueue = queue.Queue()
pageList = []
urlList = [url1,url2,....url500]
for i in urlList:
    urlQueue.put(i)
def parse(urlQueue):
    # Each worker keeps pulling URLs until the queue is empty
    while True:
        try:
            url = urlQueue.get_nowait()
        except queue.Empty:
            break
        try:
            page = req.get(url)
            pageList.append(page)
        except requests.RequestException:
            # Skip URLs that fail and move on to the next one
            continue
if __name__ == '__main__':
    threadNum = 4
    threadList = []
    for i in range(threadNum):
        t = threading.Thread(target=parse, args=(urlQueue,))
        threadList.append(t)
    for thread in threadList:
        thread.start()
    for thread in threadList:
        thread.join()
I searched for the problem. One answer said it was a TCP reuse and recycling problem under Linux. I don't understand that answer very well. It is reproduced below, translated from Chinese.
Type this command in a Linux shell: netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
The TIME_WAIT count was nearly 20,000, so there must be many lingering TCP connections.
Use the following commands to enable reuse and recycling of TIME_WAIT sockets, respectively:
echo "1" > /proc/sys/net/ipv4/tcp_tw_reuse, echo "1" > /proc/sys/net/ipv4/tcp_tw_recycle
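The awk one-liner tallies TCP connections by their state column. A rough Python equivalent is sketched below; because it only looks at the first and last column of each row, it should also cope with the layout of netstat -n on Windows. The function names are hypothetical and the sample text is made up for illustration, not measured output.

```python
import subprocess
from collections import Counter

def count_tcp_states(netstat_output):
    """Tally TCP connections by state from `netstat -n` text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Linux rows start with "tcp"/"tcp6", Windows rows with "TCP";
        # in both layouts the connection state is the last column.
        if fields and fields[0].lower().startswith("tcp"):
            states[fields[-1]] += 1
    return states

def live_tcp_states():
    """Run `netstat -n` on this machine and tally its TCP rows."""
    result = subprocess.run(["netstat", "-n"], capture_output=True, text=True)
    return count_tcp_states(result.stdout)

# Made-up sample in the Windows layout, for illustration only.
sample = """\
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    192.168.1.5:50001      93.184.216.34:80       TIME_WAIT
  TCP    192.168.1.5:50002      93.184.216.34:80       TIME_WAIT
  TCP    192.168.1.5:50003      93.184.216.34:80       ESTABLISHED
"""
sample_counts = count_tcp_states(sample)
print(sample_counts["TIME_WAIT"])
```

Calling live_tcp_states() while the crawler runs would show whether the TIME_WAIT count really grows as the slowdown sets in.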
That answer seems correct; it looks like a network problem. How can I solve this under Windows?
The multithreaded crawler exhausts the available TCP connections. We need to lower TcpTimedWaitDelay so that closed TCP connections are released and reused more quickly. This can be done by editing the registry manually or from code.
How to do it on Windows with code:
(You need to run the code as an administrator, or otherwise, an error would be raised.)
import win32api,win32con
key = win32api.RegOpenKey(win32con.HKEY_LOCAL_MACHINE, r'SYSTEM\CurrentControlSet\Services\Tcpip\Parameters', 0, win32con.KEY_SET_VALUE)
# TcpTimedWaitDelay is a DWORD value holding the delay in seconds
win32api.RegSetValueEx(key, 'TcpTimedWaitDelay', 0, win32con.REG_DWORD, 30)
win32api.RegCloseKey(key)
How to do it on Windows manually:
Open Run and type regedit
Find: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters
Click Edit - New - DWORD (32-bit) Value
Create TcpTimedWaitDelay (if this entry already exists, you do not need to create it)
Change the value to 30 (decimal). (The value ranges from 30 to 300 seconds, and the default is 120 seconds. The default value is too long for a multithreaded crawler.)
Thank you all for your contributions to this question. I hope this helps a lot of people.

X3270 Connection and Programming

I'm looking at using the x3270 terminal emulator. I have looked over the source material at http://x3270.bgp.nu/ and still don't see how to start using the tool or configure it.
I'm wondering how I can open a terminal and connect. Also, how could I integrate this into a Python program?
edit:
here is a snippet:
em = Emulator()
em.connect(ip)
em.send_string('*user name*')
em.exec_command('Tab')
em.send_string('*user password*')
em.send_enter()
em.send_enter()
em.wait_for_field()
em.save_screen("{0}screenshot".format(*path*))
Looking at the saved screen, I see that the cursor hasn't moved. I can move the cursor using
em.move_to(7,53)
but after that no text gets sent through. Any ideas?
Here's what I do; it works 100% of the time:
from py3270 import *
import sys, os
host = "%s" % sys.argv[1].upper()
try:
    e = Emulator()
    e.connect(host)
    e.wait_for_field()
except WaitError:
    print("py3270.connect(%s) failed" % (host))
    sys.exit(1)
print("--- connection made to %s ---" % (host))
If you haven't got a network connection to your host, that wait_for_field() call is going to wait for a full 120 seconds. No matter what I do, I don't seem to be able to affect the length of that timeout.
But your user doesn't have to wait that long, just have him kill your script with a KeyboardInterrupt. Hopefully, your user will grow accustomed to success equaling the display of that "--- connection made ..." message so he'll know he's in trouble when/if the host doesn't respond.
And that's a point I need to make: you don't connect to a terminal (as you described), rather you connect to a host. That host can be either a VTAM connection or some kind of LPAR, usually TSO or z/VM, sometimes CICS or IMS, that VTAM will take you to. Each kind of host has differing prompts & screen content you might need to test for, and sometimes those contents are different depending on whose system you're trying to connect to. Your script becomes the "terminal", depending on what you want to show your user.
What you need to do next depends on what kind of system you're trying to talk to. Through VTAM? (Need to select a VTAM application first?) To z/VM? TSO? Are you logging on or DIALing? What's the next keystroke/field you have to use when you're working with a graphic x3270/c3270 terminal? You need to know that in order to choose your next command.
Good luck!
Please read my comment above first - it would be helpful to have more detail as to what you need to do.
After considering that…have you looked at the py3270 package at https://pypi.python.org/pypi/py3270/0.1.5 ? The summary says it talks to x3270.
