RETR downloading zip File from ftp not writing - python-3.x

I am trying to download a huge zip file (~9 GB zipped and ~130 GB unzipped) from an FTP server with Python using the ftplib library. Unfortunately, when I use the retrbinary method, it creates the file in my local directory but does not write anything into it. After the code has run for a while, I get a timeout error. It used to work fine, but it stopped working when I tried to go deeper into the use of sockets with the code below. As the files I am trying to download are huge, I want more control over the connection to prevent timeout errors while downloading. I am not very familiar with sockets, so I may have misused them. I have searched online but did not find any reports of a similar problem. (I also tried with smaller files as a test and had the same issues.)
Here are the functions I tried; both have problems (method 1 does not write to the file; method 2 downloads the file, but I cannot unzip it):
import time
import socket
import ftplib
import threading
from zipfile import ZipFile  # needed by unzip_file below
# To complete
filename = ''
local_folder = ''
ftp_folder = ''
host = ''
user = ''
mp = ''
# timeout error in method 1
def downloadFile_method_1(filename, local_folder, ftp_folder, host, user, mp):
    try:
        ftp = ftplib.FTP(host, user, mp, timeout=1600)
        ftp.set_debuglevel(2)
    except ftplib.error_perm as error:
        print(error)
    with open(local_folder + '/' + filename, "wb") as f:
        ftp.retrbinary("RETR" + ftp_folder + '/' + filename, f.write)
# method 2 works to download the zip file, but gives a header error when unzipping it
def downloadFile_method_2(filename, local_folder, ftp_folder, host, user, mp):
    try:
        ftp = ftplib.FTP(host, user, mp, timeout=1600)
        ftp.set_debuglevel(2)
        sock = ftp.transfercmd('RETR ' + ftp_folder + '/' + filename)
    except ftplib.error_perm as error:
        print(error)

    def background():
        f = open(local_folder + '/' + filename, 'wb')
        while True:
            block = sock.recv(1024*1024)
            if not block:
                break
            f.write(block)
        sock.close()

    t = threading.Thread(target=background)
    t.start()
    while t.is_alive():
        t.join(60)
        ftp.voidcmd('NOOP')
def unzip_file(filename, local_folder):
    local_filename = local_folder + '/' + filename
    with ZipFile(local_filename, 'r') as zipObj:
        zipObj.extractall(local_folder)
And the error I get for method 1:
ftplib.error_temp: 421 Timeout - try typing a little faster next time
And the error I get when I try to unzip after using method 2:
zipfile.BadZipFile: Bad magic number for file header
Also, regarding the code below, it would be helpful if anyone could explain what these setsockopt calls do:
ftp.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
ftp.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 75)
ftp.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
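For reference, these three calls configure TCP keepalive on the FTP control socket: SO_KEEPALIVE switches keepalive probes on, TCP_KEEPIDLE is the number of idle seconds before the first probe is sent, and TCP_KEEPINTVL is the number of seconds between later probes. The probes are empty TCP segments, so they can keep a firewall or NAT entry alive, but they are not FTP commands and may not reset a server-side idle timer such as the one behind the 421 reply. A minimal sketch follows; the helper name enable_keepalive is hypothetical, and the hasattr checks are there because TCP_KEEPIDLE and TCP_KEEPINTVL are not defined on every platform.

```python
import socket


def enable_keepalive(sock, idle=60, interval=75):
    """Turn on TCP keepalive for sock (hypothetical helper, not part of ftplib)."""
    # Enable keepalive probes on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds the connection must be idle before the first probe is sent.
    if hasattr(socket, 'TCP_KEEPIDLE'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between probes once probing has started.
    if hasattr(socket, 'TCP_KEEPINTVL'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    return sock
```

With ftplib this would be called as enable_keepalive(ftp.sock) after the connection has been opened.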
Thanks for your help.
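A possible explanation for the BadZipFile error in method 2, offered as an assumption rather than a confirmed diagnosis: unlike retrbinary, transfercmd does not send TYPE I, so the server may be transferring in ASCII mode, and the output file is never closed. The sketch below shows a binary transfer over transfercmd; the function name download_binary and its parameters are hypothetical, and the NOOP keepalive thread from method 2 is left out for brevity.

```python
import ftplib


def download_binary(ftp, remote_path, local_path, blocksize=1024 * 1024):
    """Download remote_path through an already logged-in ftplib.FTP object."""
    # transfercmd() does not choose a transfer type, so request binary
    # ("image") mode explicitly, as retrbinary() does internally.
    ftp.voidcmd('TYPE I')
    sock = ftp.transfercmd('RETR ' + remote_path)
    try:
        # The with-block guarantees the file is flushed and closed.
        with open(local_path, 'wb') as f:
            while True:
                block = sock.recv(blocksize)
                if not block:
                    break
                f.write(block)
    finally:
        sock.close()
    # Read the final reply for the transfer from the control connection.
    return ftp.voidresp()
```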

Related

Python Selenium: Check if a new file in the download folder is added

When I click on a link, it downloads a file to the download folder.
My URL looks something like this:
url='https://vle......ac.uk/pluginfile.php/2814969/mod_page/content/16/Statistics_for_Business_and_Economics_----_%28Unit_I_Introduction%29.pdf'
driver.execute_script("window.open('%s', '_blank')" % URL)
where the URL points to a PDF file that I am trying to download.
I want to write code that waits until the number of files in the download folder increases before moving on to the next iteration of the loop.
I wrote this code:
def wait_till_number_of_files_is_byound_the_current_file():
    path_download=r'\\Mac\Home\Downloads\*'
    list_of_files = glob.glob(path_download)
    a=len(list_of_files)
    while len(list_of_files)==a:
        time.sleep(1)
        list_of_files = glob.glob(path_download)
In my for loop I also tried this code
item = WebDriverWait(driver, 10).until(lambda driver: driver.execute_script("window.open('%s', '_blank')" % URL))
but this caused the link to be opened over and over instead of just once.
The best way I have found to get around this (I hope there is a better one) is to use the following function:
def download_wait(directory, timeout, nfiles=None):
    """
    Wait for downloads to finish with a specified timeout.

    Args
    ----
    directory : str
        The path to the folder where the files will be downloaded.
    timeout : int
        How many seconds to wait until timing out.
    nfiles : int, defaults to None
        If provided, also wait for the expected number of files.
    """
    seconds = 0
    dl_wait = True
    while dl_wait and seconds < timeout:
        time.sleep(1)
        dl_wait = False
        files = os.listdir(directory)
        if nfiles and len(files) != nfiles:
            dl_wait = True
        for fname in files:
            if fname.endswith('.crdownload'):
                dl_wait = True
        seconds += 1
    return seconds
In my for loop, I wrote the following
for url in hyper_link_of_files:
    # Click on this link
    driver.execute_script("window.open('%s', '_blank')" % url)
    # time.sleep(2)
    download_wait(r'\\Mac\Home\Downloads', 10, nfiles=None)
    time.sleep(2)
    # move the last downloaded file into the destination folder
    Move_File(dest_folder)
I will share my Move_File function for reference, for those who are interested in moving the downloaded file to a new destination:
def Move_File(path_needed):
    # Collect everything currently in the downloads folder
    path_download=r'\\Mac\Home\Downloads\*'
    list_of_files = glob.glob(path_download)
    latest_file = max(list_of_files, key=os.path.getctime)
    # Move the newest file into the destination folder
    path_destination=os.path.join(path_needed,os.path.basename(latest_file))
    shutil.move(latest_file,path_destination)
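A variation on the same idea, as a rough sketch rather than a tested Selenium recipe: take a snapshot of the folder before triggering the download, then poll until new names appear and none of them is still a partial download. The function name wait_for_new_files and its parameters are hypothetical, and the '.crdownload' suffix assumes Chrome, as in download_wait above.

```python
import os
import time


def wait_for_new_files(directory, before, timeout=30.0, poll=0.5):
    """Return the names of files that appeared in directory since `before`.

    `before` is a set of names taken with set(os.listdir(directory))
    just before the download was started.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new = set(os.listdir(directory)) - before
        # Chrome writes partial downloads with a .crdownload suffix.
        partial = {name for name in new if name.endswith('.crdownload')}
        if new and not partial:
            return new
        time.sleep(poll)
    raise TimeoutError('no finished download appeared in ' + directory)
```

In the loop, `before = set(os.listdir(folder))` would be taken right before the `driver.execute_script(...)` call, and the returned names tell you which file to move without relying on creation times.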

Paramiko SSH connection timeout after 3 hours

I developed two Python scripts to transfer a lot of data (~120 GB) to my VM with Paramiko.
My VM is on an OVH server.
The first script transfers ~40 GB, and the second ~80 GB.
Stack:
Python 3.9.1
Paramiko 2.7.2
SCP 0.13.3
In both scripts, I use this function to set up the SSH connection:
def connect():
    transport = paramiko.Transport((target_host, target_port))
    transport.connect(None, target_username, target_pwd)
    sftp_client = paramiko.SFTPClient.from_transport(transport)
    green_print("SSH connected")
    return sftp_client, transport
If I create one script that does both transfers, the connection times out after 3 hours.
With two separate scripts running at the same time, it times out after 2h30 of transfer.
I have already read many posts on Paramiko, SSH connections, the timeout parameter, ClientAliveInterval, etc., but nothing works.
After this time, I get this error:
Connexion fermée par l'hôte distant / Connection closed by remote host
Three functions from my script:
def connect():
    transport = paramiko.Transport((target_host, target_port))
    transport.connect(None, target_username, target_pwd)
    sftp_client = paramiko.SFTPClient.from_transport(transport)
    green_print("SSH connected")
    return sftp_client, transport

def transfert(sftp, vm, object_path):
    os.chdir(os.path.split(object_path)[0])
    parent = os.path.split(object_path)[1]
    try:
        sftp.mkdir(vm)
    except:
        pass
    for path, _, files in os.walk(parent):
        try:
            sftp.mkdir(os.path.join(vm, path))
        except:
            pass
        for filename in files:
            sftp.put(os.path.join(object_path, filename),
                     os.path.join(vm, path, filename))

def job():
    green_print("\nProcess start...")
    check_folder()
    folder = forfiles_method()
    vm, lidar, pos = name_path(folder)
    sftp, transport = connect()
    transfert(sftp, vm, pos)
    sftp.close()
    transport.close()
Minimal reproducible example:
from paramiko.sftp_client import SFTPClient
import paramiko
import os

target_host = 'xx.xx.x.xxx'
target_port = 22
target_username = "xxxxxxx"
target_pwd = 'xxxxxx'
remote_path = "e:/x/" # => on your vm
target_folder = '/folder1' # => on your computer

def connect():
    transport = paramiko.Transport((target_host, target_port))
    transport.connect(None, target_username, target_pwd)
    sftp_client = paramiko.SFTPClient.from_transport(transport)
    return sftp_client, transport

def transfert(sftp, remote_path, object_path):
    os.chdir(os.path.split(object_path)[0])
    parent = os.path.split(object_path)[1]
    try:
        sftp.mkdir(remote_path)
    except:
        pass
    for path, _, files in os.walk(parent):
        try:
            sftp.mkdir(os.path.join(remote_path, path))
        except:
            pass
        for filename in files:
            sftp.put(os.path.join(object_path, filename),
                     os.path.join(remote_path, path, filename))

def job():
    sftp, transport = connect()
    transfert(sftp, remote_path, target_folder)
    sftp.close()
    transport.close()
This is the tree structure of my files; I want to transfer only the "test" folder, which contains more than 120 GB.
folder / test
I'm new to Python development.
If someone has a solution, I'll take it!
So, the solution:
subprocess.run(["winscp.com", "/script=" + cmdFile], shell=True)
If winscp.com is not found as a command, give its full path instead, e.g. C:/Program Files (x86)/WinSCP/winscp.com
Write your WinSCP commands in a text file (cmdFile here).
Some links that may help:
Running WinSCP command from Python
From Python run WinSCP commands in console
https://winscp.net/eng/docs/commandline
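To make the WinSCP approach more concrete, here is a hedged sketch of generating the command file from Python and passing it to winscp.com. The function names, host, credentials, host key fingerprint and paths are all placeholders of mine, not values from the post; the script commands used (option, open, put, exit) are standard WinSCP scripting commands, but check them against the WinSCP documentation for your version.

```python
import os
import subprocess
import tempfile


def build_winscp_script(host, user, password, hostkey, local_dir, remote_dir):
    """Return the text of a WinSCP script that uploads local_dir (placeholders)."""
    lines = [
        'option batch abort',    # stop on the first error instead of prompting
        'option confirm off',    # do not ask before overwriting
        f'open sftp://{user}:{password}@{host}/ -hostkey="{hostkey}"',
        f'put "{local_dir}" "{remote_dir}"',
        'exit',
    ]
    return '\n'.join(lines) + '\n'


def run_winscp(script_text, winscp='winscp.com'):
    """Write script_text to a temporary command file and run winscp.com on it."""
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
        f.write(script_text)
        cmd_file = f.name
    try:
        return subprocess.run([winscp, '/script=' + cmd_file])
    finally:
        os.remove(cmd_file)
```

Note that special characters in the user name or password would need URL-encoding inside the sftp:// session URL.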

Python Program error - The process cannot access the file because it is being used by another process

I am trying to test Python code that moves a file from a source path to a target path. The test is written with pytest in Python 3, but I am facing a roadblock: I am trying to remove the source and target paths at the end of the test, using a call such as shutil.rmtree(path) or os.rmdir(path). This raises the error "[WinError 32] The process cannot access the file because it is being used by another process". Please help me with this. Below is the pytest code:
import pytest
import os
import shutil
import tempfile
from sample_test_module import TestCondition

object_test_condition = TestCondition()

@pytest.mark.parametrize("test_value",['0'])
def test_condition_pass(test_value):
    temp_dir = tempfile.mkdtemp()
    temp_src_folder = 'ABC_File'
    temp_src_dir = os.path.join(temp_dir , temp_src_folder)
    temp_file_name = 'Sample_Test.txt'
    temp_file_path = os.path.join(temp_src_dir , temp_file_name)
    os.chdir(temp_dir)
    os.mkdir(temp_src_folder)
    try:
        with open(temp_file_path , "w") as tmp:
            tmp.write("Hello-World\n")
            tmp.write("Hi-All\n")
    except IOError:
        print("Error has occurred, please check it.")
    org_val = object_test_condition.sample_test(temp_dir)
    print("Temp file path is : " + temp_file_path)
    print("Temp Dir is : " + temp_dir)
    shutil.rmtree(temp_dir)
    print("The respective dir path is now removed.")
    assert org_val == test_value
Upon execution of the code, the following error pops up:
[WinError 32] The process cannot access the file because it is being used by another process : 'C:\Users\xyz\AppData\Local\Temp\tmptryggg56'
You are getting this error because the directory you are trying to remove is the current directory of the process. If you save the current directory before calling os.chdir (using os.getcwd()), and chdir back to that directory before removing temp_dir, it should work.
Your code isn't correctly indented, so here is my best guess at what it should look like.
import pytest
import os
import shutil
import tempfile
from sample_test_module import TestCondition

object_test_condition = TestCondition()

@pytest.mark.parametrize("test_value",['0'])
def test_condition_pass(test_value):
    temp_dir = tempfile.mkdtemp()
    temp_src_folder = 'ABC_File'
    temp_src_dir = os.path.join(temp_dir , temp_src_folder)
    temp_file_name = 'Sample_Test.txt'
    temp_file_path = os.path.join(temp_src_dir , temp_file_name)
    # Remember where we started so we can leave temp_dir before deleting it
    prev_dir = os.getcwd()
    os.chdir(temp_dir)
    os.mkdir(temp_src_folder)
    try:
        with open(temp_file_path , "w") as tmp:
            tmp.write("Hello-World\n")
            tmp.write("Hi-All\n")
    except IOError:
        print("Error has occurred, please check it.")
    org_val = object_test_condition.sample_test(temp_dir)
    print("Temp file path is : " + temp_file_path)
    print("Temp Dir is : " + temp_dir)
    # Step out of temp_dir; Windows cannot remove the current directory
    os.chdir(prev_dir)
    shutil.rmtree(temp_dir)
    print("The respective dir path is now removed.")
    assert org_val == test_value
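The same save-and-restore idea can be packaged so the working directory is restored even if the test body raises. This is only a sketch: the names working_directory and write_sample_file are hypothetical, and tempfile.TemporaryDirectory stands in for the mkdtemp/rmtree pair used above.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily chdir into path, then always return to the previous directory."""
    prev_dir = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(prev_dir)


def write_sample_file():
    """Create a sample file inside a temporary directory, then clean everything up."""
    with tempfile.TemporaryDirectory() as temp_dir:
        with working_directory(temp_dir):
            os.mkdir('ABC_File')
            with open(os.path.join('ABC_File', 'Sample_Test.txt'), 'w') as tmp:
                tmp.write('Hello-World\n')
        # The process has already left temp_dir here, so the cleanup done
        # when the TemporaryDirectory block exits is not blocked by it.
    return temp_dir
```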
Can you try closing the temp file before removing the directory?
temp.close()

Python Script Creates Directories In /tmp/, Taking Up System Space

I am running a script that acts as a server, allows two clients to connect to it, and for one specific client to send a message to the server, the server modifies it, then sends it to the other client.
This appears to work, as the receiving client acknowledges that the input was received and is valid. This is a script that I intend to run continuously.
However, a big issue is that my /tmp/ directory is filling up with directories named _M... (The ellipses representing a random string), that contains python modules (such as cryptography, which, as far as I'm aware, I'm not using), and timezone information (quite literally every timezone that python supports). It seems to be creating them very frequently, but I can't identify what in the process exactly is doing this.
I have created a working cleanup bash script that removes files older than 5 minutes from the directory every 5 minutes, however, I cannot guarantee that when I am duplicating this process for other devices, that the directories will have the same name formatting. Rather than create a unique bash script for each process that I create, I'd rather be able to clean up the directories from within the python script, or even better, to prevent the directories from being created at all.
The problem is, I'm not certain of how this is accomplished, and I do not see anything on SO regarding what is creating these directories, nor how to delete them.
The following is my script
import time, socket, os, sys, re, select
IP = '192.168.109.8'
PORT = [3000, 3001]
PID = str(os.getpid())
PIDFILE = "/path/to/pidfile.pid"
client_counter = 0
sockets_list = []
def runCheck():
    if os.path.isfile(PIDFILE):
        return False
    else:
        with open(PIDFILE, 'w') as pidfile:
            pidfile.write(PID)
        return True

def openSockets():
    for i in PORT:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((IP, i))
        s.listen(1)
        sockets_list.append(s)

def receiveMessage(client_socket):
    try:
        message = client_socket.recv(2048).decode('utf-8')
        if not message:
            return False
        message = str(message)
        return message
    except:
        return False

def fixString(local_string):
    #processes
    return local_string
def main():
    try:
        openSockets()
        clients = {}
        print(f'Listening for connections on {IP}:{PORT[0]} and {PORT[1]}...')
        client_count = 0
        while True:
            read_sockets, _, exception_sockets = select.select(sockets_list, [], sockets_list)
            for notified_socket in read_sockets:
                if notified_socket == sockets_list[0] or notified_socket == sockets_list[1]:
                    client_socket, client_address = sockets_list[client_count].accept()
                    client_count = (client_count + 1) % 2
                    sockets_list.append(client_socket)
                    clients[client_socket] = client_socket
                    print('Accepted new connection from: {}'.format(*client_address))
                else:
                    message = receiveMessage(notified_socket)
                    if message is False:
                        continue
                    message = fixString(message)
                    for client_socket in clients:
                        if client_socket != notified_socket:
                            if message != "N/A":
                                client_socket.send(bytes(message, "utf-8"))
            for notified_socket in exception_sockets:
                sockets_list.remove(notified_socket)
                del clients[notified_socket]
            time.sleep(1)
    except socket.timeout:
        for i in sockets_list:
            i.close()
        os.remove(PIDFILE)
        sys.exit()
    except Exception as e:
        for i in sockets_list:
            i.close()
        # str() does not accept three positional values, so build the message with format()
        err_details = 'Error in line {}: {} {}'.format(sys.exc_info()[-1].tb_lineno, type(e).__name__, e)
        os.remove(PIDFILE)
        print("Exception: {}".format(err_details))
        sys.exit()

if __name__ == "__main__":
    if runCheck():
        main()
    else:
        pass
How might I set it up so that the python script will delete the directories it creates in the /tmp/ directory, or better, to not create them in the first place? Any help would be greatly appreciated.
As it turned out, it was PyInstaller that was generating these directories. Its documentation states that an executable built in single-file mode unpacks itself into such an _MEI directory each time it runs and is supposed to delete the directory on exit, but for some reason that did not happen here.
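If the leftover directories have to be cleaned from inside the script, something along these lines might work. It is a sketch under assumptions: the helper name remove_stale_mei_dirs is made up, the '_MEI' prefix is taken from the answer above, and the age threshold mirrors the 5-minute bash cleanup from the question. A directory older than the threshold may still belong to another running one-file executable, so the threshold should be chosen with that in mind.

```python
import os
import shutil
import sys
import tempfile
import time


def remove_stale_mei_dirs(tmp_root=None, max_age_seconds=300, now=None):
    """Delete _MEI* directories under tmp_root older than max_age_seconds."""
    tmp_root = tmp_root or tempfile.gettempdir()
    now = time.time() if now is None else now
    # In a one-file PyInstaller build, sys._MEIPASS is this process's own
    # unpack directory; it must never be removed while we are running.
    own_dir = getattr(sys, '_MEIPASS', None)
    removed = []
    for entry in os.scandir(tmp_root):
        if not entry.name.startswith('_MEI'):
            continue
        if not entry.is_dir(follow_symlinks=False):
            continue
        if own_dir and os.path.samefile(entry.path, own_dir):
            continue
        if now - entry.stat().st_mtime > max_age_seconds:
            shutil.rmtree(entry.path, ignore_errors=True)
            removed.append(entry.path)
    return removed
```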

Linecache getline does not work after my application was installed

I am creating a tool that gives an overview of hundreds of test results. The tool opens a log file and checks for Pass and Fail verdicts. When a verdict is a fail, I need to go back to previous lines of the log to capture the cause of failure.
linecache.getline works in my workspace (Python run via Eclipse). But after I created a Windows installer (.exe file) and installed the application on my computer, linecache.getline returns nothing. Is there something I need to add to my setup.py file to fix this, or is it an issue in my code?
Tool Code
# Precondition: the log file path is chosen through a wx.FileDialog
self.result_path = dlg.GetPath()
try:
    with open(self.result_path, 'r') as file:
        self.checkLog(self.result_path, file)

def checkLog(self, path, f):
    line_no = 1
    index = 0
    for line in f:
        n = re.search("FAIL", line, re.IGNORECASE) or re.search("PASS", line, re.IGNORECASE)
        if n:
            currentline = re.sub(r'\s+', ' ', line.rstrip())
            finalresult = currentline
            self.list_ctrl.InsertStringItem(index, finaltestname)
            self.list_ctrl.SetStringItem(index, 1, finalresult)
            if currentline == "FAIL":
                fail_line1 = linecache.getline(path, int(line_no - 3)) #Get reason of failure
                fail_line2 = linecache.getline(path, int(line_no - 2)) #Get reason of failure
                cause = fail_line1.strip() + " " + fail_line2.strip()
                self.list_ctrl.SetStringItem(index, 2, cause)
            index += 1
        line_no += 1
The issue was resolved by using the get_line function from this link:
Python: linecache not working as expected?
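The linked answer is not reproduced here, so the following is only a guess at what such a helper might look like: it reads the requested line straight from the file instead of going through linecache, and returns an empty string when the line does not exist, as linecache.getline does. The name get_line comes from the post; the body is my assumption.

```python
from itertools import islice


def get_line(path, line_no):
    """Return line number line_no (1-based) of the file at path, or '' if absent."""
    if line_no < 1:
        return ''
    with open(path, 'r') as f:
        # islice skips the first line_no - 1 lines and yields at most one line.
        return next(islice(f, line_no - 1, line_no), '')
```

In checkLog this would replace the two linecache.getline(path, ...) calls with get_line(path, ...).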
