My code runs well but has one flaw: the files are not saved correctly. For example, say I find 3 JPEG files. When I run the code, it saves 3 times to slot 1, 3 times to slot 2, and 3 times to slot 3, so I end up with 3 identical files.
I think there is something wrong with my looping logic?
If I change for n in range(len(soup_imgs)): to for n in range(len(src)):, the operation saves the last JPEG file over and over, seemingly endlessly.
soup_imgs = soup.find(name='div', attrs={'class':'t_msgfont'}).find_all('img', alt="", src=re.compile(".jpg"))
for i in soup_imgs:
    src = i['src']
    print(src)
dirPath = "C:\\__SPublication__\\"
img_folder = dirPath + '/' + soup_title + '/'
if (os.path.exists(img_folder)):
    pass
else:
    os.mkdir(img_folder)
for n in range(len(src)):
    n += 1
    img_name = dirPath + '/' + soup_title + '/' + str({}).format(n) + '.jpg'
    img_files = open(img_name, 'wb')
    img_files.write(requests.get(src).content)
    print("Outputs:" + img_name)
I am an amateur at coding and only started recently as a hobby. Please give me some guidance, chiefs.
Try this when you are writing your image files:
from os import path

for i, img in enumerate(soup_imgs):
    src = img['src']
    img_name = path.join(dirPath, soup_title, "{}.jpg".format(i))
    with open(img_name, 'wb') as f:
        f.write(requests.get(src).content)
    print("Outputs:{}".format(img_name))
You need to loop over all image sources, rather than using the last src value from a previous for block.
I've also added a safer method for joining directory and file paths that should be OS independent. Finally, when opening a file, always use the with open() as f: construct - this way Python will automatically close the filehandle for you.
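Putting those points together, a minimal sketch might look like the one below. The helper name save_numbered and its fetch parameter are hypothetical: fetch stands in for lambda src: requests.get(src).content so that the example runs without a network connection. Numbering starts at 1 to match the n += 1 in the question, and os.makedirs(..., exist_ok=True) replaces the exists()/mkdir() check.

```python
import os
import tempfile


def save_numbered(sources, folder, fetch):
    """Save each source to folder as 1.jpg, 2.jpg, ... and return the paths.

    `fetch` is a hypothetical callable that maps a source URL to bytes,
    e.g. lambda src: requests.get(src).content in the original script.
    """
    os.makedirs(folder, exist_ok=True)  # create the folder only if missing
    saved = []
    for n, src in enumerate(sources, start=1):
        img_name = os.path.join(folder, "{}.jpg".format(n))
        with open(img_name, "wb") as f:
            f.write(fetch(src))  # one download per source, one file per source
        saved.append(img_name)
    return saved


# Demo with fake data instead of real HTTP requests
demo_dir = os.path.join(tempfile.mkdtemp(), "demo_title")
demo_paths = save_numbered(["a.jpg", "b.jpg", "c.jpg"], demo_dir,
                           fetch=lambda src: src.encode())
print(demo_paths)
```

In the original script the call would be along the lines of save_numbered([img['src'] for img in soup_imgs], img_folder, lambda src: requests.get(src).content).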
I want to find folders that have no data in them and get their folder names.
The first- and second-level folders have arbitrary numeric names, and the data sits in arbitrary folders.
The code is:
path = 'M://done/mesh/*'
FL = glob.glob(path)
FL2 = glob.glob(FL[0] + '/*')
FL2
['M://done/mesh\\41\\23',
'M://done/mesh\\41\\24',
'M://done/mesh\\41\\33',
'M://done/mesh\\41\\34',
'M://done/mesh\\41\\35',
'M://done/mesh\\41\\36',
'M://done/mesh\\41\\43',
'M://done/mesh\\41\\44',
'M://done/mesh\\41\\45',
'M://done/mesh\\41\\46',
'M://done/mesh\\41\\47',
'M://done/mesh\\41\\53',
'M://done/mesh\\41\\54',
'M://done/mesh\\41\\55',
'M://done/mesh\\41\\63',
'M://done/mesh\\41\\64',
'M://done/mesh\\41\\65',
'M://done/mesh\\41\\66',
'M://done/mesh\\41\\67',
'M://done/mesh\\41\\74',
'M://done/mesh\\41\\75',
'M://done/mesh\\41\\76',
'M://done/mesh\\41\\77',
'M://done/mesh\\41\\85',
'M://done/mesh\\41\\86',
'M://done/mesh\\41\\87']
FL2[24][24:26] + FL2[24][27:30] + '0000' # why do I need [24:26], [27:30]???
finding_files = ['_Caminfo.dat','running.csv']
print(FL2[0] + '/0000/02_output/' + FL2[0][24:26] + FL2[0][27:30] + '0000/' + fn1[0])
'41860000'
You can use os.listdir to find empty folders, like below:
import os

folder_list = ["D:\\test_1", "D:\\test_2"]
empty_folders = []
for folder in folder_list:
    try:
        if not os.listdir(folder):
            empty_folders.append(folder)
    except FileNotFoundError:
        pass
print(empty_folders)
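If the folders are nested two levels deep, as in the question, the same check can be applied while walking the tree. This is only a sketch: find_empty_folders is a hypothetical helper, and the demo builds a throwaway directory tree rather than touching M://done/mesh.

```python
import os
import tempfile


def find_empty_folders(root):
    """Return every folder under root that has no files and no subfolders."""
    empty = []
    for current, subdirs, files in os.walk(root):
        if not subdirs and not files:
            empty.append(current)
    return sorted(empty)


# Demo on a throwaway tree shaped like mesh/41/23, mesh/41/24
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "41", "23"))
os.makedirs(os.path.join(root, "41", "24"))
with open(os.path.join(root, "41", "24", "running.csv"), "w") as f:
    f.write("x")

empty_folders = find_empty_folders(root)
print([os.path.relpath(p, root) for p in empty_folders])
```

Because os.walk yields full paths, there is no need to slice folder names out of the path string by index.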
I have a folder with the following files:
[11111]Text.txt
[22222]Text.txt
[33333]Text.txt
[44444]Text.txt
I need to rename the files to remove the [11111] designation from the beginning of each file name; however, that results in duplicate file names.
I wrote a basic script that strips the [11111] from the first file and, if any duplication occurs with subsequent files, names the file [Duplicate]_[#]_text.txt, where [#] is a random number.
When I ran the code, it renamed the first file correctly, and renamed the second file with the required string, but it did not continue with the other files, and instead presented the following error:
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'Destination/[33333]Text.txt' -> 'Destination/[Duplicate]_[1]Text.txt'
The code below is what I have currently, though I have tried several variations as well:
Location = (Destination_Folder)
Dupe_Counter = random.randint(0,255)
for filename in os.listdir(Location):
    try:
        if filename.startswith("["):
            os.rename(Location + filename, Location + filename[7:])
    except:
        os.rename(Location + filename, Location +'[Duplicate]_' + '[' + str(Dupe_Counter) +']' + filename[7:])
I'm assuming that it's not actually picking up the Dupe_Counter when creating new files; however, I'm not 100% sure where I'm going wrong.
Any help appreciated.
Your Dupe_Counter is a random number, so it can occasionally collide with a name that already exists. On top of that, you generate it only once, before the loop, so every duplicate is given the same name.
Try generating a random number on each iteration:
Location = (Destination_Folder)
for filename in os.listdir(Location):
    Dupe_Counter = random.randint(0,255)
    try:
        if filename.startswith("["):
            os.rename(Location + filename, Location + filename[7:])
    except:
        os.rename(Location + filename, Location +'[Duplicate]_' + '[' + str(Dupe_Counter) +']' + filename[7:])
But I would recommend using an increasing sequence number for the renamed files instead, which is also easier to follow.
Something like this:
Location = (Destination_Folder)
Dupe_Counter = 101  # set once, before the loop, so it is not reset on every file
for filename in os.listdir(Location):
    try:
        if filename.startswith("["):
            os.rename(Location + filename, Location + filename[7:])
    except FileExistsError:
        # the stripped name is already taken, so use the next sequence number
        os.rename(Location + filename, Location +'[Duplicate]_' + '[' + str(Dupe_Counter) +']' + filename[7:])
        Dupe_Counter += 1
Hope I've been of some help.
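As a variation on the sequence-number idea, here is a sketch that checks whether the target name is free before renaming, rather than waiting for an exception. Note that os.rename only raises FileExistsError on Windows; on POSIX systems it silently overwrites the existing file, so an explicit check is more portable. The helper strip_prefix_unique and its prefix_len parameter are made up for illustration, and the demo runs in a temporary folder.

```python
import os
import tempfile


def strip_prefix_unique(folder, prefix_len=7):
    """Strip the leading "[11111]" from each name, numbering any duplicates."""
    counter = 1
    # Take a snapshot of the listing so renamed files are not processed again
    for filename in sorted(os.listdir(folder)):
        if not filename.startswith("["):
            continue
        target = filename[prefix_len:]
        # Keep picking a new name until it is free
        while os.path.exists(os.path.join(folder, target)):
            target = "[Duplicate]_[{}]{}".format(counter, filename[prefix_len:])
            counter += 1
        os.rename(os.path.join(folder, filename), os.path.join(folder, target))


# Demo with the four file names from the question
demo = tempfile.mkdtemp()
for tag in ("11111", "22222", "33333", "44444"):
    open(os.path.join(demo, "[{}]Text.txt".format(tag)), "w").close()

strip_prefix_unique(demo)
result = sorted(os.listdir(demo))
print(result)
```

Using os.path.join also avoids the problem of Location + filename producing a wrong path when Location has no trailing separator.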
I am trying to download a huge zip file (~9 GB zipped and ~130 GB unzipped) from an FTP server with Python using the ftplib library. Unfortunately, when I use the retrbinary method, it creates the file in my local directory but does not write anything into it, and after the code has run for a while I get a timeout error. It used to work fine, but it stopped working once I tried to go deeper into the use of sockets with the code below. Because the files I am trying to download are huge, I want more control over the connection to prevent timeout errors during the download. I am not very familiar with sockets, so I may have misused them. I have searched online but did not find any reports of this problem. (I also tried with smaller files as a test and hit the same issues.)
Here are the functions I tried; both have problems (method 1 does not write to the file, method 2 downloads the file but I can't unzip it):
import time
import socket
import ftplib
import threading
from zipfile import ZipFile

# To complete
filename = ''
local_folder = ''
ftp_folder = ''
host = ''
user = ''
mp = ''

# timeout error in method 1
def downloadFile_method_1(filename, local_folder, ftp_folder, host, user, mp):
    try:
        ftp = ftplib.FTP(host, user, mp, timeout=1600)
        ftp.set_debuglevel(2)
    except ftplib.error_perm as error:
        print(error)
    with open(local_folder + '/' + filename, "wb") as f:
        ftp.retrbinary("RETR" + ftp_folder + '/' + filename, f.write)

# method 2 works to download zip file, but header error when unzipping it
def downloadFile_method_2(filename, local_folder, ftp_folder, host, user, mp):
    try:
        ftp = ftplib.FTP(host, user, mp, timeout=1600)
        ftp.set_debuglevel(2)
        sock = ftp.transfercmd('RETR ' + ftp_folder + '/' + filename)
    except ftplib.error_perm as error:
        print(error)
    def background():
        f = open(local_folder + '/' + filename, 'wb')
        while True:
            block = sock.recv(1024*1024)
            if not block:
                break
            f.write(block)
        sock.close()
    t = threading.Thread(target=background)
    t.start()
    while t.is_alive():
        t.join(60)
        ftp.voidcmd('NOOP')

def unzip_file(filename, local_folder):
    local_filename = local_folder + '/' + filename
    with ZipFile(local_filename, 'r') as zipObj:
        zipObj.extractall(local_folder)
And the error I get for method 1:
ftplib.error_temp: 421 Timeout - try typing a little faster next time
And the error I get when I try to unzip after using method 2:
zipfile.BadZipFile: Bad magic number for file header
Also, if anyone could explain what this code does with the socket options, that would be helpful too:
ftp.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
ftp.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 75)
ftp.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
Thanks for your help.
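On the socket options: SO_KEEPALIVE asks the operating system to send TCP keepalive probes on an otherwise idle connection, TCP_KEEPIDLE is the number of idle seconds before the first probe, and TCP_KEEPINTVL is the number of seconds between subsequent probes. The sketch below applies them to a plain, unconnected socket purely for illustration (in the question they are applied to ftp.sock, the FTP control connection). The helper name enable_keepalive is hypothetical, and the hasattr guards are there because the two TCP_* constants are not defined on every platform.

```python
import socket


def enable_keepalive(sock, idle=60, interval=75):
    """Turn on TCP keepalive and, where the platform allows, tune its timing."""
    # Ask the OS to send keepalive probes on an otherwise idle connection
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe (not defined on all platforms)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between subsequent probes (not defined on all platforms)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)


demo_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(demo_sock)
keepalive_on = demo_sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
demo_sock.close()
print(keepalive_on)
```

Keepalive probes work at the TCP level, so they mainly help against routers or firewalls dropping an idle connection. They are not FTP commands, so they would not be expected to reset a server's own idle timer on the control connection, which is what the periodic NOOP in method 2 is for.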
I have been stuck for a couple of days on an issue in my small Address Book project. I have a function that writes all records from a SQLite3 database to a file so that I can open it via the os module, but as soon as I try to open the file, Python gives me the following error:
Error while opening tempfile. Error:startfile: filepath should be string, bytes or os.PathLike, not _io.TextIOWrapper
This is the code I use to write the records to the file and to open it:
source_file_name = open("C:\\workdir\\temp.txt","w")
#Fetching results from database and storing in result variable
self.cur.execute("SELECT id, first_name, last_name, address1, address2, zipcode, city, country, nation, phone1, phone2, email FROM contacts")
result = self.cur.fetchall()
#Writing results into tempfile
source_file_name.write("Stampa Elenco Contatti\n")
for element in result:
    source_file_name.write(str(element[0]) + "|" + str(element[1]) + "|" + str(element[2]) + "|" + str(element[3]) + "|" + str(element[4]) + "|" + str(element[5]) + "|" + \
    str(element[6]) + "|" + str(element[7]) + "|" + str(element[8]) + "|" + str(element[9]) + "|" + str(element[10]) + "|" + str(element[11]) + "\n")
#TODO: Before exiting printing function you MUST:
#      1. filename.close()
#      2. exit to main() function
source_file_name.close()
try:
    os.startfile(source_file_name,"open")
except Exception as generic_error:
    print("Error while opening tempfile. Error:" + str(generic_error))
finally:
    main()
Frankly, I don't understand what this error means. In my previous code snippets I have always handled text files without issues, but I realize this time is different because I am reading my data from a database. Any ideas how to fix it?
Thanks in advance, and sorry for my English...
Your problem ultimately stems from poor variable naming. Here
source_file_name = open("C:\\workdir\\temp.txt","w")
source_file_name does not contain the source file name. It contains the source file itself (i.e., a file handle). You can't give that to os.startfile(), which expects a file path (as the error also says).
What you meant to do is
source_file_name = "C:\\workdir\\temp.txt"
source_file = open(source_file_name,"w")
But in fact, it's much better to use a with block in Python, as this will handle closing the file for you.
It's also better to use a CSV writer instead of creating the CSV manually, and it's highly advisable to set the file encoding explicitly.
import csv
# ...
source_file_name = "C:\\workdir\\temp.txt"
with open(source_file_name, "w", encoding="utf8", newline="") as source_file:
    writer = csv.writer(source_file, delimiter='|')
    source_file.write("Stampa Elenco Contatti\n")
    for record in self.cur.fetchall():
        writer.writerow(record)
    # alternative to the above for loop on one line
    # writer.writerows(self.cur.fetchall())
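For a version that can be run end to end, here is a small sketch that keeps the path and the file object in separate variables. It is written under a few assumptions: export_contacts is a hypothetical helper, the table schema is cut down to three columns, and an in-memory SQLite database stands in for the real one.

```python
import csv
import os
import sqlite3
import tempfile


def export_contacts(cur, out_path):
    """Write all rows of the contacts table to out_path and return the path."""
    cur.execute("SELECT id, first_name, last_name FROM contacts")
    with open(out_path, "w", encoding="utf8", newline="") as out_file:
        out_file.write("Stampa Elenco Contatti\n")
        csv.writer(out_file, delimiter="|").writerows(cur.fetchall())
    return out_path  # a string path, which is what os.startfile() expects


# Demo with an in-memory database and a reduced, hypothetical schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER, first_name TEXT, last_name TEXT)")
conn.execute("INSERT INTO contacts VALUES (1, 'Ada', 'Lovelace')")
exported = export_contacts(conn.cursor(),
                           os.path.join(tempfile.mkdtemp(), "temp.txt"))

with open(exported, encoding="utf8", newline="") as f:
    lines = f.read().splitlines()
print(lines)
```

On Windows, the returned path can then be passed straight to os.startfile(exported, "open"). The call is left out of the sketch because os.startfile does not exist on other platforms.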
I wrote a Python script that takes audio in 30-minute MP3s and slices it into Unix-timestamped, one-second-long files. The source audio files are 192 kbps, 44,100 Hz, stereo MP3 files.
I want it that way for a service that archives audio from a radio station (where I work) and can deliver it to a user over a given start and end time, to the second. We had the server shut down for an hour for maintenance (we try not to, but it happens), and we recorded that hour on a different computer that saved our audio in 30-minute chunks. Normally this archive server saves the second-long chunks itself without issue.
This is the function that does the conversion, given a 30-minute input audio file, the directory to save the output chunks in, and the start time of the file as a Unix timestamp:
def slice_file( infile, workingdir, start ):
    #find the duration of the input clip in milliseconds
    duration_in_milliseconds = len(infile)
    print ("Converting " + working_file + " (", end="", flush=True)
    song = infile
    #grab each one second slice and save it from the first second to the last whole second in the file
    for i in range(0,duration_in_milliseconds,1*one_second):
        #get the folder where this second goes:
        arr = datefolderfromtimestamp( int(start) + (int(i/1000)))
        #print ("Second number: %s \n" % (int(i/1000)) )
        offset = (i + one_second)
        current_second = song[i:offset]
        ensure_dir(working_directory + "/" + arr[0] + "/" + arr[1] + "/" + arr[2] + "/")
        filename = os.path.normpath(working_directory + "/" + arr[0] + "/" + arr[1] + "/" + arr[2] + "/" + str(int(start) + (int(i/1000))) + "-second.mp3")
        current_second.export(filename, format="mp3")
        #indicate some sort of progress is happening by printing a dot every three minutes processed
        if( i % (3*60*one_second) == 0 ):
            print ('.', end="", flush=True)
    print (")")
My issue is that all the second files converted by this script seem to be longer than a second with on average 70 ms of silence at the start of them. When I download files from my archiver server it gives me all the files concatenated together, so it sounds terrible and glitchy.
Can someone help me out? I'm not sure where this error is coming from.
My full script if you're curious:
http://pastebin.com/fy8EkVSz
Update: I found the source of this: LAME adds padding to the start of each file it encodes.
See: http://lame.sourceforge.net/tech-FAQ.txt
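A rough back-of-the-envelope sketch of why a one-second MP3 cannot be exactly one second long: MPEG-1 Layer III audio is stored in frames of 1152 samples, so a clip has to be padded up to a whole number of frames, and any delay the encoder adds at the start comes on top of that. The function name encoded_length is made up for illustration, and the encoder delay is left as a parameter because its exact value for a given LAME build is an assumption here.

```python
import math

SAMPLES_PER_FRAME = 1152  # samples per MPEG-1 Layer III frame


def encoded_length(n_samples, encoder_delay=0):
    """Return (frames, samples) needed to hold n_samples plus a leading delay.

    encoder_delay is the number of padding samples the encoder prepends; the
    exact value for a given LAME build is an assumption left to the caller.
    """
    frames = math.ceil((n_samples + encoder_delay) / SAMPLES_PER_FRAME)
    return frames, frames * SAMPLES_PER_FRAME


sample_rate = 44100
frames, samples = encoded_length(sample_rate)  # one second, no encoder delay
extra_ms = (samples - sample_rate) / sample_rate * 1000
print(frames, samples, round(extra_ms, 1))
```

Frame rounding alone accounts for only part of the roughly 70 ms observed. The rest would come from the encoder and decoder delay described in the FAQ linked above. Exporting the one-second slices in an uncompressed format such as WAV, or encoding only after concatenation, would avoid the per-file padding.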