New to this, so apologies. I am trying to download some files using the requests module in Python (3.7), and I am getting a syntax error on the iter_content line that I cannot work out. I am drawing a lot from Automate the Boring Stuff. Here is the relevant block:
# Collecting a list of the relevant lecture files to download:
lectureList = []
for item in elems:
    if '/lectures/esm' in str(item):
        urlIWant = item.get('href')
        finalUrl = 'https://nworbmot.org/courses/esm-2020' + urlIWant[1:]
        lectureList.append(finalUrl)

# Download lecture PDFs to 'Lectures' folder
os.makedirs('Lectures', exist_ok = True)
print('Downloading Lecture Files...')
for lectureUrl in lectureList:
    res = requests.get(lectureUrl)
    print(f'Downloading file: {lectureUrl}')
    res.raise_for_status()
    downloadFile = open(os.path.join('Lectures', os.path.basename(lectureUrl), 'wb')
    for chunk in res.iter_content(chunk_size=10000): #This is the line that gets the syntax error
        downloadFile.write(chunk)
    downloadFile.close()
print('Done')
It is the fourth-from-last line that is throwing the error. I have tried removing and changing chunk_size, adjusting the indentation, and checking for typos. I am clearly missing something... thanks.
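Judging from the block as posted, a likely cause is the line just before the for loop: the closing parenthesis after 'wb' ends os.path.join(...), so open( is never closed and 'wb' becomes an argument to os.path.join. Python keeps reading and only reports the problem when it reaches the for line. Below is a minimal sketch of the write step with the parentheses balanced. save_chunks is a hypothetical helper name (not from the question), and it accepts any iterable of byte chunks, such as res.iter_content(chunk_size=10000).

```python
import os


def save_chunks(chunks, folder, url):
    """Write an iterable of byte chunks to folder/<basename of url>; return the path."""
    os.makedirs(folder, exist_ok=True)
    # os.path.join(...) is closed before the mode argument, so 'wb' goes to open().
    target = os.path.join(folder, os.path.basename(url))
    # The with block closes the file even if a write fails.
    with open(target, 'wb') as download_file:
        for chunk in chunks:
            download_file.write(chunk)
    return target
```

In the question's loop this would be called as save_chunks(res.iter_content(chunk_size=10000), 'Lectures', lectureUrl) after res.raise_for_status().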
# ***** Reading the Data ********
if processed_first:
    # Reading all the features and labels for this chunk
    shared_list = []
    p = threading.Thread(target=read_lab_fea, args=(cfg_file, is_production, shared_list, output_folder))
    p.start()
    p.join()
    data_name = shared_list[0]
    data_end_index = shared_list[1]
    fea_dict = shared_list[2]
    lab_dict = shared_list[3]
    arch_dict = shared_list[4]
    data_set = shared_list[5]
First, I ran Kaldi's run.sh file.
Before doing that, I edited the contents of cmd.sh:
Original --> call queue.pl
Changed to --> call run.pl
I made this change because I hit a bug when running the original source.
Reference : https://www.google.com/url?q=https://groups.google.com/g/kaldi-help/c/tokwXTLdGFY?pli%3D1&sa=D&source=editors&ust=1631002151871000&usg=AOvVaw1FYQHJEmI-kkAAeAB2tcKt
I found that fea_dict and lab_dict in data_io.py have no shared elements. How can I proceed with the TIMIT tutorial experiments?
I am running the experiment with the cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg file, changing only the absolute directory paths for my Linux machine.
I referred to https://github.com/mravanelli/pytorch-kaldi/issues/185
and ran copy-feats.
I saw --> kaldierror::KaldiFatalError
I am currently working with several XML files that require the text of the element mods:namePart changed. I have created a script that should loop through all the XML files I have specified in a particular directory and make the intended changes. However, when I run the script the changes are not reflected in the new files. It executes as expected, and I even get the "namepart changed" output in my console, but the text I want to replace remains the same. PLEASE HELP!! I am extremely new to coding so any tips/comments are welcome. Here is the code I'm using:
list_of_files = glob.glob('/Users/#####/Documents/test_xml_files/*.xml')
for file in list_of_files:
    xmlObject = ET.parse(file)
    root = xmlObject.getroot()
    namespaces = {'mods':'http://www.loc.gov/mods/v3'}
    for namePart in root.iterfind('mods:name/mods:namePart', namespaces):
        if namePart.text == 'Tsukioka, Kōgyo, 1869-1927':
            new_namePart = namePart.text.replace('Tsukioka, Kōgyo, 1869-1927', 'Tsukioka Kōgyo, 1869-1927', 1)
            namePart.text == new_namePart
            print('namepart changed')
        else:
            continue
    nf = open(os.path.join('/Users/####/Documents/updated_test_directory', os.path.basename(file)), 'wb')
    xmlString = ET.tostring(root, encoding="utf-8", method="xml", xml_declaration=None)
    nf.write(xmlString)
    nf.close()
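One thing that stands out in the code as posted: namePart.text == new_namePart uses ==, which only compares the two values and throws the result away, so the element is never modified even though the print still runs. A single = performs the assignment. The sketch below shows the difference on a small inline document. It assumes the standard library's xml.etree.ElementTree (the question does not show what ET is imported from), and rename_name_parts and SAMPLE are hypothetical names made up for the illustration.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {'mods': 'http://www.loc.gov/mods/v3'}

# Hypothetical one-record document standing in for one of the real XML files.
SAMPLE = (
    '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">'
    '<mods:name><mods:namePart>Tsukioka, Kōgyo, 1869-1927</mods:namePart></mods:name>'
    '</mods:mods>'
)


def rename_name_parts(root, old, new):
    """Assign `new` to every mods:namePart whose text equals `old`; return the count."""
    changed = 0
    for name_part in root.iterfind('mods:name/mods:namePart', NAMESPACES):
        if name_part.text == old:
            name_part.text = new  # '=' assigns; '==' would only compare
            changed += 1
    return changed


root = ET.fromstring(SAMPLE)
count = rename_name_parts(root, 'Tsukioka, Kōgyo, 1869-1927', 'Tsukioka Kōgyo, 1869-1927')
updated_text = root.find('mods:name/mods:namePart', NAMESPACES).text
```

Returning a count rather than printing inside the loop makes it easier to check that a change actually happened before the file is written out.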
I am trying to download a huge zip file (~9 GB zipped and ~130 GB unzipped) from an FTP server with Python using the ftplib library. Unfortunately, when I use the retrbinary method, it creates the file in my local directory but does not write anything into it. After the code has run for a while, I get a timeout error. It used to work fine, but since I tried to go deeper into the use of sockets with the code below, it no longer works. Because the files I am trying to download are huge, I want more control over the connection to prevent timeout errors during the download. I am not very familiar with sockets, so I may have misused them. I have been searching online but did not find any problems like this. (I also tried with smaller files as a test, but I still have the same issues.)
Here are the functions I tried, but both have problems (method 1 does not write to the file; method 2 downloads the file but I can't unzip it):
import time
import socket
import ftplib
import threading

# To complete
filename = ''
local_folder = ''
ftp_folder = ''
host = ''
user = ''
mp = ''

# timeout error in method 1
def downloadFile_method_1(filename, local_folder, ftp_folder, host, user, mp):
    try:
        ftp = ftplib.FTP(host, user, mp, timeout=1600)
        ftp.set_debuglevel(2)
    except ftplib.error_perm as error:
        print(error)
    with open(local_folder + '/' + filename, "wb") as f:
        ftp.retrbinary("RETR" + ftp_folder + '/' + filename, f.write)

# method 2 works to download zip file, but header error when unzipping it
def downloadFile_method_2(filename, local_folder, ftp_folder, host, user, mp):
    try:
        ftp = ftplib.FTP(host, user, mp, timeout=1600)
        ftp.set_debuglevel(2)
        sock = ftp.transfercmd('RETR ' + ftp_folder + '/' + filename)
    except ftplib.error_perm as error:
        print(error)

    def background():
        f = open(local_folder + '/' + filename, 'wb')
        while True:
            block = sock.recv(1024*1024)
            if not block:
                break
            f.write(block)
        sock.close()

    t = threading.Thread(target=background)
    t.start()
    while t.is_alive():
        t.join(60)
        ftp.voidcmd('NOOP')

def unzip_file(filename, local_folder):
    local_filename = local_folder + '/' + filename
    with ZipFile(local_filename, 'r') as zipObj:
        zipObj.extractall(local_folder)
And the error I get for method 1:
ftplib.error_temp: 421 Timeout - try typing a little faster next time
And the error I get when I try to unzip after using method 2:
zipfile.BadZipFile: Bad magic number for file header
Also, if anyone could explain what the following setsockopt calls do, that would be helpful too:
ftp.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
ftp.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 75)
ftp.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
Thanks for your help.
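On the setsockopt part: the three calls configure TCP keepalive on ftp.sock, which in ftplib is the socket of the control connection. SO_KEEPALIVE switches keepalive probing on, TCP_KEEPIDLE is the number of idle seconds before the first probe is sent, and TCP_KEEPINTVL is the number of seconds between subsequent probes. The sketch below applies the same options to a plain, unconnected socket so it can be tried without an FTP server. enable_keepalive is a hypothetical helper name, and the hasattr checks are there because the two TCP_ constants are not defined on every platform.

```python
import socket


def enable_keepalive(sock, idle_seconds=60, interval_seconds=75):
    """Turn on TCP keepalive probes for `sock` using the values from the question."""
    # Ask the OS to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe (constant missing on some platforms).
    if hasattr(socket, 'TCP_KEEPIDLE'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    # Seconds between probes once probing has started.
    if hasattr(socket, 'TCP_KEEPINTVL'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With ftplib the call would be enable_keepalive(ftp.sock) after connecting. Keepalive probes work at the TCP level, so they are a different mechanism from the NOOP commands sent in method 2.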
I've been trying to use the Mafft alignment tool from Bio.Align.Applications. Currently, I've had success writing my sequence information out to temporary text files that are then read by MafftCommandline(). However, I'd like to avoid redundant steps as much as possible, so I've been trying to write to a memory file instead using io.StringIO(). This is where I've been having problems. I can't get MafftCommandline() to read internal files made by io.StringIO(). I've confirmed that the internal files are compatible with functions such as AlignIO.read(). The following is my test code:
from Bio.Align.Applications import MafftCommandline
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import io
from Bio import AlignIO
sequences1 = ["AGGGGC",
              "AGGGC",
              "AGGGGGC",
              "AGGAGC",
              "AGGGGG"]
longest_length = max(len(s) for s in sequences1)
padded_sequences = [s.ljust(longest_length, '-') for s in sequences1] #padded sequences used to test compatibility with AlignIO
ioSeq = ''
for items in padded_sequences:
    ioSeq += '>unknown\n'
    ioSeq += items + '\n'
newC = io.StringIO(ioSeq)
cLoc = str(newC).strip()
cLocEdit = cLoc[:len(cLoc)] #create string to remove < and >
test1Handle = AlignIO.read(newC, "fasta")
#test1HandleString = AlignIO.read(cLocEdit, "fasta") #fails to interpret cLocEdit string
records = (SeqRecord(Seq(s)) for s in padded_sequences)
SeqIO.write(records, "msa_example.fasta", "fasta")
test1Handle1 = AlignIO.read("msa_example.fasta", "fasta") #alignIO same for both #demonstrates working AlignIO
in_file = '.../msa_example.fasta'
mafft_exe = '/usr/local/bin/mafft'
mafft_cline = MafftCommandline(mafft_exe, input=in_file) #have to change file path
mafft_cline1 = MafftCommandline(mafft_exe, input=cLocEdit) #fails to read string (same as AlignIO)
mafft_cline2 = MafftCommandline(mafft_exe, input=newC)
stdout, stderr = mafft_cline()
print(stdout) #corresponds to MafftCommandline with input file
stdout1, stderr1 = mafft_cline1()
print(stdout1) #corresponds to MafftCommandline with internal file
I get the following error messages:
ApplicationError: Non-zero return code 2 from '/usr/local/bin/mafft <_io.StringIO object at 0x10f439798>', message "/bin/sh: -c: line 0: syntax error near unexpected token `newline'"
I believe this is caused by the arrows ('<' and '>') present in the file path.
ApplicationError: Non-zero return code 1 from '/usr/local/bin/mafft "_io.StringIO object at 0x10f439af8"', message '/usr/local/bin/mafft: Cannot open _io.StringIO object at 0x10f439af8.'
Attempting to remove the arrows by converting the file path to a string and indexing resulted in the above error.
Ultimately my goal is to reduce computation time. I hope to accomplish this by calling internal memory instead of writing out to a separate text file. Any advice or feedback regarding my goal is much appreciated. Thanks in advance.
"I can't get MafftCommandline() to read internal files made by io.StringIO()."
This is not surprising for a couple of reasons:
1. As you're aware, Biopython doesn't implement Mafft, it simply provides a convenient interface to set up a call to mafft in /usr/local/bin. The mafft executable runs as a separate process that does not have access to your Python program's internal memory, including your StringIO file.
2. The mafft program only works with an input file, it doesn't even allow stdin as a data source. (Though it does allow stdout as a data sink.) So ultimately, there must be a file in the file system for mafft to open. Thus the need for your temporary file.
Perhaps tempfile.NamedTemporaryFile() or tempfile.mkstemp() might be a reasonable compromise.
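As a rough sketch of the tempfile suggestion: the sequences are written to a named temporary file, and the resulting path is what would be handed to the external program. write_temp_fasta and the seq0, seq1, ... record labels are hypothetical names chosen for the illustration. The sketch stops short of calling mafft, so it only demonstrates the file handling.

```python
import os
import tempfile


def write_temp_fasta(sequences):
    """Write sequences to a temporary FASTA file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so a separate process can open it by name.
    with tempfile.NamedTemporaryFile('w', suffix='.fasta', delete=False) as handle:
        for index, sequence in enumerate(sequences):
            handle.write(f'>seq{index}\n{sequence}\n')
        return handle.name


fasta_path = write_temp_fasta(["AGGGGC", "AGGGC", "AGGGGGC"])
with open(fasta_path) as handle:
    fasta_text = handle.read()
os.remove(fasta_path)  # clean up once the external tool has finished with the file
```

In the question's setup, fasta_path would take the place of in_file in MafftCommandline(mafft_exe, input=in_file), with the os.remove call moved to after the alignment has run.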
My main task is to have the user press a Download button and download the file "A.zip" from the query directory.
The reason I have an elif request.POST... branch is that another condition checks whether the "Execute" button was pressed. That Execute button runs a script. Both POST actions work, and dir_file is C:\Data\Folder.
I have followed and read many tutorials and answers on how to download a file from Django, and I cannot figure out why my simple code does not download the file.
What am I missing? The code does not return any errors. Does anybody have documentation that explains what I am doing wrong?
I am expecting an automatic download of the file, but it does not occur.
elif request.POST['action'] == 'Download':
    query = request.POST['q']
    dir_file = query + "A.zip"
    zip_file = open(dir_file, 'rb')
    response = HttpResponse(zip_file, content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=%s' % 'foo_zip'
    zip_file.close()
I found my answer.
After reading through a lot of documentation, I realized I had left out the most important part of this feature, which is the URL.
Basically, the POST handler redirects to a URL that calls the function download_zip, which returns the zip file as a download.
Here is what I ended up doing:
elif request.POST['action'] == 'Download':
    return(HttpResponseRedirect('/App/download'))
Created a view:
def download_zip(request):
    zip_path = root + "A.zip"
    zip_file = open(zip_path, 'rb')
    response = HttpResponse(zip_file, content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=%s' % 'A.zip'
    response['Content-Length'] = os.path.getsize(zip_path)
    zip_file.close()
    return response
Finally in urls.py:
url(r'^download/$', views.download_zip, name='download_zip'),