Django file object always 0 bytes when uploaded from python requests - python-3.x

I have been trying to upload a file to Django REST framework using python requests. I PUT the file, along with some other data, to the server:
r = self.session.put(
    f"{hello_url}/shadow_pbem/savefile_api/",
    files=test_files,
    data={"hash": test_file_hash, "leader": 78},
    headers=good_token_header,
)
I get a 200 response, the model saves all the data correctly as expected, including a correctly named save file in /media, except the save file in /media is always 0 bytes.
This is how I create the file object...
with open(testfile_path, "rb") as testfile:
...and verify the length, which is not 0:
    testfile.seek(0, os.SEEK_END)
    filesize = testfile.tell()
I create the files object for upload...
    test_files = {
        "file": ("testfile.zip", testfile, "application/zip")
    }
I put some code in the view to verify, and the file object in the view is there, but it is 0 bytes.
Here is the relevant part of the view. It seems to work fine, but all files are 0 bytes.
class SaveFileUploadView(APIView):
    parser_class = (FileUploadParser,)

    def put(self, request):
        if "file" not in request.data:
            raise ParseError("Empty content")
        f = request.data["file"]
        print(f"file {f} size:{f.size}")
        # prints file testfile.zip size:0
        # rest of view works fine...
I have tried with various files and formats, also using post. Files are always 0 bytes.
Any help appreciated, I am going crazy...

If you do
testfile.seek(0, os.SEEK_END)
filesize = testfile.tell()
as you say, you'll also need to rewind back to the start – otherwise there are indeed zero bytes left for Requests to read.
testfile.seek(0)
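Put together, a minimal sketch of the fix, using an in-memory io.BytesIO with a hypothetical 14-byte payload as a stand-in for the real open file handle:

```python
import io

# Stand-in for the open file handle from the question (hypothetical bytes).
testfile = io.BytesIO(b"fake zip bytes")

# Measuring the size moves the read position to the end...
testfile.seek(0, io.SEEK_END)
filesize = testfile.tell()

# ...so rewind before handing the handle to requests, otherwise it
# reads from the end and uploads 0 bytes.
testfile.seek(0)

test_files = {"file": ("testfile.zip", testfile, "application/zip")}
# self.session.put(..., files=test_files, ...) would now send all the bytes
print(filesize)  # 14
```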

Related

Vertex AI scheduled notebooks doesn't recognize existence of folders

I have a managed jupyter notebook in Vertex AI that I want to schedule. The notebook works just fine as long as I start it manually, but as soon as it is scheduled, it fails. There are in fact many things that go wrong when scheduled, some of them are fixable. Before explaining what my trouble is, let me first give some details of the context.
The notebook gathers information from an API for several stores and saves the data in different folders before processing it, saving csv-files to store-specific folders and to bigquery. So, in the location of the notebook, I have:
The notebook
Functions needed for the handling of data (as *.py files)
A series of folders, some of which have subfolders which also have subfolders
When I execute this manually, no problem. Everything works well and all files end up exactly where they should, as well as in different bigQuery tables.
However, when scheduling the execution of the notebook, everything goes wrong. First, the *.py files cannot be imported; no problem, I added the functions directly to the notebook.
Now, the following error is where I am at a loss, because I have no idea why it happens or how to fix it. The code that leads to the error is the following:
internal = "https://api.************************"
df_descriptions = []
storess = internal
response_stores = requests.get(storess, auth=HTTPBasicAuth(userInternal, keyInternal))
pathlib.Path("stores/request_1.json").write_bytes(response_stores.content)

filepath = "stores"
files = os.listdir(filepath)
for file in files:
    with open(filepath + "/" + file) as json_string:
        jsonstr = json.load(json_string)
        information = pd.json_normalize(jsonstr)
        df_descriptions.append(information)

StoreINFO = pd.concat(df_descriptions)
StoreINFO = StoreINFO.dropna()
StoreINFO = StoreINFO[StoreINFO['storeIdMappings'].map(lambda d: len(d)) > 0]
cloud_store_ids = list(set(StoreINFO.cloudStoreId))
LastWeek = datetime.date.today() - timedelta(days=2)
LastWeek = np.datetime64(LastWeek)
and the error reported is:
FileNotFoundError Traceback (most recent call last)
/tmp/ipykernel_165/2970402631.py in <module>
5 storess = internal
6 response_stores = requests.get(storess,auth = HTTPBasicAuth(userInternal, keyInternal))
----> 7 pathlib.Path("stores/request_1.json").write_bytes(response_stores.content)
8
9 filepath = "stores"
/opt/conda/lib/python3.7/pathlib.py in write_bytes(self, data)
1228 # type-check for the buffer interface before truncating the file
1229 view = memoryview(data)
-> 1230 with self.open(mode='wb') as f:
1231 return f.write(view)
1232
/opt/conda/lib/python3.7/pathlib.py in open(self, mode, buffering, encoding, errors, newline)
1206 self._raise_closed()
1207 return io.open(self, mode, buffering, encoding, errors, newline,
-> 1208 opener=self._opener)
1209
1210 def read_bytes(self):
/opt/conda/lib/python3.7/pathlib.py in _opener(self, name, flags, mode)
1061 def _opener(self, name, flags, mode=0o666):
1062 # A stub for the opener argument to built-in open()
-> 1063 return self._accessor.open(self, flags, mode)
1064
1065 def _raw_open(self, flags, mode=0o777):
FileNotFoundError: [Errno 2] No such file or directory: 'stores/request_1.json'
I would gladly find another way to do this, for instance by using GCS buckets, but my issue is the existence of sub-folders. There are many stores and I do not wish to do this operation manually because some retailers for which I am doing this have over 1000 stores. My python code generates all these folders and as I understand it, this is not feasible in GCS.
How can I solve this issue?
GCS uses a flat namespace, so folders don't actually exist, but they can be simulated as described in this documentation. For your requirement, you can either use an absolute path (starting with "/" -- not relative) or create the "stores" directory first (with "mkdir"). For more information you can check this blog.
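A minimal sketch of the second option, assuming the scheduled run starts from a working directory where "stores" does not yet exist (the payload bytes here are placeholders, not the real API response):

```python
import pathlib

# Create the folder (and any missing parents) before writing into it;
# exist_ok avoids an error when it is already there.
stores_dir = pathlib.Path("stores")
stores_dir.mkdir(parents=True, exist_ok=True)

# The write that previously raised FileNotFoundError now succeeds.
target = stores_dir / "request_1.json"
target.write_bytes(b'{"placeholder": true}')
print(target.exists())  # True
```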

Rails 5.2 Rest API + Active Storage - Upload file blob received from an external service

We are receiving a POST call from an external service, which contains the file blob (in Base64 encoding), and some other parameters.
# POST call to /document/:id/document_data
param = {
    file: <base64 encoded file blob>
}
We would want to process the file and upload it to the following model
# MODELS
# document.rb
class Document < ApplicationRecord
  has_one_attached :file
end
In the Controller method handling the POST call
# documents_controller.rb - this method handles POST calls on /document/:id/document_data
def document_data
  # Process the file: decode the base64 encoded blob
  @decoded_file = Base64.decode64(params["file"])
  @filename = "document_data.pdf" # used to create the tempfile and as the attachment filename
  @tmp_file = Tempfile.new(@filename) # when a Tempfile object is garbage collected, or when the Ruby interpreter exits, its associated temporary file is automatically deleted
  @tmp_file.binmode # write the file in binary mode
  @tmp_file.write @decoded_file
  @tmp_file.rewind

  # We create a new model instance
  @document = Document.new
  @document.file.attach(io: @tmp_file, filename: @filename) # attach the in-memory file, using the filename defined above
  @document.save

  @tmp_file.unlink # deletes the temp file
end
Hope this helps.
More about Tempfile can be found here.

MafftCommandline and io.StringIO

I've been trying to use the Mafft alignment tool from Bio.Align.Applications. Currently, I've had success writing my sequence information out to temporary text files that are then read by MafftCommandline(). However, I'd like to avoid redundant steps as much as possible, so I've been trying to write to a memory file instead using io.StringIO(). This is where I've been having problems. I can't get MafftCommandline() to read internal files made by io.StringIO(). I've confirmed that the internal files are compatible with functions such as AlignIO.read(). The following is my test code:
from Bio.Align.Applications import MafftCommandline
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import io
from Bio import AlignIO
sequences1 = ["AGGGGC",
              "AGGGC",
              "AGGGGGC",
              "AGGAGC",
              "AGGGGG"]
longest_length = max(len(s) for s in sequences1)
padded_sequences = [s.ljust(longest_length, '-') for s in sequences1]  # padded sequences used to test compatibility with AlignIO

ioSeq = ''
for items in padded_sequences:
    ioSeq += '>unknown\n'
    ioSeq += items + '\n'
newC = io.StringIO(ioSeq)
cLoc = str(newC).strip()
cLocEdit = cLoc[:len(cLoc)]  # create string to remove < and >

test1Handle = AlignIO.read(newC, "fasta")
#test1HandleString = AlignIO.read(cLocEdit, "fasta")  # fails to interpret cLocEdit string
records = (SeqRecord(Seq(s)) for s in padded_sequences)
SeqIO.write(records, "msa_example.fasta", "fasta")
test1Handle1 = AlignIO.read("msa_example.fasta", "fasta")  # demonstrates working AlignIO

in_file = '.../msa_example.fasta'
mafft_exe = '/usr/local/bin/mafft'
mafft_cline = MafftCommandline(mafft_exe, input=in_file)  # have to change file path
mafft_cline1 = MafftCommandline(mafft_exe, input=cLocEdit)  # fails to read string (same as AlignIO)
mafft_cline2 = MafftCommandline(mafft_exe, input=newC)
stdout, stderr = mafft_cline()
print(stdout)   # corresponds to MafftCommandline with input file
stdout1, stderr1 = mafft_cline1()
print(stdout1)  # corresponds to MafftCommandline with internal file
I get the following error messages:
ApplicationError: Non-zero return code 2 from '/usr/local/bin/mafft <_io.StringIO object at 0x10f439798>', message "/bin/sh: -c: line 0: syntax error near unexpected token `newline'"
I believe this results due to the arrows ('<' and '>') present in the file path.
ApplicationError: Non-zero return code 1 from '/usr/local/bin/mafft "_io.StringIO object at 0x10f439af8"', message '/usr/local/bin/mafft: Cannot open _io.StringIO object at 0x10f439af8.'
Attempting to remove the arrows by converting the file path to a string and indexing resulted in the above error.
Ultimately my goal is to reduce computation time. I hope to accomplish this by calling internal memory instead of writing out to a separate text file. Any advice or feedback regarding my goal is much appreciated. Thanks in advance.
I can't get MafftCommandline() to read internal files made by io.StringIO().

This is not surprising, for a couple of reasons:

As you're aware, Biopython doesn't implement Mafft; it simply provides a convenient interface to set up a call to the mafft executable in /usr/local/bin. The mafft executable runs as a separate process that does not have access to your Python program's internal memory, including your StringIO file.

The mafft program only works with an input file; it doesn't even allow stdin as a data source. (Though it does allow stdout as a data sink.) So ultimately, there must be a file in the file system for mafft to open. Thus the need for your temporary file.
Perhaps tempfile.NamedTemporaryFile() or tempfile.mkstemp() might be a reasonable compromise.
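For example, a sketch of the tempfile.NamedTemporaryFile() route using the padded sequences from the question: the FASTA text is written to a named file on disk, and the resulting path (tmp.name) is what would be passed as input= to MafftCommandline. The Mafft call itself is omitted here, since it needs Biopython and a local mafft install.

```python
import os
import tempfile

# The five sequences from the question, padded to the longest length.
padded_sequences = ["AGGGGC-", "AGGGC--", "AGGGGGC", "AGGAGC-", "AGGGGG-"]

# delete=False keeps the file on disk after the with-block, so the
# external mafft process can open it by path; remove it when done.
with tempfile.NamedTemporaryFile(mode="w", suffix=".fasta", delete=False) as tmp:
    for seq in padded_sequences:
        tmp.write(">unknown\n")
        tmp.write(seq + "\n")
    fasta_path = tmp.name

# fasta_path is a real filesystem path, so unlike a StringIO object it
# can be used as: MafftCommandline(mafft_exe, input=fasta_path)
print(os.path.exists(fasta_path))  # True

os.remove(fasta_path)  # clean up when finished
```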

Golang excel file reading

I'm using the tealeg xlsx library to read an excel file https://github.com/tealeg/xlsx . They have documentation here https://godoc.org/github.com/tealeg/ . It works perfectly fine if I call OpenFile() with a local path, but I want to use the http.Request.FormFile() return value, which is of type multipart.File. How do I get the tealeg package to read this file?
Tealeg's OpenReaderAt() looks like something I should use, but the multipart.File returned from http.Request.FormFile() is a file interface and I'm not sure how to get at the io.ReaderAt? https://golang.org/pkg/mime/multipart/#File
func OpenReaderAt(r io.ReaderAt, size int64) (*File, error)
xlsx.OpenReaderAt takes an io.ReaderAt interface, and multipart.File also implements io.ReaderAt, so you can pass it to xlsx.OpenReaderAt directly:
file, header, err := req.FormFile("key")
if err != nil {
    // handle the error
}
defer file.Close()

// The upload size is available on the returned *multipart.FileHeader (Go 1.9+)
wb, err := xlsx.OpenReaderAt(file, header.Size)

JAudioTagger Deleting First Few Seconds of Track

I've written a simple Groovy script (below) to set the values of four of the ID3v1 and ID3v2 tag fields in mp3 files using the JAudioTagger library. The script successfully makes the changes but it also deletes the first 5 to 10 seconds of some of the files, other files are unaffected. It's not a big problem, but if anyone knows a simple fix, I would be grateful. All the files are from the same source, all have v1 and v2 tags, I can find no obvious difference in the source files to explain it.
import org.jaudiotagger.*

java.util.logging.Logger.getLogger("org.jaudiotagger").setLevel(java.util.logging.Level.OFF)

Integer trackNum = 0
Integer totalFiles = 0
Integer invalidFiles = 0
validMP3File = true
def dir = new File(/D:\Users\Jeremy\Music\Speech Radio\Unlistened\Z Temp Files to MP3 Tagged/)
dir.eachFile({curFile ->
    totalFiles++
    try {
        mp3File = org.jaudiotagger.audio.AudioFileIO.read(curFile)
    } catch (org.jaudiotagger.audio.exceptions.CannotReadException e) {
        validMP3File = false
        invalidFiles++
    }

    // Get the file name excluding the extension
    baseFilename = org.jaudiotagger.audio.AudioFile.getBaseFilename(curFile)

    // Check that it is an MP3 file
    if (validMP3File) {
        if (mp3File.getAudioHeader().getEncodingType() != 'mp3') {
            validMP3File = false
            invalidFiles++
        }
    }

    if (validMP3File) {
        trackNum++
        if (mp3File.hasID3v1Tag()) {
            curTagv1 = mp3File.getID3v1Tag()
        } else {
            curTagv1 = new org.jaudiotagger.tag.id3.ID3v1Tag()
        }
        if (mp3File.hasID3v2Tag()) {
            curTagv2 = mp3File.getID3v2TagAsv24()
        } else {
            curTagv2 = new org.jaudiotagger.tag.id3.ID3v23Tag()
        }
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.TITLE, baseFilename)
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.TITLE, baseFilename)
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.ARTIST, "BBC Radio")
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.ARTIST, "BBC Radio")
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.ALBUM, "BBC Radio - 20130205")
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.ALBUM, "BBC Radio - 20130205")
        curTagv1.setField(org.jaudiotagger.tag.FieldKey.TRACK, trackNum.toString())
        curTagv2.setField(org.jaudiotagger.tag.FieldKey.TRACK, trackNum.toString())
        mp3File.setID3v1Tag(curTagv1)
        mp3File.setID3v2Tag(curTagv2)
        mp3File.save()
    }
})
println """$trackNum tracks created from $totalFiles files with $invalidFiles invalid files"""
println """$trackNum tracks created from $totalFiles files with $invalidFiles invalid files"""
I'm still investigating and it appears that there is no problem with JAudioTagger. Before setting the tags, I use Total Recorder to reduce the quality of the download from 128kbps, 44,100Hz to 56kbps, 22,050Hz. This reduces the file size to less than half and the quality is fine for speech radio.
If I run my script on the original files, none of the audio track is deleted. The deletion of the first part of the audio track only occurs with the files that have been processed by Total Recorder.
Looking at the JAudioTagger logging for these files, there does appear to be a problem with the header:
Checking further because the ID3 Tag ends at 0x23f9 but the mp3 audio doesnt start until 0x7a77
Confirmed audio starts at 0x7a77 whether searching from start or from end of ID3 tag
This check is not performed for files that have not been processed by Total Recorder.
The log of the header read operation also shows (for a 27 minute track):
trackLength:06:52
It looks as though I shall have to find a new MP3 file editor!
Instead of
mp3File.save()
could you try:
mp3File.commit()
No idea if it will help, but that seems to be the documented method?