Why is my attempt to manipulate this 1GB file in Node.js deleting its contents?

I'm simply trying to overwrite the contents of a pre-generated 1GB file (written with allocUnsafe(size)) with a 4-byte buffer at an incrementing offset. Before I open the file descriptor, fs.stat and the Windows file system show the correct size. As soon as I open the file descriptor, it appears, both in fs.stat and in the file system, that the file is empty:
let stats = fs.statSync(dataPath)
let fileSizeInBytes = stats["size"]
let fileSizeInMegabytes = fileSizeInBytes / 1000000
console.log("fileSizeInMegabytes", fileSizeInMegabytes) // => fileSizeInMegabytes 1000
fd = fs.openSync(dataPath, 'w')
stats = fs.statSync(dataPath)
fileSizeInBytes = stats["size"]
fileSizeInMegabytes = fileSizeInBytes / 1000000
console.log("fileSizeInMegabytes", fileSizeInMegabytes) // => fileSizeInMegabytes 0
Why is opening the file descriptor emptying my file? Surely I'm missing something obvious, but I can't see it.

Opening the file using the w flag truncates the file, i.e. removes any contents.
You should use r+ to read and write to the file without wiping it clean.
For more info, check out the Node docs and the answers on this question.
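For example, a minimal sketch of the r+ approach, reusing dataPath from the question (the offset and payload values here are illustrative only):
const fs = require('fs')

// 'r+' opens the file for reading and writing without truncating it.
const fd = fs.openSync(dataPath, 'r+')

// Write a 4-byte buffer at a chosen byte offset within the file.
const buf = Buffer.alloc(4)
buf.writeUInt32LE(1234, 0)      // illustrative payload
const offset = 4 * 1000         // illustrative target position
fs.writeSync(fd, buf, 0, buf.length, offset)

fs.closeSync(fd)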

Related

log line number in vim whenever a line is deleted

I have an application that generates a txt file with thousands of lines. I have to delete some lines manually by going through the file (using vim). However, I might need to generate the same file again if a change in format is required. That will make me go through the file again to delete the same lines.
The solution to avoid deleting manually repeatedly is that vim somehow logs the line number when I delete a line. I can then use some script to remove those lines. Is it possible to get this behavior in vim?
Otherwise, is there any other editor to get this behavior? There are many lines I have to delete and it's not feasible for me to log each line number manually.
As suggested by phd and wxz, I was able to use git diff on the file to extract the deleted lines, using the node package gitdiff-parser to parse the diff.
const gitDiffParser = require('gitdiff-parser')
const { exec } = require("child_process");

// Run git diff with zero lines of context so each hunk maps exactly
// to the deleted/changed lines.
let p = new Promise((res, rej) => {
    exec("git diff -U0 file.txt", (error, stdout) => {
        res(stdout)
    });
});

p.then(s => {
    const diff = gitDiffParser.parse(s);
    diff[0].hunks.forEach(element => {
        console.log(`start: ${element.oldStart}, end: ${element.oldStart + element.oldLines - 1}`)
    });
})
Another solution, more of a hack, was to append the line number to each line of the file and then extract the surviving line numbers after removing the required lines, as sketched below.
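A minimal sketch of that hack (the file names are hypothetical; the numbering would be stripped again once the kept line numbers have been recorded):
const fs = require('fs')

// Prefix every line with its 1-based line number before editing.
const numbered = fs.readFileSync('file.txt', 'utf8')
    .split('\n')
    .map((line, i) => `${i + 1}\t${line}`)
    .join('\n')
fs.writeFileSync('file_numbered.txt', numbered)

// After deleting lines by hand, collect the line numbers that survived.
const kept = fs.readFileSync('file_numbered.txt', 'utf8')
    .split('\n')
    .filter(Boolean)
    .map(line => Number(line.split('\t')[0]))
console.log('kept line numbers:', kept)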

Django file object always 0 bytes when uploaded from python requests

I have been trying to upload a file to Django REST using Python requests.
I PUT the file, along with some other data, to the server.
r = self.session.put(
    f"{hello_url}/shadow_pbem/savefile_api/",
    files=test_files,
    data={"hash": test_file_hash, 'leader': 78},
    headers=good_token_header,
)
I get a 200 response, the model saves all the data correctly as expected, including a correctly named save file in /media, except the save file in /media is always 0 bytes.
This is how I create the file object...
with open(testfile_path, "rb") as testfile:
...and verify the length, which is not 0.
testfile.seek(0, os.SEEK_END)
filesize = testfile.tell()
I create the files object for upload...
test_files = {
    "file": ("testfile.zip", testfile, "application/zip")
}
I put some code in the view to verify, and the file object in the view is there, but it is 0 bytes.
Here is the relevant part of the view. It seems to work fine, but all files are 0 bytes.
class SaveFileUploadView(APIView):
    parser_class = (FileUploadParser,)

    def put(self, request):
        if "file" not in request.data:
            raise ParseError("Empty content")
        f = request.data["file"]
        print(f"file {f} size:{f.size}")
        # prints file testfile.zip size:0
        # rest of view works fine...
I have tried with various files and formats, and also using POST. The files are always 0 bytes.
Any help appreciated, I am going crazy...
If you do
testfile.seek(0, os.SEEK_END)
filesize = testfile.tell()
as you say, you'll also need to rewind back to the start – otherwise there are indeed zero bytes left for Requests to read.
testfile.seek(0)
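Putting it together, a minimal sketch of the corrected sequence (reusing the names from the question; the session, URL, and headers are assumed to already exist):
import os

with open(testfile_path, "rb") as testfile:
    # Measure the size...
    testfile.seek(0, os.SEEK_END)
    filesize = testfile.tell()
    # ...then rewind so Requests actually has bytes left to send.
    testfile.seek(0)

    test_files = {"file": ("testfile.zip", testfile, "application/zip")}
    r = self.session.put(
        f"{hello_url}/shadow_pbem/savefile_api/",
        files=test_files,
        data={"hash": test_file_hash, "leader": 78},
        headers=good_token_header,
    )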

Python 3.7.3 inserting into bytearray = "object cannot be re-sized"

I'm working with a bytearray of file data. I'm opening the file as 'r+b', so I can modify it as binary.
In the Python 3.7 docs, it explains that a RegEx's finditer() can use m.start() and m.end() to identify the start and end of a match.
In the question Insert bytearray into bytearray Python, the answer says an insert can be made to a bytearray by using slicing. But when this is attempted, the following error is given: BufferError: Existing exports of data: object cannot be re-sized.
Here is an example:
import re

pat = re.compile(rb'0.?\d* [nN]')   # regex, binary "0[.*] n"
with open(file, mode='r+b') as f:   # updateable, binary
    d = bytearray(f.read())         # read file data as d [as bytes]
it = pat.finditer(d)                # find pattern in data as iterable
for match in it:                    # for each match,
    m = match.group()               # bytes of the match string to binary m
    ...
    val = b'0123456789 n'
    ...
    d[match.start():match.end()] = bytearray(val)
In the file, the match is 0 n and I'm attempting to replace it with 0123456789 n, so I would be inserting 9 bytes. The file can be changed successfully with this code, just not increased in size. What am I doing wrong? Here is output showing all of the non-size-increasing operations working, but failing when inserting the digits:
*** Changing b'0.0032 n' to b'0.0640 n'
len(d): 10435, match.start(): 607, match.end(): 615, len(bytearray(val)): 8
*** Found: "0.0126 n"; set to [0.252] or custom:
*** Changing b'0.0126 n' to b'0.2520 n'
len(d): 10435, match.start(): 758, match.end(): 766, len(bytearray(val)): 8
*** Found: "0 n"; set to [0.1] or custom:
*** Changing b'0 n' to b'0.1 n'
len(d): 10435, match.start(): 806, match.end(): 809, len(bytearray(val)): 5
Traceback (most recent call last):
  File "fixV1.py", line 190, in <module>
    main(sys.argv)
  File "fixV1.py", line 136, in main
    nchanges += search(midfile) # perform search, returning count
  File "fixV1.py", line 71, in search
    d[match.start():match.end()] = bytearray(val)
BufferError: Existing exports of data: object cannot be re-sized
This is a simple case, much like modifying an iterable during iteration:
it = pat.finditer(d) creates a buffer from the bytearray object. This in turn "locks" the bytearray object from being changed in size.
d[match.start():match.end()] = bytearray(val) attempts to modify the size on the "locked" bytearray object.
Just like attempting to change a list's size while iterating over it will fail, an attempt to change a bytearray's size while iterating over its buffer will also fail.
You can give a copy of the object to finditer().
For more information about buffers and how Python works under the hood, see the Python docs.
Also, do keep in mind that you're not actually modifying the file. You'll need to either write the data back to the file, or use memory-mapped files. I suggest the latter if you're looking for efficiency.
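For instance, a minimal sketch of the copy approach, using the pattern from the question (the file name and replacement value are illustrative; processing the matches in reverse keeps earlier offsets valid as the bytearray grows):
import re

pat = re.compile(rb'0.?\d* [nN]')

with open('data.txt', 'rb') as f:
    d = bytearray(f.read())

# Iterate over an immutable copy, so d itself never has an exported
# buffer and is free to change size.
for match in reversed(list(pat.finditer(bytes(d)))):
    d[match.start():match.end()] = b'0123456789 n'

# The in-memory edits still have to be written back to disk.
with open('data.txt', 'wb') as f:
    f.write(d)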

MafftCommandline and io.StringIO

I've been trying to use the Mafft alignment tool from Bio.Align.Applications. Currently, I've had success writing my sequence information out to temporary text files that are then read by MafftCommandline(). However, I'd like to avoid redundant steps as much as possible, so I've been trying to write to a memory file instead using io.StringIO(). This is where I've been having problems. I can't get MafftCommandline() to read internal files made by io.StringIO(). I've confirmed that the internal files are compatible with functions such as AlignIO.read(). The following is my test code:
from Bio.Align.Applications import MafftCommandline
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import io
from Bio import AlignIO
sequences1 = ["AGGGGC",
              "AGGGC",
              "AGGGGGC",
              "AGGAGC",
              "AGGGGG"]
longest_length = max(len(s) for s in sequences1)
padded_sequences = [s.ljust(longest_length, '-') for s in sequences1] #padded sequences used to test compatibility with AlignIO
ioSeq = ''
for items in padded_sequences:
    ioSeq += '>unknown\n'
    ioSeq += items + '\n'
newC = io.StringIO(ioSeq)
cLoc = str(newC).strip()
cLocEdit = cLoc[:len(cLoc)] #create string to remove < and >
test1Handle = AlignIO.read(newC, "fasta")
#test1HandleString = AlignIO.read(cLocEdit, "fasta") #fails to interpret cLocEdit string
records = (SeqRecord(Seq(s)) for s in padded_sequences)
SeqIO.write(records, "msa_example.fasta", "fasta")
test1Handle1 = AlignIO.read("msa_example.fasta", "fasta") #alignIO same for both #demonstrates working AlignIO
in_file = '.../msa_example.fasta'
mafft_exe = '/usr/local/bin/mafft'
mafft_cline = MafftCommandline(mafft_exe, input=in_file) #have to change file path
mafft_cline1 = MafftCommandline(mafft_exe, input=cLocEdit) #fails to read string (same as AlignIO)
mafft_cline2 = MafftCommandline(mafft_exe, input=newC)
stdout, stderr = mafft_cline()
print(stdout) #corresponds to MafftCommandline with input file
stdout1, stderr1 = mafft_cline1()
print(stdout1) #corresponds to MafftCommandline with internal file
I get the following error messages:
ApplicationError: Non-zero return code 2 from '/usr/local/bin/mafft <_io.StringIO object at 0x10f439798>', message "/bin/sh: -c: line 0: syntax error near unexpected token `newline'"
I believe this results from the angle brackets ('<' and '>') present in the file path.
ApplicationError: Non-zero return code 1 from '/usr/local/bin/mafft "_io.StringIO object at 0x10f439af8"', message '/usr/local/bin/mafft: Cannot open _io.StringIO object at 0x10f439af8.'
Attempting to remove the arrows by converting the file path to a string and indexing resulted in the above error.
Ultimately my goal is to reduce computation time. I hope to accomplish this by calling internal memory instead of writing out to a separate text file. Any advice or feedback regarding my goal is much appreciated. Thanks in advance.
"I can't get MafftCommandline() to read internal files made by io.StringIO()."
This is not surprising, for a couple of reasons:
1. As you're aware, Biopython doesn't implement Mafft; it simply provides a convenient interface for setting up a call to mafft in /usr/local/bin. The mafft executable runs as a separate process that does not have access to your Python program's internal memory, including your StringIO file.
2. The mafft program only works with an input file; it doesn't even allow stdin as a data source. (Though it does allow stdout as a data sink.) So ultimately, there must be a file in the file system for mafft to open. Hence the need for your temporary file.
Perhaps tempfile.NamedTemporaryFile() or tempfile.mkstemp() might be a reasonable compromise.
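For example, a minimal sketch of that compromise, reusing ioSeq and mafft_exe from the question (delete=False keeps the file on disk until mafft has read it):
import os
import tempfile
from Bio.Align.Applications import MafftCommandline

# mafft can only read from the file system, so write the in-memory
# FASTA text out to a real (temporary) file first.
with tempfile.NamedTemporaryFile(mode='w', suffix='.fasta', delete=False) as tmp:
    tmp.write(ioSeq)
    tmp_path = tmp.name

mafft_cline = MafftCommandline(mafft_exe, input=tmp_path)
stdout, stderr = mafft_cline()
print(stdout)

os.remove(tmp_path)  # clean up once the alignment has been captured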

Node.js: How to know real filename which matches an internal filename in different case on Windows

My Node.js program wants to read the contents of the file "test.txt" on a Windows machine. It checks with fs.existsSync() that the file exists and reads its content. But now I want the program instead to give an error or warning if the name of the file on disk is actually "TEST.txt" or any other name which differs in case from the name my program is looking for, e.g. "test.txt".
Is there a straightforward way to figure out that even though existsSync() tells me a file exists, the file on disk has a name which differs in case from the file-name I am using to look for it?
You can use fs.readdir to get a list of all files in the directory and then compare against the filename to see whether it matches exactly, including case.
var fs = require('fs');
var path = __dirname;
var filename = 'test.txt';
var files = fs.readdirSync(path);
var exists = files.includes(filename);
// true if file on disk is "test.txt",
// false if file on disk is "TEST.txt"
console.log(exists);
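A slightly fuller sketch along the same lines (the helper name is hypothetical); it returns true only on an exact-case match and warns when the file exists under a different casing:
var fs = require('fs');

function existsExactCase(dir, filename) {
    var entries = fs.readdirSync(dir);
    if (entries.includes(filename)) return true;   // exact-case match
    var other = entries.find(function (e) {
        return e.toLowerCase() === filename.toLowerCase();
    });
    if (other) console.warn('Found "' + other + '" on disk, expected "' + filename + '"');
    return false;
}

console.log(existsExactCase(__dirname, 'test.txt'));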
