File much larger after copying with it.bytes - groovy

I wanted to copy a file from one location to another using a Groovy script. I found that the copied file was orders of magnitude larger than the original after copying.
After some trial and error I found the correct way to copy but am still puzzled as to why it should be bigger.
def existingFile = new File("/x/y/x.zip")
def newFile1 = new File("/x/y/y.zip")
def newFile2 = new File("/x/y/z.zip")
newFile1 << existingFile.bytes
newFile2.bytes = existingFile.bytes
If you run this code, newFile1 will be much larger than existingFile, while newFile2 will be the same size as existingFile.
Note that both zip files are valid afterwards.
Does anyone know why this happens? Am I using the first copy incorrectly? Or is it something odd in my setup?

If the file already exists before this code is called then you'll get different behaviour from << and .bytes = ...
file << byteArray
will append the contents of byteArray to the end of the file, whereas
file.bytes = byteArray
will overwrite the file with the specified content. When the byteArray is ZIP data, both versions will give you a result that is a valid ZIP file, because the ZIP format can cope with arbitrary data prepended to the beginning of the file before the actual ZIP data without invalidating the file (typically this is used for things like self-extracting .exe stubs).
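The append-versus-overwrite difference described above can be reproduced outside Groovy. The following is a minimal sketch in Python rather than Groovy; the helper names (make_zip_bytes, append_copy, overwrite_copy, demo) and the file names are made up for illustration. It simulates running the copy three times and checks that the appended file grows while still opening as a ZIP archive.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def make_zip_bytes() -> bytes:
    """Build a small ZIP archive in memory (stand-in for the existing file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("hello.txt", "hello world\n" * 100)
    return buf.getvalue()


def append_copy(target: Path, data: bytes) -> None:
    """Rough analogue of `file << bytes`: add the data to the end of the file."""
    with open(target, "ab") as fh:
        fh.write(data)


def overwrite_copy(target: Path, data: bytes) -> None:
    """Rough analogue of `file.bytes = bytes`: replace the file's content."""
    target.write_bytes(data)


def demo() -> dict:
    data = make_zip_bytes()
    with tempfile.TemporaryDirectory() as tmp:
        appended = Path(tmp) / "y.zip"
        replaced = Path(tmp) / "z.zip"
        for _ in range(3):  # simulate running the script three times
            append_copy(appended, data)
            overwrite_copy(replaced, data)
        # The appended file still opens: the archive is located from its end.
        with zipfile.ZipFile(appended) as zf:
            names = zf.namelist()
        return {
            "original": len(data),
            "appended": appended.stat().st_size,
            "overwritten": replaced.stat().st_size,
            "names": names,
        }


if __name__ == "__main__":
    print(demo())
```

If the target file does not exist before the first run, both approaches produce the same size; the difference only shows up on later runs.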

Related

Formation of folder redundantly

I have the following structure. I want to iterate through the subfolders (machine, gunshot), process the .wav files, and build an MFCC result folder in each category with the .csv files in it. I have the following code, but a new MFCC folder keeps being created inside the MFCC folder that already exists.
parent_dir = 'sound'
for subdirs, dirs, files in os.walk(parent_dir):
    resultsDirectory = subdirs + '/MFCC/'
    if not os.path.isdir("resultsDirectory"):
        os.makedirs(resultsDirectory)
    for filename in os.listdir(subdirs):
        if filename.endswith('.wav'):
            (rate,sig) = wav.read(subdirs + "/" +filename)
            mfcc_feat = mfcc(sig,rate)
            fbank_feat = logfbank(sig,rate)
            outputFile = resultsDirectory + "/" + os.path.splitext(filename)[0] + ".csv"
            file = open(outputFile, 'w+')
            numpy.savetxt(file, fbank_feat, delimiter=",")
            file.close()
What version of Python are you using? Not sure if this has changed in the past, but os.walk does not return "subdirs" as the first element of the tuple, but the dirpath (the directory currently being visited). See here for Python 3.6.
I don't know your absolute path, but seeing as you are passing in the path sound as a relative reference, I assume it is a folder inside the directory where you run your Python code. So for example, let's say you are running this file (let's call it mycode.py) from
/home/username/myproject/mycode.py
and you have some subdirectory:
/home/username/myproject/sound/
So:
resultsDirectory = subdirs + '/MFCC/'
as written in your code above would resolve to:
/home/username/myproject/sound/MFCC/
So your first if statement will be entered since this is not an existing directory. Thereby you create a new directory:
/home/username/myproject/sound/MFCC/
From there, you take
filename in os.listdir(subdirs)
This also appears to be a misunderstanding of the output of this function. os.listdir() returns the names of all entries in the directory, subdirectories as well as files, not just files. See here for the documentation on that.
So now you are looping through everything in:
/home/username/myproject/sound/
Here, I assume you have some of the directories from your diagram already made. So I assume you have:
/home/username/myproject/sound/machine_sound
/home/username/myproject/sound/gun_shot_sound
or something along those lines.
So for these entries the if statement will not be entered, since your directory names do not end with '.wav'.
Even if it did enter, you'd still have issues, as filename will actually be equal to machine_sound on the first loop and gun_shot_sound the second time through.
Maybe you are using some other wav library (for example, scipy.io.wavfile imported as wav, which does provide read()), but the Python built-in module is called wave, and with it you would call wave.open() on the file, not wav.read(). See here for the docs.
I'm not sure what you were trying to achieve with the call to os.path.splitext(filename)[0], but you can read about it here. It strips the extension, so for a name without one you end up with the same thing that went in: machine_sound and gun_shot_sound.
Your output file will thus result in:
/home/username/myproject/sound/MFCC/machine_sound.csv
on the first loop, and
/home/username/myproject/sound/MFCC/gun_shot_sound.csv
the second time through.
So in conclusion, I'm not sure what is happening when you say "MFCC folder is keep forming in already formed MFCC folder" but you definitely have a lot of reading ahead of you before you can understand your own code, and have any hope of fixing it to do what you want. Assuming you read through the links I provided, you should be able to do that though. Good luck!
Additionally, you had quite a few typos in your code that I edited, including the immensely important whitespace characters. You should clean that up and ensure your code runs before posting it here, then double-check that your copy/paste action did not introduce any errors. People will be much more willing to help if you clean up your presentation a bit.
for subdir,dirs,files in os.walk(parent_dir):
    for folder in next(os.walk(parent_dir))[1]:
        resultsDirectory= folder + '/MFCC'
        absPath = os.path.join(parent_dir, resultsDirectory)
        if not os.path.isdir(absPath):
            os.makedirs(absPath)
    for filename in os.listdir(subdir):
        print('listdir')
        if filename.endswith('.wav'):
            print("csv file writing")
            (rate,sig) = wav.read(subdir + "/" +filename)
            mfcc_feat = mfcc(sig,rate)
            fbank_feat = logfbank(sig,rate)
            print("fbank_feat")
            outputFile =subdir + "/MFCC"+"/" + os.path.splitext(filename)[0] + ".csv"
            file = open(outputFile, "w+")
            numpy.savetxt(file, fbank_feat, delimiter=",")
            file.close()
Here the csv file is stored in the subdirectory, not in the MFCC folder for each category.
I have an issue with the output file path.
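One possible explanation for an MFCC folder appearing inside an existing MFCC folder is that os.walk also visits the MFCC directories created on an earlier pass (or earlier run) and the loop then creates another one inside them. The sketch below works under that assumption: it prunes the result folder from the walk and builds every output path with os.path.join so the .csv lands inside the category's MFCC folder. The function names plan_mfcc_outputs and demo are hypothetical, and the feature extraction itself is left out.

```python
import os
import tempfile


def plan_mfcc_outputs(parent_dir, result_name="MFCC"):
    """Create a result folder beside each group of .wav files and return
    (wav_path, csv_path) pairs, never descending into result folders."""
    pairs = []
    for dirpath, dirnames, filenames in os.walk(parent_dir):
        # Editing dirnames in place tells os.walk which subfolders to skip.
        dirnames[:] = [d for d in dirnames if d != result_name]
        wav_files = sorted(f for f in filenames if f.lower().endswith(".wav"))
        if not wav_files:
            continue
        results_dir = os.path.join(dirpath, result_name)
        os.makedirs(results_dir, exist_ok=True)
        for name in wav_files:
            stem = os.path.splitext(name)[0]
            pairs.append((os.path.join(dirpath, name),
                          os.path.join(results_dir, stem + ".csv")))
    return pairs


def demo():
    """Build a throwaway tree shaped like sound/machine and sound/gunshot."""
    with tempfile.TemporaryDirectory() as root:
        for category in ("machine", "gunshot"):
            os.makedirs(os.path.join(root, category))
            open(os.path.join(root, category, "clip.wav"), "wb").close()
        plan_mfcc_outputs(root)          # first run creates the MFCC folders
        pairs = plan_mfcc_outputs(root)  # second run must not nest them
        nested = os.path.isdir(os.path.join(root, "machine", "MFCC", "MFCC"))
        return sorted(os.path.relpath(csv, root) for _, csv in pairs), nested
```

Each returned pair could then be fed to the existing processing code: read the wav_path, compute the features, and pass csv_path to numpy.savetxt.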

Avoid overwriting of files with "for" loop

I have a list of dataframes (df_cleaned) created from multiple csv files chosen by the user.
My objective is to save each dataframe within the df_cleaned list as a separate csv file locally.
I have the following code done which saves the file with its original title. But I see that it overwrites and manages to save a copy of only the last dataframe.
How can I fix it? According to my very basic knowledge perhaps I could use a break-continue statement in the loop? But I do not know how to implement it correctly.
for i in range(len(df_cleaned)):
    outputFile = df_cleaned[i].to_csv(r'C:\...\Data Docs\TrainData\{}.csv'.format(name))
print('Saving of files as csv is complete.')
You can create a different name for each file. As an example, the following appends the loop index to name:
for i in range(len(df_cleaned)):
    outputFile = df_cleaned[i].to_csv(r'C:\...\Data Docs\TrainData\{0}_{1}.csv'.format(name,i))
print('Saving of files as csv is complete.')
This will create a set of files named <name>_N.csv with N = 0, ..., len(df_cleaned)-1.
A very easy way of solving this. I just figured out the answer myself and am posting it to help someone else.
fileNames is a list I created at the start of the code to save the names of the files chosen by the user.
for i in range(len(df_cleaned)):
    outputFile = df_cleaned[i].to_csv(r'C:\...\TrainData\{}.csv'.format(fileNames[i]))
print('Saving of files as csv is complete.')
This saves a separate copy of each file in the defined directory.
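Both answers come down to giving every dataframe its own output path. A minimal sketch of that idea follows, assuming the names may repeat (for example, when the user picks two files with the same title); unique_csv_paths and out_dir are hypothetical names, not part of pandas.

```python
import os


def unique_csv_paths(out_dir, names):
    """Return one .csv path per name; a repeated name gets a numeric suffix."""
    used = set()
    paths = []
    for name in names:
        stem, n = name, 0
        while stem in used:  # keep trying suffixes until the stem is free
            n += 1
            stem = f"{name}_{n}"
        used.add(stem)
        paths.append(os.path.join(out_dir, stem + ".csv"))
    return paths
```

With the question's variables, the loop could then be written as `for df, path in zip(df_cleaned, unique_csv_paths(out_dir, fileNames)): df.to_csv(path)`, so no iteration reuses a path written by an earlier one.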

adm-zip doesn't compress data

I'm trying to use adm-zip to add files from memory to a zip file also in memory. It seems that the zip file is created correctly (the result of saving zipData can be unzipped in Windows), but the compression ratio is always zero.
This is a model of the code that I expected to work but doesn't. As can be seen from the output, "compressedData" is null and "size" and "compressedSize" are the same whatever value is passed as the file content.
var admzip = require("adm-zip")
var zip = new admzip();
zip.addFile("tmp.txt", "aaaaaaaaaaaaaaaaaaaa");
var zipData = zip.toBuffer();
console.log(zip.getEntries()[0].toString());
https://runkit.com/embed/pn5kaiir12b0
How do I get it to compress the files as well as just zipping?
This is an old question, but for anyone else experiencing this issue: the reason is that adm-zip does not compress the data until the compressedData field is accessed for the first time.
Quote from the docs
[Buffer] Buffer compressedData When setting compressedData, the LOC
Data Header must also be present at the beginning of the Buffer. If
the compressedData was set for a ZipEntry and no other change was made
to its properties (comment, extra etc), reading this property will
return the same value. If changes had been made, reading this property
will recompress the data and recreate the entry headers.
If no compressedData was specified, reading this property will
compress the data and create the required headers.
The output of the compressedData Buffer contains the LOC Data Header
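For comparison only, the relationship between an entry's "size" and "compressedSize" can be inspected with Python's standard zipfile module. This is not adm-zip code and says nothing about adm-zip's lazy compression; it is a small sketch showing that a stored entry reports equal sizes while a deflated entry reports a smaller compressed size. The helper name entry_sizes is made up for illustration.

```python
import io
import zipfile


def entry_sizes(content: bytes, method: int):
    """Write one in-memory entry and return (file_size, compress_size)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("tmp.txt", content)
    # Re-open the finished archive and read the sizes from its directory.
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        info = zf.getinfo("tmp.txt")
        return info.file_size, info.compress_size


if __name__ == "__main__":
    data = b"a" * 1000
    print("stored:  ", entry_sizes(data, zipfile.ZIP_STORED))
    print("deflated:", entry_sizes(data, zipfile.ZIP_DEFLATED))
```

Seeing equal sizes in an archive that has already been written out therefore usually means the entry was stored rather than deflated.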

nodejs - change contents of zip file without re-generating the whole archive

I'm using node-zip (which uses JSZip under the hood). I need to modify the contents of a zip file, and I'd love to be able to modify it without generating the entire zip again, because it can take a long time for large archives. Here's an example:
var zip = new JSZip()
// Add some files to the zip
zip.file('file1', "file 1 contents\n")
zip.file('file2', "file 2 contents\n")
zip.file('file3', "file 3 contents\n")
// Generate the zip file
buffer = zip.generate()
// Make some changes
zip.file('file1', "changed file contents\n")
// Now I have to generate the entire zip again to get the buffer
buffer = zip.generate()
How can I do something like
updatedBuffer = zip.updateFile(buffer, 'file1', 'changed file contents\n')
so that I get an updated archive buffer but only have to spend CPU cycles updating the one file?
Assuming JSZip v2 here (zip.generate()):
You can get the buffer with asNodeBuffer(), modify it and update the content for your file:
var buffer = zip.file("file1").asNodeBuffer();
// change buffer
zip.file("file1", buffer);
Edit: if you mean editing in place a zip file stored on disk: no, JSZip can't do that.

Matlab: filesystem, string manipulation and figures saving

In the workspace I have many m-files containing data I'd like to plot.
I have to read them all and save their plot without showing the results (I'll see them after all is done).
Can the last part be done this way?
f = figure('Visible', 'off');
plot(x,y);
saveas(f,'figure.fig');
But I don't want to manually load each m-file where x and y are stored.
So I need a way to explore the filesystem, run these statements for each file, manipulate the file name, and save a jpg with the same name as its m-file.
The dir function will return a structure containing info on the Folders and Files in the current directory
>> FileInfo = dir
Then you need to write code that uses that info to automatically navigate the directory structure (using cd, for instance) and select the files you want to read.
The function what can also be useful if you only want to look for certain file types, e.g. .mat files.
Not surprisingly, similar questions have been asked before; for instance, see here.
