I have an app server which has a bunch of files, and I may need to dynamically combine a subset of them at run-time and send them in a single HTTP response as a zipped file. Let's pretend these files are A and B. I can either pre-store them as A.zip and B.zip and combine them into C.zip, which has A.zip and B.zip inside it, or just store them as A and B and zip them up into C.zip. Which one is faster?
They are both a bad idea. Making C.zip with two files inside, A.zip and B.zip, would be trying to compress largely incompressible files, and would require three decompression steps to extract instead of one. (You can avoid wasting time trying to compress with appropriate options to zip.) Extracting A and B and zipping up a new C throws away all the compression effort that went into making A and B, and repeats all that while making C.
Instead, you want to merge the two zip files, assuming there are no colliding filenames/paths in them. You can use zipmerge to combine two zip files.
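If you'd rather script it, here is a rough sketch (the file paths and the presence of the zipmerge binary from libzip are assumptions, and the Python wrapper is just one way to call it):

import os
import subprocess
import tempfile

def build_merged_zip(source_zips):
    # Merge pre-built archives into a fresh C.zip without recompressing
    # their entries; zipmerge takes the target first, then the sources.
    tmpdir = tempfile.mkdtemp()
    target = os.path.join(tmpdir, "C.zip")
    subprocess.run(["zipmerge", target] + list(source_zips), check=True)
    return target  # stream this file back as the HTTP response

# e.g. c_zip = build_merged_zip(["A.zip", "B.zip"])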
Update:
Funny, I just remembered that I wrote one of these about a year ago. It is called zipknit.
Related
I have a python program which reads a word and its meanings (one or more) from a json file and creates audio for each meaning separately.
As a result there are dynamically generated multiple audio files such as M-1.mp3, M-2.mp3, ... M-n.mp3
Later on I want to concatenate all the audio files into one audio clip with a function which requires the name of each audio file to be listed one by one.
Is there a way I can pass the filenames to the function as a list, provided that I know the number of audio files that I want to concatenate?
I want something like this:
One_Audio=concatenate(listnames("M-",3))
to get the effect of this:
One_Audio=concatenate("M-1.mp3","M-2.mp3","M-3.mp3")
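A minimal sketch of such a helper (listnames is the asker's hypothetical name; concatenate stands for whatever audio function is already in use, and the * unpacks the list into separate arguments):

def listnames(prefix, n):
    # Build ["M-1.mp3", "M-2.mp3", ..., "M-n.mp3"] for prefix "M-"
    return ["{}{}.mp3".format(prefix, i) for i in range(1, n + 1)]

# If concatenate() expects separate arguments rather than a single list,
# unpack the list with *:
# One_Audio = concatenate(*listnames("M-", 3))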
I am using Subversion with rabbit-vcs on Linux.
Under merge, it only shows the option to browse my branches at the online SVN URL.
There is no option to give an offline SVN folder as a branch.
I am pretty new to Subversion, so is it actually possible to merge two branches offline in SVN?
I have two branches already checked out:
/home/user/branch1
/home/user/trunk
First of all, read this. Better yet, read this as well. Arguably, understanding merging is the most important part of knowing how to use SVN correctly (for one, you'll think thousands of times before creating a new branch :) ).
Note that you merge two committed sources into a working copy. That is, even if you specify one of the sources as a working copy it will still take its URL for merge purposes. So this is sort of syntactic sugar that a client may or may not support. The reason for it is that the merge operation needs to identify the common ancestor of the sources and merge them change by change. That information is not present in a working copy.
Note a source of some possible confusion here: in many (most?) instances the working copy argument may specify both a source to be merged and the working copy to merge into.
Here's an example of what I mean: suppose you merge S1 and S2 into W. S1 and W contain file F. S2 does not. Now, there are at least two possibilities: (1) the common ancestor S of S1 and S2 contained the file and it was deleted in S2. Then the merge should delete it from W; (2) S did not contain F and it was added in S1. Then F should remain in W. The information about S is simply not present locally, so the repository has to be contacted.
To find out exactly which branch URLs your offline working copies come from, run svn info in branch1 and trunk.
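As a small illustration of that last step, here is a sketch that shells out to svn info in each working copy and prints its URL line (the paths come from the question; everything else is just one way to do it):

import subprocess

for wc in ("/home/user/branch1", "/home/user/trunk"):
    info = subprocess.run(["svn", "info", wc],
                          capture_output=True, text=True, check=True)
    # The "URL:" line of `svn info` is the branch URL the merge will use.
    url = next(line for line in info.stdout.splitlines()
               if line.startswith("URL:"))
    print(wc, "->", url)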
I'm trying to compare (for performance) the use of either dataURIs compared to a large number of images. What I've done is setup two tests:
Regular Images (WPT)
Base64 (WPT)
Both pages are exactly the same, other than "how" these images/resources are being offered. I've run a WebPageTest against each (noted above as WPT) and it looks like the average load time for base64 is a lot faster -- but the cached view of the regular version is faster. I've implemented HTML5 Boilerplate's .htaccess to make sure resources are properly gzipped, but as you can see I'm getting an F for base64 for not caching static resources (which I'm not sure is right or not). What I'm ultimately trying to figure out here is which is the better way to go (assuming, for argument's sake, that there'd be that many resources on a single page). Some things I know:
The GET request for base64 is big
There's 1 resource for base64 compared to 300-some-odd for the regular version (which is the bigger downside here: the GET request size or the number of resources?). The thing to remember about the regular version is that only so many resources can be loaded in parallel due to browser restrictions -- while for base64 you're really only waiting until the HTML can be read, so nothing is technically loaded other than the page itself.
Really appreciate any help - thanks!
For comparison I think you need to run a test with the images sharded across multiple hostnames.
Another option would be to sprite the images into logical sets.
If you're going to go down the BASE64 route, then perhaps you need to find a way to cache them on the client.
If these are the images you're planning on using then there's plenty of room for optimisation in them, for example: http://yhmags.com/profile-test/img_scaled15/interior-flooring.jpg
I converted this to a PNG and ran it through ImageOptim and it came out as 802 bytes (vs 1.7KB for the JPG)
I'd optimise the images and then re-run the tests, including one with multiple hostnames.
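If you want to script that optimisation pass before re-running the tests, here is a rough sketch with Pillow (the library choice and the directory name are my assumptions; a dedicated tool like ImageOptim will usually squeeze out a bit more):

from pathlib import Path
from PIL import Image

# Convert each JPEG to an optimised PNG and compare the sizes.
for jpg in Path("img_scaled15").glob("*.jpg"):
    png = jpg.with_suffix(".png")
    Image.open(jpg).convert("RGB").save(png, format="PNG", optimize=True)
    print(jpg.name, jpg.stat().st_size, "->", png.name, png.stat().st_size)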
I have a huge list of video files from a webcam that look like this:
video_123
video_456
video_789
...
Where each number (123, 456, and 789) represents the start time of the file in seconds since epoch. The files are created based on file size and are not always the same duration. There may also be gaps in the files (eg camera goes down for an hour). It is a custom file format that I cannot change.
I have a tool that can extract out portions of the video given a time range and a set of files. However, it will run MUCH faster if I only give the tool the files that have frames within the given range. It's very costly to determine the duration of each file. Instead, I'd like to use the start timestamp to rule out most files. For example, if I wanted video for 500-600, I know video_123 will not be needed because video_456 is larger. Also, video_789 is larger than 600 so it will not be needed either.
I could do an ls and iterate through each file, converting the timestamp to an int and comparing until we hit a file bigger than the desired range. I have a LOT of files and this is slow. Is there a faster method? I was thinking of having some sort of binary tree that could give O(log n) search time and already have the timestamps parsed out. I am doing most of this work in bash and would prefer to use simple, common tools like grep, awk, etc. However, I will consider Perl or some other scripting language if there is a compelling reason.
If you do several searches over the files, you can pre-process them, in the sense of loading them into a bash array (note: bash, not sh), sorting them, and then doing a binary search. Assume for a second that the name of each file is just the time tag; this will ease the examples (you can always do ${variable/video_/} to remove the prefix).
First, you can use an array to load all the files sorted:
files=( $(printf '%s\n' * | sort -n) )
Then implement the binary search (just a sketch that narrows the search down to the range $min-$max):
nfiles=${#files[@]}
lo=0
hi=$nfiles
# Find the index of the first file whose start time is greater than $max;
# files[lo-1] and earlier are the only ones that can contain frames <= $max.
while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi) / 2 ))
    if [ "${files[$mid]}" -gt "$max" ]; then
        hi=$mid
    else
        lo=$(( mid + 1 ))
    fi
done
# Repeat the same idea with $min to find where the wanted range starts.
And so on. Once you have the files ordered in the array, searching several times becomes much faster.
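If dropping down to a scripting language is on the table (the question mentions Perl and friends), the same idea is a few lines with Python's standard bisect module; this is only a sketch, and the directory name and the assumption that it holds nothing but video_<epoch> files are mine:

import os
from bisect import bisect_right

# Parse the start timestamps once; reuse names/starts for every query.
names = sorted(os.listdir("videos"), key=lambda n: int(n.split("_")[1]))
starts = [int(n.split("_")[1]) for n in names]

def candidates(t_min, t_max):
    # The last file starting at or before t_min may still contain t_min,
    # so step one entry back; files starting after t_max can be skipped.
    lo = max(bisect_right(starts, t_min) - 1, 0)
    hi = bisect_right(starts, t_max)
    return names[lo:hi]

# e.g. candidates(500, 600) -> ["video_456"] for the files in the question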
Because of a quirk of UNIX design, there is no way to search for the name of a file in a directory other than stepping through the filenames one by one. So if you keep all your files in one directory, you're not going to get much faster than using ls.
That said, if you're willing to move your files around, you could turn your flat directory into a tree by splitting on the most significant digits. Instead of:
video_12301234
video_12356789
video_12401234
video_13579123
You could have:
12/video_12301234
12/video_12356789
12/video_12401234
13/video_13579123
or even:
12/30/video_12301234
12/35/video_12356789
12/40/video_12401234
13/57/video_13579123
For best results with this method, you'll want to have your files named with leading zeros so the numbers are all the same length.
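A rough one-off script to do that reshuffle could look like this (the directory name, the zero-padding width, and the two-level layout are all assumptions to illustrate the idea):

import os

SRC = "videos"  # flat directory holding the video_<epoch> files
for name in os.listdir(SRC):
    if not name.startswith("video_"):
        continue
    ts = name.split("_", 1)[1].zfill(10)          # pad so prefixes line up
    bucket = os.path.join(SRC, ts[:2], ts[2:4])   # e.g. videos/12/30/
    os.makedirs(bucket, exist_ok=True)
    os.rename(os.path.join(SRC, name), os.path.join(bucket, name))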
I'm working on an instant messaging app, where users can receive files from their friends.
The names of the files received are set by the sender of the file, and multiple files can be sent together with the possibility of subdirectories. For example, two files sent together might be "1" and "sub/2" such that the downloaded results should be like "downloads/1" and "downloads/sub/2".
I'm worried about the security implications of this. Right off the top of my head, two potentially dangerous filenames would be something like "../../../somethingNasty" or "~/somethingNasty" for Unix-like users. Other potential issues that cross my mind are filenames with characters that are unsupported on the target filesystem, but that seems much harder and may just be better to ignore.
I'm considering stripping ".." and "~" from received filenames, but this type of blacklist approach, where I individually think of problem cases, hardly seems like a recipe for good security. What's the recommended way to sanitize filenames to ensure nothing sinister happens?
If it makes a difference, my app is written in C++ with the Qt framework.
It's wiser to replace ".." with, say, XXX and "~" with, say, YYY. This way you convert any invalid path into a perfectly valid path. I.e. if the user wants to upload "../../../somethingNasty" -- no problem, let him upload the file and store it in XXX/XXX/XXX/somethingNasty.
Or even better, you can encode all non-alphanumeric characters (except slashes) as %XY, where XY is the hexadecimal code of the character. This way you would have %2E%2E/%2E%2E/%2E%2E/somethingNasty.
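A minimal sketch of that encoding scheme (Python purely for illustration, since the asker's app is C++/Qt; the keep-list of characters is an assumption):

def sanitize_relative_path(path):
    # Percent-encode everything that is not ASCII alphanumeric or '/', so
    # "../../../somethingNasty" becomes "%2E%2E/%2E%2E/%2E%2E/somethingNasty".
    out = []
    for ch in path:
        if (ch.isascii() and ch.isalnum()) or ch == "/":
            out.append(ch)
        else:
            out.append("".join("%{:02X}".format(b) for b in ch.encode("utf-8")))
    return "".join(out)

# sanitize_relative_path("~/evil")  -> "%7E/evil"
# sanitize_relative_path("sub/2")   -> "sub/2"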