Pipe files included in tar.gz file to C++ program in bash - Linux

I have a C++ program that can be run in the following format (two cases):
./main 10000 file1.txt
OR
./main 10000 file1.txt file2.txt
where file1.txt and file2.txt are huge text files. I have a file.tar.gz that may contain either:
Just one file (file1.txt)
Two files file1.txt and file2.txt
Is there a way in bash to use a pipe to read the files directly from the .tar.gz, in both cases, i.e. whether the archive contains one file or two? I have looked at Pipe multiple files (gz) into C program, but I am not very savvy with bash and have trouble understanding the answers there.

This isn't going to be particularly simple. Your question is really too broad, as it stands. One approach would be:
Determine whether the archive contains one or two files
Set up named pipes (fifos) for each of the files ("mkfifo" command)
Run commands to output the content of the files in the archive to the appropriate fifo, as a background process
Run the primary command, specifying the fifos as the filename arguments
Giving a full rundown of all of this is, I think, beyond what should be the scope of a Stack Overflow question. For (1), you could probably do something like:
FILECOUNT=`tar -tzf filename.tar.gz | wc -l`
This lists the files within the archive (tar -tzf) and counts the number of lines of output from that command (wc -l). It's not foolproof, but it should work if the filenames are simple names like the ones you suggested (file1.txt, file2.txt).
For (2), make either one or two fifos as appropriate:
mkfifo file1-fifo.txt
if [ $FILECOUNT = 2 ]; then
mkfifo file2-fifo.txt
fi
For (3), use tar with -O to extract file contents from the archive, and redirect it to the fifo(s), as a background process:
tar -O -xzf filename.tar.gz file1.txt > file1-fifo.txt &
if [ $FILECOUNT = 2 ]; then
tar -O -xzf filename.tar.gz file2.txt > file2-fifo.txt &
fi
And then (4) is just:
SECONDFILE=""
if [ $FILECOUNT = 2 ]; then
SECONDFILE=file2-fifo.txt
fi
./main 10000 file1-fifo.txt $SECONDFILE
Finally, you should delete the fifo nodes (use -f so rm doesn't complain if the second fifo was never created):
rm -f file1-fifo.txt
rm -f file2-fifo.txt
Note that this will involve extracting the archive contents twice (in parallel), once for each file. There's no way (that I can think of) of getting around this.
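Putting the four steps together, here is a minimal end-to-end sketch (assuming GNU tar, an archive named filename.tar.gz, and the member names from the question; adjust as needed):
#!/bin/bash
# Sketch only: count the members, set up fifos, feed them from tar, run main, clean up.
ARCHIVE=filename.tar.gz
FILECOUNT=`tar -tzf "$ARCHIVE" | wc -l`
mkfifo file1-fifo.txt
tar -O -xzf "$ARCHIVE" file1.txt > file1-fifo.txt &
SECONDFILE=""
if [ $FILECOUNT = 2 ]; then
mkfifo file2-fifo.txt
tar -O -xzf "$ARCHIVE" file2.txt > file2-fifo.txt &
SECONDFILE=file2-fifo.txt
fi
./main 10000 file1-fifo.txt $SECONDFILE
rm -f file1-fifo.txt file2-fifo.txt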

Related

pasting many files to a single large file

I have many text files in a directory, like 1.txt, 2.txt, 3.txt, ..., 2000.txt, and I want to paste them together to make one large file.
To that end I did something like
paste *.txt > largefile.txt
but the above command reads the .txt files in an unexpected order, and I need the files read sequentially as 1.txt, 2.txt, 3.txt, ..., 2000.txt and pasted in that order.
Please suggest a better solution for pasting many files.
Thanks, and looking forward to hearing from you.
Sort the file names numerically yourself then.
printf "%s\n" *.txt | sort -n | xargs -d '\n' paste
When dealing with many files, you may hit the open-file limit, ulimit -n. On my system ulimit -n is 1024, but this is a soft limit and can be raised, e.g. with ulimit -n 99999.
Without raising the soft limit, go with a temporary file that accumulates the results of each "round" of files, a bit fewer than ulimit -n at a time because paste also needs descriptors for the accumulator and its standard streams, like:
touch accumulator.txt
... | xargs -d '\n' -n $(($(ulimit -n) - 4)) sh -c '
paste accumulator.txt "$@" > accumulator.txt.sav;
mv accumulator.txt.sav accumulator.txt
' _
cat accumulator.txt
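Joined up with the sorting pipeline above, the whole thing might look like the sketch below. The accumulator is renamed here (accumulator.out is a made-up name) so that the *.txt glob does not pick it up as one of its own inputs:
touch accumulator.out
printf "%s\n" *.txt | sort -n | xargs -d '\n' -n $(($(ulimit -n) - 4)) sh -c '
paste accumulator.out "$@" > accumulator.out.sav;
mv accumulator.out.sav accumulator.out
' _
cat accumulator.out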
Instead of relying on the wildcard * to enumerate all the files in the directory, if your file names are numbered sequentially you can list them explicitly in order and concatenate them into one large file. The order in which * expands can differ between environments and may not be what you expect.
Below is a simple example
$ for i in `seq 20`;do echo $i > $i.txt;done
# create 20 test files, 1.txt, 2.txt, ..., 20.txt with number 1 to 20 in each file respectively
$ cat {1..20}.txt
# show content of all file in order 1.txt, 2.txt, ..., 20.txt
$ cat {1..20}.txt > 1_20.txt
# concatenate them to a large file named 1_20.txt
In bash, or any other shell, glob expansions are done in lexicographical order. When the files are numbered, this sadly means that, for example, 10.txt sorts before 2.txt: the names are compared character by character as strings, not as numbers, and "1" sorts before "2".
So here are a couple of ways to operate on your files in order:
rename all your files:
for i in *.txt; do mv "$i" "$(printf "%05d.txt" "${i%.*}")"; done
paste *.txt
use brace-expansion:
Brace expansion is a mechanism that allows for the generation of arbitrary strings. For integers you can use {n..m} to generate all numbers from n to m or {n..m..s} to generate all numbers from n to m in steps of s:
paste {1..2000}.txt
The downside here is that a file might be missing (e.g. 1234.txt), and a pattern that matches nothing would otherwise be left in the list as a literal, non-existent name. So you can do
shopt -s extglob nullglob; paste ?({1..2000}.txt)
The extended pattern ?(pattern) matches zero or one occurrence of pattern, and nullglob removes any pattern that matches no file, so the missing files are skipped while the order is kept.
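As a quick illustration of that behaviour (file names made up for the demo):
$ touch 1.txt 2.txt 4.txt
$ shopt -s extglob nullglob
$ echo ?({1..5}.txt)
1.txt 2.txt 4.txt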

Sort files according to their filetype

After an HD problem and some work, I have a bunch of files with names like "f1234", "f1235", etc.
My goal is to sort these files according to their file type. For example, I want to move all the PDF files into a "pdfs" directory.
For one file, I can do "file f1234", and if it's a PDF, "mv f1234 pdfs/". But I have thousands of files... Can you help me with a bash or zsh command to sort all the PDFs in one pass? Thanks
The hard part here is reliably turning the output of file into a directory name. I think the best candidate for that is the MIME type of the file rather than the human-readable output of file. I'd use something like:
mkdir sorted
for f in f*
do
d=$(file -b --mime-type "$f" | tr / -)
mkdir -p "sorted/$d"
mv "$f" "sorted/$d/"
done
Obviously I'd test that out a bit before running it on your files, but something pretty close to that should work.
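One way to test it is a dry run of the same loop that only prints the planned moves; once the output looks right, swap the echo back for the mkdir/mv pair:
for f in f*
do
d=$(file -b --mime-type "$f" | tr / -)
echo "$f -> sorted/$d/"
done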

Copy text from multiple files, same names to different path in bash (linux)

I need help copying content from various files to others (same name and format, different path).
For example, $HOME/initial/baby.desktop has text that I need to write into $HOME/scripts/baby.desktop. This is very simple for a single file, but I have 2500 files in $HOME/initial/ and the same number in $HOME/scripts/ with corresponding names (same names and format). I want to append (copy) the content of each file in path A to the end of the file with the same name and format in path B, without erasing what is already in the file in path B.
For example, from $HOME/initial/*.desktop into the corresponding $HOME/scripts/*.desktop. I tried the following, but it doesn't work:
cd $HOME/initial/
for i in $( ls *.desktop ); do egrep "Icon" $i >> $HOME/scripts/$i; done
Firstly, I would back up $HOME/initial and $HOME/scripts, because there is lots of scope for people misunderstanding your question. Like this:
cd $HOME
tar -cvf initial.tar initial
tar -cvf scripts.tar scripts
That will put all the files in $HOME/initial into a single tarfile called initial.tar and all the files in $HOME/scripts into a single tarfile called scripts.tar.
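Should anything go wrong later, restoring from those backups is just the reverse, e.g.:
cd $HOME
tar -xvf initial.tar
tar -xvf scripts.tar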
Now for your question... in general, if you want to put the contents of FileB onto the end of FileA, the command is
cat FileB >> FileA
Note the DOUBLE ">>" which means "append" rather than single ">" which means overwrite.
So, I think you want to do this:
cd $HOME/initial
cat SomeFile >> $HOME/scripts/SomeFile
where SomeFile is the name of any file you choose to test with. I would test that has worked and then, if you are happy with that, go ahead and run the same command inside a loop:
cd $HOME/initial
for SOURCE in *
do
DESTINATION="$HOME/scripts/$SOURCE"
echo Appending "$SOURCE" to "$DESTINATION"
#cat "$SOURCE" >> "$DESTINATION"
done
When the output looks correct, remove the "#" at the start of the penultimate line and run it again.
I solved it. In case anyone wants to learn how, it is very simple:
using sed
I only needed the matching line (e.g. "Icon=/usr/share/some_picture.png") from $HOME/initial/example.desktop copied into the file with the same name and format, $HOME/scripts/example.desktop, but I had a lot of .desktop files (2500 of them):
cd $HOME/initial
STRING_LINE=`grep -l -R "Icon=" *.desktop`
for i in $STRING_LINE; do sed -ne '/Icon=/ p' $i >> $HOME/scripts/$i ; done
_________
If you just need to copy the whole content of each file to the other file with the same name and format:
using cat
cd $HOME/initial
STRING_LINE=`grep -l -R "Icon=" *.desktop`
for i in $STRING_LINE; do cat $i >> $HOME/scripts/$i ; done

Launching program several times

I am using macOS. This is the command line code to launch my program (two parts):
nucmer --mum file1.txt file2.txt
show-snps -Clr -x 2 out.delta > out_file1.snps
The first part of the program creates the file out.delta. My file2.txt is always the same, but I want to launch both parts 35000 times, each time with a different file1.txt. All the file1 variants are located in the same directory.
Is it possible to do it using BASH?
Keep all the input files in one directory. Create a wrapper script that invokes nucmer and then show-snps. The wrapper script accepts the path to that directory as input, iterates over all the files in it, and calls your two commands for each one.
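A sketch of such a wrapper, assuming the inputs end in .txt and using made-up names for the script and its arguments:
#!/bin/bash
# Hypothetical usage: ./run_all.sh /path/to/file1-directory file2.txt
dir="$1"
ref="$2"
for f in "$dir"/*.txt
do
nucmer --mum "$f" "$ref"
show-snps -Clr -x 2 out.delta > "out_$(basename "$f" .txt).snps"
done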
You could do something along these lines:
find . -maxdepth 1 -type f -print | grep -v './out_' | while read f
do
b=$(basename "${f}" .txt)
nucmer --mum "${f}" file2.txt
show-snps -Clr -x 2 out.delta > out_${b}.snps
done
The find bit finds all files in the current directory, and grep filters out the output files from any previous runs. The basename line strips off the leading ./ and the trailing .txt extension, and then your two programs get run with the input file name and an output filename based on that base name.
If you don't get an argument list too long error, you could just use for:
for f in file*.txt; do nucmer --mum $f second.txt; show-snps -Clr -x 2 out.delta > out_${f%.txt}.snps; done

How to check for an exploding zip file in bash?

I have a bash shell script that unzips a zip file and manipulates the resulting files. Because of the process, I expect all the content I am interested in to be within a single folder, like so:
file.zip
   /file
      /contentFolder1
      /contentFolder2
      stuff1.txt
      stuff2.txt
      ...
I've noticed users on Windows typically don't create a subfolder, but instead submit an exploding zip file that looks like:
file.zip
   /contentFolder1
   /contentFolder2
   stuff1.txt
   stuff2.txt
   ...
How can I detect these exploding zips, so that I may handle them accordingly? Is it possible without unzipping the file first?
If you want to check, unzip -l will print the contents of the zip file without extracting them. You'll have to massage the output a bit, though, since it's printing all sorts of additional crud.
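For example, assuming InfoZip's unzip with its zipinfo mode (unzip -Z1, which prints just the entry names), one possible massage is to count the distinct top-level names:
toplevel=$(unzip -Z1 file.zip | cut -d/ -f1 | sort -u | wc -l)
if [ "$toplevel" -gt 1 ]; then
echo "file.zip explodes into the current directory"
fi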
Unzip to a directory first, and then remove the extra layer if the zip is not a bomb.
tempdir=`mktemp -d`
unzip -d $tempdir file.zip
if [ $(ls $tempdir | wc -l) = 1 ]; then
mv $tempdir/* .
rmdir $tempdir
else
mv $tempdir file
fi
I wouldn't try to detect it. I'd just force unzip to do what I want. With InfoZip:
$ unzip -j -d unzip-output-dir FileFromUntrustedSource.zip
-j makes it ignore any directory structure within the file, and -d tells it to put files in a particular directory, creating it if necessary.
If there are two files with the same name but in different subdirectories, the above command will make unzip ask if you want to overwrite the first with the second. You can add -o to force it to overwrite without asking, or -f to only overwrite if the second file is newer.
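For example, to extract without any prompting:
$ unzip -j -o -d unzip-output-dir FileFromUntrustedSource.zip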
