Linux Shell Script to unzip and split file outputs unreadable files - linux

I have a zip archive which contains multiple files of the same format, each around 50 MB in size. I need to split each file into multiple chunks (say 1000 lines per output file).
I have written a shell script which unzips the archive and saves the split output files in a directory.
The problem is that the output chunks are unreadable, containing symbols and random characters.
When I do it for each file individually, it outputs perfect txt split files, but it doesn't work for the whole zip archive.
Does anyone know how I can get those files in txt format?
Here is my script.
for z in input.zip ; do
if unzip -p "$z" | split -l 1000 $z output_dir ; then
echo "$z"
fi
done

Problem
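You need to unzip the files first. Otherwise, you are just chunking the original binary ZIP file: in split -l 1000 $z output_dir, the input argument is $z (the archive itself), so split reads the ZIP and ignores the text piped in from unzip -p.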
Solution
The following is untested because I don't have your source file. However, it should work for you with a little tweaking.
unzip -d /tmp/unzipped input.zip
mkdir -p /tmp/split_files
for file in /tmp/unzipped/*.txt; do
    # one set of chunks per extracted file, prefixed with its base name
    split -l 1000 "$file" "/tmp/split_files/$(basename "$file" .txt)"
done
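The archive can also be processed without extracting it to disk first: list the members, then stream each one through split on its own. This is a minimal sketch, assuming Info-ZIP unzip (for -Z1 and -p) and member names without newlines or wildcard characters such as * or [; split_zip_members is a hypothetical helper name, and the NAME_ output prefix is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: split every member of a ZIP archive into N-line chunks
# without extracting the archive to disk first.
# Assumptions: Info-ZIP unzip; member names contain no newlines or
# wildcard characters (unzip treats member arguments as patterns).
split_zip_members() {
    local zip=$1 out_dir=$2 lines=${3:-1000}
    local name base
    mkdir -p -- "$out_dir"
    # unzip -Z1 prints one archive entry name per line (zipinfo mode).
    unzip -Z1 "$zip" | while IFS= read -r name; do
        case $name in */) continue ;; esac   # skip directory entries
        base=${name##*/}                     # drop any folder path
        base=${base%.txt}                    # drop a .txt extension
        # unzip -p writes the member to stdout; "-" makes split read stdin.
        unzip -p "$zip" "$name" | split -l "$lines" - "$out_dir/${base}_"
    done
}
```

For example, split_zip_members input.zip output_dir would turn a 2500-line member data1.txt into output_dir/data1_aa, data1_ab and data1_ac. Running one split per member means a chunk never mixes lines from two different files.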

Related

renaming *.txt.csv files to .csv in bash script

I have .txt files, which I am taking as $item and then I am changing the encoding with
iconv -f $currentEncoding -t $targetEncoding "$item" -o "$item.tmp"
then I am saving it again to a txt file using
mv "$item.tmp" "$item.txt";
next I am trimming a few things in the txt file and saving it as a csv file with
tr -d '"' < "$item.txt" > "$item.csv";
but my files end up stored with the extension "*.txt.csv", and I want them to be just .csv. Can anyone tell me what I am doing wrong or what I could change? Thanks
Run:
for f in *.txt.csv; do mv -- "$f" "${f/.txt./.}"; done
If the variable $f contains the string item.txt.csv, the expression ${f/.txt./.} replaces the first occurrence of .txt. with ., giving item.csv.
Caution: keep the double quotes around "$f"; without them, a filename containing spaces is split into several arguments to mv.
From what I understand, your $item already has .txt in the value.
So, if you list your files after each command, you should see intermediate files like
xyz.txt.tmp
xyz.txt.txt
xyz.txt.csv
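So, when you set the item variable, strip the extension with ${item%%.*} as shown below and it should work as expected. Note that %%.* removes everything from the first dot onward, so for names like my.data.txt, ${item%.txt} (which removes only a trailing .txt) is safer.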
item=xyz.txt
item="${item%%.*}"
echo $item
xyz
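Putting the steps together, the whole conversion can derive every output name from one base name, so a .txt.csv name never appears. A sketch, assuming $item ends in .txt and that the encodings are names your iconv accepts; txt_to_csv is a hypothetical name. It redirects iconv's standard output instead of using -o, which works with any iconv.

```shell
#!/usr/bin/env bash
# Sketch: convert a .txt file's encoding, strip double quotes, and write
# NAME.csv next to it, never producing NAME.txt.csv.
# Assumption: from_enc/to_enc are encodings your iconv knows (e.g. UTF-8).
txt_to_csv() {
    local item=$1 from_enc=$2 to_enc=$3
    local base=${item%.txt}       # "data/report.txt" -> "data/report"
    # Convert the encoding into a temporary file beside the original.
    iconv -f "$from_enc" -t "$to_enc" "$item" > "$base.tmp" || return 1
    # Remove every double quote and write the final CSV.
    tr -d '"' < "$base.tmp" > "$base.csv"
    rm -f -- "$base.tmp"
}
```

Usage: for item in *.txt; do txt_to_csv "$item" ISO-8859-1 UTF-8; done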

Sort files according to their filetype

After an HD problem and some work, I have a bunch of files with names like "f1234", "f1235", etc.
My goal is to sort these files according to their filetype. For example, I want to move all the PDF files into the "pdfs" directory.
For one file, I can do "file f1234", and if it's a PDF, I can "mv f1234 pdfs/". But I have thousands of files... Can you help me with a bash or zsh command to sort all the PDFs in one pass? Thanks
The hard part here is reliably turning the output of file into a directory name. I think probably the best candidate for that is the mime-type of the file rather than the human readable output of file. I'd use something like:
mkdir sorted
for f in f*
do
d=$(file -b --mime-type "$f" | tr / -)
mkdir -p "sorted/$d"
mv "$f" "sorted/$d/"
done
Obviously I'd test that out a bit before running it on your files, but something pretty close to that should work.
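If only the PDFs need moving, as in the question, the same file -b --mime-type check can be compared against application/pdf, the MIME type file reports for PDF documents. A minimal sketch (move_pdfs is a hypothetical name), assuming it runs in the directory holding the f* files:

```shell
#!/usr/bin/env bash
# Sketch: move every f* file whose detected MIME type is application/pdf
# into a destination directory (default: pdfs), leaving the rest alone.
move_pdfs() {
    local dest=${1:-pdfs}
    local f
    mkdir -p -- "$dest"
    for f in f*; do
        [ -f "$f" ] || continue            # skip directories / unmatched glob
        if [ "$(file -b --mime-type "$f")" = "application/pdf" ]; then
            mv -- "$f" "$dest/"
        fi
    done
}
```

Replacing the mv with echo mv first gives a dry run, in the same spirit as testing the loop above before trusting it.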

Merge some parts of a split tar.gz file Linux command

I have a large tar.gz file (approximately 63 GB) on a linux server. This file has about 1000 compressed csv files. I need to save the data of csv files in a database.
I can't extract the whole file in one go due to limited space on the server. So I split the tar.gz file into 5 parts (4 parts of 15 GB and 1 of 3 GB) but did not merge all of them, as the server would not have any space left for the extraction. I merged the first two parts to make a new tar.gz file and extracted the csv files from that.
When I tried to merge the last 3 parts, it did not make a valid tar.gz file and that file could not be extracted. This problem was not because of server space because I deleted the files that were no longer required after extraction from first two parts.
Is there any way through which the last 3 parts of the split tar.gz file can be merged in a valid tar.gz format and then extracted?
Command used to split :
split -b 15G file.tar.gz parts
Command used to merge :
cat parts* > combined.tar.gz
Command used to extract :
tar zxvf file.tar.gz -C folderwhereextracted
You can use a short shell script:
#!/bin/bash
# Assumes the parts are named NAME.tar.gz, NAME.tar.gz.1, NAME.tar.gz.2, ...
path='./path'
first=("$path"/*.tar.gz)        # first part, e.g. ./path/file.tar.gz
base=${first[0]}
list=("$base")
i=1
# Collect the numbered parts in order, stopping at the first gap.
while [ -f "$base.$i" ]; do
    echo "file $base.$i found."
    list+=("$base.$i")
    i=$((i + 1))
done
echo "file $base.$i not found, stopping."
cat "${list[@]}" > full.tar.gz
tar zxvf ./full.tar.gz -C "$path"
# rm -f "${list[@]}"
Set the path variable to the directory that holds your parts.
Uncomment the last line to remove the source parts after extraction.
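The reason the last three parts could not be merged on their own is that a .tar.gz is one continuous gzip stream: the header exists only at the very start, and deflate data in the middle refers back to earlier data, so a slice from the middle cannot be decompressed by itself. What does work is reading all parts from the first byte, piped straight into tar, so no combined 63 GB copy is written. A sketch, assuming GNU tar and parts created with split -b 15G file.tar.gz parts (so the glob sorts them as partsaa, partsab, ...); extract_from_parts is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: extract from split .tar.gz parts without writing a combined file.
# Usage: extract_from_parts PREFIX DEST [MEMBER...]
# With no MEMBER names, everything is extracted.
extract_from_parts() {
    local prefix=$1 dest=$2
    shift 2                          # remaining args are member names
    mkdir -p -- "$dest"
    # gzip must be read from the first byte, so every part is concatenated
    # in order and fed to tar on stdin (-f -).
    cat -- "$prefix"* | tar -xzf - -C "$dest" "$@"
}
```

Extracting a few named members at a time, loading them into the database and deleting them keeps disk usage low, although each run still has to decompress the stream from the beginning. With GNU tar, --to-command=CMD pipes each member to CMD instead of writing it to disk, which may let you load the CSVs without storing them at all.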

Copy text from multiple files, same names to different path in bash (linux)

I need help copying content from various files to others (same name and format, different path).
For example, $HOME/initial/baby.desktop has text which I need to write into $HOME/scripts/baby.desktop. This is very simple for a single file, but I have 2500 files in $HOME/initial/ and the same number in $HOME/scripts/ with corresponding names (same names and format). I want to append (copy) the content of each file in path A to the end of the file with the same name in path B, without erasing the content of the file in path B.
That is, the content of $HOME/initial/*.desktop should end up in $HOME/scripts/*.desktop. I tried the following, but it doesn't work:
cd $HOME/initial/
for i in $( ls *.desktop ); do egrep "Icon" $i >> $HOME/scripts/$i; done
Firstly, I would back up $HOME/initial and $HOME/scripts, because there is a lot of scope for people misunderstanding your question. Like this:
cd $HOME
tar -cvf initial.tar initial
tar -cvf scripts.tar scripts
That will put all the files in $HOME/initial into a single tarfile called initial.tar and all the files in $HOME/scripts into a single tarfile called scripts.tar.
Now for your question... in general, if you want to put the contents of FileB onto the end of FileA, the command is
cat FileB >> FileA
Note the DOUBLE ">>" which means "append" rather than single ">" which means overwrite.
So, I think you want to do this:
cd "$HOME/initial"
cat SomeFile >> "$HOME/scripts/SomeFile"
where SomeFile is the name of any file you choose to test with. I would check that it worked and then, if you are happy with the result, go ahead and run the same command inside a loop:
cd "$HOME/initial"
for SOURCE in *.desktop
do
    DESTINATION="$HOME/scripts/$SOURCE"
    echo Appending "$SOURCE" to "$DESTINATION"
    #cat "$SOURCE" >> "$DESTINATION"
done
When the output looks correct, remove the "#" at the start of the penultimate line and run it again.
I solved it. If anyone wants to learn how, it is very simple.
Using sed:
I only need the matching line (e.g. Icon=/usr/share/some_picture.png) from $HOME/initial/example.desktop appended to the file with the same name in $HOME/scripts/example.desktop, but I have a lot of .desktop files (2500).
cd "$HOME/initial"
grep -l "Icon=" -- *.desktop | while IFS= read -r i; do
    sed -n '/Icon=/p' "$i" >> "$HOME/scripts/$i"
done
_________
If you only need to copy the whole content to the file with the same name, use cat:
cd "$HOME/initial"
grep -l "Icon=" -- *.desktop | while IFS= read -r i; do
    cat "$i" >> "$HOME/scripts/$i"
done
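A slightly more defensive variant of the loops above only appends when the counterpart file already exists in the destination, so a wrong path cannot silently create 2500 new files. A sketch; append_matching_lines is a hypothetical name, the directories are passed as arguments, and the pattern defaults to Icon=, matched as a fixed string by grep -F.

```shell
#!/usr/bin/env bash
# Sketch: for each SRC_DIR/*.desktop, append the lines containing PATTERN
# to the file of the same name in DST_DIR, but only if that file exists.
append_matching_lines() {
    local src_dir=$1 dst_dir=$2 pattern=${3:-Icon=}
    local src name
    for src in "$src_dir"/*.desktop; do
        [ -f "$src" ] || continue            # no matches: glob stays literal
        name=${src##*/}                      # strip the directory part
        if [ ! -f "$dst_dir/$name" ]; then
            echo "skipping $name: no counterpart in $dst_dir" >&2
            continue
        fi
        # >> appends; grep exits 1 when nothing matches, which is fine here.
        grep -F -- "$pattern" "$src" >> "$dst_dir/$name" || true
    done
}
```

Usage: append_matching_lines "$HOME/initial" "$HOME/scripts". Running it twice appends the lines twice, so keep the backup tarballs until the result looks right.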

How to check for an exploding zip file in bash?

I have a bash shell script that unzips a zip file and manipulates the resulting files. Because of the process, I expect all the content I am interested in to be within a single folder like so:
file.zip
  /file
    /contentFolder1
    /contentFolder2
    stuff1.txt
    stuff2.txt
    ...
I've noticed users on Windows typically don't create a subfolder but instead submit an exploding zip file that looks like:
file.zip
  /contentFolder1
  /contentFolder2
  stuff1.txt
  stuff2.txt
  ...
How can I detect these exploding zips, so that I may handle them accordingly? Is it possible without unzipping the file first?
If you want to check, unzip -l will print the contents of the zip file without extracting them. You'll have to massage the output a bit, though, since it also prints a header, column titles and a totals line around the file names.
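To make that concrete: unzip -Z1 (zipinfo mode) prints only the entry names, one per line, so no massaging is needed. A sketch, assuming Info-ZIP unzip; entries_explode and is_exploding_zip are hypothetical names, and the listing check is kept separate so it can be tried on plain text.

```shell
#!/usr/bin/env bash
# Sketch: decide from the archive listing whether a zip "explodes",
# i.e. does not keep everything inside one top-level directory.

# Reads entry names (one per line) on stdin.
# Exit 0: exploding. Exit 1: everything sits under one top-level folder.
entries_explode() {
    local names top_count
    names=$(cat)
    # Count distinct first path components (text before the first "/").
    top_count=$(printf '%s\n' "$names" | cut -d/ -f1 | sort -u | wc -l)
    [ "$top_count" -ne 1 ] && return 0
    # One top-level name, but an entry without "/" is a loose file.
    printf '%s\n' "$names" | grep -qv '/' && return 0
    return 1
}

# Exit 2 if unzip cannot read the archive.
is_exploding_zip() {
    local names
    # -Z1: zipinfo mode, bare entry names only (no header or totals).
    names=$(unzip -Z1 "$1") || return 2
    printf '%s\n' "$names" | entries_explode
}
```

For example: if is_exploding_zip file.zip; then unzip -d file file.zip; else unzip file.zip; fi. An unreadable archive (exit 2) falls into the else branch, where unzip will report the error.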
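Unzip to a directory first, and then remove the extra layer if the zip is not an exploding one.
tempdir=$(mktemp -d)
unzip -d "$tempdir" file.zip
if [ "$(ls "$tempdir" | wc -l)" -eq 1 ]; then
    # single top-level entry: move it up and drop the temporary layer
    mv "$tempdir"/* .
    rmdir "$tempdir"
else
    # exploding zip: keep the temporary directory, named after the archive
    mv "$tempdir" file
fi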
I wouldn't try to detect it. I'd just force unzip to do what I want. With Info-ZIP:
$ unzip -j -d unzip-output-dir FileFromUntrustedSource.zip
-j makes it ignore any directory structure within the file, and -d tells it to put files in a particular directory, creating it if necessary.
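If there are two files with the same name but in different subdirectories, the above command will make unzip ask if you want to overwrite the first with the second. You can add -o to force it to overwrite without asking, or -u to extract a file only if it is newer than the copy already on disk (unzip still asks before overwriting unless -o is also given). Note that -f is not suitable here: it only freshens files that already exist on disk, so files not yet extracted would be skipped.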
