Merge some parts of a split tar.gz file (Linux command)

I have a large tar.gz file (approximately 63 GB) on a Linux server. The archive contains about 1000 compressed CSV files whose data I need to load into a database.
I can't extract the whole file in one go due to limited space on the server, so I split the tar.gz file into 5 parts (4 parts of 15 GB and 1 of 3 GB) but did not merge all of them, as the server would have no space left once extraction was done. I merged the first two parts into a new tar.gz file and extracted the CSV files from that.
When I tried to merge the last 3 parts, the result was not a valid tar.gz file and could not be extracted. This was not a disk-space problem, because I had deleted the files that were no longer required after extracting the first two parts.
Is there any way to merge the last 3 parts of the split tar.gz file into a valid tar.gz archive and then extract it?
Command used to split :
split -b 15G file.tar.gz parts
Command used to merge :
cat parts* > combined.tar.gz
Command used to extract :
tar zxvf file.tar.gz -C folderwhereextracted

Concatenating only the last three parts can never give a valid archive: gzip compresses the whole tar as a single stream, so a piece that does not start at the first byte of that stream cannot be decompressed on its own. The parts have to be reassembled in order, starting from the first. You can use a short shell script:
#!/bin/sh
path='./path'                # directory holding the pieces
list=""
for file in "$path"/parts*   # pieces created by: split -b 15G file.tar.gz parts
do
    if [ -f "$file" ]
    then
        echo "file $file found."
        list="$list $file"
    else
        echo "file $file not found!"
    fi
done
cat $list > full.tar.gz      # glob order (partsaa, partsab, ...) is already correct
tar zxvf ./full.tar.gz -C "$path"
# rm -f $list                # remove source pieces after untar
Put your path into the variable of the same name.
Uncomment the last line to remove the source pieces after untarring.
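If disk space is the real constraint, you don't actually need the merged file on disk at all: the pieces can be concatenated and piped straight into tar, so the 63 GB combined.tar.gz is never written. A minimal sketch, assuming the pieces still match the parts* prefix from the split command above:
# stream the concatenated parts into tar; no combined file is written
cat parts* | tar zxvf - -C folderwhereextracted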

How to catch files with same names but different extensions in Linux and delete them

OS - Ubuntu 16.04
My goal is simple. There are multiple images in a folder, say
A.jpg
A.jpeg
B.jpg
B.jpeg
C.jpg
I want the names of the .jpg files that have a .jpeg copy. So in this case I want the names A.jpg and B.jpg. Note that I do not want C.jpg because it does not have a .jpeg copy.
Next, I want to delete the .jpg files whose names I have. So in this case I want to delete A.jpg and B.jpg ONLY.
Is there a set of commands to do it ?
(I indented the file names because for some reason stackoverflow was telling me it is code.)
EDIT
Is there someway I could put those filenames in a txt file and then read from it to delete the required files?
You can try:
for file in *.jpeg; do echo "${file%.*}.jpg"; done
If this is what you want, just replace 'echo' with 'rm'.
(Edited) This way you won't see errors for non-existing files:
for file in *.jpeg; do [[ -f "${file%.*}.jpg" ]] && rm "${file%.*}.jpg"; done
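To address the edit: a minimal sketch that first collects the matching names into a text file (so you can review them before anything is deleted) and then deletes by reading the file back. The name to_delete.txt is just an example:
# collect the .jpg names that have a .jpeg twin
for file in *.jpeg; do
    [ -f "${file%.*}.jpg" ] && echo "${file%.*}.jpg"
done > to_delete.txt
# inspect to_delete.txt, then delete the listed files
while IFS= read -r name; do
    rm -- "$name"
done < to_delete.txt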
This may be what you want:
#!/bin/sh
JPG_LIST="*.jpg"
#JPG_LIST=$(cat JPG_filenames_to_test.txt)
for filename in $JPG_LIST; do
    BASENAME="${filename%.*}"
    echo -n "Basename '$BASENAME': "
    if [ -e "$BASENAME.jpg" ] && [ -e "$BASENAME.jpeg" ]; then
        echo "$BASENAME.jpeg found. Removing $BASENAME.jpg"
        rm "$BASENAME.jpg"
    else
        echo "No $BASENAME.jpeg file found. Keeping original $BASENAME.jpg"
    fi
done

Create directories and download files by reading input from a file

cat paste_output.txt | while read -r file_name path_name file;
do mkdir -p -- "$path_name";
wget "$file_name";
mv "$file" "$path_name";
done;
Hi! I have this piece of code that reads the specified file field by field. What I am trying to do here: create the directory named in the second field, download the file given in the first field, and then move the downloaded file into the directory from the second field.
Output: I get the desired directory structure and the files are downloaded, but they end up in the directory I execute the commands from.
How do I move the files into the desired directories?
You can use the -P flag of wget to put the file in the target directory.
If the directory doesn't exist, wget will create it,
so this also lets you skip the mkdir.
while read -r file_name path_name file; do
wget -P "$path_name" "$file_name"
done < paste_output.txt
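For reference, read -r splits each line on whitespace, so this assumes the lines of paste_output.txt look something like the following (hypothetical values); once wget -P places the file, the third field is simply left unused:
# hypothetical line: URL, target directory, file name
https://example.com/files/report.csv data/2020 report.csv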
I made some other improvements to the script:
The cat is useless; input redirection is better.
The semicolons at the ends of the lines are unnecessary.
It's good to indent the body of loops, for readability.

Linux Shell Script to unzip and split file outputs unreadable files

I have a zip archive which contains multiple files of the same format, each around 50 MB in size. I need to split each file into multiple chunks (say 1000 lines per split output file).
I have written a shell script which unzips the archive and saves the split output files in a directory.
The problem is that the output chunks are unreadable, full of symbols and random characters.
When I do it for each file individually, it outputs perfect txt split files, but that is not happening for the whole zip archive.
Does anyone know how I can get those files in txt format?
Here is my script.
for z in input.zip ; do
if unzip -p "$z" | split -l 1000 $z output_dir ; then
echo "$z"
fi
done
Problem
You need to unzip the files first. As written, split is given $z (the zip file itself) as its input operand, so the output of unzip -p is ignored entirely and you are just chunking the original binary ZIP file.
Solution
The following is untested because I don't have your source file. However, it should work for you with a little tweaking.
unzip -d /tmp/unzipped input.zip
mkdir /tmp/split_files
for file in /tmp/unzipped/*.txt; do
    split -l 1000 "$file" "/tmp/split_files/$(basename "$file" .txt)"
done
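If extracting everything to disk first is too costly, each member can instead be streamed straight into split, which reads standard input when its file operand is -. A minimal sketch, assuming Info-ZIP's zipinfo is available and the member names contain no newlines:
mkdir -p /tmp/split_files
# list member names one per line, stream each into split
zipinfo -1 input.zip | while IFS= read -r member; do
    unzip -p input.zip "$member" |
        split -l 1000 - "/tmp/split_files/$(basename "$member" .txt)"
done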

Copy text from multiple files, same names to different path in bash (linux)

I need help copying content from various files to others with the same name and format but a different path.
For example, $HOME/initial/baby.desktop has text which I need to write into $HOME/scripts/baby.desktop. This is very simple for a single file, but I have 2500 files in $HOME/initial/ and the same number in $HOME/scripts/ with corresponding names (same names and format). I want to append the content of each file in path A to the end of the file with the same name and format in path B, without erasing what the file in path B already contains.
For example, the content of $HOME/initial/*.desktop should end up appended to $HOME/scripts/*.desktop. I tried the following, but it doesn't work:
cd $HOME/initial/
for i in $( ls *.desktop ); do egrep "Icon" $i >> $HOME/scripts/$i; done
Firstly, I would backup $HOME/initial and $HOME/scripts, because there is lots of scope for people misunderstanding your question. Like this:
cd $HOME
tar -cvf initial.tar initial
tar -cvf scripts.tar scripts
That will put all the files in $HOME/initial into a single tarfile called initial.tar and all the files in $HOME/scripts into a single tarfile called scripts.tar.
Now for your question... in general, if you want to put the contents of FileB onto the end of FileA, the command is
cat FileB >> FileA
Note the DOUBLE ">>" which means "append" rather than single ">" which means overwrite.
So, I think you want to do this:
cd $HOME/initial/baby.desktop
cat SomeFile >> $HOME/scripts/baby.desktop/SomeFile
where SomeFile is the name of any file you choose to test with. I would test that has worked and then, if you are happy with that, go ahead and run the same command inside a loop:
cd $HOME/initial/baby.desktop
for SOURCE in *
do
    DESTINATION="$HOME/scripts/baby.desktop/$SOURCE"
    echo Appending "$SOURCE" to "$DESTINATION"
    #cat "$SOURCE" >> "$DESTINATION"
done
When the output looks correct, remove the "#" at the start of the penultimate line and run it again.
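If the .desktop names are in fact plain files directly under $HOME/initial/, as the question suggests, rather than directories, the same dry-run pattern applies with one loop over the files; a minimal sketch:
cd $HOME/initial
for SOURCE in *.desktop
do
    DESTINATION="$HOME/scripts/$SOURCE"
    echo Appending "$SOURCE" to "$DESTINATION"
    #cat "$SOURCE" >> "$DESTINATION"   # uncomment after checking the echo output
done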
I solved it. In case anyone wants to learn how, it is very simple:
Using sed
I needed only the matching (pattern) line, e.g. "Icon=/usr/share/some_picture.png", copied from $HOME/initial/example.desktop into the file with the same name and format, $HOME/scripts/example.desktop, and I had a lot of .desktop files (2500 files).
cd $HOME/initial
STRING_LINE=`grep -l -R "Icon=" *.desktop`
for i in $STRING_LINE; do sed -ne '/Icon=/ p' $i >> $HOME/scripts/$i ; done
_________
If you just need to copy each whole file into the file with the same name and format:
Using cat
cd $HOME/initial
STRING_LINE=`grep -l -R "Icon=" *.desktop`
for i in $STRING_LINE; do cat $i >> $HOME/scripts/$i ; done

How to extract first few lines from a csv file inside a tar file without extracting it in linux?

I have a tar file which has lot of csv files in it.
How to get the first few lines of each csv file without extracting it?
I tried:
$(tar -Oxf $tarfile $file | head -n "$NL") >> cdn.log
But I got an error saying:
time(http:index: command not found
That text is part of a line in one of the csv files; similar errors are reported for all the csv files...
Any idea??
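The error message itself points at the bug: wrapping the pipeline in $( ... ) makes the shell take the pipeline's output (the first lines of the CSV) and try to execute it as a command, which is why a fragment of CSV data shows up in the "command not found" message. Dropping the command substitution gives the intended behavior:
tar -Oxf "$tarfile" "$file" | head -n "$NL" >> cdn.log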
Using -O you can tell tar to extract a file to standard output instead of to a file. So you can first use tar tf <YOUR_FILE> to list the files in the archive and filter the list with grep to find the CSV files, and then, for each one, run tar xf <YOUR_FILE> <NAME_OF_CSV> -O | head to print the file's beginning to stdout. This may be a bit inefficient, since the archive is unpacked as many times as there are CSV files, but it should work.
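A minimal sketch of that approach, reusing $tarfile and $NL from the question and assuming the member names contain no newlines:
# list members, keep the CSVs, print the first $NL lines of each
tar tf "$tarfile" | grep '\.csv$' | while IFS= read -r name; do
    echo "== $name =="
    tar xf "$tarfile" -O "$name" | head -n "$NL"
done >> cdn.log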
You can use perl and its Archive::Tar module. Here is a one-liner that extracts the first two lines of each file:
perl -MArchive::Tar -E '
for (Archive::Tar->new(shift)->get_files) {
say (join qq|\n|, (split /\n/, $_->get_content, 3)[0..1])
}
' file.tar
It assumes that the tar file only has text files and they are csv. Otherwise you will have to grep the list to filter those you want.
