Determine how to uncompress an archive based on the file command - linux

I'm new to bash scripting, and I want to write a script called unpack, used like this:
unpack [-r] [-v] file [file...]
-v - verbose
-r - recursive - will traverse contents of folders recursively, performing unpack on each.
I need to determine what kind of compression was used and perform unpacking for those compression types.
Assuming that file names and extensions may be meaningless, the only way to know which method to use is the file command.
I have 4 unpacking options: gunzip, bunzip2, unzip, and uncompress,
so I wrote a function called execute_unpacking:
execute_unpacking(){
    for FILE in "$@"
    do
        local FILE_TYPE=$(file "${FILE}")
        # How to get the compression type of the file?
        case "${FILE_TYPE}" in
            *bzip2*)    bunzip2 ${RECURSIVE} "${FILE}" ;;
            *gzip*)     gunzip ${RECURSIVE} "${FILE}" ;;
            *Zip*)      unzip ${RECURSIVE} "${FILE}" ;;
            *compress*) uncompress ${RECURSIVE} "${FILE}" ;;
            *)          echo "${FILE} cannot be extracted" ;;
        esac
    done
}
So, based on $(file "${FILE}"), I need to check for Zip, bzip2, compress, and gzip.
Is this the correct way to do it? (I don't want to use external tools like dtrx.)
EDIT:
For example, if I have 4 files:
$(file -i archive) => archive: text/plain; charset=us-ascii
$(file -i archive.bz2) => archive.bz2: application/x-bzip2; charset=binary
$(file -i archive.gz) => archive.gz: application/x-gzip; charset=binary
$(file -i archive.cmpr) => archive.cmpr: application/x-compress; charset=binary
So I need to assign one of four values (gzip, compress, bzip2, text) to the FILE_TYPE variable and then match those patterns accordingly inside my case statement.
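If you only want the type itself, file can print just the MIME type; a minimal sketch, assuming a file(1) that supports -b and --mime-type (older versions report application/x-gzip, newer ones application/gzip, so both are matched):
FILE_TYPE=$(file -b --mime-type "${FILE}")
case "${FILE_TYPE}" in
    application/x-bzip2)                 bunzip2 "${FILE}" ;;
    application/gzip|application/x-gzip) gunzip "${FILE}" ;;
    application/zip)                     unzip "${FILE}" ;;
    application/x-compress)              uncompress "${FILE}" ;;
    *) echo "${FILE} cannot be extracted" >&2 ;;
esac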

What I would do:
#!/bin/bash

set -v

case $1 in
    *.tar)     tar xvf "$1";;
    *.tgz)     tar zxvf "$1";;
    *.tar.gz)  tar zxvf "$1";;
    *.xz)      tar xJvf "$1";;
    *.gz)      gunzip "$1";;
    *.zip)     unzip "$1";;
    *.rar)     unrar x "$1";;
    *.tar.bz2) tar xjvf "$1";;
    *.bz2)     bzip2 -d "$1";;
    *)         echo >&2 "unknown $1"
               exit 1
               ;;
esac
Could be enhanced using file -i:
case $(file -i "$1") in
    */x-bzip2*) bzip2 -d "$1";;
    */gzip*)    gunzip "$1";;
    */zip*)     unzip "$1";;
    */x-xz*)    tar xJvf "$1";;
    *)          echo "File $1 cannot be extracted";;
esac

With file -i as Gilles Quenot suggested:
for file; do
    file_type=$(file -i "$file")
    case "$file_type" in
        *application/x-bzip2*)    echo "bzip2 file found";;
        *application/gzip*)       echo "gzip file found";;
        *application/zip*)        echo "zip file found";;
        *application/x-xz*)       echo "xz file found";;
        *application/x-compress*) echo "compressed file found";;
        *)                        echo "${file} cannot be extracted";;
    esac
done
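Putting the pieces together, a minimal sketch of the requested unpack wrapper; the option handling and the unpack_file helper are illustrative, not a definitive implementation:
#!/bin/bash
VERBOSE=0
RECURSIVE=0

unpack_file() {
    case $(file -b --mime-type "$1") in
        application/x-bzip2)                 bunzip2 "$1" ;;
        application/gzip|application/x-gzip) gunzip "$1" ;;
        application/zip)                     unzip "$1" ;;
        application/x-compress)              uncompress "$1" ;;
        *) echo "$1 cannot be extracted" >&2; return 1 ;;
    esac
    if [ "$VERBOSE" -eq 1 ]; then echo "unpacked: $1"; fi
}

while getopts rv opt; do
    case $opt in
        r) RECURSIVE=1 ;;
        v) VERBOSE=1 ;;
        *) echo "usage: unpack [-r] [-v] file [file...]" >&2; exit 2 ;;
    esac
done
shift $((OPTIND - 1))

for target in "$@"; do
    if [ -d "$target" ] && [ "$RECURSIVE" -eq 1 ]; then
        # traverse the folder and unpack every regular file found
        find "$target" -type f | while read -r f; do unpack_file "$f"; done
    else
        unpack_file "$target"
    fi
done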

Related

How to check if file is tar file in Bash shell?

My question is about Bash. I am writing a script and I have the following problem:
There is a case where the user declares that he or she will extract a file into a directory. I have to test whether the file exists and, if it does, check whether it is a *.tar file. I searched for something similar to checking whether a file is executable:
if [ -x "file" ]; then
    echo "file is executable"
else
    echo "file is not executable"
fi
# will this if test work?
case $1 in
    "--extract")
        if [ -e $2 ] && [ tar -tzf $2 >/dev/null ]; then
            echo "file exists and is a tar archive"
        else
            echo "file either does not exist or is not a .tar archive"
        fi
        ;;
esac
The code above doesn't work; it is totally ignored. Any ideas?
The file command can determine the file type:
file my.tar
If it is a tar file, it will output:
my.tar: POSIX tar archive (GNU)
Then you can use grep to check whether the output contains 'tar archive':
file my.tar | grep -q 'tar archive' && echo "I'm tar" || echo "I'm not tar"
In case the file does not exist, the file output will be (with exit code 0):
do-not-exist.txt: cannot open `do-not-exist.txt' (No such file or directory).
You could use a case statement to handle several types of files.
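For example, a minimal sketch along those lines (the patterns match substrings of file's human-readable output, which can vary a little between versions):
case $(file -b my_archive) in
    *"tar archive"*)           echo "tar archive" ;;
    *"gzip compressed data"*)  echo "gzip" ;;
    *"bzip2 compressed data"*) echo "bzip2" ;;
    *) echo "something else" ;;
esac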
I would just see if tar can list the file:
if ! { tar ztf "$file" || tar tf "$file"; } >/dev/null 2>&1; then
echo "$file is not a tar file"
fi
I usually use a construct like this, based on the file command.
gzipped tarballs
$ file somefile1.tar.gz | grep -q 'gzip compressed data' && echo yes || echo no
yes
$ file somefile2.tar.gz | grep -q 'gzip compressed data' && echo yes || echo no
no
tarballs
The above handles gzipped tarball files; for uncompressed tarballs, change the string that grep detects:
$ file somefile1.tar | grep -q 'POSIX tar archive' && echo yes || echo no
yes
$ file somefile2.tar | grep -q 'POSIX tar archive' && echo yes || echo no
no
OK, I found the answer. I know that this is not the most optimal approach; however, it works as I intended.
I put the user's $1 into a variable and created another variable equal to *.tar.gz; then, in an if statement, I compared var1 (the string from user input) with var2 (equal to *.tar.gz), and it works.
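A minimal sketch of what that comparison might look like (the variable names are illustrative):
input=$1
pattern='*.tar.gz'
# inside [[ ]], an unquoted right-hand side is treated as a glob pattern
if [[ $input == $pattern ]]; then
    echo "file name matches *.tar.gz"
fi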

Linux/sh: How to list files one by one, compress each (with p7zip, without saving the file on disk) and upload to an FTP server (with curl/ncftp)?

Linux/sh: How do I list all files one by one in a specific folder,
compress each with p7zip (without saving the file on disk), and
upload it to an FTP server (with curl/ncftp), keeping the same folder structure?
The script below works perfectly, but
I don't want to save a 7z file to disk each time, because I always have to delete them all after they are uploaded.
I would rather pipe 7zip's output straight to curl; how can I do that?
#!/bin/sh
FOLDER="/volume3/backup_3/kopia_nas/tmp"
BACKUP_DIR="/volume3/backup_3/kopia_nas/tmp2"
FTP_HOST=""
FTP_USER=""
FTP_PASS=""
FTP_PORT="21"
PASSWORD="abc123"
FTP_FOLDER="/backup2"
#####################################################################
echo "[$(date +'%d-%m-%Y %H:%M:%S')] starting..."
echo ""
/usr/bin/find "${FOLDER}" -type f | while read line; do
# echo "$line" #path+file
# echo "${line##*/}" #file
# echo "${line%/*}" #path
#
/usr/bin/p7zip/7za a "${BACKUP_DIR}${line}.7z" "${line}" -t7z -ms=off -m0=Copy -mhe -mmt -mx0 -p"${PASSWORD}"
curl -s --disable-epsv -v -T "${BACKUP_DIR}${line}.7z" -u "${FTP_USER}:${FTP_PASS}" "ftp://${FTP_HOST}/${FTP_FOLDER}${line%/*}/" --ftp-create-dirs;
    # -S   show errors
    # -s   silent mode
    # -an  no archive name
    # -v   verbose
#/usr/bin/ncftp/ncftpput -m -u -c "${FTP_USER}" -p "${FTP_PASS}" -P "${FTP_PORT}" "${FTP_HOST}" "${FTP_FOLDER}${line%/*}/" "${line##*/}.7z"
# if [ $? -ne 0 ]; then echo "[$(date +'%d-%m-%Y %H:%M:%S')] Upload failed"; fi
done
#rm -rf "${BACKUP_DIR}/" #delete temporary folder
echo ""
echo "[$(date +'%d-%m-%Y %H:%M:%S')] completed..."
exit 0
I tried this, but it doesn't work for me:
/usr/bin/p7zip/7za a -an -t7z -ms=off -m0=Copy -mhe -mmt -mx0 -so -p"${PASSWORD}" | curl -S --disable-epsv -v -T - -u "${FTP_USER}:${FTP_PASS}" "ftp://${FTP_HOST}/${FTP_FOLDER}${line}/" --ftp-create-dirs;
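For reference, a sketch of the direction that pipeline could take; it assumes your 7za build supports streaming 7z archives to stdout with -so (not every version does, and -mhe is dropped here since header encryption may not combine with streaming in all builds). Note that the input file must be passed as an argument, and that curl needs an explicit file name in the URL because it cannot derive one from stdin:
/usr/bin/find "${FOLDER}" -type f | while read line; do
    /usr/bin/p7zip/7za a dummy -t7z -ms=off -m0=Copy -mmt -mx0 -so \
        -p"${PASSWORD}" "${line}" |
    curl -s --disable-epsv -T - -u "${FTP_USER}:${FTP_PASS}" \
        "ftp://${FTP_HOST}/${FTP_FOLDER}${line%/*}/${line##*/}.7z" --ftp-create-dirs
done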

tar command does not produce the .tar.gz file

I am trying to iterate in a loop, tarring a couple of directories in each iteration and then comparing their md5 sums. I notice that my first tar statement produces the tar file one level above the actual path of the directory, i.e. the statement:
tar -czvf ${folder_name}.tar.gz /tmp/psk1/hadoop_validation$ENV/${folder_name}
produces ${folder_name}.tar.gz in /tmp/psk1/ rather than in /tmp/psk1/hadoop_validation$ENV/,
and the second tar statement:
tar -czvf ${folder_name}.tar.gz ${edge_base_dir}/wlossf$ENV/app/${folder_name}
doesn't produce the tar file at all. I can't find it even one level above the actual path.
hdfs dfs -ls /haas/wlf/wlossf$ENV/app | while read rec; do
    echo $rec
    folder_path=`echo ${rec} | awk -F ' ' '{print $8}'`
    folder_name=`echo ${folder_path} | awk -F '/' '{print $6}'`
    if [ ! -z ${folder_name} ] && [ ! -z ${folder_path} ]; then
        hdfs dfs -get ${folder_path} /tmp/psk1/hadoop_validation$ENV/
        if [ $? -eq 0 ]; then
            echo "Hadoop to local copy job Successful"
        else
            echo "Hadoop to local copy job Failed"
        fi
        tar -czvf ${folder_name}.tar.gz /tmp/psk1/hadoop_validation$ENV/${folder_name}
        hadoop_md5=$(md5sum /tmp/psk1/hadoop_validation$ENV/${folder_name}.tar.gz)
        tar -czvf ${folder_name}.tar.gz ${edge_base_dir}/wlossf$ENV/app/${folder_name}
        edge_md5=$(md5sum ${edge_base_dir}/wlossf$ENV/app/${folder_name}.tar.gz)
        if [ ${hadoop_md5} == ${edge_md5} ]; then
            echo "${folder_name} is good"
        else
            echo "${folder_name} is bad"
        fi
    fi
    echo ${folder_name}
    echo ${folder_path}
done
What am I missing here? Any help would be appreciated.
Thank you.
As mouviciel said in the comments, tar by default creates the file in the current working directory.
Simply prefix the tar.gz file with the folder and it will create it where you want it:
tar -czvf /tmp/psk1/hadoop_validation$ENV/${folder_name}.tar.gz /tmp/psk1/hadoop_validation$ENV/${folder_name}
Note that since you will be creating the tar inside the same folder that you are archiving, you'll get a "file changed as we read it" warning as part of the output. Nothing to worry about.
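Alternatively, tar's -C option switches directory before archiving, which lets you keep the archive out of the folder being archived and avoid that warning; a sketch:
# archive the folder but write the tar.gz next to it, not inside it
tar -czvf /tmp/psk1/${folder_name}.tar.gz -C /tmp/psk1/hadoop_validation$ENV ${folder_name}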

How to move a single file with (.JPEG, .JPG, .jpeg, .jpg) extensions and change the extension to .jpg with Linux bash

I have an inotify wait script that will move a file from one location to another whenever it detects that a file has been uploaded to the source directory.
The challenge I am facing is that I need to retain the basename of the file and convert the extensions .JPEG, .JPG, and .jpeg to .jpg, so that the file is renamed with the .jpg extension only.
Currently I have this:
TARGET="/target"
SRC="/source"
( while [ 1 ]
do inotifywait -m -r -e close_write --format %f -q \
$SRC | while read F
do mv "$SRC/$F" $TARGET
done
done ) &
So I need a way to split out and test for those non-standard extensions and move each file with the correct extension. All files not having those four extensions just get moved as is.
Thanks!
Dave
if [[ "$F" =~ \.(JPEG|JPG|jpeg)$ ]]; then
    echo mv "$F" "${F%.*}.jpg"
fi
Using extglob option with some parameter expansion:
#! /bin/bash
shopt -s extglob
TARGET=/target
SRC=/source
( while : ; do
    inotifywait -m -r -e close_write --format %f -q \
        $SRC | while read F ; do
        basename=${F##*/}          # Remove everything up to the last /
        ext=${basename##*.}        # Remove everything up to the last .
        basename=${basename%.$ext} # Remove the .$ext at the end
        if [[ $ext == @(JPG|JPEG|jpeg) ]] ; then # Match any of the listed words
            ext=jpg
        fi
        echo mv "$F" "$TARGET/$basename.$ext"
    done
done ) &
Try this format. (Updated)
TARGET="/target"
SRC="/source"
(
while :; do
inotifywait -m -r -e close_write --format %f -q "$SRC" | while IFS= read -r F; do
case "$F" in
*.jpg)
echo mv "$SRC/$F" "$TARGET/" ## Move as is.
;;
*.[jJ][pP][eE][gG]|*.[jJ][pP][gG])
echo mv "$SRC/$F" "$TARGET/${F%.*}.jpg" ## Move with new proper extension.
;;
esac
done
done
) &
Remove echo from the mv commands once you have verified the output is correct. This is meant for bash but may also work in other shells; if you get an error from the read command, try removing the -r option.

How to extract archive from this script (using tar)

I have absolutely no idea how to unpack the created archive, so I am giving you the complete script.
A Debian-based distribution named Univention uses this to back up several files into a tar archive.
The real archive is packed in a function. The main content, where the actual tar file is created, is:
cat "$TMPDIR/freeinfo.txt" >> "$TMPDIR/Installinfo.txt" 2>/dev/null
echo >$TMPDIR/endtag.txt
echo "%%%%OXBACKUP_${DATE}_HEADER_ENDTAG" >> "$TMPDIR/endtag.txt"
BACKUPINFO="$BACKUPINFO endtag.txt"
cat 2>/dev/null << EOF > "$TMPDIR/Installinfo.sh"
BACKUPHOSTNAME="$hostname"
BACKUPDOMAINNAME="$domainname"
BACKUPBASEDN="$ldap_base"
BACKUPTIMEZONE="$(cat /etc/timezone)"
BACKUPLANG="$(echo $locale_default)"
BACKUPSAMBADOM="$windows_domain"
BACKUPSAMBAINSTALLED="$SAMBAINSTALLED"
BACKUPOXINTEGRATIONVERSION="$INTEGRATIONVERSION"
BACKUPSECLEVEL="$(univention-config-registry get version/security-patchlevel)"
BACKUPVERSION=2
SECRETFILES="$SECRETFILES"
OTHERFILES="$OTHERFILES"
OXCONFIG="$OXCONFIG"
CRONTABS="$CRONTABS"
CERTFILES="$CERTFILES"
EOF
pstatus=()
#
# the actual backup to stdout
#
sync ; sync ; sync
RETVAL=$(
(tar cO $BACKUPINFO 2>/dev/null
tar cO $SECRETFILES 2>/dev/null
tar cO $OTHERFILES 2>/dev/null
tar cO $OXCONFIG 2>/dev/null
tar cO $CRONTABS 2>/dev/null
tar cO $CERTFILES 2>/dev/null
[ -f $EXTRAFILES ] && tar --no-recursion -T $EXTRAFILES -cO 2>/dev/null
tar --no-recursion --null -T dirlist_mailandfilestore -cO 2>/dev/null
tar --null -T filelist_mailandfilestore -cO 2>/dev/null
tar --no-recursion --null -T dirlist_shares -cO 2>/dev/null
tar --null -T filelist_shares -cO 2>/dev/null
) |
#help us out with smbclient, perl, scp until we get a working curl
case "$BACKUPPROTOCOL" in
##stripped protocol specific stuff ... (*) is the way to go!
(*) dd 2>>${LOGFILE}_${DATE} > ${BACKUPPATH:-$DEFAULTBACKUPPATH}/backup_$DATE && echo "201"
chmod 640 "${BACKUPPATH}/backup_$DATE" >/dev/null 2>&1
chown root:www-data "${BACKUPPATH}/backup_$DATE" >/dev/null 2>&1
if [[ x"$BACKUPPATH" != x && "$BACKUPPATH" != "$DEFAULTBACKUPPATH" ]] ; then
# temporary permissions fix
ln -sf "${BACKUPPATH}/backup_$DATE" "$DEFAULTBACKUPPATH/"
fi
;;
esac
)
The archive is 54 GB on the system, and tar xvf extracts only the first level of the archive. Sorry, this is hard to explain: all in all I only get 40 MB out of the 54 GB, and none of the directories that should be in the archive are extracted.
The use of
RETVAL=$( (tar ...
tar ... ) | dd > foo )
is also totally unknown to me. What does this script do?
I think I found a solution myself (I updated the script a little bit):
The script generates a tag which marks the end of the first archive. I used
grep -A1 -a -b "HEADER_ENDTAG" backup.tar
The value was 41247795. Then:
dd skip=41247795 if=../../backup of=test
It looks like I can now extract the "real" archive. Is there another way to jump to this byte offset automatically, i.e. without running grep manually?
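One way to automate it (a sketch, reusing the tag and file names from above; grep -b reports byte offsets and tail -c +N starts output at byte N, so the offset arithmetic may need adjusting to land exactly on the next tar header, as in the manual experiment):
# byte offset of the first occurrence of the end tag
offset=$(grep -a -b -o "HEADER_ENDTAG" backup | head -n1 | cut -d: -f1)
# skip that many bytes and write the remainder to a new file
tail -c +$((offset + 1)) backup > rest.tar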
Your script appears to concatenate several tar files together into a single large file.
To extract a single section, I use a shell function / script like this:
File tarsection:
#!/bin/sh
tar_section() {
    local x=1
    while [ $x -lt $1 ]; do
        # consume one concatenated archive from stdin per iteration
        tar tf - > /dev/null || echo "Error in section $x" >&2
        x=$(( x + 1 ))
    done
    shift
    tar -f - "$@"
}

tarfile="$1"
shift
tar_section "$@" < "$tarfile"
Then you can do (for example, for part 3 of the big file):
tarsection YOUR_54GB_BACKUP_FILE 3 -t | less
cd ...extractlocation
tarsection YOUR_54GB_BACKUP_FILE 3 -x
