How to merge zcat and bzcat in a single function - linux

I would like to build a little helper function that can deal with fastq.gz and fastq.bz2 files.
I want to merge zcat and bzcat into one transparent function which can be used on both sorts of files:
zbzcat example.fastq.gz
zbzcat example.fastq.bz2
zbzcat() {
file=`echo $1 | `
## Not working
ext=${file##*/};
if [ ext == "fastq.gz" ]; then
exec gzip -cd "$@"
else
exec bzip -cd "$@"
fi
}
The extension extraction is not working correctly. Are you aware of other solutions?

There are quite a few problems:
file=`echo $1 | ` gives a syntax error because there is no command after |. But you don't need the command substitution anyways. Just use file=$1.
ext=${file##*/} is not extracting the extension, but the filename. To extract the extension use ext=${file##*.}.
In your check you didn't use the variable $ext but the literal string ext.
Usually, only the string after the last dot in a filename is considered to be the extension. If you have file.fastq.gz, then the extension is gz. So use the check $ext = gz. That the uncompressed files are fastq files is irrelevant to the function anyways.
exec replaces the shell process with the given command. So after executing your function, the shell would exit. Just execute the command.
By the way: you don't have to extract the extension at all when using pattern matching:
zbzcat() {
    file="$1"
    case "$file" in
        *.gz) gzip -cd "$@";;
        *.bz2) bzip2 -cd "$@";;
        *) echo "Unknown file format" >&2; return 1;;
    esac
}
Alternatively, use 7z x which supports a lot of formats. Most distributions name the package p7zip.

ext=${1##*.}
Why are you throwing in an echo and trying to strip a /?
Also, the string ext (3 characters) will never be equal to the string fastq.gz (8 characters). If you want to check that the extension equals gz, just do a
if [[ $ext == gz ]]
Having said this, relying on the extension to get an idea of the content of a file is a bit brave. Perhaps a more reliable way would be to use the file command to determine the most likely file type. Probably the safest approach would be to just try a bzip2 extraction first and, if it fails, do the gzip extraction.
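A minimal sketch of that "try bzip2 first, then gzip" idea. The function name zbzcat_try is made up for illustration, and it relies on the -t (test integrity) option that both bzip2 and gzip provide, so each file is read twice:

```shell
# Hypothetical helper: decide by content, not by extension.
zbzcat_try() {
    for f in "$@"; do
        if bzip2 -t -- "$f" 2>/dev/null; then
            bzip2 -dc -- "$f"          # valid bzip2 stream
        elif gzip -t -- "$f" 2>/dev/null; then
            gzip -dc -- "$f"           # valid gzip stream
        else
            echo "zbzcat_try: $f: neither bzip2 nor gzip data" >&2
            return 1
        fi
    done
}
```

Usage would look like zbzcat_try example.fastq.gz | head. For very large files the extra integrity pass may be too slow, in which case the extension or file based checks are the cheaper choice.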

I think it would be better if you would use mimetype.
File extensions are not always correct.
decomp() {
    case $(file -b --mime-type "$1") in
        # older versions of file report application/x-gzip
        "application/gzip"|"application/x-gzip")
            gzip -cd "$@"
            ;;
        "application/x-bzip2")
            bzcat "$@"
            ;;
        "application/x-xz")
            xzcat "$@"
            ;;
        *)
            echo "Unknown file format" >&2
            return 1
            ;;
    esac
}

Related

Using grep command inside case statement

I have a script in which I'm trying to determine the type of a file and act accordingly. I determine the type using the file command and then grep for a specific string: for example, if the file is zipped, unzip it; if it's gzipped, gunzip it. I want to add a lot of different file types.
I am trying to replace the if statements with case and can't figure it out.
My script looks like this:
##$arg is the file itself
TYPE="$(file $arg)"
if [[ $(echo $TYPE|grep "bzip2") ]] ; then
bunzip2 $arg
elif [[ $(echo $TYPE|grep "Zip") ]] ; then
unzip $arg
fi
Thanks to everyone that help :)
The general syntax is
case expr in
pattern) action;;
other) otheraction;;
*) default action --optional;;
esac
So for your snippet,
case $(file "$arg") in
*bzip2*) bunzip2 "$arg";;
*Zip*) unzip "$arg";;
esac
If you want to capture the file output into a variable first, do that, of course; but avoid upper case for your private variables.
bunzip2 by default replaces its input file with the decompressed one, and unzip extracts the archive members to disk. Perhaps you want to avoid that?
case $(file "$arg") in
*bzip2*) bzip2 -dc <"$arg";;
*Zip*) unzip -p "$arg";;
esac |
grep "stuff"
Notice also how the shell conveniently lets you pipe out of (and into) conditionals.
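Since the question mentions adding many more types, here is one way the case could grow. This is only a sketch: the function name stream_by_type is invented, and the patterns assume the usual wording of file's descriptions ("gzip compressed data", "bzip2 compressed data", "XZ compressed data", "Zip archive data"), which may differ between versions.

```shell
# Hypothetical helper: $1 is the description printed by `file -b`,
# $2 is the path. Everything is streamed to stdout; the input is untouched.
stream_by_type() {
    case $1 in
        *gzip*)  gzip -dc <"$2" ;;
        *bzip2*) bzip2 -dc <"$2" ;;
        *XZ*)    xz -dc <"$2" ;;
        *Zip*)   unzip -p "$2" ;;
        *)       echo "unsupported type: $1" >&2; return 1 ;;
    esac
}
```

It would be called as stream_by_type "$(file -b "$arg")" "$arg" | grep "stuff". Passing the description in as an argument keeps the type detection and the dispatch separate, so new types only need one more case branch.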

Get current directory (not full path) with filename only when sub folder is present in Linux bash

I have prepared a bash script to get only the directory (not full path) with file name where file is present. It has to be done only when file is located in sub directory.
For example:
if input is src/email/${sub_dir}/Bank_Casefeed.email, output should be ${sub_dir}/Bank_Casefeed.email.
If input is src/layouts/Bank_Casefeed.layout, output should be Bank_Casefeed.layout. I can easily get this using basename command.
src/basefolder is always constant. In some cases (after src/email(basefolder) directory), sub_directories will be there.
This script will work. I can use this script (only if module is email) to get output. but script should work even if sub directory is present in other modules. Maybe should I count the directories? if there are more than two directories (src/basefolder), script should get sub directories. Is there any better way to handle both scenarios?
#!/bin/bash
filename=`basename src/email/${sub_dir}/Bank_Casefeed.email`
echo "filename is $filename"
fulldir=`dirname src/email/${sub_dir}/Bank_Casefeed.email`
dir=`basename $fulldir`
echo "subdirectory name: $dir"
echo "concatenate $filename $dir"
Entity=$dir/$filename
echo $Entity
Using shell parameter expansion:
sub_dir='test'
files=( "src/email/${sub_dir}/Bank_Casefeed.email" "src/email/Bank_Casefeed.email" )
for f in "${files[@]}"; do
if [[ $f == *"/$sub_dir/"* ]]; then
echo "${f/*\/$sub_dir\//$sub_dir\/}"
else
basename "$f"
fi
done
test/Bank_Casefeed.email
Bank_Casefeed.email
I know there might be an easier way to do this. But I believe you can just manipulate the input string. For example:
#!/bin/bash
sub_dir='test'
DIRNAME1="src/email/${sub_dir}/Bank_Casefeed.email"
DIRNAME2="src/email/Bank_Casefeed.email"
echo $DIRNAME1 | cut -f3- -d'/'
echo $DIRNAME2 | cut -f3- -d'/'
This will remove the first two directories.
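The same "drop the first two directories" idea can be written with parameter expansion alone, which avoids the pipe to cut. A small sketch, assuming (as the question states) that the path always starts with src/<module>/; the function name strip_two is made up:

```shell
# Hypothetical helper: remove the leading "src/<module>/" from a path.
strip_two() {
    rest=${1#*/}        # drop everything up to the first slash ("src/")
    rest=${rest#*/}     # drop the next component (the module directory)
    printf '%s\n' "$rest"
}
```

With this, strip_two "src/email/test/Bank_Casefeed.email" gives test/Bank_Casefeed.email, and strip_two "src/layouts/Bank_Casefeed.layout" gives Bank_Casefeed.layout, so both scenarios go through the same code path regardless of the module name.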

How to remove the extension of a file?

I have a folder that is full of .bak files and some other files also. I need to remove the extension of all .bak files in that folder. How do I make a command which will accept a folder name and then remove the extension of all .bak files in that folder ?
Thanks.
To remove a string from the end of a BASH variable, use the ${var%ending} syntax. It's one of a number of string manipulations available to you in BASH.
Use it like this:
# Run in the same directory as the files
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
That works nicely as a one-liner, but you could also wrap it as a script to work in an arbitrary directory:
# If we're passed a parameter, cd into that directory. Otherwise, do nothing.
if [ -n "$1" ]; then
cd "$1"
fi
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
Note that quoting your variables is almost always good practice. The glob in for FILENAME in *.bak is itself safe for filenames containing spaces, but if no file matches, the loop runs once with the literal string *.bak (unless nullglob is set). Read David W.'s answer for a more robust solution.
There are several ways to remove file suffixes:
In BASH and Kornshell, you can use parameter expansion to filter a variable's value. Search for ${parameter%word} in the BASH manpage for complete information. Basically, # is a left filter and % is a right filter. You can remember this because # is to the left of % on the keyboard.
If you use a double filter (i.e. ## or %%), you are trying to filter on the biggest match. If you have a single filter (i.e. # or %), you are trying to filter on the smallest match.
What matches is filtered out and you get the rest of the string:
file="this/is/my/file/name.txt"
echo "${file#*/}" # Matches "this/" and will print out "is/my/file/name.txt"
echo "${file##*/}" # Matches "this/is/my/file/" and will print out "name.txt"
echo "${file%/*}" # Matches "/name.txt" and will print out "this/is/my/file"
echo "${file%%/*}" # Matches "/is/my/file/name.txt" and will print out "this"
Notice this is a glob match and not a regular expression match! If you want to remove a file suffix:
file_sans_ext=${file%.*}
The .* will match the period and all characters after it. Since it is a single %, it will match the smallest glob on the right side of the string. If the filter can't match anything, the result is the same as your original string.
You can verify a file suffix with something like this:
if [ "${file}" != "${file%.bak}" ]
then
echo "$file is a type '.bak' file"
else
echo "$file is not a type '.bak' file"
fi
Or you could do this:
file_suffix=${file##*.}
echo "My file is a file '.$file_suffix'"
Note that this will remove the period of the file extension.
Next, we will loop:
find . -name "*.bak" -print0 | while IFS= read -r -d $'\0' file
do
echo "mv '$file' '${file%.bak}'"
done | tee find.out
The find command finds the files you specify. The -print0 separates the file names with a NUL character, which is one of the few characters not allowed in a file name. The -d $'\0' means that your input separator is the NUL character. See how nicely find -print0 and read -d $'\0' work together?
You should almost never use the for file in $(ls *.bak) method. This will fail if the files have any white space in the name.
Notice that this command doesn't actually move any files. Instead, it produces a find.out file with a list of all the file renames. You should always do something like this when you do commands that operate on massive amounts of files just to be sure everything is fine.
Once you've determined that all the commands in find.out are correct, you can run it like a shell script:
$ bash find.out
rename .bak '' *.bak
(rename is in the util-linux package)
Caveat: there is no error checking:
#!/bin/bash
cd "$1"
for i in *.bak ; do mv -f "$i" "${i%%.bak}" ; done
You can always use the find command to get all the subdirectories
for FILENAME in `find . -name "*.bak"`; do mv --force "$FILENAME" "${FILENAME%.bak}"; done
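The backtick loop above splits find's output on whitespace, so it breaks on names containing spaces. One possible way around that is to let find hand the names to a small inline shell itself. A sketch, with the function name strip_bak invented for illustration:

```shell
# Hypothetical helper: strip ".bak" from every regular file below the
# given directory (default: current directory), including subdirectories.
strip_bak() {
    find "${1:-.}" -type f -name '*.bak' -exec sh -c '
        for f do
            mv -- "$f" "${f%.bak}"     # same ${var%ending} trick as above
        done
    ' sh {} +
}
```

Called as strip_bak /path/to/folder. Because find passes each name as a separate argument, spaces and other odd characters in file names are not an issue. Like the other versions, it silently overwrites an existing file with the target name.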

Linux: Move 1 million files into prefix-based created Folders

I have a directory called "images" filled with about one million images. Yep.
I want to write a shell command to rename all of those images into the following format:
original: filename.jpg
new: /f/i/l/filename.jpg
Any suggestions?
Thanks,
Dan
for i in *.*; do mkdir -p "${i:0:1}/${i:1:1}/${i:2:1}/"; mv "$i" "${i:0:1}/${i:1:1}/${i:2:1}/"; done
The ${i:0:1}/${i:1:1}/${i:2:1} part could probably be a variable, or shorter or different, but the command above gets the job done. You'll probably face performance issues but if you really want to use it, narrow the *.* to fewer options (a*.*, b*.* or what fits you)
edit: added a $ before i for mv, as noted by Dan
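As a rough illustration of putting the prefix into a variable, here is a sketch that builds it with only the # and % expansions, so it also runs in a plain sh. The function name shard_file is made up, and it assumes single-byte characters in the first three positions of the name:

```shell
# Hypothetical helper: move "filename.jpg" to "f/i/l/filename.jpg".
shard_file() {
    name=$1
    rest=${name#???}                     # name minus its first three characters
    [ "$rest" != "$name" ] || return 1   # fewer than three characters: leave it alone
    head=${name%"$rest"}                 # the first three characters
    c1=${head%??}                        # first character
    c3=${head#??}                        # third character
    c2=${head#?}; c2=${c2%?}             # second character
    dir=$c1/$c2/$c3
    mkdir -p -- "$dir" && mv -- "$name" "$dir/"
}
```

It could then be driven by for i in *.*; do shard_file "$i"; done. Names shorter than three characters are skipped with a non-zero status instead of being moved.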
You can generate the new file name using, e.g., sed:
$ echo "test.jpg" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/'
t/e/s/test.jpg
So, you can do something like this (assuming all the directories are already created):
for f in *; do
mv -i "$f" "$(echo "$f" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/')"
done
or, if you can't use the bash $( syntax:
for f in *; do
mv -i "$f" "`echo "$f" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/'`"
done
However, considering the number of files, you may just want to use perl as that's a lot of sed and mv processes to spawn:
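#!/usr/bin/perl -w
use strict;
# warning: untested; assumes the target directories already exist
opendir DIR, "." or die "opendir: $!";
my @files = readdir(DIR); # can't change dir while reading: read in advance
closedir DIR;
foreach my $f (@files) {
    next unless -f $f; # skip ".", ".." and directories
    # names shorter than three characters don't match and are skipped
    (my $new_name = $f) =~ s!^((.)(.)(.).*)$!$2/$3/$4/$1! or next;
    -e $new_name and die "$new_name already exists";
    rename($f, $new_name) or die "rename $f: $!";
}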
That perl is surely limited to same-filesystem only, though you can use File::Copy::move to get around that.
You can do it as a bash script:
#!/bin/bash
base=base
mkdir -p $base/shorts
for n in *
do
if [ ${#n} -lt 3 ]
then
mv $n $base/shorts
else
dir=$base/${n:0:1}/${n:1:1}/${n:2:1}
mkdir -p $dir
mv $n $dir
fi
done
Needless to say, you might need to worry about spaces and the files with short names.
I suggest a short python script. Most shell tools will balk at that much input (though xargs may do the trick). Will update with example in a sec.
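#!/usr/bin/python3
import os, shutil
src_dir = '/src/dir'
dest_dir = '/dest/dir'
for fn in os.listdir(src_dir):
    if len(fn) < 3:
        continue  # too short to build a three-level prefix
    target = os.path.join(dest_dir, fn[0], fn[1], fn[2])
    # exist_ok: many files share a prefix, so the directory may already exist
    os.makedirs(target, exist_ok=True)
    shutil.copyfile(os.path.join(src_dir, fn), os.path.join(target, fn))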
Any of the proposed solutions which use a wildcard syntax in the shell will likely fail due to the sheer number of files you have. Of the current proposed solutions, the perl one is probably the best.
However, you can easily adapt any of the shell script methods to deal with any number of files thus:
ls -1 | \
while IFS= read -r filename
do
# insert the loop body of your preference here, operating on "filename"
done
I would still use perl, but if you're limited to only having simple unix tools around, then combining one of the above shell solutions with a loop like I've shown should get you there. It'll be slow, though.

How can I re-add a unicode byte order marker in linux?

I have a rather large SQL file which starts with the byte order marker of FFFE. I have split this file using the unicode aware linux split tool into 100,000 line chunks. But when passing these back to windows, it does not like any of the parts other than the first one as only it has the FFFE byte order marker on.
How can I add this two byte code using echo (or any other bash command)?
Based on Anonymous's sed solution, sed -i '1s/^/\xef\xbb\xbf/' foo adds the BOM to the UTF-8 encoded file foo. Usefully, it also turns an ASCII file into UTF-8 with BOM.
To add BOMs to all the files that start with "foo-", you can use sed. sed has an option to make a backup.
sed -i '1s/^\(\xff\xfe\)\?/\xff\xfe/' foo-*
straceing this shows sed creates a temp file with a name starting with "sed". If you know for sure there is no BOM already, you can simplify the command:
sed -i '1s/^/\xff\xfe/' foo-*
Make sure you actually need the UTF-16 BOM, because, e.g., the UTF-8 BOM is different.
For a general-purpose solution—something that sets the correct byte-order mark regardless of whether the file is UTF-8, UTF-16, or UTF-32—I would use vim’s 'bomb' option:
$ echo 'hello' > foo
$ xxd < foo
0000000: 6865 6c6c 6f0a hello.
$ vim -e -s -c ':set bomb' -c ':wq' foo
$ xxd < foo
0000000: efbb bf68 656c 6c6f 0a ...hello.
(-e means runs in ex mode instead of visual mode; -s means don’t print status messages; -c means “do this”)
Try uconv
uconv --add-signature
Something like this (back up first):
for i in $(ls *.sql)
do
cp "$i" "$i.temp"
printf '\xFF\xFE' > "$i"
cat "$i.temp" >> "$i"
rm "$i.temp"
done
Matthew Flaschen's answer is a good one, however it has a couple of flaws.
There's no check that the copy succeeded before the original file is truncated. It would be better to make everything contingent on a successful copy, or test for the existence of the temporary file, or to operate on the copy. If you're a belt-and-suspenders kind of person, you'd do a combo as I've illustrated below
The ls is unnecessary.
I'd use a better variable name than "i" - perhaps "file".
Of course, you could be very paranoid and check for the existence of the temporary file at the beginning so you don't accidentally overwrite it and/or use a UUID or a generated file name. One of mktemp, tempfile or uuidgen would do the trick.
td=$TMPDIR
export TMPDIR=
usertemp=~/temp # set this to use a temp directory on the same filesystem
# you could use ./temp to ensure that it's on the same one
# you can use mktemp -d to create the dir instead of mkdir
if [[ ! -d $usertemp ]] # if this user temp directory doesn't exist
then # then create it, unless you can't
mkdir $usertemp || export TMPDIR=$td # if you can't create it and TMPDIR is/was
fi # empty then mktemp automatically falls
# back to /tmp
for file in *.sql
do
# TMPDIR if set overrides the argument to -p
temp=$(mktemp -p $usertemp) || { echo "$0: Unable to create temp file."; exit 1; }
{ printf '\xFF\xFE' > "$temp" &&
cat "$file" >> "$temp"; } || { echo "$0: Write failed on $file"; exit 1; }
{ rm "$file" &&
mv "$temp" "$file"; } || { echo "$0: Replacement failed for $file"; exit 1; }
done
export TMPDIR=$td
Traps might be better than all the separate error handlers I've added.
No doubt all this extra caution is overkill for a one-shot script, but these techniques can save you when push comes to shove, especially in a multi-file operation.
$ printf '\xEF\xBB\xBF' > bom.txt
Then check:
$ grep -rl $'\xEF\xBB\xBF' .
./bom.txt

Resources