how to convert filename.bz2.gz file to filename.gz - linux

I have a bunch of files named filename.bz2.gz which I want to convert to filename.gz.
Any help?
Thanks

Given that your filenames end in *.bz2.gz, I assume the files were created using the following order of compressions:
echo test | bzip2 | gzip -f > file.bz2.gz
Meaning it is a gzipped bzip2 file (for whatever reason). If my assumption is correct, you can change its compression to gzip-only using the following command:
gunzip < file.bz2.gz | bunzip2 | gzip > file.gz
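Since the question mentions a whole batch of files, a small loop can apply the same pipeline to each of them. A minimal sketch, assuming every *.bz2.gz file in the current directory really is bzip2-then-gzip compressed and that you want a new .gz file next to each original:
for f in *.bz2.gz; do
gunzip < "$f" | bunzip2 | gzip > "${f%.bz2.gz}.gz"
done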

If you just want to rename the files, then do this:
for i in `ls|awk -F. '{print $1}'`
do
mv "$i".bz2.gz "$i".gz
done

I would refine Ajit's solution in this way:
for i in *.bz2.gz; do
i=${i%.bz2.gz}
mv "$i.bz2.gz" "$i.gz"
done
Using a glob rather than command substitution avoids problems with word-splitting for filenames containing whitespace. It also avoids the extra ls process, which is marginally more efficient, particularly on platforms like Cygwin with slow process forking. For the same reason, the awk command can be replaced with the ${parameter%word} parameter expansion syntax. (The quoting style of "$i".gz vs "$i.gz" makes no difference and is just personal preference.)

Related

Improve performance of Bash loop that removes Windows line endings

Editor's note: This question was always about loop performance, but the original title led some answerers - and voters - to believe it was about how to remove Windows line endings.
The bash loop below just removes the Windows line endings, converting them to Unix, and it appears to be running, but it is slow. The input files are small (4 files ranging from 167 bytes to 1 KB) and all have the same structure (a list of names); the only thing that varies is the length (i.e. some files have 10 names, others 50). Is it supposed to take over 15 minutes to complete this task using a Xeon processor? Thank you :)
for f in /home/cmccabe/Desktop/files/*.txt ; do
bname=`basename $f`
pref=${bname%%.txt}
sed 's/\r//' $f - $f > /home/cmccabe/Desktop/files/${pref}_unix.txt
done
Input .txt files
AP3B1
BRCA2
BRIP1
CBL
CTC1
EDIT
This is not a duplicate, as I was really asking why my bash loop that uses sed to remove Windows line endings was running so slowly. I did not mean to ask how to remove them; I was asking for ideas that might speed up the loop, and I got many. Thank you :). I hope this helps.
Use the utilities dos2unix and unix2dos to convert between Unix- and Windows-style line endings.
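For example (a sketch; by default both tools convert the named files in place, while -n writes the result to a new file):
dos2unix file.txt                     # CRLF -> LF, in place
unix2dos file.txt                     # LF -> CRLF, in place
dos2unix -n file.txt file_unix.txt    # write the converted copy to a new file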
Your 'sed' command looks wrong. I believe the trailing $f - $f should simply be $f. Running your script as written hangs for a very long time on my system, but making this change causes it to complete almost instantly.
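In other words, the corrected loop would be something like this (a sketch of the OP's script with just that one change, plus quoting):
for f in /home/cmccabe/Desktop/files/*.txt ; do
bname=$(basename "$f")
pref=${bname%%.txt}
sed 's/\r//' "$f" > "/home/cmccabe/Desktop/files/${pref}_unix.txt"
done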
Of course, the best answer is to use dos2unix, which was designed to handle this exact thing:
cd /home/cmccabe/Desktop/files
for f in *.txt ; do
pref=$(basename -s '.txt' "$f")
dos2unix -q -n "$f" "${pref}_unix.txt"
done
This always works for me:
perl -pe 's/\r\n/\n/' inputfile.txt > outputfile.txt
You can use dos2unix as stated before, or use this small sed command:
sed 's/\r//' file
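If you want to change the files themselves rather than print to stdout, GNU sed's -i option can be used. A sketch for the OP's directory (note that -i edits in place, so keep copies if you need the originals):
sed -i 's/\r//' /home/cmccabe/Desktop/files/*.txt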
The key to performance in Bash is to avoid loops in general, and in particular those that call one or more external utilities in each iteration.
Here is a solution that uses a single GNU awk command:
awk -v RS='\r\n' '
BEGINFILE { outFile=gensub("\\.txt$", "_unix&", 1, FILENAME) }
{ print > outFile }
' /home/cmccabe/Desktop/files/*.txt
-v RS='\r\n' sets CRLF as the input record separator, and by virtue of leaving ORS, the output record separator, at its default of \n, simply printing each input line will terminate it with \n.
the BEGINFILE block is executed every time processing of a new input file starts; in it, gensub() is used to insert _unix before the .txt suffix of the input file at hand to form the output filename.
{print > outFile} simply prints the \n-terminated lines to the output file at hand.
Note that use of a multi-char. RS value, the BEGINFILE block, and the gensub() function are GNU extensions to the POSIX standard.
Switching from the OP's sed solution to a GNU awk-based one was necessary in order to provide a single-command solution that is both simpler and faster.
Alternatively, here's a solution that relies on dos2unix for conversion of Windows line endings (for instance, you can install dos2unix with sudo apt-get install dos2unix on Debian-based systems); except for requiring dos2unix, it should work on most platforms (no GNU utilities required):
It uses a loop only to construct the array of filename arguments to pass to dos2unix - this should be fast, given that no call to basename is involved; Bash-native parameter expansion is used instead.
It then uses a single invocation of dos2unix to process all files.
# cd to the target folder, so that the operations below do not need to handle
# path components.
cd '/home/cmccabe/Desktop/files'
# Collect all *.txt filenames in an array.
inFiles=( *.txt )
# Derive output filenames from it, using Bash parameter expansion:
# '%.txt' matches '.txt' at the end of each array element, and replaces it
# with '_unix.txt', effectively inserting '_unix' before the suffix.
outFiles=( "${inFiles[@]/%.txt/_unix.txt}" )
# Create an interleaved array of *input-output filename pairs* to be passed
# to dos2unix later.
# To inspect the resulting array, run `printf '%s\n' "${fileArgs[@]}"`
# You'll see pairs like these:
# file1.txt
# file1_unix.txt
# ...
fileArgs=(); i=0
for inFile in "${inFiles[@]}"; do
fileArgs+=( "$inFile" "${outFiles[i++]}" )
done
# Now, use a *single* invocation of dos2unix, passing all input-output
# filename pairs at once.
dos2unix -q -n "${fileArgs[@]}"

Pass sed output as if it's a file

I have a Ruby script that receives the name of a config file as an argument.
I need to run it in a loop, changing some parameter inside the config on each iteration.
Everything is fine with sed; however, I have no idea how to pass sed's output to Ruby so that Ruby will think it's a file. Is it possible?
It might be clearer with code:
This is how it's usually launched:
ruby script.rb config.conf
What I want is:
sed 's/one_param/another_param/' config.conf | ruby script.rb ???????
What should I put there so that the Ruby script thinks it received a file whose content is sed's output?
I thought about a workaround of saving sed's output to a temporary file and then passing that file to script.rb, but I'm sure there is a better way to achieve this.
See this answer on how to use process substitution.
In short:
cat <( echo "yo")
Or in your case:
ruby script.rb <(sed 's/one_param/another_param/' config.conf)
To create a process substitution you enclose the command with <(...) like: <(COMMAND)
Check http://mywiki.wooledge.org/ProcessSubstitution
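Putting it together for the loop described in the question, something like this should work (a sketch; value1, value2, value3 are just placeholder parameter values):
for param in value1 value2 value3; do
ruby script.rb <(sed "s/one_param/$param/" config.conf)
done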
Conventionally, UNIX programs accept - as a filename to mean "read from standard input":
echo foo | wc -
This is a convention that works basically everywhere.
However, script writers who don't know Unix may not think to implement this. This is a bug that should be fixed by them, but you can work around it using /dev/stdin instead:
echo foo | wc /dev/stdin
In your example, this would be one of
sed 's/one_param/another_param/' config.conf | ruby script.rb -
sed 's/one_param/another_param/' config.conf | ruby script.rb /dev/stdin

Alternative to ls in shell-script compatible with nohup

I have a shell script which lists all the file names in a directory and stores them in a new file.
The problem is that when I execute this script with the nohup command, it lists the first name four times instead of listing the correct names.
After discussing the problem with other programmers, they think the issue may be the ls command.
Part of my code is the following:
for i in $( ls -1 ./Datasets/); do
awk '{print $1}' ./genes.txt | head -$num_lineas | tail -1 >> ./aux
let num_lineas=$num_lineas-1
done
Do you know an alternative to ls that works well with nohup?
Thanks.
Don't use ls to feed the loop, use:
for i in ./Datasets/*; do
or if subdirectories are of interest
for i in ./Datasets/*/*; do
Lastly, and more correctly, use find if you need the entire tree below Datasets:
find ./Datasets -type f | while IFS= read -r file; do
(do stuff with $file)
done
Others frown, but there is nothing wrong with also using find as:
for file in $(find ./Datasets -type f); do
(do stuff with $file)
done
Just choose the syntax that most closely meets your needs.
First of all, don't parse ls! A simple glob will suffice. Secondly, your awk | head | tail chain can be simplified by only printing the first column of the line that you're interested in using awk. Thirdly, you can redirect the output of your loop to a file, rather than using >>.
Incorporating all of those changes into your script:
for i in Datasets/*; do
awk -v n="$(( num_lineas-- ))" 'NR==n{print $1}' genes.txt
done > aux
Every time the loop goes round, the value of $num_lineas will decrease by 1.
In terms of your problem with nohup, I would recommend looking into using something like screen, which is known to be a better solution for maintaining a session between logins.
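For instance (a sketch; process_datasets.sh is just a placeholder name for your script):
# run detached with nohup, capturing all output in a log file:
nohup ./process_datasets.sh > process.log 2>&1 &
# or run the script inside a named screen session, detach with Ctrl-a d,
# and reattach later with: screen -r datasets
screen -S datasets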

Regarding grep command

I am trying to change the copyright headers in my assignment. I was able to list all the files with copyright headers by using the following command:
grep -rni copyright *
With the above command, I found around 1000 files.
Can anyone please tell me how to change all of the files in one go?
This will apply a text change to files with the word "copyright" in them (case insensitive):
for filename in *; do
if grep -qi "copyright" "$filename"; then
sed -i'' -e 's/old text/new text/' "$filename"
fi
done
Note that this only works on the current directory. To handle files in subdirectories, you'll probably want to use the find command.
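For example, a find-based version might look like this (a sketch; GNU sed assumed, and 'old text'/'new text' stand in for whatever change you actually need):
find . -type f -exec grep -qi "copyright" {} \; -print |
while IFS= read -r filename; do
sed -i'' -e 's/old text/new text/' "$filename"
done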
If you can describe the text change you want to make, we may be able to suggest more precise methods to achieve your goal.
grep -ril copyright * | xargs sed -i 's/old text/new text/'
There's a simple tool called headache I've found quite useful for dealing with this sort of problem. Available on Debian and Ubuntu at least.

How can I randomize the lines in a file using standard tools on Red Hat Linux?

How can I randomize the lines in a file using standard tools on Red Hat Linux?
I don't have the shuf command, so I am looking for something like a perl or awk one-liner that accomplishes the same task.
Um, let's not forget
sort --random-sort
shuf is the best way.
sort -R is painfully slow. I just tried to sort a 5 GB file. I gave up after 2.5 hours. Then shuf shuffled it in a minute.
And a Perl one-liner you get!
perl -MList::Util -e 'print List::Util::shuffle <>'
It uses a module, but the module is part of the core Perl distribution. If that's not good enough, you may consider rolling your own.
I tried using this with the -i flag ("edit-in-place") to have it edit the file. The documentation suggests it should work, but it doesn't. It still displays the shuffled file to stdout, but this time it deletes the original. I suggest you don't use it.
Consider a shell script:
#!/bin/sh
if [[ $# -eq 0 ]]
then
echo "Usage: $0 [file ...]"
exit 1
fi
for i in "$@"
do
perl -MList::Util -e 'print List::Util::shuffle <>' "$i" > "$i.new"
if [[ `wc -c < "$i"` -eq `wc -c < "$i.new"` ]]
then
mv "$i.new" "$i"
else
echo "Error for file $i!"
fi
done
Untested, but hopefully works.
cat yourfile.txt | while IFS= read -r f; do printf "%05d %s\n" "$RANDOM" "$f"; done | sort -n | cut -c7-
Read the file, prepend every line with a random number, sort the file on those random prefixes, and cut the prefixes off afterwards. A one-liner that should work in any semi-modern shell.
EDIT: incorporated Richard Hansen's remarks.
A one-liner for Python (Python 2 syntax):
python -c "import random, sys; lines = open(sys.argv[1]).readlines(); random.shuffle(lines); print ''.join(lines)," myFile
And for printing just a single random line:
python -c "import random, sys; print random.choice(open(sys.argv[1]).readlines())," myFile
But see this post for the drawbacks of python's random.shuffle(). It won't work well with many (more than 2080) elements.
Related to Jim's answer:
My ~/.bashrc contains the following:
unsort ()
{
LC_ALL=C sort -R "$@"
}
With GNU coreutils's sort, -R = --random-sort, which generates a random hash of each line and sorts by it. The randomized hash wouldn't actually be used in some locales in some older (buggy) versions, causing it to return normal sorted output, which is why I set LC_ALL=C.
Related to Chris's answer:
perl -MList::Util=shuffle -e'print shuffle<>'
is a slightly shorter one-liner. (-Mmodule=a,b,c is shorthand for -e 'use module qw(a b c);'.)
The reason giving it a simple -i doesn't work for shuffling in-place is because Perl expects that the print happens in the same loop the file is being read, and print shuffle <> doesn't output until after all input files have been read and closed.
As a shorter workaround,
perl -MList::Util=shuffle -i -ne'BEGIN{undef$/}print shuffle split/^/m'
will shuffle files in-place. (-n means "wrap the code in a while (<>) {...} loop"; BEGIN{undef$/} makes Perl operate on files-at-a-time instead of lines-at-a-time, and split/^/m is needed because $_=<> has implicitly been done with an entire file instead of lines.)
When I install coreutils with homebrew
brew install coreutils
shuf becomes available as gshuf.
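So on such a setup you can shuffle a file with the g-prefixed command, for example:
gshuf myfile.txt > myfile.shuffled.txt
(Homebrew also offers an optional gnubin directory you can add to your PATH if you want the unprefixed names.)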
Mac OS X with DarwinPorts:
sudo port install unsort
cat $file | unsort | ...
FreeBSD has its own random utility:
cat $file | random | ...
It's in /usr/games/random, so if you have not installed games, you are out of luck.
You could consider installing ports like textproc/rand or textproc/msort. These might well be available on Linux and/or Mac OS X, if portability is a concern.
On OSX, grabbing latest from http://ftp.gnu.org/gnu/coreutils/ and something like
./configure
make
sudo make install
...should give you /usr/local/bin/sort --random-sort
without messing up /usr/bin/sort
Or get it from MacPorts:
$ sudo port install coreutils
and/or
$ /opt/local/libexec/gnubin/sort --random-sort
