Make bashscript shorter with pipes - linux

I have some text files (every line in every file follows the scheme 123:abc) and want to make two separate files from them: one big file with all lines (deduplicated), and from that a second file containing only the strings after the ":" token.
This here works:
cat *.txt >> bigtextfile.txt
sort -u bigtextfile.txt -o bigtextfile.txt
cat bigtextfile.txt | cut -d: -f2 >> bigtextfile-filtered.txt
But can I make this much shorter with pipes?

sort accepts multiple input files, so you can produce your bigtextfile.txt in one step:
sort -u *.txt -o bigtextfile.txt
cut also accepts a file argument, so there is no need for cat:
cut -d: -f2 bigtextfile.txt >> bigtextfile-filtered.txt
If you don't need bigtextfile.txt itself and only use it as an intermediate step to produce bigtextfile-filtered.txt, you can do it in one line:
sort -u *.txt | cut -d: -f2 >> bigtextfile-filtered.txt
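If both output files are wanted from a single pipeline, tee can write the deduplicated list to disk while passing it on to cut. This is a minimal sketch under some assumptions that are not in the original: the sample data is made up, and the output names end in .out so that the *.txt glob never picks up results from an earlier run.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two sample inputs in the 123:abc format (made-up data).
printf '1:foo\n2:bar\n' > a.txt
printf '2:bar\n3:baz\n' > b.txt

# tee saves the unique lines and forwards them to cut in the same pass.
# The .out names are an assumption, chosen so they never match *.txt.
sort -u -- *.txt | tee bigtextfile.out | cut -d: -f2 > bigtextfile-filtered.out

cat bigtextfile.out
cat bigtextfile-filtered.out
```

The sketch uses > rather than >> so that a rerun overwrites the filtered file instead of appending a second copy of every line.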

I suggest:
sort -u *.txt | cut -d: -f2 >> bigtextfile-filtered.txt

Try this:
cat *.txt | sort -u | cut -d: -f2 >> bigtextfile-filtered.txt

Related

Output of wc -l without file-extension

I've got the following line:
wc -l ./*.txt | sort -rn
I want to cut the file extension. With this code I get the output:
number filename.txt
for all my .txt-files in the .-directory. But I want the output without the file-extension, like this:
number filename
I tried piping through cut with different parameters, but all I managed was to cut off the whole filename with this command:
wc -l ./*.txt | sort -rn | cut -f 1 -d '.'
Assuming you don't have newlines in your filename you can use sed to strip out ending .txt:
wc -l ./*.txt | sort -rn | sed 's/\.txt$//'
Unfortunately, cut doesn't have a syntax for extracting columns by an index counted from the end. One (somewhat clunky) trick is to use rev to reverse the line, apply cut to it, and then rev it back:
wc -l ./*.txt | sort -rn | rev | cut -d'.' -f2- | rev
Using sed in a more generic way to cut off whatever extension the files have:
$ wc -l *.txt | sort -rn | sed 's/\.[^\.]*$//'
14 total
8 woc
3 456_base
3 123_base
0 empty_base
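An alternative to post-processing the wc output is to build each line yourself and strip the extension with shell parameter expansion. This is only a sketch: the file names and contents are invented for the demo, and unlike wc -l with several files it prints no "total" line.

```shell
# Scratch directory with made-up sample files (names are illustrative only).
workdir=$(mktemp -d)
printf 'x\ny\nz\n' > "$workdir/notes.txt"
printf 'x\n' > "$workdir/todo.tar.txt"

report=$(
  for f in "$workdir"/*.txt; do
    name=${f##*/}     # drop the directory part
    name=${name%.*}   # drop only the last extension
    # Arithmetic expansion trims any padding some wc implementations add.
    count=$(( $(wc -l < "$f") ))
    printf '%d %s\n' "$count" "$name"
  done | sort -rn
)
printf '%s\n' "$report"
```

Because ${name%.*} removes only the shortest trailing match, a name such as todo.tar.txt becomes todo.tar rather than todo.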
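A better approach is to look at the actual file type as reported by file rather than the extension (what is the extension of tar.gz or similar multi-part extensions?):
#!/bin/bash
# Loop over the files given as arguments and branch on what file(1) reports.
for file; do
case $(file -b "$file") in
*ASCII*) echo "this is ascii" ;;
*PDF*) echo "this is pdf" ;;
*) echo "other cases" ;;
esac
done
This is a proof of concept, not tested; feel free to adapt, improve, or modify it.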

How to sort a text file numerically and then store the results in the same text file?

I have tried sort -n test.text > test.text. However, this leaves me with an empty text file. What is going on here and what can I do to solve this problem?
Sort does not sort the file in-place. It outputs a sorted copy instead.
You need sort -n -k 4 out.txt > sorted-out.txt.
Edit: To get the order you want you have to sort the file with the numbers read in reverse. This does it:
cut -d' ' -f4 out.txt | rev | paste - out.txt | sort -k1 -n | cut -f2- > sorted-out.txt
For more learning -
sort -nk4 file
-n for numerical sort
-k for providing key
or add -r option for reverse sorting
sort -nrk4 file
This happens because you are reading from and writing to the same file: the shell truncates the output file before sort starts reading it. You can use a temporary file instead, created with mktemp or simply like this:
sort -n test.text > test1.txt
mv test1.txt test.text
For sort, you can also do the following:
sort -n test.text -o test.text
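The difference between the redirect and the -o option can be seen in a small experiment. This is a sketch with made-up numbers in temporary files; the variable names data and copy are only for illustration.

```shell
data=$(mktemp)
copy=$(mktemp)
printf '10\n9\n100\n' > "$data"
cp "$data" "$copy"

# The shell truncates "$copy" before sort starts, so sort reads an empty file.
sort -n "$copy" > "$copy"
wc -c < "$copy"

# With -o, sort reads all of its input first and only then writes the output file.
sort -n -o "$data" "$data"
cat "$data"
```

After running this, the file behind copy is empty while the file behind data holds the numbers in ascending numeric order.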

using linux cat and grep command

I have the following syntax in one of my files. Could anyone please explain what this command is doing?
path = /document/values.txt
where different usernames are specified, e.g. username1 = john,username2=marry
cat ${path} | grep -e username1 | cut -d'=' -f2
My question: the cat command reads the value of username1 from the file, but why do we need the cut command?
cat prints the file. One of the lines in the file has the form username1=something. The cut command splits that line at "=" and prints the second field.
Your command is not well written; the cat is unnecessary.
You can do:
grep -e pattern "$path"|cut ...
You can of course do it in a single process with awk if you like. Either way, the line in your question is a code smell.
awk example:
awk -F'=' '/pattern/{print $2}' inputFile
cut -d'=' -f2
This cut uses -d'=', which means '=' is the field delimiter, and -f2 selects only the second field.
So in this case you get only the value after the "=".
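To see the field splitting in action, here is a small sketch. It assumes a values file with one key=value pair per line (the question's example shows them on one line, so this layout is a guess), and it anchors the pattern so that only username1 matches.

```shell
# Temporary stand-in for /document/values.txt (assumed layout: one pair per line).
values=$(mktemp)
printf 'username1=john\nusername2=marry\n' > "$values"

# grep selects the line, cut keeps the second '='-separated field.
with_cut=$(grep -e '^username1=' "$values" | cut -d'=' -f2)

# The same lookup in a single awk process.
with_awk=$(awk -F'=' '$1 == "username1" {print $2}' "$values")

printf 'cut: %s\nawk: %s\n' "$with_cut" "$with_awk"
```

Both variants print john for this sample; without cut, the grep alone would print the whole line username1=john.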

Bash grep output filename and line no without matches

I need to get a list of matches with grep including filename and line number but without the match string
I know that grep -Hl gives only file names and grep -Hno gives the filename with only the matching string, but neither is ideal for me. I need a list without the match but with the line number. grep -Hln doesn't work for this. I tried grep -Hn 'pattern' | cut -d " " -f 1, but it doesn't cut the filename and line number properly.
awk can do that in a single command (FNR is the line number within the current file, whereas NR keeps counting across all files):
awk '/pattern/ {print FILENAME ":" FNR}' *.txt
You were on the right track with cut; you just need : as the field separator. Also, I think you need the first and second fields. Hence, use:
grep -Hn 'pattern' files* | cut -d: -f1,2
Sample
$ grep -Hn a a*
a:3:are
a:10:bar
a:11:that
a23:1:hiya
$ grep -Hn a a* | cut -d: -f1,2
a:3
a:10
a:11
a23:1
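The grep and awk approaches can be compared side by side. This is a sketch on invented sample files; it assumes file names without a ":" in them, since cut would otherwise split in the wrong place, and it uses awk's per-file counter FNR so that line numbers restart for each file.

```shell
# Scratch directory with two made-up files.
workdir=$(mktemp -d)
printf 'foo\nbar\n' > "$workdir/one.txt"
printf 'baz\nqux\nbar\n' > "$workdir/two.txt"

# grep prints file:line:match; cut keeps only the first two fields.
via_grep=$(grep -Hn 'ba' "$workdir"/*.txt | cut -d: -f1,2)

# awk builds file:line directly, without ever printing the match.
via_awk=$(awk '/ba/ {print FILENAME ":" FNR}' "$workdir"/*.txt)

printf '%s\n' "$via_grep"
```

For this sample both variables hold the same three entries: line 2 of one.txt and lines 1 and 3 of two.txt.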
I guess you want this, just line numbers:
grep -nh PATTERN /path/to/file | cut -d: -f1
example output:
12
23
234
...
Unfortunately you'll need to use cut here. There is no way to do it with pure grep.
Try
grep -RHn 'pattern' . | awk -F: '{print $1 ":" $2}'

How to extract version from a single command line in linux?

I have a product which has a command called db2level whose output is given below
I need to extract 8.1.1.64 from it. So far I have come up with
db2level | grep "DB2 v" | awk '{print$5}'
which gives me the output v8.1.1.64",
Please help me fetch 8.1.1.64. Thanks
grep is enough to do that:
db2level| grep -oP '(?<="DB2 v)[\d.]+(?=", )'
Just with awk:
db2level | awk -F '"' '$2 ~ /^DB2 v/ {print substr($2,6)}'
db2level | grep "DB2 v" | awk '{print$5}' | sed 's/[^0-9\.]//g'
remove all but numbers and dot
sed is your friend for general extraction tasks:
db2level | sed -n -e 's/.*tokens are "DB2 v\([0-9.]*\)".*/\1/p'
Because of -n, sed prints no lines except those where the substitution with the given regexp succeeds. The .* at the beginning and end of the pattern ensure that the whole line is matched and replaced.
Try grep with -o option:
db2level | grep -E -o "[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
Another sed solution
db2level | sed -n -e '/v[0-9]/{s/.*DB2 v//;s/".*//;p}'
This one doesn't rely on the number being in a particular format, just in a particular place in the output.
db2level | grep -o "v[0-9][0-9.]*" | tr -d v
Try something like db2level | grep "DB2 v" | cut -d'"' -f2 | cut -d'v' -f2
cut splits the input into parts separated by the delimiter given with -d and outputs the field number given with -f.
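Several of the approaches above can be tried without DB2 installed. The sketch below uses a hypothetical fake_db2level function; its single output line is an assumption reconstructed from the patterns used in the answers, since the real db2level output is not shown in the question.

```shell
# Hypothetical stand-in for db2level; the sample line is an assumption.
fake_db2level() {
  printf '%s\n' 'Informational tokens are "DB2 v8.1.1.64", "s040812"'
}

# sed: capture the digits and dots after "DB2 v" inside the quotes.
via_sed=$(fake_db2level | sed -n -e 's/.*tokens are "DB2 v\([0-9.]*\)".*/\1/p')

# awk: split on double quotes, then skip the 5-character prefix "DB2 v".
via_awk=$(fake_db2level | awk -F '"' '$2 ~ /^DB2 v/ {print substr($2,6)}')

# cut twice: first the quoted token, then everything after the "v".
via_cut=$(fake_db2level | grep "DB2 v" | cut -d'"' -f2 | cut -d'v' -f2)

printf '%s\n' "$via_sed" "$via_awk" "$via_cut"
```

On the assumed sample line all three pipelines yield 8.1.1.64; against real output the results depend on how closely it matches that line.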
