What does `cat -` mean in this snippet?

echo -n x | cat - pipe1 > pipe2 &
cat <pipe2 > pipe1
This comes from the article http://www.linuxjournal.com/article/2156

cat first outputs the "x" it reads from stdin (-), then whatever the named pipe pipe1 offers.
Following it with that second cat command is pretty wild, as the "x" bounces back and forth between the two cats. Vaught explained in 1997:
both cat programs are running like crazy copying the letter x back and forth in an endless loop.
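A minimal runnable sketch of the full setup (the snippet assumes the two named pipes already exist; here they are created explicitly with mkfifo, and pipe1/pipe2 are just the names from the snippet):
mkfifo pipe1 pipe2                  # create the two named pipes the snippet relies on
echo -n x | cat - pipe1 > pipe2 &   # print "x" read from stdin (-), then copy pipe1 into pipe2
cat <pipe2 > pipe1                  # copy pipe2 back into pipe1, closing the loop
The single "x" now circulates between the two cat processes until you interrupt them (then rm pipe1 pipe2 to clean up).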

Related

Print the second-to-last line from a variable in bash

VAR="1\n2\n3"
I'm trying to print out the second-to-last line, as a one-liner in bash.
I've gotten this far: printf -- "$VAR" | head -2
However, it prints out too much.
I can do this with a file no problem: tail -2 ~/file | head -1
You've almost done this task yourself. Try:
VAR="1\n2\n3"; printf -- "$VAR"|tail -2|head -1
Here is one pure bash way of doing this:
readarray -t arr < <(printf -- "$VAR") && echo "${arr[-2]}"
2
You may also use awk as a single command, splitting on the literal two-character sequence \n stored in the variable (hence the doubled escaping in -F):
VAR="1\n2\n3"
awk -F '\\\\n' '{print $(NF-1)}' <<< "$VAR"
2
Perhaps more efficient, using a temporary variable and parameter expansions:
var=$'1\n2\n3' ; tmpvar=${var%$'\n'*} ; echo "${tmpvar##*$'\n'}"
Use echo -e for backslash interpretation (translating \n to newlines) and print the line of interest by its number using NR:
$ echo -e "${VAR}" | awk 'NR==2'
2
With more lines, tail and head can be combined to print any particular line:
$ echo -e "$VAR" | tail -2 | head -1
2
Or do a fancy sed, where you keep the previous line in the hold space (x) and keep deleting until the last line:
$ echo -e "$VAR" | sed 'x;$!d'
2

How to efficiently loop through the lines of a file in Bash?

I have a file example.txt with about 3000 lines, each containing a string. A small example file would be:
>cat example.txt
saudifh
sometestPOIFJEJ
sometextASLKJND
saudifh
sometextASLKJND
IHFEW
foo
bar
I want to find all repeated lines in this file and output them. The desired output would be:
>checkRepetitions.sh
found two equal lines: index1=1 , index2=4 , value=saudifh
found two equal lines: index1=3 , index2=5 , value=sometextASLKJND
I made a script checkRepetitions.sh:
#!/bin/bash
size=$(cat example.txt | wc -l)
for i in $(seq 1 $size); do
    i_next=$((i+1))
    line1=$(cat example.txt | head -n$i | tail -n1)
    for j in $(seq $i_next $size); do
        line2=$(cat example.txt | head -n$j | tail -n1)
        if [ "$line1" = "$line2" ]; then
            echo "found two equal lines: index1=$i , index2=$j , value=$line1"
        fi
    done
done
However, this script is very slow; it takes more than 10 minutes to run. In Python it takes less than 5 seconds... I tried to store the file in memory by doing lines=$(cat example.txt) and then line1=$(cat $lines | cut -d',' -f$i), but this is still very slow...
When you do not want to use awk (a good tool for the job, parsing the input only once),
you can run through the lines several times. Sorting is expensive, but this solution avoids the loops you tried.
grep -Fnxf <(uniq -d <(sort example.txt)) example.txt
With uniq -d <(sort example.txt) you find all lines that occur more than once. grep then searches for these lines, taken from a file (-f), matched as complete lines (-x) without regular expressions (-F), and shows the line number where each one occurs (-n).
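With the example.txt from the question, that pipeline should print something like:
1:saudifh
3:sometextASLKJND
4:saudifh
5:sometextASLKJND
A little further post-processing would be needed to pair the line numbers up into the exact "found two equal lines" message.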
See why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons why your script is so slow.
$ cat tst.awk
{ val2hits[$0] = val2hits[$0] FS NR }
END {
    for (val in val2hits) {
        numHits = split(val2hits[val],hits)
        if ( numHits > 1 ) {
            printf "found %d equal lines:", numHits
            for ( hitNr=1; hitNr<=numHits; hitNr++ ) {
                printf " index%d=%d ,", hitNr, hits[hitNr]
            }
            print " value=" val
        }
    }
}
$ awk -f tst.awk file
found 2 equal lines: index1=1 , index2=4 , value=saudifh
found 2 equal lines: index1=3 , index2=5 , value=sometextASLKJND
To give you an idea of the performance difference, here are a bash script written to be as efficient as possible and an equivalent awk script:
bash:
$ cat tst.sh
#!/bin/bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: bash 4.0 required" >&2; exit 1;; esac
# initialize an associative array, mapping each string to the last line it was seen on
declare -A lines=( )
lineNum=0
while IFS= read -r line; do
    (( ++lineNum ))
    if [[ ${lines[$line]} ]]; then
        printf 'Content previously seen on line %s also seen on line %s: %s\n' \
            "${lines[$line]}" "$lineNum" "$line"
    fi
    lines[$line]=$lineNum
done < "$1"
$ time ./tst.sh file100k > ou.sh
real 0m15.631s
user 0m13.806s
sys 0m1.029s
awk:
$ cat tst.awk
lines[$0] {
    printf "Content previously seen on line %s also seen on line %s: %s\n", \
        lines[$0], NR, $0
}
{ lines[$0]=NR }
$ time awk -f tst.awk file100k > ou.awk
real 0m0.234s
user 0m0.218s
sys 0m0.016s
There are no differences between the outputs of the two scripts:
$ diff ou.sh ou.awk
$
The above uses third-run timing to avoid caching issues and was tested against a file generated by the following awk script:
awk 'BEGIN{for (i=1; i<=10000; i++) for (j=1; j<=10; j++) print j}' > file100k
When the input file had zero duplicate lines (generated by seq 100000 > nodups100k), the bash script executed in about the same amount of time as above, while the awk script ran much faster than it did above:
$ time ./tst.sh nodups100k > ou.sh
real 0m15.179s
user 0m13.322s
sys 0m1.278s
$ time awk -f tst.awk nodups100k > ou.awk
real 0m0.078s
user 0m0.046s
sys 0m0.015s
To demonstrate a relatively efficient (within the limits of the language and runtime) native-bash approach, which you can see running in an online interpreter at https://ideone.com/iFpJr7:
#!/bin/bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: bash 4.0 required" >&2; exit 1;; esac
# initialize an associative array, mapping each string to the last line it was seen on
declare -A lines=( )
lineNum=0
while IFS= read -r line; do
    lineNum=$(( lineNum + 1 ))
    if [[ ${lines[$line]} ]]; then
        printf 'found two equal lines: index1=%s, index2=%s, value=%s\n' \
            "${lines[$line]}" "$lineNum" "$line"
    fi
    lines[$line]=$lineNum
done <example.txt
Note the use of while read to iterate line-by-line, as described in BashFAQ #1: How can I read a file line-by-line (or field-by-field)?; this permits us to open the file only once and read through it without needing any command substitutions (which fork off subshells) or external commands (which need to be individually started up by the operating system every time they're invoked, and are likewise expensive).
The other part of the improvement here is that we're reading the whole file only once -- implementing an O(n) algorithm -- as opposed to running O(n^2) comparisons as the original code did.
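Run against the example.txt from the question, the loop above should print:
found two equal lines: index1=1, index2=4, value=saudifh
found two equal lines: index1=3, index2=5, value=sometextASLKJND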

Copy first row to the last in file

The purpose here is to copy the first row of the file to the end.
Here the input file
335418.75,2392631.25,36091,38466,1
335418.75,2392643.75,36092,38466,1
335418.75,2392656.25,36093,38466,1
335418.75,2392668.75,36094,38466,1
335418.75,2392681.25,36095,38466,1
335418.75,2392693.75,36096,38466,1
335418.75,2392706.25,36097,38466,1
335418.75,2392718.75,36098,38466,1
335418.75,2392731.25,36099,38466,1
Using the following code I got the desired output. Is there another easy option?
awk 'NR==1 {print}' FF1-1.csv > tmp1
cat FF1-1.csv tmp1
Output desired
335418.75,2392631.25,36091,38466,1
335418.75,2392643.75,36092,38466,1
335418.75,2392656.25,36093,38466,1
335418.75,2392668.75,36094,38466,1
335418.75,2392681.25,36095,38466,1
335418.75,2392693.75,36096,38466,1
335418.75,2392706.25,36097,38466,1
335418.75,2392718.75,36098,38466,1
335418.75,2392731.25,36099,38466,1
335418.75,2392631.25,36091,38466,1
Thanks in advance.
Save the line in a variable and print it at the end using the END block:
$ seq 5 | awk 'NR==1{fl=$0} 1; END{print fl}'
1
2
3
4
5
1
head can produce the same output as your awk, so you can cat that instead.
You can use process substitution to avoid the temporary file.
cat FF1-1.csv <(head -n 1 FF1-1.csv)
As mentioned by Sundeep, if process substitution isn't available you can simply cat the file and then head it sequentially to obtain the same result, putting both in a subshell if you need to redirect the output:
(cat FF1-1.csv; head -n1 FF1-1.csv) > dest
Another alternative would be to pipe the output of head to cat and refer to it with -, which for cat represents standard input:
head -1 FF1-1.csv | cat FF1-1.csv -
When you want to overwrite the existing file, normal solutions can fail: do not write to a file you are reading from.
A solution for editing the file is:
printf "%s\n" 1y $ x w q | ed -s file > /dev/null
Explanation:
printf helps by feeding each command to ed on its own line.
1y yanks the first line into a buffer.
$ moves to the last line.
x pastes the contents of the buffer after the current (last) line.
w writes the result.
q quits the editor.
ed is the editor that performs all the work.
-s suppresses diagnostics.
file is your input file.
> /dev/null suppresses output to your screen.
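Note that y and x are extensions (they work with GNU ed). If you need something portable, the POSIX t (copy) command does the same job in a single step; a minimal sketch on the same file:
printf '%s\n' '1t$' w q | ed -s file > /dev/null
Here 1t$ copies line 1 to after the last line ($), then w writes and q quits as before.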
With GNU sed:
seq 1 5 | sed '1h;$G'
Output:
1
2
3
4
5
1
1h: In first row: copy current row (pattern space) to sed's hold space
$G: In last row ($): append content from hold space to pattern space
See: man sed
The following solutions may also help:
Solution 1: simply use awk with RS and FS (without using variables):
awk -v RS="" -v FS="\n" '{print $0 ORS $1}' Input_file
Solution 2: use cat and head:
cat Input_file && head -n1 Input_file

Adding spaces after each character in a string

I have a string variable in my script, made up of the 9 permission characters from ls -l
eg:
rwxr-xr--
I want to manipulate it so that it displays like this:
r w x r - x r - -
I.e. every group of three characters is tab-separated and all other characters are separated by a space. The closest I've come is using printf:
printf "%c %c %c\t%c %c %c\t%c %c %c\t/\n" "$output"{1..9}
This only prints the first character, but formatted correctly.
I'm sure there's a way to do it using "sed" that I can't think of.
Any advice?
Using the POSIX-specified utilities fold and paste, split the string into individual characters, then interleave a cycling list of delimiters (two spaces and a tab, so every third character is followed by a tab):
fold -w1 <<<"$str" | paste -sd'  \t'
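For example, with the string from the question this should print the requested layout (tabs between the groups of three):
$ str=rwxr-xr--
$ fold -w1 <<<"$str" | paste -sd'  \t'
r w x   r - x   r - -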
$ sed -r 's/(.)(.)(.)/\1 \2 \3\t/g' <<< "$output"
r w x r - x r - -
Sadly, this leaves a trailing tab in the output. If you don't want that, use:
$ sed -r 's/(.)(.)(.)/\1 \2 \3\t/g; s/\t$//' <<< "$str"
r w x r - x r - -
Why do you need to parse them? You can access every element of the string by copying the needed element. It's very easy and needs no external utility, for example:
DATA="rwxr-xr--"
i=0
while [ $i -lt ${#DATA} ]; do
    echo ${DATA:$i:1}
    i=$(( i+1 ))
done
With awk:
$ echo "rwxr-xr--" | awk '{gsub(/./,"& ");gsub(/. . . /,"&\t")}1'
r w x r - x r - -
> echo "rwxr-xr--" | sed 's/\(.\{3,3\}\)/\1\t/g;s/\([^\t]\)/\1 /g;s/\s*$//g'
r w x r - x r - -
(Evidently I didn't put much thought into my sed command. John Kugelman's version is obviously much clearer and more concise.)
Edit: I wholeheartedly agree with triplee's comment though. Don't waste your time trying to parse ls output. I did that for a long time before I figured out you can get exactly what you want (and only what you want) much more easily by using stat. For example:
> stat -c %a foo.bar # Equivalent to stat --format %a
0754
The -c %a tells stat to output the access rights of the specified file, in octal. And that's all it prints out, thus eliminating the need to do wacky stuff like ls foo.bar | awk '{print $1}', etc.
So for instance you could do stuff like:
GROUP_READ_PERMS=040
perms=$(stat -c %a foo.bar)
if (( (perms & GROUP_READ_PERMS) != 0 )); then
... # Do some stuff
fi
Sure as heck beats parsing strings like "rwxr-xr--"
sed 's/.../& /2g;s/./& /g' YourFile
In two simple steps.
A version which uses pure bash for short strings and sed for longer strings, and which preserves newlines (adding a space after them too):
if [ "${OS-}" = "Windows_NT" ]; then
threshold=1000
else
threshold=100
fi
function escape()
{
local out=''
local -i i=0
local str="${1}"
if [ "${#str}" -gt "${threshold}" ]; then
# Faster after sed is started
sed '# Read all lines into one buffer
:combine
$bdone
N
bcombine
:done
s/./& /g' <<< "${str}"
else
# Slower, but no process to load, so faster for short strings. On windows
# this can be a big deal
while (( i < ${#str} )); do
out+="${str:$i:1} "
i+=1
done
echo "$out"
fi
}
Explanation of the sed script: if this is the last line, jump to :done, else append the Next line into the buffer and jump back to :combine. After :done there is a simple sed replacement expression. The entire string (newlines and all) is in one buffer so that the replacement works on newlines too (which are lost in some of the awk -F examples).
Plus this is Linux, Mac, and Git for Windows compatible.
Setting awk -F '' makes each character its own field; then you loop through and print each field.
Example:
ls -l | sed -n 2p | awk -F '' '{for(i=1;i<=NF;i++){printf " %s ",$i;}}'; echo ""
This part seems like the answer to your question:
awk -F '' '{for(i=1;i<=NF;i++){printf " %s ",$i;}}'
I realize this doesn't provide the grouping by threes that you wanted, though.
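If you do want the grouping, here is a minimal sketch of one way to add it, still assuming an awk (such as gawk) where an empty FS splits each character into its own field: a space after most characters and a tab after every third one.
$ echo "rwxr-xr--" | awk -F '' '{for(i=1;i<=NF;i++) printf "%s%s", $i, (i==NF ? "\n" : i%3 ? " " : "\t")}'
r w x   r - x   r - -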

A shell command for composing a file from chunks of another file

I have a data file and a file containing a list of positions and I want to generate a file from chunks of the data file. Example:
$ cat data
abcdefghijkl
$ cat positions
0,2
5,8
$ cutter positions data
abcfghi
Is there a (Linux) shell command that works like my hypothetical "cutter"?
The particular format for "positions" is not important.
We can assume that the chunks specified in "positions" are in increasing order and do not overlap.
There might be an additional "cutter" mode where the positions count lines not bytes.
I could implement such a program myself easily, but I have the gut feeling that such a program already exists.
Just using bash's substring extraction from parameter expansion, and using the positions file format as given:
data=$(< data) # read the entire file into a variable
while IFS=, read start stop; do
    printf "%s" "${data:$start:((stop-start+1))}"
done < positions
echo
outputs
abcfghi
If your data file spans multiple lines, you will have to take care with the positions file to account for the newline characters.
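For example (data2 is a hypothetical two-line file, used only to illustrate how the newline is counted):
$ printf 'abc\ndef\n' > data2
$ data=$(< data2)
$ echo "${data:4:3}"
def
The newline between the two lines occupies offset 3, so "def" starts at offset 4 rather than 3.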
This method does not require you to read the data file into memory:
#!/bin/bash
exec 3<data
exec 4<positions
pos=0
while IFS=, read start stop <&4; do
    ((nskip = start - pos))
    ((nkeep = stop - start + 1))
    ((pos += nskip + nkeep))
    ((nskip > 0)) && read -N $nskip <&3
    read -N $nkeep <&3
    printf "%s" "$REPLY"
done
echo
cut -c will allow you to specify fixed width columns, which seems to be what you're looking for:
$ echo "abcdefghijkl" | cut -c 1-3,6-9
abcfghi
Note that the character positions start at 1 rather than 0. Individual columns may be specified using commas, e.g. cut -c 1,3,5,7, or ranges can be specified using a dash: cut -c 2-8
This can be done with cut, as Barton Chittenden points out, with the addition of command substitution:
$ cut -c $(cat positions) data
abcfghi
The particular format for "positions" is not important.
I made the format of positions match what cut expects, so no extra processing was required.
$ cat data
abcdefghijkl
$ cat positions
1-3,6-9
You can turn this into the cutter command by adding a function to your ~/.bashrc file:
function cutter ()
{
    cut -c $(cat "$1") "$2"
}
Run source ~/.bashrc, and then you can use cutter as required:
$ cutter positions data
abcfghi
Use redirection to store the output in a new file:
$ cut -c $(cat positions) data > newfile
$ cutter positions data > newfile
