summarize with bash - linux

I have some data retrieved via SQL, and I need to summarize it with bash, without SQL.
abc=`sudo -u postgres psql admin -t -c "select * from table"`
echo "$abc"
A
B
C
D
dt=`sudo -u postgres psql admin -t -c "select * from table"`
echo "$dt"
1
2
3
4
Also, there are files like this:
1/x
1/y
2/y
2/z
3/z
4/x
4/y
4/z
and I want output like this:
1 A | x, y
2 B | y, z
3 C | z
4 D | x, y, z
I tried some code, but I always failed:
paste -s -d <(echo "$dt" | while read n; do echo -n " ";done) <(echo "$abc";done ) <(echo find .././ ;done)
A
x
y
z
B
x
y
z
C
x
y
z
D
1
2
3
4
and so on...
You are free to ask questions; if you need more info/details, please ask.

Using GNU awk and expecting the data to be in files (file1 is the abc data, file2 the dt):
$ gawk '
ARGIND==1 {              # process the abc data
    a[FNR]=$0            # hash to a
    n++                  # abc and dt are expected to have
    next                 # the same number of records
}
ARGIND==2 {              # process the dt data
    b[FNR]=$0            # hash to b
    next
}
split($0,t,/\//) {       # process the third set (the 1/x style paths)
    c[t[1]]=c[t[1]] (c[t[1]]==""?"":", ") t[2]   # append to c, comma-separated
}
END {                    # in the end
    for(i=1;i<=n;i++)    # n is used
        printf "%s %s | %s\n",b[i],a[i],c[i]     # output
}' file1 file2 file3
Output:
1 A | x, y
2 B | y, z
3 C | z
4 D | x, y, z
If this is a continuous thing you might want to check out the pgsql extension to GNU awk.
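If the data stays in the shell variables from the question rather than in files, process substitution can feed the same program. A minimal sketch, assuming the gawk program above is saved as summarize.awk (a hypothetical name) and the 1/x style paths live under the current directory; note that find's -printf is GNU-specific:
gawk -f summarize.awk <(echo "$abc") <(echo "$dt") \
    <(find . -mindepth 2 -maxdepth 2 -printf '%P\n')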

Do some calculation in a text file in shell

I have a text file:
$ cat ifile.txt
this is a text file
assign x to 9 and y to 10.0702
define f(x)=x+y
I would like to disable (comment out) the original line, divide the x-value by 2, and multiply the y-value by 2.
My desired output is
$ cat ofile.txt
this is a text file
#assign x to 9 and y to 10.0702
assign x to 5 and y to 20.1404
define f(x)=x+y
Here 5 is calculated from 9/2, rounded up to the next integer, and 20.1404 is calculated from 10.0702 x 2, not rounded.
I am thinking of the following way, but can't write a script.
if [ line contains "assign x to" ]; then new_x_value=[next word]/2
if [ line contains "and y to" ]; then new_y_value=[next word]x2
if [ line contains "assign x to" ];
then disable it and add a line "assign x to new_x_value and y to new_y_value"
Would you please try the following:
#!/bin/bash
pat="(assign x to )([[:digit:]]+)( and y to )([[:digit:].]+)"
while IFS= read -r line; do
    if [[ $line =~ $pat ]]; then
        echo "#$line"
        x2=$(echo "(${BASH_REMATCH[2]} + 1) / 2" | bc)
        y2=$(echo "${BASH_REMATCH[4]} * 2" | bc)
        echo "${BASH_REMATCH[1]}$x2${BASH_REMATCH[3]}$y2"
    else
        echo "$line"
    fi
done < ifile.txt > ofile.txt
Output:
this is a text file
#assign x to 9 and y to 10.0702
assign x to 5 and y to 20.1404
define f(x)=x+y
The regex (assign x to )([[:digit:]]+)( and y to )([[:digit:].]+) matches
a literal string, followed by digits, followed by a literal string,
and followed by digits including decimal point.
The bc command (${BASH_REMATCH[2]} + 1) / 2 calculates the ceiling of the input divided by 2.
The next bc command ${BASH_REMATCH[4]} * 2 multiplies the input by 2.
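You can see the ceiling trick at the shell prompt; bc truncates integer division, and the +1 shifts odd values up before the truncation:
$ echo "9 / 2" | bc
4
$ echo "(9 + 1) / 2" | bc
5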
The reason I have picked bash is just that it supports backreferences in its regex matching, and it is easier to parse and reuse the captured parameters than in awk. As often pointed out, bash is not suitable for processing large files, for performance reasons. If you plan to process large or multiple files, it is recommended to use another language such as perl.
With perl you can say (the e modifier evaluates the replacement as a Perl expression, and $& is the whole match):
perl -pe 's|(assign x to )([0-9]+)( and y to )([0-9.]+)|
"#$&\n" . $1 . int(($2 + 1) / 2) . $3 . $4 * 2|ge' ifile.txt > ofile.txt
[EDIT]
If your ifile.txt looks like:
this is a text file
assign x to 9 and y to 10.0702 45
define f(x)=x+y
There are more than one space before the numbers.
One more value exists at the end (after whitespaces).
Then please try the following instead:
pat="(assign x to +)([[:digit:]]+)( and y to +)([[:digit:].]+)( +)([[:digit:].]+)"
while IFS= read -r line; do
    if [[ $line =~ $pat ]]; then
        echo "#$line"
        x2=$(echo "(${BASH_REMATCH[2]} + 1) / 2" | bc)
        y2=$(echo "${BASH_REMATCH[4]} * 2" | bc)
        y3=$(echo "${BASH_REMATCH[6]} * 2" | bc)
        echo "${BASH_REMATCH[1]}$x2${BASH_REMATCH[3]}$y2${BASH_REMATCH[5]}$y3"
    else
        echo "$line"
    fi
done < ifile.txt > ofile.txt
Result:
this is a text file
#assign x to 9 and y to 10.0702 45
assign x to 5 and y to 20.1404 90
define f(x)=x+y
The plus sign after the whitespace is a regex quantifier that defines the number of repetitions; here it matches one or more whitespace characters.
One in awk:
awk '
/assign/ {                                       # when assign met in record
    for(i=1;i<=NF-1;i++)                         # iterate from the beginning
        if($i=="to" && $(i-1)=="x")              # if "to" follows "x"
            $(i+1)=((v=$(i+1)/2)>(u=int(v))?u+1:u)   # ceil of division
        else if($i=="to" && $(i-1)=="y")         # if "to" follows "y"
            $(i+1)*=2                            # multiply by 2
}1' file                                         # output
Output:
this is a text file
assign x to 5 and y to 20.1404
define f(x)=x+y
Sanity checking of the ceiling calculation left as homework...
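For the record, a quick check of that ceiling expression over a few values:
$ awk 'BEGIN{for(n=7;n<=10;n++){v=n/2; u=int(v); print n, (v>u ? u+1 : u)}}'
7 4
8 4
9 5
10 5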
awk '{if(match($0,/^assign/)){b=$0;split($0,a," ");a[4]=int((a[4]+1)/2);a[8]=a[8]*2;c=a[1];for(i=2;i<=8;i++)c=c" "a[i];$0="#"b"\n"c} print}'
Demo :
:>awk '{if(match($0,/^assign/)){b=$0;split($0,a," ");a[4]=int((a[4]+1)/2);a[8]=a[8]*2;c=a[1];for(i=2;i<=8;i++)c=c" "a[i];$0="#"b"\n"c} print}' <ifile
this is a text file
#assign x to 9 and y to 10.0702
assign x to 5 and y to 20.1404
define f(x)=x+y
:>
Explanation:
awk '{
if(match($0, /^assign/))      <--- $0 is the whole input record. ^ is start of line.
                                   We are checking if the record starts with "assign".
{b=$0;                        <-- Save the input record in variable b.
split($0,a," ");              <-- Create an array by splitting the record on spaces.
a[4]=int((a[4]+1)/2);         <-- Halve the x value (index 4), rounding up.
a[8]=a[8]*2;                  <-- Double the y value (index 8).
c=a[1];
for(i=2;i<=8;i++)             <-- Loop over the remaining fields in order,
c=c" "a[i];                   <-- rebuilding the line by concatenation.
$0="#"b"\n"c                  <-- Update the current record: the commented-out
                                  original, a newline ("\n"), then the new line.
}
print}'

Get comma separated values in shell script

I have a file with multiple lines like the following string. I need to extract the values of id and obj.
{"Class":"ZONE","id":"DEV100.ZN301","name":"3109 E BTM","zgroup":"EAST FLOOR 3","prog":"","members":[{"obj":"DEV300.WC3"},{"obj":"DEV300.WC4"},{"obj":"DEV300.WC7"},{"obj":"DEV300.WC10"}]}
I am using the following command to obtain the output:
[user@server ~]$ cat file.txt | grep "\"Class\":\"ZONE\"" | while IFS="," read -r a b c d e f ; do echo "$b$f";done | grep ZN301
Output:
"id":"DEV100.ZN301""members":[{"obj":"DEV300.WC3"},{"obj":"DEV300.WC4"},{"obj":"DEV300.WC7"},{"obj":"DEV300.WC10"}]}
My aim is to get the following Output:
DEV100.ZN301 : DEV300.WC3 , DEV300.WC4 , DEV300.WC7 , DEV300.WC10
Please help me out. Thanks!
With jq:
jq -r 'select(.Class == "ZONE") | (.id + " : " + ([.members[] | .obj] | join(" , ")))' file.txt
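Run against the sample line, this prints exactly the desired output:
DEV100.ZN301 : DEV300.WC3 , DEV300.WC4 , DEV300.WC7 , DEV300.WC10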
...or, leaning on a Python interpreter:
parse_json() {
    python -c '
import sys, json

for line in sys.stdin:
    line = line.strip()                  # ignore trailing newlines
    if not line: continue                # skip blank lines
    doc = json.loads(line)
    if doc.get("Class") != "ZONE":       # only ZONE records are wanted
        continue
    line_id = doc.get("id")
    objs = [m.get("obj") for m in doc.get("members", [])]
    sys.stdout.write("%s : %s\n" % (line_id, " , ".join(objs)))
'
}
parse_json < file.txt

How to add column indices in bash

I have a text file with a number of rows and columns like this:
a b c d ...
e f g h ...
i j k l ...
...
I want to add column indices for each entry with the output look like this
1:a 2:b 3:c 4:d ...
1:e 2:f 3:g 4:h ...
1:i 2:j 3:k 4:l ...
...
I am wondering if there is a simple way to realize this in bash. Thanks!
With awk:
awk '{for (i=1; i<=NF; i++) printf "%d:%s ", i, $i; print ""}' file
Output:
1:a 2:b 3:c 4:d 5:...
1:e 2:f 3:g 4:h 5:...
1:i 2:j 3:k 4:l 5:...
1:...
With perl:
perl -lane '$, = " "; print map { (1 + $_) . ":$F[$_]" } 0 .. $#F' file
# or
perl -lane '$, = " "; $i = 1; print map { $i++ . ":$_" } @F' file
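If you want to avoid external tools entirely, a pure-bash sketch (noticeably slower on large files):
while read -ra fields; do                 # split each line into an array
    out=""
    for i in "${!fields[@]}"; do          # iterate over the 0-based indices
        out+="$((i + 1)):${fields[i]} "   # prefix each field with its 1-based index
    done
    printf '%s\n' "${out% }"              # print, stripping the trailing space
done < file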

Search word from certain line GREP

File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0
xmax = 82.7959410430839
tiers? <exists>
size = 1
item []:
item [1]:
class = "IntervalTier"
name = "ortho"
xmin = 0
xmax = 82.7959410430839
intervals: size = 6
intervals [1]:
xmin = 0
xmax = 15.393970521541949
text = "Aj tento rok organizuje Rádio Sud piva. Kto chce súťažiť, nemusí sa nikde registrovať.
intervals [2]:
xmin = 15.393970521541949
xmax = 27.58997052154195
.
.
.
Hi I am working with hundreds of text files like this.
I want to extract all the xmin = ... values from this text file, but only from the 16th line onward, because at the start there are xmin values which are useless, as you can see.
I tried:
cat text.txt | grep xmin
but it shows all lines containing xmin.
Please help me. I can't modify the text files, because I need to work with hundreds of them, so I have to design a suitable way to filter them.
Like this:
awk 'FNR>15 && /xmin/' file*
xmin = 0
xmin = 15.393970521541949
It shows all xmin values from line 16 on.
You can also print the file name with each found xmin:
awk 'FNR>15 && /xmin/ {$1=$1;print FILENAME" -> "$0}' file*
file22 -> xmin = 0
file22 -> xmin = 15.393970521541949
Update: it needs to be FNR (not NR) to work with multiple files.
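The difference matters because NR keeps counting across all input files while FNR restarts at 1 for each file; a quick way to see it:
$ awk '{print FILENAME, NR, FNR}' file1 file2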
Using sed and grep to look for "xmin" from the 16th line till the end of a single file:
sed -n '16,$p' foobar.txt | grep "xmin"
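GNU sed can also do the filtering on its own, without the grep (a minimal sketch):
sed -n '16,${/xmin/p;}' foobar.txt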
In the case of multiple files, here is a bash script to get the output:
#!/bin/bash
for file in "$1"/*; do
    output=$(sed -n '16,$p' "$file" | grep "xmin")
    if [[ -n $output ]]; then
        echo -e "$file has the following entries:\n$output"
    fi
done
Run the script as bash script.sh /directory/containing/the/files/to/be/searched

Extracting several rows with overlap using awk

I have a big file that looks like this (it actually has 12368 rows):
Header
175566717.000
175570730.000
175590376.000
175591966.000
175608932.000
175612924.000
175614836.000
.
.
.
175680016.000
175689679.000
175695803.000
175696330.000
What I want to do is: delete the header, then extract the first 2000 lines (lines 1 to 2000), then lines 1501 to 3500, then 3001 to 5000, and so on...
What I mean is: extract a window of 2000 lines with an overlap of 500 lines between contiguous windows until the end of the file.
From a previous post, I got this:
tail -n +2 myfile.txt | awk 'BEGIN{file=1} ++count && count==2000 {print > "window"file; file++; count=500} {print > "window"file}'
But that isn't what I want: there is no 500-line overlap, and my first window has 1999 rows instead of 2000.
Any help would be appreciated
awk -v i=1 -v t=2000 -v d=500 'NR>1{a[NR-1]=$0}
END{while(i<NR-1){for(k=i;k<i+t;k++)print a[k] > i".txt"; close(i".txt");i=i+t-d}}' file
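(The idea: buffer every record after the header in array a, then in the END block write windows of t lines, advancing the start of each window by t-d lines.)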
Try the line above; you can change the numbers to fit your new requirement, and you can define your own filenames too.
A little test with t=10 (your 2000) and d=5 (your 500):
kent$ cat f
header
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
kent$ awk -v i=1 -v t=10 -v d=5 'NR>1{a[NR-1]=$0}END{while(i<NR-1){for(k=i;k<i+t;k++)print a[k] > i".txt"; close(i".txt");i=i+t-d}}' f
kent$ head *.txt
==> 1.txt <==
1
2
3
4
5
6
7
8
9
10
==> 6.txt <==
6
7
8
9
10
11
12
13
14
15
==> 11.txt <==
11
12
13
14
15
awk is not ideal for this. In Python you could do something like
with open("data") as fin:
    lines = fin.readlines()

# remove header
lines = lines[1:]

# print the lines
i = 0
while True:
    print("\n starting window")
    if len(lines) < i + 3000:
        # we're done; whatever is left in the file will be ignored
        break
    for line in lines[i:i + 3000]:
        print(line[:-1])  # remove the trailing \n
    i += 3000 - 500
Reading the entire file into memory is usually not a great idea, and in this case is not necessary. Given a line number, you can easily compute which files it should go into. For example:
awk '{
    a = int(NR / (t - d))
    b = int((NR - t) / (t - d))
    for (f = b; f <= a; f++) {
        if (f >= 0 && (f * (t - d)) < NR && (NR <= f * (t - d) + t))
            print > ("window" (f + 1))
    }
}' t=2000 d=500
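As before, the header still has to be stripped first, and the program reads standard input. An untested sketch of the full invocation, with windows.awk as a hypothetical filename for the program above (using -v instead of the trailing assignments, so t and d are set before any input is read):
tail -n +2 myfile.txt | awk -v t=2000 -v d=500 -f windows.awk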
