I have a file with multiple lines like the following. I need to extract the values of id and obj.
{"Class":"ZONE","id":"DEV100.ZN301","name":"3109 E BTM","zgroup":"EAST FLOOR 3","prog":"","members":[{"obj":"DEV300.WC3"},{"obj":"DEV300.WC4"},{"obj":"DEV300.WC7"},{"obj":"DEV300.WC10"}]}
I am using the following command to obtain the output:
[user@server ~]$ cat file.txt | grep "\"Class\":\"ZONE\"" | while IFS="," read -r a b c d e f ; do echo "$b$f";done | grep ZN301
Output:
"id":"DEV100.ZN301""members":[{"obj":"DEV300.WC3"},{"obj":"DEV300.WC4"},{"obj":"DEV300.WC7"},{"obj":"DEV300.WC10"}]}
My aim is to get the following Output:
DEV100.ZN301 : DEV300.WC3 , DEV300.WC4 , DEV300.WC7 , DEV300.WC10
Please help me out. Thanks!
If jq is available, it can do the whole job in one expression: select the ZONE records, take id, and join the member obj values:
jq -r 'select(.Class == "ZONE") | (.id + " : " + ([.members[] | .obj] | join(" , ")))' file.txt
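Run against the sample line, that prints exactly the desired output:
DEV100.ZN301 : DEV300.WC3 , DEV300.WC4 , DEV300.WC7 , DEV300.WC10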
...or, leaning on a Python interpreter:
parse_json() {
python -c '
import sys, json

for line in sys.stdin:
    line = line.strip()  # ignore trailing newlines
    if not line:
        continue         # skip blank lines
    doc = json.loads(line)
    if doc.get("Class") != "ZONE":
        continue
    line_id = doc.get("id")
    objs = [m.get("obj") for m in doc.get("members", [])]
    sys.stdout.write("%s : %s\n" % (line_id, " , ".join(objs)))
'
}
parse_json < file.txt
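Fed the sample file, this prints the same line:
DEV100.ZN301 : DEV300.WC3 , DEV300.WC4 , DEV300.WC7 , DEV300.WC10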
My data :
"1,2,3,4,5,64,3,9",,,,,1,aine
"2,3,4,5",,,,,3,bb
"3,4,5,6,6,2",,,,,2,ff
I have to transpose the values inside the "..." delimiters two by two, as in: how to transpose values two by two using shell?
and output the result (2 columns) in a new file whose filename is the digit in the next-to-last column. I have to do this for each line of my input file.
What I would like :
$ ls
1 2 3 4 5 6 7 8
example : cat 1
1 2
3 4
5 64
3 9
cat 2 :
3 4
5 6
6 2
cat 3 :
2 3
4 5
Bonus: if I could get each last word (the last column) as the name of the new file, that would be perfect.
OK, it took some time, but I finally solved your problem with the code below:
#!/bin/bash
while read -r LINE; do
    FILE_NAME=$(echo "${LINE##*,,,,,}" | cut -d ',' -f 1 | tr -d "\"")
    DATA=$(echo "${LINE%%,,,,,*}" | tr -d "\"" | tr "," " ")
    touch "$FILE_NAME"
    i=1
    for num in $DATA; do
        echo -n "$num"
        if [[ $((i % 2)) == 0 ]]; then
            echo ""
        else
            echo -n " "
        fi
        i=$((i + 1))
    done > "$FILE_NAME"
done < input.txt
In my solution I assume that your input is placed in the file input.txt and that all of your input lines have ,,,,, as a separator. It works like a charm with your sample input.
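For the bonus (naming each file after the last column instead of the digit), only the FILE_NAME line needs to change; a minimal sketch, assuming the same ,,,,, separator:
FILE_NAME=$(echo "${LINE##*,,,,,}" | cut -d ',' -f 2)
With the sample input that creates files named aine, bb and ff.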
Assuming there are no colons in the input (choose a different temporary delimiter if necessary) the first part can be done with:
awk '{s = ""; n = split($2,k,","); for(i = 1; i <= n; i+=2 ) { s = sprintf( "%s%c%s:%s", s, s ? ":" : "", k[i+1], k[i])} $2 = s}1' FS=\" OFS=\" input | sort -t , -k6n | tr : ,
eg:
$ cat input
"1,2,3,4,5,64,3,9",,,,,1,aine
"2,3,4,5",,,,,3,bb
"3,4,5,6,6,2",,,,,2,ff
$ awk '{s = ""; n = split($2,k,","); for(i = 1; i <= n; i+=2 ) { s = sprintf( "%s%c%s:%s", s, s ? ":" : "", k[i+1], k[i])} $2 = s}1' FS=\" OFS=\" input | sort -t , -k6n | tr : ,
"2,1,4,3,64,5,9,3",,,,,1,aine
"4,3,6,5,2,6",,,,,2,ff
"3,2,5,4",,,,,3,bb
But it's not clear why you want to do the first part at all when you can just skip straight to part 2 with:
awk '{n = split($2,k,","); m = split($3, j, ","); fname = j[6];
for( i = 1; i <= n; i+=2 ) printf("%d %d\n", k[i+1], k[i]) > fname}' FS=\" input
My answer can't keep up with the changes to the question! If you are outputting the lines into files, then there is no need to sort on the penultimate column. If you want the filenames to be the final column, it's not clear why you ever mentioned using the penultimate column at all. Just change fname in the above to j[7] to get the final column.
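Concretely, that final-column variant is (with the sample input it writes the pairs into files named aine, bb and ff):
awk '{n = split($2,k,","); split($3, j, ","); fname = j[7];
    for( i = 1; i <= n; i+=2 ) printf("%d %d\n", k[i+1], k[i]) > fname}' FS=\" input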
I have the following config file:
[general]
a=b
b=c
...
mykey=myvalue
n=X
[prod]
a=b
b=c
mykey=myvalue2
...
I want to get mykey only from the [general] section.
What I have tried is the following:
cat my.config | grep mykey
And as expected I got two results:
mykey=myvalue
mykey=myvalue2
The [general] section doesn't always appear in the first part of the config file.
How can I get the mykey that appears under the [general] section using Linux commands?
Here's one with awk:
$ awk -v RS="" '          # process empty-line-separated blocks
$1=="[general]" {         # if a block starts with the key string
    for(i=2;i<=NF;i++)        # iterate the records, or fields in this case
        if($i~/^mykey=/) {    # find the key
            print $i          # and output the field
            exit              # once found, no point in continuing the search
        }
}' file
Output:
mykey=myvalue
You can get the values between [general] and the next bracketed section header.
awk '/^\[/{f=0} f; /\[general\]/{f=1}' file.config | grep mykey
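For example, with the question's config in file.config:
$ awk '/^\[/{f=0} f; /\[general\]/{f=1}' file.config | grep mykey
mykey=myvalue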
You can use a Python script:
ini2arr.py
#!/usr/bin/env python
import sys
import configparser  # was ConfigParser in Python 2

config = configparser.ConfigParser()
config.read_file(sys.stdin)
for sec in config.sections():
    print("declare -A %s" % sec)
    for key, val in config.items(sec):
        print('%s[%s]="%s"' % (sec, key, val))
then
eval "$(cat t.ini | ./ini2arr.py)"
echo ${general["mykey"]}
EDIT OR:
#!/usr/bin/env python
import sys
import configparser  # was ConfigParser in Python 2

section_filter = sys.argv[1]
key_filter = sys.argv[2]

config = configparser.ConfigParser()
config.read_file(sys.stdin)
print('%s[%s]="%s"' % (section_filter, key_filter, config.get(section_filter, key_filter)))
then
cat t.ini | ./ini2arr.py prod a
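With the sample config above, that prints a line you can eval (after a declare -A prod):
prod[a]="b"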
Here is another awk solution (using standard linux awk/gawk)
/\[general\]/,/^$/ {if ($0 ~ "mykey") print}
Explanation
/\[general\]/,/^$/ # match lines range : starting with "[general]" and ending with "" (empty line)
{ # for each line in range
if ($0 ~ "mykey") # if line match regex pattern "mykey"
print $0 # print the line
}
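For example (this assumes, as the range pattern does, that the [general] section is terminated by an empty line):
$ awk '/\[general\]/,/^$/ {if ($0 ~ "mykey") print}' my.config
mykey=myvalue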
I am trying to replace a string in an sh file using sed.
Issue: after 'connection' there is a blank line, and its '-url' string comes on the next line; in addition, the port number and password string need to be replaced as well. Using sed I am not able to remove the blank line after connection.
Original String:
connection
-url>jdbc:oracle:thin:#10.10.10.11\:1551/password1 /connection-url
Replace with:
connection-url>jdbc:oracle:thin:#10.10.10.90\:1555/password2 /connection-url
I tried the commands below, which didn't work:
sed -i 's/connection[\t ]+/,/g' sed-script.sh
sed 's/\connection*-\connection*/-/g' sed-script.sh
Tested with GNU awk. RS='^$' slurps the whole file into a single record, and $1=$1 then rejoins all the fields with single spaces, which drops the blank line:
awk -v RS='^$' '{$1=$1} 1' Input_file
connection -url>jdbc:oracle:thin:#10.10.10.11\:1551/password1 /connection-url
Could you please try the following:
awk '/^connection/{val=$0;next} NF && /^-url/{print val $0;val=""}' Input_file
Output will be as follows.
connection-url>jdbc:oracle:thin:#10.10.10.11\:1551/password1 /connection-url
You can remove the blank line after 'connection' using tr.
echo <input string> | tr -d "\n"
We can see the \n characters we want to delete by running the string through od -c:
0000000 c o n n e c t i o n \n \n - u r l
0000020 > j d b c : o r a c l e : t h i
0000040 n : # 1 0 . 1 0 . 1 0 . 1 1 \ :
0000060 1 5 5 1 / p a s s w o r d 1 /
0000100 c o n n e c t i o n - u r l \n
With sed:
sed -E '
  /connection$/,/^-url/ {
    /connection$/ { h; d; }
    /^$/ d
    /^-url/ { H; s/.*//; x; s/\n//g; }
  }
' old > new
Assumes no stray whitespace, and that a connection on a line by itself should be followed by a line that starts with -url...
sed processes a line at a time by default; if you want to check whether an empty line follows another line, you have to write a sed script to implement that.
I would go with Awk or Perl instead for this particular task.
perl -p0777 -i -e 's/connection\n\n-url/connection-url/' file
awk '/^connection/ { c=1; next }
     c && /^$/ { c++; next }
     c && /^-url/ { $1="connection" $1; c=0 }
     c { print "connection"; while(--c) print "" }
     1' file >file.new
Perl, like sed, has an -i option to replace the file in-place. GNU Awk has an extension to do the same (look for -i inplace) but it's not portable to lesser Awks.
The Perl -0777 option causes the whole file to be slurped into memory as a single "line", line feeds (\n) and all. If the file is very big, this will obviously be problematic.
The Awk script takes care to put back the lines it skipped if it turned out to be a false match after all.
I really need your help. I have a file which includes data like (field:value) on one line.
File.A
A:13 B:2 D:5 F:92 G:3 ...
I have created a file which includes "A to Z".
File.B
A B C D E F G H I J ...
I am trying to use a bash script to read the content and fix the output, inserting the missing fields with a 0 value:
A:13 B:2 C:0 D:5 E:0 F:92 G:3 H:0 ...
I have thought about it for two days, but nothing has come to mind. Is there any way I can solve it?
Let's make brace expansion work for us: {A..Z} expands to the full list of letters:
$ echo {A..Z}
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Then we can loop through the letters, grepping for each. If it matches, grep prints the line; otherwise, we print letter:0.
for letter in {A..Z}
do
grep "^$letter" file || echo "$letter:0"
done
Test
$ for letter in {A..Z}; do grep "^$letter" file || echo "$letter:0"; done
A:13
B:2
C:0
D:5
E:0
F:92
G:3
H:0
I:0
J:0
K:0
L:0
M:0
N:0
O:0
P:0
Q:0
R:0
S:0
T:0
U:0
V:0
W:0
X:0
Y:0
Z:0
Now that you have updated the question and the input file contains everything on the same line, you can use this grep to match:
grep -o "$word:[0-9]*" file
and then replace newlines with spaces:
$ for word in {A..Z}; do grep -o "$word:[0-9]*" file || echo "$word:0"; done | tr '\n' ' '
A:13 B:2 C:0 D:5 E:0 F:92 G:3 H:0 I:0 J:0 K:0 L:0 M:0 N:0 O:0 P:0 Q:0 R:0 S:0 T:0 U:0 V:0 W:0 X:0 Y:0 Z:0
If you fancy a bit of awk, you could try this:
awk -F: -vRS=" " '
{ c[$1] = $2 }
END{
for(i=65;i<91;++i){
a=sprintf("%c", i)
printf("%c:%d ",i,c[a])
}
}' A
where A is your file. The first block builds an array of all the values that have been set. Once the whole file has been read, the loop goes through the ASCII values of A (65) to Z (90) and prints out the values that have been set in the array. The ones that are missing are printed as 0.
Output:
A:13 B:2 C:0 D:5 E:0 F:92 G:3 H:0 I:0 J:0 K:0 L:0 M:0 N:0 O:0 P:0 Q:0 R:0 S:0 T:0 U:0 V:0 W:0 X:0 Y:0 Z:0
Since everyone clearly can't get enough of my answer, here's another way you could do it, inspired by the {A..Z} range used in #fedorqui's answer:
awk -F: -vRS=" " '
NR==FNR { a[i++] = $1; next }
{ b[$1] = $2 }
END{for(i=0;i<length(a);++i)printf("%c:%d ",a[i],b[a[i]])}' - <<<$(echo {A..Z}) A
The first block reads in all the letters of the alphabet, removing the need to know their character codes. The second block builds an array from your file A. Once the file has been read, all the values are printed out, resulting in the same output as above.
Pure Bash, no external processes. Print the match if the letter is found in the line, or the letter followed by :0 otherwise.
read content < "$infile"
for letter in {A..Z}; do
if [[ $content =~ ${letter}:[[:digit:]]+ ]] ; then
echo "${BASH_REMATCH[0]}"
else
echo "${letter}:0"
fi
done
or shorter
for x in {A..Z}; do
[[ $content =~ ${x}:[0-9]+ ]] && echo "${BASH_REMATCH[0]}" || echo "${x}:0"
done
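Combining the short form with the tr trick from the earlier answer gives the single-line output (a sketch; it assumes the data sits on the first line of File.A):
read content < File.A
for x in {A..Z}; do
  [[ $content =~ ${x}:[0-9]+ ]] && echo "${BASH_REMATCH[0]}" || echo "${x}:0"
done | tr '\n' ' '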
I need to search through a directory which contains many subdirectories, each of which contains files. The files are named like question1234_01, where 1234 is a random sequence of digits and the suffix _01 is the number of messages that share the prefix, meaning they are part of the same continuing thread.
find . -name 'quest*' | cut -d_ -f1 | awk '{print $1}' | uniq -c | sort -n
example output:
1 quest1234
10 quest1523
This searches for all the files, then sorts the counts in ascending order.
What I want to do is print all the files which end up having the most occurrences, in my example the one with 10 matches.
So it should only output quest1523_01 through quest1523_11.
If I understood what you mean, and you want a list of items sorted by frequency, you can pipe through something like:
| sort | uniq -c | sort -k1nr
Eg:
Input:
file1
file2
file1
file1
file3
file2
file2
file1
file4
Output:
4 file1
3 file2
1 file3
1 file4
Update
By the way, what are you using awk for?
find . -name 'quest*' | cut -d_ -f1 | sort | uniq -c | sort -k1nr | head -n10
Returns the 10 items found most often.
Update
Here is a much improved version. The only drawback is that it's not sorting by number of occurrences. However, I'm going to figure out how to fix that :)
find . -name 'question*' | sort \
| sed "s#\(.*/question\([0-9]\+\)_[0-9]\+\)#\2 \1#" \
| awk '{ cnt[$1]++; files[$1][NR] = $2 } END{for(i in files){ print i" ("cnt[i]")"; for (j in files[i]) { print " "files[i][j] } }}'
Update
After testing on ~1.4M records (it took 23''), I decided that awk was too inefficient to handle all the grouping, so I wrote it in Python:
#!/usr/bin/env python
import sys, re

file_re = re.compile(r"(?P<name>.*/question(?P<id>[0-9]+)_[0-9]+)")

def runstuff(stream):
    counts = {}
    files = {}
    for infile in stream:
        infile = infile.strip()
        m = file_re.match(infile)
        if m is None:
            continue  # skip lines that don't look like question files
        _name = m.group('name')
        _id = m.group('id')
        counts[_id] = counts.get(_id, 0) + 1
        files.setdefault(_id, []).append(_name)
    ## Calculate groups
    grouped = {}
    for k in counts:
        grouped.setdefault(counts[k], []).append(k)
    ## Print results
    for k, v in sorted(grouped.items()):
        for fg in v:
            print("%s (%s)" % (fg, counts[fg]))
            for f in sorted(files[fg]):
                print(" %s" % f)

if __name__ == '__main__':
    runstuff(sys.stdin)
This one does the whole job of splitting, grouping and sorting.
And it took just about 3'' to run on the same input file (with all the sorting included).
If you need even more speed, you could try compiling with Cython, that is usually at least 30% faster.
Update - Cython
OK, I just tried it with Cython.
Just save the above file as calculate2.pyx. In the same folder, create setup.py:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("calculate2", ["calculate2.pyx"])]
)
And a launcher script (I named it calculate2_run.py)
import calculate2
import sys

if __name__ == '__main__':
    calculate2.runstuff(sys.stdin)
Then, make sure you have Cython installed, and run:
python setup.py build_ext --inplace
That should generate, amongst other stuff, a calculate2.so file.
Now, use calculate2_run.py as you normally would (just pipe in the results from find).
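For example, the full pipeline becomes:
find . -name 'question*' | python calculate2_run.py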
I ran it, without any further optimization, on the same input file: this time, it took 1.99''.
You can do something like this:
Save your initial search result in a temporary file.
Filter out the prefix with the highest file count
Search for the prefix in that temporary file, then remove the temporary file
find . -name 'quest*' | sort -o tempf
target=$(awk -F_ '{print $1}' tempf \
         | uniq -c | sort -n | tail -1 \
         | sed 's/^ *[0-9]* //')
grep "$target" tempf
rm -f tempf
Note:
I assumed that files with the same prefix are in the same subdirectories.
The output contains the path relative to the current directory. If you want just the basename, add something like sed 's/.*\///' after the grep.
Your solution is not selecting the basename of the files, but I think you are looking for:
awk 'NF{ b=$(NF-1); v[b]=v[b] (v[b]?",":"") $NF; a = ++c[b]}
     a > max {max = a; n=b }
     END {split(v[n],d, ","); for(i in d) print n "_" d[i]}' FS='[/_]'
There's no need to sort the data; full sorting is very expensive.
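A usage sketch, piping in the same find output as the question (it assumes paths like ./subdir/quest1523_01):
find . -name 'quest*' |
awk 'NF{ b=$(NF-1); v[b]=v[b] (v[b]?",":"") $NF; a = ++c[b]}
     a > max {max = a; n=b }
     END {split(v[n],d, ","); for(i in d) print n "_" d[i]}' FS='[/_]'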