So, I have a JSON file that looks like this:
{
"data": { stuff that i need }
"stuff not needed": {more stuff not needed}
"data": {more stuff that i need}
}
In short, the stuff that I need is inside the curly braces of the "data" key. How can I print this with a Linux shell command? Note, there are several "data" objects in my file, and I would like to extract the data from all of them, one at a time.
The intended output
would be like this
data {...}
data {...}
As others suggested, you should really use the jq tool for parsing JSON. However, if you don't have access to the tool and/or can't install it, below is a very simple way of treating the JSON as raw text (not recommended) that produces the output you want:
grep "\"data\":" json_file | tr -d \"
You can simply use awk with a field separator of "{", then use substr and length($2) - 1 to trim the closing "}".
For example with your data:
$ awk -F"{" '/^[ ]*"data"/{print substr($2, 1, length($2)-1)}' json
stuff that i need
more stuff that i need
(note: you can trim the leading space before "stuff" in the 1st line if needed)
Quick Explanation
awk -F"{" invokes awk with a field separator of '{',
/^[ ]*"data"/ matches only lines beginning with zero or more spaces followed by "data",
print substr($2, 1, length($2)-1) prints the substring of the 2nd field from the first character to the next-to-last character, removing the closing '}'.
bash Solution
With bash you can loop over each line looking for a line beginning with "data" and then use a couple of simple parameter expansions to remove the unwanted parts of the line from each end. For instance:
$ while read -r line; do
      [[ $line =~ ^\ *\"data\" ]] && {
          line="${line#*\{}"
          line="${line%\}*}"
          echo $line
      }
  done <json
(With your data in the json filename, you can just copy/paste into a terminal)
Example Use/Output
$ while read -r line; do
> [[ $line =~ ^\ *\"data\" ]] && {
> line="${line#*\{}"
> line="${line%\}*}"
> echo $line
> }
> done <json
stuff that i need
more stuff that i need
(note: bash default word splitting even handles the leading whitespace for you)
While you can do it with awk and bash, any serious JSON manipulation should be done with the jq utility.
With the given input, you can use
sed -rn 's/.*"(data)": (.*)/\1 \2/p' inputfile
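As a quick check, running this sed command against a hypothetical inputfile recreated from the question's sample produces exactly the intended data {...} lines (note that -r is the GNU sed flag for extended regular expressions; BSD sed uses -E):

```shell
# Recreate the question's sample as a hypothetical test file
cat > inputfile <<'EOF'
{
"data": { stuff that i need }
"stuff not needed": {more stuff not needed}
"data": {more stuff that i need}
}
EOF

# Capture the key and its value, print them as "key value"
sed -rn 's/.*"(data)": (.*)/\1 \2/p' inputfile
```

This prints `data { stuff that i need }` followed by `data {more stuff that i need}`.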
Related
From command line, how to change to uppercase each first word of a line in a text file?
Example input:
hello world
tell me who you are!
Example output:
HELLO world
TELL me who you are!
There are no empty lines, it's ASCII, and each line starts with an alphabetic word followed by a tab.
Tools to use: anything that works on command line on macOS (bash 3.2, BSD sed, awk, tr, perl 5, python 2.7, swift 4, etc.).
You can always just use bash case conversion and a while loop to accomplish what you intend, e.g.
$ while read -r a b; do echo "${a^^} $b"; done < file
HELLO world
HOW are you?
The parameter expansion ${var^^} converts all chars in var to uppercase, ${var^} converts the first letter.
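A minimal sketch of the two expansions (bash 4+ only; the sample variable is hypothetical):

```shell
#!/usr/bin/env bash
var="hello world"
echo "${var^^}"   # HELLO WORLD (all characters uppercased)
echo "${var^}"    # Hello world (first character only)
```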
Bash 3.2 - 'tr'
For earlier bash, you can use the same setup with tr with a herestring to handle the case conversion:
$ while read -r a b; do echo "$(tr '[a-z]' '[A-Z]' <<<"$a") $b"; done < file
HELLO world
HOW are you?
Preserving \t Characters
To preserve the tab-separated words, you have to prevent word-splitting during the read. Unfortunately, the -d option to read doesn't allow termination on a set of characters. A way around checking for both space- or tab-delimited words is to read the entire line, disabling word-splitting with IFS=, and then scan forward through the line until the first literal $' ' or $'\t' is found (the $'...' literals are bash-only, not POSIX shell). A simple implementation would be:
while IFS= read -r line; do
    word=
    ct=0
    for ((i = 0; i < ${#line}; i++)); do
        ct=$i
        ## check against literal 'space' or 'tab'
        [ "${line:$i:1}" = $' ' -o "${line:$i:1}" = $'\t' ] && break
        word="${word}${line:$i:1}"
    done
    word="$(tr '[a-z]' '[A-Z]' <<<"$word")"
    echo "${word}${line:$((ct))}"
done <file
Output of tab Separated Words
HELLO world
HOW are you?
Use awk one-liner:
awk -F$'\t' -v OFS=$'\t' '{ $1 = toupper($1) }1' file
Using GNU sed:
sed 's/^\S*/\U&/g' file
where \S matches a non-whitespace character and \U& uppercases the matched pattern
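A quick sanity check against the question's tab-separated sample (GNU sed only):

```shell
# \S matches non-whitespace; \U& uppercases the whole match (GNU sed extensions)
printf 'hello\tworld\ntell\tme who you are!\n' | sed 's/^\S*/\U&/g'
```

This prints each first word uppercased, with the tab separators preserved.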
UPDATE: BSD sed does not support most of those special characters, so while it is still doable, it requires a much longer expression:
sed -f script file
where the script contains
{
h
s/ .*//
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/\(.*\)\n[^ ]* \(.*\)/\1 \2/
}
I have a string like setSuperValue('sdfsdfd') and I need to get the 'sdfsdfd' value from this line. What is the way to do this?
First I find the line by setSuperValue, and then get only the string with my target content: setSuperValue('sdfsdfd'). How do I build a regexp to get sdfsdfd from this line?
This should help you:
grep setSuperValue myfile.txt | grep -o "'.*'" | tr -d "'"
The grep -o will return all text that starts with a single ' and ends with another ', including both quotes. Then use tr to get rid of the quotes.
You could also use cut:
grep setSuperValue myfile.txt | cut -d"'" -f2
Or awk:
grep setSuperValue myfile.txt | awk -F "'" '{print $2}'
This will split the line where the single quotes are and return the second value, that is what you are looking for.
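For instance, a quick one-line check with the question's sample string:

```shell
# Split on single quotes; field 2 is the text between the first pair of quotes
echo "setSuperValue('sdfsdfd')" | cut -d"'" -f2
```

This prints sdfsdfd.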
Generally, to locate a string in multiple lines of data, external utilities will be much faster than looping over lines in Bash.
In your specific case, a single sed command will do what you want:
sed -n -r "s/^.*setSuperValue\('([^']+)'\).*$/\1/p" file
Extended (-r) regular expression ^.*setSuperValue\('([^']+)'\).*$ matches any line containing setSuperValue('...') as a whole, captures whatever ... is in capture group \1, replaces the input line with that, and prints (p) the result.
Due to option -n, nothing else is printed.
Move the opening and closing ' inside (...) to include them in the captured value.
Note: If the input file contains multiple setSuperValue('...') lines, the command will print every match; either way, the command will process all lines.
To only print the 1st match and stop processing immediately after, modify the command as follows:
sed -n -r "/^.*setSuperValue\('([^']+)'\).*$/ {s//\1/;p;q}" file
/.../ only matches lines containing setSuperValue('...'), causing the following {...} to be executed only for matching lines.
s// - i.e., not specifying a regex - implicitly performs substitution based on the same regex that matched the line at hand; p prints the result, and q quits processing altogether, meaning that processing stops once the first match is found.
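A quick sketch with a hypothetical two-match file shows only the first value being printed (GNU sed's -r flag; BSD sed would use -E):

```shell
# Hypothetical sample with two matching lines
printf "%s\n" "setSuperValue('first')" "setSuperValue('second')" > sample.txt

# Print only the first captured value, then quit
sed -n -r "/^.*setSuperValue\('([^']+)'\).*$/ {s//\1/;p;q}" sample.txt
```

This prints only `first`.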
If you have already located a line of interest through other methods and are looking for a pure Bash method of extracting a substring based on a regex, use =~, Bash's regex-matching operator, which supports extended regular expressions and capture groups through the special ${BASH_REMATCH[@]} array variable:
$ sampleLine="... setSuperValue('sdfsdfd') ..."
$ [[ $sampleLine =~ "setSuperValue('"([^\']+)"')" ]] && echo "${BASH_REMATCH[1]}"
sdfsdfd
Note the careful quoting of the parts of the regex that should be taken literally, and how ${BASH_REMATCH[1]} refers to the first (and only) captured group.
You can also parse the value from each line using expr with a capture group:
#!/bin/bash

while read -r line; do
    value=$(expr "$line" : ".*setSuperValue('\(.*\)')")
    if [ "x$value" != "x" ]; then
        printf "value : %s\n" "$value"
    fi
done <"$1"
Test Input
$ cat dat/supervalue.txt
setSuperValue('sdfsdfd')
something else
setSuperValue('sdfsdfd')
something else
setSuperValue('sdfsdfd')
something else
Example Output
$ bash parsevalue.sh dat/supervalue.txt
value : sdfsdfd
value : sdfsdfd
value : sdfsdfd
I'm reading filenames from a text file line by line in a bash script. However, the lines look like this:
/path/to/myfile1.txt 1
/path/to/myfile2.txt 2
/path/to/myfile3.txt 3
...
/path/to/myfile20.txt 20
So there is a second column containing an integer number separated by a space. I only need the part of the string before the space.
I found only solutions using a "for-loop". But I need a function that explicitly looks for the " "-character (space) in my string and splits it at that point.
In principle, I need the equivalent of Matlab's strsplit(str,delimiter).
If you are already reading the file with something like
while read -r line; do
(and you should be), then pass two arguments to read instead:
while read -r filename somenumber; do
read will split the line on whitespace and assign the first field to filename and any remaining field(s) to somenumber.
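A minimal sketch, assuming the input lives in a hypothetical files.txt in the question's format:

```shell
# Hypothetical input in the question's "path number" format
printf '%s\n' '/path/to/myfile1.txt 1' '/path/to/myfile2.txt 2' > files.txt

# read splits each line on whitespace: first field -> filename, rest -> somenumber
while read -r filename somenumber; do
    echo "$filename"
done < files.txt
```

This prints just the two paths, one per line.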
Three (of many) solutions:
# Using awk
echo "$string" | awk '{ print $1 }'
# Using cut
echo "$string" | cut -d' ' -f1
# Using sed
echo "$string" | sed 's/\s.*$//g'
If you need to iterate through each line of the file anyway, you can cut off everything after the first space with bash:
while read -r line ; do
    # bash string manipulation removes the first space
    # and everything that follows it
    echo "${line// *}"
done < file
This should work too:
line="${line% *}"
This cuts the string at its last occurrence (from the left) of a space, so it will work even if the path contains spaces (as long as the path is followed by a space and the trailing field).
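For example, with an assumed path that itself contains a space:

```shell
#!/usr/bin/env bash
line='/path/with spaces/myfile20.txt 20'
line="${line% *}"   # strip everything after the LAST space
echo "$line"        # /path/with spaces/myfile20.txt
```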
while read -r line
do
    { rev | cut -d' ' -f2- | rev >> result.txt; } <<< "$line"
done < input.txt
This solution will work even if you have spaces in your filenames.
I have a file that have a list of integers:
12542
58696
78845
87855
...
I want to change them into:
"12542", "58696", "78845", "87855", "..."
(no comma at the end)
I believe I need to use sed, but I couldn't figure out how. I'd appreciate your help.
You could do a sed multiline trick, but the easy way is to take advantage of shell expansion:
echo $(sed '$ ! s/.*/"&",/; $ s/.*/"&"/' foo.txt)
Run echo $(cat file) to see why this works. The trick, in a nutshell, is that the result of cat is parsed into tokens and interpreted as individual arguments to echo, which prints them separated by spaces.
The sed expression reads
$ ! s/.*/"&",/
$ s/.*/"&"/
...which means: for all but the last line ($ !), replace the line with "line", (adding surrounding quotes and a trailing comma); for the last line, replace it with "line" (quotes only, no comma).
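Putting it together with a hypothetical foo.txt holding the question's integers:

```shell
# Hypothetical sample input
printf '%s\n' 12542 58696 78845 87855 > foo.txt

# Word splitting inside $(...) collapses the lines into space-separated arguments
echo $(sed '$ ! s/.*/"&",/; $ s/.*/"&"/' foo.txt)
```

This prints `"12542", "58696", "78845", "87855"` on one line.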
EDIT: In the event that the file contains not just integers as in the OP's case (i.e., when the file can contain characters the shell expands), the following works:
EDIT2: Nicer code for the general case.
sed -n 's/.*/"&"/; $! s/$/,/; 1 h; 1 ! H; $ { x; s/\n/ /g; p; }' foo.txt
Explanation: Written in a more readable fashion, the sed script is
s/.*/"&"/
$! s/$/,/
1 h
1! H
$ {
x
s/\n/ /g
p
}
What this means is:
s/.*/"&"/
Wrap every line in double quotes.
$! s/$/,/
If it isn't the last line, append a comma.
1 h
1! H
If it is the first line, overwrite the hold buffer with the result of the previous transformation(s), otherwise append it to the hold buffer.
$ {
x
s/\n/ /g
p
}
If it is the last line -- at this point the hold buffer contains the whole line wrapped in double quotes with commas where appropriate -- swap the hold buffer with the pattern space, replace newlines with spaces, and print the result.
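As a usage sketch, feeding a hypothetical foo.txt of the question's integers through the script confirms the result:

```shell
# Hypothetical sample input
printf '%s\n' 12542 58696 78845 87855 > foo.txt

# Quote each line, add commas except on the last, collect in hold space, join
sed -n 's/.*/"&"/; $! s/$/,/; 1 h; 1 ! H; $ { x; s/\n/ /g; p; }' foo.txt
```

This prints `"12542", "58696", "78845", "87855"`.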
Here is a solution:
sed 's/.*/ "&"/' input-file|tr '\n' ','|rev | cut -c 2- | rev|sed 's/^.//'
First, wrap each input line in quotes:
sed 's/.*/ "&"/' input-file
Then convert the newlines to commas:
tr '\n' ',' <your-inputfile>
The last commands, including rev, cut and sed, are used for formatting the output according to the requirement.
Where:
rev reverses the string.
cut removes the trailing comma from the output.
sed removes the first character (the leading space) to format it accordingly.
Output:
"12542", "58696", "78845", "87855"
With perl, without any pipes/forks:
perl -0ne 'print join(", ", map { "\042$_\042" } split), "\n"' file
OUTPUT:
"12542", "58696", "78845", "87855"
Here's a pure Bash (Bash≥4) possibility that reads the whole file in memory, so it won't be good for huge files:
mapfile -t ary < file
((${#ary[@]})) && printf '"%s"' "${ary[0]}"
((${#ary[@]}>1)) && printf ', "%s"' "${ary[@]:1}"
printf '\n'
For huge files, this awk seems ok (and will be rather fast):
awk '{if(NR>1) printf ", ";printf("\"%s\"",$0)} END {print ""}' file
One way, using sed:
sed ':a; N; $!ba; s/\n/", "/g; s/.*/"&"/' file
Results:
"12542", "58696", "78845", "87855", "..."
You can write the column-oriented values in a row, with no comma following the last, as follows:
cnt=0
while read -r line || test -n "$line"; do
    if [ "$cnt" = "0" ]; then
        printf "\"%s\"" "$line"
    else
        printf ", \"%s\"" "$line"
    fi
    cnt=$((cnt + 1))
done < "$1"
printf "\n"
output:
$ bash col2row.sh dat/ncol.txt
"12542", "58696", "78845", "87855"
A simplified awk solution:
awk '{ printf sep "\"%s\"", $0; sep=", " }' file
Takes advantage of uninitialized variables defaulting to an empty string in a string context (sep).
sep "\"%s\"" synthesizes the format string to use with printf by concatenating sep with \"%s\". The resulting format string is applied to $0, each input line.
Since sep is only initialized after the first input record, , is effectively only inserted between output elements.
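A quick run against a hypothetical file.txt shows the separator behavior (the trailing echo just supplies the final newline the awk command omits):

```shell
# Hypothetical sample input
printf '%s\n' 12542 58696 78845 87855 > file.txt

# sep is empty for the first record, ", " for every record after it
awk '{ printf sep "\"%s\"", $0; sep=", " }' file.txt; echo
```

This prints `"12542", "58696", "78845", "87855"`.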
How to capitalize+concatenate words of a string?
(first letter uppercase and all other other letters lowercase)
example:
input = "jAMeS bOnD"
output = "JamesBond"
String manipulation available in bash version 4:
${variable,,} to lowercase all letters
${variable^} to uppercase the first letter
use ${words[*]^} instead of ${words[@]^} to save some script lines
And other improvements from mklement0 (see his comments):
Variable names in lower case, because upper-case ones may conflict with environment variables
Give meaningful names to variables (e.g. ARRAY -> words)
Use local to avoid impacting IFS outside the function (once is enough)
Use local for all other local variables (a variable can be declared first and assigned later)
ARRAY=( $LOWERCASE ) may expand globs (filename wildcards)
temporarily disable pathname expansion using set -f or shopt -so noglob
or use read -ra words <<< "$input" instead of words=( $input )
Ultimate function:
capitalize_remove_spaces()
{
    local words IFS
    read -ra words <<< "${@,,}"
    IFS=''
    echo "${words[*]^}"
}
If you want to keep alphanumeric characters only, extend the IFS built-in variable just before the read -ra words operation:
capitalize_remove_punctuation()
{
    local words IFS=$' \t\n-\'.,;!:*?'  # handle hyphenated names and punctuation
    read -ra words <<< "${@,,}"
    IFS=''
    echo "${words[*]^}"
}
Examples:
> capitalize_remove_spaces 'jAMeS bOnD'
JamesBond
> capitalize_remove_spaces 'jAMeS bOnD *'
JamesBond*
> capitalize_remove_spaces 'Jean-luc GRAND-PIERRE'
Jean-lucGrand-pierre
> capitalize_remove_punctuation 'Jean-luc GRAND-PIERRE'
JeanLucGrandPierre
> capitalize_remove_punctuation 'Jean-luc GRAND-PIERRE *'
JeanLucGrandPierre
Here's a bash 3+ solution that utilizes tr for case conversion (the case conversion operators (,, ^, ...) were introduced in bash 4):
input="jAMeS bOnD"
read -ra words <<<"$input"    # split input into an array of words
output=""                     # initialize output variable
for word in "${words[@]}"; do # loop over all words
    # add capitalized 1st letter
    output+="$(tr '[:lower:]' '[:upper:]' <<<"${word:0:1}")"
    # add lowercase version of rest of word
    output+="$(tr '[:upper:]' '[:lower:]' <<<"${word:1}")"
done
Note:
Concatenation (removal of whitespace between words) happens implicitly by always directly appending to the output variable.
It's tempting to want to use words=( $input ) to split the input string into an array of words, but there's a gotcha: the string is subject to pathname expansion, so if a word happens to be a valid glob (e.g., *), it will be expanded (replaced with matching filenames), which is undesired; using read -ra to create the array avoids this problem (-a reads into an array, -r turns off interpretation of \ chars. in the input).
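A minimal sketch of the gotcha avoidance, reusing the sample input above with a glob character appended:

```shell
#!/usr/bin/env bash
# read -ra keeps a glob character as a literal word instead of expanding it
input="jAMeS bOnD *"
read -ra words <<< "$input"
echo "${#words[@]}"   # 3 (three whitespace-separated words)
echo "${words[2]}"    # the literal *
```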
From other posts, I came up with this working script:
str="jAMeS bOnD"
res=""
split=$(echo "$str" | sed -e 's/ /\n/g')   # Split with space as delimiter
for word in $split; do
    word=${word,,}    # Lowercase
    word=${word^}     # Uppercase first letter
    res=$res$word     # Concatenate result
done
echo "$res"
References:
Converting string to lower case in Bash shell scripting
How do I split a string on a delimiter in Bash?
Troubleshooting bash script to capitalize first letter in every word
Using awk is a little verbose, but it does the job:
s="jAMeS bOnD"
awk '{ for (i=1; i<=NF; i++)
           printf "%s", toupper(substr($i, 1, 1)) tolower(substr($i, 2))
       print "" }' <<< "$s"
JamesBond
echo -e '\n' "!!!!! PERMISSION to WRITE in /var/log/ DENIED !!!!!"
echo -e '\n'
echo "Do you want to continue?"
echo -e '\n' "Yes or No"
read -p "Please Respond_: " Response   # get input from keyboard: "yes/no"
# Capitalize 'yes/no' with: echo "$Response" | awk '{print toupper($0)}'  or  echo "$Response" | tr '[a-z]' '[A-Z]'
answer=$(echo "$Response" | awk '{print toupper($0)}')

case $answer in
    NO)
        echo -e '\n' "Quitting..."
        exit 1
        ;;
    YES)
        echo -e '\n' "Proceeding..."
        ;;
esac