Bash regexp to find part of string - linux

I have a string like setSuperValue('sdfsdfd') and I need to get the 'sdfsdfd' value from this line. What is the way to do this?
First I find the line by searching for setSuperValue, which gives me only the string with my target content: setSuperValue('sdfsdfd'). How do I build a regexp to get sdfsdfd from this line?

This should help you
grep setSuperValue myfile.txt | grep -o "'.*'" | tr -d "'"
The grep -o will return all text that starts with a single ' and ends with another ', including both quotes. Then use tr to get rid of the quotes.
You could also use cut:
grep setSuperValue myfile.txt | cut -d"'" -f2
Or awk:
grep setSuperValue myfile.txt | awk -F "'" '{print $2}'
This will split the line at the single quotes and return the second field, which is the value you are looking for.

Generally, to locate a string in multiple lines of data, external utilities will be much faster than looping over lines in Bash.
In your specific case, a single sed command will do what you want:
sed -n -r "s/^.*setSuperValue\('([^']+)'\).*$/\1/p" file
Extended (-r) regular expression ^.*setSuperValue\('([^']+)'\).*$ matches any line containing setSuperValue('...') as a whole, captures whatever ... is in capture group \1, replaces the input line with that, and prints p the result.
Due to option -n, nothing else is printed.
Move the opening and closing ' inside (...) to include them in the captured value.
Note: If the input file contains multiple setSuperValue('...') lines, the command will print every match; either way, the command will process all lines.
To only print the 1st match and stop processing immediately after, modify the command as follows:
sed -n -r "/^.*setSuperValue\('([^']+)'\).*$/ {s//\1/;p;q}" file
/.../ only matches lines containing setSuperValue('...'), causing the following {...} to be executed only for matching lines.
s// - i.e., not specifying a regex - implicitly performs substitution based on the same regex that matched the line at hand; p prints the result, and q quits processing altogether, meaning that processing stops once the first match is found.
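As a minimal sketch of the two variants side by side, assuming a sed that accepts -E (the more portable spelling of -r) and a throwaway sample file; the function names all_values and first_value are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative only.

# Print the quoted argument of every setSuperValue('...') line.
all_values() {
    sed -n -E "s/^.*setSuperValue\('([^']+)'\).*$/\1/p" "$1"
}

# Print only the first match, then quit without reading further lines.
first_value() {
    sed -n -E "/^.*setSuperValue\('([^']+)'\).*$/ {s//\1/;p;q;}" "$1"
}

# Build a small sample file and run both variants against it.
sample=$(mktemp)
printf '%s\n' "setSuperValue('one')" "something else" "setSuperValue('two')" > "$sample"
all_values "$sample"
first_value "$sample"
rm -f "$sample"
```

The first call prints both values, while the second stops after the first matching line.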
If you have already located a line of interest through other methods and are looking for a pure Bash method of extracting a substring based on a regex, use =~, Bash's regex-matching operator, which supports extended regular expressions and capture groups through the special ${BASH_REMATCH[@]} array variable:
$ sampleLine="... setSuperValue('sdfsdfd') ..."
$ [[ $sampleLine =~ "setSuperValue('"([^\']+)"')" ]] && echo "${BASH_REMATCH[1]}"
sdfsdfd
Note the careful quoting of the parts of the regex that should be taken literally, and how ${BASH_REMATCH[1]} refers to the first (and only) captured group.
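One way to sidestep the quoting inside [[ ... ]] is to keep the regex in a variable and expand it unquoted. The sketch below assumes that approach and reads lines from standard input; print_super_values is a hypothetical name, not something from the answer above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the captured value from each matching line of stdin.
print_super_values() {
    # Storing the regex in a variable avoids quoting parts of it inside [[ ... ]].
    local re="setSuperValue\('([^']+)'\)"
    local line
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        fi
    done
}

printf '%s\n' "a = setSuperValue('sdfsdfd');" "unrelated" | print_super_values
```

Only lines that match contribute output; everything else is skipped silently.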

You can parse the value from each line with the regex matching of expr. Note that expr is an external utility, so this is not a pure Bash solution:
#!/bin/bash
while read -r line; do
value=$(expr "$line" : ".*setSuperValue('\(.*\)')")
if [ "x$value" != "x" ]; then
printf "value : %s\n" "$value"
fi
done <"$1"
Test Input
$ cat dat/supervalue.txt
setSuperValue('sdfsdfd')
something else
setSuperValue('sdfsdfd')
something else
setSuperValue('sdfsdfd')
something else
Example Output
$ bash parsevalue.sh dat/supervalue.txt
value : sdfsdfd
value : sdfsdfd
value : sdfsdfd
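For comparison, a variant that really does rely only on parameter expansion might look like the sketch below. It assumes the value never contains the two characters ') itself, and extract_value is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the argument of setSuperValue('...') from one line
# using only Bash parameter expansion (no expr, sed, or grep).
extract_value() {
    local line=$1
    local prefix="setSuperValue('"
    local suffix="')"
    # Bail out unless the line contains the prefix followed later by the suffix.
    [[ $line == *"$prefix"*"$suffix"* ]] || return 1
    local rest=${line#*"$prefix"}      # drop everything through the first prefix
    local value=${rest%%"$suffix"*}    # drop the first suffix and whatever follows
    printf '%s\n' "$value"
}

while IFS= read -r line; do
    if value=$(extract_value "$line"); then
        printf 'value : %s\n' "$value"
    fi
done <<'EOF'
setSuperValue('sdfsdfd')
something else
EOF
```

The command substitution still forks a subshell per line, so for large files the sed or grep approaches are likely to be faster.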

Related

Extracting only some keys from a JSON file in Bash

So, I have a JSON file that looks like this:
{
"data": { stuff that i need }
"stuff not needed": {more stuff not needed}
"data": {more stuff that i need}
}
In short, the stuff that I need is inside the curly braces of the "data" key. How can I print this in a Linux shell command? Note, there are several "data" objects in my file, and I would like to extract data from all of them each one at a time.
The intended output would be like this:
data {...}
data {...}
As others suggested, you should really use the jq tool for parsing JSON. However, if you don't have access to the tool and/or can't install it, below is a very simple way of treating the JSON as raw text (not recommended) that produces the output you want:
grep "\"data\":" json_file | tr -d \"
You can very simply use awk with the field-separator of "{" and the substr and length($2) - 1 to trim the closing "}".
For example with your data:
$ awk -F"{" '/^[ ]*"data"/{print substr($2, 1, length($2)-1)}' json
stuff that i need
more stuff that i need
(note: you can trim the leading space before "stuff" in the 1st line if needed)
Quick Explanation
awk -F"{" invoke awk with a field-separator of '{',
/^[ ]*"data"/ locate only lines beginning with zero-or-more spaces followed by "data",
print substr($2, 1, length($2)-1) print the substring of the 2nd field from the first character to the length-1 character removing the closing '}'.
bash Solution
With bash you can loop over each line looking for a line beginning with "data" and then use a couple of simple parameter expansions to remove the unwanted parts of the line from each end. For instance:
$ while read -r line; do
[[ $line =~ ^\ *\"data\" ]] && {
line="${line#*\{}"
line="${line%\}*}"
echo $line
}
done <json
(With your data in the json filename, you can just copy/paste into a terminal)
Example Use/Output
$ while read -r line; do
> [[ $line =~ ^\ *\"data\" ]] && {
> line="${line#*\{}"
> line="${line%\}*}"
> echo $line
> }
> done <json
stuff that i need
more stuff that i need
(note: bash default word splitting even handles the leading whitespace for you)
While you can do it with awk and bash, any serious JSON manipulation should be done with the jq utility.
With the given input, you can use
sed -rn 's/.*"(data)": (.*)/\1 \2/p' inputfile

search a line that contain a special character using sed or awk

I wonder if there is a command in Linux that can help me to find a line that begins with "*" and contains the special character "|"
for example
* Date | Auteurs
Simply use:
grep -ne '^\*.*|' "${filename}"
Or if you want to use sed:
sed -n '/^\*.*|/{=;p}' "${filename}" | sed '{N;s/\n/:/}'
Or the (GNU) awk equivalent (the pipe must be escaped with a backslash):
awk '/^\*.*\|/' "${filename}"
Where:
^ : start of the line
\*: a literal *
.*: zero or more generic char (not newline)
| : a literal pipe
NB: "${filename}": I've assumed you're using the command in a script, with the target file passed in a double-quoted variable as "${filename}". In the shell, simply use the actual name of the file (or the path to it).
UPDATE (line numbers)
Modify the above commands to also obtain the line number of the matched lines. With grep it is as simple as adding the -n switch:
grep -ne '^\*.*|' "${filename}"
We obtain an output like this:
81806:* Date | Auteurs
To obtain exactly the same output from sed and awk we have to complicate the commands a little bit:
awk '/^\*.*\|/{print NR ":" $0}' "${filename}"
# the = print the line number, p the actual match but it's on two different lines so the second sed call
sed -n '/^\*.*|/{=;p}' "${filename}" | sed '{N;s/\n/:/}'
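To check that the grep and awk forms agree, a small self-contained test like the one below can help. It is only a sketch: the sample lines are invented, and with_grep and with_awk are hypothetical wrapper names:

```shell
#!/usr/bin/env bash
# Hypothetical wrappers; both print "NUMBER:LINE" for lines that start with "*"
# and contain a pipe somewhere after it.
with_grep() { grep -n '^\*.*|' "$1"; }
with_awk()  { awk '/^\*.*\|/ {print NR ":" $0}' "$1"; }

# Invented sample data: only the second line should match.
sample=$(mktemp)
printf '%s\n' 'intro' '* Date | Auteurs' '* no pipe here' 'x | y' > "$sample"
with_grep "$sample"
with_awk "$sample"
rm -f "$sample"
```

Both commands should report the same single line, prefixed with its line number.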

Split string at special character in bash

I'm reading filenames from a textfile line by line in a bash script. However, the lines look like this:
/path/to/myfile1.txt 1
/path/to/myfile2.txt 2
/path/to/myfile3.txt 3
...
/path/to/myfile20.txt 20
So there is a second column containing an integer separated by a space. I only need the part of the string before the space.
I found only solutions using a "for-loop". But I need a function that explicitly looks for the " "-character (space) in my string and splits it at that point.
In principle I need the equivalent of MATLAB's "strsplit(str,delimiter)"
If you are already reading the file with something like
while read -r line; do
(and you should be), then pass two arguments to read instead:
while read -r filename somenumber; do
read will split the line on whitespace and assign the first field to filename and any remaining field(s) to somenumber.
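Put together as a complete loop, this could look like the following sketch. It assumes the paths themselves contain no whitespace, and list_filenames is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the first whitespace-separated field
# of each line in the file given as $1.
list_filenames() {
    local filename somenumber
    while read -r filename somenumber; do
        # somenumber receives the rest of the line and is simply ignored here.
        printf '%s\n' "$filename"
    done < "$1"
}

sample=$(mktemp)
printf '%s\n' '/path/to/myfile1.txt 1' '/path/to/myfile2.txt 2' > "$sample"
list_filenames "$sample"
rm -f "$sample"
```

If a path could contain spaces, the first field would be cut short, so a different split (for example at the last space) would be needed.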
Three (of many) solutions:
# Using awk
echo "$string" | awk '{ print $1 }'
# Using cut
echo "$string" | cut -d' ' -f1
# Using sed
echo "$string" | sed 's/\s.*$//g'
If you need to iterate through each line of the file anyway, you can cut off everything after the space with bash:
while read -r line ; do
# bash string manipulation removes the first space
# and everything which follows it
echo "${line// *}"
done < file
This should work too:
line="${line% *}"
This cuts the string at its last space (the first one counting from the right), so it works even if the path itself contains spaces, as long as the path is followed by a space and the number.
while read -r line
do
{ rev | cut -d' ' -f2- | rev >> result.txt; } <<< "$line"
done < input.txt
This solution will work even if you have spaces in your filenames.
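The ${line% *} expansion mentioned earlier covers the same case without starting rev and cut for every line. A minimal sketch, assuming every line ends in a space followed by the number (strip_last_field is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the last space and everything after it
# from each line of the file given as $1.
strip_last_field() {
    local line
    while IFS= read -r line; do
        # % removes the shortest trailing match of " *", i.e. the final field.
        printf '%s\n' "${line% *}"
    done < "$1"
}

sample=$(mktemp)
printf '%s\n' '/path/to/myfile1.txt 1' '/path/with space/file.txt 7' > "$sample"
strip_last_field "$sample"
rm -f "$sample"
```

A line that contains no space at all is printed unchanged, because the pattern then matches nothing.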

Replace spaces with underscores via BASH

Suppose I have a string, $str.
I want $str to be edited such that all the spaces in it are replaced by underscores.
Example
a="hello world"
I want the final output of
echo "$a"
to be hello_world
You could try the following:
str="${str// /_}"
$ a="hello world"
$ echo ${a// /_}
hello_world
According to bash(1):
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern
just as in pathname expansion. Parameter is expanded and the
longest match of pattern against its value is replaced with string.
If pattern begins with /, all matches of pattern are replaced
with string. Normally only the first match is replaced. If
pattern begins with #, it must match at the beginning of the
expanded value of parameter. If pattern begins with %, it must match
at the end of the expanded value of parameter. If string is null,
matches of pattern are deleted and the / following pattern may be
omitted. If parameter is @ or *, the substitution operation is
applied to each positional parameter in turn, and the expansion is the
resultant list. If parameter is an array variable subscripted
with @ or *, the substitution operation is applied to each member of
the array in turn, and the expansion is the resultant list.
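The forms described in that excerpt can be tried on a short string. This is just an illustrative sketch; the sample value and variable names are arbitrary:

```shell
#!/usr/bin/env bash
s="foo bar foo"

first=${s/foo/X}       # plain form: only the first match is replaced
all=${s//foo/X}        # pattern begins with /: every match is replaced
at_start=${s/#foo/X}   # pattern begins with #: must match at the start
at_end=${s/%foo/X}     # pattern begins with %: must match at the end
deleted=${s// /}       # empty replacement string: matches are deleted

printf '%s\n' "$first" "$all" "$at_start" "$at_end" "$deleted"
```

With this input the five lines printed are X bar foo, X bar X, X bar foo, foo bar X and foobarfoo.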
Pure bash:
a="hello world"
echo "${a// /_}"
OR tr:
tr -s ' ' '_' <<< "$a"
With sed reading directly from a variable:
$ sed 's/ /_/g' <<< "$a"
hello_world
And to store the result you have to use the var=$(command) syntax:
a=$(sed 's/ /_/g' <<< "$a")
For completeness, with awk it can be done like this:
$ a="hello my name is"
$ awk 'BEGIN{OFS="_"} {for (i=1; i<NF; i++) printf "%s%s",$i,OFS; printf "%s\n", $NF}' <<< "$a"
hello_my_name_is
Multiple spaces to one underscore
This can easily be achieved with a GNU shell parameter expansion. In particular:
${parameter/pattern/string}
If pattern begins with /, all matches of pattern are replaced with string.
with +(pattern-list)
Matches one or more occurrences of the given patterns.
Hence:
$ a='hello world    example'
$ echo ${a// /_}
hello_world____example
$ echo ${a//+( )/_}
hello_world_example
However, for this to work in a bash script two amendments need to be made:
The parameter expansion requires encapsulation in double quotes " " to prevent word splitting on the internal field separator $IFS.
The extglob shell option needs to be enabled using the shopt builtin, for extended pattern matching operators to be recognised.
The bash script finally looks like this:
#!/usr/bin/env bash
shopt -s extglob
a='hello world    example'
echo "${a//+( )/_}"

How to extract last part of string in bash?

I have this variable:
A="Some variable has value abc.123"
I need to extract this value i.e abc.123. Is this possible in bash?
Simplest is
echo "$A" | awk '{print $NF}'
Edit: explanation of how this works...
awk breaks the input into different fields, using whitespace as the separator by default. Hardcoding 5 in place of NF prints out the 5th field in the input:
echo "$A" | awk '{print $5}'
NF is a built-in awk variable that gives the total number of fields in the current record. The following returns the number 5 because there are 5 fields in the string "Some variable has value abc.123":
echo "$A" | awk '{print NF}'
Combining $ with NF outputs the last field in the string, no matter how many fields your string contains.
Yes; this:
A="Some variable has value abc.123"
echo "${A##* }"
will print this:
abc.123
(The ${parameter##word} notation is explained in §3.5.3 "Shell Parameter Expansion" of the Bash Reference Manual.)
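Some examples using parameter expansion
A="Some variable has value abc.123"
Remove the longest leading match of "* " (everything up to and including the last space):
echo "${A##* }"
abc.123
Remove the shortest trailing match of " *" (the last space and the word after it):
echo "${A% *}"
Some variable has value
Remove the shortest trailing match of ".*" (the last dot and what follows):
echo "${A%.*}"
Some variable has value abc
Remove the longest trailing match of " *" (everything from the first space on):
echo "${A%% *}"
Some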
Read more Shell-Parameter-Expansion
The documentation is a bit painful to read, so I've summarised it in a simpler way.
Note that the '*' needs to swap places with the ' ' depending on whether you use # or %. (The * is just a wildcard, so you may need to take off your "regex hat" while reading.)
${A% *} - remove shortest trailing * (strip the last word)
${A%% *} - remove longest trailing * (strip the last words)
${A#* } - remove shortest leading * (strip the first word)
${A##* } - remove longest leading * (strip the first words)
Of course a "word" here may contain any character that isn't a literal space.
You might commonly use this syntax to trim filenames:
${A##*/} removes all containing folders, if any, from the start of the path, e.g.
/usr/bin/git -> git
/usr/bin/ -> (empty string)
${A%/*} removes the last file/folder/trailing slash, if any, from the end:
/usr/bin/git -> /usr/bin
/usr/bin/ -> /usr/bin
${A%.*} removes the last extension, if any (just be wary of things like my.path/noext):
archive.tar.gz -> archive.tar
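The same trimming patterns, collected into one runnable sketch. The sample values come from the examples above; the variable names, and the two extra expansions for the first-dot stem and the extension, are additions for illustration:

```shell
#!/usr/bin/env bash
path="/usr/bin/git"
file="archive.tar.gz"

base=${path##*/}   # strip all leading directories    -> git
dir=${path%/*}     # strip the last path component    -> /usr/bin
stem=${file%.*}    # strip the last extension         -> archive.tar
root=${file%%.*}   # strip everything from first dot  -> archive
ext=${file##*.}    # keep only the last extension     -> gz

printf '%s\n' "$base" "$dir" "$stem" "$root" "$ext"
```

These expansions only manipulate the string; they never check whether the path actually exists.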
How do you know where the value begins? If it's always the 5th and 6th words, you could use e.g.:
B=$(echo "$A" | cut -d ' ' -f 5-)
This uses the cut command to slice out part of the line, using a simple space as the word delimiter.
As pointed out by Zedfoxus here. A very clean method that works on all Unix-based systems. Besides, you don't need to know the exact position of the substring.
A="Some variable has value abc.123"
echo "$A" | rev | cut -d ' ' -f 1 | rev
# abc.123
More ways to do this:
(Run each of these commands in your terminal to test this live.)
For all answers below, start by typing this in your terminal:
A="Some variable has value abc.123"
The array example (#3 below) is a really useful pattern, and depending on what you are trying to do, sometimes the best.
1. with awk, as the main answer shows
echo "$A" | awk '{print $NF}'
2. with grep:
echo "$A" | grep -o '[^ ]*$'
the -o says to only retain the matching portion of the string
the [^ ] part says "don't match spaces"; ie: "not the space char"
the * means "match 0 or more instances of the preceding match pattern (which is [^ ])", and the $ means "match the end of the line." So, this matches the last word after the last space through to the end of the line; ie: abc.123 in this case.
3. via regular bash "indexed" arrays and array indexing
Convert A to an array, with elements separated by the default IFS (Internal Field Separator) characters, which are space, tab, and newline:
Option 1 (will "break in mysterious ways", as @tripleee put it in a comment here, if the string stored in the A variable contains certain special shell characters, so Option 2 below is recommended instead!):
# Capture space-separated words as separate elements in array A_array
A_array=($A)
Option 2 [RECOMMENDED!]. Use the read command, as I explain in my answer here, and as is recommended by the bash shellcheck static code analyzer tool for shell scripts, in ShellCheck rule SC2206, here.
# Capture space-separated words as separate elements in array A_array, using
# a "herestring".
# See my answer here: https://stackoverflow.com/a/71575442/4561887
IFS=" " read -r -a A_array <<< "$A"
Then, print only the last element in the array:
# Print only the last element via bash array right-hand-side indexing syntax
echo "${A_array[-1]}" # last element only
Output:
abc.123
Going further:
What also makes this pattern so useful is that it lets you easily do the opposite: obtain all words except the last one, like this:
array_len="${#A_array[@]}"
array_len_minus_one=$((array_len - 1))
echo "${A_array[@]:0:$array_len_minus_one}"
Output:
Some variable has value
For more on the ${array[@]:start:length} array slicing syntax above, see my answer here: Unix & Linux: Bash: slice of positional parameters, and for more information on the bash "Arithmetic Expansion" syntax, see here:
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Arithmetic-Expansion
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Shell-Arithmetic
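Wrapped into functions, the array pattern might look like the sketch below. The names last_word and all_but_last are hypothetical, and the last element is indexed by length rather than with [-1] so that older Bash versions are also covered:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last space-separated word of $1.
last_word() {
    local -a words
    IFS=' ' read -r -a words <<< "$1"
    (( ${#words[@]} )) || return 1          # nothing to print for empty input
    printf '%s\n' "${words[${#words[@]}-1]}"
}

# Hypothetical helper: print every word of $1 except the last one.
all_but_last() {
    local -a words
    IFS=' ' read -r -a words <<< "$1"
    (( ${#words[@]} > 1 )) || return 0      # zero or one word: nothing remains
    # [*] inside double quotes joins the slice with single spaces.
    printf '%s\n' "${words[*]:0:${#words[@]}-1}"
}

A="Some variable has value abc.123"
last_word "$A"
all_but_last "$A"
```

Because the words are rejoined with single spaces, runs of several spaces in the input are collapsed in the output of all_but_last.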
You can use a Bash regex:
A="Some variable has value abc.123"
[[ $A =~ [[:blank:]]([^[:blank:]]+)$ ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
Prints:
abc.123
That works with any [:blank:] delimiter in the current locale (usually [ \t]). If you want to be more specific:
A="Some variable has value abc.123"
pat='[ ]([^ ]+)$'
[[ $A =~ $pat ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
echo "Some variable has value abc.123"| perl -nE'say $1 if /(\S+)$/'
