Extract a value from a file

Extract a value from a file - linux

I have a file with many lines, one is:
COMPOSER_HOME=/home/glen/.composer
I want to extract the string /home/glen/.composer from this file in my shell script. How can I?
I can get the whole line with grep but not sure how to remove the first part.

Here:
grep 'COMPOSER_HOME=' file| cut -d= -f2
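cut splits each line on the delimiter = (-d=), and -f2 selects the second field, which is whatever comes after the =, e.g. /home/glen/.composer. With -f1 you would get COMPOSER_HOME. If the value itself may contain an =, use -f2- to keep everything after the first one.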
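To use the value inside a shell script, the pipeline can be wrapped in a command substitution. The following is a minimal sketch, not the answerer's exact method: the temporary file, its sample contents and the variable names are assumptions made for illustration. It also anchors the key with ^ so that a line such as OLD_COMPOSER_HOME=... would not match.

```shell
# Hypothetical sample file; the real file name and contents are assumptions.
conf=$(mktemp)
printf '%s\n' 'PATH=/usr/bin' 'COMPOSER_HOME=/home/glen/.composer' 'OPTS=a=b' > "$conf"

# ^ anchors the key at the start of the line;
# -f2- keeps everything after the first '=' even if the value contains '='.
composer_home=$(grep '^COMPOSER_HOME=' "$conf" | cut -d= -f2-)
opts=$(grep '^OPTS=' "$conf" | cut -d= -f2-)

echo "$composer_home"
echo "$opts"
rm -f "$conf"
```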

Since you tagged linux, you most likely have GNU grep, which supports Perl-compatible regular expressions (PCRE) through the -P option:
grep -oP 'COMPOSER_HOME=\K.+' file
The \K means: match what comes before, then drop it from the reported match, so that -o prints only the rest of the line.
You can also use awk:
awk -F "=" '$1 == "COMPOSER_HOME" {print $2}' file

Maybe this is enough:
sed -nE 's/^COMPOSER_HOME=(.*)/\1/p' your_file
It does not print any line unless you explicitly request it (-n). It matches the line starting with COMPOSER_HOME= (the ^ anchors the match at the start of the line), captures what follows with (.*) (using () instead of \(\), thanks to -E), and puts only the captured text in the replacement. The p flag of the substitution command then requests the printing of the line.
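The effect of anchoring the pattern can be checked with a small experiment. This is only a sketch: the sample file and the OLD_COMPOSER_HOME line are made up for illustration.

```shell
# Hypothetical sample file with a second key that merely ends in COMPOSER_HOME.
conf=$(mktemp)
printf '%s\n' 'FOO=bar' 'COMPOSER_HOME=/home/glen/.composer' 'OLD_COMPOSER_HOME=/tmp/x' > "$conf"

# Anchored: only the line that starts with the key is rewritten and printed.
anchored=$(sed -nE 's/^COMPOSER_HOME=(.*)/\1/p' "$conf")

# Unanchored: the OLD_ line matches too, and the text before the key survives.
unanchored=$(sed -nE 's/COMPOSER_HOME=(.*)/\1/p' "$conf")

echo "$anchored"
echo "$unanchored"
rm -f "$conf"
```

With the unanchored pattern, the third line is printed as OLD_/tmp/x, which is rarely what a script wants.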

Related

How to strip stdout before logging into file? [duplicate]

How to find the last field using 'cut'

Without using sed or awk, only cut, how do I get the last field when the number of fields are unknown or change with every line?

You could try something like this:
echo 'maps.google.com' | rev | cut -d'.' -f 1 | rev
Explanation
rev reverses "maps.google.com" to be moc.elgoog.spam
cut uses dot (i.e. '.') as the delimiter and chooses the first field, which is moc
lastly, we reverse it again to get com

Use a parameter expansion. This is much more efficient than any kind of external command, cut (or grep) included.
data=foo,bar,baz,qux
last=${data##*,}
See BashFAQ #100 for an introduction to native string manipulation in bash.
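As a quick sketch of how the related expansions behave, the four prefix and suffix operators can be compared side by side. The variable names other than data and last are chosen here for illustration only; all four forms are standard POSIX shell.

```shell
data=foo,bar,baz,qux

last=${data##*,}    # remove the longest prefix matching '*,'   -> qux
rest=${data#*,}     # remove the shortest prefix matching '*,'  -> bar,baz,qux
first=${data%%,*}   # remove the longest suffix matching ',*'   -> foo
init=${data%,*}     # remove the shortest suffix matching ',*'  -> foo,bar,baz

printf '%s\n' "$last" "$rest" "$first" "$init"
```

No external process is started, which is why this approach is cheaper than cut or grep inside a loop.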

It is not possible using just cut. Here is a way using grep:
grep -o '[^,]*$'
Replace the comma with whatever delimiter your data uses.
Explanation:
-o (--only-matching) only outputs the part of the input that matches the pattern (the default is to print the entire line if it contains a match).
[^,] is a character class that matches any character other than a comma.
* matches the preceding pattern zero or more times, so [^,]* matches zero or more non-comma characters.
$ matches the end of the string.
Putting this together, the pattern matches zero or more non-comma characters at the end of the string.
When there are multiple possible matches, grep prefers the one that starts earliest. So the entire last field will be matched.
Full example:
If we have a file called data.csv containing
one,two,three
foo,bar
then grep -o '[^,]*$' < data.csv will output
three
bar

Without awk ?...
But it's so simple with awk:
echo 'maps.google.com' | awk -F. '{print $NF}'
AWK is a way more powerful tool to have in your pocket.
-F sets the field separator
NF is the number of fields, so $NF refers to the last field
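A short sketch of how $NF behaves when the number of fields differs from line to line. The variable names are illustrative, and the comma-separated sample reuses the data.csv lines shown earlier.

```shell
# Last field of a single string.
last=$(echo 'maps.google.com' | awk -F. '{print $NF}')

# $(NF-1) is the field just before the last one.
second_last=$(echo 'maps.google.com' | awk -F. '{print $(NF-1)}')

# NF is recomputed for every line, so the field count may vary.
per_line=$(printf '%s\n' 'one,two,three' 'foo,bar' | awk -F, '{print $NF}')

printf '%s\n' "$last" "$second_last" "$per_line"
```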

There are multiple ways. You may use this too.
echo "Your string here"| tr ' ' '\n' | tail -n1
> here
Obviously, the blank space input for tr command should be replaced with the delimiter you need.

This is the only solution possible for using nothing but cut:
echo "s.t.r.i.n.g." | cut -d'.' -f2-
[repeat_following_part_forever_or_until_out_of_memory:] | cut -d'.' -f2-
Using this solution, the number of fields can indeed be unknown and vary from line to line. However, since a line must not exceed LINE_MAX characters (including the newline character), the number of fields is always bounded, so a truly arbitrary number of fields can never be a real condition for this solution.
Yes, a very silly solution, but the only one that meets the criteria, I think.

If your input string doesn't contain forward slashes then you can use basename and a subshell:
$ basename "$(echo 'maps.google.com' | tr '.' '/')"
This doesn't use sed or awk, but it doesn't use cut either, so I'm not quite sure it qualifies as an answer to the question as it's worded.
This doesn't work well if the input strings can contain forward slashes. A workaround for that situation is to replace the forward slash with some other character that you know isn't part of a valid input string. For example, if the pipe (|) character never occurs in your input, this would work:
$ basename "$(echo 'maps.google.com/some/url/things' | tr '/' '|' | tr '.' '/')" | tr '|' '/'

The following implements a friend's suggestion:
#!/bin/bash
# rcut: strip the leading field repeatedly until no delimiter is left
rcut(){
    nu="$( printf '%s\n' "$1" | cut -d"$DELIM" -f 2- )"
    if [ "$nu" != "$1" ]
    then
        rcut "$nu"
    else
        echo "$nu"
    fi
}
$ export DELIM=.
$ rcut a.b.c.d
d

An alternative using perl would be:
perl -pe 's/(.*)\t(.*)$/$2/' file
where you may change \t to whichever delimiter the file uses.

It is better to use awk when working with tabular data. You don't have to stick to a single command; if the job can be done with awk, why not use it? I suggest you do not waste your precious time and instead use the handful of commands that get the job done.
Example:
# $NF refers to the last column in awk; ll is a common alias for ls -l
ll | awk '{print $NF}'

If you have a file named filelist.txt that is a list of paths such as the following:
c:/dir1/dir2/file1.h
c:/dir1/dir2/dir3/file2.h
then you can do this:
rev filelist.txt | cut -d"/" -f1 | rev

Adding an approach to this old question just for the fun of it:
$ cat input.file # file containing input that needs to be processed
a;b;c;d;e
1;2;3;4;5
no delimiter here
124;adsf;15454
foo;bar;is;null;info
$ cat tmp.sh # showing off the script to do the job
#!/bin/bash
delim=';'
while read -r line; do
    while [[ "$line" =~ "$delim" ]]; do
        line=$(cut -d"$delim" -f 2- <<<"$line")
    done
    echo "$line"
done < input.file
$ ./tmp.sh # output of above script/processed input file
e
5
no delimiter here
15454
info
Besides bash, only cut is used.
Well, and echo, I guess.

choose -1
choose (a third-party command-line tool, not part of coreutils) supports negative indexing (the syntax is similar to Python's slices), so -1 selects the last field.

I realized that if we just ensure a trailing delimiter exists, it works. In my case the delimiters are a comma and whitespace, so I add a space at the end:
$ ans="a, b"
$ ans+=" "; echo "${ans}" | tr ',' ' ' | tr -s ' ' | cut -d' ' -f2
b

Capturing string between 2 specific letters/words using shell scripting

I am trying to capture the string between 2 specific letters/words using sed/awk. This is what I am trying to do:
The input is a file test.log containing
Owner: CN=abc.samplecerrt.com,o=IN,DC=com
Owner: CN=abc1.samplecerrt.com,o=IN,DC=com
I want to extract only "CN=abc.samplecerrt.com"
I tried
sed 's/.*CN=\(.*\),.*/\1/p' test.log >> result.log
But this returns "abc.samplecerrt.com,o=IN,DC=com"
How do I go about this?

test file:
$ cat logs.txt
CN=abc.samplecerrt.com,o=IN,DC=com Owner: CN=abc1.samplecerrt.com,o=IN,DC=com
command and output:
$ grep -oP 'CN=(?:(?!CN=).)*?\.com' logs.txt
CN=abc.samplecerrt.com
CN=abc1.samplecerrt.com

This might work for you (GNU sed):
sed -n 's/.*\(CN=[^,]*\).*/\1/p' file
Or:
sed 's/.*\(CN=[^,]*\).*/\1/p;d' file
The first uses -n to turn off implicit printing, so that sed acts like grep.
It matches and captures the string CN= followed by zero or more non-comma characters, and prints the captured group \1 if a match is made.
The second solution is much the same, except that the d command deletes every line, so only the captured group printed by the p flag is output.
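The difference between the greedy \(.*\) and the bounded [^,]* can be seen by running both against the two sample lines from the question. This is a sketch only; the temporary file and the variable names are assumptions.

```shell
# Recreate the sample input from the question in a temporary file.
log=$(mktemp)
printf '%s\n' 'Owner: CN=abc.samplecerrt.com,o=IN,DC=com' \
              'Owner: CN=abc1.samplecerrt.com,o=IN,DC=com' > "$log"

# Greedy capture: \(.*\) runs on to the last comma of the line.
greedy=$(sed -n 's/.*CN=\(.*\),.*/\1/p' "$log" | head -n 1)

# Bounded capture: [^,]* stops at the first comma after CN=.
all_cn=$(sed -n 's/.*\(CN=[^,]*\).*/\1/p' "$log")

# Keep only the first entry, as the question asks for a single value.
first_cn=$(printf '%s\n' "$all_cn" | head -n 1)

echo "$greedy"
echo "$all_cn"
rm -f "$log"
```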

With awk you can pick out the field that holds the string you need. To do that, set the field separator to either a colon followed by a space, or a comma (FS=": |,"); including the space in the separator keeps it out of the field. Now if you run
awk -v FS=": |," '{print $2}' file
CN=abc.samplecerrt.com
CN=abc1.samplecerrt.com
you get the field. But you only want one, so
awk -v FS=": |," '$2 !~ /abc1/ {print $2}' file
CN=abc.samplecerrt.com
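A variation on the awk approach, sketched under the assumption that the wanted entry is known exactly: comparing the field with == avoids depending on what the other lines happen to contain. The separator ': |,' is used so that the space after the colon is consumed by the separator; the temporary file and the variable names are illustrative.

```shell
log=$(mktemp)
printf '%s\n' 'Owner: CN=abc.samplecerrt.com,o=IN,DC=com' \
              'Owner: CN=abc1.samplecerrt.com,o=IN,DC=com' > "$log"

# Split on ": " or ","; field 2 is then the CN=... part without a leading space.
all_fields=$(awk -F': |,' '{print $2}' "$log")

# Select one entry by exact comparison instead of a negated regex.
one_field=$(awk -F': |,' '$2 == "CN=abc.samplecerrt.com" {print $2}' "$log")

echo "$all_fields"
echo "$one_field"
rm -f "$log"
```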

Number lines and hide the empty ones

I am trying to number the lines of a txt file and hide the empty ones. I use this code:
cat -n file.txt | grep . file.txt
But it doesn't work: it ignores the cat command. I want to display all the non-empty lines and number them (the txt file is not a static one; it is like a list that a user can type in).
Edit: given the great solutions below, I would also add that grep . file.txt | cat -n also worked.

I assume you want to number the lines that remain after the empty lines are removed.
Solution #1
Use sed '/^$/d' to delete the empty lines then pipe its output to cat -n to number them:
sed '/^$/d' file.txt | cat -n
The sed program contains only one command: d (delete the line). The sed commands can be prefixed by zero, one or two addresses that tell what lines the command applies to.
In this case there is only one address /^$/. It is a regex (enclosed in /) that selects the empty lines; the lines where start of the line (^) is followed by the end of the line ($).
Solution #2
You can also use grep -v '^$' to filter out the empty lines:
grep -v '^$' file.txt | cat -n
Again, ^$ is a regular expression that matches the empty lines. -v reverses the condition and tells grep to display the lines that do not match the regex.
The commands above do not modify the file. They read the content of file.txt, process it and display the result on screen.
Update
As @robc suggests in a comment, nl is even better than cat -n to number the lines. Thank you @robc, I didn't know about nl until now (I didn't know about cat -n either). It is never too late to learn new things.
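The three pipelines mentioned in this answer can be compared on a small sample. This is a sketch: the sample file and the normalize helper are assumptions added here, and normalize only strips the padding that cat -n and nl put around the number so the results can be compared as plain text.

```shell
# Hypothetical sample file with single and consecutive empty lines.
f=$(mktemp)
printf '%s\n' 'apple' '' 'banana' '' '' 'cherry' > "$f"

# Turn the padded "number<TAB>text" output into "number:text".
normalize() { awk '{print $1 ":" $2}'; }

with_sed=$(sed '/^$/d' "$f" | cat -n | normalize)
with_grep=$(grep -v '^$' "$f" | cat -n | normalize)
with_nl=$(grep . "$f" | nl | normalize)

echo "$with_sed"
rm -f "$f"
```

All three produce the same consecutive numbering, because the empty lines are removed before the numbering step.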

This can easily be done with awk. The following prints each non-empty line prefixed with its line number in the file (FNR), so the numbers skip over the empty lines instead of counting only the printed ones.
awk 'NF{print FNR,$0}' file.txt
Explanation: a detailed explanation of the code above.
awk ' ## Start of the awk program.
NF{ ## Run the block only if NF (the number of fields) is non-zero, i.e. the line is not empty or blank.
print FNR,$0 ## Print the current line number (FNR) followed by the whole line ($0).
}
' file.txt ## The input file passed to awk.
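Whether the numbers should reflect the original position in the file or count only the printed lines depends on the use case. A sketch of both variants follows; the sample file and the counter variable n are assumptions made for illustration.

```shell
f=$(mktemp)
printf '%s\n' 'apple' '' 'banana' '' '' 'cherry' > "$f"

# FNR is the line number in the input file, so the numbers have gaps.
original_numbers=$(awk 'NF{print FNR,$0}' "$f")

# A separate counter is incremented only for printed lines, giving 1, 2, 3.
consecutive_numbers=$(awk 'NF{print ++n,$0}' "$f")

echo "$original_numbers"
echo "$consecutive_numbers"
rm -f "$f"
```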

Trying to use grep to find something, then output a different part of the line

Say for instance I'm searching a line that is like this:
Color asdf
and I use grep to find that line, like grep asdf file.txt
How would I then display Color? Learning linux is hard.

With the command-line tool sed you can replace strings by using regular expressions:
echo "Color asdf" | sed 's/\([^ ]*\).*/\1/'
This part: \([^ ]*\).* is a regular expression. The first part of the regex, [^ ]*, matches any character except a space as many times as possible, and whatever is between \( and \) is captured in the back-reference \1. The remaining part of the string is matched by .*, and the whole match is replaced with only the first word by using \1 in the replacement part of the sed command.
Here is some more info about sed:
http://linux.about.com/od/commands/a/Example-Uses-Of-Sed-Cmdsedxa.htm

You could use sed:
sed -n 's/[[:space:]][[:space:]]*asdf$//p' file.txt
Details:
The -n option tells sed not to print the pattern space automatically. Basically, it doesn't output anything unless you tell it to.
The s command of sed replaces text. Here, if a line ends with asdf preceded by at least one whitespace character, we replace all of that with nothing and then print the line (notice the p flag at the end of the s command). The printing is only done if something was actually replaced. More information about the s command can be found, e.g., in the GNU sed manual.
Edit for clarity: When using single quotes, parameter expansion does not work and thus, variables won't be replaced. To use variables, use double quotes:
search=asdf
sed -n "s/[[:space:]][[:space:]]*${search}\$//p" file.txt

If you'd really like to use grep here, you could pipe the output from grep into cut:
grep -h asdf *.txt | cut -s -d ' ' -f 1
Note that the space after -d has to be quoted: it tells cut to use a blank as the field delimiter (I'm assuming your fields are blank-delimited rather than tab-delimited). An unquoted space would only separate -d from the following option (-f).
But, yeah, sed or awk are probably your friends here... :-)
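For comparison, here is a sketch of the awk route next to the grep and cut pipeline. The sample file and its extra lines are made up for illustration; the awk version tests the second field exactly rather than searching the whole line.

```shell
# Hypothetical sample file in the "name value" layout from the question.
f=$(mktemp)
printf '%s\n' 'Color asdf' 'Shape qwer' 'Size asdf' > "$f"

# awk: when the second field equals asdf, print the first field.
by_awk=$(awk '$2 == "asdf" {print $1}' "$f")

# grep + cut: find the lines, then keep the first blank-delimited field.
by_cut=$(grep asdf "$f" | cut -s -d ' ' -f 1)

echo "$by_awk"
rm -f "$f"
```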

You can highlight the pattern in the line using grep:
grep --colour -o 'asdf' file.txt
Edit: the -o option prints only the matched text, not the whole line.

