How to Substring a string in bash?

How to Substring a string in bash? - linux

I want to cut following string
abcd|xyz
number of characters before and afer pipe "|" symbol can vary.
I know cut command but it requires specific number 'from' and 'to' and i dont think will work here.
Can anyone help?

The command cut has the following parameters:
-d, --delimiter=DELIM use DELIM instead of TAB for field delimiter
-f, --fields=LIST select only these fields; also print any line
that contains no delimiter character, unless
the -s option is specified
Following on from this, the below commands should do what you wish:
echo 'abcd|xyz' | cut -d'|' -f1 # Prints abcd
echo 'abcd|xyz' | cut -d'|' -f2 # Prints xyz

Related

How to strip stdout before logging into file? [duplicate]

Without using sed or awk, only cut, how do I get the last field when the number of fields are unknown or change with every line?

You could try something like this:
echo 'maps.google.com' | rev | cut -d'.' -f 1 | rev
Explanation
rev reverses "maps.google.com" to be moc.elgoog.spam
cut uses dot (ie '.') as the delimiter, and chooses the first field, which is moc
lastly, we reverse it again to get com

Use a parameter expansion. This is much more efficient than any kind of external command, cut (or grep) included.
data=foo,bar,baz,qux
last=${data##*,}
See BashFAQ #100 for an introduction to native string manipulation in bash.

It is not possible using just cut. Here is a way using grep:
grep -o '[^,]*$'
Replace the comma for other delimiters.
Explanation:
-o (--only-matching) only outputs the part of the input that matches the pattern (the default is to print the entire line if it contains a match).
[^,] is a character class that matches any character other than a comma.
* matches the preceding pattern zero or more time, so [^,]* matches zero or more non‑comma characters.
$ matches the end of the string.
Putting this together, the pattern matches zero or more non-comma characters at the end of the string.
When there are multiple possible matches, grep prefers the one that starts earliest. So the entire last field will be matched.
Full example:
If we have a file called data.csv containing
one,two,three
foo,bar
then grep -o '[^,]*$' < data.csv will output
three
bar

Without awk ?...
But it's so simple with awk:
echo 'maps.google.com' | awk -F. '{print $NF}'
AWK is a way more powerful tool to have in your pocket.
-F if for field separator
NF is the number of fields (also stands for the index of the last)

There are multiple ways. You may use this too.
echo "Your string here"| tr ' ' '\n' | tail -n1
> here
Obviously, the blank space input for tr command should be replaced with the delimiter you need.

This is the only solution possible for using nothing but cut:
echo "s.t.r.i.n.g." | cut -d'.' -f2-
[repeat_following_part_forever_or_until_out_of_memory:] | cut -d'.' -f2-
Using this solution, the number of fields can indeed be unknown and vary from time to time. However as line length must not exceed LINE_MAX characters or fields, including the new-line character, then an arbitrary number of fields can never be part as a real condition of this solution.
Yes, a very silly solution but the only one that meets the criterias I think.

If your input string doesn't contain forward slashes then you can use basename and a subshell:
$ basename "$(echo 'maps.google.com' | tr '.' '/')"
This doesn't use sed or awk but it also doesn't use cut either, so I'm not quite sure if it qualifies as an answer to the question as its worded.
This doesn't work well if processing input strings that can contain forward slashes. A workaround for that situation would be to replace forward slash with some other character that you know isn't part of a valid input string. For example, the pipe (|) character is also not allowed in filenames, so this would work:
$ basename "$(echo 'maps.google.com/some/url/things' | tr '/' '|' | tr '.' '/')" | tr '|' '/'

the following implements A friend's suggestion
#!/bin/bash
rcut(){
nu="$( echo $1 | cut -d"$DELIM" -f 2- )"
if [ "$nu" != "$1" ]
then
rcut "$nu"
else
echo "$nu"
fi
}
$ export DELIM=.
$ rcut a.b.c.d
d

An alternative using perl would be:
perl -pe 's/(.*) (.*)$/$2/' file
where you may change \t for whichever the delimiter of file is

It is better to use awk while working with tabular data. You don't have to master on command. If it can be achieved by awk, why not use that? I suggest you do not waste your precious time, and use a handful of commands to get the job done.
Example:
# $NF refers to the last column in awk
ll | awk '{print $NF}'

If you have a file named filelist.txt that is a list paths such as the following:
c:/dir1/dir2/file1.h
c:/dir1/dir2/dir3/file2.h
then you can do this:
rev filelist.txt | cut -d"/" -f1 | rev

Adding an approach to this old question just for the fun of it:
$ cat input.file # file containing input that needs to be processed
a;b;c;d;e
1;2;3;4;5
no delimiter here
124;adsf;15454
foo;bar;is;null;info
$ cat tmp.sh # showing off the script to do the job
#!/bin/bash
delim=';'
while read -r line; do
while [[ "$line" =~ "$delim" ]]; do
line=$(cut -d"$delim" -f 2- <<<"$line")
done
echo "$line"
done < input.file
$ ./tmp.sh # output of above script/processed input file
e
5
no delimiter here
15454
info
Besides bash, only cut is used.
Well, and echo, I guess.

choose -1
choose supports negative indexing (the syntax is similar to Python's slices).

I realized if we just ensure a trailing delimiter exists, it works. So in my case I have comma and whitespace delimiters. I add a space at the end;
$ ans="a, b"
$ ans+=" "; echo ${ans} | tr ',' ' ' | tr -s ' ' | cut -d' ' -f2
b

Extract all unique URL from log using sed

Can you help me with correct regexp from the sed syntaxis point of view? For now every regexp that i can write is marked by terminal as invalid.

If your log syntax is uniform, use this command
cut -f4 -d\" < logfile | sort -u
If you want to skip the query string from uniqness, use this
cut -f4 -d\" < logfile | cut -f1 -d\? | sort -u
Explanation
Filter the output with the cut command, take the 4th field (-f4) using " as separator (-d\"). The same with the second filter, using ? as separator

What do back brackets do in this bash script code?

so i'm doing a problem with bashscript, this one: ./namefreq.sh ANA should return a list of two names (on separate lines) ANA and RENEE, both of which have frequency 0.120.
Basically I have a file from table.csv shown in the code below that have names and a frequency number next to them e.g. Anna, 0.120
I'm still unsure what the `` does for this code, and I'm also struggling to understand how this code is able to print out two names with identical frequencies. The way I read the code is:
grep compares the word (-w) typed by the user (./bashscript.sh Anna) to the value of (a), which then uses the cut command to be able to compare the 2nd field of the line separated by the delimiter "," which is the frequency from the file table.csv and then | cut -f1 -d"," prints out the first fields which are the names with the same frequency
^ would this be correct?
thanks :)
#!/bin/bash
a=`grep -w $1 table.csv | cut -f2 -d','`
grep -w $a table.csv | cut -f1 -d',' | sort -d

When a command is in backticks or $(), the output of the command is subsituted back into the command in place of it. So if the file has Anna,0.120
a=`grep -w Anna table.csv | cut -f2 -d','`
will execute the grep and cut commands, which will output 0.120, so it will be equivalent to
a=0.120
Then the command looks for all the lines that match 0.120, extracts the first field with cut, and sorts them.

How to find the last field using 'cut'

Without using sed or awk, only cut, how do I get the last field when the number of fields are unknown or change with every line?

You could try something like this:
echo 'maps.google.com' | rev | cut -d'.' -f 1 | rev
Explanation
rev reverses "maps.google.com" to be moc.elgoog.spam
cut uses dot (ie '.') as the delimiter, and chooses the first field, which is moc
lastly, we reverse it again to get com

Use a parameter expansion. This is much more efficient than any kind of external command, cut (or grep) included.
data=foo,bar,baz,qux
last=${data##*,}
See BashFAQ #100 for an introduction to native string manipulation in bash.

It is not possible using just cut. Here is a way using grep:
grep -o '[^,]*$'
Replace the comma for other delimiters.
Explanation:
-o (--only-matching) only outputs the part of the input that matches the pattern (the default is to print the entire line if it contains a match).
[^,] is a character class that matches any character other than a comma.
* matches the preceding pattern zero or more time, so [^,]* matches zero or more non‑comma characters.
$ matches the end of the string.
Putting this together, the pattern matches zero or more non-comma characters at the end of the string.
When there are multiple possible matches, grep prefers the one that starts earliest. So the entire last field will be matched.
Full example:
If we have a file called data.csv containing
one,two,three
foo,bar
then grep -o '[^,]*$' < data.csv will output
three
bar

Without awk ?...
But it's so simple with awk:
echo 'maps.google.com' | awk -F. '{print $NF}'
AWK is a way more powerful tool to have in your pocket.
-F if for field separator
NF is the number of fields (also stands for the index of the last)

There are multiple ways. You may use this too.
echo "Your string here"| tr ' ' '\n' | tail -n1
> here
Obviously, the blank space input for tr command should be replaced with the delimiter you need.

This is the only solution possible for using nothing but cut:
echo "s.t.r.i.n.g." | cut -d'.' -f2-
[repeat_following_part_forever_or_until_out_of_memory:] | cut -d'.' -f2-
Using this solution, the number of fields can indeed be unknown and vary from time to time. However as line length must not exceed LINE_MAX characters or fields, including the new-line character, then an arbitrary number of fields can never be part as a real condition of this solution.
Yes, a very silly solution but the only one that meets the criterias I think.

If your input string doesn't contain forward slashes then you can use basename and a subshell:
$ basename "$(echo 'maps.google.com' | tr '.' '/')"
This doesn't use sed or awk but it also doesn't use cut either, so I'm not quite sure if it qualifies as an answer to the question as its worded.
This doesn't work well if processing input strings that can contain forward slashes. A workaround for that situation would be to replace forward slash with some other character that you know isn't part of a valid input string. For example, the pipe (|) character is also not allowed in filenames, so this would work:
$ basename "$(echo 'maps.google.com/some/url/things' | tr '/' '|' | tr '.' '/')" | tr '|' '/'

the following implements A friend's suggestion
#!/bin/bash
rcut(){
nu="$( echo $1 | cut -d"$DELIM" -f 2- )"
if [ "$nu" != "$1" ]
then
rcut "$nu"
else
echo "$nu"
fi
}
$ export DELIM=.
$ rcut a.b.c.d
d

An alternative using perl would be:
perl -pe 's/(.*) (.*)$/$2/' file
where you may change \t for whichever the delimiter of file is

It is better to use awk while working with tabular data. You don't have to master on command. If it can be achieved by awk, why not use that? I suggest you do not waste your precious time, and use a handful of commands to get the job done.
Example:
# $NF refers to the last column in awk
ll | awk '{print $NF}'

If you have a file named filelist.txt that is a list paths such as the following:
c:/dir1/dir2/file1.h
c:/dir1/dir2/dir3/file2.h
then you can do this:
rev filelist.txt | cut -d"/" -f1 | rev

Adding an approach to this old question just for the fun of it:
$ cat input.file # file containing input that needs to be processed
a;b;c;d;e
1;2;3;4;5
no delimiter here
124;adsf;15454
foo;bar;is;null;info
$ cat tmp.sh # showing off the script to do the job
#!/bin/bash
delim=';'
while read -r line; do
while [[ "$line" =~ "$delim" ]]; do
line=$(cut -d"$delim" -f 2- <<<"$line")
done
echo "$line"
done < input.file
$ ./tmp.sh # output of above script/processed input file
e
5
no delimiter here
15454
info
Besides bash, only cut is used.
Well, and echo, I guess.

choose -1
choose supports negative indexing (the syntax is similar to Python's slices).

I realized if we just ensure a trailing delimiter exists, it works. So in my case I have comma and whitespace delimiters. I add a space at the end;
$ ans="a, b"
$ ans+=" "; echo ${ans} | tr ',' ' ' | tr -s ' ' | cut -d' ' -f2
b

Unix cut except last two tokens

I'm trying to parse file names in specific directory. Filenames are of format:
token1_token2_token3_token(N-1)_token(N).sh
I need to cut the tokens using delimiter '_', and need to take string except the last two tokens. In above examlpe output should be token1_token2_token3.
The number of tokens is not fixed. I've tried to do it with -f#- option of cut command, but did not find any solution. Any ideas?

With cut:
$ echo t1_t2_t3_tn1_tn2.sh | rev | cut -d_ -f3- | rev
t1_t2_t3
rev reverses each line.
The 3- in -f3- means from the 3rd field to the end of the line (which is the beginning of the line through the third-to-last field in the unreversed text).

You may use POSIX defined parameter substitution:
$ name="t1_t2_t3_tn1_tn2.sh"
$ name=${name%_*_*}
$ echo $name
t1_t2_t3

It can not be done with cut, However, you can use sed
sed -r 's/(_[^_]+){2}$//g'

Just a different way to write ysth's answer :
echo "t1_t2_t3_tn1_tn2.sh" |rev| cut -d"_" -f1,2 --complement | rev

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to Substring a string in bash? - linux

I want to cut following string abcd|xyz number of characters before and afer pipe "|" symbol can vary. I know cut command but it requires specific number 'from' and 'to' and i dont think will work here. Can anyone help?

Related

How to strip stdout before logging into file? [duplicate]

Extract all unique URL from log using sed

What do back brackets do in this bash script code?

How to find the last field using 'cut'

Unix cut except last two tokens

Categories

Resources