how to use 'cut' command in linux with multi character string - linux

/home/user/views/links/user1/gitsrc/database/src/
This is my string. I want to cut it in 2 strings such as
"/home/user/views/links/user1/"
"/database/src/"
So the delimiter is not actually a single character but a group of characters, i.e. "gitsrc".

You can only define a single character as delimiter in cut.
You could use awk where the field separator can be a single character, a null string or a regular expression, e.g.
$ echo '/home/user/views/links/user1/gitsrc/database/src/' |
awk -F'gitsrc' '{ print $1 " " $2 }'
/home/user/views/links/user1/ /database/src/
or
$ echo '/home/user/views/links/user1/gitsrc/database/src/' |
awk -F'gitsrc' '{ print $1 ORS $2 }'
/home/user/views/links/user1/
/database/src/
In your shell you could also use parameter expansion to get the first and second part:
$ str=/home/user/views/links/user1/gitsrc/database/src/
$ echo "${str%%gitsrc*}" # remove longest suffix `gitsrc*`
/home/user/views/links/user1/
$ echo "${str#*gitsrc}" # remove shortest prefix `*gitsrc`
/database/src/
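The two parameter expansions above can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name `split_once` is hypothetical, and it assumes the string should be split at the first occurrence of the delimiter only.

```shell
# split_once STRING DELIM (hypothetical helper)
# Prints the text before and after the first occurrence of DELIM,
# one part per line. If DELIM is absent, prints STRING and returns 1.
split_once() {
    local str=$1 delim=$2
    case $str in
        *"$delim"*) ;;                          # delimiter present, continue
        *) printf '%s\n' "$str"; return 1 ;;    # nothing to split
    esac
    printf '%s\n' "${str%%"$delim"*}"   # remove longest suffix DELIM*
    printf '%s\n' "${str#*"$delim"}"    # remove shortest prefix *DELIM
}

split_once '/home/user/views/links/user1/gitsrc/database/src/' 'gitsrc'
# /home/user/views/links/user1/
# /database/src/
```

Quoting `"$delim"` inside the expansion makes the shell treat it as literal text rather than as a glob pattern.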

Related

Remove special characters from 2nd column of a file

I have a file s.csv
a,b+ -.,c
aa,bb ().,c._c
I want to remove all special characters from 2nd column (file separated by comma)
cat s.csv | tr -dc '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]'
The above code also removes special characters from 3rd column as well.
awk -F, '{print $2}' s.csv | tr -dc '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]'
This code only print 2nd column.
Any idea how I can remove special characters from the 2nd column and print all columns?
Required output should be
a,b,c
aa,bb,c._c
Remove all special characters from the second field:
gsub(/[^A-Za-z0-9]/,"",$2) removes from the second field ($2) every character that is not an upper case letter (A-Z), a lower case letter (a-z) or a digit (0-9)
-F ',' fields are separated by ","
OFS=FS keeps the same separator in the output
$ awk -F ',' 'BEGIN{OFS=FS}{gsub(/[^A-Za-z0-9]/,"",$2); print}' s.csv
# test
$ awk -F ',' 'BEGIN{OFS=FS}{gsub(/[^A-Za-z0-9]/,"",$2); print}' <<<'aa,bb ().,c._c'
aa,bb,c._c
As @Léa Gris mentioned below:
Don't forget to set the locale to C or [^A-Za-z0-9] is gonna be
interpreted unexpectedly in non-western European alphabets. Prepend
awk invocation with
LC_ALL=C
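Putting the command and the locale advice together could look like the sketch below. The function name `clean_second_field` is an assumption made for illustration; the awk program is the one from the answer, with `LC_ALL=C` prepended so that the ranges `A-Z`, `a-z` and `0-9` are interpreted as plain ASCII.

```shell
# clean_second_field [FILE...] (hypothetical helper)
# Strips every non-alphanumeric ASCII character from the 2nd comma-separated
# field and prints all fields. Reads standard input when no file is given.
clean_second_field() {
    LC_ALL=C awk -F ',' 'BEGIN { OFS = FS } { gsub(/[^A-Za-z0-9]/, "", $2); print }' "$@"
}

printf '%s\n' 'a,b+ -.,c' 'aa,bb ().,c._c' | clean_second_field
# a,b,c
# aa,bb,c._c
```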
You can use the [:alpha:] character class in awk, here on the second field, and remove the characters that aren't alphabetic with the gsub() function (use [:alnum:] instead if digits should be kept as well):
awk 'BEGIN{OFS=FS=","} {gsub(/[^[:alpha:]]+/, "", $2)} 1' file
a,b,c
aa,bb,c._c
If you need another set of characters, see this answer by Ed Morton:
https://stackoverflow.com/questions/56481541/how-can-you-tell-which-characters-are-in-which-character-classes
and see "which characters are in which character classes"
Use this Perl one-liner:
perl -F',' -lane '$F[1] =~ s{[\W_]+}{}g; @F = map { lc } @F; print join ",", @F; ' in_file > out_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.
s{[\W_]+}{} : Replace 1 or more occurrences of \W (non-word character) or underscore with nothing.
The regex uses these modifiers:
/g : Match the pattern repeatedly.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
You don't have to alter the locale just to do it. By using octal codes instead of letters, the regex engine treats them as ASCII instead of being overly clever. I even intentionally set the locale to Belgian French to illustrate:
CODE
echo 'a,b+ -.,c
aa,bb ().,c._c' | {m,g}awk '
gsub("[^\\060-\\071\\101-\\132\\141-\\172]+","",$(!_+!_))^_' \
OFS=',' FS=','
OUTPUT
a,b,c
aa,bb,c._c
SHOWCASE LOCALE=C isn't needed
LANG="fr_BE.UTF8" gawk -e '
BEGIN { for(_=8*4;_<8^4;_++) { printf("%c",_) } } ' |
LANG="fr_BE.UTF8" gawk -p- -e '
gsub("[^\\060-\\071\\101-\\132\\141-\\172]+","",$-_)^_' OFS=',' FS=','
——————————
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
# profile gawk, créé Sun May 29 05:58:26 2022
# Règle(s)
1 (gsub("[^\\060-\\071\\101-\\132\\141-\\172]+", "", $-_)) ^ _ { # 1
1 print
}

String manipulation in shell in a single line

I am using the script below:
list='kmakalas@gmail.com;kmakalas@gmail.com;kmakalas@gmail.com;'
From the above I want to extract
r=kmakalas,r=kmakalas,r=kmakalas
For that I used the shell manipulation below:
rev_list="r=${list//@gmail.com;/,r=}"
which gave me
r=kmakalas,r=kmakalas,r=kmakalas,r=
To get r=kmakalas,r=kmakalas,r=kmakalas I used
rev2="${rev_list%,r=}"
Is it possible to do this in a single command?
Using GNU awk:
awk -F ';' '{ for(i=1;i<NF;i++) { split($i,map,"@"); printf "%s", (i==(NF-1) ? "r=" map[1] : "r=" map[1] ",") } printf "\n" }' <<< "$list"
Explanation:
awk -F ';' '{ # Set the field delimiter to ";"; the trailing ";" leaves an empty last field, so the addresses are fields 1 to NF-1
for(i=1;i<NF;i++) { # Loop through each address
split($i,map,"@"); # Split the address into the array map using "@" as the delimiter
printf "%s", (i==(NF-1) ? "r=" map[1] : "r=" map[1] ",") # Print "r=" followed by the part before "@", with a comma after all but the last
}
printf "\n"
}' <<< "$list"
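A shorter alternative is a single sed command. This is a sketch under two assumptions: every address in the list is terminated by ";" (as in the question), and the function name `to_r_list` is made up for illustration. Unlike the parameter expansion in the question, it does not depend on the domain being gmail.com.

```shell
# to_r_list LIST (hypothetical helper)
# Turns "user@domain;user@domain;" into "r=user,r=user".
to_r_list() {
    # 1st expression: replace each "user@domain;" with "r=user,"
    # 2nd expression: drop the trailing comma
    printf '%s\n' "$1" | sed -E 's/([^@;]+)@[^;]*;/r=\1,/g; s/,$//'
}

to_r_list 'kmakalas@gmail.com;kmakalas@gmail.com;kmakalas@gmail.com;'
# r=kmakalas,r=kmakalas,r=kmakalas
```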

How to search word in file that include dollar sign ($) using awk

I want to catch the lines that match the following pattern:
word1, word2, ^word3$, word4
I want to check if the third word in the line is equal to word3, the ^ and $ signs always will be in the third column using awk.
something like this:
less file.txt | awk '{if ($3=="^word3$") {print $0}}'
and this should print the line, but I just can't grab the word with the ^ and $ signs.
How can i do it?
(I must use awk because the original command is more complicated and i can't use grep)
Thanks!
Edited: the words of the input are separated by a space and a comma. So the awk command needs the -F ", " option.
If you use the operator ==, awk will perform an exact match. If you want to use a regex, you will have to use the ~ operator. So you have two solutions:
Exact match:
awk -F ", " '($3 == "word3") {print}' file.txt
Regex:
awk -F ", " '($3 ~ "^word3$") {print}' file.txt
Since {print} is the instruction executed by default, you can even drop that part and only issue:
awk -F ", " '($3 ~ "^word3$")' file.txt
But if I read your question again and again, I start thinking you want to extract ^word3$ from your text. In that case, what you need is:
awk -F ", " '($3 == "^word3$")' file.txt
(without square brackets in this case, as suggested somewhere).
An equivalent solution without redefining the field separator is to put the comma inside of the string you are trying to match:
awk '($3 == "^word3$,")' file.txt
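The difference between the two operators can be checked on a two-line sample. The sketch below is only an illustration: the helper names and the sample lines are invented, and the last variant (a regex with the `^` and `$` characters escaped) is an addition that is not in the answer above.

```shell
# Two sample lines: one with a literal ^word3$ in column 3, one with plain word3.
sample() {
    printf '%s\n' 'word1, word2, ^word3$, word4' 'word1, word2, word3, other'
}

# String comparison: ^ and $ are ordinary characters.
match_literal() { awk -F ', ' '$3 == "^word3$"'; }
# Regex match: ^ and $ are anchors, so this matches a field equal to word3.
match_regex()   { awk -F ', ' '$3 ~ "^word3$"'; }
# Regex match with the literal ^ and $ escaped, anchored on both sides.
match_escaped() { awk -F ', ' '$3 ~ /^\^word3\$$/'; }

sample | match_literal   # word1, word2, ^word3$, word4
sample | match_regex     # word1, word2, word3, other
sample | match_escaped   # word1, word2, ^word3$, word4
```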

Field separation with adding quotes

I am a beginner in shell scripting.
I have a variable whose value contains an = character.
I want to add quotes around the field after the = character.
abc="source=TDG"
echo $abc|awk -F"=" '{print $2}'
My code is printing one field only.
my expected output is
source='TDG'
$ abc='source=TDG'
$ echo "$abc" | sed 's/[^=]*$/\x27&\x27/'
source='TDG'
[^=]*$ match non = characters at end of line
\x27&\x27 add single quotes around the matched text
With awk
$ echo "$abc" | awk -F= '{print $1 FS "\047" $2 "\047"}'
source='TDG'
-F= input field separator is =
print $1 FS "\047" $2 "\047" print first field, followed by input field separator, followed by single quotes then second field and another single quotes
See how to escape single quote in awk inside printf
for more ways of handling single quotes in print
With bash parameter expansion
$ echo "${abc%=*}='${abc#*=}'"
source='TDG'
${abc%=*} will delete last occurrence of = and zero or more characters after it
${abc#*=} will delete zero or more characters and first = from start of string
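If the value itself may contain an = character, the two expansions above overlap, because one cuts at the last = and the other at the first. A possible variant, sketched here with the hypothetical function name `quote_value`, cuts at the first = in both cases by using `%%` instead of `%`:

```shell
# quote_value PAIR (hypothetical helper)
# Prints key='value', splitting PAIR at its first "=".
quote_value() {
    local pair=$1
    # ${pair%%=*} removes the longest suffix starting at "=", leaving the key
    # ${pair#*=}  removes the shortest prefix ending at "=", leaving the value
    printf "%s='%s'\n" "${pair%%=*}" "${pair#*=}"
}

quote_value 'source=TDG'   # source='TDG'
quote_value 'opts=a=b'     # opts='a=b'
```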
Sed would be the better choice:
echo "$abc" | sed "s/[^=]*$/'&'/"
Awk can do it but needs extra bits:
echo "$abc" | awk -F= 'gsub(/(^|$)/,"\047",$2)' OFS==
What is taking place?
Using sub to surround TDG with single quotes by its octal nr to avoid quoting problems.
echo "$abc" | awk '{sub(/TDG/,"\047TDG\047")}1'
source='TDG'

How to split a string on the basis of delimiter and get the count of parts in linux shell script

I have string like this
my/path/to/home/file.txt
Now I want to get the number of parts in this string on the basis of the delimiter (/). So for the above string the answer would be 5. I need this in my linux shell script. How can I get that without using a for loop?
$ awk -F'/' '{print NF}' <<< "my/path/to/home/file.txt"
5
-F'/' : This will tell awk that fields are separate by / .
NF : This is the number of fields, which is also the number of the last field. In this case "my" is field 1, path is the 2nd ... and file.txt is the 5th field.
{print NF}: This will print the last field number.
It greps the delimiter, counts the occurrences and adds 1:
echo $(($(echo "my/path/to/home/file.txt" | grep -o "/" | wc -l)+1))
#=> 5
You can use awk:
string="my/path/to/home/file.txt"
count="$(awk -F/ '{print NF}' <<< "$string")"
-F/ splits the string into fields based on / as the delimiter. NF contains the number of those fields ($NF would instead be the content of the last field).
only pure bash (fastest way):
#!/bin/bash
a=my/path/to/home/file.txt
b=${a//[^\/]}
echo $(( ${#b} + 1 ))
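The same idea extends to delimiters of more than one character. This is a sketch, not the original answer's code: the function name `count_parts` is hypothetical, and it requires bash because of the `${var//pattern/}` expansion. It deletes every occurrence of the delimiter and derives the count from how much shorter the string became.

```shell
#!/usr/bin/env bash
# count_parts STRING DELIM (hypothetical helper)
# Prints the number of parts STRING splits into on the literal DELIM.
count_parts() {
    local str=$1 delim=$2
    [ -n "$delim" ] || return 1               # an empty delimiter is rejected
    local stripped=${str//"$delim"/}          # STRING with every DELIM removed
    # removed characters / delimiter length = number of delimiters
    echo $(( (${#str} - ${#stripped}) / ${#delim} + 1 ))
}

count_parts 'my/path/to/home/file.txt' '/'                                 # 5
count_parts '/home/user/views/links/user1/gitsrc/database/src/' 'gitsrc'   # 2
```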
