grep string with special characters in file - linux

my file contains:
/*uid:68160*/\n SELECT
/*uid:68160*/SELECT
I tried with below:
grep -vF "/[*]uid::[[:digit:]][*]/SELECT"
which helps to remove the 2nd line.
How can I remove the 1st line with grep?
I
also tried:
grep -vF "/[*]uid::[[:digit:]][*]/\n SELECT"

Assuming you have a literal text like that,
s='/*uid:68160*/\n SELECT
/*uid:68160*/SELECT
Text'
and you want to remove lines 1 and 2, you may use
grep -Ev '/[*]uid:[[:digit:]]+[*]/(\\n *)?SELECT'
See the online grep demo
Details
-Ev - E enables POSIX ERE and v will negate the result
/[*]uid:[[:digit:]]+[*]/(\\n *)?SELECT - matches
/[*]uid: - a /*uid: string
[[:digit:]]+ - 1+ digits
[*]/ - a */ string
(\\n *)? - an optional group matching 1 or 0 occurrences of \n two-char combination and then any 0 or more spaces
SELECT - a string
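As a quick check, here is a minimal sketch that runs the command above against the sample text. It also shows why the attempts in the question cannot work: with -F, grep treats the pattern as a fixed string, so [*] is searched for literally. The variable names are only for illustration.

```shell
# Sample text from the question; '\n' is a literal backslash followed by 'n'.
s='/*uid:68160*/\n SELECT
/*uid:68160*/SELECT
Text'

# With -F the bracket expressions are plain text, so no line contains them.
fixed_hits=$(printf '%s\n' "$s" | grep -cF '/[*]uid:' || true)

# -E enables POSIX ERE, -v keeps only the lines that do NOT match.
result=$(printf '%s\n' "$s" | grep -Ev '/[*]uid:[[:digit:]]+[*]/(\\n *)?SELECT')

printf 'lines matched with -F: %s\n' "$fixed_hits"
printf '%s\n' "$result"
```

Only the line Text should survive, and the -F count should be 0.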


How to grep the string with specific pattern

I am trying to grep a file.txt to search 2 strings cp and (target file name) where the line in file is as below,
cp (source file name) (target file name)
the problem for me here is string '(target file name)' has specific pattern as /path/to/file/TC_12_IT_(6 digits)_(6 digits)_TC_12_TEST _(2 digits).tc12.tc12
I am using below grep command to search a line with these 2 strings,
grep -E cp.*/path/to/file/TC_12_IT_ file.txt
how can I be more specific about (target file name) in grep command to search (target file name) with all its patterns, something like below,
grep -E 'cp.*/path/to/file/TC_12_IT_*_*_TC_12_TEST_*.tc12.tc12' file.txt
can we use wildcards in grep to search for a string in a file, just like we can use a wildcard like * when listing files, e.g.
ls -lrt TC_12_*_12345678.txt
please suggest if there are any other ways to achieve this.
More specifically:
grep -P '^cp\s+.+\s+\S+/TC_12_IT_\d{6}_\d{6}_TC_12_TEST _\d{2}[.]tc12[.]tc12$' in_file > out_file
^ : beginning of the line.
\s+ : 1 or more whitespace characters.
.+ : 1 or more any characters.
\S+ : 1 or more non-whitespace characters.
\d{6} : exactly 6 digits.
[.] : literal dot (.). Note that just plain . inside a regular expression means any character, unless it is inside a character class ([.]) or escaped (\.).
$ : end of the line.
SEE ALSO:
GNU grep manual
perlre - Perl regular expressions
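If grep -P is not available, roughly the same pattern can be written in POSIX ERE. The sketch below assumes the target name has no space before the final _NN part (the pattern in the question is ambiguous on this), and the sample lines are made up for illustration.

```shell
# Hypothetical sample lines; paths and digits are invented for this sketch.
sample='cp /src/a.txt /path/to/file/TC_12_IT_123456_654321_TC_12_TEST_01.tc12.tc12
cp /src/b.txt /path/to/file/TC_12_IT_12345_654321_TC_12_TEST_01.tc12.tc12
mv /src/c.txt /path/to/file/TC_12_IT_123456_654321_TC_12_TEST_02.tc12.tc12'

# [0-9]{6} means exactly six digits; [.] is a literal dot;
# [[:space:]] is the ERE counterpart of \s.
pattern='^cp[[:space:]]+.+[[:space:]]+/path/to/file/TC_12_IT_[0-9]{6}_[0-9]{6}_TC_12_TEST_[0-9]{2}[.]tc12[.]tc12$'

matches=$(printf '%s\n' "$sample" | grep -E "$pattern")
printf '%s\n' "$matches"
```

Only the first sample line should be printed: the second has a 5-digit group and the third does not start with cp.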
Like this, using GNU grep:
grep -P 'cp.*TC_12_IT_\d{6}_\d{6}_TC_12_TEST_\d{2}.tc12.tc12' file
The regular expression matches as follows:
Node : Explanation
cp : 'cp'
.* : any character except \n (0 or more times, matching as many as possible)
TC_12_IT_ : 'TC_12_IT_'
\d{6} : digits (0-9) (6 times)
_ : '_'
\d{6} : digits (0-9) (6 times)
_TC_12_TEST_ : '_TC_12_TEST_'
\d{2} : digits (0-9) (2 times)
. : any character except \n
tc12 : 'tc12'
. : any character except \n
tc12 : 'tc12'
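On the wildcard part of the question: in a regular expression, * is a quantifier for the preceding item, not a shell-style "anything" wildcard. The sketch below (the sample line is made up) compares the glob-like pattern from the question with a version that uses .* instead.

```shell
# Hypothetical sample line for illustration.
line='cp a.txt /path/to/file/TC_12_IT_123456_654321_TC_12_TEST_01.tc12.tc12'

# '_*' means "zero or more underscores", so the digits are never matched.
glob_like=$(printf '%s\n' "$line" | grep -cE 'TC_12_IT_*_*_TC_12_TEST_*[.]tc12[.]tc12' || true)

# '.*' (any character, zero or more times) plays the role of the shell's '*'.
regex_like=$(printf '%s\n' "$line" | grep -cE 'TC_12_IT_.*_.*_TC_12_TEST_.*[.]tc12[.]tc12' || true)

printf 'glob-like pattern: %s match(es)\n' "$glob_like"
printf 'regex pattern:     %s match(es)\n' "$regex_like"
```

The first count should be 0 and the second 1.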

Is it possible replace the value of a cell in a csv file using grep,sed or both

I have written the following command
#!/bin/bash
awk -v value=$newvalue -v row=$rownum -v col=1 'BEGIN{FS=OFS=","} NR==row {$col=value}1' "${file}".csv >> temp.csv && mv temp.csv "${file}".csv
Sample Input of file.csv
Header,1
Field1,Field2,Field3
1,ABC,4567
2,XYZ,7890
Assuming $newvalue=3, $rownum=4 and col=1, the above code will produce:
Required Output
Header,1
Field1,Field2,Field3
1,ABC,4567
3,XYZ,7890
So if I know the row and column, is it possible to replace the said value using grep, sed?
Edit1: Field3 will always have a unique value for their respective rows. ( in case that info helps anyway)
Assuming your CSV file is as simple as what you show (no commas in quoted fields), and your newvalue does not contain characters that sed would interpret in a special way (e.g. ampersands, slashes or backslashes), the following should work with just sed (tested with GNU sed):
sed -Ei "$rownum s/[^,]*/$newvalue/$col" file.csv
Demo:
$ cat file.csv
Header,1
Field1,Field2,Field3
1,ABC,4567
3,XYZ,7890
$ rownum=3
$ col=2
$ newvalue="NEW"
$ sed -Ei "$rownum s/[^,]*/$newvalue/$col" file.csv
$ cat file.csv
Header,1
Field1,Field2,Field3
1,NEW,4567
3,XYZ,7890
Explanations: $rownum is used as the address (here the line number) where to apply the following command. s is the sed substitute command. [^,]* is the regular expression to search for and replace: the longest possible string not containing a comma. $newvalue is the replacement string. $col is the occurrence to replace.
If newvalue may contain ampersands, slashes or backslashes we must sanitize it first:
sanitizednewvalue=$(sed -E 's/([/\&])/\\\1/g' <<< "$newvalue")
sed -Ei "$rownum s/[^,]*/$sanitizednewvalue/$col" file.csv
Demo:
$ newvalue='NEW&\/&NEW'
$ sanitizednewvalue=$(sed -E 's/([/\&])/\\\1/g' <<< "$newvalue")
$ echo "$sanitizednewvalue"
NEW\&\\\/\&NEW
$ sed -Ei "$rownum s/[^,]*/$sanitizednewvalue/$col" file.csv
$ cat file.csv
Header,1
Field1,Field2,Field3
1,NEW&\/&NEW,4567
3,XYZ,7890
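The two steps above (sanitize, then substitute) can be wrapped in a small function. This is only a sketch: set_cell is a hypothetical helper name, it reads the CSV on stdin instead of editing in place, and it carries the same assumptions as the answer (simple CSV, GNU sed).

```shell
# set_cell ROW COL VALUE  (hypothetical helper, not part of sed)
# Reads simple CSV on stdin and writes the modified CSV to stdout.
set_cell() {
    # Escape /, \ and & so sed treats them literally in the replacement.
    safe=$(printf '%s\n' "$3" | sed -E 's/([/\&])/\\\1/g')
    # $1 is the line address, $2 the occurrence (column) to replace.
    sed -E "$1 s/[^,]*/$safe/$2"
}

csv='Header,1
Field1,Field2,Field3
1,ABC,4567
3,XYZ,7890'

updated=$(printf '%s\n' "$csv" | set_cell 3 2 'NEW&\/&NEW')
printf '%s\n' "$updated"
```

Writing to stdout keeps the helper independent of the differing -i behaviour across sed implementations; redirect to a temporary file and mv it over the original to get the in-place effect.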
With sed, how about:
#!/bin/bash
newvalue=3
rownum=4
col=1
sed -i -E "${rownum} s/(([^,]+,){$((col-1))})[^,]+/\\1${newvalue}/" file.csv
Result of file.csv
Header,1
Field1,Field2,Field3
1,ABC,4567
3,XYZ,7890
${rownum} matches the line number.
(([^,]+,){n}) matches n repetitions of the group "non-comma characters
followed by a comma". With n set to col - 1, it matches the substring
before the target column, which \1 then puts back in front of the
new value.
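To make the expansion concrete, the sketch below prints the sed script the shell actually builds before sed sees it. The leading ^ anchor is my addition for clarity and is not in the command above; the values are chosen only for illustration.

```shell
col=3
newvalue=NEW

# The shell expands $((col-1)) and ${newvalue} first; inside double quotes
# \\1 becomes \1, which sed reads as a back-reference to group 1.
script="s/^(([^,]+,){$((col-1))})[^,]+/\\1${newvalue}/"
printf 'sed script: %s\n' "$script"

result=$(printf '%s\n' '1,ABC,4567' | sed -E "$script")
printf '%s\n' "$result"
```

With col=3 the interval becomes {2}, so the first two fields are kept and the third is replaced.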
Let's Try to Implement sed command
Let us consider a sample CSV file with the following content:
$ cat file
Solaris,25,11
Ubuntu,31,2
Fedora,21,3
LinuxMint,45,4
RedHat,12,5
To remove the 1st field or column :
$ sed 's/[^,]*,//' file
25,11
31,2
21,3
45,4
12,5
This regular expression searches for a sequence of non-comma characters ([^,]*) followed by a comma and deletes them, which results in the 1st field being removed.
To print only the last field, OR remove all fields except the last field:
$ sed 's/.*,//' file
11
2
3
4
5
This regex removes everything till the last comma(.*,) which results in deleting all the fields except the last field.
To print only the 1st field:
$ sed 's/,.*//' file
Solaris
Ubuntu
Fedora
LinuxMint
RedHat
This regex (,.*) removes the characters from the 1st comma to the end of the line, which deletes all the fields except the first field.
To delete the 2nd field:
$ sed 's/,[^,]*,/,/' file
Solaris,11
Ubuntu,2
Fedora,3
LinuxMint,4
RedHat,5
The regex (,[^,]*,) searches for a comma and sequence of characters followed by a comma which results in matching the 2nd column, and replaces this pattern matched with just a comma, ultimately ending in deleting the 2nd column.
Note: Deleting fields in the middle gets tougher in sed, since every preceding field has to be matched explicitly.
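One way to avoid spelling out every preceding field is to let the shell compute a repetition count. This is a sketch, not part of the original walkthrough: it uses ERE (-E), the variable n is my own name, and it assumes the field being deleted is not the last one.

```shell
data='Solaris,25,11
Ubuntu,31,2'

n=2   # 1-based index of the field to delete; assumed not to be the last field

# Group 1 captures the n-1 fields (with their commas) before the target;
# the target field and its trailing comma are matched but not kept.
trimmed=$(printf '%s\n' "$data" | sed -E "s/^(([^,]*,){$((n-1))})[^,]*,/\1/")
printf '%s\n' "$trimmed"
```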
To print only the 2nd field:
$ sed 's/[^,]*,\([^,]*\).*/\1/' file
25
31
21
45
12
The regex matches the first field, second field and the rest, however groups the 2nd field alone. The whole line is now replaced with the 2nd field(\1), hence only the 2nd field gets displayed.
Print only lines in which the last column is a single digit number:
$ sed -n '/.*,[0-9]$/p' file
Ubuntu,31,2
Fedora,21,3
LinuxMint,45,4
RedHat,12,5
The regex (,[0-9]$) checks for a single digit in the last field and the p command prints the line which matches this condition.
To number all lines in the file:
$ sed = file | sed 'N;s/\n/ /'
1 Solaris,25,11
2 Ubuntu,31,2
3 Fedora,21,3
4 LinuxMint,45,4
5 RedHat,12,5
This simulates the cat -n command. awk does it easily using the special variable NR. The '=' command of sed prints the line number of every line, followed by the line itself. The sed output is piped to another sed command that joins every 2 lines.
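For comparison, the awk version mentioned above could look like this minimal sketch (the data is a shortened copy of the sample file):

```shell
data='Solaris,25,11
Ubuntu,31,2
Fedora,21,3'

# NR holds the number of the current input record (line).
numbered=$(printf '%s\n' "$data" | awk '{ print NR, $0 }')
printf '%s\n' "$numbered"
```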
Replace the last field by 99 if the 1st field is 'Ubuntu':
$ sed 's/\(Ubuntu\)\(,.*,\).*/\1\299/' file
Solaris,25,11
Ubuntu,31,99
Fedora,21,3
LinuxMint,45,4
RedHat,12,5
This regex matches 'Ubuntu' as the 1st group and everything up to and including the last comma as the 2nd group; the last column is matched but not grouped. In the replacement part, the 1st and 2nd groups are put back, followed by the new number 99.
Delete the 2nd field if the 1st field is 'RedHat':
$ sed 's/\(RedHat,\)[^,]*\(.*\)/\1\2/' file
Solaris,25,11
Ubuntu,31,2
Fedora,21,3
LinuxMint,45,4
RedHat,,5
The 1st field 'RedHat' and the remaining fields are grouped, while the 2nd field is matched without a group. The replacement uses only the 1st and the last group, resulting in the 2nd field being deleted.
To insert a new column at the end(last column) :
$ sed 's/.*/&,A/' file
Solaris,25,11,A
Ubuntu,31,2,A
Fedora,21,3,A
LinuxMint,45,4,A
RedHat,12,5,A
The regex (.*) matches the entire line and replaces it with the line itself (&) followed by the new field.
To insert a new column in the beginning(1st column):
$ sed 's/.*/A,&/' file
A,Solaris,25,11
A,Ubuntu,31,2
A,Fedora,21,3
A,LinuxMint,45,4
A,RedHat,12,5
Same as the last example, except that the matched line is preceded by the new column.
I hope this will help. Let me know if you need to use Awk or any other command.
Thank you

Linux sed regular expression

I have a string:
2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB
I want to get the number 9135, the first value that occurs between '=' and '/'. Right now my command is as below. It works, but I don't think it's perfect:
sed -r 's/.* = ([0-9]+) .* = .*/\1 /'
I need a neater one, please advise.
You can use
sed -En 's~.*= ([0-9]+) /.*=.*~\1~p'
See the online demo.
An awk solution:
awk -F= '{gsub(/\/.*|[^0-9]/,"",$2);print $2}'
See this demo.
Details:
-En - E (or r as in your example) enables the POSIX ERE syntax and n suppresses the default line output
.*= ([0-9]+) /.*=.* - matches any text, = + space, captures one or more digits into Group 1, then matches a space, /, then any text, = and again any text
\1 - replaces with Group 1 value
p - prints the result of the substitution.
Here, ~ are used as regex delimiters in order not to escape / in the pattern.
awk:
-F= - sets the input field separator to =
gsub(/\/.*|[^0-9]/,"",$2) - removes the first / together with the rest of the field, and any remaining non-digit character
print $2 - prints the modified Field 2 value.
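Both commands can be checked against the log line from the question with a short sketch like this (the variable names are mine):

```shell
log='2021-05-27 10:40:50.678117 PID529270:TID 47545543550720:SID 1673488:TXID 786092740:QID 140: INFO:MEMCONTEXT:MemContext state: mem[cur/hi/max] = 9135 / 96586 / 96576 MB, VM[cur/hi/max] = 9161 / 21841178 / 100663296 MB'

# sed: capture the digits after the first "= " that is followed by " /".
via_sed=$(printf '%s\n' "$log" | sed -En 's~.*= ([0-9]+) /.*=.*~\1~p')

# awk: take the field between the two "=" signs and strip it down to digits.
via_awk=$(printf '%s\n' "$log" | awk -F= '{gsub(/\/.*|[^0-9]/,"",$2);print $2}')

printf 'sed: %s\nawk: %s\n' "$via_sed" "$via_awk"
```

Both variables should hold 9135.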
You could also get the first match with grep using -P for Perl-compatible regular expressions.
grep -oP "^.*? = \K\d+(?= /)"
^ Start of string
.*? Match as few characters as possible
' = ' Match a space, an = and another space
\K\d+ Forget what is matched so far, then match 1 or more digits
(?= /) Assert a space and / to the right
Output
9135
See a bash demo
Since you want the material between the first = and the first /, ignoring the spaces, you could use:
sed -E -e 's%^[^=]*= ([^/]*) /.*$%\1%'
This uses Extended Regular Expressions (ERE) (-E; -r also works with GNU sed), and searches from the start of the line for a sequence of 'not =' characters, the = character, a space, anything that's not a slash (which is remembered), another space, a slash, and anything that follows, replacing it all with what was remembered. The ^ and $ anchors aren't crucial; it will work the same without them. The % symbols are used instead of / because the searched-for pattern includes a /. If you're sure there'll never be any spaces other than the first and last ones between the = and /, you can use [^ /]* in place of [^/]* and there should be some small (probably immeasurable) performance benefit.
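One practical difference between this command and the sed -En variant shown earlier is that the earlier pattern requires a second = later on the line. The sketch below uses a shortened, made-up line with a single = to show this.

```shell
# Shortened, hypothetical line containing only one '='.
line='mem[cur/hi/max] = 9135 / 96586 / 96576 MB'

# Requires another '=' after the captured number, so nothing is printed here.
needs_two=$(printf '%s\n' "$line" | sed -En 's~.*= ([0-9]+) /.*=.*~\1~p')

# Only looks at the first '=' and the first '/', so it still works.
first_only=$(printf '%s\n' "$line" | sed -E -e 's%^[^=]*= ([^/]*) /.*$%\1%')

printf 'needs_two=[%s] first_only=[%s]\n' "$needs_two" "$first_only"
```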

Get Text after word at specific position

I have file like this
TT;12-11-18;text;abc;def;word
AA;12-11-18;tee;abc;def;gih;word
TA;12-11-18;teet abc;def;word
TT;12-11-18;tdd;abc;def;gih;jkl;word
I want output like this
TT;12-11-18;text;abc;def;word
TA;12-11-18;teet abc;def;word
I want to get the lines where word occurs at position 5 after the date 12-11-18. I do not want lines where it is found after this position, that is, at the 6th or 7th position. The position count starts from the date 12-11-18.
I tried this command
cat file.txt|grep "word" -n1
This print all occurrence in which this pattern word is matched. How should I solve my problem?
Try this(GNU awk):
awk -F"[; ]" '/12-11-18/ && $6=="word"' file
Or sed one:
sed -n '/12-11-18;\([^; ]*[; ]\)\{3\}word/p' file
Or grep with basically the same regex(different escape):
grep -E "12-11-18;([^; ]*[; ]){3}word" file
[^; ] means any character that's not ; or a space.
* means match any number of repetitions of the preceding character/group.
-- [^; ]* means a string of any length that doesn't contain ; or a space; the ^ in [^; ] negates the set.
[; ] means exactly one occurrence of either ; or a space.
() groups the above together.
{3} matches three repetitions of the preceding character/group.
As a whole, ([^; ]*[; ]){3} means three ;/space-separated fields, including their delimiters.
As @kvantour points out, these commands can give wrong results if there are multiple spaces in one place.
To consider multiple spaces as one separator, then:
awk -F"(;| +)" '/12-11-18/ && $6=="word"'
and
grep -E "12-11-18;([^; ]*(;| +)){3}word"
or GNU sed (posix/bsd/osx sed does not support |):
sed -rn '/12-11-18;([^; ]*(;| +)){3}word/p'
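A minimal sketch that runs the first grep and awk variants on the sample from the question and checks that they agree (the variable names are mine):

```shell
sample='TT;12-11-18;text;abc;def;word
AA;12-11-18;tee;abc;def;gih;word
TA;12-11-18;teet abc;def;word
TT;12-11-18;tdd;abc;def;gih;jkl;word'

# Three ;/space-delimited fields must sit between the date and "word".
by_grep=$(printf '%s\n' "$sample" | grep -E '12-11-18;([^; ]*[; ]){3}word')

# Splitting on ; or space makes "word" the 6th field on the wanted lines.
by_awk=$(printf '%s\n' "$sample" | awk -F'[; ]' '/12-11-18/ && $6=="word"')

printf '%s\n' "$by_grep"
```

Both should select the 1st and 3rd sample lines.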

Printing only 6 and 10 character words in a linux file

This is my file:
$cat filename
10023a,vija45,8877au,qwer65,guru12 0099888das,baburam123,ganeshan1,feild55512
What I tried is the sed command below, to get only the 6-character words in that file:
sed -ne 's/[a-z][0-9]\{6}/&/p' filename
It displays all words and lines.
Could anyone please help me with this?
Expected output is
vija45 baburam123
8877au ganeshan1
qwer65 feild55512
guru12
Use that:
tr "," "\n" <file | grep '^.\{6\}$\|^.\{10\}$'
First, tr replaces every , with a newline, so that each comma-separated segment ends up on its own line.
Then grep searches for 6 or 10 character long lines and prints them.
With your given example, the output would then be:
10023a
vija45
8877au
qwer65
baburam123
feild55512
If guru12 0099888das must also be matched as a 6 character and a 10 character word, then just change the tr part to include also spaces:
tr ", " "\n" <file | grep '^.\{6\}$\|^.\{10\}$'
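The same idea can be written a little more compactly with ERE and -x (match whole lines only). This is a sketch of an equivalent form, not the answer's exact command; it gives tr a second set of the same length as the first rather than relying on padding.

```shell
line='10023a,vija45,8877au,qwer65,guru12 0099888das,baburam123,ganeshan1,feild55512'

# Turn commas and spaces into newlines, then keep lines of exactly 6 or 10 characters.
words=$(printf '%s\n' "$line" | tr ', ' '\n\n' | grep -Ex '.{6}|.{10}')
printf '%s\n' "$words"
```

Note that ganeshan1 has 9 characters, so it is filtered out even though it appears in the expected output of the question.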
I suggest using grep for matching.
grep -o '\b\w\{6\}\b' file
sed '
# keep only 6- and 10-character words by removing shorter, longer and in-between words
s/.*/,&,/
s/[^[:space:],]\{11,\}//g;s/[[:space:],][^[:space:],]\{1,5\}[[:space:],]/,/g;s/[[:space:],][^[:space:],]\{7,9\}[[:space:],]/,/g
# clean up the separators left behind
s/[[:space:],]\{2,\}/,/g;s/^[[:space:],]*//;s/[[:space:],]*$//
# remove empty lines
/^[[:space:],]*$/d
# 1 word per line (optional)
y/ ,/\n\n/
' YourFile
Details:
prints every 6- or 10-character word found on each line (optionally one word per output line)
the comments in the script explain each step
adapted for comma-separated input
Correction: added some missing g flags, fixed a small bug in the removal of short words, and added 10-character words (the first version kept only 6-character words).
