Get Text after word at specific position - linux

I have file like this
TT;12-11-18;text;abc;def;word
AA;12-11-18;tee;abc;def;gih;word
TA;12-11-18;teet abc;def;word
TT;12-11-18;tdd;abc;def;gih;jkl;word
I want output like this
TT;12-11-18;text;abc;def;word
TA;12-11-18;teet abc;def;word
I want to get word if it occur at position 5 after date 12-11-18. I do not want this occurrence if its found after this position that is at 6th or 7th position. Count of position start from date 12-11-18
I want tried this command
cat file.txt|grep "word" -n1
This print all occurrence in which this pattern word is matched. How should I solve my problem?

Try this(GNU awk):
awk -F"[; ]" '/12-11-18/ && $6=="word"' file
Or sed one:
sed -n '/12-11-18;\([^; ]*[; ]\)\{3\}word/p' file
Or grep with basically the same regex(different escape):
grep -E "12-11-18;([^; ]*[; ]){3}word" file
[^; ] means any character that's not ; or (space).
* means match any repetition of former character/group.
-- [^; ]* means any length string that don't contain ; or space, the ^ in [^; ] is to negate.
[; ] means ; or space, either one occurance.
() is to group those above together.
{3} is to match three repetitives of former chracter/group.
As a whole ([^; ]*[; ]){3} means ;/space separated three fields included the delimiters.
As #kvantour points out, if there could be multiple spaces at one place they could be faulty.
To consider multiple spaces as one separator, then:
awk -F"(;| +)" '/12-11-18/ && $6=="word"'
and
grep -E "12-11-18;([^; ]*(;| +)){3}word"
or GNU sed (posix/bsd/osx sed does not support |):
sed -rn '/12-11-18;([^; ]*(;| +)){3}word/p'

Related

Is it possible replace the value of a cell in a csv file using grep,sed or both

I have written the following command
#!/bin/bash
awk -v value=$newvalue -v row=$rownum -v col=1 'BEGIN{FS=OFS=","} NR==row {$col=value}1' "${file}".csv >> temp.csv && mv temp.csv "${file}".csv
Sample Input of file.csv
Header,1
Field1,Field2,Field3
1,ABC,4567
2,XYZ,7890
Assuiming $newvalue=3 ,$rownum=4 and col=1, then the above code will replace:
Required Output
Header,1
Field1,Field2,Field3
1,ABC,4567
3,XYZ,7890
So if I know the row and column, is it possible to replace the said value using grep, sed?
Edit1: Field3 will always have a unique value for their respective rows. ( in case that info helps anyway)
Assuming your CSV file is as simple as what you show (no commas in quoted fields), and your newvalue does not contain characters that sed would interpret in a special way (e.g. ampersands, slashes or backslashes), the following should work with just sed (tested with GNU sed):
sed -Ei "$rownum s/[^,]*/$newvalue/$col" file.csv
Demo:
$ cat file.csv
Header,1
Field1,Field2,Field3
1,ABC,4567
3,XYZ,7890
$ rownum=3
$ col=2
$ newvalue="NEW"
$ sed -Ei "$rownum s/[^,]*/$newvalue/$col" file.csv
$ cat file.csv
Header,1
Field1,Field2,Field3
1,NEW,4567
3,XYZ,7890
Explanations: $rownum is used as the address (here the line number) where to apply the following command. s is the sed substitute command. [^,]* is the regular expression to search for and replace: the longest possible string not containing a comma. $newvalue is the replacement string. $col is the occurrence to replace.
If newvalue may contain ampersands, slashes or backslashes we must sanitize it first:
sanitizednewvalue=$(sed -E 's/([/\&])/\\\1/g' <<< "$newvalue")
sed -Ei "$rownum s/[^,]*/$sanitizednewvalue/$col" file.csv
Demo:
$ newvalue='NEW&\/&NEW'
$ sanitizednewvalue=$(sed -E 's/([/\&])/\\\1/g' <<< "$newvalue")
$ echo "$sanitizednewvalue"
NEW\&\\\/\&NEW
$ sed -Ei "$rownum s/[^,]*/$sanitizednewvalue/$col" file.csv
$ cat file.csv
Header,1
Field1,Field2,Field3
1,NEW&\/&NEW,4567
3,XYZ,7890
With sed, how about:
#!/bin/bash
newvalue=3
rownum=4
col=1
sed -i -E "${rownum} s/(([^,]+,){$((col-1))})[^,]+/\\1${newvalue}/" file.csv
Result of file.csv
Header,1
Field1,Field2,Field3
1,ABC,4567
3,XYZ,7890
${rownum} matches the line number.
(([^,]+,){n}) matches the n-time repetition of the group of
non-comma characters followed by a comma. Then it should be the substring
before the target (to be substituted) column by assigning n to
col - 1.
Let's Try to Implement sed command
Let us consider a sample CSV file with the following content:
$ cat file
Solaris,25,11
Ubuntu,31,2
Fedora,21,3
LinuxMint,45,4
RedHat,12,5
To remove the 1st field or column :
$ sed 's/[^,]*,//' file
25,11
31,2
21,3
45,4
12,5
This regular expression searches for a sequence of non-comma([^,]*) characters and deletes them which results in the 1st field getting removed.
To print only the last field, OR remove all fields except the last field:
$ sed 's/.*,//' file
11
2
3
4
5
This regex removes everything till the last comma(.*,) which results in deleting all the fields except the last field.
To print only the 1st field:
$ sed 's/,.*//' file
Solaris
Ubuntu
Fedora
LinuxMint
RedHat
This regex(,.*) removes the characters starting from the 1st comma till the end resulting in deleting all the fields except the last field.
To delete the 2nd field:
$ sed 's/,[^,]*,/,/' file
Solaris,11
Ubuntu,2
Fedora,3
LinuxMint,4
RedHat,5
The regex (,[^,]*,) searches for a comma and sequence of characters followed by a comma which results in matching the 2nd column, and replaces this pattern matched with just a comma, ultimately ending in deleting the 2nd column.
Note: To delete the fields in the middle gets more tougher in sed since every field has to be matched literally.
To print only the 2nd field:
$ sed 's/[^,]*,\([^,]*\).*/\1/' file
25
31
21
45
12
The regex matches the first field, second field and the rest, however groups the 2nd field alone. The whole line is now replaced with the 2nd field(\1), hence only the 2nd field gets displayed.
Print only lines in which the last column is a single digit number:
$ sed -n '/.*,[0-9]$/p' file
Ubuntu,31,2
Fedora,21,3
LinuxMint,45,4
RedHat,12,5
The regex (,[0-9]$) checks for a single digit in the last field and the p command prints the line which matches this condition.
To number all lines in the file:
$ sed = file | sed 'N;s/\n/ /'
1 Solaris,25,11
2 Ubuntu,31,2
3 Fedora,21,3
4 LinuxMint,45,4
5 RedHat,12,5
This is simulation of cat -n command. awk does it easily using the special variable NR. The '=' command of sed gives the line number of every line followed by the line itself. The sed output is piped to another sed command to join every 2 lines.
Replace the last field by 99 if the 1st field is 'Ubuntu':
$ sed 's/\(Ubuntu\)\(,.*,\).*/\1\299/' file
Solaris,25,11
Ubuntu,31,99
Fedora,21,3
LinuxMint,45,4
RedHat,12,5
This regex matches 'Ubuntu' and till the end except the last column and groups each of them as well. In the replacement part, the 1st and 2nd group along with the new number 99 is substituted.
Delete the 2nd field if the 1st field is 'RedHat':
$ sed 's/\(RedHat,\)[^,]*\(.*\)/\1\2/' file
Solaris,25,11
Ubuntu,31,2
Fedora,21,3
LinuxMint,45,4
RedHat,,5
The 1st field 'RedHat', the 2nd field and the remaining fields are grouped, and the replacement is done with only 1st and the last group , resuting in getting the 2nd field deleted.
To insert a new column at the end(last column) :
$ sed 's/.*/&,A/' file
Solaris,25,11,A
Ubuntu,31,2,A
Fedora,21,3,A
LinuxMint,45,4,A
RedHat,12,5,A
The regex (.*) matches the entire line and replacing it with the line itself (&) and the new field.
To insert a new column in the beginning(1st column):
$ sed 's/.*/A,&/' file
A,Solaris,25,11
A,Ubuntu,31,2
A,Fedora,21,3
A,LinuxMint,45,4
A,RedHat,12,5
Same as last example, just the line matched is followed by the new column
I hope this will help. Let me know if you need to use Awk or any other command.
Thank you

Deleting characters from permutations and character combinations

To delete particular characters from a combination list.
printf "%s\n" {a..c}{a..d} | sed 's/^cc//' | tr -s '\n'
I used the code above to delete a particular line of character from combination. Is there a way I can do it without sed, awk, grep or bc. Can I get it done with a single line of code in the script?
If you have stored your values in an array, e.g.:
arr=({a..c}{a..d})
Then you may filter your array elements with a string substitution:
printf -- '%s\n' "${arr[#]/%cc/}"
The syntax ${arr[#]/%cc/} tells to parse all elements from the array arr and substitute %cc with nothing. The % character indicates the beginning of the string, similar to ^ in sed, thus %cc means "every string beginning with cc".

How to remove all data from a file before a line containing string by passing variable in linux

I am trying to trim the data above the line from a file, where line containing some string by passing variable to it
varfile=$(cat variable.txt)
echo "$varfile"
if [ -z "$varfile" ]; then
echo "null"
else
echo "data"
sed "1,/$varfile/d" fileee.txt
fi
Here I am taking a string from variable.txt file and trying to find that text in fileee.txt file and removing all the data above the line
EX: variable.txt has 3
I am finding 3 in fileee.txt and removing data above three
INPUT:
1
2
3
4
OUTPUT:
3
4
I suppose the issue here is that you want to remove all lines before the match, but not the matching line itself?
One way, with GNU sed, is to explicitly add a print for the matching line first:
pattrn=3
seq 1 4 | sed -e "/$pattrn/p;1,/$pattrn/d"
Though this will duplicate any further lines that match the pattern.
Better, invert the sense of the match:
seq 1 4 | sed -ne "/$pattrn/,\$p"
That is, don't print by default (-n), but print (p) anything from a match to the end ($, escaped because of the double-quoted string)
Even better would be to use awk:
pattrn=3
seq 1 4 | awk -vpat="$pattrn" '$0 ~ pat {p=1} p'
This sets a flag on the line where the whole line ($0) matches the pattern (~ is a regex match), then prints the lines whenever that flag is set.
The awk solution is also better in that special characters in the pattern don't cause issues (at least not as many); in the sed case, if the pattern contains a slash /, it will terminate the regex in the sed code, and cause syntax errors or allow for code injection.
I used seq from GNU coreutils here only to make up the sequence of numbers for input.

Two pattern match on same sed command

I have the following sed command:
sed -n '/^out(/{n;p}' ${filename} | sed -n '/format/ s/.*format=//g; s/),$//gp; s/))$//gp'
I tried to do it as one line as in:
sed -n '/^out(/{n;}; /format/ s/.*format=//g; s/),$//gp; s/))$//gp' ${filename}
But that also display the lines I don't want (those that do not match).
What I have is a file with some strings as in:
entry(variable=value)),
format(variable=value)),
entry(variable=value)))
out(variable=value)),
format(variable=value)),
...
I just want the format lines that came right after the out entry. and remove those trailing )) or ),
You can use this sed command:
sed -nr '/^out[(]/ {n ; s/.*[(]([^)]+)[)].*/\1/p}' your_file
Once a out is found, it advanced to the next line (n) and uses the s command with p flag to extract only what is inside parenthesises.
Explanation:
I used [(] instead of \(. Outside brackets a ( usually means grouping, if you want a literal (, you need to escape it as \( or you can put it inside brackets. Most RE special characters dont need escaping when put inside brackets.
([^)]+) means a group (the "(" here are RE metacharacters not literal parenthesis) that consists of one or more (+) characters that are not (^) ) (literal closing parenthesis), the ^ inverts the character class [ ... ]

Need oneliner to insert character after Nth occurrence of delimiter on 2nd line in unix

Hello I need oneliner to insert character after Nth occurrence of delimiter on 2nd line in unix; The criteria are these.
Find position of nth occurrence of the delimiter.
Insert character after the nth occurrence.
This is on the 2nd line only.
Note: I am doing this in Linux.
With awk :
INPUT FILE
1 foo bar base
2 foo bar base
CODE
awk 'NR==2{$2=$2"X"; print}' file
you can specify a delimiter with -F
NR to specify the line we work on
$2 is the 2th value separated by space (in this case)
$2=$2"X" is a concatenation
print alone print the entire line
OUTPUT
2 fooX bar base
Suppose we have the input file:
$ cat file
1 foo bar base
2 foo bar base
To insert the character X after the 3rd occurrence of the delimiter space, use:
$ sed -r '2 s/([^ ]* ){3}/&X/' file
1 foo bar base
2 foo bar Xbase
To make the change to the file in place, use sed's -i option:
sed -i -r '2 s/([^ ]* ){3}/&X/' file
How it works
Consider the sed command:
2 s/([^ ]* ){3}/&X/
The initial 2 instructs sed to apply this command only to the second line.
We are using the s or substitute command. This command has the form s/old/new/ where old and new are:
old is the regular expression ([^ ]* ){3}. This matches everything up to and including the third occurrence of space.
new is the replacement text, &X. The ampersand refers to what we matched in old, which is all the line up to and including the third space. The X is the new character that we are inserting.
This might work for you (GNU sed):
sed '2s/X/&Y/3' file
This inserts Y after the third occurence of X on the second line only.

Resources