awk / sed script to remove text

I am currently in need of a way to programmatically remove some text from Makefiles that I am dealing with. Now the problem is that (for whatever reason) the makefiles are being generated with link commands of -l<full_path_to_library>/<library_name> when they should be generated with -l<library_name>. So what I need is a script to find all occurrences of -l/ and then remove up to and including the next /.
Example of what I'm dealing with
-l/home/user/path/to/boost/lib/boost_filesystem
I need it to be
-lboost_filesystem
As you can imagine, this is a stopgap measure until I fix the real problem (on the generation side), but in the meantime it would be a great help to me if this could work. I am not too good with awk and sed.
Thanks for any help.

sed -i 's|-l[^ ]*/\([^/ ]*\)|-l\1|g' Makefile

Here you go
echo "-l/home/user/path/to/boost/lib/boost_filesystem" | awk -F"/" '{ print $1 $NF } '
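As a quick check of the sed answer above, here is a small sketch that runs the same substitution on a link line containing several -l flags (the awk one-liner only handles a single flag per line). The sample line is made up for illustration. Note that the pattern would also rewrite any other token that contains -l followed by a slash, so it is worth eyeballing the result before adding -i.

```shell
# Hypothetical link line, invented for this example.
line='g++ -o app main.o -L/opt/lib -l/home/user/path/to/boost/lib/boost_filesystem -lpthread -l/usr/lib/z'

# Same substitution as the sed answer, without -i so nothing is modified in place.
# [^ ]*/ greedily consumes the directory part; \([^/ ]*\) keeps the library name.
fixed=$(printf '%s\n' "$line" | sed 's|-l[^ ]*/\([^/ ]*\)|-l\1|g')

printf '%s\n' "$fixed"
```

Flags that already have the -l<library_name> form, such as -lpthread, contain no slash and are left untouched.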

Related

Awk pattern always matches last record?

I'm in the process of switching from zsh to bash, and I need to produce a bash script that can remove duplicate entries in $PATH without reordering the entries (thus no sort -d magic). zsh has some nice array handling shortcuts that made it easy to do this efficiently, but I'm not aware of such shortcuts in bash. I came across this answer which has gotten me 90% of the way there, but there is a small problem that I would like to understand better. It appears that when I run that awk command, the last record processed incorrectly matches the pattern.
$ awk 'BEGIN{RS=ORS=":"}!a[$0]++' <<<"aa:bb:cc:aa:bb:cc"
aa:bb:cc:cc
$ awk 'BEGIN{RS=ORS=":"}!a[$0]++' <<<"aa:bb:cc:aa:bb"
aa:bb:cc:bb
$ awk 'BEGIN{RS=ORS=":"}!a[$0]++' <<<"aa:bb:cc:aa:bb:cc:" # note trailing colon
aa:bb:cc:
I don't understand awk well enough to know why it behaves this way, but I have managed to work around the issue by using an intermediate array like so.
array=($(awk 'BEGIN{RS=":";ORS=" "}!a[$0]++' <<<"aa:bb:cc:aa:bb:cc:"))
# Use a subshell to avoid modifying $IFS in current context
echo $(export IFS=":"; echo "${array[*]}")
aa:bb:cc
This seems like a sub-optimal solution however, so my question is: did I do something wrong in the awk command that is causing false positive matches on the final record processed?
The last record in your original string is cc\n which is different from cc. When unsure what's happening in any program in any language, adding some print statements is step 1 to debugging/investigating:
$ awk 'BEGIN{RS=ORS=":"} {print "<"$0">"}' <<<"aa:bb:cc:aa:bb:cc"
<aa>:<bb>:<cc>:<aa>:<bb>:<cc
>:$
If you want the RS to be : or \n then just state that (with GNU awk at least):
$ awk 'BEGIN{RS="[:\n]"; ORS=":"} !a[$0]++' <<<"aa:bb:cc:aa:bb:cc"
aa:bb:cc:$
The $ in all of the above is my prompt.
Another possible workaround instead of your bash array solution
$ echo "aa:bb:cc:aa:bb:cc" | tr ':' '\n' | awk '!a[$0]++' | paste -sd:
aa:bb:cc
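Another way to sidestep the trailing newline is not to send one at all. The sketch below assumes a helper named dedupe_path (a made-up name): it feeds the string to awk with printf '%s', so the last record is cc rather than cc\n, and it writes the separators itself so that no trailing colon appears.

```shell
dedupe_path() {
    # printf '%s' emits no trailing newline, so every record is a bare entry.
    printf '%s' "$1" | awk '
        BEGIN { RS = ":" }
        # Print each entry the first time it is seen, in its original position.
        !seen[$0]++ { printf "%s%s", sep, $0; sep = ":" }
        END { print "" }'
}

deduped=$(dedupe_path "aa:bb:cc:aa:bb:cc")
printf '%s\n' "$deduped"
```

Something like PATH=$(dedupe_path "$PATH") would be the intended use, though it is worth testing on a copy of the value first.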

optimize sed command for better performance

I have got the following lines in a file:
1231231213123123123|W3A|{ (ABCDE)="8=3AF.R.Y2=133AA=9WW=334MNFN=20120925-22:23:59.998
1231231213123123123|4GM|{ (ABCDE)="8=3AF.R.Y2=123AA=9WW=4AF013DCV=EXAMPLE=ABC
1231231213123123123|KYC|{ (ABCDE)="8=3AF.R.Y2=112AA=9WW=0002DDS=20120921-14:55:21
In order to get the value between '|' characters I am using:
sed -e 's/\(^.*|\)\(.*\)\(|.*$\)/\2/'
And output is:
W3A
4GM
KYC
That is the expected output. But the file has thousands of records, and the sed command takes a long time. Is there any way to improve the performance of this command?
Seems to me like you just want to use cut:
cut -d '|' -f 2 file
Set the delimiter to | and print the second field.
Since you're only keeping the second parenthesized group, you're making sed do unnecessary work by remembering the other stuff. Try
sed -e 's/^[^|]*|\([^|]*\)|.*/\1/'
Tom Fenech's answer should be a lot faster though.
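A small sketch comparing the commands on one of the sample lines. It also shows one difference to keep in mind: because the original sed pattern starts with a greedy .*, it returns the text between the last two | characters, while cut and the anchored sed return the second field. The results agree only when a line has exactly two pipes, as in the sample data. The line a|b|c|d is made up to show the difference.

```shell
line='1231231213123123123|W3A|{ (ABCDE)="8=3AF.R.Y2=133AA=9WW=334MNFN=20120925-22:23:59.998'

# Second field with cut, and the equivalent in awk.
with_cut=$(printf '%s\n' "$line" | cut -d '|' -f 2)
with_awk=$(printf '%s\n' "$line" | awk -F'|' '{ print $2 }')

# Hypothetical line with a third pipe.
extra='a|b|c|d'
greedy=$(printf '%s\n' "$extra" | sed -e 's/\(^.*|\)\(.*\)\(|.*$\)/\2/')
anchored=$(printf '%s\n' "$extra" | sed -e 's/^[^|]*|\([^|]*\)|.*/\1/')

printf '%s %s %s %s\n' "$with_cut" "$with_awk" "$greedy" "$anchored"
```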

Removing specific strings from strings in a file

I want to remove specific fields in all strings in a semi-colon delimited file.
The file looks something like this :-
texta1;texta2;texta3;texta4;texta5;texta6;texta7
textb1;textb2;textb3;textb4;textb5;textb6;textb7
textc1;textc2;textc3;textc4;textc5;textc6;textc7
I would like to remove positions 2, 5 and 7 from all strings in the file.
Desired output :-
texta1;texta3;texta4;texta6
textb1;textb3;textb4;textb6
textc1;textc3;textc4;textc6
I am trying to write a small shell script using 'awk', but the code is not working as expected: the semicolons in between and at the end are not being removed.
(Note: I was able to do it with 'sed', but my file has several hundred thousand records and the sed code takes a lot of time.)
Could you please provide some help with this? Thanks in advance.
Most simply with cut:
cut -d \; -f 1,3-4,6,8- filename
or
cut -d \; -f 2,5,7 --complement filename
I think --complement is GNU-specific, though. The 8- in the first example is not actually necessary for a file with only seven columns; it would include all columns from the eighth forward if they existed. I included it because it doesn't hurt and provides a more general solution to the problem.
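I voted the answer by @Wintermute up, but if cut --complement is not available to you or you insist on using awk, then you can rebuild each line from the fields you want to keep. (Blanking the unwanted fields and collapsing ";;" afterwards would leave a trailing semicolon when the last field is removed, and would also merge fields that were empty to begin with.)
awk -v scols=2,5,7 'BEGIN { FS = OFS = ";"; n = split(scols, a, ","); for (i = 1; i <= n; i++) skip[a[i]] = 1 }
{ out = sep = ""
  for (i = 1; i <= NF; i++) if (!(i in skip)) { out = out sep $i; sep = OFS }
  print out }' tmp.txt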
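To check the cut approach against the sample data from the question, a minimal sketch is shown below. It uses the portable field list rather than --complement; the variable names are only for illustration.

```shell
input='texta1;texta2;texta3;texta4;texta5;texta6;texta7
textb1;textb2;textb3;textb4;textb5;textb6;textb7
textc1;textc2;textc3;textc4;textc5;textc6;textc7'

# Keep fields 1, 3, 4 and 6 (plus anything from field 8 on, if present).
kept=$(printf '%s\n' "$input" | cut -d ';' -f 1,3-4,6,8-)

printf '%s\n' "$kept"
```

Because cut only joins the fields it selects, no stray semicolon is left behind where fields 2, 5 and 7 used to be.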

renaming files using loop in unix

I have a situation here.
I have a lot of files like the ones below on Linux:
SIPTV_FIPTV_ID00$line_T20141003195717_C0000001000_FWD148_IPV_001.DATaac
SIPTV_FIPTV_ID00$line_T20141003195717_C0000001000_FWD148_IPV_001.DATaag
I want to remove the $line and put a counter in its place, running from 0001 to 6000 across my 6000 files.
I also want to remove the trailing 3 characters from each file name once this is done.
After the fix, the file names should look like this:
SIPTV_FIPTV_ID0000001_T20141003195717_C0000001000_FWD148_IPV_001.DAT
SIPTV_FIPTV_ID0000002_T20141003195717_C0000001000_FWD148_IPV_001.DAT
Please help.
With some assumptions, I think this should do it:
1. list of the files is in a file named input.txt, one file per line
2. the code is running in the directory the files are in
3. bash is available
awk '{i++;printf "mv \x27"$0"\x27 ";printf "\x27"substr($0,1,16);printf "%05d", i;print substr($0,22,47)"\x27"}' input.txt | bash
From the command prompt, give the following command
% printf '%s\n' *.DAT??? | awk '{
old=$0;
sub("\\$line",sprintf("%4.4d",++n));
sub("...$","");
print "mv \047" old "\047", $1}'
%
and check the output. If it looks OK, pipe it to the shell:
% printf '%s\n' *.DAT??? | awk '{
old=$0;
sub("\\$line",sprintf("%4.4d",++n));
sub("...$","");
print "mv \047" old "\047", $1}' | sh
%
A commentary: printf '%s\n' *.DAT??? gives awk the list of all the file names that you want to modify, one per line (echo would put them all on a single line, which awk would read as one record). You may want something more elaborate if the example names you gave aren't representative of the whole set.
In the awk script itself, I used sprintf to generate a string with the correct number of zeroes to replace $line. The format %4.4d matches the 0001 to 6000 counter you describe; the sample names in your question show five digits, which would need %5.5d instead. The idiom "\\$..." with two backslashes to quote the dollar sign is required by gawk and does no harm in mawk. The old name is wrapped in single quotes (\047) so that the shell does not expand $line when it runs the mv commands.
As a last remark, in cases like this I prefer to make at least a dry run before passing the commands to the shell.
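The same rename can also be done without generating commands for a second shell, using only parameter expansion. The sketch below works in a scratch directory created with mktemp so that it can be tried safely. It assumes a five-digit counter to match the expected names in the question; the variable names are made up for illustration.

```shell
workdir=$(mktemp -d)
# Two sample files; single quotes keep the literal $line in the names.
touch "$workdir"/'SIPTV_FIPTV_ID00$line_T20141003195717_C0000001000_FWD148_IPV_001.DATaac' \
      "$workdir"/'SIPTV_FIPTV_ID00$line_T20141003195717_C0000001000_FWD148_IPV_001.DATaag'

(
    cd "$workdir" || exit 1
    n=0
    for old in *.DAT???; do
        n=$((n + 1))
        prefix=${old%%\$line*}   # everything before the literal $line
        suffix=${old#*\$line}    # everything after it
        suffix=${suffix%???}     # drop the three trailing characters
        new=$(printf '%s%05d%s' "$prefix" "$n" "$suffix")
        mv -- "$old" "$new"      # use "echo mv" here first for a dry run
    done
)

ls "$workdir"
```

The counter follows the order in which the shell expands the glob, so files are numbered in sorted name order.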

Replacing a line in a csv file?

I have a set of 10 CSV files, which normally have entries of this kind
a,b,c,d
d,e,f,g
Now, due to some error, the entries in these files have become like this
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
Is there a command you would recommend that can replace the erroneous lines in all the files?
It depends on what you mean by replace. If you mean 'remove', then a trivial variant on @wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete mal-formed lines with any number of commas (including zero) - and no other characters on the line, then:
grep -v '^,*$' ...
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
Check out Mastering Regular Expressions.
sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by
mv new-file.csv old-file.csv
Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv
What about trying to keep only lines which are matching the desired format instead of handling one exception ?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, provide it, the regular expression should not be too hard to write.
Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
egrep -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).
Most simply:
$ grep -v ,,, oldfile > newfile
$ mv newfile oldfile
Yes, awk or grep are very good options if you are working on a Linux platform. On other platforms you can use Perl regular expressions, together with the join and split functions.
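Since the question is about a set of files, here is a sketch of the deletion applied in a loop, without relying on GNU-only features such as sed -i or \+. It works on made-up sample files in a scratch directory; the file names are hypothetical. The pattern ^,,*$ matches lines consisting of one or more commas and nothing else.

```shell
workdir=$(mktemp -d)
printf 'a,b,c,d\nd,e,f,g\n,,,\nh,i,j,k\n' > "$workdir/one.csv"
printf ',,,\na,b,c,d\n' > "$workdir/two.csv"

for f in "$workdir"/*.csv; do
    # Write the filtered copy first, then replace the original.
    grep -v '^,,*$' "$f" > "$f.tmp"
    mv "$f.tmp" "$f"
done

cat "$workdir/one.csv"
```

Note that grep returns a non-zero status when no line survives the filter, so a file containing only comma lines would end up empty; that case may need separate handling in a stricter script.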
