optimize sed command for better performance - linux

I have got the following lines in a file:
1231231213123123123|W3A|{ (ABCDE)="8=3AF.R.Y2=133AA=9WW=334MNFN=20120925-22:23:59.998
1231231213123123123|4GM|{ (ABCDE)="8=3AF.R.Y2=123AA=9WW=4AF013DCV=EXAMPLE=ABC
1231231213123123123|KYC|{ (ABCDE)="8=3AF.R.Y2=112AA=9WW=0002DDS=20120921-14:55:21
In order to get the value between '|' characters I am using:
sed -e 's/\(^.*|\)\(.*\)\(|.*$\)/\2/'
And output is:
W3A
4GM
KYC
Which is expected. But as the file has thousands of records, the sed command is taking a lot of time. Is there any way to improve the performance of this command?

Seems to me like you just want to use cut:
cut -d '|' -f 2 file
Set the delimiter to | and print the second field.

Since you're only keeping the 2nd parenthesized group, you're making sed do unnecessary work by capturing the other parts. Try
sed -e 's/^[^|]*|\([^|]*\)|.*/\1/'
Tom Fenech's answer should be a lot faster though.
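If you want to measure the difference on your own data, a rough sketch using time (the file name is a placeholder):
time sed -e 's/^[^|]*|\([^|]*\)|.*/\1/' file > /dev/null
time cut -d '|' -f 2 file > /dev/null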

Related

sed not working on a variable within a bash script; requesting a file. Simple example

If I declare a variable within a bash script and then try to operate on it with sed, I keep getting errors. I've tried with double quotes, backticks, and avoiding single quotes on my variable. Here is what I'm essentially doing.
Call my script with multiple parameters
./myScript.sh apples oranges ilike,apples,oranges,bananas
My objective is to use sed to replace each "," in $3 with " ", then use wc -w to count how many words are in $3.
MyScript.sh
fruits="$3"
checkFruits= sed -i 's/,/ /g' <<< "$fruits"
echo $checkFruits
And the result after running the script in the terminal:
ilike,apples,oranges,bananas
sed: no input files
P.S. After countless Google searches, reading suggestions, and playing with my code, I simply cannot get this easy sample of code to work, and I'm not really sure why. And I can't try to implement the wc -w until I move past this block.
You can do
fruits="$3"
checkFruits="${3//,/ }"
# or
echo "${3//,/ }"
The -i flag to sed requires a file argument; without it, the sed command does what you expect.
However, I'd consider using tr instead of sed for this simple replacement:
fruits="$3"
checkFruits="$(tr , ' ' <<< "$fruits")"
echo "$checkFruits"
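Since the stated goal is a word count, either form feeds straight into wc -w; a minimal sketch:
fruits="${3//,/ }"
wc -w <<< "$fruits"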
Looking at the larger picture, do you want to count comma-separated strings, or the number of words once you have changed commas into spaces? For instance, do you want the string "i like,apples,oranges,and bananas" to return a count of 4, or 6? (This question is moot if you are 100% sure you will never have spaces in your input data.)
If 6, then the other answers (including mine) will already work.
However, if you want the answer to be 4, then you might want to do something else, like:
fruits="$3"
checkFruits="$(tr , '\n' <<< "$fruits")"
itemCount="$(wc -l <<< "$checkFruits")"
Of course this can be condensed a little, but just throwing out the question as to what you're really doing. When asking a question here, it's good to post your expected results along with the input data and the code you've already used to try to solve the problem.
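Condensed into a single line (same logic, hypothetical variable name):
itemCount=$(tr , '\n' <<< "$3" | wc -l)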
The -i option is for in-place editing of the input file; you don't need it here.
To assign a command's output to a variable, use command substitution, like var=$(command).
fruits="$3"
checkFruits=$(sed 's/,/ /g' <<< "$fruits")
echo "$checkFruits"
You don't need sed at all.
IFS=, read -a things <<< "$3"
echo "${#things[#]}"

Replace multiple commas with a single one - linux command

This is an output from my google csv contacts (which contains more than 1000 contacts):
A-Tech Computers Hardware,A-Tech Computers,,Hardware,,,,,,,,,,,,,,,,,,,,Low,,,* My Contacts,,,,,,,,,Home,+38733236313,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
I need a Linux CLI command to replace the duplicate commas with single commas, so I get this:
A-Tech Computers Hardware,A-Tech Computers,Hardware,Low,* My Contacts,Home,+38733236313,
What I usually do in Notepad++ is replace ",," with "," six times.
I tried with:
cat googlecontacts.txt | sed -e 's/,,/,/g' -e 's/,,/,/g' -e 's/,,/,/g' -e 's/,,/,/g' -e 's/,,/,/g' -e 's/,,/,/g' > google.txt
But it doesn't work...
However, when I try it on smaller files (two lines) it works... :(
Help please!
Assuming your lines are still compliant after modification (not the concern of this question):
sed 's/,\{2,\}/,/g' googlecontacts.txt > google.txt
It replaces any run of two or more commas with a single comma, anywhere on the line.
Anything between single commas is considered a correct field, so it is not modified.
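For example (output shown on the second line):
echo 'a,,b,,,,c' | sed 's/,\{2,\}/,/g'
a,b,c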
In your command, you need to apply the change recursively rather than re-executing the same substitution a fixed number of times (a longer run of commas is always possible), like this:
sed ':a
# make the change
s/,,/,/g
# if a change occurred, retry by branching back to label :a
t a' googlecontacts.txt > google.txt
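With GNU sed the same loop fits on one line; here it collapses an arbitrarily long run of commas (output on the second line):
echo 'a,,,,,,b' | sed ':a; s/,,/,/g; t a'
a,b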
You need the squeeze option of tr:
tr -s ',' < yourFile
You can see it in action like this:
echo hello,,there,,,,I,have,,too,many,,,commas | tr -s ,
hello,there,I,have,too,many,commas
This might work for you (GNU sed):
sed 's/,,*/,/g' file
or
sed 's/,\+/,/g' file
Thanks @potong, your solution worked for one of my requirements. I had to replace the | symbol in the first line of my file and used this solution with a small change.
sed -i "1s/|'*//g" ${filename}
I was unable to add comments, so I thought of posting this as an answer. Please excuse me.

Removing specific strings from strings in a file

I want to remove specific fields in all strings in a semi-colon delimited file.
The file looks something like this :-
texta1;texta2;texta3;texta4;texta5;texta6;texta7
textb1;textb2;textb3;textb4;textb5;textb6;textb7
textc1;textc2;textc3;textc4;textc5;textc6;textc7
I would like to remove positions 2, 5 and 7 from all strings in the file.
Desired output :-
texta1;texta3;texta4;texta6
textb1;textb3;textb4;textb6
textc1;textc3;textc4;textc6
I am trying to write a small shell script using 'awk', but the code is not working as expected: the semicolons in between and at the end are not being removed.
(Note: I was able to do it with 'sed', but my file has several hundred thousand records and the sed code is taking a lot of time.)
Could you please provide some help on this ? Thanks in advance.
Most simply with cut:
cut -d \; -f 1,3-4,6,8- filename
or
cut -d \; -f 2,5,7 --complement filename
I think --complement is GNU-specific, though. The 8- in the first example is not actually necessary for a file with only seven columns; it would include all columns from the eighth forward if they existed. I included it because it doesn't hurt and provides a more general solution to the problem.
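Applied to the sample file from the question, either command produces the desired output:
cut -d \; -f 1,3-4,6,8- filename
texta1;texta3;texta4;texta6
textb1;textb3;textb4;textb6
textc1;textc3;textc4;textc6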
I upvoted the answer by @Wintermute, but if cut --complement is not available to you, or you insist on using awk, then you can do:
awk -v scols=2,5,7 'BEGIN{FS=OFS=";"; split(scols,acols,",")} {
for(i in acols) $acols[i]=""   # blank out the unwanted fields
gsub(";;",";"); sub(/;$/,"")   # squeeze leftover separators and drop the trailing one
print}' tmp.txt
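If the columns to drop are fixed anyway, a simpler sketch is to print only the fields you keep, which avoids the empty-field cleanup entirely:
awk -F';' -v OFS=';' '{print $1, $3, $4, $6}' file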

sed replace whole line by line number - chaining commands

I need to replace a line in a file and am struggling with it. Firstly, I need to find a string in the file, store its line number in a variable, and then replace the whole line, using the variable, with a new string.
I have tried a few variations of sed and currently the code I have is as follows:
line=$(grep -n "latitude" /tmp/system.cfg |cut -f1 -d:);
sed "{$line}s/.*/system.latitude=1.888888/" /tmp/system.cfg ;
I have run the first command and successfully set line. When echoing $line I get 176. However, the sed command does not seem to be replacing the line, regardless of whether I use the variable or manually place the 176, like so:
sed "176s/.*/system.latitude=1.888888/" /tmp/system.cfg ;
I have also tried the following, which seems to write the line to the file, but it adds a new line rather than overwriting the existing line:
sed -i $line'i'"system.latitude=1.76011" /tmp/system.cfg;
line=$(grep -n "latitude" /tmp/system.cfg |cut -f1 -d:);
I have also tried using single and double quotes, to no avail. Can someone point me in the direction of where I am going wrong?
If I understood you right, you can just skip the grep|cut; your requirement can be done in one shot with (GNU) sed:
sed -i '/latitude/s/.*/system.latitude=1.888888/' /tmp/system.cfg
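To preview the result before editing in place, drop the -i:
sed '/latitude/s/.*/system.latitude=1.888888/' /tmp/system.cfg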
Kent was absolutely correct when he mentioned cutting the grep out of the equation completely. For anyone interested, here is the final code using my method:
line=$(grep -n "latitude" /tmp/system.cfg |cut -f1 -d:);
sed -i "${line}s/.*/system.latitude=1.999/" /tmp/system.cfg;
But I will be going with Kent's answer for the final implementation.
This is one of those issues where you commit to a process and complicate matters further, as pointed out by Tom:
This is what is known as an XY problem. You have decided upon a process (find the line, store to a variable, etc.) and asked how to do that, rather than simply asking "how do I replace the whole line that matches a pattern". Well done to Kent for getting to the bottom of it. – Tom Fenech

Replacing a line in a csv file?

I have a set of 10 CSV files, which normally have entries of this kind:
a,b,c,d
d,e,f,g
Now, due to some error, entries in this file have become of this kind:
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
Is there any command you can recommend that replaces the erroneous lines in all the files?
It depends on what you mean by replace. If you mean 'remove', then a trivial variant on @wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete mal-formed lines with any number of commas (including zero) - and no other characters on the line, then:
grep -v '^,*$' ...
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
Check out Mastering Regular Expressions.
sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by
mv new-file.csv old-file.csv
Replace or remove? Your post is not clear... For replacement, see wnoise's answer. For removal, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv
What about trying to keep only the lines that match the desired format instead of handling one exception?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, provide it; the regular expression should not be too hard to write.
Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
egrep -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).
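For example, the delete command rewritten portably would look something like:
sed '/^,\{1,\}$/d' < oldfile.csv > newfile.csv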
Most simply:
$ grep -v ',,,' oldfile > newfile
$ mv newfile oldfile
Yes, awk or grep are very good options if you are working on a Linux platform. On other platforms, however, you can use Perl regexes, using join and split.
