How to sed a special character if it comes inside double quotes in a Linux file - linux

I have a txt file delimited by comma (,) with each column quoted by double quotes.
What I want to do is:
I need to keep the delimiter as comma, but I want to remove each comma that comes inside a pair of double quotes (as each column is surrounded by double quotes).
Sample of the input and the output file I want.
Input file:
"2022111812160156601777153","","","false","test1",**"here the , issue , that comma comma come inside the column"**
The output I want:
"2022111812160156601777153","","","false","test1",**"here the issue that comma comma come inside the column"**
What I tried:
sed -i ':a' -e 's/\("[^"]*\),\([^"]*"\)/\1~\2/;ta' test.txt
but the above sed command replaces every comma, not only the commas that come inside the column.
Is there a way to do it?

Using sed
$ sed -Ei.bak ':a;s/((^|,)(\*+)?"[^"]*),/\1/;ta' input_file
"2022111812160156601777153","","","false","test1",**"here the issue that comma comma come inside the column"**

Any time you find yourself using more than s, g, and p (with -n) in sed you'd be better off using awk for some combination of clarity, robustness, efficiency, portability, etc.
Using any awk in any shell on every Unix box:
$ awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1' file
"2022111812160156601777153","","","false","test1",**"here the issue that comma comma come inside the column"**
Just like GNU sed has -i as in your question to update the input file with the command's output, GNU awk has -i inplace, or just add > tmp && mv tmp file with any awk or any other Unix command.
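For concreteness, here is the same command in both update styles (a sketch; the first form assumes GNU awk 4.1 or later for -i inplace):
gawk -i inplace 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1' file
awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1' file > tmp && mv tmp file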

This might work for you (GNU sed):
sed -E ':a;s/^(("[^",]*"\**,?\**)*"[^",]*),/\1/;ta' file
This iterates through each line removing any commas within paired double quoted fields.
N.B. The solution above also caters for a double-quoted field prefixed/suffixed by zero or more *'s. If this should not be catered for, here is an ameliorated solution:
sed -E ':a;s/^(("[^",]*",?)*"[^",]*),/\1/;ta' file
N.B. Escaped double quotes and commas would need a more involved regexp.

Related

Remove double quotes within the column value using Unix

I am working on processing a (90-column) CSV file - semicolon separated (;) {case can be ignored, and I am aware the file standard is a mess, but I am helpless in that regard}
Input Rows :
"AAAAA";"ABABDBDA";"ASDASDA"asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd"asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"
Output Expected :
"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"
(The double quote can be replaced by a space or blank.) {Kindly note - even though this is a ';' separated file, some rows have ';' within the quoted data for a column.}
Issue: In the rows, I am getting an extra double quote within the quoted data.
Please advise me on how to handle this in Unix.
One trick you can use is to remove any " that is not at a field boundary. A simple sed script can be
$ sed -E 's/([^;])"([^;])/\1 \2/g' file
Note that if you allow escaped quote marks in your fields, this is going to remove them as well.
Note the example below from the comments, which is not covered by one round of the sed. Because matches cannot overlap, a single character can't serve as the right context of one match and the left context of the next, so "a"b"c"; won't be handled correctly in a single pass.
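One possible workaround, borrowing the :a/ta loop used elsewhere on this page, is to replace the in-field quotes one at a time until none are left (a sketch only, not checked against every edge case):
sed -E ':a;s/([^;])"([^;])/\1 \2/;ta' file
Because each pass replaces only the leftmost in-field quote with a space and then retries, overlapping cases such as "a"b"c"; are eventually cleaned up as well.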
What would you think of the following solution:
Replace all ";" by ;
Remove all remaining "
Replace all ; back into ";"
Add additional " characters, at the beginning and at the end of every line.
The whole thing can be done with tr or sed or whatever command you prefer.
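A sketch of those four steps in GNU sed, using \x01 as the temporary stand-in for the delimiter (an assumption; any byte that cannot occur in the data will do) so that semicolons inside fields survive the round trip:
sed -e 's/";"/\x01/g' -e 's/"//g' -e 's/\x01/";"/g' -e 's/^/"/;s/$/"/' file
Note this removes the stray in-field quotes rather than replacing them with a space, which the question says is acceptable.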
mawk 'NF*(gsub(__," ",$!(NF=NF))^_ +gsub(OFS,FS) +gsub("^ | $",__))' \
__='\42' FS='\442\73\42' OFS='\31\17'
"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"
This transform is easy to do using a tool which provides regular expressions with zero-length assertions (lookbehind and lookahead). As you applied the unix tag, there is a good chance you have the perl command available, so I propose the following solution. Let file.txt content be
"AAAAA";"ABABDBDA";"ASDASDA"asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd"asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"
then
perl -p -e 's/(?<=[[:alnum:]])"(?=[[:alnum:]])/ /g' file.txt
gives output
"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"
Explanation: I inform perl that I want to use it sed-style via -p -e, then I provide a substitution (s): a " which comes after an alphanumeric character (letter or digit) and before an alphanumeric character should be replaced with a space character. This is applied to all such ", that is, globally (g).
Note: you might elect to port this answer to any other tool which provides the ability to replace using regular expressions with zero-length assertions.
(tested in perl 5, version 26, subversion 3)
When you consider the combination ";" as a delimiter, you can use
awk -F '";"' '{
printf "\"";
for (i=1;i<NF;i++) {
gsub("\"","", $i);
printf("%s\";\"",$i)
};
print $NF
}' inputfile
This might work for you (GNU sed):
sed -E ':a;s/^(("[^"]*";)*"[^"]*)"([^;])/\1 \3/;ta' file
Iterating from the start of the line, this matches zero or more correctly double-quoted fields followed by an incorrect double quote, and replaces that double quote with a space.

How to extract and replace columns with a multi-character delimiter?

I have a file with ^$ as the delimiter; the text is like:
tony^$36^$developer^$20210310^$CA
I want to replace the datetime.
I tried awk -F '\^\$' '{print $4}' file.txt | sed -i '/20210310/20221210/', but it returns nothing. Then I tried just the awk part; it also returns nothing. I guess it still treats the line as a whole and the delimiter doesn't work. Wondering why, and how to solve it?
A simple solution would be:
sed 's/\^\$/\n/g; s/20210310/20221210/g' -i file.txt
which will modify the file, separating each section onto a new line.
If you need a different delimiter, change the \n in the command to maybe a space or a , .. up to you.
And it will also replace the date in the file.
If you just want to see the changes, without really modifying the file, remove the -i from the command.
When I run your awk command, I get these warnings:
awk: warning: escape sequence `\^' treated as plain `^'
awk: warning: escape sequence `\$' treated as plain `$'
That explains why your output is blank: the field delimiter is interpreted as the regular expression '^$', which matches a completely blank line (only). As a result, each non-blank line of input is without any field separators, and therefore has only a single field. $4 can be non-empty only if there are at least four fields.
You can fix that by escaping the backslashes:
awk -F '\\^\\$' '{print $4}' file.txt
If all you want to do is print the modified datecodes by themselves, then that should get you going. However, the question ...
How to extract and replace columns with a multi-character delimiter?
... sounds like you may actually want to replace the datecode within each line, keeping the rest intact. In that case, it is a non-starter for the awk command to discard the other parts of the line. You have several options here, but two of the more likely would be
instead of sending field 4 out to sed for substitution, do the sub in the awk script, and then reconstitute the input line by printing all fields, with the expected delimiters (this is left mostly as an exercise; a sketch appears below, after the sed explanation), OR
do the whole thing in sed:
sed -E 's/^((([^^]|\^[^$])*\^\$){3})20210310(\^\$.*)/\120221210\4/' file.txt
If you wanted to modify file.txt in-place then you could add the -i flag (which, on the other hand, is not useful in your original command, where sed's input is coming from a pipe rather than a file).
The -E option engages the POSIX extended regex dialect, which allows the given regex to be more readable (the alternative would require a bunch more \ characters).
Overall, presuming that there are five or more fields delimited by literal '^$' strings and the fourth contains exactly "20210310", the command matches the first three fields, including their trailing delimiters, and captures them all as group 1; matches the leading delimiter of the fifth field plus all the remainder of the line and captures it as group 4; and replaces the whole line with group 1 followed by the new datecode followed by group 4.
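As for the awk option mentioned above, a minimal sketch might look like this (the field splitting follows the earlier -F '\\^\\$' fix; the temp-file write-back is just one way to update the file):
awk 'BEGIN { FS = "\\^\\$"; OFS = "^$" }     # split on literal ^$, rejoin with the same literal text
     $4 == "20210310" { $4 = "20221210" }    # substitute the datecode in the fourth field only
     1' file.txt > tmp && mv tmp file.txt    # print every line; write back via a temp file
Assigning to $4 makes awk rebuild that line with OFS, while lines that don't match pass through exactly as they were.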

SED: Insert a string with special character

I want to INSERT a string containing "'" as a special character into multiple files. All the files in which I want to insert have a line after which I want to perform my INSERT.
eg:
File Before INSERT:
...
FROM LOCAL :LOAD_FILE
REJECTED DATA :REJECT_PATH
...
File After INSERT:
...
FROM LOCAL :LOAD_FILE
DELIMITER AS '|'
REJECTED DATA :REJECT_PATH
...
I've tried writing down many SED commands but they are generating errors. One of them is:
sed 'LOAD_FILE/a/ DELIMITER AS \'\|\'/g' SOURCE > DESTINATION
awk -v line='DELIMITER AS '"'|'"'' '1; /LOAD_FILE/{print line }' input
FROM LOCAL :LOAD_FILE
DELIMITER AS '|'
REJECTED DATA :REJECT_PATH
Using surrounding double quotes:
sed "/FROM LOCAL :LOAD_FILE/s//&\nDELIMITER AS '|'/" file
or single quotes (safer to avoid unwanted variable expansion):
sed '/FROM LOCAL :LOAD_FILE/s//&\nDELIMITER AS '"'|'"'/' file
This might work for you (GNU sed):
sed '/LOAD_FILE/aDELIMITER AS '\'\|\' file
This appends the line DELIMITER AS '|' following the match on LOAD_FILE.
N.B. The sed script is passed to the shell in two parts: the quoted /LOAD_FILE/aDELIMITER AS is concatenated with the shell-escaped \'\|\'.
If you prefer:
sed 's/LOAD_FILE/&\nDELIMITER AS '\'\|\''/' file
Another way of putting it :
sed -e ':a;N;$!ba;s/LOAD_FILE\n/LOAD_FILE\nDELIMITER AS \x27|\x27\n/g'
About the syntax I used, see:
How can I replace a newline (\n) using sed?

replace old-link-url to new-link-url with sed

I'm writing a script in bash that would replace old-link-url with new-link-url.
My problem is that sed can't replace the URL because of the slashes. If I put in just some text, it works.
My code:
sed -e s/"$old_link"/"$new_link"/g wget2.html > playlist.txt
sed supports any character as separator, so if the pattern you are trying to replace contains /, use a different separator. Most commonly used are # and |
sed 's|foo|bar|g' input
sed 's#foo#bar#g' input
Don't forget to put double quotes around the expression if you are using variables in the sed substitution. Also, if your variables contain / then use a different delimiter for sed. You can use _, %, |, # and many more.
So maybe something like this would work -
sed -e "s_${old_link}_${new_link}_g" wget2.html > playlist.txt

Replace whitespace with a comma in a text file in Linux

I need to edit a few text files (an output from sar) and convert them into CSV files.
I need to change every whitespace (maybe it's a tab between the numbers in the output) using sed or awk functions (an easy shell script in Linux).
Can anyone help me? Every command I used didn't change the file at all; I tried gsub.
tr ' ' ',' <input >output
This substitutes each space with a comma. If you need to, you can make a pass with the -s flag (squeeze repeats), which replaces each input sequence of a repeated character listed in SET1 (the blank space) with a single occurrence of that character.
Using squeeze repeats to collapse runs of tabs before substituting them:
tr -s '\t' <input | tr '\t' ',' >output
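If tabs and spaces should both collapse into a single comma, one pass can also do it (a sketch; with two sets, -s squeezes repeats of the characters in the last set, here the comma):
tr -s ' \t' ',' <input >output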
Try something like:
sed -E 's/[[:space:]]+/,/g' orig.txt > modified.txt
The character class [:space:] will match all whitespace (spaces, tabs, etc.). If you just want to replace a single character, eg. just space, use that only.
EDIT: Actually [:space:] includes carriage return, so this may not do what you want. The following will replace tabs and spaces.
sed -E 's/[[:blank:]]+/,/g' orig.txt > modified.txt
as will
sed -E 's/[\t ]+/,/g' orig.txt > modified.txt
In all of this, you need to be careful that the items in your file that are separated by whitespace don't contain their own whitespace that you want to keep, eg. two words.
Without looking at your input file, this is only a guess:
awk '{$1=$1}1' OFS=","
Redirect to another file and rename as needed.
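A usage sketch with hypothetical file names; the $1=$1 assignment forces awk to rebuild each line with the new OFS:
awk '{$1=$1}1' OFS="," sar_output.txt > sar_output.csv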
What about something like this :
cat texte.txt | sed -e 's/\s/,/g' > texte-new.txt
(Yes, with some useless catting and piping; I could also use < to read from the file directly, I suppose -- I used cat first to output the content of the file, and only after that added sed to my command line.)
EDIT: as #ghostdog74 pointed out in a comment, there's definitely no need for the cat/pipe; you can give the name of the file to sed:
sed -e 's/\s/,/g' texte.txt > texte-new.txt
If "texte.txt" is this way :
$ cat texte.txt
this is a text
in which I want to replace
spaces by commas
You'll get a "texte-new.txt" that'll look like this :
$ cat texte-new.txt
this,is,a,text
in,which,I,want,to,replace
spaces,by,commas
I wouldn't just replace the old file with the new one (this could be done with sed -i, if I remember correctly; and as #ghostdog74 said, this one can create the backup on the fly): keeping the original might be wise, as a security measure (even if it means having to rename it to something like "texte-backup.txt").
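For reference, the in-place form with the on-the-fly backup mentioned above would look like this (GNU sed; the .bak suffix is just an example):
sed -i.bak -e 's/\s/,/g' texte.txt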
This command should work:
sed "s/\s/,/g" < infile.txt > outfile.txt
Note that you have to redirect the output to a new file. The input file is not changed in place.
sed can do this:
sed 's/[\t ]/,/g' input.file
That will send the output to the console, while
sed -i 's/[\t ]/,/g' input.file
will edit the file in-place
Here's a Perl script which will edit the files in-place:
perl -i.bak -lpe 's/\s+/,/g' files*
Consecutive whitespace is converted to a single comma.
Each input file is moved to .bak
These command-line options are used:
-i.bak edit in-place and make .bak copies
-p loop around every line of the input file, automatically print the line
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
If you want to replace an arbitrary sequence of blank characters (tab, space) with one comma, use the following:
sed -E 's/[\t ]+/,/g' input_file > output_file
or
sed -r 's/[[:blank:]]+/,/g' input_file > output_file
If some of your input lines include leading space characters which are redundant and don't need to be converted to commas, then first you need to get rid of them, and then convert the remaining blank characters to commas. For such a case, use the following:
sed -E 's/^ +//' input_file | sed -E 's/[\t ]+/,/g' > output_file
This worked for me.
sed -e 's/\s\+/,/g' input.txt >> output.csv
