Updating file with sed makes characters appear at line end - linux

I have a properties file and need to update a few fields:
dirParam.dprop_release_version=4.1.1
dirParam.dprop_writeToFile=true
So I ran the command below:
$ sed -i -e "/dirParam\.dprop_release_version=/ s/=.*/=4\.10\.10/" -e "/dirParam\.dprop_writeToFile=/ s/=.*/=false/" file.properties
After the update, this is what I found. The values are updated...
^M
^M
dirParam.dprop_release_version=4.10.10
dirParam.dprop_writeToFile=false^M
^M
But I can see a "^M" character at the end of some lines. What is this, and where am I going wrong?
Note: Red Hat Linux

^M is 0x0d, CARRIAGE RETURN. ([1]) You are looking at Windows-style line ends in your original file.
Unix machines use only 0x0a, LINE FEED, while Windows uses 0x0d 0x0a.
When you first looked at the file, there were only Windows line ends in there, and the program you used to look at the file filtered the carriage returns out for easier viewing.
But this here...
"/dirParam\.dprop_release_version=/ s/=.*/=4\.10\.10/"
...removed everything from the = onward, and that includes the carriage return (as your sed considers that to be just another byte before the line feed). You end up with a file having mixed line ends (most with carriage return, one without), so the program you use to look at the file displays the carriage returns where present (note the absence in the version line):
^M
dirParam.dprop_release_version=4.10.10
dirParam.dprop_writeToFile=false^M
The easiest solution is to apply dos2unix to the file, as @Cyrus suggested. If that is not available, sed 's/\r$//' (with GNU sed) will do the same.
[1]: Why is 0x0d displayed as ^M? For the same reason 0x00 is displayed as ^@... M is 0x4d, @ is 0x40. (The non-displayable character plus 0x40, "escaped" by a ^.)
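To make the explanation above concrete, here is a minimal sketch, assuming GNU sed (for \r and -i). It builds a two-line CRLF sample in a temporary file (the file name is made up for the example), counts the lines that end in a carriage return, and then runs the substitutions from the question with a CR-stripping expression placed first, so no mixed line endings are left behind.

```shell
# Build a small sample with Windows (CRLF) line endings; the file name is hypothetical.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.properties"
printf 'dirParam.dprop_release_version=4.1.1\r\ndirParam.dprop_writeToFile=true\r\n' > "$sample"

# A literal carriage return, used to build a portable grep pattern.
cr=$(printf '\r')

# Count lines that end in a carriage return before any edit.
before=$(grep -c "${cr}\$" "$sample" || true)

# Strip the trailing CR first, then substitute, all in one sed call (GNU sed).
sed -i -e 's/\r$//' \
    -e '/dirParam\.dprop_release_version=/ s/=.*/=4.10.10/' \
    -e '/dirParam\.dprop_writeToFile=/ s/=.*/=false/' "$sample"

# Count again; grep exits non-zero when nothing matches, hence "|| true".
after=$(grep -c "${cr}\$" "$sample" || true)

echo "lines ending in CR: before=$before after=$after"
cat "$sample"
```

Putting 's/\r$//' first means every later expression sees lines without the carriage return, so it does not matter whether a given line is touched by a substitution or not.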

Related

Why ^M character is getting appended at end of each line in linux?

I am doing find-and-replace operations using sed on Linux. I have an XML file named config.xml with the following content:
<CATALOG>
<SERVER>
<URL value="http://ip-172-44-0-92.compute.internal:440/" />
</SERVER>
</CATALOG>
I want to find the line in config.xml that contains <URL value= and replace the entire line with <URL value="http://ip-181-40-10-72.compute.internal:440/" />
I tried the following command:
sed -i '/<URL value=/c\<URL value=\"http://ip-181-40-10-72.compute.internal:440/\" />' config.xml
The command runs correctly and performs the replacement, but when I open the file with vi config.xml I see a ^M character at the end of every line that was not replaced. Why did this happen, and how do I fix it?
EDIT:
Based on @Atalajaka's answer...
My original file has CRLF line endings on every line, and sed writes the replacement line with an LF ending. As a result, all the unreplaced lines still have CRLF endings, and vi now shows ^M at the end of those lines.
So the solution is to replace the CRLF endings with LF endings by running:
sed -i $'s/\r$//' config.xml
The issue may lie in the original XML file, if it has CRLF endings on every line. In that case, vim recognizes the format and hides the carriage returns, so you never notice them. Any new lines added with vim then get the same CRLF line endings.
The sed command writes LF line endings for any line it adds, so after the edit vim sees two different line endings and treats LF as the regular one. Every remaining CR is then displayed as ^M.
If you have access to the original XML file from before the edit, you can open it in vim and check whether [dos] appears in the status line at the bottom, right next to the file name, something like:
$ vim original_xml.xml
...
"original_xml" [dos] ...
Source.
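If vim is not at hand, the same check can be done from the shell. The following is only a sketch: count_endings is a hypothetical helper name (not a standard tool), the sample file is rebuilt from the question's XML in a temporary directory, and GNU sed is assumed for the one-line c\ form, \r, and -i.

```shell
# count_endings FILE: report how many lines end in CRLF and how many in bare LF.
# (Hypothetical helper name; awk does the counting.)
count_endings() {
    awk '{ if (sub(/\r$/, "")) crlf++; else lf++ }
         END { printf "crlf=%d lf=%d\n", crlf, lf }' "$1"
}

# Recreate the question's file with CRLF endings on every line.
tmpdir=$(mktemp -d)
cfg="$tmpdir/config.xml"
printf '<CATALOG>\r\n<SERVER>\r\n<URL value="http://ip-172-44-0-92.compute.internal:440/" />\r\n</SERVER>\r\n</CATALOG>\r\n' > "$cfg"

# Replace the URL line; sed writes the new line with a bare LF.
sed -i '/<URL value=/c\<URL value="http://ip-181-40-10-72.compute.internal:440/" />' "$cfg"
mixed=$(count_endings "$cfg")

# Normalize every line to LF.
sed -i 's/\r$//' "$cfg"
clean=$(count_endings "$cfg")

echo "after replace: $mixed"
echo "after cleanup: $clean"
```

A non-zero count in both columns is the mixed state that makes vi show ^M on the CRLF lines.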

What is the difference between ^M$ and just $ in the end of the line in Linux

In a text file I have some lines that end with ^M$ and some lines that end with just $. If I understand correctly, ^M means new line and $ just marks the end of the line. So what do ^M and $ together (^M$) mean?
This is a line-ending issue.
On Windows, the convention for plain text is to end each line with the characters \r\n.
On Unix, the convention is to end each line with a single \n character.
The ^M you see is just a way to display \r characters, which have no particular meaning on such a system.
The $ you see presumably corresponds to the \n character.
You can use the dos2unix command to convert Windows line endings to the Unix format.

Understanding sed

I am trying to understand how
sed 's/\^\[/\o33/g;s/\[1G\[/\[27G\[/' /var/log/boot
worked and what the pieces mean. The man page I read just confused me more, and I tried info sed but had no idea how to work it! I'm pretty new to Linux. Debian is my first distro but seemed like a rather logical place to start as it is a root of many others and has been around a while so probably is doing stuff well and fairly standardized. I am running Wheezy 64 bit as fyi if needed.
The sed command is a stream editor, reading its file (or STDIN) for input, applying commands to the input, and presenting the results (if any) to the output (STDOUT).
The general syntax for sed is
sed [OPTIONS] COMMAND FILE
In the shell command you gave:
sed 's/\^\[/\o33/g;s/\[1G\[/\[27G\[/' /var/log/boot
the sed command is s/\^\[/\o33/g;s/\[1G\[/\[27G\[/ and /var/log/boot is the file.
The given sed command is actually two separate commands, separated by the semicolon:
1. s/\^\[/\o33/g
2. s/\[1G\[/\[27G\[/
The intent of #1, the s (substitute) command, is to replace all occurrences of the two-character text '^[' with the character whose octal value is 033 (the ESC character). The \o33 in the replacement is GNU sed's \oNNN escape, which produces the character with octal value NNN. It is a GNU extension, so other sed implementations may not support it, but with GNU sed the first command works as written:
s/\^\[/\o33/g
Notice the trailing g after the replacement string? It requests a global replacement; without it, only the first occurrence on each line would be changed.
The purpose of #2 is to replace the string [1G[ with [27G[ (the backslashes in the command escape the square brackets). As written it has no trailing g, so only the first occurrence on each line is replaced; to replace every occurrence, the second command needs to be written like this:
s/\[1G\[/\[27G\[/g
Finally, putting all this together, the two sed commands are applied to the contents of the /var/log/boot file, and the output has every occurrence of the text ^[ converted into the ESC character (octal 033) and the strings [1G[ converted to [27G[. The file itself is not modified, because sed writes its result to STDOUT.
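The walkthrough above can be tried on a single made-up line instead of the real log. This is only a sketch: the sample text is invented for illustration, and GNU sed is assumed for the \o33 escape.

```shell
# One invented sample line: the two visible characters "^[" followed by "[1G[".
line='^[[1G[ ok ] Starting something'

# The two substitutions from the question, unchanged (GNU sed assumed for \o33).
fixed=$(printf '%s\n' "$line" | sed 's/\^\[/\o33/g;s/\[1G\[/\[27G\[/')

# od -c makes the ESC byte visible as 033 at the start of the line.
printf '%s\n' "$fixed" | od -c

# The g flag only matters when a pattern occurs more than once on a line.
first_only=$(printf 'a-a-a\n' | sed 's/a/b/')
every_one=$(printf 'a-a-a\n' | sed 's/a/b/g')
echo "without g: $first_only, with g: $every_one"
```

Piping through od -c rather than printing the result directly avoids sending a raw escape sequence to the terminal.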

Linux replace ^M$ with $ in csv

I have received a CSV file from an FTP server, which I am ingesting into a table.
While ingesting the file I get the error "File was a truncated file".
The actual reason is that lines in the file end with either $ or ^M$.
e.g.:
ACT_RUN_TM, PROG_RUN_TM, US_HE_DT*^M$*
"CONFIRMED","","3600"$
How can I remove these $ and ^M$ from the end of the lines using a Linux command?
The ultimately correct solution is to transfer the file from the FTP server in text mode rather than binary mode, which does the appropriate end-of-line conversion for you. Change your download scripts or FTP application configuration to enable text transfers to fix this in future.
Assuming this is a one-shot transfer and you have already downloaded the file and just want to fix it, you can use tr(1) to translate characters. To remove all control-M characters from a file, pipe it through tr -d '\r'. If you want to replace them with control-J instead (for example, if the file came from a pre-OS X Mac system), use tr '\r' '\n'.
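As a rough sketch of the two tr invocations mentioned above: strip_cr and cr_to_lf are hypothetical wrapper names, and the file names and CSV rows are invented for the example.

```shell
# strip_cr: delete every carriage return from standard input.
# Note that tr -d '\r' removes CRs anywhere on a line, not only at the end.
strip_cr() {
    tr -d '\r'
}

# cr_to_lf: turn CR-only line endings (old Mac style) into LF.
cr_to_lf() {
    tr '\r' '\n'
}

# Invented sample with CRLF endings, written to a temporary directory.
tmpdir=$(mktemp -d)
printf 'ACT_RUN_TM,PROG_RUN_TM\r\n"CONFIRMED","3600"\r\n' > "$tmpdir/bad.csv"

# tr reads stdin and writes stdout, so redirect into a new file.
strip_cr < "$tmpdir/bad.csv" > "$tmpdir/good.csv"

# od -c shows that only \n remains at the end of each line.
od -c "$tmpdir/good.csv"
```

Because tr cannot edit a file in place, the result goes to a second file; rename it over the original once it looks right.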
It's odd to see ^M as not-the-last character, but:
sed -e 's/^M*\$$//g' <badfile >goodfile
Or use "sed -i" to update in place.
(Note that "^M" is entered on the command line by pressing CTRL-V CTRL-M.)
Update: It's been established that the premise of the question is wrong: the "^M$" are not literally in the file but are how vi displays it. The asker actually wants to change CRLF pairs to just LF.
sed -e 's/^M$//g' <badfile >goodfile

multiple end of file $'s in a single file

I copy-pasted some enum values from my IntelliJ IDE on Windows into Notepad, saved the file on a shared drive, then opened it on a Linux box. When I ran cat -A on the file it showed something like:
A,B,C,^M$
D,E,F,^M$
G,H,I,^M$
After searching around I figured that ^M is the carriage return and $ means the last line of the file. I'm just puzzled as to how this file is able to have multiple $'s.
From man cat on my GNU box:
-A, --show-all
equivalent to -vET
(snip)
-E, --show-ends
display $ at end of each line
Thus, there are multiple $s because there are multiple lines, each with an end.
$ is the end of line marker with cat -A, not end of file.
This is indicating the file has Windows-style line endings (carriage return followed by line feed) and not Unix-style (only line feed).
(You can convert text files from one format to the other using the programs dos2unix or unix2dos.)
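A quick way to see the difference for yourself, as a small sketch assuming GNU cat (for -A); show_endings is just a hypothetical wrapper name:

```shell
# show_endings: make line endings visible on standard input (GNU cat assumed for -A).
show_endings() {
    cat -A
}

# Windows-style lines: each one is shown ending in ^M$.
printf 'A,B,C,\r\nD,E,F,\r\n' | show_endings

# Unix-style lines: each one is shown ending in just $.
printf 'A,B,C,\nD,E,F,\n' | show_endings
```

Every line gets its own $, and the ^M only appears when a carriage return precedes the line feed.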

Resources