Linux replace ^M$ with $ in csv - linux

I have received a CSV file from an FTP server, which I am ingesting into a table.
While ingesting the file I receive the error "File was a truncated file".
The actual reason is that the data in the file contains $ and ^M$ at the end of the lines.
e.g.:
ACT_RUN_TM, PROG_RUN_TM, US_HE_DT^M$
"CONFIRMED","","3600"$
How can I remove these $ and ^M$ from the end of the line using a Linux command?

The ultimately correct solution is to transfer the file from the FTP server in text mode rather than binary mode, which does the appropriate end-of-line conversion for you. Change your download scripts or FTP application configuration to enable text transfers to fix this in future.
Assuming this is a one-shot transfer and you have already downloaded the file and just want to fix it, you can use tr(1) to translate characters. To remove all control-M characters from a file, pipe it through tr -d '\r'. If you want to replace them with control-J instead (for example, because the file came from a pre-OSX Mac system, which used a bare CR as the line terminator), use tr '\r' '\n'.
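A minimal sketch of the tr approach, assuming a small hypothetical sample file named badfile.csv that stands in for the downloaded CSV (the file names are made up for illustration):

```shell
# Create a hypothetical CRLF-terminated sample (stands in for the FTP download)
printf 'ACT_RUN_TM,PROG_RUN_TM\r\n"CONFIRMED","3600"\r\n' > badfile.csv

# Delete every carriage return (control-M, \r); line feeds are left alone
tr -d '\r' < badfile.csv > goodfile.csv

# cat -e marks each line end with $ and would show any remaining \r as ^M
cat -e goodfile.csv
```

This should print ACT_RUN_TM,PROG_RUN_TM$ and "CONFIRMED","3600"$, with no ^M before the $.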

It's odd to see ^M as not-the-last character, but:
sed -e 's/^M*\$$//g' <badfile >goodfile
Or use "sed -i" to update in place.
(Note that "^M" is entered on the command line by pressing CTRL-V CTRL-M.)
Update: It has been established that the question was mistaken: the "^M$" markers are not literally in the file but are how the line endings were displayed in vi. He actually wants to change CRLF pairs to just LF.
sed -e 's/^M$//g' <badfile >goodfile
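If typing CTRL-V CTRL-M is awkward (for example inside a script), one portable option is to let printf produce the control-M character. A minimal sketch, assuming a hypothetical sample file named badfile; GNU sed also accepts 's/\r$//' directly, but other seds may not:

```shell
# Hypothetical CRLF sample file
printf 'a,b\r\nc,d\r\n' > badfile

# $(printf '\r') yields a literal carriage return, the scripted equivalent of
# typing CTRL-V CTRL-M; the $ anchor limits the deletion to a CR at end of line
cr=$(printf '\r')
sed -e "s/${cr}\$//" < badfile > goodfile
```

Because the pattern is anchored at end of line, a stray CR in the middle of a line is left untouched, unlike tr -d '\r'.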

Related

syntax error near unexpected token ' - bash

I have written a sample script on my Mac:
#!/bin/bash
test() {
echo "Example"
}
test
exit 0
This works fine and displays Example.
When I run this script on a Red Hat machine, it says:
syntax error near unexpected token '
I checked that bash is available using
cat /etc/shells
and which bash shows /bin/bash.
Has anyone come across the same issue?
Thanks in advance!
It could be a file encoding issue.
I have encountered file type encoding issues when working on files between different operating systems and editors - in my case particularly between Linux and Windows systems.
I suggest checking your file's encoding to make sure it is suitable for the target Linux environment. I guess an encoding issue is less likely given that you are using a Mac than if you had used a Windows text editor; however, I think file encoding is still worth considering.
--- EDIT (Add an actual solution as recommended by @Potatoswatter)
To demonstrate how file type encoding could be this issue, I copy/pasted your example script into Notepad in Windows (I don't have access to a Mac), then copied it to a linux machine and ran it:
jdt@cookielin01:~/windows> sh ./originalfile
./originalfile: line 2: syntax error near unexpected token `$'{\r''
'/originalfile: line 2: `test() {
In this case, Notepad saved the file with carriage returns and linefeeds, causing the error shown above. The \r indicates a carriage return (Linux systems terminate lines with linefeeds \n only).
On the linux machine, you could test this theory by running the following to strip carriage returns from the file, if they are present:
cat originalfile | tr -d "\r" > newfile
Then try running the new file with sh ./newfile. If this works, the issue was hidden carriage-return characters.
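To reproduce the failure and the fix end to end, here is a sketch that writes the question's script with CRLF line endings (originalfile and newfile are the same hypothetical names used above):

```shell
# Hypothetical reproduction: the question's script saved with CRLF line endings
printf '#!/bin/bash\r\ntest() {\r\necho "Example"\r\n}\r\ntest\r\nexit 0\r\n' > originalfile

# Expected to fail: bash sees the word "{\r" instead of the "{" keyword
bash ./originalfile || echo "originalfile failed with status $?"

# Strip the carriage returns and run the cleaned copy
tr -d '\r' < originalfile > newfile
bash ./newfile    # prints: Example
```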
Note: This is not an exact replication of your environment (I don't have access to a Mac), however it seems likely to me that the issue is that an editor, somewhere, saved carriage returns into the file.
--- /EDIT
To elaborate a little, operating systems and editors can have different file encoding defaults. Typically, the application or editor determines the encoding used; for instance, I think Microsoft Notepad and Notepad++ default to Windows-1252. There may be newline differences to consider too: in Windows environments, a carriage return and linefeed pair is often used to terminate lines in files, whilst in Linux and OS X, only a linefeed is usually used.
A similar question and answer that references file encoding is here: bad character showing up in bash script execution
Try something like:
$ sudo apt-get install dos2unix
$ dos2unix offendingfile
An easy way to convert example.sh to Unix format, if you are working in Windows, is to use Notepad++ (Edit > EOL Conversion > UNIX/OSX Format).
You can also set the default EOL in Notepad++ (Settings > Preferences > New Document/Default Directory > select Unix/OSX under the Format box).
Thanks @jdt for your answer.
Following that, and since I keep having this issue with carriage returns, I wrote this small script. Just run carriage_return and you will be prompted for the file to "clean".
https://gist.github.com/kartonnade/44e9842ed15cf21a3700
alias carriage_return=remove_carriage_return
remove_carriage_return(){
  # Cygwin throws errors like:
  # syntax error near unexpected token `$'{\r''
  # due to carriage returns.
  # This function does the equivalent of
  # cat originalfile | tr -d "\r" > newfile
  # but writes the result back over the original file.
  local file_to_clean temp_file_to_clean
  read -r -p "File to clean ? " file_to_clean
  temp_file_to_clean="${file_to_clean}_"
  # file to clean => temporary clean file (quoted, so names with spaces work)
  tr -d '\r' < "$file_to_clean" > "$temp_file_to_clean" || return
  # temporary clean file => original file (keeps the original's permissions)
  cat "$temp_file_to_clean" > "$file_to_clean"
  # remove temporary clean file
  rm -- "$temp_file_to_clean"
}
I want to add to the answers above how to check whether it is a carriage-return issue in a Unix-like environment (I tested this on macOS):
1) Using cat
cat -e my_file_name
If you see lines ending with ^M$, then yes, it is the carriage-return issue.
2) Find the first line with a carriage-return character
grep -n $'\r' my_file_name | head -n 1
3) Using vim
vim my_file_name
Then in vim, type
:set ff
If you see fileformat=dos, then the file uses DOS line endings, which contain a carriage return.
After finding out, you can use the above mentioned methods by other people to correct your file.
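If your shell does not support $'\r' quoting, a POSIX-sh sketch of the same checks can build the carriage return with printf. The sample file here is hypothetical:

```shell
# Hypothetical sample file: lines 2 and 3 end in CR LF, line 1 does not
printf 'echo one\necho two\r\necho three\r\n' > my_file_name

cr=$(printf '\r')   # literal carriage return, works in any POSIX shell

# Number of lines that contain a carriage return
grep -c "$cr" my_file_name          # prints: 2

# Line number of the first such line
grep -n "$cr" my_file_name | head -n 1 | cut -d: -f1   # prints: 2
```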
I had the same problem when I was working with Armbian Linux and Windows.
I was trying to copy my code from Windows to Armbian, and when I ran it, this error popped up. I solved my problem this way:
1- Copy your files from Windows using WinSCP.
2- Make sure that your file names do not contain ( or ) characters.

Linux command to replace string in HUGE file with another string

I have a huge file (8 GB), and I want to replace the string LATIN1 with UTF-8 in the first 30 lines. What is the most efficient method? Is there a way to use sed, perhaps, but have it quit after parsing the first 30 lines?
Vim was not able to save the file in 3 hours.
The problem is that in the event of a replacement, all programs will make a copy of the file with the substitution in place in order to replace the original file ultimately -- they don't want to risk losing the original for obvious reasons.
With perl, you can do this in a one-liner, but that doesn't make it any shorter (well, it probably does compared to vim, since vim preserves history in yet another file, which perl doesn't):
perl -pi -e 's,\bLATIN1\b,UTF-8,g if $. <= 30' thefile
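With sed, you can quit using q:
sed -e 's/LATIN1/UTF-8/g' -e 30q thefile
Note that this prints only the first 30 (modified) lines and then stops, so the rest of the file is not in the output. To limit the substitution to the first 30 lines but keep the whole file, use an address range instead:
sed -e '1,30s/LATIN1/UTF-8/g' thefile > newfile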
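Another way to get the "quit after 30 lines" behaviour for the substitution itself is to let head hand only the first 30 lines to sed and stream the rest through tail untouched. A sketch, assuming a hypothetical 35-line stand-in for the 8 GB file. It still writes a complete new file: since UTF-8 is one byte shorter than LATIN1, every later byte moves, so some full rewrite is unavoidable.

```shell
# Hypothetical stand-in for the huge file: 35 lines that all mention LATIN1
i=1
while [ "$i" -le 35 ]; do
  echo "encoding=LATIN1 line $i"
  i=$((i + 1))
done > thefile

# Substitute only in lines 1-30; lines 31 onward are copied through unchanged
{ head -n 30 thefile | sed 's/LATIN1/UTF-8/g'; tail -n +31 thefile; } > newfile
```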
Untested, but I think ed will edit the file in place without writing to a temp file.
ed yourBigFile << END
1,30s/LATIN1/UTF-8/g
w
q
END

multiple end of file $'s in a single file

I copy-pasted some enum values from my IntelliJ IDE in Windows to Notepad, saved the file on a shared drive, then opened it on a Linux box. When I ran cat -A on the file, it showed something like:
A,B,C,^M$
D,E,F,^M$
G,H,I,^M$
After searching around I figured that ^M is the carriage return and $ means the last line of the file. I'm just puzzled at how this file is able to have multiple $'s.
From man cat on my GNU box:
-A, --show-all
equivalent to -vET
(snip)
-E, --show-ends
display $ at end of each line
Thus, there are multiple $s because there are multiple lines, each with an end.
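To see this for yourself, here is a small sketch that feeds two CRLF-terminated lines (matching the question's sample) to cat. It uses cat -e, which in GNU cat is the -vE subset of -A and is also available in BSD/macOS cat:

```shell
# Two lines, each ending in CR LF as the Windows copy does
printf 'A,B,C,\r\nD,E,F,\r\n' | cat -e
# Output: one $ per line end, each preceded by ^M for the carriage return
# A,B,C,^M$
# D,E,F,^M$
```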
$ is the end of line marker with cat -A, not end of file.
This is indicating the file has Windows-style line endings (carriage return followed by line feed) and not Unix-style (only line feed).
(You can convert text files from one format to the other using the programs dos2unix or unix2dos.)

How to remove ^M (CRLF) from a file sent from Windows to a Linux FTP server in Perl?

I'm sending a comma-delimited file (in ASCII), generated on Windows, via Net::FTP in Perl to a Linux-based FTP account. The issue is that my file on the Linux side has ^M at the end of each line. I know I can remove these by calling the "dos2unix" command on that file, but how do I remove ^M on the Windows side so that I send a correct file in the first place?
I tried the following, but it doesn't affect the file on the Linux side.
$content =~ s/^M//g;
If you had "^","M", then s/\^M//g would work. ("^" is special in regex patterns.) If you had a CR, then s/\r\n/\n/g (or just s/\r//g) would work.
If neither works, please provide a portion of the "od -c" output for your data file.
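For reference, a sketch of what to look for in od -c output, using two hypothetical one-line samples: a real carriage return appears as the escape \r, while a literal caret followed by M appears as the two separate characters ^ and M.

```shell
# A real CR before the newline shows up as \r  \n
printf 'a,b\r\n' | od -c

# A literal "^M" (caret, then M) shows up as ^   M, with no \r
printf 'a,b^M\n' | od -c
```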
When you are writing the file, open it with the :raw layer so that Perl on Windows does not translate each "\n" into CRLF:
open my $fh, '>:raw', $file or die "could not open $file: $!\n";
See perldoc -f binmode.

How can I replace a specific line by line number in a text file?

I have a 2GB text file on my linux box that I'm trying to import into my database.
The problem I'm having is that the script that is processing this rdf file is choking on one line:
mismatched tag at line 25462599, column 2, byte 1455502679:
<link r:resource="http://www.epuron.de/"/>
<link r:resource="http://www.oekoworld.com/"/>
</Topic>
=^
I want to replace the </Topic> with </Line>. I can't do a search/replace on all lines, but I do have the line number, so I'm hoping there's some easy way to replace just that one line with the new text.
Any ideas/suggestions?
sed -i yourfile.xml -e '25462599s!</Topic>!</Line>!'
sed -i '25462599 s|</Topic>|</Line>|' nameoffile.txt
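If your sed has no -i option (it is not part of POSIX), the same line-addressed substitution can write to a new file that then replaces the original. A sketch on a hypothetical four-line file where line 3 stands in for line 25462599:

```shell
# Hypothetical small stand-in for the 2 GB file
printf '<link/>\n<link/>\n</Topic>\n<x/>\n' > nameoffile.txt

# Only line 3 is eligible for the substitution; every other line passes through
sed '3s|</Topic>|</Line>|' nameoffile.txt > nameoffile.tmp &&
  mv nameoffile.tmp nameoffile.txt
```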
The tool for editing text files in Unix is called ed (as opposed to sed, which, as the name implies, is a stream editor).
ed was originally intended as an interactive editor, but it can also easily be scripted. The way ed works is that all commands take an address parameter. The way to address a specific line is just its line number, and the way to change the addressed line(s) is the s command, which takes the same regexp that sed would. So, to change the 42nd line, you would write something like 42s/old/new/.
Here's the entire command:
FILENAME=/path/to/whereever
LINENUMBER=25462599
ed -- "${FILENAME}" <<-HERE
${LINENUMBER}s!</Topic>!</Line>!
w
q
HERE
The advantage of this is that ed is standardized by POSIX, while sed's -i flag is a non-standard extension: GNU and BSD sed implement it with different syntax, and some systems lack it entirely.
Use "head" to get the first 25462598 lines and "tail" to get the remaining lines (starting at 25462600), and put the replacement line between them. Though... for a 2GB file this will likely take a while.
Also, are you sure the problem is just with that line and not somewhere earlier? (The error looks like an XML parse error, which might mean the actual problem is somewhere else.)
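A sketch of the head/tail approach on a tiny hypothetical file, where n plays the role of 25462599: lines 1 to n-1, then the replacement, then lines n+1 to the end.

```shell
# Hypothetical small stand-in: line n is the one to replace
printf '<a/>\n<b/>\n</Topic>\n<c/>\n<d/>\n' > big.xml
n=3

# Lines 1..n-1, then the replacement line, then lines n+1..end
{ head -n "$((n - 1))" big.xml; echo '</Line>'; tail -n "+$((n + 1))" big.xml; } > fixed.xml
```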
My shell script:
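#!/bin/bash
# Usage: ./script.sh LINE_NUMBER NEW_TEXT FILE
# Note: awk -v processes backslash escapes in NEW_TEXT.
awk -v line="$1" -v new_content="$2" '{
  if (NR == line) {
    print new_content;
  } else {
    print $0;
  }
}' "$3"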
Arguments:
first: the line number you want to change
second: the text to put in place of the original line's contents
third: the file name
This script prints its output to stdout, so you need to redirect it. Example:
./script.sh 5 "New fifth line text!" file.txt > newfile.txt
You can improve it, for example, by checking that all your arguments have the expected values.
