How to remove ^M (CRLF) from a file sent from Windows to a Linux FTP server in Perl? - linux

I'm sending a comma-delimited file (in ASCII) via Net::FTP in Perl (generated on Windows) to a Linux-based FTP account. The issue is that my file on the Linux side has ^M at the end of each line. I know I can remove these by running the "dos2unix" command on that file, but how do I remove ^M on the Windows side so that I send a correct file in the first place?
I tried the following, but it doesn't affect the file on the Linux side:
$content =~ s/^M//g;

If you had the two characters "^" and "M", then s/\^M//g would work ("^" is special in regex patterns). If you had a CR, then s/\r\n/\n/g (or just s/\r//g) would work.
If neither works, please provide a portion of the "od -c" output for your data file.
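To see what such a dump looks like, here is a minimal shell sketch. The sample line and the temporary file are made up for illustration and are not the asker's data. A carriage return shows up as \r immediately before \n in the od -c output.

```shell
# A made-up CSV line with a Windows (CRLF) line ending, written to a temporary file.
sample=$(mktemp)
printf 'a,b\r\n' > "$sample"

# od -c prints each byte as a character; CR appears as \r and LF as \n.
dump=$(od -c "$sample")
echo "$dump"
```

If the dump shows \r \n pairs, the file really contains carriage returns; if it shows a literal ^ followed by M, it contains those two printable characters instead.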

When you are writing the file, open it with the :raw layer so that Perl on Windows does not translate \n to \r\n:
open my $fh, '>:raw', $file or die "could not open $file: $!\n";
See perldoc -f binmode.

Related

sed command working on command line but not in perl script

I have a file in which I have to replace all words like $xyz, with substitutions like these:
$xyz with ${xyz}
$abc_xbs with ${abc_xbs}
$ab,$cd with ${ab},${cd}
The file also has some words like ${abcd}, which I don't have to change.
I am using this command:
sed -i 's?\$([A-Z_]+)?\${\1}?g' file
It works fine on the command line but not inside a Perl script as
sed -i 's?\$\([A-Z_]\+\)?\$\{\1\}?g' file;
What am I missing?
I think adding some backslashes would help. I tried adding some, but with no success.
Thanks
In a Perl script you need valid Perl, just as you need valid C in a C program. In the terminal, sed ... is understood and run by the shell as a command, but in a Perl program it is just a bunch of words, and that sed ... line isn't valid Perl.
You would need to put it inside qx() (backticks) or system() so that it is run as an external command. Then you'd indeed need "some backslashes," which is where things get a bit picky.
But why run a sed command from a Perl script? Do the job with Perl:
use warnings;
use strict;
use File::Copy 'move';
my $file = 'filename';
my $out_file = 'new_' . $file;
open my $fh, '<', $file or die "Can't open $file: $!";
open my $fh_out, '>', $out_file or die "Can't open $out_file: $!";
while (<$fh>)
{
    s/\$( [^{] [a-z_]* )/\${$1}/gix;
    print $fh_out $_;
}
close $fh_out;
close $fh;
move $out_file, $file or die "Can't move $out_file to $file: $!";
The regex uses a negated character class, [^...], to match any character other than { following $, thus excluding already braced words. Then it matches a sequence of letters or underscore, as in the question (possibly none, since the first non-{ already provides at least one).
With 5.14+ you can use the non-destructive /r modifier:
print $fh_out s/\$([^{][a-z_]*)/\${$1}/gir;
With /r the changed string is returned (and the original is left unchanged), which is just what print needs.
The output file, in the end moved over the original, should be made using File::Temp. Overwriting the original this way changes $file's inode number; if that's a concern see this post for example, for how to update the original inode.
A one-liner (command-line) version, to readily test
perl -wpe's/\$([^{][a-z_]*)/\${$1}/gi' file
This only prints to console. To change the original add -i (in-place), or -i.bak to keep backup.
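For comparison with the Perl one-liner, the same substitution can be sketched with sed directly in the shell. This is only an illustration under a few assumptions: a sed that supports -E for extended regular expressions (GNU and BSD sed both do), a made-up sample file, and a character class widened to lower case because the question's examples are lower case.

```shell
# Hypothetical sample input covering the cases from the question.
sample=$(mktemp)
printf '%s\n' 'echo $xyz $abc_xbs' 'echo $ab,$cd' 'echo ${abcd}' > "$sample"

# -E enables extended regular expressions, so ( ) and + need no backslashes.
# \$ matches a literal dollar sign; ${\1} wraps the captured name in braces.
# ${abcd} is left alone because { is not in the character class.
result=$(sed -E 's/\$([A-Za-z_]+)/${\1}/g' "$sample")
echo "$result"
```

The single quotes keep the shell from touching the backslashes and dollar signs. That quoting protection is lost when the command is embedded in a Perl string, which is why the Perl-native substitution above is simpler.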
A reasonable question of "Isn't there a shorter way" came up.
Here is one, using the handy Path::Tiny for a file that isn't huge so we can read it into a string.
use warnings;
use strict;
use Path::Tiny;
my $file = 'filename';
my $out_file = 'new_' . $file;
my $new_content = path($file)->slurp =~ s/\$([^{][a-z_]*)/\${$1}/gir;
path($file)->spew( $new_content );
The first line reads the file into a string, on which the replacement runs; the changed text is returned and assigned to a variable. Then that variable with new text is written out over the original.
The two lines can be squeezed into one, by putting the expression from the first instead of the variable in the second. But opening the same file twice in one (complex) statement isn't exactly solid practice and I wouldn't recommend such code.
However, since module's version 0.077 you can nicely do
path($file)->edit_lines( sub { s/\$([^{][a-z_]*)/\${$1}/gi } );
or use edit to slurp the file into a string and apply the callback to it.
So this cuts it to one nice line after all.
I'd like to add that shaving off lines of code mostly isn't worth the effort while it sure can lead to trouble if it disturbs the focus on the code structure and correctness even a bit. However, Path::Tiny is a good module and this is legitimate, while it does shorten things quite a bit.

syntax error near unexpected token ' - bash

I have written a sample script on my Mac
#!/bin/bash
test() {
    echo "Example"
}
test
exit 0
and this works fine by displaying Example
When I run this script on a RedHat machine, it says
syntax error near unexpected token '
I checked that bash is available using
cat /etc/shells
which bash shows /bin/bash
Has anyone come across the same issue?
Thanks in advance!
It could be a file encoding issue.
I have encountered file encoding issues when working on files between different operating systems and editors, in my case particularly between Linux and Windows systems.
I suggest checking your file's encoding to make sure it is suitable for the target Linux environment. I guess an encoding issue is less likely given that you are using a Mac than if you had used a Windows text editor; however, I think file encoding is still worth considering.
--- EDIT (Add an actual solution as recommended by @Potatoswatter)
To demonstrate how file type encoding could be this issue, I copy/pasted your example script into Notepad in Windows (I don't have access to a Mac), then copied it to a linux machine and ran it:
jdt@cookielin01:~/windows> sh ./originalfile
./originalfile: line 2: syntax error near unexpected token `$'{\r''
'/originalfile: line 2: `test() {
In this case, Notepad saved the file with carriage returns and linefeeds, causing the error shown above. The \r indicates a carriage return (Linux systems terminate lines with linefeeds \n only).
On the linux machine, you could test this theory by running the following to strip carriage returns from the file, if they are present:
cat originalfile | tr -d "\r" > newfile
Then try to run the new file sh ./newfile . If this works, the issue was carriage returns as hidden characters.
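The carriage-return theory can also be reproduced end to end without a Windows editor. The following is a minimal sketch, assuming bash and mktemp are available; the file names mirror the ones used above, and the function is renamed to greet (a made-up name) so that it does not shadow the shell's built-in test.

```shell
# Create a small script with Windows (CRLF) line endings in a temporary directory.
workdir=$(mktemp -d)
printf '#!/bin/bash\r\ngreet() {\r\necho "Example"\r\n}\r\ngreet\r\n' > "$workdir/originalfile"

# Strip the carriage returns, as suggested above.
tr -d '\r' < "$workdir/originalfile" > "$workdir/newfile"

# The cleaned copy should now run normally.
cleaned_output=$(bash "$workdir/newfile")
echo "$cleaned_output"
```

Running bash on originalfile instead should fail with a syntax error like the one quoted above, because the shell sees the carriage return as part of the { token.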
Note: This is not an exact replication of your environment (I don't have access to a Mac), however it seems likely to me that the issue is that an editor, somewhere, saved carriage returns into the file.
--- /EDIT
To elaborate a little, operating systems and editors can have different file encoding defaults. Typically, applications and editors will influence the filetype encoding used, for instance, I think Microsoft Notepad and Notepad++ default to Windows-1252. There may be newline differences to consider too (In Windows environments, a carriage return and linefeed is often used to terminate lines in files, whilst in Linux and OSX, only a Linefeed is usually used).
A similar question and answer that references file encoding is here: bad character showing up in bash script execution
Try something like (on a Debian-based system; on Red Hat, use yum instead of apt-get):
$ sudo apt-get install dos2unix
$ dos2unix offendingfile
An easy way to convert example.sh to UNIX line endings, if you are working on Windows, is to use Notepad++ (Edit > EOL Conversion > UNIX/OSX Format).
You can also set the default EOL in Notepad++ (Settings > Preferences > New Document/Default Directory > select Unix/OSX under the Format box).
Thanks @jdt for your answer.
Following that, and since I keep having this issue with carriage returns, I wrote this small script. Just run carriage_return and you'll be prompted for the file to "clean".
https://gist.github.com/kartonnade/44e9842ed15cf21a3700
alias carriage_return=remove_carriage_return
remove_carriage_return(){
    # cygwin throws errors like:
    #   syntax error near unexpected token `$'{\r''
    # due to carriage returns.
    # This function does the equivalent of
    #   cat originalfile | tr -d "\r" > newfile
    # and then copies the result back over the original.
    read -r -p "File to clean ? " file_to_clean
    temp_file_to_clean="${file_to_clean}_"
    # file to clean => temporary clean file
    # (variables are quoted instead of passed through eval, so names with spaces work)
    tr -d '\r' < "$file_to_clean" > "$temp_file_to_clean" || return 1
    # temporary clean file => original file, then remove the temporary file
    cat "$temp_file_to_clean" > "$file_to_clean" && rm -- "$temp_file_to_clean"
}
I want to add to the answer above how to check whether it is a carriage return issue in a Unix-like environment (I tested on macOS).
1) Using cat
cat -e my_file_name
If you see lines ending with ^M$, then yes, it is a carriage return issue.
2) Find the first line with a carriage return character
grep -r $'\r' Grader.sh | head -1
3) Using vim
vim my_file_name
Then in vim, type
:set ff
If you see fileformat=dos, then the file came from a DOS environment and contains carriage returns.
After finding out, you can use the methods mentioned above by other people to correct your file.
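The checks above can be wrapped in a small helper for use in scripts. This is a sketch only: has_carriage_return is a made-up function name, and the two test files are created just for the illustration. It builds the CR character with printf so that it also works in shells that lack the $'\r' syntax.

```shell
# Return success (0) if the given file contains at least one carriage return.
has_carriage_return() {
    cr=$(printf '\r')    # a literal CR, built portably without $'\r'
    grep -q "$cr" "$1"
}

# Hypothetical test files, created only for this illustration.
workdir=$(mktemp -d)
printf 'line one\r\nline two\r\n' > "$workdir/dos.txt"
printf 'line one\nline two\n' > "$workdir/unix.txt"

for f in "$workdir/dos.txt" "$workdir/unix.txt"; do
    if has_carriage_return "$f"; then
        echo "$(basename "$f"): contains carriage returns"
    else
        echo "$(basename "$f"): clean"
    fi
done
```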
I had the same problem when I was working with Armbian Linux and Windows.
I was trying to copy my code from Windows to Armbian, and when I ran it this error popped up. My problem was solved this way:
1- Try to copy your files from Windows using WinSCP.
2- Make sure that your file name does not contain ( or ) characters.

Reading data from properties file in shell script

I am writing a shell script which reads data from a properties file and stores it in local variables in the script. The problem is that when I try to read multiple properties from the file and form a string, it gets overwritten.
#!/bin/bash
. /opt/oracle/scripts/user.properties
echo $username
echo $password
echo $service_name
conn=$username$password$service_name
echo $conn
The values are username=xxxx, password=yyyy and service_name=zzzz. I expect the output to be
xxxxyyyyzzzz
but instead I am getting
zzzz
Please tell me where I am making the mistake.
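I'm certain that the file /opt/oracle/scripts/user.properties contains CR+LF line endings. (Running the file command on the properties file would report ... with CRLF line terminators.) Each value then ends in a carriage return, which moves the cursor back to the start of the line when echoed, so the later values overwrite the earlier ones on screen. Changing the line endings to LF using dos2unix or any other utility should make it work.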
Moreover, instead of saying:
conn=$username$password$service_name
you could say:
conn="${username}${password}${service_name}"
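The effect can be reproduced with a throwaway properties file. This is a minimal sketch, assuming the same three property names as in the question; the temporary directory and the .clean file name are made up for the illustration. It sources a CRLF file, shows the hidden carriage returns with od -c, then repeats the step after stripping them with tr.

```shell
# A stand-in for user.properties, written with CRLF line endings.
workdir=$(mktemp -d)
printf 'username=xxxx\r\npassword=yyyy\r\nservice_name=zzzz\r\n' > "$workdir/user.properties"

# Sourcing the CRLF file leaves a trailing CR in every value.
. "$workdir/user.properties"
conn_raw="${username}${password}${service_name}"
printf '%s' "$conn_raw" | od -c    # the \r after each value is visible here

# Strip the carriage returns and source the cleaned copy instead.
tr -d '\r' < "$workdir/user.properties" > "$workdir/user.properties.clean"
. "$workdir/user.properties.clean"
conn="${username}${password}${service_name}"
echo "$conn"
```

The raw string is three characters longer than expected (one CR per value), which is why a terminal shows only the last value: each CR sends the cursor back to column one before the next value is printed.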

Using Perl Win7 to write a file for Linux and having only Linux line endings

This Perl script is running on Win7, modifying a Clearcase config spec that will be read on a Linux machine. Clearcase is very fussy about its line endings: they must be precisely and only \n (0x0A). However, try as I may, I cannot get Perl to spit out only \n endings; they usually come out as \r\n (0x0D 0x0A).
Here's the Perl snippet, running over an array of config spec elements and converting element /somevob/... bits into element /vobs/somevob/... and printing to a file handle.
$fh = new FileHandle;
foreach my $line (@cs_array)
{
    $line =~ s/([element|load])(\s+\/)(.+)/$1$2vobs\/$3/g;
    $line =~ s/[\r\n]/\n/g; # tried many things here
    $fh->print($line);
}
$fh->close();
Sometimes the elements in the array are multi-line and separated by \n
element /vob1/path\nelement\n/vob2/path\nload /vob1/path\n element\n
/vob3/path
load /vob3/path
When I look into the file written on Win7 in a binary viewer there is always a 0x0D 0x0A newline sequence which Clearcase on Linux complains about. This appears to come from the print.
Any suggestions? I thought this would be a 10 minute job...
Try
$fh->binmode;
Otherwise you're probably in text mode, and for Windows this means that \n is translated to \r\n.
You are running afoul of the :crlf IO Layer that is the default for Perl on Windows.
You can use binmode after the fact to remove this layer, or you can open the filehandle with :raw (the default layer for *nix) or some other appropriate IO Layer in the first place.
Sample:
$fh = FileHandle->new($FileName, '>:raw');
Check perldoc open for more details on IO Layers.

Linux replace ^M$ with $ in csv

I have received a CSV file from an FTP server, which I am ingesting into a table.
While ingesting the file I receive the error "File was a truncated file".
The actual reason is that the data in the file contains $ and ^M$ at the end of the lines.
e.g :
ACT_RUN_TM, PROG_RUN_TM, US_HE_DT*^M$*
"CONFIRMED","","3600"$
How can I remove these $ and ^M$ from the end of the lines using a Linux command?
The ultimately correct solution is to transfer the file from the FTP server in text mode rather than binary mode, which does the appropriate end-of-line conversion for you. Change your download scripts or FTP application configuration to enable text transfers to fix this in future.
Assuming this is a one-shot transfer and you have already downloaded the file and just want to fix it, you can use tr(1) to translate characters. To remove all control-M characters from a file, pipe it through tr -d '\r'. If you want to replace them with control-J instead (for example, if the file came from a pre-OS X Mac system, which used CR alone as its line terminator), use tr '\r' '\n'.
It's odd to see ^M as not-the-last character, but:
sed -e 's/^M*\$$//g' <badfile >goodfile
Or use "sed -i" to update in place.
(Note that "^M" is entered on the command line by pressing CTRL-V CTRL-M.)
Update: It's been established that the question is wrong, as the "^M$" are not in the file but are displayed by vi. The asker actually wants to change CRLF pairs to just LF.
sed -e 's/^M$//g' <badfile >goodfile
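Typing a literal control-M is easy to get wrong when the command is pasted into a script. One alternative, sketched here under the assumption of a POSIX shell and sed, is to build the CR character with printf and interpolate it into the sed expression. The sample data is taken from the question, and the temporary directory is made up for the illustration.

```shell
# Build a literal carriage return without having to type CTRL-V CTRL-M.
cr=$(printf '\r')

# Recreate the sample from the question: first line CRLF, second line LF only.
workdir=$(mktemp -d)
printf 'ACT_RUN_TM, PROG_RUN_TM, US_HE_DT\r\n"CONFIRMED","","3600"\n' > "$workdir/badfile"

# Remove a CR only where it is the last character on the line.
# Inside double quotes, \$ reaches sed as a plain $ (the end-of-line anchor).
sed "s/${cr}\$//" "$workdir/badfile" > "$workdir/goodfile"

# Inspect the result; no \r should remain.
od -c "$workdir/goodfile"
```

Unlike tr -d '\r', this leaves alone any carriage return that appears in the middle of a line.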
