Unix to linux migration - unix2dos


I am migrating my code from UNIX to Linux, and I know the unix2dos command is not available in SUSE 11.0. Please let me know whether
alias unix2dos='recode lat1..ibmpc'
will have the same effect as unix2dos.

I'm not sure about recode, but sed -i 's/$/\r/' would do the same thing as unix2dos.

Using recode seems risky: it's meant for converting character sets, not just changing newline characters. I haven't used recode, but I'm afraid you could break your files if they are encoded in e.g. UTF-8 instead of Latin-1.
I don't know what packages are available in SUSE, but perhaps one of the alternatives to unix2dos is, such as todos (found in the tofromdos package on Debian).
Using sed as others suggested should work well. If you are really worried about performance, you could compare sed's performance to awk '{print $0 "\r"}'. I can't say for sure which will be faster in your case, but it won't hurt to measure both on a sample of your files.
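If no unix2dos-like package turns out to be installable, the sed approach can be wrapped in a small function. What follows is only a minimal sketch, assuming GNU sed (both \r in the expression and -i are GNU extensions); the name to_dos is made up for illustration and is not a standard command.

```shell
# to_dos is a hypothetical name, not a standard command.
# Assumes GNU sed: \r in the expression and -i are GNU extensions.
to_dos() {
    # Replace any CRs already sitting at the end of a line with exactly one CR,
    # so running the function twice does not produce CR CR LF.
    sed -i 's/\r*$/\r/' "$@"
}

sample=$(mktemp)
printf 'one\ntwo\n' > "$sample"
to_dos "$sample"
to_dos "$sample"    # second run leaves the file unchanged

# Dump the bytes as hex to confirm each line now ends in 0d 0a.
sample_hex=$(od -An -tx1 "$sample" | tr -d ' \n')
echo "$sample_hex"
```

Unlike the plain 's/$/\r/' form, this variant is safe to run on files that already have CRLF endings.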

Load your file(s) into vim, then
:set ff=dos -- Changes to DOS line endings (CRLF)
:set ff=unix -- Changes to unix line endings (LF)

Related

Is there a way to let vi run editing commands in a file?

Let's say I have a file "edit_commands" with some editing commands, like:
:1,$s/good/better/g
:1,$s/bad/worse/g
Is there a way to let vi load and run the commands in "edit_commands"?
Update:
It appears the "-S" option can be used for this. Refer to:
How to run a series of vim commands from command prompt

It seems like you would want to use a program like sed which shares a
common ancestor with vi (i.e. the 'ed' editor) for such a task:
sed -i 's/good/better/g; s/bad/worse/g' your_file
See this great sed tutorial.
Is there a reason you need to use vi to do it? You could use perl if you need more advanced regex capabilities.
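To stay close to the original idea of keeping the commands in a separate file, sed can also read its script with -f. This is a rough sketch: the file names are placeholders, and the commands are written without the leading ":" and the "1,$" range that vi needs, since sed applies them to every line by default.

```shell
workdir=$(mktemp -d)
printf 'good news\nbad news\ngood and bad\n' > "$workdir/your_file"

# One sed command per line; edit_commands.sed is a placeholder name.
cat > "$workdir/edit_commands.sed" <<'EOF'
s/good/better/g
s/bad/worse/g
EOF

# Without -i the result goes to stdout and your_file is left untouched.
result=$(sed -f "$workdir/edit_commands.sed" "$workdir/your_file")
printf '%s\n' "$result"
```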

The solution in Perl may look this way:
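perl -i.old -pe 's/good/better/g; s/bad/worse/g' your_file
The two substitutions are separated by ";" rather than "||", because "||" would skip the second substitution on any line where the first one matched. The -i.old option saves a copy of your old file under the name your_file.old, which can be very useful when bad comes to worse and worse comes to worst...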

Remove file coding mark but preserve its coding

I've got a file with UTF-8 (Without BOM) coding. The file is created on a Windows site and transferred to a Linux server through SFTP. Using cat -e on it, I get something like this:
cat -e file.txt
M-oM-;M-?test13;hbana0Kw;$
lala;LjgX$
Now, I know that M-oM-;M-? stands for UTF-8 (Without BOM). Is there a way to remove it from the file but preserve its coding?

To remove the BOM from the first line of a file you can use something like this: sed -e '1 s/^.//' file.txt
sed commands have two parts: an address and a command. Most of the time you see sed used without addresses (which means apply to all lines), but you can restrict the command to specific lines by using addresses.
In this case the address is 1 meaning the first line. So the replacement only applies to the first line and every line is printed (as that is the default sed behaviour).
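Note that "." matches one character, so the command above removes the whole three-byte BOM only when sed runs in a UTF-8 locale. A more explicit variant, sketched here under the assumption of GNU sed (which accepts \xHH escapes), matches the bytes EF BB BF directly and leaves the first line alone when no BOM is present. The sample file content is taken from the question.

```shell
bomfile=$(mktemp)
# \357\273\277 are the octal forms of the UTF-8 BOM bytes EF BB BF.
printf '\357\273\277test13;hbana0Kw;\nlala;LjgX\n' > "$bomfile"

# LC_ALL=C makes sed treat the input as raw bytes; address 1 limits the
# substitution to the first line. Without -i the file itself is not modified.
stripped=$(LC_ALL=C sed '1s/^\xEF\xBB\xBF//' "$bomfile")
printf '%s\n' "$stripped" | cat -e
```

Only the three marker bytes are removed; the rest of the file stays in UTF-8.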

When transferring a file from Windows to Linux, apply the dos2unix command. This removes the BOM and converts the line endings to Unix style.
dos2unix file.txt

dos2unix doesn't convert ^M

I exported results in a text file from a program running on Windows 7, and copied the file to Xubuntu 14.04. In a terminal, I ran dos2unix file.txt, which reports "converting file out_mapqtl.txt to Unix format". However, when I look at the file with less, I still see the Windows end-of-line as ^M, and wc -l returns "0".
I tried several things described here, but none works. I then opened the file in Vim and did :%s/\r/\r/g as explained there, which worked fine. So any idea why dos2unix didn't work? Would there be a way to avoid opening Vim every time?

I know you have gotten this resolved, but I wanted to add a note for reference, based on some testing I've done.
If less is showing ^M, then like Sybren I suspect it is a MAC style ending (\r), not DOS (\r\n). You can determine that easily using cat:
$ cat -e filename
Unix endings (\n) show as $
MAC endings (\r) show as ^M (less shows these)
DOS/Windows endings (\r\n) show as ^M$ (less does not appear to show these)
Use dos2unix to get rid of the DOS (^M$) endings
Use mac2unix to get rid of the MAC (^M) endings - dos2unix won't get rid of these.
I had a file where I had to use dos2unix and mac2unix to get rid of all the non-Unix endings.
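The same check can be scripted by counting CR and LF bytes instead of reading cat -e output by eye. This is just a sketch; classify_eol is a hypothetical helper name, and "mixed" corresponds to the kind of file described above that needed both dos2unix and mac2unix.

```shell
# classify_eol is a hypothetical helper: prints unix, dos, mac, or mixed.
classify_eol() {
    cr=$(printf '\r')
    n_cr=$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    n_lf=$(tr -cd '\n' < "$1" | wc -c | tr -d ' ')
    # Count LF-delimited lines whose last character is CR (CR LF pairs).
    n_crlf=$(grep -c "${cr}\$" "$1" || true)
    if [ "$n_cr" -eq 0 ]; then
        echo unix                     # no CR anywhere
    elif [ "$n_lf" -eq 0 ]; then
        echo mac                      # CR only, no LF at all
    elif [ "$n_cr" -eq "$n_crlf" ] && [ "$n_lf" -eq "$n_crlf" ]; then
        echo dos                      # every CR and every LF is part of a CR LF pair
    else
        echo mixed
    fi
}

eol_dir=$(mktemp -d)
printf 'a\nb\n'      > "$eol_dir/unix.txt"
printf 'a\r\nb\r\n'  > "$eol_dir/dos.txt"
printf 'a\rb\r'      > "$eol_dir/mac.txt"
printf 'a\r\nb\rc\n' > "$eol_dir/mixed.txt"
for f in unix dos mac mixed; do
    echo "$f.txt: $(classify_eol "$eol_dir/$f.txt")"
done
```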

\r denotes a carriage return, and on MAC it is used without \n to denote a line break. Are you sure the file is in DOS (\r\n) format and not MAC (\r)?
If VIM really turns out to be the only thing that'll repair your files, you can also invoke it as:
vim somefile.txt +"%s/\r/\r/g" +wq
This will open the file, perform the operation, save it, then quit.
Can you give us an example of the file, so that we can investigate further?

Try this:
tr -d '\r' < file
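Since tr only writes to standard output, the result has to be redirected somewhere and then moved over the original. A small sketch of that pattern follows; the temporary file names are placeholders.

```shell
crlf_file=$(mktemp)
printf 'line1\r\nline2\r\n' > "$crlf_file"

# tr cannot edit in place: write to a temporary file, then replace the original.
# Never redirect straight back into the input file, which would truncate it first.
tr -d '\r' < "$crlf_file" > "$crlf_file.tmp" && mv "$crlf_file.tmp" "$crlf_file"

remaining_cr=$(tr -cd '\r' < "$crlf_file" | wc -c | tr -d ' ')
echo "CR bytes left: $remaining_cr"
```

This is only appropriate for DOS-style (\r\n) files. On a file with Mac-style endings (\r alone), deleting every \r would join all lines into one.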

I have used Notepad++ feature:
Edit>EOL Conversions>Unix(LF).
Now export this file to the Unix machine using pscp.exe.
Let me know if that worked for you.

syntax error near unexpected token `$'in\r''

I'm trying to compile the NIST Biometric Image Software, and I have been having trouble all day. I finally got the source checked out right, and I installed cygwin with no problems (I have used it in the past), but when I went to compile, I get this error:
$ sh setup.sh </cygdrive/c/NBIS> [--without-X11]
setup.sh: line 94: syntax error near unexpected token `$'in\r''
'etup.sh: line 94: ` case $1 in
Now I'm sure any advanced coder would head to the setup.sh and look for problems, but I'm not really much of a coder (I'm only compiling this because there are no pre-compiled packages) so I don't know what to do. I didn't install any libraries with cygwin, I just left everything default. I'm trying to follow the NBIS manual, but I don't really understand it that well and so I'm struggling badly. Maybe if you take a look at it you will notice something I missed: http://www.nist.gov/customcf/get_pdf.cfm?pub_id=51097

run
sed -i 's/\r//' setup.sh
to fix your line endings

That's a symptom of line-ending mismatch.
To convert setup.sh to Unix line endings on Cygwin, use
dos2unix setup.sh
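If dos2unix is not installed, the effect can be checked with the shell's own syntax check. The sketch below builds a tiny stand-in script with CRLF endings (it is not the real setup.sh), runs sh -n on it before and after stripping the trailing carriage returns, and assumes GNU sed for \r and -i.

```shell
script=$(mktemp)
# A stand-in for setup.sh: a case statement saved with Windows line endings.
printf 'case $1 in\r\n  *) echo ok ;;\r\nesac\r\n' > "$script"

# sh -n parses the script without executing it.
if sh -n "$script" 2>/dev/null; then before=valid; else before=broken; fi

# Strip a carriage return only where it sits at the end of a line.
sed -i 's/\r$//' "$script"

if sh -n "$script" 2>/dev/null; then after=valid; else after=broken; fi
echo "before: $before, after: $after"
```

The first check fails because the shell reads "in" followed by a carriage return as a different word from the keyword "in", which is what the error message in the question is reporting.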

An easy way to convert a file such as example.sh to Unix line endings is to use Notepad++ (Edit>EOL Conversion>UNIX/OSX Format).
You can also set the default EOL in Notepad++ (Settings>Preferences>New Document/Default Directory>select Unix/OSX under the Format box).

Windows uses two characters (CR and LF, or \r\n) to mark the end of a line in a text file. Unix, Linux, and (by default) Cygwin use a single LF or '\n' character. Some Cygwin tools are able to deal with either format, but sh typically can't.
It looks like setup.sh uses Windows-style line endings -- or at least line 94 does.
I didn't find the download for the sources, but if they're distributed as a zip file, you might need to extract them using the Cygwin unzip command with the -a option, so any line endings are automatically converted.
But I suspect there's more to it than that. The distributed setup.sh file shouldn't have had any Windows-style line endings in the first place, and if it did, I don't know why the problem wouldn't show up until line 94.
If you can post the URL for the source download, I'll take a look at setup.sh.

In PyCharm you can quickly change the line endings by clicking on the letters CRLF at the bottom right of the screen and selecting LF.

": Command not found"

An issue arises when sourcing one of our env files (a series of variable exports),
for instance:
...
export MY_ROOT=/Soft/dev/blah/blah
export MY_BIN=${MY_ROOT}/bin
...
results in
$ . my_env.sh
$ echo $MY_BIN
/bint/dev/blah/blah
=> "/bin" seems to overwrite the beginning of the variable instead of being appended to it.
Any idea?
By the way every time we source this file, an error message is reported:
": Command not found"
Which is weird.. This message appears even though we comment its whole content.
The shell invoked at the beginning looks fine: #!/bin/sh or #!/bin/bash.
What about control characters? How to screen them on linux?

": Command not found" is the error I've seen when a UNIX/Linux shell script has been (mis-)handled by an MS Windows system. For example if it was checked out using a WebCVS, modified using Notepad or WordPad, and then re-submitted.
(It's complaining that it can't find the [Ctrl-M] executable --- which is a perfectly valid, though extremely inconvenient and somewhat suspicious filename for UNIX/Linux).
Run the file through GNU cat -A or the od -x or hexdump commands to see these (and verify my diagnosis) ... or run it through tr -d with the appropriate quoting and shell "verbatim" handling for your system. (For example tr -d '[Ctrl-V],[Ctrl-M]' under Bash on a typical Linux system).
Depending on your version of tr you might be able to use: tr -d '\r' or tr -d \015 (015 is the octal for CR, "carriage return" or ^M --- MS-DOS used to use CR/LF pairs as line termination, which is just one of the many reasons that MS-DOS can rot in the forsaken abyss when it comes to interoperability. Line terminators of single characters cause no real issues for anyone else ... but PAIRS cause real conversion issues when everything else in the history of mainstream computing used single characters for this).
Oh, yeah, vim has a handy set ff (a.k.a. set fileformat) option which can handle UNIX, MacOS, and MS-DOS line termination conventions from any copy of vim regardless of which platform you're on. I seem to recall the vim default is to detect which type of line termination a file is using and leave it unchanged (and to default to your platform's native format for any new files, of course).

This will fix the line endings in the file:
dos2unix my_env.sh
There's no need for a shebang in a file that's only going to be sourced since it is run in the current shell anyway. However, as a comment it might be informative for human readers.
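To answer the "how to screen them" part without dos2unix, the carriage returns can be counted with grep and stripped with tr before sourcing. This is a sketch only: it recreates the two export lines from the question in a temporary file with CRLF endings, and the ".clean" suffix is an arbitrary choice.

```shell
env_file=$(mktemp)
# Recreate the env file from the question, saved with Windows line endings.
printf 'export MY_ROOT=/Soft/dev/blah/blah\r\nexport MY_BIN=${MY_ROOT}/bin\r\n' > "$env_file"

# Count the lines that contain a carriage return.
cr=$(printf '\r')
bad_lines=$(grep -c "$cr" "$env_file" || true)
echo "lines containing CR: $bad_lines"

# Write a cleaned copy and source that instead of the original.
tr -d '\r' < "$env_file" > "$env_file.clean"
. "$env_file.clean"
echo "$MY_BIN"
```

With the carriage returns gone, MY_BIN no longer contains a \r in the middle, so the terminal stops jumping back to column one and overwriting the start of the path when the value is printed.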
