This is such a basic question I am surprised I could not easily find an answer to it:
I use Notepad++ to write my scripts in. Someone sent me some code for a shell script (.sh) that I could modify to suit my needs. I simply changed a small bit of text using Notepad++ (on Windows) and used FileZilla (SFTP) to upload it to my server (Debian Linux).
There were a few problems with this that it took my server admin an hour to find, namely:
FileZilla, for whatever reason, defaults to ASCII rather than binary! (changed it to binary and removed the .sh association with ASCII)
The permissions were wrong; chmod took care of this.
The problem is it STILL did not work. To fix it, my server admin simply copied the text into a new shell script file right on the server (using vim or nano) and saved that. At first he kept saying the problem was Windows (which he loves to hate on), but it seems it is the encoding the text editors use that is corrupting the files.
He said my text-editor encoding needs to be set to "None". However, that is not an option - only ANSI, UTF and UCS variants are options!
How can I create a shell script on Windows with no encoding whatsoever so that it doesn't get corrupted?
I need to be able to simply transfer the file to the server; having to fix it once it is on the server is wholly impractical.
To fix the end-of-line format and encoding in Notepad++:
In the status bar at the bottom right of Notepad++, right-click the field just to the left of the encoding indicator (it shows the current line-ending format) and choose the UNIX (LF) conversion. Also make sure the encoding is set to UTF-8 if it is not already.
In FileZilla:
Transfer mode: Auto
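If a script does end up on the server with Windows line endings, you can also check and fix it from a shell there. This is only a sketch under assumptions: the file name is a placeholder, dos2unix may need to be installed, and the sed variant assumes GNU sed:

# Show the file type; CRLF endings are reported as "with CRLF line terminators"
file yourscript.sh
# Convert CRLF -> LF in place (from the dos2unix package)
dos2unix yourscript.sh
# Equivalent with GNU sed if dos2unix is not available
sed -i 's/\r$//' yourscript.sh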
New to Oracle...
I have a bunch of SQL scripts from SQL Server that I want to adapt for Oracle. I load these Notepad-friendly ASCII text files (e.g. myscript.sql) into SQL Developer. When I open one, SQL Developer adds an extra line break between every line I had in the ASCII text file. Annoying, but I can deal with that. I soldier on. I edit and change syntax. I run it. It works. I save. I'm happy so far. Feeling good.
But...
Now when I try to open myscript.sql in Notepad, line breaks are gone... there is a blank between every character in a word... it's a mess.
What the heck happened? And how do I make it stop? I know I'm old school, but I like to edit the format of my scripts myself... I want them in ascii text so that I can use a bulk file editor to change things...
I have googled this for a couple hours and have found countless pages regarding saving the OUTPUT of a script as text, but nothing about saving a SQL Developer script as plain text.
Welcome to the world of UNIX vs Windows/DOS line ending differences.
I would recommend using a better editor than Notepad. A modern code editing program will automatically handle the conversion and display.
http://www.cs.toronto.edu/~krueger/csc209h/tut/line-endings.html
So I still don't know how a couple of my scripts got so mangled up, because I can't recreate the issue. Other scripts are editing and saving just fine.
But I did figure out how to fix the couple scripts that did get hosed up.
I loaded Notepad++ for Windows. I opened the offending script, which shows a bunch of [NUL] all over the place. I was able to use the Search/Replace function in 'extended mode' to search for \x00 (aka null) and replace it with nothing.
Next was the annoyance with line breaks (new line) versus "carriage return plus new line". I was able to use extended search and replace to replace \n with \r\n.
This now got me a file that I could edit in 'regular' notepad. I still don't know where the extra line breaks came into play, but I was able to spin through the file and remove the extra blank lines.
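For what it's worth, the same two cleanup steps can also be done from a command line instead of Notepad++. A hedged sketch, assuming GNU tr and sed, with placeholder file names:

# Delete the stray NUL bytes, then turn bare LF line endings into CRLF
tr -d '\000' < broken.sql | sed 's/$/\r/' > fixed.sql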
So all is well with the universe again. I got my couple of mangled scripts back in order, and others seem to behave properly now.
Thanks to all for the help. This site has been invaluable to me.
I'm on a Windows machine, and in my P4V workspace's advanced settings I have the LineEnd option set to unix.
But I still notice extra spaces in some of my files after submitting and checking on a different machine. Is there a way to force every checkout and submission to be Unix style?
Note, I can't modify server settings, only my local workspaces.
If you set the LineEnd to unix and the files you sync have \r characters in them, it means that somebody using unix submitted them in that form (i.e. the files were checked in with the understanding that the Unix version of them is supposed to have those characters in there distinct from the native line endings). If you submit Windows-style files from a unix-mode workspace, you're saying that all unix workspaces should have Windows-style line endings for those files. This is desirable in some instances, like when you're using Unix machines to build packages for Windows systems, but for source code that's meant to be used cross-platform it's generally a bad thing.
It's not too hard to go through the history, figure out who did that, and cluebat them (or get the admin to force a fix of their LineEnd setting so that it actually matches the contents of their workspace), and I wholeheartedly recommend doing this any time you ask for unix files and get a win-style file that you didn't want. Usually if everyone uses the default setting of local everything works fine.
As far as fixing the files, a pretty easy method is to change your LineEnd to share, open all the files for edit, and then submit them -- the share setting forces all the \r characters to be stripped out on submit, as if you were running dos2unix over all your files every time you submitted.
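A rough sketch of that last approach using the p4 command line (the depot path is a placeholder; changing LineEnd can equally be done in P4V's workspace settings dialog):

# Edit the client/workspace spec and change the LineEnd field to "share"
p4 client
# Open the affected files for edit, then submit; with LineEnd set to share,
# any \r characters are stripped from text files on submit
p4 edit //depot/project/...
p4 submit -d "Normalize line endings"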
I want to view some ANSI art on the Linux local console. (My setup: Raspberry Pi 3 / newest Raspbian, no X11.)
I've tried many different settings in raspi-config, dpkg-reconfigure console-setup, /etc files, and environment variables, but I have had no luck yet. Do I need a special PCF font to get it working?
A reliable way to enable it for remote terminals would also be great.
Thanks in advance.
It depends on what your data uses (see chart). Codes 0..31 are a problem unless you have a program that can map those codes to a printable value (as noted in Why does showconsolefont have different output in tmux?, the showconsolefont program does this mapping of 0..31).
Most of the usable fonts for the Linux console are "psf" fonts, which have a header telling which Unicode values each glyph corresponds to. Using that, along with a known character set (cp437), you could convert the data or "play" it using an application which knows how to do this:
You could convert it using iconv or recode (a short iconv sketch follows after this list), or
The line-drawing (128..255) could be done using luit in a UTF-8 console.
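As a concrete example of the iconv option, a minimal sketch; the file name is a placeholder and the art is assumed to be cp437-encoded:

# Convert a cp437-encoded ANSI-art file to UTF-8 so a UTF-8 console can display it
iconv -f CP437 -t UTF-8 art.ans > art-utf8.ans
cat art-utf8.ans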
I have a CSV file with accented characters and saved it in Notepad, selecting UTF-8 encoding. When I read the file using Java, it reads the BOM characters too.
So I want to save this file from Notepad in UTF-8 format without a BOM being prepended.
Otherwise, is there a built-in class in Java that eliminates the BOM characters that are present at the beginning when reading the contents of a file?
Use Notepad++ - it is free and much better than Notepad. It will help to save text without a BOM: in Notepad++ v6 and older, use Encoding → Encode in UTF-8 without BOM; in Notepad++ v7 and later, the same option appears as Encoding → UTF-8 (the variant that writes a BOM is labelled UTF-8-BOM).
When I encountered this problem in Java, I didn't find any library to parse these first three bytes (BOM). So my advice:
Use PushbackInputStream(in, 3).
Read the first three bytes
If it's not BOM (EF BB BF), push them back
Process the stream as UTF-8
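Not the Java route this answer describes, but if you can reach the file from a shell first, those same three bytes can be checked and stripped there. A hedged sketch, assuming GNU tools and a placeholder file name:

# Show the first three bytes; a UTF-8 BOM appears as "ef bb bf"
head -c 3 data.csv | od -An -tx1
# Strip a leading UTF-8 BOM in place (GNU sed)
sed -i '1s/^\xEF\xBB\xBF//' data.csv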
I just learned from this Stack Overflow post, as #martin-geisler points out, that you can save files without the BOM in Windows Notepad, by selecting ANSI as the encoding.
I'm assuming this won't work for more advanced uses, because the resulting file is probably not in the encoding you actually wanted, but ANSI; but I tested and confirmed this works to save a very small .php script without a BOM using only Notepad.
I learned the long, hard way that Windows' Notepad is not a true editor, although I'd like to point out for others that, despite this, it is misleadingly called up when you type "editor" on newer Windows machines, at least on one of mine.
I am currently using Emacs and other editors to solve this problem.
Use Notepad++ instead. See my personal blog post on it. From within Notepad++, choose the "Encoding" menu, then "Encode in UTF-8 without BOM".
Notepad on Windows 10 version 1903 (May 2019 update) and later versions supports saving to UTF-8 without a BOM. In fact, UTF-8 is the default file format now.
Reference: Windows 10 Notepad is Getting Better UTF-8 Encoding Support
The answer is: Not at all. Notepad can't do that.
In Java you can just skip the BOM yourself and be done: for a UTF-8 file that means the first three bytes of your InputStream (or the single character \uFEFF if you read through a Reader).
You might want to try out Notepad2 or Notepad++. Those Notepad replacements have the option for you to choose whether to output BOM.
As for a Java solution, as far as I know, Java does not handle the BOM in UTF-8 files automatically. I googled and found Java's UTF-8 and Unicode writing is broken - Use this fix that might be the solution.
We're using the utility BOMStripperInputStream.java to strip the BOM from our input if present.
Given a text file on Ubuntu (or Debian Linux in general), how do I find out its encoding? Can I run od or hexdump on it to fingerprint the encoding? What should I be looking out for?
There are many tools to do this. Try a web search for "detect encoding". Here are some of the tools I found:
The International Components for Unicode (ICU) are a great place to start. See especially their page on Character Set Detection.
Chardet is a Python module to guess the encoding of a file. See chardet.feedparser.org
The *nix command-line tool file detects file types, but might also detect encodings if mentioned in the file (e.g. if there's a mime-type notation in the file). See man file
Perl modules Encode::Detect and Encode::Guess.
Someone asked a similar question in StackOverflow. Search for the question, PHP: Detect encoding and make everything UTF-8. That's in the context of fetching files from the net and using PHP, but you could write a command-line PHP script.
Note well what the ICU page says about character set detection: "Character set detection is ..., at best, an imprecise operation using statistics and heuristics...." In my experience the problem domain makes a big difference in how easy or difficult the job is. Don't forget that it's possible that the octets in a file can be of ambiguous encoding, i.e. sensibly interpreted using multiple different encodings. They can also be of mixed encoding, i.e. different subsets of the octets make sense interpreted in different encodings. This is why there's not a single command-line tool I can recommend which always does the job.
If you have a single file and you just want to get it into a known encoding, my trick is to open the file with a text editor which can import using a bunch of different encodings, such as TextWrangler or OpenOffice.org. First, open the file and let the editor guess the encoding. Take a look at the result. If you aren't satisfied with it, guess an encoding, open the file with the editor specifying that encoding, and take a look at the result. Then save as a known encoding, e.g. UTF-16.
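To answer the od/hexdump part of the question directly, a couple of quick first checks from the command line; the file name is a placeholder, and file --mime-encoding needs a reasonably recent version of file:

# Ask file(1) for its best guess at the encoding
file --mime-encoding mystery.txt
# Dump the first bytes: "ef bb bf" is a UTF-8 BOM, "ff fe" a UTF-16LE BOM,
# and NUL bytes between ASCII characters usually indicate UTF-16
od -An -tx1 mystery.txt | head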
You can use enca. Enca is a small command-line tool for encoding detection and conversion.
You can install it on Debian / Ubuntu with:
apt-get install enca
In order to use it, just call
enca FILENAME
Also see the manpage for more information.