I'm trying to read a file that contains extended ASCII characters like 'á' or 'è', but Node.js doesn't seem to recognize them.
I tried reading into:
Buffer
String
I tried different encoding types:
ascii
base64
utf8
as referenced on http://nodejs.org/api/fs.html
Is there a way to make this work?
I use the binary type to read such files. For example
var fs = require('fs');
// this comment has I'm trying to read a file that contains extended ascii characters like 'á' or 'è',
fs.readFile("foo.js", "binary", function zz2(err, file) {
console.log(file);
});
When I save the above into foo.js and run it, the following is shown in the output:
var fs = require('fs');
// this comment has I'm trying to read a file that contains extended ascii characters like '⟡ 漀爀 ✀',
fs.readFile("foo.js", "binary", function zz2(err, file) {
console.log(file);
});
The weirdness above is because I ran it in an Emacs buffer.
The file I was trying to read was in ANSI encoding. When I tried to read it using the 'fs' module's functions, they couldn't convert the extended ASCII characters.
I just figured out that Notepad++ is able to actually convert from some formats to UTF-8, instead of just flagging the file with UTF-8 encoding.
After converting it, I was able to read it just fine and apply all the operations I needed to the content.
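For reference, here is a minimal sketch of reading such an ANSI (roughly Windows-1252/Latin-1) file directly, assuming the special characters fall in the Latin-1 range; ansi-file.txt is just a placeholder name:
var fs = require('fs');

// Decode the raw bytes as Latin-1 ('binary' is an alias in older Node versions),
// which maps each byte 0x80-0xFF to the matching extended character.
var text = fs.readFileSync('ansi-file.txt', 'latin1');
console.log(text); // 'á' and 'è' should now show up correctly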
Thank you for your answers!
I realize this is an old post, but I found it in my personal search for a solution to this particular problem.
I have written a module that provides Extended ASCII decoding and encoding support to/from Node Buffers. You can see the source code here. It is a part of my implementation of Buffer in the browser for an in-browser filesystem I've created called BrowserFS, but it can be used 100% independently of any of that within NodeJS (or Browserify) as it has no dependencies.
Just add bfs-buffer to your dependencies and do the following:
var ExtendedASCII = require('bfs-buffer/js/extended_ascii').default;
// Decodes the input buffer as an extended ASCII string.
ExtendedASCII.byte2str(aBufferWithText);
// Encodes the input string as an extended ASCII string.
ExtendedASCII.str2byte("Some text");
Alternatively, just adapt the module into your project if you don't want to add an extra dependency. It's MIT licensed.
I hope this helps anyone in the future who finds this page in their searches like I did. :)
Related
I'm writing an app in node.js, and see that I can do things like this:
var buf = new Buffer("Hello World!")
console.log(buf.toString("hex"))
console.log(buf.toString("utf8"))
And I know of 'ascii' as an encoding type (it'll take an ASCII code, such as 112 and turn it into a p), but what other types of encoding can I do?
The official node.js documentation for Buffer is the best place to check for something like this. As previously noted, Buffer currently supports these encodings: 'ascii', 'utf8', 'utf16le'/'ucs2', 'base64', 'base64url', 'latin1'/'binary', and 'hex'.
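For example, here is a quick sketch that round-trips a string through a few of these encodings, assuming a Node version with Buffer.from (older versions use new Buffer):
var buf = Buffer.from('Hello World!', 'utf8');

console.log(buf.toString('hex'));    // 48656c6c6f20576f726c6421
console.log(buf.toString('base64')); // SGVsbG8gV29ybGQh
console.log(buf.toString('latin1')); // Hello World!

// Buffer.isEncoding() reports whether a given name is supported
console.log(Buffer.isEncoding('utf16le')); // true
console.log(Buffer.isEncoding('cp1252'));  // false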
As is always the way, I spent a while Googling but found nothing until after I posted the question:
http://www.w3resource.com/node.js/nodejs-buffer.php has the answer. You can use the following types in .toString() on a buffer:
ascii
utf8
utf16le
ucs2 (alias of utf16le)
base64
binary
hex
It supports ascii, utf-8, ucs2, base64 and binary.
My requirements are very simple… open any old ANSI-ASCII-UTF8-Unicode TXT file and replace some of the special "word processing" characters, like the fancy single quote (\u2019) and double quotes (\u201C and \u201D), with the plain vanilla ASCII ones, and then do some other (irrelevant to the problem) parsing.
However, regardless of the encoding I try (ascii, utf8, binary), I just can't get Node.js to return all characters correctly, so I can't replace them with their ASCII equivalents; instead I get the useless little rectangles!
Here’s the relevant part of the function…
function LoadTxtFile(Name){
  var fs = require('fs');
  if (fs.existsSync(Name)){
    var Source = fs.readFileSync(Name, 'binary').toString();
    /* Replace miscellaneous characters, which works fine… */
    Source = Source.replace(/©/g, '&copy;');
    Source = Source.replace(/…/g, '...');
    Source = Source.replace(/\t/g, ' ');
    Source = Source.replace(/'/g, '&#39;');
    /* Replace the dreaded single/double quotes but they are never located! */
    Source = Source.replace(/\u2019/g, '&#39;');
    Source = Source.replace(/\u201C/g, '"');
    Source = Source.replace(/\u201D/g, '"');
    /* And we're stuck! */
  }
}
Thank you very much.
Try the node-iconv library and see if it helps.
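For instance, a rough sketch with node-iconv, assuming the input file is actually Windows-1252 (adjust the source charset to whatever the file really is; input.txt is a placeholder):
var fs = require('fs');
var Iconv = require('iconv').Iconv;

// Read the raw bytes (no encoding argument, so we get a Buffer),
// then convert them from Windows-1252 to UTF-8 before doing the replacements.
var raw = fs.readFileSync('input.txt');
var iconv = new Iconv('WINDOWS-1252', 'UTF-8');
var Source = iconv.convert(raw).toString('utf8');

Source = Source.replace(/\u2019/g, "'")
               .replace(/\u201C/g, '"')
               .replace(/\u201D/g, '"');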
I'm trying to do something rather simple: write a text file with data entered in a text input field to a file...
var data = document.getElementById("fileContent").value;
fs.writeFileSync("test.txt", data);
For instance if I type in,
Write this to file 123 123
I end up with this in the file...
Write this to
If I hard code a string into the application, it writes correctly.
fs.writeFileSync("test.txt", "this is a hard coded string");
I tried using writeFileSync with and without the encoding parameter set. I've tried createWriteStream with and without the encoding parameter set. I've tried fileOpen, fs.writeSync, and fs.close. I even tried converting the data to a Buffer object and writing that. In every case, I got the exact same results.
The encoding is also strange. Notepad++ indicates that the encoding is "UCS2-LE w/o BOM". I'd expect it to be UTF-8, as I've been setting the encoding parameter to that.
Any thoughts?
It's a bug in Node-Webkit v0.9.*.
It's OK if you use Node-Webkit v0.8.* or a lower version.
After some more research and determining it was something with encoding, I stumbled on this post. Apparently, utf8 doesn't work...
https://groups.google.com/forum/#!msg/node-webkit/3M-0v92o9Zs/eSYnSZ8dUK0J
I changed the encoding to "utf16le", and this appears to write the text correctly both for hard-coded text and for text from a text box.
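In other words, something along these lines (a minimal sketch; test.txt and the element id are just the names from the question):
var fs = require('fs');

var data = document.getElementById("fileContent").value;

// Writing with utf16le instead of utf8 works around the node-webkit 0.9 bug
fs.writeFileSync("test.txt", data, "utf16le");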
I'm reading a directory in nodejs using the fs.readdir() function. You feed it a string containing a path and it returns an array containing all the files inside that directory path in string format. It does not work for me with special characters (like ï).
I came across this similar issue; however, I am on OS X.
First I created a new dir called encoding and created a file called maïs.md (with my editor Sublime Text).
fs.readdir('encoding', function(err, files) {
console.log(files); // [ 'maïs.md' ]
console.log(files[0]); // maïs.md
console.log(files[0] === 'maïs.md'); // false
console.log(files[0] == 'maïs.md'); // false
console.log(files[0].toString('utf8') === 'maïs.md'); // false
});
The above test works correctly for files without special characters. How can I compare this correctly?
Your character seems to be this one. You should try with
(1) console.log(files[0] == 'ma\u00EFs.md');
(2) console.log(files[0] == 'mai\u0308s.md');
If (1) works, it could mean that the file containing your code is not saved in UTF-8 format, so the node.js engine does not correctly interpret the ï character in your code.
If (2) works, it could mean that the file system gives the node engine the ï character in its decomposed Unicode form (an i followed by the diacritic ¨). Cf. thejh's answer.
In case (2), use the unorm library available on npm to normalize the strings before comparing them (or the original UnicodeNormalizer).
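For example, a small sketch of the comparison after normalization; this uses the built-in String.prototype.normalize(), which is available in recent Node versions (with unorm you would call unorm.nfc() instead):
var fs = require('fs');

fs.readdir('encoding', function(err, files) {
  if (err) throw err;

  // Normalize both sides to the composed form (NFC) before comparing,
  // so the decomposed 'i' + U+0308 coming from the filesystem matches 'ï'.
  var target = 'maïs.md'.normalize('NFC');
  var match = files.some(function(name) {
    return name.normalize('NFC') === target;
  });

  console.log(match); // true if maïs.md is present
});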
https://apple.stackexchange.com/a/10484/23863 looks relevant – it's probably because there are different ways to express ï in utf8.
In Java, we can use the String method byte[] getBytes(Charset charset).
This method encodes a String into a sequence of bytes using the given charset, storing the result in a new byte array.
But how do I do this in Go?
Is there a similar way to do this in Go?
Please let me know.
The standard Go library only supports Unicode (UTF-8, UTF-16, UTF-32) and ASCII encoding. ASCII is a subset of UTF-8.
The go-charset package (found here) supports conversion to and from UTF-8, and it also links to the GNU iconv library.
See also field CharsetReader in encoding/xml.Decoder.
I believe this is an answer: https://stackoverflow.com/a/6933412/1315563
There is no way to do it without writing the conversion yourself or using a third-party package. You could try using this: http://code.google.com/p/go-charset