Convert a text file to UTF8 in D

I'm attempting to use the Phobos standard library functions to read any valid UTF file (UTF-8, UTF-16, or UTF-32) and get it back as a UTF-8 string (i.e., D's string). After looking through the docs, the most concise function I could come up with is:
import std.file, std.utf;
string readToUTF8(in string filename)
{
    try {
        return readText(filename);
    }
    catch (UTFException) {
        // Not valid UTF-8; retry as UTF-16.
        try {
            return toUTF8(readText!wstring(filename));
        }
        catch (UTFException) {
            // Not valid UTF-16 either; retry as UTF-32.
            return toUTF8(readText!dstring(filename));
        }
    }
}
However, catching a cascading series of exceptions seems extremely hackish. Is there a "cleaner" way to go about it without relying on catching a series of exceptions?
Additionally, if the source file was UTF-16 or UTF-32, the function above seems to leave the BOM character (U+FEFF) at the start of the resulting string. I would like to omit it, since the result is UTF-8. Is there a way to do that other than explicitly stripping it?

One of your questions answers the other: the BOM allows you to identify the exact UTF encoding used in the file.
Ideally, readText would do this for you. Currently, it doesn't, so you'd have to implement it yourself.
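I'd recommend using std.file.read, casting the returned void[] to a ubyte[], and checking whether the first few bytes are a BOM. Once you know the encoding, skip the BOM bytes, cast the rest to the matching string type (string, wstring or dstring), and convert it to a string with toUTF8 or to!string. Casting to wstring or dstring assumes the file's byte order matches the machine's, so a file with the opposite byte order needs its code units swapped first.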
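The BOM check described above does not depend on D. The sketch below shows the same logic in TypeScript, not D or Phobos. The names detectBom, decodeWithBom and codePointToString are hypothetical, and input without a BOM is assumed to be UTF-8. UTF-16 and UTF-32 are decoded by hand because TextDecoder does not support UTF-32. Note that FF FE 00 00 is read as a UTF-32LE BOM, even though it could also be UTF-16LE text starting with U+0000.

```typescript
// Encodings this sketch can identify from a byte-order mark (BOM).
type Utf = "utf-8" | "utf-16le" | "utf-16be" | "utf-32le" | "utf-32be";

// Hypothetical helper: inspect the leading bytes and report the encoding and BOM size.
// Input without a BOM is assumed to be UTF-8 (an assumption, not a rule).
function detectBom(b: Uint8Array): { enc: Utf; bomLength: number } {
  // Check UTF-32LE before UTF-16LE, because FF FE 00 00 also starts with FF FE.
  if (b.length >= 4 && b[0] === 0xff && b[1] === 0xfe && b[2] === 0 && b[3] === 0)
    return { enc: "utf-32le", bomLength: 4 };
  if (b.length >= 4 && b[0] === 0 && b[1] === 0 && b[2] === 0xfe && b[3] === 0xff)
    return { enc: "utf-32be", bomLength: 4 };
  if (b.length >= 3 && b[0] === 0xef && b[1] === 0xbb && b[2] === 0xbf)
    return { enc: "utf-8", bomLength: 3 };
  if (b.length >= 2 && b[0] === 0xff && b[1] === 0xfe) return { enc: "utf-16le", bomLength: 2 };
  if (b.length >= 2 && b[0] === 0xfe && b[1] === 0xff) return { enc: "utf-16be", bomLength: 2 };
  return { enc: "utf-8", bomLength: 0 };
}

// Turn a code point into a JS (UTF-16) string, using a surrogate pair above U+FFFF.
function codePointToString(cp: number): string {
  if (cp <= 0xffff) return String.fromCharCode(cp);
  const v = cp - 0x10000;
  return String.fromCharCode(0xd800 + (v >> 10), 0xdc00 + (v & 0x3ff));
}

// Hypothetical helper: decode the bytes after the BOM; the BOM is not part of the result.
function decodeWithBom(bytes: Uint8Array): string {
  const { enc, bomLength } = detectBom(bytes);
  const body = bytes.subarray(bomLength);
  if (enc === "utf-8") {
    // fatal: true makes invalid UTF-8 throw, much like readText does in D.
    return new TextDecoder("utf-8", { fatal: true }).decode(body);
  }
  const view = new DataView(body.buffer, body.byteOffset, body.byteLength);
  const littleEndian = enc === "utf-16le" || enc === "utf-32le";
  const unitSize = enc === "utf-16le" || enc === "utf-16be" ? 2 : 4;
  let out = "";
  // Read one code unit at a time; trailing bytes that do not form a whole unit are ignored.
  for (let i = 0; i + unitSize <= body.length; i += unitSize) {
    out += unitSize === 2
      ? String.fromCharCode(view.getUint16(i, littleEndian))
      : codePointToString(view.getUint32(i, littleEndian));
  }
  return out;
}
```

In Node.js, a Buffer returned by fs.readFileSync is a Uint8Array, so it can be passed to decodeWithBom directly. If you need the result as UTF-8 bytes, `new TextEncoder().encode(text)` produces them.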

Related

How to generate a warning/error when using non-string-variables inside string interpolation?

TypeScript does not produce any errors for the following code:
const maybe_a_string: undefined | string = undefined;
const false_or_string: false | string = false;
// I'd like the following to produce an error/warning...
const message_string = `Some readable string info should be here: ${maybe_a_string} ${false_or_string}`;
Is there some kind of setting I can turn on, or a simple alternative way to write the last line, that will warn me about using non-string variables inside strings like this? (Ideally without needing extra lines of code to assert each sub-string individually.)
I guess it treats them as fine because some types like bools, numbers and misc objects have a .toString() method...
But especially in the case of undefined (which doesn't actually have a .toString() method), this is quite commonly a bug: the only time you really want to see the string "undefined" inside another string is for debugging. There are a lot of these bugs out in the wild, where end users unintentionally see things like "hello undefined".

Personally I would handle this by making the string template into a function. That way you can specify that the arguments must be strings.
const createMessageString = (first: string, second: string): string => {
return `Some readable string info should be here: ${first} ${second}`;
}
const message_string = createMessageString( maybe_a_string, false_or_string );
// will give an error unless types are refined

Vote for https://github.com/microsoft/TypeScript/issues/30239 [Restrict template literal interpolation expressions to strings]
Additionally, you can try workarounds from the issue comments.
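One workaround along these lines is a tagged template whose substitutions are typed as string. With strictNullChecks enabled, a value of type `undefined | string` or `false | string` is then rejected at compile time. This is a minimal sketch: the tag name str is hypothetical and is not part of TypeScript.

```typescript
// Hypothetical tag: every ${...} value must already be a string.
function str(strings: TemplateStringsArray, ...values: string[]): string {
  // A template always has exactly one more literal chunk than substitutions.
  let out = strings[0];
  for (let i = 0; i < values.length; i++) {
    out += values[i] + strings[i + 1];
  }
  return out;
}

const who: string = "world";
const greeting = str`hello ${who}`; // "hello world"

// With the variables from the question, this line would fail to compile:
// const message_string = str`Some readable string info should be here: ${maybe_a_string} ${false_or_string}`;
```

Unlike the wrapper-function approach, the tag keeps the template literal syntax at the call site. You only have to remember to prefix templates with `str`.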

NodeJS why is object[0] returning '{' instead of the first property from this json object?

So I have to go through a bunch of code to get some data from an iframe. The iframe has a lot of data, but in there is a property called '_name'. Its first key is 'extension_id', and that key's value is a big long string. The JSON object is enclosed in apostrophes. I have tried removing the apostrophes, but instead of 'extension_id_output' I still get a single curly bracket. The JSON object looks something like this:
Frame {
...
...
_name: '{"extension_id":"a big huge string that I need"} "a bunch of other stuff":"this is a valid json object as confirmed by jsonlint", "globalOptions":{"crev":"1.2.50"}}}'
}
It's a whole big ugly paragraph, but I really just need the extension_id. This is the code I'm currently using, after attempt 100 or so:
var frames = await page.frames();
// I'm using puppeteer for this part but I don't think that's relevant overall.
var thing = frames[1]._name;
console.log(frames[1])
// console.log(thing)
thing.replace(/'/g, '"')
// this is to remove the apostrophes from the outside of the object. I thought that would change things before. it does not. still outputs a single {
JSON.parse(thing)
console.log(thing[0])
Instead of getting the big huge string I need (whatever is in extension_id), I get a {. That's it. I think that's because the whole object starts with a curly bracket; console.log(thing[2]) printing e seems to confirm this. So what's going on? jsonlint says this is a valid JSON object, but maybe it's just a big string and I should be doing some kind of split to grab what's between the first : and the first ,. I'm really not sure.

For two reasons:
object[0] doesn't return the value of an object's "first property". It returns the value of the property named "0", if any (your object probably doesn't have one).
It's JSON, and when you're dealing with JSON in JavaScript code, you are by definition dealing with a string, so thing[0] is just its first character. If you want the object that the JSON describes, parse it and keep the result. Calling JSON.parse(thing) on its own line returns the object but doesn't change thing.
Here's an example of parsing it and getting the value of the extension_id property from it:
const parsed = JSON.parse(frames[1]._name);
console.log(parsed.extension_id); // The ID
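Here is a minimal sketch of the difference between indexing the JSON string and reading a property of the parsed object. The payload is a made-up, shortened stand-in for frames[1]._name, and the FramePayload interface is hypothetical.

```typescript
// Made-up stand-in for frames[1]._name: a JSON *string*, not an object.
const rawName: string = '{"extension_id":"abc123","globalOptions":{"crev":"1.2.50"}}';

// Indexing a string gives you a single character.
const firstChar = rawName[0]; // "{"

// Hypothetical shape of the parsed data, used only for type checking.
interface FramePayload {
  extension_id: string;
  globalOptions: { crev: string };
}

// Keep the return value: JSON.parse does not modify rawName in place.
const parsed = JSON.parse(rawName) as FramePayload;
const extensionId = parsed.extension_id; // "abc123"

// If you really want the "first" property, Object.keys lists non-numeric keys in insertion order.
const firstKey = Object.keys(parsed)[0]; // "extension_id"
```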

Treat all cells as strings while using the Apache POI XSSF API

I'm using the Apache POI framework for parsing large Excel spreadsheets. I'm using this example code as a guide: XLSX2CSV.java
I'm finding that cells containing only numbers are implicitly treated as numeric fields, whereas I want them always treated as strings. That way, rather than 1.00E+13 (which I'm currently getting), I'd get the original string value: 10020300000000.
The example code uses an XSSFSheetXMLHandler, which is passed an instance of DataFormatter. Is there a way to use that DataFormatter to treat all cells as strings?
Or, as an alternative: my implementation of the SheetContentsHandler.cell method receives a string parameter, cellReference. Is there a way to convert a cellReference into an index, so that I can use the SharedStringsTable.getEntryAt(int idx) method to read directly from the strings table?
To reproduce the issue, just run the sample code on an xlsx file of your choice with a number like the one in my example above.
UPDATE: It turns out that the string value I get seems to match what you would see in Excel. So I guess that's going to be "good enough" generally. I'd expect the data I'm sent to "look right" and therefore it'll get parsed correctly. However, I'm sure there will be mistakes and in those cases it'd be nice if I could get at the raw string value using the streaming API.

To resolve this issue, I created my own class based on XSSFSheetXMLHandler.
I copied that class, renamed it and then in the endElement method I changed this part of the code which is formatting the raw string:
case NUMBER:
String n = value.toString();
if (this.formatString != null && n.length() > 0)
thisStr = formatter.formatRawCellContents(Double.parseDouble(n), this.formatIndex, this.formatString);
else
thisStr = n;
break;
I changed it so that it would not format the raw string:
case NUMBER:
thisStr = value.toString();
break;
Now every number in my spreadsheet has its raw value returned rather than a formatted version.

Processing Split (server)

I am making a two-player game. When I get information from the server, it's in the format "topic;arg1;arg2", so if I am sending positions it's "PlayerPos;x;y".
I then use the split method with the character ';'.
But the comparison never succeeds. I even tried drawing it on screen: "PlayerPos" is displayed correctly, but the if condition still never matches it.
This is how I send info on server:
server.write("PlayerPos;"+player1.x+";"+player1.y);
And how I accept it on client:
String Get=client.readString();
String [] Getted = split(Get, ';');
fill(0);
text(Get,20,20);
text(Getted[0],20,40);
if(Getted[0]=="PlayerPos"){
text("HERE",20,100);
player1.x=parseInt(Getted[1]);
player1.y=parseInt(Getted[2]);
}
It draws "PlayerPos;200;200" on screen, and even "PlayerPos" below it. But it never draws "HERE", so it never enters the if.
Where is my mistake?

Don't use == when comparing String values. Use the equals() method instead:
if(Getted[0].equals("PlayerPos")){
From the Processing reference:
To compare the contents of two Strings, use the equals() method, as in if (a.equals(b)), instead of if (a == b). A String is an Object, so comparing them with the == operator only compares whether both Strings are stored in the same memory location. Using the equals() method will ensure that the actual contents are compared. (The troubleshooting reference has a longer explanation.)

Is there standard method for managing camel cased strings in groovy?

For example, Groovy converts the getSomeProperty() method to someProperty.
I need the same for my string: prefixMyString converted to myString.
Is there standard way to do so?

Groovy doesn't actually convert getSomeProperty() into someProperty. It only converts the other way, turning someProperty into getSomeProperty().
It does this using the capitalize(String property) method on org.codehaus.groovy.runtime.MetaClassHelper. You can run this in the console to see it work:
org.codehaus.groovy.runtime.MetaClassHelper.capitalize('fredFlintstone')
// outputs 'FredFlintstone'
The full conversion, including adding set, get, or is, can be found in the class groovy.lang.MetaProperty, under the methods getGetterName and getSetterName.
To convert the other way, you'll have to write your own code. However, that's relatively simple:
def convertName(String fullName) {
def out = fullName.replaceAll(/^prefix/, '')
out[0].toLowerCase() + out[1..-1]
}
println convertName('prefixMyString') // outputs: myString
println convertName('prefixMyOTHERString') // outputs: myOTHERString
Just change the prefix to meet your needs. Note that it's a regex, so any special characters in it have to be escaped.
EDIT: I made a mistake. There actually is a built-in Java method to decapitalize, so you can use this:
def convertName(String fullName) {
java.beans.Introspector.decapitalize(fullName.replaceAll(/^prefix/, ''))
}
It works nearly the same, but uses the built-in Java class to handle the decapitalization. This method treats uppercase characters a little differently: if the first two characters are both uppercase, it returns the name unchanged, so prefixUPPERCASETest returns UPPERCASETest.
