How to convert saved text file encoding to UTF8? - c#-4.0

Recently I saved a text file on my computer, but when I opened it again I saw strings like:
"˜ÌÇí ÍÑÝã ÚÌíÈå¿"
Now I want to know: is it possible to convert it back to the original text (UTF-8)?
I tried this code, but it doesn't work:
string tempStr = "˜ÌÇí ÍÑÝã ÚÌíÈå¿";
Encoding ANSI = Encoding.GetEncoding(1256);
// Wrong direction: this encodes the already-garbled characters into
// CP-1256 bytes instead of recovering the file's original bytes.
byte[] ansiBytes = ANSI.GetBytes(tempStr);
byte[] utf8Bytes = Encoding.Convert(ANSI, Encoding.UTF8, ansiBytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);

You can use something like:
string str = Encoding.GetEncoding(1256).GetString(Encoding.GetEncoding("iso-8859-1").GetBytes(tempStr))
The string wasn't really decoded; its bytes were simply "enlarged" to char, with something like:
byte[] bytes = ...
char[] chars = new char[bytes.Length];
for (int i = 0; i < bytes.Length; i++)
{
    chars[i] = bytes[i];
}
string str = new string(chars);
Now... this transformation is the same one done by the codepage ISO-8859-1. So I could simply have done the reverse by hand, or I could have let that codepage do it for me; I chose the latter:
Encoding.GetEncoding("iso-8859-1").GetBytes(tempStr)
This gave me the original byte[].
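Doing that reverse step by hand, without the codepage, would look something like this (a sketch equivalent to the ISO-8859-1 trick, valid because every char in the garbled string is <= 0xFF):
byte[] recovered = new byte[tempStr.Length];
for (int i = 0; i < tempStr.Length; i++)
{
    recovered[i] = (byte)tempStr[i]; // narrow each char back to its original byte
}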
Then I did some tests, and it seems the text wasn't UTF-8 in the beginning; it was in codepage 1256, which is an Arabic codepage. So I decoded the recovered bytes with:
string str = Encoding.GetEncoding(1256).GetString(...);
The only oddity is that the leading ˜ doesn't seem to be part of the original string.
There is another possibility:
string str = Encoding.GetEncoding(1256).GetString(Encoding.GetEncoding(1252).GetBytes(tempStr));
Codepage 1252 is the one used in the USA and in a big part of Europe; if you have Windows configured for English, there is a good chance it uses 1252 as the default codepage. The result is slightly different from using ISO-8859-1.
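Putting it all together, here is a minimal, self-contained sketch of the whole repair (assuming the file really contained CP-1256 text that was mis-decoded one byte per character):
using System;
using System.Text;

class MojibakeRepair
{
    static void Main()
    {
        // The garbled text, as produced by mis-decoding CP-1256 bytes.
        string tempStr = "˜ÌÇí ÍÑÝã ÚÌíÈå¿";

        // Step 1: ISO-8859-1 maps chars 0x00-0xFF straight back to bytes,
        // recovering the file's original raw bytes.
        byte[] raw = Encoding.GetEncoding("iso-8859-1").GetBytes(tempStr);

        // Step 2: decode those bytes with the codepage they were actually
        // written in (1256, Arabic).
        string repaired = Encoding.GetEncoding(1256).GetString(raw);
        Console.WriteLine(repaired);
    }
}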

Related

Writing bytes to a PDF

I am getting a response from an API in the form of bytes.
let a = "JSHDHHFHFHHFKFLLFLDMDMDMDMMSKKW==";
I want to write this into a PDF file.
The approach I have taken so far is to convert it to binary using the atob library, then convert that to a Uint8Array and write it to a file with fs.createWriteStream. When the file write completes, it gives me output of an unidentified type:
%
1 0 obj
<</Filter/FlateDecode/First 141/N 20/Length 848/Type/ObjStm>>
stream
[... 848 bytes of FlateDecode-compressed binary stream data, not reproducible as text ...]
endstream
endobj
22 0 obj
When I try the response in an online editor it gives me the right result.
The code I have used so far:
let encodedPDF = JSON.parse(d).Resp_Policy_Document.PDF_BYTES;
var bin = atob(encodedPDF);
var binaryLen = bin.length;
var bytes = new Uint8Array(binaryLen);
for (var i = 0; i < binaryLen; i++)
{
    var ascii = bin.charCodeAt(i);
    bytes[i] = ascii;
}
let writer = fs.createWriteStream('Last.pdf');
writer.write(bin); // note: this writes the decoded *string*, not the bytes array
The data you get is Base64 encoded. That's a pretty common way for APIs to pass binary data. The giveaways? The equal signs at the end, and the use of ASCII uppercase and lowercase letters, numbers, +, and /.
So you need to decode it that way. Use something like this:
const pdfBinary = Buffer.from(a, 'base64');
The contents of this buffer are, I would guess, a PDF document. You should write it directly to a file without trying to convert it to a text string.
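For comparison with the main question's language, the same two steps in C# are just as short; a minimal sketch, assuming the Base64 text has already been pulled out of the API response:
using System;
using System.IO;

class SavePdf
{
    static void Main()
    {
        string encodedPDF = "..."; // the Base64 text from the API
        byte[] pdfBytes = Convert.FromBase64String(encodedPDF);
        File.WriteAllBytes("Last.pdf", pdfBytes); // write the bytes, never a string
    }
}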

How to get the right values when I convert hex to ASCII characters

I'm trying to convert a hex file to text using JavaScript (Node.js).
function hex_to_ascii(str1) {
    var hex = str1.toString();
    var str = '';
    for (var n = 0; n < hex.length; n += 2) {
        str += String.fromCharCode(parseInt(hex.substr(n, 2), 16));
    }
    return str;
}
I have a problem concerning the extended ASCII characters: for example, when I try to convert 93 I get “ instead of ô, and when I convert FF I get ÿ instead of a (nbsp) space.
I want to get the same extended charaters as this table: https://www.rapidtables.com/code/text/ascii-table.html
This problem is slightly more complex than it seems at first, since you need to specify an encoding when converting from extended ASCII to a string: for example Windows-1252, ISO-8859-1, etc. Since you wish to use the linked table, I'm assuming you want the CP437 encoding.
To convert a buffer to string you need a module that will do this for you, converting from a buffer (in a given encoding) to string is not trivial unless the buffer is in a natively supported node.js encoding, e.g. UTF-8, ASCII (7-bit only!), Latin1 etc.
I would suggest using the iconv-lite package, this will convert many types of encoding. Once this is installed the code should look as follows (this takes each character from 0x00 to 0xFF and prints the encoded character):
const iconv = require('iconv-lite');

function hex_to_ascii(hexData, encoding) {
    const buffer = Buffer.from(hexData, "hex");
    return iconv.decode(buffer, encoding);
}

const testInputs = [...Array(256).keys()];
const encoding = "CP437";
console.log("Decimal\tHex\tCharacter");
for (let input of testInputs) {
    // Pad to two digits: Buffer.from ignores a trailing odd hex digit,
    // so an unpadded "a" would otherwise decode to an empty buffer.
    const hex = input.toString(16).padStart(2, "0");
    console.log([input, hex, hex_to_ascii(hex, encoding)].join("\t"));
}

node.js buffer toString defaults to utf8 but I want no encoding

var iconv = require('iconv').Iconv;
var UtfToEuckr = new iconv('utf-8', 'euckr');
var result = UtfToEuckr.convert('한글');
var str = result.toString(); // <- converts using the default, UTF-8
str ends up a UTF-8 string.
I want no encoding applied when converting to a string; the result should be an EUC-KR string.
Please tell me how.
Thanks for reading.
You always have some encoding; you should try either "ascii" or "binary" (whichever you mean in your situation).
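The same point holds in any language: a buffer only becomes a string through some encoding. A hedged C# illustration of the pitfall (the encoding names below are the standard .NET ones):
using System.Text;

class BufferToString
{
    static void Main()
    {
        // EUC-KR bytes, analogous to the converted buffer above.
        byte[] euckr = Encoding.GetEncoding("euc-kr").GetBytes("한글");

        // Decoding them as UTF-8 (the usual default) produces mojibake.
        string wrong = Encoding.UTF8.GetString(euckr);

        // A 1:1 byte-to-char mirror (the counterpart of Node's "binary",
        // i.e. latin1) preserves the byte values, but it is still an encoding.
        string mirror = Encoding.GetEncoding("iso-8859-1").GetString(euckr);
    }
}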

Converting hex values within a string to their string equivalent

So, I've been looking around SO and a few other sites for a solid method of converting hex values embedded in normal strings (e.g. the '/' characters appear in the string as '\x2F'), and wasn't able to find a solution that specifically fit my needs.
What I ended up doing was writing a bit of code myself to handle this:
for (int i = 0; i < 128; i++)
{
    string hexString = i.ToString("X").PadLeft(2, '0');
    string searchString = @"\x" + hexString;
    if (response.Contains(searchString))
    {
        int charValue = Convert.ToInt32(hexString, 16);
        string character = Char.ConvertFromUtf32(charValue);
        response = response.Replace(searchString, character);
    }
}
My question is this:
Is this a good way of going about it?
Are there any particular drawbacks to using this?
The purpose of this code is to take a string like:
"previous content...http:\x2F\x2Fwww.google.com...after content"
and convert it to:
"previous content...http://www.google.com...after content"

Converting Text to HTML In D

I'm trying to figure out the best way of encoding text (either 8-bit ubyte[] or string) into its HTML counterpart.
My proposal so far is to use a lookup table to map the 8-bit characters that have special meaning in HTML:
string[256] lutLatin1ToHTML;
lutLatin1ToHTML[0x22] = "&quot;";
lutLatin1ToHTML[0x26] = "&amp;";
...
and to apply it using the function
pure string toHTML(in string src, ref in string[256] lut) {
    return src.map!(a => (lut[a] ? lut[a] : new string(a))).reduce!((a, b) => a ~ b);
}
This almost works, except that I don't know how to create a string from a ubyte (the no-translation case).
I tried
writeln(new string('a'));
but it prints garbage and I don't know why.
For more details on HTML encoding see https://en.wikipedia.org/wiki/Character_entity_reference
You can make a string from a ubyte most easily by doing "" ~ b, for example:
ubyte b = 65;
string a = "" ~ b;
writeln(a); // prints A
BTW, if you want to do a lot of html stuff, my dom.d and characterencodings.d might be useful:
https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff
It has an HTML parser, DOM manipulation functions similar to JavaScript (e.g. ele.querySelector(), getElementById, ele.innerHTML, ele.innerText, etc.), conversion from a few different character encodings, including latin1, and it outputs ASCII-safe HTML with all special and Unicode characters properly encoded.
assert(htmlEntitiesEncode("foo < bar") == "foo &lt; bar");
stuff like that.
In this case Adam's solution works just fine, of course. (It takes advantage of the fact that ubyte is implicitly convertible to char, which is then appended to the immutable(char)[] array for which string is an alias.)
In general the safe way of converting types is to use std.conv.
import std.stdio, std.conv;

void main() {
    // UTF-8
    char cc = 'a';
    string s1 = text(cc);
    string s2 = to!string(cc);
    writefln("%c %s %s", cc, s1, s2);

    // UTF-16
    wchar wc = 'a';
    wstring s3 = wtext(wc);
    wstring s4 = to!wstring(wc);
    writefln("%c %s %s", wc, s3, s4);

    // UTF-32
    dchar dc = 'a';
    dstring s5 = dtext(dc);
    dstring s6 = to!dstring(dc);
    writefln("%c %s %s", dc, s5, s6);

    ubyte b = 65;
    string a = to!string(b); // note: yields "65" (the decimal digits), not "A"
}
NB. text() is actually intended for processing multiple arguments, but is conveniently short.
