recognising encodings in Excel - excel

It is my understanding that txt files do not have encoding information stored so text editors simply make educated guesses about encoding of a given text file and then display the file on screen using that guessed encoding. If the editor guessed right you get your text on the screen, if the editor guessed wrong, then you (sometimes) get gibberish. Am I getting this right so far?
Now on to my problem. I have my bank statements in a csv file. When I open it in MS Excel 14 (MS Office 2010), it recognises the encoding and displays the problematic work as "obračun". Great. When I open the file in Emacs 24.3.1, it fails to recognise the correct encoding and displays the problematic word as "obra鑾n". Not so great.
My question is: how do I ask Excel which encoding the file is in? So I can tell that to Emacs since Excel obviously guessed correctly.
Thanks.

This could be a possible answer: http://metty-mathews.blogspot.si/2013/08/excel2013-character-encoding.html
After I opened ‘Advanced’ – ‘Web Options’ – ‘Encoding’, it said "Central European (Windows)" in "Save this document as:" field. It turns out that's Microsoft's name for Windows-1250 encoding and it turns out my file was indeed encoded with this encoding.
Is this just pure luck or does this field really show in which encoding Excel is displaying text - that I do not know.

Related

Unable to paste text in readable format from a PDF

I have a PDF document with the following sample text (screenshot) -
But when I copy and paste it to either word or other text editors all I see is the weird characters :





I am not quite sure why does it giving me weird square boxes instead of pasting the clear human-readable letters (just like the screenshot). Can someone help me how can I get rid of this issue ? Or at least what shall I do to identify the root cause of this strange issue ?
================== Workaround found ==================
I tried converting the document's corrupted unicode to a standard ANSCI unicode formats. But most of the online services couldn't recognize these garbage/weird characters.
This issue could be resolved by some programming, but I don't want to invest time with the programming approach and preferred on the fly approach.
Finally, as suggested by the user 'mkl', converting this document by using the OCR services like "Sedja"/ "Adobe OCR" resolved by issue.

Microsof Excel 2016 Unicode (UTF-8) does not work

I searched all around internet how to save CVS file as Unicode (UTF-8), but it still does not work, whenever i save, and open the file, there is ????? instead of letter that are UTF-8.
Has anyone ever had this issue? how can i solve this?
This has been annoying short coming of Excel for a long time.
A way to work around this issue, is to do the following:
Save as... Unicode text (*.txt). Make sure to keep the extension as txt (or at least not csv). It will be saved with tabs instead of commas separating the columns.
Open the document. You will be prompted with an import wizard, like so:
For File origin, choose 65001: Unicode (UTF-8)
For the rest of the options, choose the common sense options.
You will have your document back, ready to edit, with the proper unicode text intact.

Corrupt Text File read/write/open

I have a large text file that I take notes in; Recently, after saving it, it won't open and gives following error. I tried a few things on web that didn't work---opening in different encoding format, etc. Nothing worked. Any idea how I can open it again? Is there a language I can use from bash? I'm very familiar with PHP. Any ideas? Different text editor?
Error:
"The document “ToDo.txt” could not be opened. Text encoding Unicode (UTF-8) isn’t applicable."
"The file may have been saved using a different text encoding, or it may not be a text file."
cat the file from the CLI and make sure your data is still there. Then you could simply copy and paste the output into a new file and hopefully get rid of whatever weird encodings are causing that text editor to not read the file.

Cannot write british pound or euro symbols to CSV file - Nodejs

I'm writing a CSV file which contains text with british pound and euro symbols, however when I opened the file in Excel, I see some rather odd behavior. I see some weird A-looking symbol before the british pound, and quotes instead of the euro symbol. I figured it's probably because Excel doesn't like a file that's UTF8 encoded.
fs.writeFileAsync("the-file.csv", text-containing-foreing-currency, "utf8");
Does anyone know a way to get around this while creating the file? I don't want the users to have to do anything with excel after downloading the file, I just want them to be able to open the file and see the right symbols.
There shouldn't be any problem with node writing the symbols to the file, if you open it with a text editor you should see the correct characters.
The problem is with excel opening UTF8 csv files. By default it assumes ANSI encoding, so if the file is in UTF8, it scrambles the characters. You can open the file correctly with the text import wizard.
In general this is a limitation of excel. The best workaround for you will depend on your OS and Excel version. This is a heavily discussed topic, here are some good reads:
Is it possible to force Excel recognize UTF-8 CSV files automatically?
Which encoding opens CSV files correctly with Excel on both Mac and Windows?

How to mannually specify Byte Order Mark in CSV

I have a CSV that is encoded in Unicode, however lacks a byte order mark at the start. As such Excel (2013) opens without encoding correctly (i think it assumes ASCII if no BOM specified...), meaning that certain characters are displayed incorectly.
From reading around i have read that a BOM of "\uFEFF" should be entered at the start of the CSV file. I have tried opening in txt editor and adding the characters e.g.
\uFEFF
r1test 1, r1text2, r1text3
r2test 1, r2text2, r2text3
However, this does not solve the problem - the characters "\uFEFF" show up on the first row when I open in excel, rather than it beign interpreted as a BOM. I am not sure what I am doing wrong, and the format of how the text should be specified such that it is interpreted as a BOM, rather than text in the the first of the data
I have only very limited experience using CSV, and only just heard of a BOM... and thus I could be implementing this completely wrong!
(for reference, i know that I could specify the encoding if i use the import data option within excel... however I really want to work out how to get it correctly specified in advance such that I can just open the csv... I have several thousand of these files that I am creating and exporting - once I know how to do this 'manually' [i.e. by adding some text at start of a the file], I can configure to automatically do in Python).
Thanks in advance
For someone else wanting to tell Excel to add a BOM: See if you can "Save as Unicode Text".
source

Resources