Translating text in Microsoft translator removes // - python-3.x

I am trying to translate a code file with Microsoft Translator and am running into the issues below.
Translating the text below
// 行番号削除正常終
Gives output
Line number deletion successful end
where the // (double slash) has been removed.
Translating the text below
System.out.println("【○】COBOL
Gives output
System.out.println_○○ COBOL
where the special characters are removed or replaced with others.
I am reading my file using CP932 encoding and writing it using UTF-8.
Please let me know if you have any ideas on how to resolve this issue.
I have tried Google Translate, and it works well with the same encodings.
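One possible workaround, sketched here under assumptions rather than taken from the thread, is to keep the // marker away from the translator: split it off, translate only the comment text, and reattach it afterwards. In the sketch below, `translate` is a hypothetical callable standing in for whatever translation client is actually used, and the function names are illustrative only. It does not address the second case, where characters inside a code line are altered.

```python
import re
from typing import Callable

# Leading whitespace plus a "//" comment marker, captured so it can be restored.
COMMENT_PREFIX = re.compile(r"^(\s*//\s?)(.*)$")


def translate_comment_line(line: str, translate: Callable[[str], str]) -> str:
    """Translate only the text after a leading "//", keeping the marker itself."""
    match = COMMENT_PREFIX.match(line)
    if match is None:
        return line  # not a "//" comment: leave code lines untouched
    prefix, text = match.groups()
    return prefix + translate(text)


def transcode_file(src: str, dst: str, translate: Callable[[str], str]) -> None:
    """Read a CP932 file, translate its "//" comments, write the result as UTF-8."""
    with open(src, encoding="cp932") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            body = line.rstrip("\n")
            fout.write(translate_comment_line(body, translate) + "\n")
```

Because `translate` is passed in, the same wrapper can be tried with different translation services without changing how the comment marker is protected.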

Related

Unable to paste text in readable format from a PDF

I have a PDF document with some sample text (screenshot).
But when I copy and paste it into Word or other text editors, all I see are weird characters.
I am not sure why it gives me weird square boxes instead of pasting the clear, human-readable letters shown in the screenshot. Can someone help me get rid of this issue? Or at least tell me what I should do to identify the root cause?
================== Workaround found ==================
I tried converting the document's corrupted Unicode to a standard ANSI or Unicode format, but most of the online services couldn't recognize these garbage characters.
This issue could be resolved with some programming, but I didn't want to invest time in a programming approach and preferred an on-the-fly approach.
Finally, as suggested by the user 'mkl', converting this document using OCR services like "Sejda" or "Adobe OCR" resolved my issue.

Preset encoding for Search-in-Files Feature

I have a huge file dump to handle (7000+ files), all encoded in OEM-US (and I need them to remain OEM-US, or return to OEM-US when I'm done).
The search-in-files feature of Notepad++ would actually solve all my problems. (It's a single-use job. I don't want to bore you with the details, but it's about sanitizing old code which has partially been written in foreign languages like German or French, including their notorious characters like äöüèéàç.)
The thing is: most of the time, Notepad++ detects the wrong encoding, and different encodings for different files. Usually it detects ANSI or UTF-8, but sometimes it gets exotic and all of a sudden my files are supposed to be encoded in Shift-JIS or Big5, which messes up my search terms, as it sometimes turns different special characters into the same set of replacement characters.
So I'm looking for a way to either
a) tell Notepad++ which encoding to select for the "search in files" job I want to run,
b) convert all files to UTF-8, run the search-and-replace job there, and restore the encoding to OEM-US,
or
c) find different software to handle this issue for me.
Can someone help me?
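Option b) can also be scripted without Notepad++. A minimal sketch in Python, assuming that "OEM-US" means code page 437 (Python's cp437 codec), that every file in the dump really uses that encoding, and that a plain (non-regex) search term is enough; the function name and its parameters are illustrative:

```python
from pathlib import Path

OEM_US = "cp437"  # assumption: OEM-US corresponds to code page 437


def replace_in_oem_files(root: str, old: str, new: str, pattern: str = "*") -> int:
    """Replace `old` with `new` in every matching file under `root`, keeping CP437.

    Returns the number of files that were changed.
    """
    changed = 0
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        # Decode the raw bytes explicitly, so no encoding detection is involved.
        text = path.read_bytes().decode(OEM_US)
        updated = text.replace(old, new)
        if updated != text:
            # Raises UnicodeEncodeError if `new` contains a character CP437 lacks.
            path.write_bytes(updated.encode(OEM_US))
            changed += 1
    return changed
```

Reading and writing bytes rather than text keeps the original line endings intact, and files without a match are never rewritten. It is worth running this on a copy of the dump first.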

python reverse unicode text into readable

I believe I have a similar problem to "how to convert unicode text to utf8 text readable?", but I want a Python 3.7 solution to it.
I am a complete newbie. I have some experience with Python, so I am trying to use it to write a script that will convert a Unicode file back into the readable text it was before.
The file is a bookmark file that I recovered using EaseUS. When I opened it, it was written in Unicode, something like "&PŽ¾³kÊ
k-7ÄÜÅe–?XBdyÃ8߯r×»Êã¥bÏñ ¥»X§ÈÕÀ¬Zé‚1öÄEdýŽ‹€†.c"
whereas previously it said something like " "checksum": "112d56adbd0caa2b3693bb0442dd16ff",
"roots": {
"bookmark_bar": {
"children":"
FYI, when I click Save As for the Unicode bookmark file, the encoding shown is ANSI and not UTF-8, so maybe it was saved as ANSI. I might be waffling here, but I'm just trying to give you all the information you might need to help me.
I am a newbie who desperately needs help.
This text isn't "Unicode". It's simply gibberish.
This file has been corrupted -- it may have been overwritten with other data before you were able to recover it. It is unlikely to be recoverable.

How to manually specify Byte Order Mark in CSV

I have a CSV that is encoded in Unicode but lacks a byte order mark at the start. As a result, Excel (2013) opens it without decoding it correctly (I think it assumes ASCII if no BOM is specified...), meaning that certain characters are displayed incorrectly.
From reading around, I gather that a BOM of "\uFEFF" should be entered at the start of the CSV file. I have tried opening the file in a text editor and adding the characters, e.g.
\uFEFF
r1test 1, r1text2, r1text3
r2test 1, r2text2, r2text3
However, this does not solve the problem: the characters "\uFEFF" show up in the first row when I open the file in Excel, rather than being interpreted as a BOM. I am not sure what I am doing wrong, or how the text should be specified so that it is interpreted as a BOM rather than as text in the first row of the data.
I have only very limited experience with CSV, and have only just heard of a BOM, so I could be implementing this completely wrong!
(For reference, I know that I could specify the encoding if I use the import data option within Excel. However, I really want to work out how to get it correctly specified in advance so that I can just open the CSV. I have several thousand of these files that I am creating and exporting. Once I know how to do this 'manually' [i.e. by adding some text at the start of the file], I can configure Python to do it automatically.)
Thanks in advance
For someone else wanting to tell Excel to add a BOM: See if you can "Save as Unicode Text".
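Since the files are produced from Python anyway, the BOM can be written at export time. Typing the six characters \uFEFF into a text editor stores them as literal text; the BOM itself is the single character U+FEFF, which in UTF-8 is the three bytes EF BB BF. A minimal sketch, assuming the CSV is meant to be UTF-8 (the question only says "Unicode") and using an illustrative function name:

```python
import csv
from typing import Iterable, Sequence


def write_csv_with_bom(path: str, rows: Iterable[Sequence[str]]) -> None:
    """Write rows as a UTF-8 CSV that starts with a byte order mark."""
    # The "utf-8-sig" codec emits the bytes EF BB BF before the first row.
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        csv.writer(handle).writerows(rows)
```

For example, `write_csv_with_bom("out.csv", [["r1text1", "r1text2", "r1text3"]])` produces a file whose first three bytes are the BOM, followed by the row data.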

How to write excel file with special characters through Perl script?

I am writing an Excel file from Perl code. When I insert the data into an XML file and view it in any browser, I see the correct data with special characters, but when I write the same data to an Excel file, it shows garbage characters.
For example:
(word from XML file on browser) Gràcia - (word from Excel file) Grà cia
I am using 'Spreadsheet::XLSX' for reading Excel files and 'Excel::Writer::XLSX' for writing them.
I also need help finding the encoding format of Excel fields.
Do you have any idea? Thanks in advance.
This looks very much like a UTF-8 to ISO-8859-1 conversion going wrong: a string that contains UTF-8, but is not marked as being UTF-8, is being passed to $worksheet->write(). Since http://metacpan.org/pod/Excel::Writer::XLSX#UNICODE-IN-EXCEL claims to handle Unicode correctly, the problem seems to be with your input string, not the write method itself.
As you haven't posted any code or told us where your strings come from, I can't tell why the strings aren't marked correctly.
You can probably get away with
Encode::_utf8_on($str)
before passing your strings to $worksheet->write(), but this might just as well break other things, if not all of your strings are really utf-8. Basically the answer is "get the utf-8 flag on your strings right when you read them".
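The mechanism itself is language-independent and easy to reproduce. The sketch below uses Python rather than Perl, purely as an illustration of how UTF-8 bytes read as ISO-8859-1 produce the garbled word from the question; the function names are made up for this example:

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def undo_misread(garbled: str) -> str:
    """Reverse the mistake: recover the bytes, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


garbled = misread_utf8_as_latin1("Gràcia")
# "à" is the two UTF-8 bytes C3 A0, which ISO-8859-1 reads as "Ã" followed by
# a no-break space, so the word appears to have a gap in the middle.
restored = undo_misread(garbled)
```

Decoding the input once, with the right encoding, at the point where it is read avoids the need for any repair step afterwards.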
