Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
Is there a free tool on Linux for converting ASCII text to Unicode while keeping the original text formatting?
iconv can convert between different encodings, if that's what you mean.
Sure, it's called cat:
cat myasciifile > myunicodefile
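Now myunicodefile consists of Unicode code points, encoded in the popular UTF-8 encoding. This works because every ASCII byte sequence is already valid UTF-8, so a plain copy is all the conversion needed. Note that this assumes that myasciifile consists only of legal ASCII characters (i.e. in the range 0-127).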
An alternative to this is cp.
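The caveat about legal ASCII characters can be checked before copying. The following is only a minimal Python sketch of that check; the function names is_pure_ascii and ascii_to_utf8 are made up for illustration and are not part of any tool mentioned above.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the ASCII range 0-127."""
    return all(b < 128 for b in data)


def ascii_to_utf8(data: bytes) -> bytes:
    """Decode as ASCII and re-encode as UTF-8.

    Raises UnicodeDecodeError if a byte >= 128 is present, which is
    exactly the case where a plain copy would NOT yield valid input.
    """
    return data.decode("ascii").encode("utf-8")


if __name__ == "__main__":
    sample = b"plain text\twith tabs\nand a second line\n"
    # For pure ASCII input the bytes come back unchanged, formatting included.
    print(is_pure_ascii(sample), ascii_to_utf8(sample) == sample)
```

If the check fails, the file is in some other encoding and a real conversion (for example with iconv, naming the actual source encoding) would be needed instead.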
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 8 years ago.
I've just discovered that many of the character encodings have non-printable characters designed to separate different pieces of information, e.g., ASCII character 31 is the "unit separator". It seems that if I could use one of these characters as the separator in a "comma/character separated value" (CSV), I could avoid problems in escaping the values that could occur in text, e.g., commas.
Is it possible to make MS Excel use the "unit separator" instead of a comma? If so, how can I do that?
EDIT: I've re-entered this question in SuperUser, where it's been answered.
You can change the list separator for a "CSV" system-wide by going into Control Panel, Region and Language, Formats, Advanced (on Windows 7), and changing the list separator. I changed it to a backtick and then saved a spreadsheet as a CSV, and it used a backtick as the delimiter. You can try pasting the ASCII character in from the character map or using the Alt+# code.
(If you're trying to open a file, the Import Wizard should let you set the delimiter that's being used in the file.)
It may be simpler to save as a tab-separated document, unless tabs are appearing in your text.
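Outside Excel, the idea from the question (ASCII 31 as the field delimiter) is easy to try. Here is a rough sketch using Python's standard csv module; the helper names write_rows and read_rows and the sample data are invented for illustration, and it says nothing about whether Excel itself will accept such a file.

```python
import csv
import io

UNIT_SEP = "\x1f"  # ASCII 31, the "unit separator"


def write_rows(rows):
    """Serialize rows using the unit separator as the field delimiter."""
    buf = io.StringIO(newline="")
    csv.writer(buf, delimiter=UNIT_SEP).writerows(rows)
    return buf.getvalue()


def read_rows(text):
    """Parse text produced by write_rows back into lists of strings."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=UNIT_SEP))


if __name__ == "__main__":
    rows = [["Smith, John", "42"], ["a,b,c", "x"]]
    text = write_rows(rows)
    # Commas inside values need no quoting because they are not the delimiter.
    print(repr(text))
    print(read_rows(text) == rows)
```

Records are still separated by line endings here, so values containing newlines would still be quoted by the csv module.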
Closed 10 years ago.
I have a text file that contains lots of data arranged as strings, one per line. I am facing an error in another program that uses this file. It seems there is a non-alphanumeric character in it. How can I search for it? How can I remove it if found?
EDIT: including spaces (i.e., a line that is only white space).
Try this command:
cat bigfile.txt| od -cx | less
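Unprintable characters will show up as octal escapes of the form "\ooo" (a few common ones appear as "\n", "\t" and so on). Printable characters will be displayed as ASCII text. All characters, printable or not, will also be displayed as hex, and you can see the offsets in the file. Note that -x groups the bytes into two-byte words in the machine's byte order, so on little-endian systems each pair of bytes appears swapped.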
Once you know 1) what the offending characters are, or 2) where they're located, you can develop a strategy for eliminating them. Perhaps all you need is a simple "sed" script.
PS:
"vi -b" would probably also work, even for what might seem gargantuan files. IMHO...
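As an alternative to reading the dump by eye, the search and the clean-up can be scripted. The sketch below is only one possible approach, written in Python; find_suspect_chars and clean are hypothetical names, and it assumes "alphanumeric" means whatever str.isalnum() accepts (which includes non-ASCII letters and digits).

```python
def find_suspect_chars(lines, allowed=""):
    """Return (line_no, column, char) for every non-alphanumeric character.

    Blank or whitespace-only lines are reported as (line_no, 0, "").
    Characters listed in `allowed` are not reported.
    """
    hits = []
    for line_no, line in enumerate(lines, start=1):
        body = line.rstrip("\n")
        if body.strip() == "":
            hits.append((line_no, 0, ""))
            continue
        for col, ch in enumerate(body, start=1):
            if not ch.isalnum() and ch not in allowed:
                hits.append((line_no, col, ch))
    return hits


def clean(lines, allowed=""):
    """Drop offending characters and any line left empty afterwards."""
    out = []
    for line in lines:
        kept = "".join(
            ch for ch in line.rstrip("\n") if ch.isalnum() or ch in allowed
        )
        if kept:
            out.append(kept + "\n")
    return out


if __name__ == "__main__":
    sample = ["abc123\n", "de\x07f\n", "   \n"]
    for line_no, col, ch in find_suspect_chars(sample):
        print(line_no, col, repr(ch))
    print(clean(sample))
```

For a real file, the lines could come from open("bigfile.txt", encoding="utf-8", errors="replace"); the encoding is a guess and should match the actual file.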
Closed 10 years ago.
Does anyone know of a way to view a file in hex on Linux so that the view is updated when the file on disk changes?
You can use vim. To switch into hex mode type:
:%!xxd
To exit from hex mode hit escape again and type:
:%!xxd -r
Also check out "How to edit binary files on Unix systems".
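The vim approach shows a snapshot, so it does not refresh by itself when the file changes. One way to get the auto-updating behaviour asked for is a small polling script. This is just a sketch in Python under simple assumptions: hexdump and watch are illustrative names, the whole file is re-read on each change, and the screen is cleared with ANSI escape codes.

```python
import os
import time


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex bytes, and printable ASCII columns."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(width * 3 - 1)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}  {text_part}")
    return "\n".join(lines)


def watch(path: str, interval: float = 1.0) -> None:
    """Redraw the hex dump whenever the file's modification time changes."""
    last_mtime = None
    while True:
        mtime = os.stat(path).st_mtime_ns
        if mtime != last_mtime:
            last_mtime = mtime
            with open(path, "rb") as f:
                # ANSI codes: clear the screen and move the cursor home.
                print("\033[2J\033[H" + hexdump(f.read()), flush=True)
        time.sleep(interval)
```

Calling watch("somefile.bin") runs until interrupted with Ctrl-C; for large files it would be better to dump only a window of the file rather than all of it.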
Closed 10 years ago.
Okay, I am not sure if I am phrasing this correctly, but I am given a URL in the following form:
http://zh.wikipedia.org/wiki/%E5%A4%A9%E6%96%87%E5%AD%B8
but I would like to convert it into Chinese characters like so:
http://zh.wikipedia.org/wiki/天文學
What is a good language to batch process that in? Are there existing functions to do the conversion, or do I have to rely on string manipulation? It seems that the conversion requires some sort of call to a lookup table.
I know that they are the same thing but are shown differently. I would like it in readable characters, not with %-escapes.
In JavaScript
alert(decodeURI("http://zh.wikipedia.org/wiki/%E5%A4%A9%E6%96%87%E5%AD%B8"))
See http://jsfiddle.net/rtoal/uv2Xy/
For other languages, search the web for "url decoding" (or uri decoding).
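For batch processing, Python is one option among many; its standard library has urllib.parse.unquote, which undoes percent-encoding and decodes the bytes as UTF-8 by default. A minimal sketch follows (the helper name decode_urls is made up here):

```python
from urllib.parse import unquote


def decode_urls(urls):
    """Percent-decode each URL, interpreting the escaped bytes as UTF-8."""
    return [unquote(u) for u in urls]


if __name__ == "__main__":
    encoded = ["http://zh.wikipedia.org/wiki/%E5%A4%A9%E6%96%87%E5%AD%B8"]
    for url in decode_urls(encoded):
        print(url)  # prints http://zh.wikipedia.org/wiki/天文學
```

No lookup table is involved: each %XX is one byte in hex, and the resulting byte sequence is simply decoded as UTF-8.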
Closed 9 years ago.
enscript doesn't support utf-8 and the only other suggestion I've seen is to use lpr:
lpr -o document-format=text/utf8 file_to_print
but that gives an "Unsupported format" error.
(Ubuntu 9.04 / GNOME Terminal 2.26.0)
What about converting to PostScript and then printing the PS directly?
You can use paps to convert UTF-8 input to PostScript. A bit crude, but it seems to get the job done, at least for Latin-1 glyphs.