getting UTF-8 text files to display on all browsers


I would like my Web page
http://www.gmarks.org/math_in_e-mail.txt
on my Apache 2.2.14 server to display correctly in all browsers on all platforms (or something approaching this). I have included the line
AddDefaultCharset utf-8
in the appropriate .htaccess file. I find, however, that on my own machine, running Ubuntu 10.04, the page displays exactly as I would like only in the Google Chrome browser. Problems in other browsers: in Opera the last two lines do not display; in Firefox the subscripted alephs are too small; in rekonq the last two lines display incorrectly, with various Fraktur characters repeated and others not displayed; in Midori the Opera and Firefox problems both occur; in Arora the Firefox and rekonq problems both occur; in Epiphany the Opera problem occurs.
Is there something else I could put in my .htaccess file, or some other configuration I might set up, to get that Web page to display correctly everywhere? I suppose I must rely on the font set each user has installed on his or her computer (obviously it defeats the purpose of the Web page to use something like GIF images). I find the differences among the browsers strange: does each browser include its own set of fonts in some configuration file, or do they access some directory containing fonts for the entire computer? (And is the answer to the last question OS-dependent?)
Further questions: would I do better to change the line in my .htaccess file to
AddCharset UTF-8 .txt
and is there a way I can make the .txt file display by default with an increased font size?

A browser will not know the text is UTF-8 encoded unless the text starts with a UTF-8 BOM (assuming the browser even looks for that) or the HTTP Content-Type header specifies UTF-8 as the charset, i.e. Content-Type: text/plain; charset=utf-8. If AddCharset tells Apache to generate that attribute for .txt files, then great.
There is no way to specify a font or font size for a .txt file by itself; you have to use HTML for that. You would have to write a server-side script that outputs an HTML wrapper around the .txt file content and sets the HTTP Content-Type header to text/html instead of text/plain.
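
A minimal sketch of such a server-side wrapper, in Python. The function name wrap_text_as_html, the font family, and the font size are illustrative assumptions, not something from the question or the answer; wiring it into Apache (for example as a CGI script) is left out.

```python
import html


def wrap_text_as_html(text, font_family="DejaVu Sans Mono", font_size="125%"):
    """Return (headers, body) for serving plain text inside a minimal HTML page.

    font_family and font_size are hypothetical defaults chosen for illustration.
    """
    page = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">\n'
        f"<style>pre {{ font-family: '{font_family}', monospace; "
        f"font-size: {font_size}; }}</style>\n"
        "</head><body>\n"
        # Escape &, < and > so the text is shown literally, not parsed as markup.
        f"<pre>{html.escape(text)}</pre>\n"
        "</body></html>\n"
    )
    # The header now announces HTML, and still names the encoding explicitly.
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return headers, page.encode("utf-8")
```

The browser still has to find an installed font that contains the characters; the wrapper only lets the page suggest one and set a size.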

Related

Web pages served by local IIS showing black diamonds with question marks

I'm having an issue in a .NET application where pages served by local IIS display random characters (mostly black diamonds with white question marks in them). This happens in Chrome, Firefox, and Edge. IE displays the pages correctly for some reason.
The same pages in production and in lower pre-prod environments work in all my browsers. This is strictly a local issue.
Here's what I've tried:
Deleted code and re-cloned (also tried switching branches)
Disabled all browser extensions
Ran in incognito mode
Rebooted (you never know)
Deleted temporary ASP.NET files
Looked for corrupt fonts on machine but didn't find any
Other Information:
Running IIS 10.0.17134.1
.NET MVC application with Knockout
I realize there are several other posts regarding black diamonds with question marks, but none of them seem to address my issue.
Please let me know if you need more information.
Thanks for your help!

You are in luck. The explicit purpose of � is to indicate that character encodings are being misused. When users see that, they'll know that we've messed up and lost some of their text data, and we'll know that, at one or more points, our processing and/or configuration is wrong.
(Fonts are not at issue [unless there is no font available to render �]. When there is no font available for a character, it's usually rendered as a white-filled rectangle.)
Character encoding fundamentals are simple: use a sufficient character set (say Unicode), pick an appropriate encoding (say UTF-8), encode text with it to obtain bytes, tell every program and person that gets the bytes that they represent text and which encoding is used. The encoding might be understood from a standard, convention, or specification.
Your editor does the actual encoding.
If the file is part of a project or similar system, a project file might store the intended encoding for all or each text file in the project. If your editor is an IDE, it should understand how the project does that.
Your compiler needs to know the encoding of each text file you give it. A project system would communicate what it knows.
HTML provides an optional way to communicate the encoding. Example: <meta charset="utf-8">. An HTML-aware editor should not allow this indicator to be different than the encoding it uses when saving the file. An HTML-aware editor might discover this indicator when opening the file and use the specified encoding to read the file.
HTTP uses another optional way: Content-Type response header. The web server emits this either statically or in conjunction with code that it runs, such as ASP.NET.
Web browsers use the HTTP way if given.
XHR (AJAX, etc) uses HTTP along with JavaScript processing. If needed the JavaScript processing should apply the HTTP and HTML rules, as appropriate. Note: If the content is JSON, the current RFC requires the encoding to be UTF-8.
No one or thing should have to guess.
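
As a small illustration of the encode/decode round trip described above, here is a Python sketch. The sample string and the choice of windows-1252 (cp1252) as the "wrong" encoding are assumptions made for the example only.

```python
def decode_as(data, encoding):
    """Decode bytes, substituting U+FFFD for anything invalid in that encoding."""
    return data.decode(encoding, errors="replace")


text = "café"

# Writer and reader agree on UTF-8: the text survives the round trip.
utf8_bytes = text.encode("utf-8")
same = decode_as(utf8_bytes, "utf-8")

# Writer used windows-1252 but the reader assumes UTF-8: the byte 0xE9 is not
# a complete UTF-8 sequence, so the decoder emits the replacement character.
cp1252_bytes = text.encode("cp1252")
broken = decode_as(cp1252_bytes, "utf-8")

print(repr(same))    # 'café'
print(repr(broken))  # 'caf\ufffd'
```

The black diamond with a question mark is how browsers commonly draw U+FFFD.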
Diagnostics
Which character encoding did you intend to use? This century, UTF-8 is so much the norm that if you choose to use a different one, you should have a good reason and document it (for others and your future self).
Compare the bytes in the file with the text you expect it to represent. Does it use the intended encoding? Use an editor or tool that shows bytes in hex.
As suggested by @snakecharmerb, what does the server send? Use a web browser's F12 network tab.
What does the HTTP response header say, if anything?
What does the HTML meta tag say, if anything?
What is the HTML doctype, if any?
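
Some of these checks can be scripted. The following Python sketch uses only the standard library; the helper names (charset_from_content_type, is_valid_utf8, hex_dump) are made up for this example, and it works on a header value and bytes you have already obtained rather than fetching anything itself.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def is_valid_utf8(data):
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def hex_dump(data, limit=16):
    """Show the first few bytes in hex (bytes.hex with a separator needs Python 3.8+)."""
    return data[:limit].hex(" ")
```

For example, hex_dump("é".encode("utf-8")) shows the two bytes c3 a9, whereas the same character saved as windows-1252 is the single byte e9. The header value itself could come from the browser's network tab or, as one option, from urllib.request.urlopen(url).headers.get("Content-Type").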

How to use MS Word to create html that displays correctly on windows and linux server?

When I create a document with MS Word and upload it to an HTML server, it is correctly displayed when it is a Windows server, but not when it is a Linux server.
I tried this with both IE and Firefox.
The meta tag in the source says charset=windows-1252
Displaying the source code in the browser shows exactly the same source as I uploaded, so the server is not changing that. Nevertheless, characters like accented e are displayed as silly characters when obtained from the Linux server.
So something in the TCP/HTTP/??? records that the server sends to the browser makes the browser interpret the characters differently from what is meant.
What could that be?

When you create a document in MS Word, there are a lot of characters that you can't see that are actually in the file, such as end of line markers, page breaks, etc. which you will not notice until after you upload the file to the server.
You should always use a plain text editor such as Notepad++, or even Bluefish, to create these files. Sometimes you can get MS Word to do the trick if you make sure to save the file as a web document (.htm or .html), but the special characters will usually begin to cause problems depending on your goal.
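
If the file really is saved as windows-1252, as the meta tag in the question says, one option is to convert it to UTF-8 before uploading so that the bytes, the meta tag, and a server that announces UTF-8 all agree. The Python sketch below assumes exactly that situation; the thread does not confirm that a conflicting HTTP charset is the cause, and the function name is invented for the example.

```python
def transcode_cp1252_html_to_utf8(raw):
    """Convert windows-1252 HTML bytes to UTF-8 and update the declared charset.

    Assumes the input really is windows-1252; a few byte values (such as 0x81)
    are undefined there and would raise UnicodeDecodeError.
    """
    text = raw.decode("cp1252")
    # Keep the in-document declaration consistent with the new encoding.
    text = text.replace("charset=windows-1252", "charset=utf-8")
    return text.encode("utf-8")
```

Comparing the Content-Type response header from the two servers would show whether the Linux server adds a charset that the Windows server does not.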

Can I link to a .txt file in a way that prevents browsers from treating it like html?

For a while now, when I open a plain text file with long lines, the lines break.
See this example: https://oeis.org/A195665/a195665_4.txt
In Firebug I can see that the text is in <pre> tags in an HTML structure.
To avoid the line breaks, I have to click on "View Page Source".
Is there any server side way to prevent that?

I do not think the browser is at fault. I believe it is the web server's job to serve it up differently. You should google how to do that for your particular web server.

This works in Firefox and Chrome:
view-source:http://oeis.org/A195665/a195665_4.txt
But not in IE. However, IE doesn't break the lines in the first place.

Get IE 10 to download .dwg files

I'm creating a web app that has a list of several .dwg files (These are AutoCAD drawings FYI).
I have an image of a .dwg icon that when clicked is supposed to download the .dwg file. This works in every other browser except IE 10. IE 10 just displays a blank page when the link is clicked, and the url displayed is that of the correct .dwg file. I'm assuming IE 10 is trying to open and display this file much like it would a .pdf, but I'm not certain of this.
I'm using xampp to access the page, and I've tried configuring the .htaccess file to do a force download of .dwg files but to no avail. My .htaccess file is below. I'm new to configuring .htaccess files, but from what I've been reading this is the proper way to do a force download.
AddType application/acad .dwg
AddType application/octet-stream .dwg
#Note that this is my entire .htaccess file
I tried this without the "application/acad .dwg" line at first, then I tried adding the MIME type but that did not work either.
I think it is worth noting that, as a test, I tried to force download .pdfs using the method above. This did not work either.
So I also tried looking in the httpd.conf file to see if there was a module that was commented out I could use, however, the instructions in the file told me not to mess with anything unless I knew what I was doing. I'm pretty new to this, so I looked through the modules for something obvious but couldn't find anything.
So my question is: How do I get these files to download in IE 10?
Other things to note:
The files are downloaded by every other browser I've tried. (I have access to a machine with IE 8 and even that downloads the files)
I do not have any browser extensions from Autodesk that would interfere with this. (The only non-out-of-the-box extension I have is avast! WebRep, however, this is on all other browsers as well.)
All .dwg files are not corrupted and perform normally.
Probably most importantly, my .htaccess file is working and is recognized. To test this I typed in junk text, saved it, then received a 500 error when I tried to reload the page.

Questions on Chinese Encoding

I'm trying to create a webpage in Chinese and I realized that while the text looks fine when I run it on browsers, once I change the Character Encoding, the text becomes gibberish. Here's what's happening:
I create my html file in Emacs, encoded in UTF-8.
I upload it to the server, and view it on my browsers (FF, IE, Chrome, Opera) - no problem.
I try to view the page in other encodings via FF > View > Character Encoding > All those different Chinese encoding systems, e.g. Chinese Simplified (HZ)
Apart from UTF-8, on every other encoding the text becomes gibberish.
I'm assuming this isn't a problem - i.e. browsers are smart enough to know which encoding the page is in, and parse the content accurately. What I'm wondering is why I can't read the Chinese text anymore once I change encoding - is it because I don't have Chinese fonts installed on my OS? Should I stick to UTF-8 if my audience are Chinese or should I choose among one of their many encoding systems?
Thanks in advance for your help/opinions.

UTF-8 isn't a 'catch-all' encoding that every other setting can read. It's designed to cover the characters of international languages for ease of use, but it is still one specific encoding, just like the other encodings you've selected. For the text to appear correctly when viewed with another encoding, the file itself would have to be re-saved (converted) in that encoding.

The viewer's encoding MUST match that of the file being read. Viewing UTF-8 as something else makes about as much sense as renaming a .txt file to .exe and trying to run it.
You should specify the correct encoding in the HTML. The option you're using in the web browser exists only for those rare occasions when a web developer got it wrong and declared a different encoding from the one actually used, OR mixed up two different encodings on one page.

Of course changing the encoding in your browser will "break" the text! The browser is taking the stream of UTF-8 bytes and trying to force another encoding on the raw data. Needless to say, the result ain't pretty. Changing the encoding in the browser is NOT the equivalent of converting.
As you surmised correctly, modern browsers usually guess correctly -- but not always. As Agent_L says, make sure to declare the encoding in the headers.
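
The difference between re-interpreting and converting can be sketched in Python. The sample text and the choice of GBK as the "other" Chinese encoding are assumptions for illustration.

```python
text = "中文"
utf8_bytes = text.encode("utf-8")

# What the browser menu does: keep the same bytes, read them with another decoder.
misread = utf8_bytes.decode("gbk", errors="replace")

# What a conversion does: decode with the true encoding, then encode anew.
gbk_bytes = utf8_bytes.decode("utf-8").encode("gbk")

print(misread == text)                   # False: same bytes, wrong interpretation
print(gbk_bytes.decode("gbk") == text)   # True: different bytes, same text
```

A converted file has different bytes on disk, which is why it would then need to be declared as GBK rather than UTF-8.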
