I'm having an issue in a .NET application where pages served by local IIS display random characters (mostly black diamonds with white question marks in them). This happens in Chrome, Firefox, and Edge. IE displays the pages correctly for some reason.
The same pages in production and in lower pre-prod environments work in all my browsers. This is strictly a local issue.
Here's what I've tried:
Deleted code and re-cloned (also tried switching branches)
Disabled all browser extensions
Ran in incognito mode
Rebooted (you never know)
Deleted temporary ASP.NET files
Looked for corrupt fonts on machine but didn't find any
Other Information:
Running IIS 10.0.17134.1
.NET MVC application with Knockout
I realize there are several other posts regarding black diamonds with question marks, but none of them seem to address my issue.
Please let me know if you need more information.
Thanks for your help!
You are in luck. The explicit purpose of � (U+FFFD, the Unicode replacement character) is to indicate that character encodings are being misused. When users see it, they'll know that we've messed up and lost some of their text data, and we'll know that, at one or more points, our processing and/or configuration is wrong.
(Fonts are not at issue [unless there is no font available to render �]. When there is no font available for a character, it's usually rendered as a white-filled rectangle.)
Character encoding fundamentals are simple: use a sufficient character set (say Unicode), pick an appropriate encoding (say UTF-8), encode text with it to obtain bytes, tell every program and person that gets the bytes that they represent text and which encoding is used. The encoding might be understood from a standard, convention, or specification.
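To make that concrete, here is a minimal Python sketch (any language with explicit encodings behaves the same way): text becomes bytes only through a named encoding, and whoever reads the bytes back must use that same name.

    data = "Olá, 世界".encode("utf-8")  # text -> bytes, with the encoding stated explicitly
    print(data.decode("utf-8"))         # bytes -> text, read with the same declared encoding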
Your editor does the actual encoding.
If the file is part of a project or similar system, a project file might store the intended encoding for all or each text file in the project. If your editor is an IDE, it should understand how the project does that.
Your compiler needs to know the encoding of each text file you give it. A project system would communicate what it knows.
HTML provides an optional way to communicate the encoding. Example: <meta charset="utf-8">. An HTML-aware editor should not allow this indicator to be different than the encoding it uses when saving the file. An HTML-aware editor might discover this indicator when opening the file and use the specified encoding to read the file.
HTTP uses another optional way: Content-Type response header. The web server emits this either statically or in conjunction with code that it runs, such as ASP.NET.
Web browsers use the HTTP way if given.
XHR (AJAX, etc.) uses HTTP along with JavaScript processing. If needed, the JavaScript processing should apply the HTTP and HTML rules, as appropriate. Note: if the content is JSON, the current RFC (RFC 8259) requires the encoding to be UTF-8.
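To make the JSON note concrete, a small Python sketch with a hypothetical payload: because the RFC fixes the encoding, the consumer never has to negotiate a charset.

    import json

    body = '{"city": "São Paulo"}'.encode("utf-8")  # bytes on the wire, always UTF-8 per RFC 8259
    print(json.loads(body))                         # json.loads accepts UTF-8-encoded bytes directly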
No one or thing should have to guess.
Diagnostics
Which character encoding did you intend to use? This century, UTF-8 is so much the norm that if you choose to use a different one, you should have a good reason and document it (for others and your future self).
Compare the bytes in the file with the text you expect it to represent. Does it use the intended encoding? Use an editor or tool that shows bytes in hex (see the sketch after this list).
As suggested by @snakecharmerb, what does the server send? Use a web browser's F12 network tab.
What does the HTTP response header say, if anything?
What does the HTML meta tag say, if anything?
What is the HTML doctype, if any?
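A minimal Python sketch covering the byte-level check and the server-header check; the file path and URL are placeholders for your own application.

    import urllib.request

    # 1. Look at the raw bytes of the source file on disk.
    with open("Views/Home/Index.cshtml", "rb") as f:      # placeholder path
        print(" ".join(f"{b:02X}" for b in f.read(32)))   # e.g. "EF BB BF ..." is a UTF-8 BOM

    # 2. See what the local IIS actually sends.
    resp = urllib.request.urlopen("http://localhost/yourapp/")  # placeholder URL
    print(resp.headers.get("Content-Type"))               # e.g. "text/html; charset=utf-8"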
I'm having trouble getting started with building a Print Monitor / Print Handler for Windows using Visual Studio 2012 Ultimate with WDK 8. Basically, this is what I am trying to accomplish:
Create a print monitor (something an application can print to) that will generate a file with the content that should be printed (like the default XPS printer or a PDF printer), and then invoke the print handler
Create a print handler that will parse the generated file and do certain actions with it (check to see if certain text is present, upload the file online, etc)
I feel like the print handler part should not be too hard, but starting with the print monitor is what I'm stuck at. What would I do within VS12? I see options for "Printer Driver V4", "Printer Driver V4 Property Bag", and "Printer XPS Render Filter". Should I use one of those templates, and, if so, what would I do within them? Anything pointing me in the right direction would be appreciated!
EDIT:
Just some more clarification - I only need the text from the print output, but I've read from various sources that requesting text-only output leads to no output at all from applications like Firefox, since they print text as glyphs.
I will be using the print handler to parse the text for keywords and then upload that information to a web server in a specific format. The print monitor just needs to capture and save the text information from whatever application is printing.
As you pointed out in your comments, some applications such as Firefox print using glyph indices instead of characters. In fact, quite a few do, and it's becoming more common. What you need is a print driver. The good news is that Microsoft has already written it for you and provided sample source code in the WDK. Start by reviewing this to understand your options. The Unidrv driver is perhaps a little simpler, but the Postscript driver has the advantage of generating output that can readily be transformed to PDF or other formats that retain text information (as opposed to raster page images that lose all text information). As far as I'm concerned, don't even think about XPS; it's just an all-around disaster.
To handle glyph indices, what you'll need to do is add code to the driver's OEMTextOut function that uses the font's cmap tables to translate glyph indices back into character codes. I'm unaware of any public domain libraries that parse font files, so you'll likely have to write your own code to do this. (Hint: If you support only OpenType/TrueType fonts, you'll cover 99% of all printing applications).
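The driver hook itself is C++, but the cmap lookup is easy to prototype outside the driver first. Here is a minimal Python sketch of the idea, assuming the open-source fontTools package (whether a third-party parser fits your licensing and driver constraints is a separate question):

    from fontTools.ttLib import TTFont  # pip install fonttools

    def glyph_to_char_map(font_path):
        """Invert the font's cmap: glyph index -> Unicode codepoint."""
        font = TTFont(font_path)
        cmap = font.getBestCmap()                   # {codepoint: glyph name}
        reverse = {}
        for codepoint, glyph_name in cmap.items():
            gid = font.getGlyphID(glyph_name)       # glyph index, as seen in print output
            reverse.setdefault(gid, codepoint)      # keep one codepoint per glyph
        return reverse

    table = glyph_to_char_map(r"C:\Windows\Fonts\arial.ttf")  # placeholder font path
    for gid, cp in sorted(table.items())[:5]:
        print(gid, hex(cp), chr(cp))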
Getting the Microsoft sample code to build, install and run is mostly straightforward, but if you're new to the WDK and installing print drivers, plan on spending a week or more on just that. The glyph index translation part is far more complex and you should plan on spending a lot more time on that.
When I create a document with MS Word and upload it to a web server, it is correctly displayed when it is a Windows server, but not when it is a Linux server.
I tried this with both IE and Firefox.
The meta tag in the source says charset=windows-1252
Displaying the source code in the browser shows exactly the same source as I uploaded, so the server is not changing that. Nevertheless, characters like accented e are displayed as silly characters when obtained from the Linux server.
So something in the TCP/HTTP/??? data that the server sends to the browser makes the browser interpret the characters differently from what is meant.
What could that be?
When you create a document in MS Word, there are a lot of characters that you can't see that are actually in the file, such as end of line markers, page breaks, etc. which you will not notice until after you upload the file to the server.
You should always use a plain text editor such as Notepad++, or even Bluefish, to create these files. Sometimes you can get MS Word to do the trick if you make sure to save the file as a web document (htm or html), but the special characters will usually begin to cause problems, depending on your goal.
I have gone through several relevant-looking questions, but they did not contain the answer I am looking for. So, here is my question:
I have several web applications at my workplace, which are written using different frameworks, and the authors are long gone, so there is no one to ask for feature updates. Hence I have to go through the same grueling sequence of actions every day to get data that amounts to a few kilobytes.
I tried parsing the page source, but the programming techniques of the authors were all over the place. Some even intentionally obscured the code so the data does not show as text, and there is no reason for this, as the code they wrote is a company asset. Long story short, I realized that if I can copy and paste the textual content of these pages, I can process that data much more easily than by parsing the page source to get the text (which is sometimes totally impossible).
So, I am now looking for a browser plug-in or an equivalent text-based tool (on Windows or Linux) which will load these pages and save the text on the screen to file(s) when invoked.
Despite how hard I tried, I am coming up empty-handed.
I do not want to utilize the services of a third-party screen-scraping web site, as the data is company confidential and not accessible by outside parties. Everything has to happen on the client end, as I do not have access to the servers these apps are running on (mostly IIS on a Windows front end and an Oracle DB at the back end; the middle tier, as I have explained before, is anyone's wild guess, ranging from native Oracle apps to WebLogic to Tomcat and to some in-house-developed Java/JavaScript stuff).
Thanks for all the help in advance
After searching for an answer for well over a year, I came to realize that, as long as I use Windows (a modern version of it, that is), AutoHotkey is my savior.
I open the web page, maximize it, place my cursor (MouseMove, x, y), then left-click (MouseClick, L), then send Ctrl-A followed by Ctrl-C.
Voila! Everything is in the clipboard. Then I activate my Unix session (WinActivate PuTTY) and send the appropriate key presses to launch the editor of my choice (which is vi), and finally send a Shift-Insert to paste the clipboard into my document. Then save and exit, of course.
As an added bonus, right after my document is saved, I can invoke the script of my choice to parse this file and give me back the portion(s) I am interested in.
I know it is not bulletproof, but for my purpose it helps to a great extent. As a matter of fact, I can do whatever I want with this method.
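A rough AutoHotkey sketch of the sequence described above; the coordinates, window title, and file name are placeholders for your own setup.

    SetTitleMatchMode, 2           ; match windows whose title contains "PuTTY"
    CoordMode, Mouse, Screen
    MouseMove, 400, 300            ; placeholder coordinates somewhere inside the page
    Click                          ; give the page focus
    Send, ^a                       ; select all
    Send, ^c                       ; copy
    Sleep, 200                     ; let the clipboard settle
    WinActivate, PuTTY             ; switch to the Unix session
    Send, vi capture.txt{Enter}    ; placeholder: launch the editor of your choice
    Sleep, 200
    Send, i                        ; vi insert mode so the paste is taken as text
    Send, +{Insert}                ; Shift-Insert pastes the clipboard into PuTTY
    Send, {Esc}:wq{Enter}          ; save and exit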
What about something like this: http://www.nirsoft.net/utils/htmlastext.html
Freeware that converts an HTML page to text
Any of links, lynx, or w3m will do what you want; they are text browsers, and you can dump the text from a webpage with, for example:
w3m -dump http://www.google.com > g.txt
We are considering allowing user-uploaded SVGs in our web app. We've been hesitant to do this before, due to a large number of complex vulnerabilities that we know exist in untrusted SVGs. A coworker found the --vacuum-defs option to Inkscape, and believes that it renders all untrusted SVGs safe for processing.
According to the manpage, that option "Removes all unused items from the <defs> section of the SVG file. If this option is invoked in conjunction with --export-plain-svg, only the exported file will be affected. If it is used alone, the specified file will be modified in place." However, according to my coworker, "Scripting is removed, XML transformations are removed, malformations are not tolerated, encoding is removed and external imports are removed."
Is this true? If so, is it enough that we should feel safe accepting untrusted SVGs? Is there any other preprocessing we should do?
As I understand it, the main concern with serving untrusted SVGs is the fact that SVG files can contain JavaScript. This is obvious for SVG because embedded JavaScript is part of the format, but it can happen with every type of uploaded file if the browser is not careful.
Therefore, and even though modern browsers do not execute scripts found in <img> tags, just in case I think it's good to serve the images from a different domain with no cookies/auth attached to it, so that any executed script will not compromise users' data. That would be my first concern.
Of course, if the user downloads the SVG, then opens it from the desktop and happens to open it with the browser, it might execute the potentially malicious payload. So, back to the original question: --export-plain-svg does remove scripting, but as I don't know of other SVG-specific vulnerabilities, I haven't checked for them.
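For what it's worth, a minimal Python sketch of running that Inkscape step in an upload pipeline. It uses the flags quoted from the manpage above (--vacuum-defs together with --export-plain-svg); newer Inkscape releases changed the command-line syntax, so verify the flags against the version you deploy.

    import subprocess

    def sanitize_svg(src_path, dst_path):
        subprocess.run(
            ["inkscape", "--without-gui", src_path,
             "--vacuum-defs",
             "--export-plain-svg=" + dst_path],
            check=True,   # fail loudly if Inkscape rejects a malformed file
            timeout=30,   # don't let a hostile file hang the worker
        )

    # sanitize_svg("upload.svg", "clean.svg")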
I'm trying to create a webpage in Chinese and I realized that while the text looks fine when I run it on browsers, once I change the Character Encoding, the text becomes gibberish. Here's what's happening:
I create my html file in Emacs, encoded in UTF-8.
I upload it to the server, and view it on my browsers (FF, IE, Chrome, Opera) - no problem.
I try to view the page in other encodings via FF > View > Character Encoding > All those different Chinese encoding systems, e.g. Chinese Simplified (HZ)
Apart from UTF-8, on every other encoding the text becomes gibberish.
I'm assuming this isn't a problem - i.e. browsers are smart enough to know which encoding the page is in, and parse the content accurately. What I'm wondering is why I can't read the Chinese text any more once I change the encoding - is it because I don't have Chinese fonts installed on my OS? Should I stick to UTF-8 if my audience is Chinese, or should I choose one of their many encoding systems?
Thanks in advance for your help/opinions.
UTF-8 isn't a 'catch-all' encoding. It's designed to cover the characters of every language in one character set, but it is still an encoding, just like the others you've selected. You would have to re-save (convert) the text in each encoding to make it appear correctly when viewed with that encoding.
The viewer's encoding MUST match that of the file being read. Viewing UTF-8 as something else makes about as much sense as renaming .txt to .exe and trying to run it.
You should specify the correct encoding in the HTML. The option you're using in the web browser exists only for those rare occasions when a web developer screwed up and declared a different encoding than the one actually used, OR mixed up two different encodings on one page.
Of course changing the encoding in your browser will "break" the text! The browser takes the stream of UTF-8 bytes and tries to force another encoding onto the raw data. Needless to say, the result ain't pretty. Changing the encoding in the browser is NOT the equivalent of converting.
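A minimal Python sketch of exactly that effect: the same UTF-8 bytes re-read as GBK (one of the Chinese encodings in that browser menu) turn into gibberish.

    raw = "中文网页".encode("utf-8")            # the bytes the server actually sends
    print(raw.decode("utf-8"))                  # declared correctly: 中文网页
    print(raw.decode("gbk", errors="replace"))  # forced wrong encoding: mojibake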
As you surmised correctly, modern browsers usually guess correctly -- but not always. As Agent_L says, make sure to declare the encoding in the headers.