Remove tag &#8203 in WP theme [duplicate] - vim

EDIT: You can see the issue here (look in source).
EDIT2: Interesting, it is not an issue in source. Only with the console (Firebug as well).
I have the following markup in a file called test.html:
​<!DOCTYPE html>
<html>
<head>
<title>Test Harness</title>
<link href='/css/main.css' rel='stylesheet' type='text/css' />
</head>
<body>
<h3>Test Harness</h3>
</body>
</html>
But in Chrome, I see:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
"​
"
<title>Test Harness</title>
<link href='/css/main.css' rel='stylesheet' type='text/css' />
<h3>Test Harness</h3>
</body>
</html>
It looks like &#8203; is a zero-width space, but what is causing it? I am using Sublime Text 2 with UTF-8 encoding and Google App Engine with Jinja2 (but Jinja is simply loading test.html). Any thoughts?
Thanks in advance.

It is an issue in the source. The live example that you provided starts with the following bytes (i.e., they appear before <!DOCTYPE html>): 0xE2 0x80 0x8B. This can be seen e.g. using Rex Swain’s HTTP Viewer by selecting “Hex” under “Display Format”. Also note that validating the page with the W3C Markup Validator gives information that suggests that there is something very wrong at the start of the document, especially the message “Line 1, Column 1: Non-space characters found without seeing a doctype first.”
What happens in the validator and in the Chrome tools – as well as e.g. in Firebug – is that the bytes 0xE2 0x80 0x8B are taken as character data, which implicitly starts the body element (since character data cannot validly appear in the head element or before it), implying an empty head element before it.
The solution, of course, is to remove those bytes. Browsers usually ignore them, but you should not rely on such error handling, and the bytes prevent useful HTML validation. How you remove them, and how they got there in the first place, depends on your authoring environment.
Since the page is declared (in HTTP headers) as being UTF-8 encoded, those bytes represent the ZERO WIDTH SPACE (U+200B) character. It has no visible glyph and no width, so you won’t notice anything in the visual presentation even though browsers treat it as being data at the start of the body element. The notation &#8203; is a character reference for it, presumably used by browser tools to indicate the presence of a normally invisible character.
It is possible that the software that produced the HTML document was meant to insert ZERO WIDTH NO-BREAK SPACE (U+FEFF) instead. That would have been valid, since by a special convention, UTF-8 encoded data may start with this character, also known as byte order mark (BOM) when appearing at the start of data. Using U+200B instead of U+FEFF sounds like an error that software is unlikely to make, but human beings may be mistaken that way if they think of the Unicode names of the characters.
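As a minimal sketch of "remove those bytes", assuming Node.js and a hypothetical helper name (`stripLeadingZwsp` is not from the answer above), the following drops any 0xE2 0x80 0x8B sequences found at the very start of a file's raw bytes and leaves everything else untouched:

```javascript
// Hypothetical helper: the name and the choice to strip only *leading*
// occurrences are assumptions made for this sketch.
const ZWSP_UTF8 = [0xe2, 0x80, 0x8b]; // U+200B encoded as UTF-8

function stripLeadingZwsp(bytes) {
  let offset = 0;
  // Skip every complete E2 80 8B sequence at the start of the data.
  while (
    offset + 3 <= bytes.length &&
    bytes[offset] === ZWSP_UTF8[0] &&
    bytes[offset + 1] === ZWSP_UTF8[1] &&
    bytes[offset + 2] === ZWSP_UTF8[2]
  ) {
    offset += 3;
  }
  // subarray returns a view starting after the skipped bytes.
  return bytes.subarray(offset);
}
```

In Node.js the input could come from `fs.readFileSync(path)` (which returns a `Buffer` when no encoding is given), and the result could be written back with `fs.writeFileSync`. Working on bytes rather than a decoded string avoids any dependence on how the file is decoded.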

I understand that there is a bug in SharePoint 2013 where the HTML editor adds these characters into your content.
I've been dealing with this for a bit, and this is the solution I am using, which seems to be working. I added this JavaScript to a file referenced by my master page.
var elements = ["h1","h2","h3","h4","p","strong","label","span","a"];
// Strip zero-width spaces from every element matching the selectors above.
function targetZWS() {
    for (var i = 0; i < elements.length; i++) {
        jQuery(elements[i]).each(function() {
            removeZWS(this);
        });
    }
}
// Remove every U+200B from the element's HTML.
function removeZWS(target) {
    jQuery(target).html(jQuery(target).html().replace(/\u200B/g, ''));
}
/* Register targetZWS with SharePoint's body onload queue. */
$(document).ready(function() {
    _spBodyOnLoadFunctionNames.push("targetZWS");
});
Links I looked into investigating this:
https://social.msdn.microsoft.com/Forums/sharepoint/en-US/23804eed-8f00-4b07-bc63-7662311a35a4/why-does-sharepoint-put-in-character-code-8203-in-a-richtext-field?forum=sharepointdevelopment
https://social.technet.microsoft.com/Forums/office/en-US/e87a82f0-1ab5-4aa7-bb7f-27403a7f46de/finding-8203-unicode-characters-in-my-source-code?forum=sharepointgeneral
http://www.sharepointpals.com/post/Removing-8203-in-RichTextHTML-field-Sharepoint

Try this script. It works for me
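$(document).ready(function() {
    // Remove every U+200B from the page. Reassigning innerHTML rebuilds the
    // body, so event handlers bound to the old elements are lost.
    var html = document.body.innerHTML;
    document.body.innerHTML = html.replace(/\u200B/g, '');
});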

I have experienced this in a major project I was working on.
The trick is to:
copy the whole code into Notepad.
save it as a text file.
close the file, open it again, and copy your code back into your IDE.
And voilà, it's gone!

I was able to remove these in Sublime by selecting the characters surrounding it and copy/pasting into Find and Replace.

In my case, the "&#8203;" symbol did not appear in the code editor (MS Code) and was visible only in Chrome's Elements tab. What helped was deleting the tag after which the symbol appeared and retyping that tag by hand. Apparently the symbol came along with a Ctrl+C / Ctrl+V while I was transferring the code.

The “&#8203;” HTML character reference is a zero-width space.
It is easy to find in the Elements panel of Google Chrome's inspector, but when I tried to remove it from my code, most of the major IDEs did not show it to me (maybe because of my preferences).
I downloaded the Brackets text editor and opened my code in it. It shows the character as red dots. Just remove it and check that everything still works.
I found this solution on a blog: What is “&#8203;” HTML character? Why is being injected into my HTML?
Thank You for saving me hours.
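Since several answers here note that most editors do not display U+200B, one way to locate it is to scan the source text and report positions. This is a minimal sketch; `findZeroWidthSpaces` is a hypothetical helper name, and the 1-based line/column convention is an assumption chosen to match what editors usually show:

```javascript
// Hypothetical helper (not from the answers above): reports where U+200B
// occurs so it can be deleted by hand in any editor.
function findZeroWidthSpaces(text) {
  const hits = [];
  const lines = text.split(/\r?\n/);
  lines.forEach((line, index) => {
    for (let col = 0; col < line.length; col++) {
      if (line.charCodeAt(col) === 0x200b) {
        // Lines and columns are 1-based, as most editors display them.
        hits.push({ line: index + 1, column: col + 1 });
      }
    }
  });
  return hits;
}
```

In Node.js the text could be read with `fs.readFileSync(path, "utf8")`. An empty result means the file itself is clean and the character is probably being injected somewhere else (a template, a CMS field, or a copy/paste step).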

I cannot find where it's being injected on my page. I'll investigate it more later, but for now, I just threw this in my page so I can keep working.
$(function() {
    // Remove the first child of <body> if it is a text node starting with U+200B.
    $('body').contents().eq(0).each(function() {
        if (this.nodeName.toString() == '#text' && this.data.trim().charCodeAt(0) == 8203) {
            $(this).remove();
        }
    });
});

Related

Sublime Text adding additional opening tag at the beginning?

Every time I use the HTML snippet or boilerplate with <ht + Tab or Enter, I get this extra opening tag. What gives?
<<!doctype html> <---- whats that additional tag at the beginning?
<html>
......
....
I have Emmet installed, by the way. Thanks
It's a snippet. You type html (or less) and press Tab, and it'll insert all this content:
<!DOCTYPE html>
<html>
<head>
<title>$1</title>
</head>
<body>
$0
</body>
</html>
Note that if you press Tab again, it'll go to $1, and the last stop is $0 (by default it's the end of the content).
So, don't type <ht, just ht, then Tab, and it'll insert everything for you. I really recommend you find yourself a course about Sublime Text; you're going to miss so much otherwise.
That is the doctype declaration. This is straight out of the W3Schools docs:
The <!DOCTYPE> declaration must be the very first thing in your HTML document, before the <html> tag.
The <!DOCTYPE> declaration is not an HTML tag; it is an instruction to the web browser about what version of HTML the page is written in.
In HTML 4.01, the <!DOCTYPE> declaration refers to a DTD, because HTML 4.01 was based on SGML. The DTD specifies the rules for the markup language, so that the browsers render the content correctly.
HTML5 is not based on SGML, and therefore does not require a reference to a DTD.
Tip: Always add the <!DOCTYPE> declaration to your HTML documents, so that the browser knows what type of document to expect.
You can read more about it here: http://www.w3schools.com/tags/tag_doctype.asp

How to tell Microsoft Edge what it should display in reading mode

Reading mode in Spartan/Edge seems to choose, somehow, which div on the site to display in reading mode. In many pages, it does not find the appropriate div (like bbc.co.uk).
However, on our site, it enables reading mode, but then displays the completely wrong part of the page.
So, how can I tell it to take the right part, or at least how can I disable it on those pages?
You can find information on how to optimize reading view, as well as how to opt-out, here: http://dev.modern.ie/testdrive/demos/readingview/
07/10: Edit to include specific information
Specifically, you may be interested in optimizing your title, body, and image markup to ensure a good reading mode experience.
Title
Your page should include a <title> element in the header. In addition, you should include a <meta title=""> tag that matches your main heading in your content section.
Body text
Ensure your main content does not include a lot of deeply nested elements and that font-sizes and other styles are uniform. Style variations for things like pull quotes, etc. should still be fine.
Images
The first eligible image becomes the dominant image of the article. The dominant image is rendered as the first piece of content and given full column width. All following images are rendered as inline images within the article.
Images are recommended to be wrapped in <figure> tags with no more than two <figcaption> tags.
Opting out
Including this meta tag will disable reading mode in IE11 and, currently, Microsoft Edge.
<meta name="IE_RM_OFF" content="true">
Add the following tag:
<meta name="IE_RM_OFF" content="true">
See the link below for more details:
http://dev.modern.ie/testdrive/demos/readingview/

webpage print without the header and footer from the browser

I have a webpage that I print with JavaScript's .print(). However, I don't want the browser's header and footer (date and URL) to appear. From what I found online, these are controlled by the browser, at the operating system/printer driver level, and are not controllable at the HTML/CSS/DOM level. So my question is: are there any other options to suppress them with code? For example, generating a file first and then printing from that file?
I think you could generate a PDF of the content and then set it to be printed. That way you could avoid printing the header and footer.
Look into a print.css file. You can actually dictate what gets printed, and only what you want, in a hidden area.
As per this SO thread: Print.css
This may be an old question, but I was able to print a webpage without having the URL and date-time (which the browser adds on its own) printed.
I just added @page { margin: 0; }. I tested this to work with Chrome 70 and Firefox 61.
Do remember to specify it for the print media in CSS by wrapping it in @media all {<css contents here>} or @media print {<css contents here>}. Alternatively, in the HTML, use <link rel="stylesheet" media="all" href="<css file url>">
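Because the question prints from JavaScript, the `@page` rule can also be injected from script just before printing. This is only a sketch: `buildPrintCss` and `printWithoutBrowserHeaders` are hypothetical names, the document and window are passed in as parameters by assumption, and whether the zero margin actually hides the date and URL still depends on the browser and its print settings.

```javascript
// Hypothetical helpers for illustration; not part of any library.
function buildPrintCss() {
  // Zero page margins leave the browser no room for its own header/footer.
  return "@media print { @page { margin: 0; } }";
}

function printWithoutBrowserHeaders(doc, win) {
  // Inject the rule into <head>, then open the print dialog.
  const style = doc.createElement("style");
  style.textContent = buildPrintCss();
  doc.head.appendChild(style);
  win.print();
}
```

In a page this would be called as `printWithoutBrowserHeaders(document, window)`. With a zero page margin the content prints right up to the paper edge, so the page's own CSS may need to add padding to the body for print.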

Meta keywords before doctype declaration?

I recently was browsing a local web design firm's portfolio and found all their sites' code begins as such:
<meta name="keywords" content="a whole bunch of keywords for their site">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
...
I was able to determine that the pages were generated by Dreamweaver (at least in part).
Did Dreamweaver do this, or did their "developer" just paste the code at the top of the document?
My instinct is that this is bad practice and that it might work incorrectly on some platforms, but it got me wondering whether there may be a reason for it.
That is a terrible practice and invalid HTML. I bet that this would throw IE directly into quirks mode.
But as for your question, either the developer is a script kiddie and shoved the <meta> tag in there with little knowledge of the outcomes, or Dreamweaver did it. I hope it was Dreamweaver...
FYI - just had this issue and Dreamweaver does not put the meta tags in the correct position automatically. Cursor must be placed beforehand into an editable region.

VB/VBA: Fetch HTML string from clipboard (copied via web browser)

It seems that when you copy something from a web browser to the clipboard, at least 2 things are stored:
Plain text
HTML source code
Then it is up to the software that you are pasting into to determine which one it wants.
When pasting into MS Excel 2003, you have a paste special option to paste HTML, which will paste the formatted HTML (as it is displayed by the browser).
What I want to do is paste the actual source code as plain text. Can this be fetched from the clipboard in VBA?
Edit I'm trying to access all the source-code of the copied HTML, including the tags.
This time I've read the question properly and realised coonj wants to get the HTML from the clipboard including tags.
I believe this is reasonably difficult. You need to read the clipboard using Windows API calls. And then, parse the resulting CF_HTML string which has some wacky headers added on top of the HTML.
Microsoft Knowledge Base article with Windows API code to read the CF_HTML from the clipboard (function GetHTMLClipboard).
You will then probably want to ignore the wacky headers. Microsoft documents the format here. An example CF_HTML fragment is shown below. You could probably come up with some guesswork method of skipping the first few lines.
Version:0.9
StartHTML:71
EndHTML:170
StartFragment:140
EndFragment:160
StartSelection:140
EndSelection:160
<!DOCTYPE>
<HTML>
<HEAD>
<TITLE>The HTML Clipboard</TITLE>
<BASE HREF="http://sample/specs">
</HEAD>
<BODY>
<!--StartFragment --> <P>The Fragment</P>
<!--EndFragment -->
</BODY>
</HTML>
It might also be worth thinking about whether there's any other way of solving your problem. E.g., will the browser always be Internet Explorer? Can you get what you need by walking the HTML tree using the COM object model?
EDIT: coonj has tried this now and says "the GetHTMLClipboard function seems to work with both Firefox and IE, and it doesn't look like it is throwing those headers in there"
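Rather than guessing how many header lines to skip, the StartFragment and EndFragment values in the CF_HTML header can be used directly, since they are byte offsets from the start of the clipboard payload. The sketch below shows that parsing logic in JavaScript (Node.js) rather than VBA; `extractCfHtmlFragment` is a hypothetical name, and it assumes the raw clipboard bytes have already been obtained (for example via the API approach described above) as UTF-8 data:

```javascript
// Hypothetical helper: reads the "Key:Value" header lines of a CF_HTML
// payload and returns the bytes between StartFragment and EndFragment.
function extractCfHtmlFragment(bytes) {
  // Decode as latin1 so that one character equals one byte while scanning.
  const text = bytes.toString("latin1");
  const offsets = {};
  for (const line of text.split(/\r?\n/)) {
    const match = /^(\w+):(\S+)$/.exec(line);
    if (!match) break; // the first non-header line ends the header
    offsets[match[1]] = Number(match[2]);
  }
  const start = offsets.StartFragment;
  const end = offsets.EndFragment;
  if (!Number.isInteger(start) || !Number.isInteger(end) || start > end) {
    throw new Error("CF_HTML header is missing usable fragment offsets");
  }
  // Slice by byte offsets, then decode the fragment itself as UTF-8.
  return bytes.subarray(start, end).toString("utf8");
}
```

The same steps (read header lines until the first line that is not `Key:Value`, then slice by the two offsets) should port to VBA with `InStr` and `Mid`-style calls, as long as the slicing is done on bytes and not on decoded characters.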
VB6 has the Clipboard object that allows you to get the clipboard data in different formats. VBA doesn't have this object. But there are windows API calls you can use. You can see a sample implementation for VBA here.

Resources