Facebook scraper closes the head tag on my page when it finds a <noscript> tag

When the Facebook scraper parses my page, it closes the head tag as soon as it encounters a noscript tag. If you view the page source, you will see that the meta tags are inside the head tag of the page. The debugger shows the error: http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Fwww.rightmove.co.uk%2Fproperty-for-sale%2Fproperty-34534103.html
The Facebook Open Graph debugger's echo view shows a different page, with the head tag closed early: http://developers.facebook.com/tools/debug/og/echo?q=http%3A%2F%2Fwww.rightmove.co.uk%2Fproperty-for-sale%2Fproperty-34534103.html
Strange, eh?

A <noscript> tag wouldn't be valid in a <head> element, would it? Maybe Facebook's parser treats the presence of the noscript tag as implicitly ending the <head>.
See here also: <noscript> in <head>
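As far as I understand the HTML5 parsing rules, <noscript> is permitted inside <head>, but only with <link>, <style> and <meta> children; any other element there (a tracking <img>, for example) makes a conforming parser end the head early. Whether Facebook's scraper follows exactly those rules is an assumption. The following is a minimal sketch for checking a page, using Python's standard html.parser; the names HeadNoscriptChecker and find_head_noscript_offenders are made up for illustration.

```python
from html.parser import HTMLParser

# Elements HTML5 allows inside a <noscript> that sits in <head>.
ALLOWED_IN_HEAD_NOSCRIPT = {"link", "style", "meta"}


class HeadNoscriptChecker(HTMLParser):
    """Collects tags found inside <head><noscript> that are not allowed there."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_noscript = False
        self.offenders = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        if tag == "head":
            self.in_head = True
        elif tag == "noscript" and self.in_head:
            self.in_noscript = True
        elif self.in_noscript and tag not in ALLOWED_IN_HEAD_NOSCRIPT:
            self.offenders.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
            self.in_noscript = False
        elif tag == "noscript":
            self.in_noscript = False


def find_head_noscript_offenders(html):
    """Return the tags that could cause a strict parser to close <head> early."""
    checker = HeadNoscriptChecker()
    checker.feed(html)
    checker.close()
    return checker.offenders
```

If the returned list is non-empty, moving the <noscript> block into <body>, or moving the Open Graph meta tags above it, is the usual way around the problem.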

Related

Is there any way to keep a web app's components from looking so small on mobile?

I've made a website. It looks nice on a PC, but not on a smartphone (everything is too small). Is there any way to fix that?
Here is my website: https://hai-weather-forecast.herokuapp.com/
Did you try to add the responsive meta tag in the head element?
<head>
...
<meta name="viewport" content="width=device-width">
...
</head>
Here are the background docs.
I tested this by opening the URL you provided, setting the same device size with the developer tools, and adding that meta tag to the head element by editing the HTML by hand.
Before adding the meta tag
After adding the meta tag
Cheers!

Retrieve relative urls from a text

I have a string of HTML with both absolute and relative URLs and I'm trying to retrieve only the relative URLs. I tried using the get-urls package but this only retrieves absolute URLs.
An example of the string of html received.
<!DOCTYPE>
<html>
<head>
<title>Our first HTML page</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<h2>Welcome to the web site: this is a heading inside of the heading tags.</h2>
<p>This is a paragraph of text inside the paragraph HTML tags. We can just keep writing ...
</p>
<h3>Now we have an image:</h3>
<div><img src="/images/plantTracing.gif" alt="Graphic of a Mouse Pad"></div>
<h3>
This is another heading inside of another set of headings tags; this time the tag is an 'h3' instead of an 'h2' , that means it is a less important heading.
</h3>
<h4>Yet another heading - right after this we have an HTML list:</h4>
<ol>
<li>First item in the list</li>
<li> Second item in the list</li>
<li>Third item in the list</li>
</ol>
<p>You will notice in the above HTML list, the HTML automatically creates the numbers in the list.</p>
<h3>About the list tags</h3>
</body>
</html>
Currently doing this
getUrls(string of HTML received)
It only returns {https://github.com/}
I want to return {https://github.com/, /modules/example.md}
The get-urls package requires the URL to either start with a scheme such as http:// or start with a known top-level domain.
In fact, the doc even says: "Require URLs to have a scheme or leading www. to be considered an URL."
Since you're looking for relative paths that have neither of those, that package will not do what you want.
You would probably be better served by an actual HTML parser such as cheerio, which finds URLs in HTML attributes based on HTML context rather than on text-matching tricks. That will find all the paths that are relative URLs.
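To illustrate the parser-based approach, here is a minimal sketch in Python using only the standard library (html.parser and urllib.parse) in place of cheerio. It assumes that the URLs of interest live in href and src attributes and that "relative" means no scheme and no host; the names UrlCollector, is_relative and relative_urls are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Attributes assumed to carry URLs; extend this set if your markup needs it.
URL_ATTRS = {"href", "src"}


class UrlCollector(HTMLParser):
    """Collects the values of URL-bearing attributes in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                self.urls.append(value)


def is_relative(url):
    """True when the URL has neither a scheme nor a host part."""
    parts = urlsplit(url)
    return not parts.scheme and not parts.netloc


def relative_urls(html):
    """Return only the relative URLs found in href/src attributes."""
    collector = UrlCollector()
    collector.feed(html)
    collector.close()
    return [url for url in collector.urls if is_relative(url)]
```

Note that protocol-relative URLs such as //cdn.example.com/x carry a host and are therefore not counted as relative here.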

render html from string without affecting formatting [duplicate]

Is there any way to set up Firefox and Chrome to work with the escape=false attribute of the h:outputText tag? When there is some HTML that needs to be shown in the browser, Firefox and Chrome show the parsed string correctly, but all other links in the application are frozen (??).
The example html from db:
<HEAD>
<BASE href="http://"><META content="text/html; charset=utf-8" http-equiv=Content-Type>
<LINK rel=stylesheet type=text/css href=""><META name=GENERATOR content="MSHTML 9.00.8112.16434">
</HEAD>
<BODY><FONT color=#000000 size=2 face="Segoe UI">läuft nicht</FONT></BODY>
Parsed HTML on the page:
läuft nicht
What is very weird is that everything works in IE (usually it is the opposite).
I use PrimeFaces components (v2.2), .xhtml, Tomcat 7 and JSF 2.0.
You end up with syntactically invalid HTML this way:
<html>
<head></head>
<body>
<head></head>
<body>...</body>
</body>
</html>
This is not right. There can be only one <head> and one <body>. Browser behavior is unspecified in this case. You need to remove the entire <head> and the wrapping <body> from that HTML so that you end up with only
<FONT color=#000000 size=2 face="Segoe UI">läuft nicht</FONT>
You'd need to either update the DB to remove the unnecessary HTML, or use Jsoup to parse this piece out on a per-request basis, something like this:
String bodyContent = Jsoup.parse(htmlFromDB).body().html();
// ...
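For readers without Jsoup at hand, the same idea (keep only what sits inside <body>) can be sketched in Python with the standard html.parser. This is a swapped-in illustration, not what Jsoup does internally: it drops comments, lowercases end tags, and the names BodyExtractor and extract_body are made up here.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Re-emits only the markup found between <body> and </body>."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            # get_starttag_text() returns the start tag exactly as written.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.in_body:
            self.parts.append(f"&#{name};")


def extract_body(html):
    """Return the inner markup of <body>, without the <head> section."""
    extractor = BodyExtractor()
    extractor.feed(html)
    extractor.close()
    return "".join(extractor.parts)
```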
Alternatively, you could display it inside an HTML <iframe> instead, with the help of a servlet. E.g.
<iframe src="htmlFromDBServlet?id=123"></iframe>
Unrelated to the concrete problem:
Storing HTML in a DB is a terrible design.
If the HTML originates from user-controlled input, you have a huge XSS attack hole this way.
The <font> tag has been deprecated since 1998.
It seems to me that you're trying to do something that JSF was not really meant to do. Rather than try to insert HTML in your web page, you ought to try having the links already on your page and modifying the "rendered" attribute through an AJAX call.

Code in Header to allow Subtome Button in Browser to find feed

By using the following code in the header, I have managed to get the RSS button in the browser's URL bar to find my feed successfully.
<link rel="alternate" type="application/rss+xml" title="Blog Title" href="http://www.weeblysite.com/1/feed" />
However, the SubToMe button in the browser fails to locate the feed. As a result, Feedly, Digg Reader, etc. find no feed.
How can I implement code in my Weebly generated site to allow all such browser buttons to pick up and subscribe to feed?
Thank you.
Nicholas Boyd Crutchley
http://www.nicholasboydcrutchley.com/infin-story
This is a new blog with no posts yet, but the old blog has the same problem.
The problem comes from the fact that your page has a broken discovery mechanism.
Right now in the <head> section of your HTML page, I can see this:
<link rel='alternate' type='application/rss+xml' title='RSS 2.0' href='http:////feed' />
And clearly, http:////feed is not the right feed URL :) You want to have this:
<link rel='alternate' type='application/rss+xml' title='RSS 2.0' href='http://www.nicholasboydcrutchley.com/1/feed' />
And everything should be smooth!
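A quick way to catch this kind of broken discovery link is to check that the href has both a scheme and a host. The following is a minimal sketch in Python using urllib.parse; the function name is_usable_feed_url is made up for this example, and the check only validates the URL's shape, not whether a feed is actually served there.

```python
from urllib.parse import urlsplit


def is_usable_feed_url(href):
    """True when href is an absolute http(s) URL with a non-empty host."""
    parts = urlsplit(href)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The broken value from the page has an empty host part.
print(is_usable_feed_url("http:////feed"))
# The corrected value has a proper host.
print(is_usable_feed_url("http://www.nicholasboydcrutchley.com/1/feed"))
```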

Should Partial Views contain <head> element

In a web app I am working on, I came across a partial view that has its own head element (it loads some jQuery things).
The thing is that, combined with _layout.xml, this gives me a weird HTML page structure:
<head>
...
</head>
<body>
...
</body>
<head>
...
</head>
<body>
....
</body>
That doesn't feel right.
What's the best practice for loading some .css/.js files on a particular page? Is it all done by _layout.xml and bundles?
And in general, should only _layout.xml contain a head element, and no other view in my solution?
You want only one head. Use a layout with sections, and add MVC sections in normal pages to add CSS or JavaScript. See here for basic section usage: http://weblogs.asp.net/scottgu/archive/2010/12/30/asp-net-mvc-3-layouts-and-sections-with-razor.aspx. If you want to use a partial, create a helper to render the section from the partial; see this answer: Using sections in Editor/Display templates
