I have a string of HTML with both absolute and relative URLs and I'm trying to retrieve only the relative URLs. I tried using the get-urls package but this only retrieves absolute URLs.
An example of the HTML string received:
<!DOCTYPE>
<html>
<head>
<title>Our first HTML page</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<h2>Welcome to the web site: this is a heading inside of the heading tags.</h2>
<p>This is a paragraph of text inside the paragraph HTML tags. We can just keep writing ...
</p>
<h3>Now we have an image:</h3>
<div><img src="/images/plantTracing.gif" alt="Graphic of a Mouse Pad"></div>
<h3>
This is another heading inside of another set of headings tags; this time the tag is an 'h3' instead of an 'h2' , that means it is a less important heading.
</h3>
<h4>Yet another heading - right after this we have an HTML list:</h4>
<ol>
<li>First item in the list</li>
<li> Second item in the list</li>
<li>Third item in the list</li>
</ol>
<p>You will notice in the above HTML list, the HTML automatically creates the numbers in the list.</p>
<h3>About the list tags</h3>
</body>
</html>
This is what I'm currently doing:
getUrls(string of HTML received)
It only returns {https://github.com/}.
I want it to return {https://github.com/, /modules/example.md}.
The get-urls package requires the URL to either start with a scheme such as http:// or to start with a known top-level domain.
In fact, the documentation even says so: "Require URLs to have a scheme or leading www. to be considered an URL."
Since you're looking for relative paths that have neither of those, that package will not do what you want.
You will probably be better served by an actual HTML parser such as cheerio, which finds URLs in HTML attributes based on the HTML context rather than on text-matching tricks, so it will also find the paths that are relative URLs.
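As a rough illustration of the distinction, here is a minimal sketch in plain Node.js. It assumes the URLs of interest live in href and src attributes, and it uses a simple regular expression only as a dependency-free stand-in for a real parser such as cheerio. The function names are hypothetical. A value is treated as relative when the built-in URL constructor cannot parse it without a base.

```javascript
// Hypothetical helper: true when `value` cannot be parsed as an absolute URL.
function isRelativeUrl(value) {
  try {
    new URL(value); // throws a TypeError when the value has no scheme
    return false;
  } catch (err) {
    return true;
  }
}

// Simplistic stand-in for an HTML parser: collects quoted href/src values.
// A real parser (e.g. cheerio) is the more robust choice for arbitrary HTML.
function extractAttributeUrls(html) {
  const pattern = /\b(?:href|src)\s*=\s*(["'])(.*?)\1/gi;
  const urls = [];
  for (const match of html.matchAll(pattern)) {
    urls.push(match[2]);
  }
  return urls;
}

// Assumed sample input, built from the URLs mentioned in the question.
const sampleHtml =
  '<a href="https://github.com/">GitHub</a>' +
  '<a href="/modules/example.md">Docs</a>' +
  '<img src="/images/plantTracing.gif" alt="Graphic of a Mouse Pad">';

const allUrls = extractAttributeUrls(sampleHtml);
const relativeUrls = allUrls.filter(isRelativeUrl);

console.log(allUrls);
console.log(relativeUrls);
```

With a real parser, only extractAttributeUrls would change; the relative/absolute check stays the same.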
Is there any way to set up Firefox and Chrome to work with the escape="false" attribute of the h:outputText tag? When some HTML needs to be shown in the browser, Firefox and Chrome display the parsed string correctly, but all other links in the application are frozen (??).
Example HTML from the DB:
<HEAD>
<BASE href="http://"><META content="text/html; charset=utf-8" http-equiv=Content-Type>
<LINK rel=stylesheet type=text/css href=""><META name=GENERATOR content="MSHTML 9.00.8112.16434">
</HEAD>
<BODY><FONT color=#000000 size=2 face="Segoe UI">läuft nicht</FONT></BODY>
Parsed HTML on the page:
läuft nicht
What is very weird is that everything works in IE (usually it is the opposite).
I use PrimeFaces components (v2.2), .xhtml, Tomcat 7 and JSF 2.0.
You end up with syntactically invalid HTML this way:
<html>
<head></head>
<body>
<head></head>
<body>...</body>
</body>
</html>
This is not right. There can be only one <head> and one <body>; otherwise browser behavior is unspecified. You need to remove the entire <head> and the wrapping <body> from that HTML so that you end up with only:
<FONT color=#000000 size=2 face="Segoe UI">läuft nicht</FONT>
You'd need to either update the DB to remove the unnecessary HTML, or use Jsoup to extract this piece on a per-request basis, something like this:
String bodyContent = Jsoup.parse(htmlFromDB).body().html();
// ...
Alternatively, you could display it inside an HTML <iframe> with the help of a servlet, e.g.:
<iframe src="htmlFromDBServlet?id=123"></iframe>
Unrelated to the concrete problem:
Storing HTML in a DB is a terrible design.
If the HTML originates from user-controlled input, you've a huge XSS attack hole this way.
The <font> tag has been deprecated since 1998.
It seems to me that you're trying to do something that JSF was not really meant to do. Rather than try to insert HTML in your web page, you ought to try having the links already on your page and modifying the "rendered" attribute through an AJAX call.
I'm using the LinkedIn fork of Dust with Node.js and Express.
My template hierarchy consists of:
1 layout template - The base template
1 Page template - This is the template that will be rendered
Optional number of partials - Might be included by the page template
layout.dust (layout template):
<html>
<head>
<script src="/js/layout.js"></script>
<link rel="stylesheet" href="/css/layout.css">
<script src="/js/home.js"></script>
<link rel="stylesheet" href="/css/home.css">
<script src="/js/sidebar.js"></script>
<link rel="stylesheet" href="/css/sidebar.css">
<script src="/js/widget.js"></script>
<link rel="stylesheet" href="/css/widget.css">
</head>
<body>
{+content}{/content}
</body>
</html>
home.dust (page template):
{>layout/}
{<content}
<div>
{>sidebar/}
</div>
<div>
{>widget/}
</div>
{/content}
When the user visits the website homepage, then home.dust will be rendered, and the user will see a page with the sidebar and some widget. The content of sidebar.dust and widget.dust is irrelevant.
As you can see in layout.dust, there are 4 sets of JavaScript and CSS included in the head section, one for each of the templates and partials. My problem is finding a way to automatically include each asset into the layout (without hardcoding). Ideally I would like to be able to just do this:
{#scripts}
<script src="{.}"></script>
{/scripts}
Different pages may require different assets.
How can I push each script source path into the context of layout.dust?
What do other developers do? Do they just hardcode them?
I'd add all scripts to the head of the layout without pushing any from the pages that extend it. I'm not sure how familiar you are with JavaScript minification, but it's common practice to bundle all (or most) of your JavaScript assets into one file and serve it to the user with a single HTTP request. This speeds up your page a lot; check out what Google has to say about it here.
It's not hard because there are a few tools to do this for you automatically. You could go for an asset manager or Grunt.
ASSET MANAGER:
There are a few on npm. I found one called Express Asset Manager and another called Asset Pipeline.
GRUNT:
Use contrib-uglify and contrib-concat to handle your minification. There are plenty of others that you should find useful. You can do exactly the same thing with all of your CSS too.
Obviously in development you don't really want to try to debug minified code so you can do something like the following:
{?production}
<script src="production-minified-script.js"></script>
{:else}
{#scripts}
<script src="{.}"></script>
{/scripts}
{/production}
where production is a variable passed to your template from process.env.NODE_ENV. To avoid manually adding each script, you could pass them to the template as an array.
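One way to supply those two values is sketched below: a small helper builds the object handed to the template. This is only an illustration; buildTemplateContext and the per-page script list are hypothetical names, and the paths simply mirror the ones hardcoded in layout.dust.

```javascript
// Hypothetical helper that builds the context object for the Dust template.
// `production` drives the {?production} check; `scripts` feeds {#scripts}.
function buildTemplateContext(nodeEnv, pageScripts) {
  return {
    production: nodeEnv === 'production',
    scripts: pageScripts,
  };
}

// Assumed per-page asset list (same paths as in layout.dust).
const homeScripts = [
  '/js/layout.js',
  '/js/home.js',
  '/js/sidebar.js',
  '/js/widget.js',
];

const devContext = buildTemplateContext('development', homeScripts);
const prodContext = buildTemplateContext('production', homeScripts);

console.log(devContext);
console.log(prodContext);

// In an Express route this might be used as (not executed here):
//   res.render('home', buildTemplateContext(process.env.NODE_ENV, homeScripts));
```

Keeping the list next to each route means different pages can pass different assets without touching the layout.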
STILL WANT TO ADD FROM OTHER PAGES?
If you still want to add from other pages, add in a block to your head below your main scripts, something like:
{+otherScripts}{/otherScripts}
In a web app I came across a partial view that contains its own head element (it loads some jQuery things).
The thing is that, combined with _layout.xml, I get this weird HTML page structure:
<head>
...
</head>
<body>
...
</body>
<head>
...
</head>
<body>
....
</body>
It doesn't feel right.
What's the best practice for loading .css/.js files for a particular page? Is it all done through _layout.xml and bundles?
And in general, should only _layout.xml contain a head element, and no other view in my solution?
You want only one head. Use a layout with sections, and add MVC sections in normal pages to add CSS or JavaScript. For basic section usage, see http://weblogs.asp.net/scottgu/archive/2010/12/30/asp-net-mvc-3-layouts-and-sections-with-razor.aspx. If you want to use a partial, create a helper to render a section from the partial; see this answer: Using sections in Editor/Display templates
How would I output the title of an entry in ExpressionEngine and display it in the browser's title bar?
Here is the content of my page's header:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Test Site</title>
<link rel="stylesheet" href="{stylesheet=site/site_css}" type="text/css" media="screen" />
</head>
What I need is for each page to display the title of the entry in my browser's title bar — how can I achieve that?
Part of the UPDATED code:
Here is how I have done it:
{exp:channel:entries channel="news_articles" status="open|Featured Top Story|Top Story" limit="1" disable="member_data|trackbacks|pagination"}
{embed="includes/document_header" page_title=" | {title}"}
<body class="home">
<div id="layoutWrapper">
{embed="includes/masthead_navigation"}
<div id="content">
<div id="article">
<img src="{article_image}" alt="News Article Image" />
<h4>{title}</h4>
<h5><span class="by">By</span> {article_author}</h5>
<p>{entry_date format="%M %d, %Y"} -- Updated {gmt_edit_date format="%M %d, %Y"}</p>
{article_body}
{/exp:channel:entries}
</div>
What do you think?
Another relatively new way to tackle it is using the Stash add-on and a template partials approach. This method knocks you down to one embed, and has the added advantage of giving you a centralized "wrapper" template - one for each major page layout, basically. The example below assumes you've simply added custom fields to handle any entry-specific meta data you're looking to inject into the header. With this idea in mind, here's a simplified view of the basic structure I've been applying recently:
In your template you apply EE tags to determine the logic of what gets sent to the inside-wrapper
{embed="embeds/.inside-wrapper"}
{exp:channel:entries channel="channel_name" limit="1" dynamic="yes" disable="whatever|you|can|live|without"}
{!-- ENTRY SEO META DATA --}
{exp:stash:set name="entry_seo_title" scope="site"}{cf_channelprefix_seo_title}{/exp:stash:set}
{exp:stash:set name="entry_seo_description" scope="site"}{cf_channelprefix_seo_description}{/exp:stash:set}
{exp:stash:set name="entry_seo_keywords" scope="site"}{cf_channelprefix_seo_keywords}{/exp:stash:set}
{!-- ENTRY/PAGE CONTENT --}
{exp:stash:set name="entry_body_content" parse_tags="yes" parse_conditionals="yes" scope="site"}
Your page content here
{/exp:stash:set}
{/exp:channel:entries}
Then, in your wrapper template, which would ultimately contain all your wrapping HTML but could be chunked into snippets (for something like the header, since it would be shared with other wrapper templates), for example:
<html>
<head>
<title>{exp:stash:get name="entry_seo_title"}</title>
<meta name="description" content="{exp:stash:get name="entry_seo_description"}" />
<meta name="keywords" content="{exp:stash:get name="entry_seo_keywords"}" />
</head>
<body>
{exp:stash:get name="entry_body_content"}
</body>
</html>
If you want to show just the name of your ExpressionEngine site (as defined in CP Home > Admin > General Configuration) use the site name global variable:
<title>{site_name}</title>
If you want to display just the current entry title from a given channel use the following:
<title>
{exp:channel:entries channel="channel_name" limit="1" dynamic="yes"}
{title}
{/exp:channel:entries}
</title>
Many web developers use an embed variable with an embedded template to pass the entry's {title} to a global embed template, allowing for a dynamic page title:
{embed="includes/header" title="{exp:channel:entries channel="{channel_name}"}{title}{/exp:channel:entries}"}
If you're using EE2, the SEO Lite Module takes care of all the hard work for you with a single line of code:
<html lang="en">
<head>
<meta charset="utf-8" />
{exp:seo_lite url_title="{url_title}"}
</head>
Other solutions include the Low Title Plugin (EE1, EE2).
One addition to Ryan's embed method (which is definitely the most flexible method): chances are you can wrap most of your page in an {exp:channel:entries} tag when viewing an individual entry, avoiding the additional (and expensive) channel:entries call. So it would look more like this:
{exp:channel:entries channel="channel_name" limit="1"}
{embed="includes/header" title="{title}"}
<h1>{title}</h1>
{page_content}
{embed="includes/footer"}
{if no_results}{redirect="404"}{/if}
{/exp:channel:entries}
NSM Better Meta is a more complete way to pass channel meta data to the tag.
For smaller sites, I use the String plugin.
https://devot-ee.com/add-ons/string
Very simple syntax.
I am trying to create a modal window with hidden content using Thickbox.
It opens the window fine, but I'm not sure why it's not showing the content inside id="hiddenContent".
I am following the inline example suggested at http://jquery.com/demo/thickbox/#
Thanks.
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Untitled Document</title>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="thickbox.js"></script>
<link rel="stylesheet" href="thickbox.css" type="text/css" media="screen" />
</head>
<body>
Show Content
<div id="hiddenContent" style="display: none">inline content comes here</div>
</body>
</html>
It seems you don't have a CSS file. You can copy the Thickbox CSS from http://jquery.com/demo/thickbox/#sectiona-3 into your page (or save it as a style.css file).
-- edit --
Yeah, sorry, I didn't notice that the CSS is already loaded :(
By the way, I just found the solution: try adding a <p> tag inside your hiddenContent div:
<div id="hiddenContent" style="display: none"><p>inline content comes here</p></div>
Hope it helps ;)
This is a bug in thickbox. Here is how you can fix it:
Inside thickbox.js
on or about line 221 you should see this line of code:
$("#TB_ajaxContent").append($('#'+params['inlineId']).children());
change it to this:
$("#TB_ajaxContent").html($('#'+params['inlineId']).html())
and then, on or about line 223 you will see this line:
$('#'+params['inlineId']).append($("#TB_ajaxContent").children());
disable the line by adding two slashes before it like this:
//$('#'+params['inlineId']).append($("#TB_ajaxContent").children());
Explanation:
When thickbox copies the content from the hidden div into the thickbox container, it does so by copying all .children() elements. If you have only text inside your hidden div there ARE NO CHILDREN because text is not itself a child element. This is why wrapping your content in a <p> tag will work because now there is a child (the <p> tag).
So if you want to have only text in your hidden div, using .html() instead will grab everything in your hidden div. Disabling the second line prevents Thickbox from trying to copy the content back to the hidden div when the Thickbox closes, which would cause any content within child tags to be duplicated in the hidden div.
There is no need to edit the .js file, the solution is quite simple.
Maybe a bit late :) but I overcame the issue just by changing the ? character in #TB_inline? to &.
The issue is in Thickbox's internal parseQuery function, which parses matching pairs but breaks when the query has a double ?, as in this case.
UPDATE: In some cases the <p> fix is also needed ;)
Hope it helps.
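To make the double ? problem concrete, here is a simplified sketch of that kind of parser. It is not Thickbox's actual source: parseQuerySketch and both example URLs are assumptions made purely for illustration, on the premise that the page URL already carries a query string of its own.

```javascript
// Simplified, hypothetical query parser; NOT Thickbox's actual code.
// It drops everything up to the first "?" and splits the rest into pairs.
function parseQuerySketch(url) {
  const queryStart = url.indexOf('?');
  const query = queryStart === -1 ? '' : url.slice(queryStart + 1);
  const params = {};
  for (const pair of query.split(/[;&]/)) {
    const parts = pair.split('=');
    if (parts.length !== 2) continue; // malformed pair: skipped
    params[decodeURIComponent(parts[0])] = decodeURIComponent(parts[1]);
  }
  return params;
}

// Assumed example URLs: the page URL already has its own "?id=5".
const withSecondQuestionMark = 'page.html?id=5#TB_inline?inlineId=hiddenContent';
const withAmpersand = 'page.html?id=5#TB_inline&inlineId=hiddenContent';

// The second "?" glues two pairs together, so inlineId is never seen.
const broken = parseQuerySketch(withSecondQuestionMark);
// With "&" the pairs split cleanly and inlineId is available.
const fixed = parseQuerySketch(withAmpersand);

console.log(broken.inlineId);
console.log(fixed.inlineId);
```

In this sketch the lookup of inlineId fails in the first case and succeeds in the second, which matches the behaviour described in the answer.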
The function tb_position() needs to be updated.
This condition
if ( !(jQuery.browser.msie && jQuery.browser.version < 7))
is the cause of the error.
jQuery no longer supports jQuery.browser. To detect IE6 in this case, change the above condition to this:
if ( !(/\bMSIE 6/.test(navigator.userAgent)))
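The same check can be tried outside the browser by passing the user-agent string in explicitly. The sketch below wraps the regular expression in a hypothetical isIE6 helper; the sample user-agent strings are assumed for illustration.

```javascript
// Hypothetical helper wrapping the user-agent test from the condition above.
function isIE6(userAgent) {
  return /\bMSIE 6/.test(userAgent);
}

// Sample user-agent strings (assumed for illustration).
const ie6 = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
const ie8 = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)';
const firefox = 'Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0';

console.log(isIE6(ie6), isIE6(ie8), isIE6(firefox));
```

In the browser, the call would be isIE6(navigator.userAgent), and the condition in tb_position() becomes if ( !isIE6(navigator.userAgent) ).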