Does Watir recognize the META HTML element? - watir

Hi, for SEO purposes we need to access META tags on our HTML pages, but Watir does not seem to support the META tag. Is there another way to access unsupported HTML tags?
Any help appreciated.
Hi, I found a way to access the elements by using getElementsByTagName:
b.document.getElementsByTagName('meta')[1].content
but it raises this error:
test.rb:14:in `[]': can't convert Fixnum into String (TypeError)
Can you help with this?

If you're using 'regular' Watir, you can find the meta tags in the HTML source.
It will depend on which browser you're using, but it will be something like this:
@browser.html  # Internet Explorer
@browser.xml   # Celerity
@browser.html  # Firefox
You could then use Hpricot or an XML parser to get the meta tags.
require 'watir'
require 'hpricot'
$b = Watir::IE.start('http://www.google.com')
doc = Hpricot($b.html)
(doc/"meta").each { |tag| puts tag }
This works for me in IE and Firefox.

Looks like watir-webdriver can access meta tags:
require "watir-webdriver"
#=> true
browser = Watir::Browser.start "google.com"
browser.meta(:index => 0).html
#=> "<meta xmlns=\"http://www.w3.org/1999/xhtml\" content=\"text/html; charset=UTF-8\" http-equiv=\"content-type\" />"

Looks like Watir cannot access meta tags: meta tag not implemented. You can vote for the ticket or comment on it.

Related

I want to extract the url from the <a href='#' onclick="redirectpage(2);return false">...</a>

I'm using Scrapy and passing SplashRequest. I want to extract the URL from the href as usual, but when I inspect the href to get the actual URL, it is not assigned the URL I'm looking for; instead I see '#', and when I hover the mouse over that '#' I can see the URL I'm looking for.
How can I get that URL and then follow it using SplashRequest?
The HTML code is shown below:
<a href='#' onclick="redirectpage(2);return false">Page 120</a>
When I hover the mouse over the href I see the URL I'm looking for, as shown below:
https://example.com/page/120
To get the href/url attribute:
//div[@class='---']/a/@href
I believe this works for any page.
For getting the URL, you should use one of the dynamic data fetching methods: click the particular link and view the URL in the response. If the content is not available in the page source, then it is being loaded dynamically via some scripts, and we should handle it that way.

Insert page break when using "markdown-pdf" nodejs module?

I'm using the Node.js module "markdown-pdf" (https://www.npmjs.com/package/markdown-pdf, version 9.0) to convert markdown to PDF, and I need to add a few page breaks to clean up the presentation in the PDF output.
I tried all the recommendations I could find on this and other forums, including inline HTML tags such as:
<div style="page-break-after: always;"></div>
And some CSS hacks, like applying page breaks to all div tags (as described here: http://forums.apricitysoftware.com/t/include-pdf-pagebreak-instructions-in-markdown/152). None of these is working; all tags in the markdown (source) document appear un-rendered in the PDF (output) document.
Expected (ideal) behavior would be to add the page breaks to the markdown files, and have the PDF reflect the desired changes. Something like this, within my markdown files:
markdown text
markdown text
markdown text
[page break command]
markdown text
markdown text
markdown text
Thanks in advance for any assistance or suggestions that anyone can provide!
Got an assist from a friend and figured this out. Markdown-pdf uses HTML5Boilerplate, so you can edit the index.html file, found here on my system:
/usr/local/lib/node_modules/markdown-pdf/html5bp/index.html
I added the CSS described here: http://forums.apricitysoftware.com/t/include-pdf-pagebreak-instructions-in-markdown/152
And it worked. I was able to include the HTML tags described in the post and force page breaks. Success!
The styled div tag you mentioned only works if the html parameter of the remarkable object is set to true in the options parameter:
var markdownpdf = require("markdown-pdf")
  , fs = require("fs")

let options = {
  paperFormat: "A4",
  paperOrientation: "landscape",
  remarkable: {
    html: true
  }
}

fs.createReadStream("teste.md")
  .pipe(markdownpdf(options))
  .pipe(fs.createWriteStream("document.pdf"))
In order to use a Markdown marker (instead of an HTML div tag), I guess you should use preProcessMd and replace a specific pattern with the styled div tag.
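For example, here is a minimal sketch of that idea. It assumes the preProcessMd option takes a function returning a transform stream (as described in the markdown-pdf README), uses the through2 package, and invents a \pagebreak marker of my own; adjust the marker and regex to whatever convention you prefer:

var markdownpdf = require("markdown-pdf")
var through2 = require("through2")
var fs = require("fs")

// Turn every line that is exactly "\pagebreak" (a made-up marker, not part
// of markdown-pdf) into the styled div that forces a page break in the PDF.
function preProcessMd () {
  return through2(function (chunk, enc, callback) {
    var replaced = chunk.toString().replace(
      /^\\pagebreak\s*$/gm,
      '<div style="page-break-after: always;"></div>'
    )
    this.push(replaced)
    callback()
  })
}

var options = {
  remarkable: { html: true },   // still needed so the injected div is rendered
  preProcessMd: preProcessMd
}

fs.createReadStream("teste.md")
  .pipe(markdownpdf(options))
  .pipe(fs.createWriteStream("document.pdf"))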
Hope I could offer some help!

Get url of a taxonomy field inside an alternate shape cshtml file in Orchard

I use Orchard 1.10.1. I have a CustomContentType that has a "Group" taxonomy field. In the Content-CustomContentType.Detail.cshtml alternate, I want to have a link to a certain taxonomy term. This is my code:
<a href='???'>@Model.ContentItem.CustomContentType.Group.Terms[0].Name</a>
How can I get the URL to replace the '???' in the code above?
Thanks in advance.
You have a few options available to you. I've just typed them all straight into the browser, and Orchard's model is a tricky beast to navigate, so if they blow up in your face let me know and I'll dig a bit deeper :)
Let Orchard make the entire link
Looking at the TaxonomyItemLink.cshtml file you can see that you can display the link like this:
@using Orchard.Taxonomies.Models
@{
    TermPart part = Model.ContentPart;
}
@Html.ItemDisplayLink(part)
So in your case you could use:
@Html.ItemDisplayLink((ContentPart)Model.ContentItem.CustomContentType.Group.Terms[0])
Just get the URL
You can use @Url.ItemDisplayUrl() to turn a routable content item into a URL.
<a href="@Url.ItemDisplayUrl((ContentPart)Model.ContentItem.CustomContentType.Group.Terms[0])">
    @Model.ContentItem.CustomContentType.Group.Terms[0].Name
</a>
Because it's an extension method you can't pass a dynamic, so you will need to cast the type. That is why the (ContentPart) cast is there.
Just get the Slug
Actually, in this case the TermPart class already has a .Slug property on it, so this might also work:
<a href="@Model.ContentItem.CustomContentType.Group.Terms[0].Slug">
    @Model.ContentItem.CustomContentType.Group.Terms[0].Name
</a>
I'm not sure whether the slug contains just the end bit or the full URL, though.

Capybara cannot match XML page

I have a problem with matching response text on an XML page with Capybara.
When I use page.should(have_content(arg1)), Capybara raises an error that there is no \html element (there shouldn't be, as it's XML).
When I use page.should(have_xpath(arg1)), it raises Element at 40 no longer present in the DOM (Capybara::Webkit::NodeNotAttachedError).
What is the correct way to test XML?
When using capybara-webkit, the driver will try to use a browser's HTML DOM to look for elements. That doesn't work, because you don't have an HTML document.
One workaround is to fall back to Capybara's string implementation:
xml = Capybara.string(page.body)
expect(xml).to have_xpath(arg1)
expect(xml).to have_content(arg1)
Assuming your page returns a content type of text/xml, capybara-webkit won't mess with the response body at all, so you can pass it through to Capybara.string (or directly to Nokogiri if you like).

Custom tag or custom attributes

I would like to know whether it is possible to develop custom HTML tags or custom HTML attributes with Node.js, whether in Jade, HTML, or another HTML template engine. I was looking at PhantomJS and couldn't find any example that accomplishes it, nor with Cheerio. My goal is to make some components that are easy to use in any kind of popular HTML engine. Any direction will be very helpful. Thanks!
Node.js is just a web server; you need something to parse the custom tags, so it's either the template engine that will convert them to valid HTML, or the client side with JavaScript (e.g. AngularJS directives).
You can write your own filter, similar to this example:
body
  :markdown
    Woah! jade _and_ markdown, very **cool**
    we can even link to [stuff](http://google.com)
That would give you
<body>
  <p>Woah! jade <em>and</em> markdown, very <strong>cool</strong> we can even
    link to <a href="http://google.com">stuff</a>
  </p>
</body>
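To go beyond the built-in :markdown filter, you can register your own filter from JavaScript before rendering. This is only a sketch: the :shout filter name and the <my-shout> element are made up for illustration, and how filters are registered depends on your Jade version (older versions expose jade.filters; check the docs for yours):

var jade = require("jade")

// Register a custom :shout filter (made-up name) that wraps the block's
// text in a custom element and upper-cases it.
jade.filters.shout = function (str) {
  return "<my-shout>" + str.toUpperCase() + "</my-shout>"
}

var src = [
  "body",
  "  :shout",
  "    hello custom tags"
].join("\n")

console.log(jade.render(src))
// roughly: <body><my-shout>HELLO CUSTOM TAGS</my-shout></body>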
