I am scraping a website and occasionally a page gets thrown up that appears completely empty in the browser. I would like to write an exception for this, but I cannot find a way to check with BeautifulSoup if this is true. I know this seems like an easy problem, but I am a beginner.
The structure of the offending HTML page is as follows:
<!DOCTYPE doctype html>
<html lang="en">
<head>
...
</head>
<body class="something" data-theme="something">
<csn-mobi>
</csn-mobi>
<div id="bitracking" style="display:none">
</div>
<script>
...
</script>
<script>
...
</script>
</body>
</html>
In other words, the <body> has a whole bunch of <script> tags and when I do something like:
BrowserText = soup.body.text
then the problem is that all the script inside the <script> tags gets included as well.
But I want to check if there is anything outside the <script> tags (but within the <body> tag), and if there is nothing, raise an exception.
Any ideas on how to do this?
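One way to approach this is to collect only the text that sits inside <body> but outside <script> (and <style>) elements, and to raise if nothing is left. The sketch below uses Python's standard-library html.parser instead of BeautifulSoup so that it runs without extra packages. The names VisibleTextCollector, EmptyPageError, visible_body_text and ensure_not_empty are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class EmptyPageError(Exception):
    """Raised when <body> holds no text outside <script>/<style> tags."""


class VisibleTextCollector(HTMLParser):
    """Collects text inside <body>, ignoring <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-whitespace text that is in <body> and not in a skipped tag.
        if self.in_body and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_body_text(html):
    """Return the body text with script/style content left out."""
    parser = VisibleTextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def ensure_not_empty(html):
    """Return the visible text, or raise EmptyPageError if there is none."""
    text = visible_body_text(html)
    if not text:
        raise EmptyPageError("no text in <body> outside <script> tags")
    return text
```

With BeautifulSoup itself, the same idea would be to call decompose() on every tag returned by soup.body.find_all("script") and then test soup.body.get_text(strip=True). Note that neither variant looks at CSS, so text hidden with display:none would still count as content.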
Related
Using ufront and erazor I ran into the following problem very quickly.
The hello-world example provides the following layout:
<!DOCTYPE html>
<html lang="en">
<head>
<title>#title</title>
<meta charset="utf-8" />
<link rel="stylesheet" href="//maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css" />
</head>
<body>
<div class="container">
#viewContent
</div>
</body>
<script src="//code.jquery.com/jquery-1.11.1.min.js"></script>
<script src="//maxcdn.bootstrapcdn.com/bootstrap/3.3.1/js/bootstrap.min.js"></script>
</html>
For certain pages I want to add more headers or scripts after jQuery has been loaded.
One way to do so (for the scripts, for example) would be to pass the scripts as an array of strings and construct them in the layout file:
...
<script src="//maxcdn.bootstrapcdn.com/bootstrap/3.3.1/js/bootstrap.min.js"></script>
#for(script in scripts) {
<script src='#script.path'></script>
}
</html>
....
The problem with this approach is that I can't keep meaningful headers + body + scripts in the same template file, which would be great; it also takes extra care to pass the scripts and headers as context.
Some template engines like Razor or Laravel allow this using 'sections'.
Is it possible to do something similar with erazor? If not what would be a good alternative?
I'm looking for alternatives to Jade templates in express 4.x because I really don't like Jade's syntax. I'm tending towards EJS, because it's basically just HTML on steroids.
However, one really nice feature of Jade templates is the ability to use layouts. I've found https://www.npmjs.org/package/express-ejs-layouts, but it seems to be made for express 3 and its build is failing :/.
I also found https://www.npmjs.org/package/ejs-mate which is made for express 4.x but it only seems to support a single content block (body).
I would like to have something like this:
layout.something:
<html>
<head>
<% block styles %>
<% block scripts %>
</head>
<body>
<% block body %>
</body>
</html>
index.html:
uses layout "layout.something"
scripts:
<script src="my_custom_script.js"></script>
styles:
<link rel="stylesheet ...></link>
body:
<h1>This is my body!</h1>
So that this yields:
<html>
<head>
<link rel="stylesheet ...></link>
<script src="my_custom_script.js"></script>
</head>
<body>
<h1>This is my body!</h1>
</body>
</html>
Does anyone know an engine that is capable of that besides Jade?
You can try express-handlebars; it supports layouts and partial views.
I am new to Meteor and I am trying to build a multi-page application where http://www.mydomain.com/page1 will result in a totally different page from http://www.mydomain.com/page2.
By totally different I mean that I don't want the page to be rendered by the client.
I tried to use meteor-router, but what I got is something like:
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="/myapp.css?9297ad4aa173c4e0c19aebd27c62a5c43242bb93">
<script type="text/javascript">
__meteor_runtime_config__ = {"ROOT_URL":"http://localhost:3000","serverId":"iMp4kgzzeqDtktJoY"};
</script>
<script type="text/javascript" src="/packages/underscore/underscore.js?6d71e6711255f43c0de8090f2c8b9f60534a699b"></script>
<script type="text/javascript" src="/packages/meteor/client_environment.js?07a7cfbe7a2389cf9855c7db833f12202a656c6b"></script>
<script type="text/javascript" src="/packages/meteor/helpers.js?2968aa157e0a16667da224b8aa48edb17fbccf7c"></script>
...
...MANY MANY MANY SCRIPTS.... ?!?
...
...
<script type="text/javascript" src="/myapp.js?2866dcdb5c828b129cdd3b2498a4bf65da9ea43f"></script>
<title>myapp</title>
</head>
<body>
</body>
</html>
And this is not what I want. I want the page1 route to return:
<!DOCTYPE html>
<html>
<head>
My meta tags
</head>
<body>
page1
</body>
</html>
And I want page2 to return different meta tags with different content.
To be clear, let's assume that my clients sometimes don't have JavaScript. I'm not asking whether Meteor is the right framework! I am asking only whether I can do this with Meteor.
Meteor works a bit differently from the traditional LAMP stack. Instead of re-downloading the whole web page, it patches only the parts of the DOM that need to change. This makes for a very satisfying end-user experience on modern web browsers.
To use meteor-router, you need to pick the spot in your page that should be swapped out with new content for each route and put {{renderPage}} there. You can use something like:
<head>
<title>xx</title>
</head>
<body>
{{renderPage}}
</body>
<template name="page1">
<h2>Hello!</h2>
</template>
<template name="page2">
<h2>Ola!</h2>
</template>
Now you need to define a router in your client-side JavaScript:
Meteor.Router.add({
'/page1': 'page1',
'/page2': 'page2'
});
So if you load /page1 you would see Hello!, and if you load /page2 you would see Ola!, as defined in the <template name="page2">..</template>.
The meta tags have to be created with JavaScript, with something like:
$('head').append("<meta...");
Again, this depends on your preference. Personally, I find these types of apps load ridiculously fast between pages compared to other 'thin' based websites. (Have a look at meteor.com to see how fast you can swap between pages.) The browser does need JavaScript, however.
Note that in production mode there will be only one script tag.
I'm new to Haxe, and I'm trying to experiment with Ufront.
I have a problem with Erazor templates: I don't understand how to escape HTML when outputting variables.
With this simple template:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Users list</title>
</head>
<body>
<ul>
#for(user in users)
{
<li>#user.name</li>
}
</ul>
</body>
</html>
If any of the users has name '<script>', then the template will simply output <script> for its name.
How can I properly HTML escape in Erazor?
How to HTML escape view arguments
In order to HTML escape an argument in your Erazor views,
you could simply use the HTML helper method encode().
Supposing your argument is called pageContent and its value is:
<script>
alert("BAD things could happens if you don't properly escape!!");
</script>
You can escape it using the following code:
#Html.encode(pageContent)
Your template will be safely rendered as
&lt;script&gt;
alert("BAD things could happens if you don't properly escape!!");
&lt;/script&gt;
Html.encode() internally uses StringTools.htmlEscape() in order to escape its argument.
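For readers without Haxe at hand, the effect of this kind of escaping can be sketched in Python with the standard-library html.escape. This is only an analogue of the idea, not Erazor's or Ufront's implementation; encode and render_user_list are hypothetical names, and whether Html.encode() also escapes quotes is not stated here, so the sketch leaves them alone.

```python
import html


def encode(value):
    """Escape &, < and > so the value is shown as text, not parsed as markup."""
    return html.escape(str(value), quote=False)


def render_user_list(names):
    """Build a <ul> in which every user name is escaped before output."""
    items = "\n".join(f"<li>{encode(name)}</li>" for name in names)
    return f"<ul>\n{items}\n</ul>"


if __name__ == "__main__":
    # A user named '<script>' ends up as harmless text in the list.
    print(render_user_list(["alice", "<script>"]))
```

The point is the same as in the template: escape at the moment the value is written into the markup, so that a name such as '<script>' is displayed literally instead of being interpreted by the browser.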
Thanks to Franco's kind help, I've written a page on the Ufront site to explain how to HTML escape in Ufront.
Ufront automatically includes the helper class that contains the desired method:
<li>#Html.encode(user.name)</li>
I am currently playing around with JSF's ClientBehavior API.
I want to create a client behavior that uses jQuery. Besides including the *.js files for jQuery, another script will be required in the <head> section to bootstrap all the jQuery stuff, i.e. create client-side widgets.
I tried to follow this approach from Victor Herrera, but the component system event is never processed. I guess this is because ClientBehaviors do not inherit from UIComponent.
So my question is how to add dynamically created JS to the <head> of the rendered document.
This is what the rendered output in the end should look like:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript">
$(document).ready(function () {
// Dynamically created stuff here
});
</script>
</head>
<body>
...
<input type="text" id="myJSFInputWithClientBehavior" onclick="doSomeStuffWithjQuery()" />
</body>
</html>