Add my own text inside nested braces, with an exception - python-3.x

The original question is here; this question is about avoiding one specific problem.
I have this code, which works perfectly with the html_1 data:
from pyparsing import nestedExpr, originalTextFor
html_1 = '''
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
<body>
<h1 <?php echo "class='big'" ?>>foo</h1>
</body>
</html>
'''
html_2 = '''
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
<body>
<h1 <?php echo $tpl->showStyle(); ?>>foo</h1>
</body>
</html>
'''
nested_angle_braces = nestedExpr('<', '>')
# for match in nested_angle_braces.searchString(html):
# print(match)
# nested_angle_braces_with_h1 = nested_angle_braces().addCondition(
# lambda tokens: tokens[0][0].lower() == 'h1')
nested_angle_braces_with_h1 = originalTextFor(
nested_angle_braces().addCondition(lambda tokens: tokens[0][0].lower() == 'h1')
)
nested_angle_braces_with_h1.addParseAction(lambda tokens: tokens[0] + 'MY_TEXT')
print(nested_angle_braces_with_h1.transformString(html_1))
The result for html_1 is:
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
<body>
<h1 <?php echo "class='big'" ?>>MY_TEXTfoo</h1>
</body>
</html>
Here everything is right: MY_TEXT is placed where expected, in the right region (inside the h1 element).
But let's look at the result for html_2:
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
<body>
<h1 <?php echo $tpl->showStyle(); ?>MY_TEXT>foo</h1>
</body>
</html>
Now there is a problem: MY_TEXT is placed inside the h1 tag's attribute area, because the PHP code contains a '>' character in "$tpl->".
How can I fix this? I need to get this result in that region:
<h1 <?php echo $tpl->showStyle(); ?>>MY_TEXTfoo</h1>
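To see why html_2 breaks, here is a simplified model of the nesting rule behind nestedExpr('<', '>'). It is not pyparsing's actual implementation. The model treats every '<' as an opener and every '>' as a closer. In html_2, the '>' in "$tpl->" and the '>' in "?>" balance the two openers before the real end of the tag. The helper names `naive_tag_end` and `insert_after` are made up for illustration.

```python
def naive_tag_end(text: str, start: int = 0) -> int:
    """Return the index of the '>' that balances the '<' at `start`,
    counting every '<' and '>' as brackets (the naive nesting rule)."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "<":
            depth += 1
        elif text[i] == ">":
            depth -= 1
            if depth == 0:
                return i
    return -1  # unbalanced: no closing '>' found


def insert_after(text: str, extra: str) -> str:
    """Insert `extra` right after the naively detected end of the first tag."""
    end = naive_tag_end(text)
    return text[:end + 1] + extra + text[end + 1:]


print(insert_after('<h1 <?php echo "class=\'big\'" ?>>foo</h1>', "MY_TEXT"))
# <h1 <?php echo "class='big'" ?>>MY_TEXTfoo</h1>
print(insert_after('<h1 <?php echo $tpl->showStyle(); ?>>foo</h1>', "MY_TEXT"))
# <h1 <?php echo $tpl->showStyle(); ?>MY_TEXT>foo</h1>
```

The second line reproduces the misplaced MY_TEXT from the question. The extra '>' inside the PHP code is what closes the tag too early.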

The solution requires defining a special expression for PHP tags, since PHP tags are what confuse our simple nestedExpr.
from pyparsing import Literal, SkipTo, Word, printables, nestedExpr, originalTextFor
# define an expression for a PHP tag
php_tag = Literal('<?') + 'php' + SkipTo('?>', include=True)
We'll need more than simple strings now for the opener and closer, including a negative lookahead when matching a '<' to make sure we aren't at the leading edge of a PHP tag:
# define expressions for opener and closer, such that we don't
# accidentally interpret a PHP tag as a nested expr
opener = ~php_tag + Literal("<")
closer = Literal(">")
If opener and closer aren't simple strings, then we need to give a content expression too. Our content will be very simple to define, just PHP tags or other Words of printables, excluding '<' and '>' (you'll end up wrapping this all back up in originalTextFor anyway):
# define nested_angle_braces to potentially contain PHP tag, or
# some other printable (not including '<' or '>' chars)
nested_angle_braces = nestedExpr(opener, closer,
content=php_tag | Word(printables, excludeChars="<>"))
Now if I wrap nested_angle_braces in originalTextFor and use searchString to scan html_2, I get:
for tag in originalTextFor(nested_angle_braces).searchString(html_2):
    print(tag)
['<html>']
['<head>']
['<title>']
['</title>']
['<head>']
['<body>']
['<h1 <?php echo $tpl->showStyle(); ?>>']
['</h1>']
['</body>']
['</html>']
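If a full grammar is more than you need, the same idea can be sketched with only the standard library: treat a PHP block as one opaque unit while looking for the '>' that closes the tag. This is a swapped-in regex-and-scan technique, not pyparsing. The function name `insert_after_open_tag` is hypothetical. The sketch assumes PHP blocks always have the form `<?php ... ?>` and that the target tag name does not appear inside PHP code.

```python
import re

# A complete PHP block such as "<?php echo $tpl->showStyle(); ?>".
# Assumption: PHP blocks always start with "<?php" and end with "?>".
PHP_BLOCK = re.compile(r"<\?php.*?\?>", re.DOTALL)


def insert_after_open_tag(html: str, tag: str, text: str) -> str:
    """Insert `text` right after every opening <tag ...> in `html`,
    ignoring '>' characters that occur inside embedded PHP blocks."""
    opener = re.compile(r"<" + re.escape(tag) + r"\b", re.IGNORECASE)
    out = []
    i = 0
    while True:
        m = opener.search(html, i)
        if m is None:
            out.append(html[i:])
            return "".join(out)
        j = m.end()
        # Walk forward to the '>' that really closes the opening tag.
        while j < len(html):
            php = PHP_BLOCK.match(html, j)
            if php:
                j = php.end()  # skip the whole PHP block, its '>' included
            elif html[j] == ">":
                break
            else:
                j += 1
        out.append(html[i:j + 1])  # everything up to and including '>'
        out.append(text)
        i = j + 1


html_2 = '<body>\n<h1 <?php echo $tpl->showStyle(); ?>>foo</h1>\n</body>'
print(insert_after_open_tag(html_2, "h1", "MY_TEXT"))
```

Closing tags such as `</h1>` are not touched, because the pattern requires the tag name to follow '<' directly.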

Related

Split CSV File, Name Based on Contents, Save As HTML

I think this is a simple task, but I'm a biologist who only knows a teeny bit of code and after several days of trying to figure this out, I'm out of ideas.
I'm using Terminal on a Mac. I have a CSV file that I want to split into separate files by row (162 rows), and I want to name each file by the content of the first and second columns (genus_species). Then I need all 162 genus_species files to be saved as HTML files.
I have only attempted the "splitting" part, with Ruby (recommended on StackExchange/Stack Overflow). Below are some of my attempts. They are Frankensteins of helpful-ish forum posts, and after each one I added a short comment on why it did not work.
Example HTML
<!DOCTYPE html>
<html><head>
<meta charset="UTF-8">
<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script></head>
<body>
<h1><em><!-- Species name --></em> - <!-- Common name --></h1>
<h2>Status</h2>
<p></p>
<h2>Info</h2>
<p></p>
<h2>Time of year this bee is seen</h2>
<p></p>
<h2>Identification</h2>
<p></p>
<h3>Similar Species</h3>
<p></p>
<h2>Flowers</h2>
<p></p>
<h2>Sociality</h2>
<p></p>
<h2>Nest</h2>
<p></p>
<div id="refs" class="references">
--<br>More information:<br> <!-- Bug Guide --></div>
</body></html>
More Info Based on Comments
Here are some lines copied from the text file:
Genus,species,Common name,Status,Info,Time of year this bee is seen,Identification,Similar Species,Flowers,Sociality,Nest,Bug Guide,Discover Life,Other,
Agapostemon,melliventris,Honey-tailed Striped-Sweat bee,Secure G5,Excavates into deep burrows in ground nests,March-December,Agapostemon males have black and yellow stripes on the abdomen. Females have a yellow band on the lower margin of the clypeus.,All other Agapostemon species,Wide variety of plants,Solitary,"Deep, underground excavation",https://bugguide.net/node/view/70932,https://www.discoverlife.org/20/q?search=Agapostemon+melliventris,https://explorer.natureserve.org/Taxon/ELEMENT_GLOBAL.2.928401/Agapostemon_melliventris,
Agapostemon,sericeus,Silky Striped Sweat Bee,Secure G5,"Not choosy about lawn, as long as flowers are present",April-October,Agapostemon males have black and yellow stripes on the abdomen. A. sericeus males have a tooth on its hind femur. Female has metallic green abdomen.,All other Agapostemon species,Wide variety of plants,Solitary,Ground-nester in loamy soils,https://bugguide.net/node/view/83023,https://www.discoverlife.org/mp/20q?search=Agapostemon+sericeus,https://www.sharpeatmanguides.com/sweat-bees,
Agapostemon,splendens,Brown-winged Striped-Sweat Bee,Secure G5,This is the most common Agapostemon found in the southeast region,April-October,Agapostemon males have black and yellow stripes on the abdomen. A. splendens have brown wings. The female abdomen is often somewhat bluish.,All other Agapostemon species,"Jacquemontia reclinata, wide variety of plants",Solitary,Ground-nester in sandy soils,https://bugguide.net/node/view/74478,https://www.discoverlife.org/mp/20q?search=Agapostemon+splendens,,
Updated code I've tried based on comments.
This worked and I think it's heading in the direction I want, but it's hard to tell in the terminal window:
f = File.new("bee_key_fact_sheet .csv")
f.each_line { |line| puts line }
Currently playing with some kind of File.write line to add here and then close?
Attempt #1
file = File.open("bee_key_fact_sheet.csv")
awk
'(NR==1){header=$0;next}
(NR%l==2) {
close(file);
file=sprintf("%s.%0.5d.csv",FILENAME,++c)
sub(/csv[.]/,"",file)
print header > file
}
{f.write}'
File.close
#AWK not recognized, asks to "display all possibilities (y/n)" I tried returning "y" and "yes" and both times it says my answer is not recognized
Attempt #2
file_data = File.read("bee_key_fact_sheet.csv").split
#This works but splits by each comma
Attempt #3
file_data = File.foreach("bee_key_fact_sheet.csv") { |line| puts line}.split
#This returned something slightly less messy than splitting by each comma but got this error message "undefined method `split' for nil:NilClass"
Attempt #4
bee_key_fact_sheet.csv.foreach('so1.csv', :headers => true, :col_sep => ",", :skip_blanks => true) do |row|
id, name = row[0], row[1]
unless (id =~ /#/)
names = name.split
end
#This returned nothing
Your example of CSV input (bee_key_fact_sheet.csv) is quoted in the question above.
In this CSV, all the lines (including the header) end with a comma, so the last column probably carries no data and can be discarded.
Also, you have commas inside the data (fields with double-quotes), so you'll need a real CSV parser to read the content of the file. BTW, you're right in choosing Ruby for this task because it includes a CSV parser in its core library.
Here's one way of reading your CSV (Edit: fixed CSV#Row conversion for older Rubys):
require 'csv'
filepath = 'bee_key_fact_sheet.csv'
CSV.foreach(filepath, headers: true) do |row|
  genus, species = row[0], row[1]
  #data = row[0...-1] # NOTE: not sure about the Ruby version compatibility
  data = row.to_hash.values[0...-1]
  filename = "#{genus}_#{species}.txt".tr("\0/",'')
  filecontent = " * #{data.join("\n * ")}"
  puts "\n#{filename}:\n#{filecontent}"
end
About tr("\0/",''): The characters that are allowed in a filename depend on the filesystem. All the filesystems (that I know of) ban at least the NULL-byte and the slash characters, so I strip them (but you may want to strip a few more).
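For anyone doing this step in Python instead of Ruby, here is a rough equivalent of the reading code above. It uses the standard `csv` module, which, like Ruby's CSV, correctly handles quoted fields that contain commas. It reads from a string so it can be tried without the file. The function name `read_rows` is hypothetical. As in the Ruby version, only the NUL byte and '/' are stripped from filenames.

```python
import csv
import io


def read_rows(text: str):
    """Yield (filename, values) for each data row of the bee CSV text."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:  # ignore blank lines
            continue
        values = row[:-1]  # every line ends with ',', so the last field is empty
        genus, species = values[0], values[1]
        # Strip the NUL byte and '/', which filesystems generally forbid in names.
        filename = f"{genus}_{species}.txt".replace("\0", "").replace("/", "")
        yield filename, values


sample = ('Genus,species,Nest,\n'
          'Agapostemon,melliventris,"Deep, underground excavation",\n')
for filename, values in read_rows(sample):
    print(filename, values)
```

With a real file, open it with `open("bee_key_fact_sheet.csv", newline="")` and pass the file object to `csv.reader` instead of the `StringIO`.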
Question: What exactly is the expected HTML output? A table row?
Update: HTML generation
When generating content programmatically, it's essential to escape your data for the right format/language/context. In Ruby you can escape HTML with CGI.escapeHTML.
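In Python, the standard-library counterpart of CGI.escapeHTML is `html.escape`. Here it is applied to a made-up cell value. With its default `quote=True`, it also escapes both quote characters, so the result is safe in element content and in quoted attribute values.

```python
import html

# '&' is escaped first, then '<' and '>', then '"' and "'".
cell = 'Males have "black & yellow" stripes <b>'
print(html.escape(cell))
# Males have &quot;black &amp; yellow&quot; stripes &lt;b&gt;
```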
Your example of HTML output:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
</head>
<body>
<h1><em><!-- Species name --></em> - <!-- Common name --></h1>
<h2>Status</h2>
<p></p>
<h2>Info</h2>
<p></p>
<h2>Time of year this bee is seen</h2>
<p></p>
<h2>Identification</h2>
<p></p>
<h3>Similar Species</h3>
<p></p>
<h2>Flowers</h2>
<p></p>
<h2>Sociality</h2>
<p></p>
<h2>Nest</h2>
<p></p>
<div id="refs" class="references">
--
<br>More information:
<br> <!-- Bug Guide -->
</div>
</body>
</html>
I'll make a few changes to the HTML:
Add a title to the page.
Remove MathJax, which seems unnecessary.
Convert the <h3> tag to <h2> because you use it only for "Similar Species". Changing it also permits the use of a loop while generating the HTML.
You have 2 links in the CSV that you don't use in the HTML: "Discover Life" and "Other". Don't you want to display them? I added the code for that ;-)
OK, first, you create a function that, given a CSV row, generates the corresponding HTML. Here I use ERB templating but you can do it directly with string literals (Edit: fixed ERB#result arguments for Ruby < 2.4.0):
require 'cgi'
require 'erb'
def renderHTML row
  htmlsafe = row.each_with_object({}) { |(k,v),h| h[k] = CGI.escapeHTML v if v }
  template = <<-'EOF'
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title><%= "#{htmlsafe['Genus']} #{htmlsafe['species']}" %></title>
</head>
<body>
<h1><em><%= "#{htmlsafe['Genus']} #{htmlsafe['species']}" %></em> - <%= htmlsafe['Common name'] %></h1>
<% for key in ['Status','Info','Time of year this bee is seen','Identification','Similar Species','Flowers','Sociality','Nest'] %>
<h2><%= key %></h2>
<p><%= htmlsafe[key] %></p>
<% end %>
<div id="refs" class="references">
--
<br>More information:
<% for key in ['Bug Guide', 'Discover Life', 'Other'].select{ |k| htmlsafe[k] } %>
<br><a href="<%= htmlsafe[key] %>"><%= key %></a>
<% end %>
</div>
</body>
</html>
  EOF
  #ERB.new(template, trim_mode: "<>").result(binding) # NOTE: only for Ruby >= 2.4.0
  ERB.new(template, nil, "<>").result(binding)
end
Then you can call the previous function while reading each row of your CSV file:
require 'csv'
filepath = 'bee_key_fact_sheet.csv'
CSV.foreach(filepath, headers: true) do |row|
filename = "#{row['Genus']}_#{row['species']}.html".tr("\0/",'')
html = renderHTML row
puts "\n# #{filename}\n#{html}"
#File.write(filename, html)
end
Note: I commented out the File.write line that creates the HTML files; uncomment it to actually write them.
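For comparison, here is a minimal Python sketch of the same render-and-write step, built with plain string formatting and `html.escape` instead of a template engine. The names `render_html` and `write_pages` are hypothetical. The sketch assumes the CSV header names shown in the question and UTF-8 input. One difference from the ERB template: it emits each available reference as a link to the row's URL.

```python
import csv
import html
from pathlib import Path

# Headings in the order used by the ERB template in the answer above.
SECTIONS = ["Status", "Info", "Time of year this bee is seen", "Identification",
            "Similar Species", "Flowers", "Sociality", "Nest"]
LINKS = ["Bug Guide", "Discover Life", "Other"]


def render_html(row: dict) -> str:
    """Build one fact-sheet page from a CSV row given as {header: value}."""
    # Escape every value once; skip the unnamed trailing column (key '' or
    # None) and turn missing cells into "".
    esc = {k: html.escape(v or "") for k, v in row.items() if k}
    name = f"{esc['Genus']} {esc['species']}"
    lines = [
        "<!DOCTYPE html>", "<html>", "<head>", '<meta charset="UTF-8">',
        f"<title>{name}</title>", "</head>", "<body>",
        f"<h1><em>{name}</em> - {esc.get('Common name', '')}</h1>",
    ]
    for key in SECTIONS:
        lines += [f"<h2>{key}</h2>", f"<p>{esc.get(key, '')}</p>"]
    lines += ['<div id="refs" class="references">', "--", "<br>More information:"]
    for key in LINKS:
        if esc.get(key):  # only emit references present in this row
            lines.append(f'<br><a href="{esc[key]}">{key}</a>')
    lines += ["</div>", "</body>", "</html>"]
    return "\n".join(lines) + "\n"


def write_pages(csv_path: str, out_dir: str) -> list:
    """Write one <Genus>_<species>.html file per data row; return the paths."""
    written = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = f"{row['Genus']}_{row['species']}.html"
            name = name.replace("\0", "").replace("/", "")
            path = Path(out_dir) / name
            path.write_text(render_html(row), encoding="utf-8")
            written.append(path)
    return written
```

Calling `write_pages("bee_key_fact_sheet.csv", ".")` would create one HTML file per row in the current directory. As with the commented-out File.write above, it is worth checking `render_html` on a row or two before writing 162 files.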
Can you try this? It should read the lines of the file:
f = File.new("name_of_file")
f.each_line { |line| puts line }
You can later save them as a new file; more on that here:
How to create a file in Ruby

python3 soup: replace HTML element content and save to file

How do I replace the text content of an HTML tag in a file and save the result to another file?
For example, there is a file index.html:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<p itemprop="someprop">SOME BIG TEXT</p>
</body>
</html>
I need to replace the text "SOME BIG TEXT" in the "p" tag to "ANOTHER BIG TEXT"
from bs4 import BeautifulSoup
with open("index.html","r") as file:
    fcontent=file.read()
sp=BeautifulSoup(fcontent,'lxml')
t='new_text_for_replacement'
print(sp.replace(sp.find(itemprop="someprop").text,t))
What am I doing wrong ?
Thank you
Use open() on the output file to write to it.
from bs4 import BeautifulSoup

with open('index.html', 'r') as file:
    fcontent = file.read()
sp = BeautifulSoup(fcontent, 'html.parser')
t = 'new_text_for_replacement'
# replace the paragraph using `replace_with` method
sp.find(itemprop='someprop').replace_with(t)
# open another file for writing
with open('output.html', 'w') as fp:
    # write the current soup content
    fp.write(sp.prettify())
If you want to replace just the inner content of the paragraph instead of the paragraph element itself, you can set the .string property.
sp.find(itemprop='someprop').string = t
The problem lies in the way you are searching for the element. Try changing the following code:
print(sp.replace(sp.find(itemprop="someprop").text,t))
to this:
print(sp.replace(sp.find({"itemprop":"someprop"}).text,t))
Hopefully this helps.
(PS: based on your question, I'm assuming that you only have one thing to replace.)

Including "Yield" in Application Helper in Ruby on Rails

In Michael Hartl's Rails tutorial he suggests you create a "Full Title" helper as below:
module ApplicationHelper
  # Returns the full title on a per-page basis.
  def full_title(page_title = '')
    base_title = "Ruby on Rails Tutorial Sample App"
    if page_title.empty?
      base_title
    else
      page_title + " | " + base_title
    end
  end
end
The following is then added to the application.html.erb file:
<title><%= full_title(yield(:page_title)) %></title>
The above is not human readable and is difficult to parse. This would be much easier to understand and would encapsulate the full logic for generating titles within the helper. Why not move the yield into the helper and use something like this:
<title><%= full_title(:page_title) %></title>
Is there a Ruby/Rails convention against placing "yield" within a helper?
There is no convention for this yet, but you can still improve it a little:
#application.html.erb
<head>
<title>Ruby on Rails Tutorial Sample App<%= yield :title %></title>
</head>
#application_helper.rb
def title(title)
  content_for(:title) { " | #{title}" }
end
#Any page
<% title "My title" %>
#or a translation
<% title t("titles.my_title") %>

Groovy XmlSlurper get value out of NodeChildren

I'm parsing HTML and trying to get the full, unparsed value of one particular node.
HTML example:
<html>
<body>
<div>Hello <br> World <br> !</div>
<div><object width="420" height="315"></object></div>
</body>
</html>
Code:
def tagsoupParser = new org.ccil.cowan.tagsoup.Parser()
def slurper = new XmlSlurper(tagsoupParser)
def htmlParsed = slurper.parseText(stringToParse)
println htmlParsed.body.div[0]
However, it returns only the text for the first node, and an empty string for the second node. Question: how can I retrieve the value of the first node so that I get:
Hello <br> World <br> !
This is what I used to get the content from the first div tag (omitting xml declaration and namespaces).
Groovy
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser
import groovy.xml.*
def html = """<html>
<body>
<div>Hello <br> World <br> !</div>
<div><object width="420" height="315"></object></div>
</body>
</html>"""
def parser = new Parser()
parser.setFeature('http://xml.org/sax/features/namespaces',false)
def root = new XmlSlurper(parser).parseText(html)
println new StreamingMarkupBuilder().bindNode(root.body.div[0]).toString()
Gives
<div>Hello <br clear='none'></br> World <br clear='none'></br> !</div>
N.B. Unless I'm mistaken, Tagsoup is adding the closing tags. If you literally want Hello <br> World <br> !, you might have to use a different library (maybe regex?).
I know it's including the div element in the output... is this a problem?

How to disable up/down pan in UIWebView?

I want to have my web view pannable left and right but not up and down.
Is that possible?
Thanks.
OK, wrap your one line of HTML like this:
<html>
<head>
<meta name = "viewport" content = "height = device-height, user-scalable = no, width = WIDTH">
</head>
<body>
...YOUR HTML
</body>
</html>
Replace WIDTH with the width of your content, and see how it works.
