How to load wikipedia.page().content when the page title has parentheses

I would like to load the content of a number of related pages for keyword analysis. But when the page title contains parentheses it strips the parentheses and subsequently gives an error. How would I go about loading the content of pages that contain parentheses in the page title? e.g.
https://en.wikipedia.org/wiki/Oil_pump_(internal_combustion_engine)
import wikipedia
automotivedata = wikipedia.page("Oil_pump_(internal_combustion_engine)").content
PageError: Page id "oil_pump_ internal combustion engine" does not match any pages. Try another id!

Just ignore the brackets. Something like this:
print(wikipedia.page("Oil_pump_internal_combustion_engine").content)
September 23, 2019: Issue opened on GitHub (https://github.com/goldsmith/Wikipedia/issues/214)
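A possible alternative, assuming the error comes from the library's title auto-suggestion (the error message shows a rewritten title, which points that way): wikipedia.page() accepts an auto_suggest parameter, and passing auto_suggest=False asks it to look up the title exactly as given. The sketch below is a minimal illustration, not a confirmed fix for the linked issue; slug_to_title and load_content are hypothetical helper names introduced here.

```python
from urllib.parse import unquote


def slug_to_title(slug: str) -> str:
    """Turn a URL slug such as 'Oil_pump_(internal_combustion_engine)' into a page title."""
    # Hypothetical helper: decode percent-escapes and replace underscores with spaces.
    return unquote(slug).replace("_", " ")


def load_content(slug: str) -> str:
    """Fetch the page content for a slug, keeping the parentheses in the title."""
    # Imported lazily so slug_to_title can be used without the package installed.
    import wikipedia

    # auto_suggest=False looks up the title as given instead of replacing it
    # with a search suggestion first (assumed to be the cause of the PageError).
    return wikipedia.page(slug_to_title(slug), auto_suggest=False).content
```

Calling load_content("Oil_pump_(internal_combustion_engine)") needs network access and the wikipedia package installed.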

Related

Display pagination number on page title

I'm having trouble displaying a pagination title in Shopware. The template uses Twig, the pagination pages are loaded via AJAX, and the multi-language text comes from snippets.
Details:
The pagination title to display is: Page "X", where X is the page number.
The site is multilingual, e.g. English (Page "X"), German (Seite "X"), ...
For the default URL (e.g. abc.com/category-name) or page 1 (e.g. abc.com/category-name/?p=1): do not display the pagination title.
For other pages (page 2, 3, 4): display Page 2, ...
Page items are loaded via AJAX when a page number is clicked.
I don't know how to show the pagination title in the page title with multi-language support.
Can anyone help me resolve this issue?
Thanks, everyone.

You can hook into this method:
ListingPaginationPlugin.onChangePage (see the source code in vendor/shopware/storefront/Resources/app/storefront/src/plugin/listing/listing-pagination.plugin.js)
And after calling the parent method, insert - for a proof of concept - code like this:
document.title = event.target.value;
This would simply show the page number in the title (but losing the original title)
I suggest you back-up the original title and just append the "Page X" / "Seite X" information to it according to your necessary logic.
Now you need the translated word for "Page" available in the Javascript code.
You could attach this as a data attribute to the title tag in the corresponding Twig template and use the normal |trans filter. I am not sure if there is a better way to have translations available in Javascript code in Shopware, so I asked.
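The title logic itself (keep the original title on page 1, append the translated label otherwise) can be sketched independently of Shopware. The snippet below is written in Python purely for illustration; in the storefront the same logic would live in the JavaScript plugin override. The function name paginated_title and the " - " separator are assumptions, not part of Shopware.

```python
def paginated_title(original_title: str, page: int, page_label: str) -> str:
    """Return the document title for a listing page.

    page_label is the already-translated word for "Page" (e.g. "Seite").
    """
    if page <= 1:
        # Default URL and ?p=1 keep the original title unchanged.
        return original_title
    # The " - " separator is an arbitrary choice for this sketch.
    return f"{original_title} - {page_label} {page}"
```

Because the function always starts from the backed-up original title, paging back and forth never stacks several "Page X" suffixes.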

Incomplete html attribute when using rvest

I'm using rvest to scrape from https://www.psychologytoday.com/ca/therapists/m5g ; in particular what I'm after is the data-myurl html attribute in the div tag with id="results-page" . If you view source you'll see there's only one div with id="results-page" . The data-myurl attribute looks like the main URL except with the addition of a string of numbers separated by a period and underscore, like so
<div id="results-page" data-myurl="https://www.psychologytoday.com/ca/therapists/m5g?sid=1510588046.3852_2969">
The numbers you see will likely be different. To try and extract it, I use the following code:
require(rvest)
fsa <- read_html('https://www.psychologytoday.com/ca/therapists/m5g')
fsa %>% html_node('div #results-page') %>% html_attr("data-myurl")
However, this returns only
[1] "https://www.psychologytoday.com/ca/therapists/m5g"
So everything after the original URL is missing. It doesn't seem like a JS thing since I don't see any script tags when I view source. Does anyone know what these numbers in the URL actually are and how to extract them? Thanks!

You can't do this with rvest.
The page you're trying to scrape is dynamically rendered after loading the initial page. The content itself is always the same, but the sid numbers change the ordering of the results after loading the page. The sid changes on every visit and page reload.
I suspect this was done to avoid a market bias when searching for a therapist.
If you really want the sid number, you need a tool that handles dynamic pages, such as CasperJS (http://casperjs.org/).
Edit:
Alternatively, if it has to be done in R then you can use RSelenium. (https://cran.r-project.org/web/packages/RSelenium/)
The relevant starting point would be here:
https://cran.r-project.org/web/packages/RSelenium/vignettes/RSelenium-headless.html
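For comparison, here is a rough sketch of the same browser-driven approach using Selenium's Python bindings instead of RSelenium. It assumes the selenium package and a local Firefox are installed, and that the page still exposes the data-myurl attribute described in the question; fetch_myurl and extract_sid are hypothetical helper names.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_sid(url: str) -> Optional[str]:
    """Return the value of the sid query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("sid")
    return values[0] if values else None


def fetch_myurl(page_url: str) -> Optional[str]:
    """Load the page in a real browser and read data-myurl after scripts have run."""
    # Imported lazily: needs the third-party selenium package and a local Firefox.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # run without opening a window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(page_url)
        element = driver.find_element(By.ID, "results-page")
        return element.get_attribute("data-myurl")
    finally:
        driver.quit()
```

extract_sid can be applied to whatever fetch_myurl returns to isolate the number string from the rest of the URL.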

Remove teaser from pages

I'm making a website with Hakyll. I successfully created an RSS feed showing, for each post, the teaser section delimited by <!--more-->.
My problem is that this teaser section is also shown in the full (templated) pages of these posts. I would like those pages to show only what comes after <!--more-->, not what comes before.
---
title: My Post
author: JeanJouX
date: 2016-09-06
tags: Haskell
---
The teaser of my post to be shown in the RSS feed. Not in the full page.
<!--more-->
The rest of the post to be shown in the full page of my website
Is it possible to do that with Hakyll?

I don't believe there is a method to do that built into Hakyll.
As I see it, you have two options:
1. Write a pass that strips the teaser from the document before rendering it on its own page.
2. Keep the teaser in the actual page, but use CSS to hide it.
The first option is probably better, but requires messing about with string manipulation and Hakyll compilers. If you want a place to start, take a look at the implementation of teaserFieldWithSeparator, which uses the needlePrefix function from Hakyll.Core.Util.String to extract the teaser from the document body. You'll have to do the opposite: extract everything but the teaser.
If you do take this approach, you could contribute it back into Hakyll, saving the effort for people who want to do the same thing in the future.
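The string step behind the first option is small. The sketch below shows it in Python only to illustrate the logic; a real implementation would be a Hakyll compiler written in Haskell. The name strip_teaser is hypothetical, and treating a post without a separator as having no teaser is an assumption.

```python
TEASER_SEPARATOR = "<!--more-->"


def strip_teaser(body: str, separator: str = TEASER_SEPARATOR) -> str:
    """Return the part of body after the first separator.

    If the separator is missing, the post is assumed to have no teaser
    and is returned unchanged.
    """
    _teaser, found, rest = body.partition(separator)
    # Drop the newline that usually follows the separator line.
    return rest.lstrip("\n") if found else body
```

This is the mirror image of teaser extraction: the teaser keeps what is before the separator, this keeps what is after.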
The other option is hackier but easier. You can wrap all your teasers in a div with some CSS class:
<div class="teaser">
Some text.
</div>
<!--more-->
Then, in your page template, add a CSS rule that hides the teaser paragraph:
.teaser {
display : none;
}
The text is still in the page's HTML so this is not an ideal solution, but you can make it work without needing to write any Hakyll code.

It might be easier to put the teaser text in a separate metadata field, like this:
---
title: My Post
author: JeanJouX
date: 2016-09-06
tags: Haskell
description: The teaser of my post to be shown in the RSS feed. Not in the full page.
---
The rest of the post to be shown in the full page of my website
Then you don't need to make that teaserField any more. You already have all you need in $description$, which you can use in rss, in html meta tags, anywhere.

How can I grab part of MediaWiki page title?

In MediaWiki 1.26.2, how can I grab part of a page title to use it in a #ifexist statement?
I need to link between different but related articles in MediaWiki, alerting the user when a related page exists. For that I did the following:
I have a page named ARTICLE_NAME. Associated with it is a page named Notes:ARTICLE_NAME. "Notes:" is not a namespace but part of the title string, just as ARTICLE_NAME is. I cannot create a namespace for Notes due to policy restrictions.
In the ARTICLE_NAME page, the following code checks whether the notes exist:
{{#ifexist: Notes:{{PAGENAME}} | {{alert_box}} | }}
So if ARTICLE_NAME has a related Notes:ARTICLE_NAME page, I get a nice custom alert box highlighting the fact and linking to it.
My problem begins when I try the inverse. In the Notes:ARTICLE_NAME page, I need the notes to display an alert box if there is a page named ARTICLE_NAME. The code
{{#ifexist: {{PAGENAME}} | {{alert_box}} | }}
Does nothing because {{PAGENAME}} brings Notes:ARTICLE_NAME as expected.
How can I get whatever comes after the "Notes:", using that instead of {{PAGENAME}} to check it with the #ifexist code?

It works by combining #pos with #ifeq.
Example:
{{#ifeq:{{#pos:text1|text2}}| |text1 does not contain text2|text1 contains text2}}

So after mucking around I found a way: remove the Notes: from the pagename then put it against the #ifexist.
{{#replace: {{PAGENAME}}|Notes:|}}
This way it replaces the Notes: with nothing. I made that a magic word named ARTICLE_TITLE, so the code that goes to notes is this:
{{#ifexist: :{{ARTICLE_TITLE}} | {{Alert_box}} | }}

MT:Entries not returning results correctly on search page

I'm building out a search results page within a blog. I've rewritten the URL so that going to:
/blog/tag/foo
will return search results for foo.
In the template, I'd like to return a listing of all the posts that are tagged with 'foo', so I've made an MT:Entries block that starts:
<mt:Entries tag="<$mt:SearchString$>">
but it returns no results. However, placing <$mt:SearchString$> on the page outputs 'foo' just fine.
So I tried this:
<mt:Entries tag="foo">
and it correctly returns all results that are tagged with foo. I'm not seeing a reason why the other one shouldn't work -- any ideas?

You cannot use a tag as a parameter value. You'll have to pass it via a variable, like so:
<mt:setvarblock name="q"><$mt:SearchString$></mt:setvarblock>
<mt:Entries tag="$q">

The reason why <mt:Entries tag="foo"> worked is because you are telling Movable Type to explicitly grab the entries tagged "foo". This is how you should do it in most templates, however the Search Results system template is different.
While the example Francois offers should work, it's not the intended method to get "tag search" results in the Search Results system template.
In the Search Results template, instead of the <mt:Entries> block tag use the <mt:SearchResults> block tag.
Your code should look something like this:
<mt:SearchResults>
<mt:IfTagSearch>
<!-- Template tags for "tag search" results -->
</mt:IfTagSearch>
<mt:IfStraightSearch>
<!-- Template tags for "text search" results -->
</mt:IfStraightSearch>
</mt:SearchResults>
For a more detailed example, take a look at the code in the default Search Results template in the "Classic Blog" template set (which ships with Movable Type) and modify the working (and tested) code.
