Read html table with pandas in python - python-3.x

I'm scraping tables from this website http://www.nowgoal.com/analysis/1514180.html
(In case the match is already gone when you click, any other match on Nowgoal has the same structure.)
This page has several tables, and so far my code, which worked, was as follows:
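import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
name = 'Head to Head Statistics'
tabla = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, '//tbody[descendant::*[contains(text(),"{}")]]'.format(name))))
tablas = pd.read_html(tabla.get_attribute('outerHTML'), header=0, skiprows=(0, 2))[0]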
I have already tuned it by setting the header and skipping rows. The problem is that when I try the table with name='Live Odds Comparison', pandas raises the following error while reading the HTML:
>>ValueError: No tables found
I have debugged a little and the outerHTML attribute works fine and prints what it should, but neither that nor using innerHTML returns the table as it should and as it does with the others. What is happening?

I still don't understand why the usual approach fails for this table, but what works is matching a parent node of the one I was searching for. The only thing that needs to change is:
tabla=WebDriverWait(browser,10).until(
EC.presence_of_element_located((By.XPATH,'''//div[contains(@id,"liveCompareDiv") and descendant::*
[contains(text(),"{}")]]'''.format(name))))
This finds the table I was looking for. My guess is that the internal markup of the tbody node differs from one table to another, and that causes the erratic behaviour.
Edit: I think I now know why. In the tables that worked, tbody contains two tr nodes, one holding the title and another holding the data. In this table the tr node holding the title is still there, but the second tr that should hold the data is missing, so the data rows are not wrapped in anything.
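One possible reading of the edit above, offered as an assumption rather than something the post confirms: pd.read_html looks for table elements, so a tbody fragment that contains no table tag at all (neither around it nor nested inside it) gives it nothing to parse. A minimal sketch of a guard for that case, using only the standard library; the helper name ensure_table_wrapper is hypothetical:

```python
from html.parser import HTMLParser


class _TableTagFinder(HTMLParser):
    """Records whether any <table> start tag appears in the fed HTML."""

    def __init__(self):
        super().__init__()
        self.found_table = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "table":
            self.found_table = True


def ensure_table_wrapper(fragment: str) -> str:
    """Wrap the fragment in <table>...</table> if it has no table tag.

    Hypothetical helper: it assumes the fragment is a tbody (or a run of
    tr rows) that only lacks its enclosing table element.
    """
    finder = _TableTagFinder()
    finder.feed(fragment)
    finder.close()
    if finder.found_table:
        return fragment
    return "<table>" + fragment + "</table>"


bare = "<tbody><tr><td>1</td><td>2</td></tr></tbody>"
print(ensure_table_wrapper(bare))
```

The wrapped string could then be passed to pd.read_html in place of the raw outerHTML. Matching the parent div, as in the answer, reaches the same goal by handing pandas a fragment that already contains the table element.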

Related

Find and save a Specific String Until a semicolon

I have a large dataset (~25GB) and I would like to retrieve the data following 8 specific modifiers. For example, if I have the "AC_afr" tag I searched for, I would also like to keep its data "AC_afr=8855525;". I need a way to search for a tag, and then keep everything after that tag until the semicolon.
I would normally open it in Excel, but the data is much too large.
I have looked online for grep options, but could not find a solution.
Example of the data:
AC_afr=0;AN_afr=8250;non_neuro_AN_eas_kor=2404;non_neuro_AF_eas_kor=0.00000e+00;non_neuro_nhomalt_eas_kor=0;non_cancer_AF_nfe_seu=0.00000e+00;non_cancer_nhomalt_nfe_seu=0;
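A minimal sketch of one way to do this in Python, assuming the file is plain text with one record per line and that every field has the form tag=value; as in the example above. The function names, file paths, and tag list are placeholders, not part of the question. The file is read line by line, so the whole 25GB never has to fit in memory.

```python
import re


def build_tag_pattern(tags):
    """Compile a regex matching any of the given tags as a whole field name.

    The lookbehind requires the tag to sit at the start of the line or
    right after a ';' or whitespace, so 'AN_eas_kor' does not match
    inside 'non_neuro_AN_eas_kor'.
    """
    names = "|".join(re.escape(tag) for tag in tags)
    return re.compile(r"(?<![^;\s])(" + names + r")=([^;]*)")


def extract_fields(line, pattern):
    """Return the matching fields of one line as 'tag=value;' strings."""
    return [f"{m.group(1)}={m.group(2)};" for m in pattern.finditer(line)]


def filter_file(in_path, out_path, tags):
    """Stream in_path and write only the requested fields to out_path."""
    pattern = build_tag_pattern(tags)
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            fields = extract_fields(line.rstrip("\n"), pattern)
            if fields:
                dst.write("".join(fields) + "\n")


sample = "AC_afr=0;AN_afr=8250;non_neuro_AN_eas_kor=2404;"
print(extract_fields(sample, build_tag_pattern(["AC_afr", "AN_afr"])))
```

Calling filter_file with the real input path, an output path, and the 8 tags of interest would produce one line of kept fields per input line.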

Scrapy not extracting data from a certain xpath

I'm trying to extract some data from an amazon product page.
What I'm looking for is getting the images from the products. For example:
https://www.amazon.com/gp/product/B072L7PVNQ?pf_rd_p=1581d9f4-062f-453c-b69e-0f3e00ba2652&pf_rd_r=48QP07X56PTH002QVCPM&th=1&psc=1
By using the XPath
//script[contains(., "ImageBlockATF")]/text()
I get the part of the source code that contains the urls, but 2 options pop up in the chrome XPath helper.
By trying things out with XPaths I ended up using this:
//*[contains(@type, "text/javascript") and contains(.,"ImageBlockATF") and not(contains(.,"jQuery"))]
Which gives me exclusively the data I need.
The problem I'm having is that, for certain products (it can happen between two different pairs of shoes), sometimes I can extract the data and other times nothing comes out. I extract it with:
imagenesString = response.xpath('//*[contains(@type, "text/javascript") and contains(.,"ImageBlockATF") and not(contains(.,"jQuery"))]').extract()
If I use the chrome xpath helper, the data always appears with the xpath above but in the program itself sometimes it appears, sometimes not. I know sometimes the script that the console reads is different than the one that appears on the site but I'm struggling with this one, because sometimes it works, sometimes it does not. Any ideas on what could be going on?
I think I found your problem: it's a CAPTCHA.
Follow these steps to reproduce:
1. Run scrapy shell (the URL is quoted so the shell does not interpret the & characters):
scrapy shell 'https://www.amazon.com/gp/product/B072L7PVNQ?pf_rd_p=1581d9f4-062f-453c-b69e-0f3e00ba2652&pf_rd_r=48QP07X56PTH002QVCPM&th=1&psc=1'
2. View the response as Scrapy sees it:
view(response)
When executing this I sometimes got a CAPTCHA.
Hope this points you in the right direction.
Cheers
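To tell the two cases apart inside the spider rather than by eye, a rough heuristic along these lines might help. It is only a sketch: the function name and the idea of looking for the word "captcha" in the body are assumptions, and the real challenge page may use different wording.

```python
def classify_body(body: str) -> str:
    """Guess what kind of page was returned, from its raw HTML text.

    Assumption: a normal product page contains the 'ImageBlockATF'
    script mentioned in the question, and a challenge page mentions
    'captcha' somewhere in its markup.
    """
    text = body.lower()
    if "imageblockatf" in text:
        return "product"
    if "captcha" in text:
        return "captcha"
    return "unknown"


print(classify_body("<script>P.when('A').register('ImageBlockATF', ...)</script>"))
print(classify_body("<form action='/errors/validateCaptcha'>...</form>"))
```

In a Scrapy callback this could be called as classify_body(response.text), logging or retrying the request whenever the result is not "product".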

Find Elements with dynamic id's and Xpaths's

I apologize if this is a duplicate of a question already but nothing I read seemed to do the trick.
I am trying to automate the process of adding my hours for my job. This entails using selenium to mimic the process I do to enter the hours for me.
The problem is that, as I navigate through the process, I have run into a case where one of the elements has a dynamic id and XPath (and maybe other things; I am not very proficient in HTML).
I need to select the "Day" button on the "View" drop down. The highlighted HTML corresponds to that button. I have already checked and both the ID and Xpath change every time I create a new session. I usually do the following to find my elements:
elem = driver.find_element_by_xpath('xpath')
Below is the xpath I currently see:
//*[@id="ab5378a9418345a2a57ad12f066127a6"]
To further complicate things, the xpath for the "Week" selection is the following:
//*[@id="741015164c5547fbb5403c03c46636d3"]
I tried to figure out how to use "contains" with the XPath, but even so, the two ids give me nothing stable to tell them apart by using "@id". The only thing that stays constant and differs between the two elements is that
data-automation-label="Day"
is present on the day element and
data-automation-label="Week"
is present on the week element.
Does anyone have experience finding elements when a problem like this occurs? I am working in Python 3.6 on a Windows 7 computer.
Again, I apologize if this is a duplicate but I tried very hard to find an answer before coming here for help.
Thanks in advance!
You can use either of the selectors below.
XPATH
//div[@data-automation-label="Day"]
CSS
div[data-automation-label="Day"]
When you choose a locator, the main question is what is unique to that element. It doesn't really matter whether that is the name, the id, or something else; use whatever works best. Here the data-automation-label attribute is exactly that kind of stable identifier.
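The idea can be checked without a browser. The sketch below is illustrative only: the sample markup is made up around the two ids and labels quoted in the question, the helper name label_xpath is hypothetical, and the standard library's ElementTree stands in for Selenium because it supports this kind of attribute predicate.

```python
import xml.etree.ElementTree as ET

# Made-up markup: only the ids and data-automation-label values come
# from the question; the surrounding structure is an assumption.
SAMPLE = """
<div>
  <div id="ab5378a9418345a2a57ad12f066127a6" data-automation-label="Day">Day</div>
  <div id="741015164c5547fbb5403c03c46636d3" data-automation-label="Week">Week</div>
</div>
""".strip()


def label_xpath(label: str) -> str:
    """Build an XPath that selects a div by its stable label attribute."""
    return f'//div[@data-automation-label="{label}"]'


root = ET.fromstring(SAMPLE)
# ElementTree needs a leading '.' to search below the root element.
day = root.find("." + label_xpath("Day"))
week = root.find("." + label_xpath("Week"))
print(day.get("id"), week.get("id"))
```

With Selenium, the same string would be passed as the XPath locator, for example through driver.find_element_by_xpath as in the question, or driver.find_element(By.XPATH, ...) in newer Selenium releases. The ids can change per session without affecting the lookup.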

Asciidoctor nested table

I am trying to create nested tables in my Asciidoctor pdf output but I cannot find the syntax.
If I understand it right, nested tables should be supported in Asciidoctor as of 1.5.0. I am running a Docker container that has 1.5.5 (https://github.com/asciidoctor/docker-asciidoctor).
I've tried as per example in table 11 here: http://www.methods.co.nz/asciidoc/newtables.html but to no avail.
Note that Asciidoc and Asciidoctor are not the same thing.
Therefore, make sure you are looking at the correct documentation.
I have not tried it, but if a nested table is going to work, the cell containing it will have to use the asciidoc style. You will then most likely have to put the table in a block and escape all the pipe symbols (using \| instead of |, or using some other delimiter).
A web search turned up this open issue in the AsciiDoctor tracker requesting (improvements to) nested table support. So this seems not to be implemented yet at least in some backends. The first comment contains an example of how to specify a nested table.
Are you sure you cannot use something other than nested tables? They are usually not the most readable thing.
In order to make it work, you need to delete two unintended newlines. Here's the modified content.
[width="75%",cols="1,2a"]
|==============================================
|Normal cell |Cell with nested table
[cols="2,1"]
!==============================================
!Nested table cell 1 !Nested table cell 2
!==============================================
|==============================================
I must say this was my first time using asciidoctor-pdf, and although the Docker image streamlines the process as much as possible, there is a much quicker way to get rendered feedback: Asciidoctor.js, a Chrome extension that converts your .adoc file to HTML and reloads when you save the file.
Asciidoctor.js comes from the same great team that created and maintains Asciidoctor, so it has the latest Asciidoctor under the hood.

Cognos 8.3 No Data Content issues

Upgrading from 8.2 to 8.3 and testing out the new No Data Content functionality. Report looks in order if results are returned. The No Data message does not appear. However if we test the report (pass in parameters expecting no results), we are returned a blank page (pdf, html, excel output). Not even the header or footer appear on the page. And the No Data Content message does not appear as well.
We have very complex reports using Oracle SQL and in most cases the Header content is linked to a SQL statement to render output from the database as well as list the parameters passed in. The issue seems to be related to embedded data objects, i.e. we have a list object embedded within a table object. I've tried stripping out the extra layers with no success thus far.
In 8.2 we used style variables, i.e. RowNumber()=0 or RowNumber() is null to conditionally hide data objects in the body of the report. We've never used any conditions to hide or display the header or the footer and in 8.3 now this seems to be an issue.
This seemed like such a useful enhancement in 8.3 but we haven't gotten it working yet. Any thoughts or suggestions to try?
Thanks for reading this. I appreciate any advice.
Joe
We ran into this same problem when upgrading reports from 8.2 => 8.4. We reported it to Cognos as a bug -- Not sure if they've assigned a bug tracker id to it, but we got the impression it wasn't going to be fixed soon. (Obviously, if it exists in 8.3 and it has been carried forward to the next version, it's not a high priority.)
I'm sorry I don't have an answer at the moment on how to fix it, but I was planning to look into work arounds next week. I'll edit this post with any ideas I come up with.
UPDATE:
Not sure if this is an available feature of 8.3, but in 8.4 there is a new "No Data Contents" property for data containers (lists, blocks, etc.). Setting this value to yes creates two tabs at the top of the page, one for a page to be displayed if data is returned, and another for instances when no records are found. You can customize a message to be displayed using that second page. Pretty cool, actually, but buried in the documentation.
Hope that helps. If you still have problems, check out the Index topic "no data > specify what appears for a data container."
Yes, it appears that a blank PDF is returned, but in fact the Cognos viewer bugs out at the second prompt page if there is no data. Headers, footers, and items that don't need data to render are not showing up either.
This existed in 8.2, and we were always able to find some sort of workaround to get it to at least show. It seems much more prevalent in 8.3 now.
I'd like a solution to this as well!
Edit: a partial workaround seems to be to create a new report in 8.3 and copy each component over, starting with the queries, then the variables, then the objects on the page, followed by page sets and master-detail relationships, in that order for simplicity. Essentially, recreating the report from scratch in 8.3 seems to fix the problem.
This works for about 90% of our reports.
