Pandas read_html misses table with several rows - python-3.x

I'm scraping a website and in order to get the table I'm using pd.read_html.
I get the node doing this:
table=WebDriverWait(browser,10).until(EC.presence_of_element_located((
By.XPATH,'//tbody[ancestor::div[contains(@id,"cornerOddsDiv")]]')))
newt=pd.read_html(table.get_attribute('outerHTML'))
This returns:
ValueError: No tables found
The table node has this outerHTML:
table.get_attribute('outerHTML')
>>'<tbody><tr><th colspan="10" align="center" class="bg1">365 Corner Odds</th></tr><tr bgcolor="#FCEAAB"><td colspan="10" align="center"><strong>Over/Under</strong></td></tr><tr onclick="goCorner(1510721)" style="cursor:pointer;" align="center" class="bg1" id="trCornerTotal" odds="1.19,0.25,0.72"><td width="14%" bgcolor="#EBF2F8">early</td><td width="10%" class="bg2">1 </td><td width="10%" class="bg2">10.5</td><td width="10%" class="bg2">0.8</td><td width="6%" class="bg2">detail</td><td width="14%" bgcolor="#EBF2F8">0.25</td><td width="10%" class="bg2">1</td><td width="10%" class="bg2">0.72</td><td width="10%" class="bg2">0.8</td><td width="6%" class="bg2">detail</td></tr></tbody>'
Why is it not working? I have followed the same procedure for other tables and they did work.

I finally found the answer. The node has a structure like the following:
<div>
<div>
<table>
<tbody>
<tr>..</tr>
<tr>..</tr>
...
</tbody>
Etc
The key is that, for reasons unknown to me, I have to pass the table node instead of the tbody node; then it works just as well as the other tables did with tbody.
So, it would be:
table=WebDriverWait(browser,10).until(EC.presence_of_element_located((
By.XPATH,'//table[contains(@class,"bhTable") and '
'ancestor::div[contains(@id,"cornerOddsDiv")]]')))
and that returns the desired output
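A likely explanation is that pd.read_html only looks for <table> elements, and the outerHTML of a tbody node contains none. The sketch below uses only the standard library to check a fragment for a <table> tag and wrap it when the tag is missing. count_tables and ensure_table are hypothetical helper names, not part of pandas or Selenium, and the sample fragment is a shortened stand-in for the real outerHTML.

```python
from html.parser import HTMLParser


class _TableCounter(HTMLParser):
    """Counts opening <table> tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def count_tables(html):
    """Return how many <table> start tags the fragment contains."""
    counter = _TableCounter()
    counter.feed(html)
    counter.close()
    return counter.tables


def ensure_table(html):
    """Wrap a bare <tbody> (or row) fragment in <table> tags if it has none."""
    if count_tables(html) > 0:
        return html
    return "<table>" + html + "</table>"


# Shortened stand-in for table.get_attribute('outerHTML') on the tbody node.
tbody_html = "<tbody><tr><td>early</td><td>10.5</td></tr></tbody>"
wrapped = ensure_table(tbody_html)
```

With this, pd.read_html(io.StringIO(ensure_table(table.get_attribute('outerHTML')))) should find one table, although selecting the table node directly, as in the XPath above, is simpler.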

Related

Placing BeautifulSoup data into a Pandas dataframe - coming up blank

Goal: I'd like to create a data frame after scraping data from a website and narrowing it down to the table of interest (I am looking to get average meat consumption per capita for all countries in the world).
Problem: I have the table of interest, but I am having trouble placing it into a data frame. Everything I try ends up with a blank data frame.
Output:
<table class="wikitable sortable">
<caption>Countries by meat consumption per capita
</caption>
<tbody><tr>
<th>Country</th>
<th>kg/person (2002)<sup class="reference" id="cite_ref-9">[9]</sup><sup class="reference" id="cite_ref-11">[note 1]</sup></th>
<th>kg/person (2009)<sup class="reference" id="cite_ref-FAO2013_10-1">[10]</sup></th>
<th>kg/person (2017)<sup class="reference" id="cite_ref-12">[11]</sup>
</th></tr>
<tr>
<td><span class="flagicon"><img alt="" class="thumbborder" data-file-height="700" data-file-width="980" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/3/36/Flag_of_Albania.svg/21px-Flag_of_Albania.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/3/36/Flag_of_Albania.svg/32px-Flag_of_Albania.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/3/36/Flag_of_Albania.svg/42px-Flag_of_Albania.svg.png 2x" width="21"/> </span>Albania</td>
<td>38.2</td>
<td></td>
<td>
</td></tr>
<tr>
<td><span class="flagicon"><img alt="" class="thumbborder" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/7/77/Flag_of_Algeria.svg/23px-Flag_of_Algeria.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/7/77/Flag_of_Algeria.svg/35px-Flag_of_Algeria.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/7/77/Flag_of_Algeria.svg/45px-Flag_of_Algeria.svg.png 2x" width="23"/> </span>Algeria</td>
<td>18.3</td>
<td>19.5</td>
<td>17.33
</td></tr>
<tr>
<td><span class="flagicon"><img alt="" class="thumbborder" data-file-height="500" data-file-width="1000" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/87/Flag_of_American_Samoa.svg/23px-Flag_of_American_Samoa.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/8/87/Flag_of_American_Samoa.svg/35px-Flag_of_American_Samoa.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/87/Flag_of_American_Samoa.svg/46px-Flag_of_American_Samoa.svg.png 2x" width="23"/> </span>American Samoa</td>
<td>24.9</td>
<td>26.8</td>
<td>
</td></tr>
<tr>
I am looking to pull the following column titles for a chart on meat consumption per capita for all of the countries in the world: Country, kg/person (2002), kg/person (2009), kg/person (2017)
My Code:
A=[]
B=[]
C=[]
for row in table_meat1.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==3:
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))
        C.append(cells[2].find(text=True))
Need help placing the data into a data frame!
One way to answer this question:
Use Selenium with ChromeDriver. To install Selenium, run:
pip install selenium
Then download the appropriate ChromeDriver for your OS from here; I checked version 86.0.4240.22 and it worked fine.
Unzip it and put it somewhere like: /Users/admin/software/chromedriver
Then run this code.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
URL = 'https://www.amazon.com/Metagenics-Ultra-Potent-C-1000-Count/dp/B004GLEUHI/ref=sr_1_2_sspa?crid=11YWA9XFVALBP&dchild=1&keywords=metagenics&qid=1603050330&sprefix=metageni%2Caps%2C224&sr=8-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUFRRDdMVU5GNDFKQ1QmZW5jcnlwdGVkSWQ9QTA1NTc3NzAxSFYxV0k5MlFGUUZTJmVuY3J5cHRlZEFkSWQ9QTA2MzM0MzAyWDBDSjNCNlFGRVJNJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='
options = webdriver.ChromeOptions()
options.add_argument('--headless')
# Selenium 4 style; Selenium 3 took the driver path as the first positional argument
driver = webdriver.Chrome(service=Service("/Users/admin/software/chromedriver"), options=options)
driver.implicitly_wait(5)
driver.get(URL)
content = driver.page_source
driver.quit()
# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(content, 'html.parser')
price = soup.find('table', class_='wikitable sortable')
print(price)
But be aware that web scraping is forbidden on some websites and you have to use their provided web API.
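Independently of how the page is fetched, the lists in the question stay empty because every data row in the table shown has four td cells (the country plus three years), while the loop keeps only rows with exactly three. The sketch below shows the row extraction with a four-cell check. It swaps BeautifulSoup for the standard-library html.parser so that it runs without extra packages; RowCollector and the sample HTML are illustrative assumptions, trimmed from the output in the question.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


sample = """
<table><tbody>
<tr><th>Country</th><th>kg/person (2002)</th><th>kg/person (2009)</th><th>kg/person (2017)</th></tr>
<tr><td><span><img alt=""/> </span>Algeria</td><td>18.3</td><td>19.5</td><td>17.33
</td></tr>
<tr><td><span><img alt=""/> </span>Albania</td><td>38.2</td><td></td><td>
</td></tr>
</tbody></table>
"""

collector = RowCollector()
collector.feed(sample)
# Header rows contain only <th> cells, so they collect no <td> text and are dropped here.
records = [row for row in collector.rows if len(row) == 4]
```

records can then be passed to pd.DataFrame(records, columns=['Country', 'kg/person (2002)', 'kg/person (2009)', 'kg/person (2017)']). In the original BeautifulSoup loop, the equivalent change is to test len(cells)==4 and append the fourth cell to a fourth list.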

Import CSV to Django, save data and show results within template

I need help uploading CSV data to a Django app and saving it to the database. I also want to show the results row by row in a template. A screenshot of the page is attached for reference. Please advise how I can implement this.
I have worked on a similar project that received a CSV file and loaded the data into different databases.
I used pandas to handle the various tabular file formats users send, but if you are only going to work with CSV you can use Python's built-in csv module.
You can add a simple upload form so the user can send the file to the Django application.
In this example we received an xlsx file and converted it to a DataFrame (using pandas) to make it easier to work with.
archiveXLSX = request.FILES['xlsx_file']
df = pd.read_excel(archiveXLSX,keep_default_na=False)
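If only CSV uploads are expected, the csv module mentioned above is enough. The sketch below assumes the upload arrives as bytes (for example from request.FILES['csv_file'].read(), where csv_file is a hypothetical form field name) and that the file has a header row. rows_from_csv_bytes is an illustrative helper, not a Django API.

```python
import csv
import io


def rows_from_csv_bytes(raw, encoding="utf-8-sig"):
    """Decode uploaded CSV bytes and return one dict per non-empty data row."""
    text = raw.decode(encoding)  # utf-8-sig also strips a leading BOM if present
    reader = csv.DictReader(io.StringIO(text, newline=""))
    rows = []
    for row in reader:
        # Skip rows in which every field is missing or blank.
        if any(value and str(value).strip() for value in row.values()):
            rows.append(dict(row))
    return rows


# Sample upload with a header, two data rows, and one blank row in between.
uploaded = b"data1,data2,data3\r\nfoo,1,x\r\n,,\r\nbar,2,y\r\n"
rows = rows_from_csv_bytes(uploaded)
```

Each dict can then be validated and turned into a model instance before saving.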
Now the data is in the Django backend, and you need to transform it and make sure it is valid before loading it into the database.
After all the checks, we put the data in a list (without saving it yet).
list_example = [
    ExampleModel(data1=data1, data2=data2, data3=data3),
    ExampleModel(data1=data1, data2=data2, data3=data3),
]
To load the list into the database, it is better to use bulk_create, which avoids executing one INSERT per row and is faster than creating the objects one by one.
Ex:
ExampleModel.objects.using('database').bulk_create(list_example)
If you want to show this data in a template, I recommend creating a table in the database that records the upload date, the file the user uploaded, and the id of each data row loaded from that file, e.g.:
id, file_name,   date,       id_data
1,  example.csv, 20-04-2020, 2
2,  example.csv, 20-04-2020, 4
3,  example.csv, 20-04-2020, 2
To work with these two models, create the CSV model objects while you create the model objects for the CSV data (make sure you save the data model first, which in this case is ExampleModel).
Suppose that, after verifying and processing the data, you have put it in a collection of dictionaries. The code to build the objects can look like this:
csvDataToCreate = []
dataToCreate = []
for data in dataDict:
    # Keyword arguments keep the values from being assigned to the wrong fields
    example = ExampleModel(data1=data['data1'], data2=data['data2'], data3=data['data3'])
    dataToCreate.append(example)
    csvData = ExampleCsvModel(file_name=filename, date=datetime.now().date(), id_data=example)
    csvDataToCreate.append(csvData)
Now you have all the information you need.
The view for the CSV data can be a simple query:
ExampleCsvModel.objects.filter(file_name='file.csv',date='xx-xx-xxxx')
and the template is simple too:
<table class="table table-striped">
<thead>
<tr>
<th scope="col" style="width: 35%;">data1</th>
<th scope="col" style="width: 45%;">data2</th>
<th scope="col" style="width: 5%;">data3</th>
</tr>
</thead>
<tbody>
{% for csv in csvitens %}
<tr>
<td style="width: 35%;"><div class='btn'>{{csv.id_data.data1}}</div></td>
<td style="width: 35%;"><div class='btn'>{{csv.id_data.data2}}</div></td>
<td style="width: 35%;"><div class='btn'>{{csv.id_data.data3}}</div></td>
</tr>
{% endfor %}
</tbody>
</table>
For this task you can use the CSS framework of your choice; here I used Bootstrap 4.

Initialize from html table

I have been trying to make an org chart using the code from this example http://www.getorgchart.com/Demos/Initialize-From-HTML-Table
I am building my table dynamically from XML data (I plan to do this for many different XML files, which is why I haven't hardcoded the table). The result does not look right and I'm not sure what's going wrong. I suspect it may be trying to build the org chart before my table exists. I get no console errors, only a plain blue screen with the search bar and arrows when I run everything.
Here is my html code:
https://pastebin.com/6b1d0gEC
<body onload="getXML()">
<div id = "conditions"></div>
<div style="float: right; width: 10%; height:100%; text-align:center; display: none;" ></div>
<table id="orgChartData" >
<tr>
<th>title</th>
</tr>
<tr>
Here is my JS code:
https://pastebin.com/c7fFerqH
Here is the XML I'm using to generate the table:
https://pastebin.com/0k3xQ5Th

Extracting specific values with Postgresql

I have a table like this:
Table
<!DOCTYPE html>
<html>
<body>
<table border="1" style="width:100%">
<tr>
<td>email</td>
<td>data</td>
</tr>
<tr>
<td>creator_a@creator.com</td>
<td>"vimeo_profile"=>"", "twitter_profile"=>"", "youtube_profile"=>"", "creator_category"=>"production_company", "facebook_profile"=>"", "linkedin_profile"=>"", "personal_website"=>"", "instagram_profile"=>"", "content_expertise_categories"=>"4,5,8"</td>
</tr>
<tr>
<td>creator_b@creator.com</td>
<td>"twitter_profile"=>"", "creator_category"=>"association", "facebook_profile"=>"", "linkedin_profile"=>"", "personal_website"=>"", "content_expertise_type"=>"image", "content_expertise_categories"=>"4, 6"</td>
</tr>
</table>
</body>
</html>
And I want to query this using PostgreSQL, so I only get the values regarding content_expertise_categories:
*It is important to mention that the number of values varies. The table has many more entries, so I am looking for a solution that extracts the values regardless of whether there are 2 or 15 of them.
Result
<!DOCTYPE html>
<html>
<body>
<table border="1" style="width:100%">
<tr>
<td>email</td>
<td>data</td>
</tr>
<tr>
<td>creator_a@creator.com</td>
<td>4,5,8</td>
</tr>
<tr>
<td>creator_b@creator.com</td>
<td>4,6</td>
</tr>
</table>
</body>
</html>
I have tried substring but can't make it to work.
Some help would be much appreciated, thanks!
SELECT
email,
(string_to_array(
data::text,'"content_expertise_categories"=>'::text
)
)[2] as data
FROM users
;
Update:
In your example every string has "content_expertise_categories" listed last, which suggests you can simply split the string into two pieces. If more values can follow it, you will need an additional split on ',"' and take the [1] part this time...
Mind casting the "data" column to ::text before using it in the string_to_array function, as it requires the text type and your column appeared not to be text.
I believe this query would be more elegant:
select
email,
data->'content_expertise_categories' as data
from users
;
But when I posted the first query, I did not know that you use hstore.
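To illustrate why a lookup by key is more robust than splitting on the key's position, here is a small Python sketch that parses the text form shown in the question and pulls out the values. It is a simplified client-side illustration, not how PostgreSQL parses hstore: parse_hstore_text and categories are hypothetical helpers, and escaped quotes and NULL values are not handled.

```python
import re

# Matches one "key"=>"value" pair in the textual form shown in the question.
PAIR = re.compile(r'"([^"]*)"=>"([^"]*)"')


def parse_hstore_text(text):
    """Turn '"k"=>"v", "k2"=>"v2"' into a dict (simplified parsing)."""
    return dict(PAIR.findall(text))


def categories(text):
    """Return the content_expertise_categories values as a list of strings."""
    value = parse_hstore_text(text).get("content_expertise_categories", "")
    return [part.strip() for part in value.split(",") if part.strip()]


row_a = ('"vimeo_profile"=>"", "creator_category"=>"production_company", '
         '"content_expertise_categories"=>"4,5,8"')
# Here the wanted key is not last, which would break a plain two-way split.
row_b = ('"twitter_profile"=>"", "content_expertise_categories"=>"4, 6", '
         '"content_expertise_type"=>"image"')
```

The lookup works the same whether the key comes last, as in row_a, or in the middle, as in row_b, and regardless of how many values the list holds.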

JavaFX hide text of column in tableview

I have a tableview and I want to show an image in the first column. My problem is I can't sort the column then. My idea is to set text in the column too and hide the text so it is only for the correct sorting set. Is there a way to do that? Or what other solutions are possible for my problem?
I think this is a perfect example of what you want to do. Let me know if you still have any issues.
Check here
I would have a look at TableColumn.setCellValueFactory() and TableColumn.setCellFactory(). The former is used to provide the actual cell value (used for sorting!), the latter is used to provide the rendering.
In other words: if you need the sort order, you must not change the content, only the cell rendering. The methods mentioned above let you do exactly this.
Hope that helps ...
You could do it with just CSS using text-indent. You would also need to set the image as a CSS background. You did not provide any code for your table, but below is an example:
HTML:
<table width="100%" border="1" cellspacing="1" cellpadding="1">
<tr>
<td class="hidetext image">Text 1</td>
<td>Some text to show</td>
</tr>
<tr>
<td class="hidetext image">Text 2</td>
<td>Some text to show</td>
</tr>
<tr>
<td class="hidetext image">Text 3</td>
<td>Some text to show</td>
</tr>
<tr>
<td class="hidetext image">Text 4</td>
<td>Some text to show</td>
</tr>
</table>
CSS:
.hidetext {text-indent:-9000px}
.image {background:url(http://www.madisoncopy.com/images/jpeg.jpg) no-repeat;}
See how in the left column the text does not show (but it is actually there just indented off the screen).
See this fiddle: http://jsfiddle.net/D297P/
