beautifulsoup: get all table data - python-3.x

I am using BeautifulSoup in Django to get all the data from HTML tables. I have code that parses the table and saves the table data as a list of lists:
import bs4 as bs
soup = bs.BeautifulSoup(html_source, 'lxml')
table = soup.find('table', {'id': 'detail'})
rows = table.find_all('tr')
# one list of text fragments per cell, for header (th) and data (td) cells alike
data = [[td.find_all(string=True) for td in tr.find_all(['th', 'td'])] for tr in rows]
data = [["".join(d).strip() for d in row] for row in data]
This code has worked well so far, but it does not capture all the data in the HTML table below: it returns only the thead rows. I cannot figure out why.
<table class="table_type1" data-tdborder="" id="detail">
<colgroup>
<col width="38">
<col>
<col>
<col width="140">
</colgroup>
<thead>
<tr>
<th>No.</th>
<th>Status</th>
<th>Location</th>
<th>Event Date</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center;">1</td>
<td class="multi_row" style="line-height:15px;">Empty Container Release to Shipper</td>
<td class="multi_row" style="line-height:15px;">SHANGHAI, SHANGHAI ,CHINA<br> GREATING FORTUNE (SHANGHAI) CONTAIN</td>
<td class="ico_a">2017-10-09 10:51</td>
</tr>
<tr>
<td style="text-align:center;">2</td>
<td class="multi_row" style="line-height:15px;">Gate In to Outbound Terminal</td>
<td class="multi_row" style="line-height:15px;">SHANGHAI, SHANGHAI ,CHINA<br> SHANGHAI SHENDONG INTERNATIONAL CON (DXYS)</td>
<td class="ico_a">2017-10-10 04:43</td>
</tr>
<tr>
<td style="text-align:center;">3</td>
<td class="multi_row" style="line-height:15px;">Loaded on 'NYK LYNX 2610E' at Port of Loading<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">SHANGHAI, SHANGHAI ,CHINA<br> SHANGHAI SHENDONG INTERNATIONAL CON (DXYS)</td>
<td class="ico_a">2017-10-11 22:58</td>
</tr>
<tr>
<td style="text-align:center;">4</td>
<td class="multi_row" style="line-height:15px;">'NYK LYNX 2610E' Departure from Port of Loading<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">SHANGHAI, SHANGHAI ,CHINA<br> SHANGHAI SHENDONG INTERNATIONAL CON (DXYS)</td>
<td class="ico_a">2017-10-12 05:00</td>
</tr>
<tr>
<td style="text-align:center;">5</td>
<td class="multi_row" style="line-height:15px;">'NYK LYNX 2610E' Arrival at Port of Discharging<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> TERMINAL PACIFICO SUR</td>
<td class="ico_e">2017-11-14 21:00</td>
</tr>
<tr>
<td style="text-align:center;">6</td>
<td class="multi_row" style="line-height:15px;">'NYK LYNX 2610E' POD Berthing Destination<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> TERMINAL PACIFICO SUR</td>
<td class="ico_e">2017-11-14 22:00</td>
</tr>
<tr>
<td style="text-align:center;">7</td>
<td class="multi_row" style="line-height:15px;">Unloaded from 'NYK LYNX 2610E' at Port of Discharging<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> TERMINAL PACIFICO SUR</td>
<td class="ico_e">2017-11-14 23:30</td>
</tr>
<tr>
<td style="text-align:center;">8</td>
<td class="multi_row" style="line-height:15px;">Gate Out from Inbound Terminal for Delivery to Consignee (or Port Shuttle)</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> TERMINAL PACIFICO SUR</td>
<td class="ico_e">2017-11-15 04:00</td>
</tr>
<tr>
<td style="text-align:center;">9</td>
<td class="multi_row" style="line-height:15px;">Empty Container Returned from Customer</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> </td>
<td class="ico_e">2017-11-15 10:00</td>
</tr>
</tbody>
</table>
Edit
I printed the soup object and went through all the HTML, and surprisingly it contains only the thead of the table, not the tbody. Is this a bug in BeautifulSoup? This is the only part of the table that BeautifulSoup 4 captures:
<table class="table_type1" data-tdborder="" id="detail">
<colgroup>
<col width="38"/>
<col/>
<col/>
<col width="140"/>
</colgroup>
<thead>
<tr>
<th>No.</th>
<th>Status</th>
<th>Location</th>
<th>Event Date</th>
</tr>
</thead>
</table>
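To tell whether the parsing code or its input is at fault, the sketch below runs the same extraction logic on an abbreviated copy of the table from the question (the thead plus the first two tbody rows). It assumes beautifulsoup4 is installed and uses Python's built-in "html.parser" instead of lxml so nothing else is needed; `table_rows` and `HTML_WITH_BODY` are hypothetical names. If the body rows come back here, the extraction is fine and the tbody is simply missing from `html_source`, which matches what printing the soup showed. One common cause (an assumption, since the question does not show how `html_source` is fetched) is a page that fills in the tbody with JavaScript after loading, so the raw HTML only carries the thead.

```python
import bs4 as bs

# Abbreviated copy of the question's table: the thead plus the first two tbody rows.
HTML_WITH_BODY = """
<table class="table_type1" data-tdborder="" id="detail">
<thead>
<tr><th>No.</th><th>Status</th><th>Location</th><th>Event Date</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>Empty Container Release to Shipper</td>
<td>SHANGHAI, SHANGHAI ,CHINA<br> GREATING FORTUNE (SHANGHAI) CONTAIN</td>
<td>2017-10-09 10:51</td></tr>
<tr><td>2</td><td>Gate In to Outbound Terminal</td>
<td>SHANGHAI, SHANGHAI ,CHINA<br> SHANGHAI SHENDONG INTERNATIONAL CON (DXYS)</td>
<td>2017-10-10 04:43</td></tr>
</tbody>
</table>
"""


def table_rows(html_source, table_id="detail"):
    """Return one list of stripped cell strings per <tr>, header rows included."""
    soup = bs.BeautifulSoup(html_source, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        raise ValueError(f"no <table id={table_id!r}> in the input")
    return [
        # find_all(string=True) gathers every text node in the cell,
        # so the text on both sides of a <br> is concatenated
        ["".join(cell.find_all(string=True)).strip() for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]


rows = table_rows(HTML_WITH_BODY)
print(len(rows))  # 3: the header row plus two body rows
```

Running `table_rows` on the real `html_source` and getting a single row back would confirm that the fetched page, not BeautifulSoup, is missing the body rows.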

Related

Python multiple if statements for petrol price movement

Situation:
I buy a certain volume of petrol at price X, which is usually sold within 30 days. During these 30 days, I need to adjust my petrol station's gas price according to the average State gas price. If the State price goes up by 1.2% from my purchase price, I set my price to +1% of my purchase price.
Here is an example:
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-wa1i{font-weight:bold;text-align:center;vertical-align:middle}
.tg .tg-7zrl{text-align:left;vertical-align:bottom}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
<thead>
<tr>
<th class="tg-wa1i">Time</th>
<th class="tg-wa1i">Average state gas price</th>
<th class="tg-wa1i">Average state price change, %</th>
<th class="tg-wa1i">My price change, %</th>
<th class="tg-wa1i">My station gas price</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-7zrl">1</td>
<td class="tg-0lax"> 4.231 </td>
<td class="tg-0lax"> - </td>
<td class="tg-0lax"> - </td>
<td class="tg-0lax"> 4.231 </td>
</tr>
<tr>
<td class="tg-7zrl">2</td>
<td class="tg-0lax"> 4.337 </td>
<td class="tg-0lax"> 2.5%</td>
<td class="tg-0lax"> 2.0%</td>
<td class="tg-0lax"> 4.316 </td>
</tr>
<tr>
<td class="tg-7zrl">3</td>
<td class="tg-0lax"> 4.437 </td>
<td class="tg-0lax"> 4.9%</td>
<td class="tg-0lax"> 4.0%</td>
<td class="tg-0lax"> 4.400 </td>
</tr>
<tr>
<td class="tg-7zrl">4</td>
<td class="tg-0lax"> 4.481 </td>
<td class="tg-0lax"> 5.9%</td>
<td class="tg-0lax"> 5.0%</td>
<td class="tg-0lax"> 4.443 </td>
</tr>
<tr>
<td class="tg-7zrl">5</td>
<td class="tg-0lax"> 4.571 </td>
<td class="tg-0lax"> 8.0%</td>
<td class="tg-0lax"> 8.0%</td>
<td class="tg-0lax"> 4.569 </td>
</tr>
<tr>
<td class="tg-7zrl">6</td>
<td class="tg-0lax"> 4.616 </td>
<td class="tg-0lax"> 9.1%</td>
<td class="tg-0lax"> 9.0%</td>
<td class="tg-0lax"> 4.612 </td>
</tr>
<tr>
<td class="tg-7zrl">7</td>
<td class="tg-0lax"> 4.709 </td>
<td class="tg-0lax"> 11.3%</td>
<td class="tg-0lax"> 11.0%</td>
<td class="tg-0lax"> 4.696 </td>
</tr>
<tr>
<td class="tg-7zrl">8</td>
<td class="tg-0lax"> 4.850 </td>
<td class="tg-0lax"> 14.6%</td>
<td class="tg-0lax"> 14.0%</td>
<td class="tg-0lax"> 4.823 </td>
</tr>
</tbody>
</table>
Here is my Python code, but it is very clunky: if the gas price moves up by 20%, I have to write my_priceX up to 20 times. Is there a more elegant solution?
my_price1 = last_gas_price["price"]*1.01
my_price2 = last_gas_price["price"]*1.02
my_price3 = last_gas_price["price"]*1.03
my_price4 = last_gas_price["price"]*1.04
my_price5 = last_gas_price["price"]*1.05
if last_gas_price > my_price1:
    adj_price(price=my_price1)
if last_gas_price > my_price2:
    adj_price(price=my_price2)
if last_gas_price > my_price3:
    adj_price(price=my_price3)
if last_gas_price > my_price4:
    adj_price(price=my_price4)
if last_gas_price > my_price5:
    adj_price(price=my_price5)
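The chain of if statements can be replaced by a loop over whole-percent steps. This sketch assumes, going by the description and the example table, that the base is the purchase price and that my price rises in 1% steps up to the largest step the State price strictly exceeds (so a 1.2% State rise gives +1%). `adjusted_price`, `purchase_price`, `state_price` and `max_pct` are hypothetical names, and the function returns the new price instead of calling `adj_price`.

```python
def adjusted_price(purchase_price, state_price, max_pct=20):
    """Largest 1% step above purchase_price that state_price strictly exceeds.

    Mirrors the if-chain: step N is purchase_price * (1 + N/100), and the last
    step the State price is above wins. Returns purchase_price unchanged when
    the State price has not risen by more than 1%; never goes past max_pct.
    """
    new_price = purchase_price
    for pct in range(1, max_pct + 1):
        threshold = purchase_price * (1 + pct / 100)
        if state_price > threshold:
            new_price = threshold
        else:
            # thresholds only grow with pct, so no later step can match either
            break
    return new_price


# Purchase price 4.231 and the State prices from the example table
for state in [4.231, 4.337, 4.437, 4.481, 4.571, 4.616, 4.709, 4.850]:
    print(state, round(adjusted_price(4.231, state), 3))
```

For the State prices in the example, the results agree with the "My station gas price" column up to rounding. Raising the cap only means changing `max_pct`, not adding more variables.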

How to make an item always appear last in the list (NetSuite Advanced PDF)

I am trying to make a specific item appear last in the item list whenever it appears on the invoice (Advanced PDF, NetSuite).
I was thinking about sorting the list and adding some ZZZs to the item name, but that's only because I don't really know much beyond basic HTML.
Any help would be appreciated.
Below is the table code I am dealing with.
<table class="itemtable" style="width: 100%; margin-top: 10px;"><!-- start items --><#list record.item as item><#if item_index==0>
<thead>
<tr>
<th align="center" colspan="3">${item.quantity#label}</th>
<th colspan="12">${item.item#label}</th>
<th colspan="3">${item.options#label}</th>
<th align="right" colspan="4">${item.rate#label}</th>
<th align="right" colspan="4">${item.amount#label}</th>
</tr>
</thead>
</#if><tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr>
</#list>
I need to be able to choose a specific item name that always shows at the bottom.
You can use the assign tag to hold the values for your bottom item.
<#assign bottomItemName = 'Bottom Item Name'>
<#assign bottomItem = {}>
<#list record.item as item>
<#if item_index==0>
<thead>
<tr>
<th align="center" colspan="3">${item.quantity#label}</th>
<th colspan="12">${item.item#label}</th>
<th colspan="3">${item.options#label}</th>
<th align="right" colspan="4">${item.rate#label}</th>
<th align="right" colspan="4">${item.amount#label}</th>
</tr>
</thead>
</#if>
<#if item.item == bottomItemName>
<#assign bottomItem = bottomItem + item>
<#else>
<tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr>
</#if>
</#list>
<#if bottomItem.item??>
<tr>
<td align="center" colspan="3" line-height="150%">${bottomItem.quantity}</td>
<td colspan="12"><span class="itemname">${bottomItem.item}</span><br />${bottomItem.description}</td>
<td colspan="3">${bottomItem.options}</td>
<td align="right" colspan="4">${bottomItem.rate}</td>
<td align="right" colspan="4">${bottomItem.amount}</td>
</tr>
</#if>
You could try looping twice.
<table class="itemtable" style="width: 100%; margin-top: 10px;"><!-- start items --><#list record.item as item><#if item_index==0>
<thead>
<tr>
<th align="center" colspan="3">${item.quantity#label}</th>
<th colspan="12">${item.item#label}</th>
<th colspan="3">${item.options#label}</th>
<th align="right" colspan="4">${item.rate#label}</th>
<th align="right" colspan="4">${item.amount#label}</th>
</tr>
</thead>
</#if><#if item.item != "Your Item"><tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr></#if>
</#list>
<#list record.item as item><#if item.item == "Your Item"><tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr></#if>
</#list>
I haven't tested the if statements, so good luck.
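Outside FreeMarker, the "loop twice" idea is a stable partition: one pass emits every line except the chosen item, a second pass emits only that item, and each group keeps its original order. A minimal Python sketch, with each invoice line modelled as a dict whose "item" key holds the item name (a hypothetical stand-in for record.item; `move_item_last` is also a hypothetical name):

```python
def move_item_last(lines, bottom_name):
    """Return lines with every line named bottom_name moved to the end, order otherwise kept."""
    # first pass: everything except the chosen item
    first_pass = [line for line in lines if line["item"] != bottom_name]
    # second pass: only the chosen item (all matching lines, in their original order)
    second_pass = [line for line in lines if line["item"] == bottom_name]
    return first_pass + second_pass


lines = [{"item": "Widget"}, {"item": "Your Item"}, {"item": "Gadget"}]
print([line["item"] for line in move_item_last(lines, "Your Item")])
# ['Widget', 'Gadget', 'Your Item']
```

The assign-based answer holds a single bottomItem, so it appears to print only one matching line; two passes print every matching line.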

Need the full, correct HTML code for this table

Image table
Please write the full HTML code for this table. You can see the table image via the link above.
<table border="1" width="800">
<tr>
<th>Level1</th>
<th>Level2</th>
<th>Level2</th>
<th>Info</th>
<th>Name</th>
</tr>
<tr>
<td rowspan="6">System</td>
</tr>
<tr>
<td rowspan="4">System Apps</td>
<td rowspan="2">System Memory</td>
</tr>
<tr>
<td rowspan="3">SystemEnv</td>
<td rowspan="1">SystemEnv2</td>
<td rowspan="2">Memeory Test</td>
</tr>
Here is the table code:
<table border="2">
<tr>
<th>Level 1</th>
<th>Level 2</th>
<th>Level 3</th>
<th>info</th>
<th>Name</th>
</tr>
<tr>
<td rowSpan="6">System</td>
<td rowSpan="4">System apps</td>
<td rowSpan="3">SystemEnv</td>
<td>App Text</td>
<td>foo</td>
</tr>
<tr>
<td>App memory</td>
<td>foo</td>
</tr>
<tr>
<td>App test</td>
<td>bar</td>
</tr>
<tr>
<td>Systemenv2</td>
<td>App test</td>
<td>bar</td>
</tr>
<tr>
<td rowSpan="2">System Memory</td>
<td rowSpan="2">Memory test</td>
<td>memory func</td>
<td>foo</td>
</tr>
<tr>
<td>Memory Func</td>
<td>foo</td>
</tr>
</table>
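A quick way to check a rowspan layout like this one is to expand it into a grid: a cell with rowspan="n" also occupies its column in the next n - 1 rows, so each later row supplies only the remaining cells. The sketch below does that with Python's standard html.parser; `TableCells` and `expand_rowspans` are hypothetical helper names. Every row of the table above should come out five cells wide, while a layout with a missing or extra cell would produce a ragged row.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect each <tr> as a list of (text, rowspan) pairs for its <th>/<td> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text, rowspan] of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            # html.parser lowercases attribute names, so rowSpan and rowspan both match
            rowspan = int(dict(attrs).get("rowspan") or 1)
            self._cell = ["", rowspan]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append((self._cell[0].strip(), self._cell[1]))
            self._cell = None


def expand_rowspans(rows):
    """Lay parsed rows out on a grid, filling columns still covered by a rowspan from above."""
    grid = []
    pending = {}  # column index -> (text, rows it still has to cover)
    for cells in rows:
        queue = list(cells)
        line = []
        col = 0
        while queue or col in pending:
            if col in pending:
                # column is still occupied by a cell from an earlier row
                text, left = pending.pop(col)
                if left > 1:
                    pending[col] = (text, left - 1)
            else:
                text, span = queue.pop(0)
                if span > 1:
                    pending[col] = (text, span - 1)
            line.append(text)
            col += 1
        grid.append(line)
    return grid


sample = """<table>
<tr><td rowspan="2">A</td><td>b1</td></tr>
<tr><td>b2</td></tr>
</table>"""
parser = TableCells()
parser.feed(sample)
print(expand_rowspans(parser.rows))  # [['A', 'b1'], ['A', 'b2']]
```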

Getting data from a website back into Excel

I'm trying to automate a page-scraping program in Excel using VBA, but I'm having difficulty getting the results from the webpage because the fields I want do not have IDs. I have copied the source code below; I think the data is contained in a table. How do you get the data using the td and span class names?
<table>
<tbody>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Date of Liability</span></td>
<td class="vehicledetailstableright"><span class="bodytext">01 07 2014</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Date of First Registration</span></td>
<td class="vehicledetailstableright"><span class="bodytext">02 07 2013</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Year of Manufacture</span></td>
<td class="vehicledetailstableright"><span class="bodytext">2013</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Cylinder Capacity (cc)</span></td>
<td class="vehicledetailstableright"><span class="bodytext">2993cc</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">CO₂ Emissions</span></td>
<td class="vehicledetailstableright"><span class="bodytext">129 g/km</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Fuel Type</span></td>
<td class="vehicledetailstableright"><span id="fueltype" class="bodytext">HEAVY OIL</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Export Marker</span></td>
<td class="vehicledetailstableright"><span id="exportmarker" class="bodytext">N</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Vehicle Status</span></td>
<td class="vehicledetailstableright"><span id="vehiclelicencestatus" class="bodytext">Licence Not Due</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Vehicle Colour</span></td>
<td class="vehicledetailstableright"><span id="colour" class="bodytext">BLUE</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Vehicle Type Approval</span></td>
<td class="vehicledetailstableright"><span class="bodytext">M1</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Date of Last V5C Issued</span>
</td>
<td class="vehicledetailstableright"><span class="bodytext">No Result Found</span>
</td>
</tr>
Tim is suggesting a code-heavy way to do it, and it is technically correct. I have suggested the same thing several times before, for example here:
VBA spliting results from html imported table into excel
Basically, start the macro recorder and then create an HTML query for the data.
See also my blog post on this:
http://automatic-office.com/?p=344
Many ways to skin the cat, but this is the easy way.
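The question is about VBA, but the underlying idea, locating cells by class name when they have no id, can be sketched in Python with the standard library's html.parser. This swaps in a different language and technique from the web-query answer above; the class names come from the pasted source, and `VehicleDetailsParser` is a hypothetical name. It pairs each vehicledetailstableleft cell (the label) with the vehicledetailstableright cell that follows it.

```python
from html.parser import HTMLParser


class VehicleDetailsParser(HTMLParser):
    """Map label cells (left class) to the value cell (right class) that follows them."""

    def __init__(self, left_class="vehicledetailstableleft", right_class="vehicledetailstableright"):
        super().__init__()
        self.left_class = left_class
        self.right_class = right_class
        self.details = {}
        self._side = None   # "left" or "right" while inside a matching <td>
        self._parts = []    # text fragments of the current cell
        self._label = None  # last label seen, waiting for its value

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.left_class in classes:
            self._side = "left"
        elif self.right_class in classes:
            self._side = "right"
        else:
            self._side = None
        self._parts = []

    def handle_data(self, data):
        if self._side is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag != "td" or self._side is None:
            return
        text = " ".join("".join(self._parts).split())  # collapse newlines and extra spaces
        if self._side == "left":
            self._label = text
        elif self._label is not None:
            self.details[self._label] = text
            self._label = None
        self._side = None


snippet = """<table><tbody>
<tr><td class="vehicledetailstableleft"><span class="bodytextbold">Fuel Type</span></td>
<td class="vehicledetailstableright"><span id="fueltype" class="bodytext">HEAVY OIL</span></td></tr>
</tbody></table>"""
parser = VehicleDetailsParser()
parser.feed(snippet)
print(parser.details)  # {'Fuel Type': 'HEAVY OIL'}
```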

TinyMCE copy/paste from Excel (BBCode plugin)

I'm using TinyMCE; you can see the implementation below.
Now the problem: when I copy some records from Excel and paste them into my TinyMCE field, they are displayed well enough for me (with the paste plugin, it actually shows the fields).
When I ask for the value of my field, I get back a table structure that depends on what was pasted. But I don't want any HTML; see below for what I want.
Implementation code:
tinyMCE.init({
mode : "exact",
elements: "id",
theme : "advanced",
plugins : "bbcode, inlinepopups",
content_css : "tinymce.css",
entity_encoding : "raw",
remove_linebreaks : false,
forced_root_block: false,
force_br_newlines: true,
invalid_elements : "p, div, span",
force_p_newlines: false,
theme_advanced_buttons1 : $cur_buttons,
theme_advanced_buttons2: "",
theme_advanced_buttons3: "",
init_instance_callback : "tiny_mce_callback"});
Value returned from the TinyMCE object:
<table style="border-collapse: collapse;" width="216" border="0"
cellspacing="0" cellpadding="0">
<!--StartFragment-->
<colgroup>
<col style="mso-width-source: userset; mso-width-alt: 1152;" width="27"/>
<col width="55" />
<col style="mso-width-source: userset; mso-width-alt: 2858;"
span="2" width="67"/>
</colgroup>
<tbody>
<tr style="mso-height-source: userset;">
<td class="xl24" width="27" height="12">1</td>
<td class="xl26" width="55">26/05/12</td>
<td class="xl24" width="67">Amsterdam</td>
<td class="xl24" width="67">Casablanca</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">2</td>
<td class="xl25">27/05/12</td>
<td class="xl24">Casablanca</td>
<td class="xl24">Rabat</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">3</td>
<td class="xl25">28/05/12</td>
<td class="xl24">Rabat</td>
<td class="xl24">Fes</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">4</td>
<td class="xl25">29/05/12</td>
<td class="xl24">Fes</td>
<td class="xl24"> </td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">5</td>
<td class="xl25">30/05/12</td>
<td class="xl24">Fes</td>
<td class="xl24">Erg Chebbi</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">6</td>
<td class="xl25">31/05/12</td>
<td class="xl24">Erg Chebbi</td>
<td class="xl24">Dades Vallei</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">7</td>
<td class="xl25">01/06/12</td>
<td class="xl24">Dades Vallei</td>
<td class="xl24">Ouarzazate</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">8</td>
<td class="xl25">02/06/12</td>
<td class="xl24">Ouarzazate</td>
<td class="xl24">Marrakesh</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">9</td>
<td class="xl25">03/06/12</td>
<td class="xl24">Marrakesh</td>
<td class="xl24"> </td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">10</td>
<td class="xl25">04/06/12</td>
<td class="xl24">Marrakesh</td>
<td class="xl24"> </td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">11</td>
<td class="xl25">05/06/12</td>
<td class="xl24">Marrakesh</td>
<td class="xl24">Amsterdam</td>
</tr>
<!--EndFragment--></tbody>
</table>
Expected return:
1 26/05/12 Amsterdam Casablanca
2 27/05/12 Casablanca Rabat
3 28/05/12 Rabat Fes
4 29/05/12 Fes
5 30/05/12 Fes Erg Chebbi
6 31/05/12 Erg Chebbi Dades Vallei
7 01/06/12 Dades Vallei Ouarzazate
8 02/06/12 Ouarzazate Marrakesh
9 03/06/12 Marrakesh
10 04/06/12 Marrakesh
11 05/06/12 Marrakesh Amsterdam
You should look at the TinyMCE configuration options for the paste plugin.
Another option is to strip out all unwanted HTML tags and use plain text only: TinyMCE Paste As Plain Text
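If the HTML still reaches the server, one way to produce the expected return is to flatten the table: one line per tr, cell texts joined by single spaces, empty cells skipped. A minimal sketch using Python's standard html.parser, assuming the field's value is available as a string on a server where Python runs (the question does not show its server side; `TableToText` and `table_html_to_text` are hypothetical names). Comments such as <!--StartFragment--> and the colgroup are ignored.

```python
from html.parser import HTMLParser


class TableToText(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_html_to_text(html):
    """One line per table row, non-empty cell texts separated by single spaces."""
    parser = TableToText()
    parser.feed(html)
    parser.close()
    return "\n".join(" ".join(cell for cell in row if cell) for row in parser.rows)


pasted = """<table><tbody>
<!--StartFragment-->
<tr><td class="xl24">4</td><td class="xl25">29/05/12</td><td class="xl24">Fes</td><td class="xl24"> </td></tr>
<tr><td class="xl24">5</td><td class="xl25">30/05/12</td><td class="xl24">Fes</td><td class="xl24">Erg Chebbi</td></tr>
<!--EndFragment--></tbody></table>"""
print(table_html_to_text(pasted))
# 4 29/05/12 Fes
# 5 30/05/12 Fes Erg Chebbi
```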
