I am using BeautifulSoup in a Django project to get all the data from HTML tables. I have code that strips the tables and saves the table data as a list of lists:
soup = bs.BeautifulSoup(html_source, 'lxml')
table = soup.find('table', {'id': 'detail'})
rows = table.findAll('tr')
data = [[td.findChildren(text=True) for td in tr.findAll(['th', 'td'])] for tr in rows]
data = [[u"".join(d).strip() for d in l] for l in data]
This code has worked well so far, but it does not capture all the data in this HTML table: it gets only the thead rows. I cannot figure out why.
<table class="table_type1" data-tdborder="" id="detail">
<colgroup>
<col width="38">
<col>
<col>
<col width="140">
</colgroup>
<thead>
<tr>
<th>No.</th>
<th>Status</th>
<th>Location</th>
<th>Event Date</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center;">1</td>
<td class="multi_row" style="line-height:15px;">Empty Container Release to Shipper</td>
<td class="multi_row" style="line-height:15px;">SHANGHAI, SHANGHAI ,CHINA<br> GREATING FORTUNE (SHANGHAI) CONTAIN</td>
<td class="ico_a">2017-10-09 10:51</td>
</tr>
<tr>
<td style="text-align:center;">2</td>
<td class="multi_row" style="line-height:15px;">Gate In to Outbound Terminal</td>
<td class="multi_row" style="line-height:15px;">SHANGHAI, SHANGHAI ,CHINA<br> SHANGHAI SHENDONG INTERNATIONAL CON (DXYS)</td>
<td class="ico_a">2017-10-10 04:43</td>
</tr>
<tr>
<td style="text-align:center;">3</td>
<td class="multi_row" style="line-height:15px;">Loaded on 'NYK LYNX 2610E' at Port of Loading<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">SHANGHAI, SHANGHAI ,CHINA<br> SHANGHAI SHENDONG INTERNATIONAL CON (DXYS)</td>
<td class="ico_a">2017-10-11 22:58</td>
</tr>
<tr>
<td style="text-align:center;">4</td>
<td class="multi_row" style="line-height:15px;">'NYK LYNX 2610E' Departure from Port of Loading<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">SHANGHAI, SHANGHAI ,CHINA<br> SHANGHAI SHENDONG INTERNATIONAL CON (DXYS)</td>
<td class="ico_a">2017-10-12 05:00</td>
</tr>
<tr>
<td style="text-align:center;">5</td>
<td class="multi_row" style="line-height:15px;">'NYK LYNX 2610E' Arrival at Port of Discharging<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> TERMINAL PACIFICO SUR</td>
<td class="ico_e">2017-11-14 21:00</td>
</tr>
<tr>
<td style="text-align:center;">6</td>
<td class="multi_row" style="line-height:15px;">'NYK LYNX 2610E' POD Berthing Destination<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> TERMINAL PACIFICO SUR</td>
<td class="ico_e">2017-11-14 22:00</td>
</tr>
<tr>
<td style="text-align:center;">7</td>
<td class="multi_row" style="line-height:15px;">Unloaded from 'NYK LYNX 2610E' at Port of Discharging<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> TERMINAL PACIFICO SUR</td>
<td class="ico_e">2017-11-14 23:30</td>
</tr>
<tr>
<td style="text-align:center;">8</td>
<td class="multi_row" style="line-height:15px;">Gate Out from Inbound Terminal for Delivery to Consignee (or Port Shuttle)</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> TERMINAL PACIFICO SUR</td>
<td class="ico_e">2017-11-15 04:00</td>
</tr>
<tr>
<td style="text-align:center;">9</td>
<td class="multi_row" style="line-height:15px;">Empty Container Returned from Customer</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> </td>
<td class="ico_e">2017-11-15 10:00</td>
</tr>
</tbody>
</table>
Edit
I printed the soup object and went through all the HTML code, and surprisingly it contains only the thead of the table and not the tbody. Is this a bug in BeautifulSoup? This is the only part of the table that beautifulsoup4 captures:
<table class="table_type1" data-tdborder="" id="detail">
<colgroup>
<col width="38"/>
<col/>
<col/>
<col width="140"/>
</colgroup>
<thead>
<tr>
<th>No.</th>
<th>Status</th>
<th>Location</th>
<th>Event Date</th>
</tr>
</thead>
</table>
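One way to narrow this down is to check whether the tbody rows are present in the downloaded html_source at all, independently of BeautifulSoup. If they are missing there too, the parser is not at fault; one possible explanation (an assumption, not something confirmed in this thread) is that the page fills in the rows with JavaScript after loading. The sketch below uses only the standard library; `RowCounter` and `count_rows` are hypothetical helper names.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> tags inside the table with a given id, split by section."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0        # > 0 while inside the target table
        self.section = None   # "thead", "tbody", or None
        self.counts = {"thead": 0, "tbody": 0, "other": 0}

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Enter the target table, or track a table nested inside it.
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth and tag in ("thead", "tbody"):
            self.section = tag
        elif self.depth and tag == "tr":
            self.counts[self.section or "other"] += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif tag in ("thead", "tbody"):
            self.section = None


def count_rows(html_source, table_id="detail"):
    """Return how many rows the raw HTML holds in each table section."""
    parser = RowCounter(table_id)
    parser.feed(html_source)
    parser.close()
    return parser.counts
```

If `count_rows(html_source)["tbody"]` is 0, the rows never reached the parser, so the fix lies in how the page is fetched rather than in the parsing code.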
Situation:
I buy a certain volume of petrol at price X, which is usually sold within 30 days. During these 30 days, I need to adjust my petrol station's gas price according to the average state gas price, in whole-percent steps. For example, if the state price goes up by 1.2% from my purchase price, I set my price to +1% of my purchase price.
Here is an example:
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-wa1i{font-weight:bold;text-align:center;vertical-align:middle}
.tg .tg-7zrl{text-align:left;vertical-align:bottom}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
<thead>
<tr>
<th class="tg-wa1i">Time</th>
<th class="tg-wa1i">Average state gas price</th>
<th class="tg-wa1i">Av state price change, %</th>
<th class="tg-wa1i">My price change, %</th>
<th class="tg-wa1i">My station gas price</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-7zrl">1</td>
<td class="tg-0lax"> 4.231 </td>
<td class="tg-0lax"> - </td>
<td class="tg-0lax"> - </td>
<td class="tg-0lax"> 4.231 </td>
</tr>
<tr>
<td class="tg-7zrl">2</td>
<td class="tg-0lax"> 4.337 </td>
<td class="tg-0lax"> 2.5%</td>
<td class="tg-0lax"> 2.0%</td>
<td class="tg-0lax"> 4.316 </td>
</tr>
<tr>
<td class="tg-7zrl">3</td>
<td class="tg-0lax"> 4.437 </td>
<td class="tg-0lax"> 4.9%</td>
<td class="tg-0lax"> 4.0%</td>
<td class="tg-0lax"> 4.400 </td>
</tr>
<tr>
<td class="tg-7zrl">4</td>
<td class="tg-0lax"> 4.481 </td>
<td class="tg-0lax"> 5.9%</td>
<td class="tg-0lax"> 5.0%</td>
<td class="tg-0lax"> 4.443 </td>
</tr>
<tr>
<td class="tg-7zrl">5</td>
<td class="tg-0lax"> 4.571 </td>
<td class="tg-0lax"> 8.0%</td>
<td class="tg-0lax"> 8.0%</td>
<td class="tg-0lax"> 4.569 </td>
</tr>
<tr>
<td class="tg-7zrl">6</td>
<td class="tg-0lax"> 4.616 </td>
<td class="tg-0lax"> 9.1%</td>
<td class="tg-0lax"> 9.0%</td>
<td class="tg-0lax"> 4.612 </td>
</tr>
<tr>
<td class="tg-7zrl">7</td>
<td class="tg-0lax"> 4.709 </td>
<td class="tg-0lax"> 11.3%</td>
<td class="tg-0lax"> 11.0%</td>
<td class="tg-0lax"> 4.696 </td>
</tr>
<tr>
<td class="tg-7zrl">8</td>
<td class="tg-0lax"> 4.850 </td>
<td class="tg-0lax"> 14.6%</td>
<td class="tg-0lax"> 14.0%</td>
<td class="tg-0lax"> 4.823 </td>
</tr>
</tbody>
</table>
Here is my Python code, but it is very clunky: if the gas price moves up by 20%, I have to write my_priceX up to 20 times. Is there a more elegant solution?
my_price1 = last_gas_price["price"]*1.01
my_price2 = last_gas_price["price"]*1.02
my_price3 = last_gas_price["price"]*1.03
my_price4 = last_gas_price["price"]*1.04
my_price5 = last_gas_price["price"]*1.05
if last_gas_price > my_price1:
    adj_price(price = my_price1)
if last_gas_price > my_price2:
    adj_price(price = my_price2)
if last_gas_price > my_price3:
    adj_price(price = my_price3)
if last_gas_price > my_price4:
    adj_price(price = my_price4)
if last_gas_price > my_price5:
    adj_price(price = my_price5)
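The chain of if statements can be replaced by computing the number of whole-percent steps directly. This is a minimal sketch, assuming the rule shown in the table: the state price change relative to the purchase price is rounded down to a whole percent, and that percentage is applied to the purchase price. The names `stepped_price`, `purchase_price`, `state_price`, and `step_pct` are hypothetical, and clamping at zero when the state price falls below the purchase price is an assumption, since the question only describes increases.

```python
import math


def stepped_price(purchase_price, state_price, step_pct=1.0):
    """Return the station price after rounding the state change down to whole steps."""
    change_pct = (state_price / purchase_price - 1) * 100
    # Assumption: never go below the purchase price, so a negative change gives 0 steps.
    steps = max(0, math.floor(change_pct / step_pct))
    return purchase_price * (1 + steps * step_pct / 100)
```

With the purchase price 4.231 from the table, `round(stepped_price(4.231, 4.337), 3)` gives 4.316 and `round(stepped_price(4.231, 4.850), 3)` gives 4.823, matching the "My station gas price" column. The result could then be passed to the existing `adj_price` function.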
I am trying to make an item appear last in the list if it appears on the invoice (Advanced PDF, NetSuite).
I was thinking about sorting by item name and adding some ZZZs to the name, but that's because I don't really know much beyond basic HTML.
Any help would be appreciated.
Below is the table code I am dealing with.
<table class="itemtable" style="width: 100%; margin-top: 10px;"><!-- start items --><#list record.item as item><#if item_index==0>
<thead>
<tr>
<th align="center" colspan="3">${item.quantity#label}</th>
<th colspan="12">${item.item#label}</th>
<th colspan="3">${item.options#label}</th>
<th align="right" colspan="4">${item.rate#label}</th>
<th align="right" colspan="4">${item.amount#label}</th>
</tr>
</thead>
</#if><tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr>
</#list>
I need to be able to choose a specific itemname to always show at the bottom.
You can use the assign tag to hold the values for your bottom item.
<#assign bottomItemName = 'Bottom Item Name'>
<#assign bottomItem = {}>
<#list record.item as item>
<#if item_index==0>
<thead>
<tr>
<th align="center" colspan="3">${item.quantity#label}</th>
<th colspan="12">${item.item#label}</th>
<th colspan="3">${item.options#label}</th>
<th align="right" colspan="4">${item.rate#label}</th>
<th align="right" colspan="4">${item.amount#label}</th>
</tr>
</thead>
</#if>
<#if item.item == bottomItemName>
<#assign bottomItem = bottomItem + item>
<#else>
<tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr>
</#if>
</#list>
<#if bottomItem.item??>
<tr>
<td align="center" colspan="3" line-height="150%">${bottomItem.quantity}</td>
<td colspan="12"><span class="itemname">${bottomItem.item}</span><br />${bottomItem.description}</td>
<td colspan="3">${bottomItem.options}</td>
<td align="right" colspan="4">${bottomItem.rate}</td>
<td align="right" colspan="4">${bottomItem.amount}</td>
</tr>
</#if>
You could try looping twice: once for every other item, then once for the item that should go last.
<table class="itemtable" style="width: 100%; margin-top: 10px;"><!-- start items --><#list record.item as item><#if item_index==0>
<thead>
<tr>
<th align="center" colspan="3">${item.quantity#label}</th>
<th colspan="12">${item.item#label}</th>
<th colspan="3">${item.options#label}</th>
<th align="right" colspan="4">${item.rate#label}</th>
<th align="right" colspan="4">${item.amount#label}</th>
</tr>
</thead>
</#if><#if item.item != "Your Item"><tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr>
</#if></#list>
<#list record.item as item><#if item.item == "Your Item"><tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr>
</#if></#list>
I haven't tested the if statements, so good luck.
Image table
Please write the full HTML code for this table. You can see the table image via the link above. Here is my attempt so far:
<table border="1" width="800">
<tr>
<th>Level1</th>
<th>Level2</th>
<th>Level2</th>
<th>Info</th>
<th>Name</th>
</tr>
<tr>
<td rowspan="6">System</td>
</tr>
<tr>
<td rowspan="4">System Apps</td>
<td rowspan="2">System Memory</td>
</tr>
<tr>
<td rowspan="3">SystemEnv</td>
<td rowspan="1">SystemEnv2</td>
<td rowspan="2">Memeory Test</td>
</tr>
Here is the table code:
<table border="2">
<tr>
<th>Level 1</th>
<th>Level 2</th>
<th>Level 3</th>
<th>info</th>
<th>Name</th>
</tr>
<tr>
<td rowSpan="6">System</td>
<td rowSpan="4">System apps</td>
<td rowSpan="3">SystemEnv</td>
<td>App Text</td>
<td>foo</td>
</tr>
<tr>
<td>App memory</td>
<td>foo</td>
</tr>
<tr>
<td>App test</td>
<td>bar</td>
</tr>
<tr>
<td>Systemenv2</td>
<td>App test</td>
<td>bar</td>
</tr>
<tr>
<td rowSpan="2">System Memory</td>
<td rowSpan="2">Memory test</td>
<td>memory func</td>
<td>foo</td>
</tr>
<tr>
<td>Memory Func</td>
<td>foo</td>
</tr>
</table>
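To see why each body row above has a different number of td cells, it can help to expand the rowspans into a full grid: a cell with rowspan N also occupies the same column in the next N-1 rows, so those rows must omit it. The sketch below is only an illustration of that bookkeeping; `expand_rowspans` is a hypothetical helper, and the rows are written by hand as (text, rowspan) pairs rather than parsed from the HTML.

```python
def expand_rowspans(rows, ncols):
    """Expand rows of (text, rowspan) cells into a full grid of ncols columns."""
    grid = [[None] * ncols for _ in rows]
    for r, cells in enumerate(rows):
        col = 0
        for text, span in cells:
            # Skip columns already taken by a cell spanning down from an earlier row.
            while col < ncols and grid[r][col] is not None:
                col += 1
            if col >= ncols:
                raise ValueError(f"row {r} has too many cells")
            for rr in range(r, min(r + span, len(rows))):
                grid[rr][col] = text
            col += 1
    return grid


# The six body rows of the table above, as (text, rowspan) pairs.
body = [
    [("System", 6), ("System apps", 4), ("SystemEnv", 3), ("App Text", 1), ("foo", 1)],
    [("App memory", 1), ("foo", 1)],
    [("App test", 1), ("bar", 1)],
    [("Systemenv2", 1), ("App test", 1), ("bar", 1)],
    [("System Memory", 2), ("Memory test", 2), ("memory func", 1), ("foo", 1)],
    [("Memory Func", 1), ("foo", 1)],
]
grid = expand_rowspans(body, ncols=5)
```

Every cell of `grid` ends up filled, which confirms that the rowspans in the answer add up to five columns in every row. A row that left a `None` in the grid would point to a missing cell or a rowspan that is too short.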
I'm trying to automate a page-scrape program in Excel using VBA, but I'm having difficulty getting the results from the webpage because the fields I want do not have ids. I have copied the source code below; I think the data is contained within a table. How do you get the data using the td class and the span class?
<table>
<tbody>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Date of Liability</span></td>
<td class="vehicledetailstableright"><span class="bodytext">01 07 2014</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Date of First Registration</span></td>
<td class="vehicledetailstableright"><span class="bodytext">02 07 2013</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Year of Manufacture</span></td>
<td class="vehicledetailstableright"><span class="bodytext">2013</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Cylinder Capacity (cc)</span></td>
<td class="vehicledetailstableright"><span class="bodytext">2993cc</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">CO₂ Emissions</span></td>
<td class="vehicledetailstableright"><span class="bodytext">129 g/km</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Fuel Type</span></td>
<td class="vehicledetailstableright"><span id="fueltype" class="bodytext">HEAVY OIL</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Export Marker</span></td>
<td class="vehicledetailstableright"><span id="exportmarker" class="bodytext">N</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Vehicle Status</span></td>
<td class="vehicledetailstableright"><span id="vehiclelicencestatus" class="bodytext">Licence Not Due</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Vehicle Colour</span></td>
<td class="vehicledetailstableright"><span id="colour" class="bodytext">BLUE</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Vehicle Type Approval</span></td>
<td class="vehicledetailstableright"><span class="bodytext">M1</span></td>
</tr>
<tr>
<td class="vehicledetailstableleft"><span class="bodytextbold">Date of Last V5C Issued</span>
</td>
<td class="vehicledetailstableright"><span class="bodytext">No Result Found</span>
</td>
</tr>
Tim is suggesting a code-heavy way to do it, and it is technically correct. I keep suggesting an easier approach, as in this question:
VBA spliting results from html imported table into excel
Basically, use the macro recorder, and then create an HTML web query for the data.
See my blog post on this as well.
http://automatic-office.com/?p=344
Many ways to skin the cat, but this is the easy way.
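For reference, the pairing logic that a code-based approach relies on is small: each label sits in a td with class vehicledetailstableleft and its value in the following td with class vehicledetailstableright. The sketch below shows that logic in Python rather than VBA, using only the standard library; `DetailsParser` and `parse_details` are hypothetical names, and it assumes the class values appear exactly as in the posted source.

```python
from html.parser import HTMLParser


class DetailsParser(HTMLParser):
    """Pair each 'vehicledetailstableleft' cell with the right-hand cell after it."""

    def __init__(self):
        super().__init__()
        self.side = None    # "left", "right", or None while outside a cell
        self.label = ""
        self.value = ""
        self.details = {}

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            css = dict(attrs).get("class", "")
            if css == "vehicledetailstableleft":
                self.side, self.label = "left", ""
            elif css == "vehicledetailstableright":
                self.side, self.value = "right", ""

    def handle_data(self, data):
        # Text inside nested spans is delivered here too, so no span lookup is needed.
        if self.side == "left":
            self.label += data
        elif self.side == "right":
            self.value += data

    def handle_endtag(self, tag):
        if tag == "td":
            if self.side == "right":
                self.details[self.label.strip()] = self.value.strip()
            self.side = None


def parse_details(html):
    """Return a dict mapping each label to its value."""
    parser = DetailsParser()
    parser.feed(html)
    parser.close()
    return parser.details
```

Calling `parse_details` on the table source yields a dictionary keyed by label, for example "Fuel Type" mapped to "HEAVY OIL".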
I'm using TinyMCE; you can see the implementation below.
Now the problem: when I copy some records from Excel and paste them into my TinyMCE field, they are displayed well enough for me (with the paste plugin, it actually shows the fields).
When I ask for the value of my field, I get back a table structure that depends on what was pasted. But I don't want any HTML; see below for what I want.
Implementation code:
tinyMCE.init({
mode : "exact",
elements: "id",
theme : "advanced",
plugins : "bbcode, inlinepopups",
content_css : "tinymce.css",
entity_encoding : "raw",
remove_linebreaks : false,
forced_root_block: false,
force_br_newlines: true,
invalid_elements : "p, div, span",
force_p_newlines: false,
theme_advanced_buttons1 : $cur_buttons,
theme_advanced_buttons2: "",
theme_advanced_buttons3: "",
init_instance_callback : "tiny_mce_callback"});
Return from tinyMCE object:
<table style="border-collapse: collapse;" width="216" border="0"
cellspacing="0" cellpadding="0">
<!--StartFragment-->
<colgroup>
<col style="mso-width-source: userset; mso-width-alt: 1152;" width="27"/>
<col width="55" />
<col style="mso-width-source: userset; mso-width-alt: 2858;"
span="2" width="67"/>
</colgroup>
<tbody>
<tr style="mso-height-source: userset;">
<td class="xl24" width="27" height="12">1</td>
<td class="xl26" width="55">26/05/12</td>
<td class="xl24" width="67">Amsterdam</td>
<td class="xl24" width="67">Casablanca</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">2</td>
<td class="xl25">27/05/12</td>
<td class="xl24">Casablanca</td>
<td class="xl24">Rabat</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">3</td>
<td class="xl25">28/05/12</td>
<td class="xl24">Rabat</td>
<td class="xl24">Fes</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">4</td>
<td class="xl25">29/05/12</td>
<td class="xl24">Fes</td>
<td class="xl24"> </td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">5</td>
<td class="xl25">30/05/12</td>
<td class="xl24">Fes</td>
<td class="xl24">Erg Chebbi</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">6</td>
<td class="xl25">31/05/12</td>
<td class="xl24">Erg Chebbi</td>
<td class="xl24">Dades Vallei</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">7</td>
<td class="xl25">01/06/12</td>
<td class="xl24">Dades Vallei</td>
<td class="xl24">Ouarzazate</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">8</td>
<td class="xl25">02/06/12</td>
<td class="xl24">Ouarzazate</td>
<td class="xl24">Marrakesh</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">9</td>
<td class="xl25">03/06/12</td>
<td class="xl24">Marrakesh</td>
<td class="xl24"> </td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">10</td>
<td class="xl25">04/06/12</td>
<td class="xl24">Marrakesh</td>
<td class="xl24"> </td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">11</td>
<td class="xl25">05/06/12</td>
<td class="xl24">Marrakesh</td>
<td class="xl24">Amsterdam</td>
</tr>
<!--EndFragment--></tbody>
</table>
Expected return:
1 26/05/12 Amsterdam Casablanca
2 27/05/12 Casablanca Rabat
3 28/05/12 Rabat Fes
4 29/05/12 Fes
5 30/05/12 Fes Erg Chebbi
6 31/05/12 Erg Chebbi Dades Vallei
7 01/06/12 Dades Vallei Ouarzazate
8 02/06/12 Ouarzazate Marrakesh
9 03/06/12 Marrakesh
10 04/06/12 Marrakesh
11 05/06/12 Marrakesh Amsterdam
You should have a look at the tinymce configuration options concerning the paste plugin.
Another option is to strip out all unwanted html tags and just use plain text: TinyMCE Paste As Plain Text
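If the HTML still reaches the server, the tags can also be stripped there. The sketch below is one possible server-side fallback in Python, using only the standard library; it is not part of TinyMCE, and `TableText` and `table_to_text` are hypothetical names. It emits one line per table row with cells separated by single spaces, and it drops empty cells, which is an assumption based on the expected output above.

```python
from html.parser import HTMLParser


class TableText(HTMLParser):
    """Collect the text of each <td>/<th>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.cell = None    # text pieces of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            # Collapse runs of whitespace (including non-breaking spaces).
            text = " ".join("".join(self.cell).split())
            if text:
                self.rows[-1].append(text)
            self.cell = None


def table_to_text(html):
    """Return the table as plain text, one line per non-empty row."""
    parser = TableText()
    parser.feed(html)
    parser.close()
    return "\n".join(" ".join(row) for row in parser.rows if row)
```

Comments such as the StartFragment and EndFragment markers and the colgroup element carry no cell text, so they do not appear in the output.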