I am trying to read the key stats from Morningstar. The data is HTML wrapped in a JSON response. So far I can request the JSON and parse the HTML with BeautifulSoup:
url = 'http://financials.morningstar.com/ajax/keystatsAjax.html?t=tou&culture=en-CA&region=CAN'
lm_json = requests.get(url).json()
ksContent = BeautifulSoup(lm_json["ksContent"],"html.parser")
What seems a bit weird to me is that the HTML under 'ksContent' contains the actual data as a table. I am not a fan of HTML and wonder how I can turn all of it into a nice pandas DataFrame. As the table is long, here is part of it:
<table cellpadding="0" cellspacing="0" class="r_table1 text2">
<colgroup>
<col width="23%"/>
<col span="11" width="7%"/>
</colgroup>
<thead>
<tr>
<th align="left" scope="row"></th>
<th align="right" id="Y0" scope="col">2008-12</th>
<th align="right" id="Y1" scope="col">2009-12</th>
<th align="right" id="Y2" scope="col">2010-12</th>
<th align="right" id="Y3" scope="col">2011-12</th>
<th align="right" id="Y4" scope="col">2012-12</th>
<th align="right" id="Y5" scope="col">2013-12</th>
<th align="right" id="Y6" scope="col">2014-12</th>
<th align="right" id="Y7" scope="col">2015-12</th>
<th align="right" id="Y8" scope="col">2016-12</th>
<th align="right" id="Y9" scope="col">2017-12</th>
<th align="right" id="Y10" scope="col">TTM</th>
</tr>
</thead>
<tbody>
<tr class="hr">
<td colspan="12"></td>
</tr>
<tr>
<th class="row_lbl" id="i0" scope="row">Revenue <span>CAD Mil</span></th>
<td align="right" headers="Y0 i0">—</td>
<td align="right" headers="Y1 i0">40</td>
<td align="right" headers="Y2 i0">212</td>
<td align="right" headers="Y3 i0">349</td>
<td align="right" headers="Y4 i0">442</td>
<td align="right" headers="Y5 i0">759</td>
<td align="right" headers="Y6 i0">1,379</td>
<td align="right" headers="Y7 i0">1,074</td>
<td align="right" headers="Y8 i0">1,125</td>
<td align="right" headers="Y9 i0">1,662</td>
<td align="right" headers="Y10 i0">1,760</td>
</tr> ...
The header row defines the ids Y0, Y1 ... Y10 for the actual dates, and the cells in each following row refer to them through their headers attribute.
Your help is appreciated!
You can use read_html() to convert it into a list of dataframes
import requests
import pandas as pd
url = 'http://financials.morningstar.com/ajax/keystatsAjax.html?t=tou&culture=en-CA&region=CAN'
lm_json = requests.get(url).json()
df_list=pd.read_html(lm_json["ksContent"])
You can iterate through it and get the dataframes one by one. You can also use dropna() to get rid of the NaN only rows.
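As a rough sketch of that clean-up step: the helper name clean_frames and the hand-built sample frame below are assumptions made for illustration. The sample stands in for one element of the list that read_html() returns, so the snippet runs without network access. The separator row in the table (the tr with class "hr") is the kind of row that ends up as all NaN.

```python
import pandas as pd


def clean_frames(frames):
    """Drop rows that are entirely NaN and use the first column as the row label."""
    cleaned = []
    for frame in frames:
        # Separator rows in the HTML come through as rows of NaN only.
        frame = frame.dropna(how="all")
        # The first column holds the row labels, e.g. "Revenue CAD Mil".
        frame = frame.set_index(frame.columns[0])
        cleaned.append(frame)
    return cleaned


# Hypothetical stand-in for one frame from pd.read_html(lm_json["ksContent"]);
# the two values are copied from the table excerpt in the question.
sample = pd.DataFrame(
    {
        "Unnamed: 0": [None, "Revenue CAD Mil"],
        "2009-12": [None, 40.0],
        "2010-12": [None, 212.0],
    }
)

result = clean_frames([sample])
print(result[0])
```

With the real data you would pass df_list instead of [sample]. Whether the first column is the right index for every table on that page is something to check against the actual output.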
I am trying to make a specific item appear last in the item list whenever it appears on the invoice (Advanced PDF, NetSuite).
I was thinking about sorting by item name and adding some ZZZs to the name, but that's only because I don't know much beyond basic HTML.
Any help would be appreciated.
Below is the table code I am dealing with.
<table class="itemtable" style="width: 100%; margin-top: 10px;"><!-- start items --><#list record.item as item><#if item_index==0>
<thead>
<tr>
<th align="center" colspan="3">${item.quantity#label}</th>
<th colspan="12">${item.item#label}</th>
<th colspan="3">${item.options#label}</th>
<th align="right" colspan="4">${item.rate#label}</th>
<th align="right" colspan="4">${item.amount#label}</th>
</tr>
</thead>
</#if><tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr>
</#list>
I need to be able to choose a specific item name that always shows at the bottom.
You can use the assign tag to hold the values for your bottom item.
<#assign bottomItemName = 'Bottom Item Name'>
<#assign bottomItem = {}>
<#list record.item as item>
<#if item_index==0>
<thead>
<tr>
<th align="center" colspan="3">${item.quantity#label}</th>
<th colspan="12">${item.item#label}</th>
<th colspan="3">${item.options#label}</th>
<th align="right" colspan="4">${item.rate#label}</th>
<th align="right" colspan="4">${item.amount#label}</th>
</tr>
</thead>
</#if>
<#if item.item == bottomItemName>
<#assign bottomItem = bottomItem + item>
<#else>
<tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr>
</#if>
</#list>
<#if bottomItem.item??>
<tr>
<td align="center" colspan="3" line-height="150%">${bottomItem.quantity}</td>
<td colspan="12"><span class="itemname">${bottomItem.item}</span><br />${bottomItem.description}</td>
<td colspan="3">${bottomItem.options}</td>
<td align="right" colspan="4">${bottomItem.rate}</td>
<td align="right" colspan="4">${bottomItem.amount}</td>
</tr>
</#if>
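The ordering this template aims for can also be modelled outside FreeMarker. The following is a minimal Python sketch of the idea, not NetSuite template code; the function name move_to_bottom and the sample line items are made up for illustration. Unlike the single bottomItem variable above, it keeps every matching line if the item appears more than once.

```python
def move_to_bottom(items, bottom_name):
    """Return the items with every line named bottom_name moved to the end.

    The relative order of all other lines is preserved.
    """
    regular = [line for line in items if line["item"] != bottom_name]
    held_back = [line for line in items if line["item"] == bottom_name]
    return regular + held_back


# Hypothetical invoice lines.
lines = [
    {"item": "Widget", "quantity": 2},
    {"item": "Freight", "quantity": 1},
    {"item": "Gadget", "quantity": 5},
]

ordered = move_to_bottom(lines, "Freight")
print([line["item"] for line in ordered])
```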
You could try looping 2x.
<table class="itemtable" style="width: 100%; margin-top: 10px;"><!-- start items --><#list record.item as item><#if item_index==0>
<thead>
<tr>
<th align="center" colspan="3">${item.quantity#label}</th>
<th colspan="12">${item.item#label}</th>
<th colspan="3">${item.options#label}</th>
<th align="right" colspan="4">${item.rate#label}</th>
<th align="right" colspan="4">${item.amount#label}</th>
</tr>
</thead>
</#if><#if item.item != "Your Item"><tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr>
</#if>
</#list>
<#list record.item as item><#if item.item == "Your Item"><tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr>
</#if>
</#list></table>
I haven't tested the if statement, so good luck
I have an Inbound Shipment saved search of items coming in listed by container. I have no problem printing the list of the items with quantities, description, etc. but when I add in the "vessel Number" or "shipment number" I don't need it to repeat on every line. I would prefer to show the information that I would normally "group" at the top of the PDF vs. on each line.
I should note that when I print the saved search, I would have already filtered the search down to one container, meaning only one "shipment number" and one "vessel number".
<table align="center" border=".5" cellpadding=".5" cellspacing=".5" class="NATIVE-TABLE" style="width:100%;"><#list results as result><#if result_index == 0>
<thead>
<tr>
<th align="center" scope="col" style="width: 107px;">
<div><big>Shipment #</big></div>
</th>
<th align="center" scope="col" style="width: 103px;">
<div><big>Status</big></div>
</th>
<th align="center" scope="col" style="width: 156px;">
<div><big>Destination</big></div>
</th>
<th align="center" scope="col" style="width: 150px;">
<div><big>Actual Ship Date</big></div>
</th>
<th align="center" scope="col" style="width: 154px;">
<div><big>Expected Delivery Date</big></div>
</th>
<th align="center" scope="col">
<div><big>Carrier</big></div>
</th>
<th align="center" scope="col">
<div><big>Vessel #</big></div>
</th>
</tr>
</thead>
</#if><tr>
<td align="center" style="width: 107px;">${result.shipmentnumber}</td>
<td align="center" style="width: 103px;">${result.status}</td>
<td align="center" style="width: 156px;">${result.custrecord142}</td>
<td align="center" style="width: 150px;">${result.actualshippingdate}</td>
<td align="center" style="width: 154px;">${result.expecteddeliverydate}</td>
<td align="center" style="width: 154px;">${result.custrecord_htd_shipper_info}</td>
<td align="center" style="width: 154px;">${result.vesselnumber}</td>
</tr>
</#list></table>
First: please post your code so we can see where you're up to and respond accordingly - it helps us to help you!
Second: The general pattern would be that you simply use values from the first result to make up your header, and then iterate through all results to give your lines. It would look something like:
<#list results as result>
<#if result_index == 0>
*header information goes here*
</#if>
*line information goes here*
</#list>
Edited to add code
<table align="center" border=".5" cellpadding=".5" cellspacing=".5" class="NATIVE-TABLE" style="width:100%;"><#list results as result><#if result_index == 0>
<thead>
<tr>
<th align="center" scope="col" style="width: 107px;">
<div><big>Shipment #</big></div>
</th>
<th align="center" scope="col" style="width: 103px;">
<div><big>Status</big></div>
</th>
<th align="center" scope="col" style="width: 156px;">
<div><big>Destination</big></div>
</th>
<th align="center" scope="col" style="width: 150px;">
<div><big>Actual Ship Date</big></div>
</th>
<th align="center" scope="col" style="width: 154px;">
<div><big>Expected Delivery Date</big></div>
</th>
<th align="center" scope="col">
<div><big>Carrier</big></div>
</th>
<th align="center" scope="col">
<div><big>Vessel #</big></div>
</th>
</tr>
</thead>
<tr>
<td align="center" style="width: 107px;">${result.shipmentnumber}</td>
<td align="center" style="width: 103px;">${result.status}</td>
<td align="center" style="width: 156px;">${result.custrecord142}</td>
<td align="center" style="width: 150px;">${result.actualshippingdate}</td>
<td align="center" style="width: 154px;">${result.expecteddeliverydate}</td>
<td align="center" style="width: 154px;">${result.custrecord_htd_shipper_info}</td>
<td align="center" style="width: 154px;">${result.vesselnumber}</td>
</tr>
</#if>
<tr>
<td align="center" style="width: 107px;"></td>
<td align="center" style="width: 103px;">${result.status}</td>
<td align="center" style="width: 156px;">${result.custrecord142}</td>
<td align="center" style="width: 150px;">${result.actualshippingdate}</td>
<td align="center" style="width: 154px;">${result.expecteddeliverydate}</td>
<td align="center" style="width: 154px;">${result.custrecord_htd_shipper_info}</td>
<td align="center" style="width: 154px;"></td>
</tr>
</#list>
</table>
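The pattern described in this answer (header values taken from the first result, line values taken from every result) can be sketched in Python as well. This is only a model of the data flow, not template code; the function name split_header_and_lines and the sample rows are assumptions, with field names borrowed from the template above.

```python
def split_header_and_lines(results, header_fields):
    """Take the header values from the first result; keep the other fields per line."""
    if not results:
        return {}, []
    header = {field: results[0][field] for field in header_fields}
    lines = [
        {key: value for key, value in row.items() if key not in header_fields}
        for row in results
    ]
    return header, lines


# Hypothetical saved-search rows, already filtered down to one container.
results = [
    {"shipmentnumber": "INBSHIP1", "vesselnumber": "V-7", "status": "In Transit"},
    {"shipmentnumber": "INBSHIP1", "vesselnumber": "V-7", "status": "Received"},
]

header, lines = split_header_and_lines(results, ("shipmentnumber", "vesselnumber"))
print(header)
print(lines)
```

This relies on the search being filtered to a single container, as the question states; with several containers the first row's values would not describe the other rows.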
I am trying to scrape all the hidden (commented-out) rows in a table, after rows 2 and 3, but I fail to extract them.
I have tried the code below to extract these comments, but it fails.
Below is my code. Could someone please help me solve this problem?
from bs4 import BeautifulSoup,Comment
import requests
r =requests.get('http://www.esuppliersindia.com/krishna-agro-traders/aboutus-p17322178-u10731500-swa.html')
soup = BeautifulSoup(r.text,'lxml')
table = soup.find('table',class_='text-listing')
trs = table.find_all('tr')
for tr in trs[2:3]:
    print(tr.text)
for tr in trs[3:4].find_next_sibling('td'):
    print(tr.text)
I am not sure whether the comments below, inside the table, are what you are looking for.
from bs4 import BeautifulSoup,Comment
import requests
r =requests.get('http://www.esuppliersindia.com/krishna-agro-traders/aboutus-p17322178-u10731500-swa.html')
soup = BeautifulSoup(r.text,'lxml')
table = soup.find('table',class_='text-listing')
comments=table.find_all(string=lambda text:isinstance(text,Comment))
print(comments[0].split('</tr>')[0])
for i in range(1,len(comments)):
    print(comments[i])
It will print output like this:
<td align="right" bgcolor="#FFFFFF" class="text-f11-b">No. Of Employees</td>
<td bgcolor="#FFFFFF" class="text-f11">10</td>
<tr>
<td align="right" bgcolor="#FFFFFF" class="text-f11-b">Export Turnover</td>
<td bgcolor="#FFFFFF" class="text-f11"></td>
</tr>
<tr>
<td align="right" valign="top" bgcolor="#FFFFFF" class="text-f11-b">Annual Turnover</td>
<td valign="top" bgcolor="#FFFFFF" class="text-f11">10 </td>
</tr>
<tr>
<td align="right" valign="top" bgcolor="#FFFFFF" class="text-f11-b">Import Turnover</td>
<td valign="top" bgcolor="#FFFFFF" class="text-f11"> </td>
</tr>
<tr>
<td align="right" valign="top" bgcolor="#ffffff" class="text-f11-b">Bankers</td>
<td valign="top" bgcolor="#ffffff" class="text-f11">Hdfc Bank </td>
</tr>
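Since each comment is just a string of markup, it can be parsed again to get at the cell text instead of the raw tags. The following is a minimal sketch under a few assumptions: the inline HTML is a simplified stand-in for the real page (the company row is invented, the Bankers row is taken from the output above), and it uses the built-in "html.parser" backend so that it does not depend on lxml.

```python
from bs4 import BeautifulSoup, Comment

# Simplified, hypothetical stand-in for the page: one visible row and one
# row hidden inside an HTML comment.
html = """
<table class="text-listing">
<tr><td>Company</td><td>Example Traders</td></tr>
<!-- <tr><td>Bankers</td><td>Hdfc Bank </td></tr> -->
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="text-listing")

rows = []
for comment in table.find_all(string=lambda text: isinstance(text, Comment)):
    # Re-parse the comment body as markup so its rows and cells become tags.
    fragment = BeautifulSoup(comment, "html.parser")
    for tr in fragment.find_all("tr"):
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])

print(rows)
```

On the real page the first comment appears to start in the middle of a row, so some cells may sit outside any tr; in that case collecting the td elements directly from the fragment may be a better fit.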
I am using BeautifulSoup in a Django project to get all the data from HTML tables. I have code that strips the tables and saves the table data as a list of lists:
soup = bs.BeautifulSoup(html_source, 'lxml')
table = soup.find('table', {'id': 'detail'})
rows = table.findAll('tr')
data = [[td.findChildren(text=True) for td in tr.findAll(['th', 'td'])] for tr in rows]
data = [[u"".join(d).strip() for d in l] for l in data]
This code has worked well so far, but somehow it does not capture all the data in this HTML table. It gets only the thead rows, and I cannot figure out why.
<table class="table_type1" data-tdborder="" id="detail">
<colgroup>
<col width="38">
<col>
<col>
<col width="140">
</colgroup>
<thead>
<tr>
<th>No.</th>
<th>Status</th>
<th>Location</th>
<th>Event Date</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center;">1</td>
<td class="multi_row" style="line-height:15px;">Empty Container Release to Shipper</td>
<td class="multi_row" style="line-height:15px;">SHANGHAI, SHANGHAI ,CHINA<br> GREATING FORTUNE (SHANGHAI) CONTAIN</td>
<td class="ico_a">2017-10-09 10:51</td>
</tr>
<tr>
<td style="text-align:center;">2</td>
<td class="multi_row" style="line-height:15px;">Gate In to Outbound Terminal</td>
<td class="multi_row" style="line-height:15px;">SHANGHAI, SHANGHAI ,CHINA<br> SHANGHAI SHENDONG INTERNATIONAL CON (DXYS)</td>
<td class="ico_a">2017-10-10 04:43</td>
</tr>
<tr>
<td style="text-align:center;">3</td>
<td class="multi_row" style="line-height:15px;">Loaded on 'NYK LYNX 2610E' at Port of Loading<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">SHANGHAI, SHANGHAI ,CHINA<br> SHANGHAI SHENDONG INTERNATIONAL CON (DXYS)</td>
<td class="ico_a">2017-10-11 22:58</td>
</tr>
<tr>
<td style="text-align:center;">4</td>
<td class="multi_row" style="line-height:15px;">'NYK LYNX 2610E' Departure from Port of Loading<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">SHANGHAI, SHANGHAI ,CHINA<br> SHANGHAI SHENDONG INTERNATIONAL CON (DXYS)</td>
<td class="ico_a">2017-10-12 05:00</td>
</tr>
<tr>
<td style="text-align:center;">5</td>
<td class="multi_row" style="line-height:15px;">'NYK LYNX 2610E' Arrival at Port of Discharging<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> TERMINAL PACIFICO SUR</td>
<td class="ico_e">2017-11-14 21:00</td>
</tr>
<tr>
<td style="text-align:center;">6</td>
<td class="multi_row" style="line-height:15px;">'NYK LYNX 2610E' POD Berthing Destination<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> TERMINAL PACIFICO SUR</td>
<td class="ico_e">2017-11-14 22:00</td>
</tr>
<tr>
<td style="text-align:center;">7</td>
<td class="multi_row" style="line-height:15px;">Unloaded from 'NYK LYNX 2610E' at Port of Discharging<br> NYK LYNX 2610E</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> TERMINAL PACIFICO SUR</td>
<td class="ico_e">2017-11-14 23:30</td>
</tr>
<tr>
<td style="text-align:center;">8</td>
<td class="multi_row" style="line-height:15px;">Gate Out from Inbound Terminal for Delivery to Consignee (or Port Shuttle)</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> TERMINAL PACIFICO SUR</td>
<td class="ico_e">2017-11-15 04:00</td>
</tr>
<tr>
<td style="text-align:center;">9</td>
<td class="multi_row" style="line-height:15px;">Empty Container Returned from Customer</td>
<td class="multi_row" style="line-height:15px;">VALPARAISO ,CHILE<br> </td>
<td class="ico_e">2017-11-15 10:00</td>
</tr>
</tbody>
</table>
Edit
I printed the soup object and went through all the HTML, and surprisingly it contains only the thead of the table and not the tbody. Is this a bug in BeautifulSoup? This is the only part of the table that beautifulsoup4 captures:
<table class="table_type1" data-tdborder="" id="detail">
<colgroup>
<col width="38"/>
<col/>
<col/>
<col width="140"/>
</colgroup>
<thead>
<tr>
<th>No.</th>
<th>Status</th>
<th>Location</th>
<th>Event Date</th>
</tr>
</thead>
</table>
I need some TVs (weight, dimensions, etc.) that I've associated with my products to appear on the Cart page of my SimpleCart site.
Problem is I have no idea how to do this. I don't understand how the SimpleCart cart is built and there isn't documentation for this.
Would anyone know how I can show TVs associated with each product in the cart output chunk?
The cart snippet has the following code which gets data from the cart and puts it into Chunks:
$sc = $modx->getService('simplecart','SimpleCart',$modx->getOption('simplecart.core_path',null,$modx->getOption('core_path').'components/simplecart/').'model/simplecart/',$scriptProperties);
if (!($sc instanceof SimpleCart)) return '';
$controller = $sc->loadController('Cart');
$output = $controller->run($scriptProperties);
The output Chunk looks like:
<div id="simplecart">
<form action="[[~[[*id]]]]" method="post" id="form_cartoverview">
<input type="hidden" name="updatecart" value="true" />
<table>
<tr>
<th class="desc">[[%simplecart.cart.description]]</th>
<th class="price">[[%simplecart.cart.price]]</th>
<th class="quantity">[[%simplecart.cart.quantity]]</th>
[[+cart.total.vat_total:notempty=`<th class="quantity">[[%simplecart.cart.vat]]</th>`:isempty=``]]
<th class="subtotal">[[%simplecart.cart.subtotal]]</th>
<th> </th>
</tr>
[[+cart.wrapper]]
[[+cart.total.discount:notempty=`<tr class="total first discount">
<td colspan="[[+cart.total.vat_total:notempty=`3`:isempty=`2`]]"> </td>
<td class="label">[[%simplecart.cart.discount]]</td>
<td class="value">- [[+cart.total.discount_formatted]]</td>
<td class="extra">[[+cart.total.discount_percent:notempty=`([[+cart.total.discount_percent]]%)`:isempty=` `]]</td>
</tr>`:isempty=``]]
[[+cart.total.vat_total:notempty=`
<tr class="total [[+cart.total.discount:notempty=`second`:isempty=`first`]]">
<td colspan="3"> </td>
<td class="label">[[%simplecart.cart.total_ex_vat]]</td>
<td class="value">[[+cart.total.price_ex_vat_formatted]]</td>
<td class="extra"> </td>
</tr>
[[+cart.vat_rates]]
<tr class="total [[+cart.total.discount:notempty=`third`:isempty=`second`]]">
<td colspan="3"> </td>
<td class="label">[[%simplecart.cart.total_vat]]</td>
<td class="value">[[+cart.total.vat_total_formatted]]</td>
<td class="extra"> </td>
</tr>
<tr class="total [[+cart.total.discount:notempty=`fourth`:isempty=`third`]]">
<td colspan="3"> </td>
<td class="label">[[%simplecart.cart.total_in_vat]]</td>
<td class="value">[[+cart.total.price_formatted]]</td>
<td class="extra"> </td>
</tr>
`:isempty=`
<tr class="total [[+cart.total.discount:notempty=`second`:isempty=`first`]]">
<td colspan="2"> </td>
<td class="label">[[%simplecart.cart.total]]</td>
<td class="value">[[+cart.total.price_formatted]]</td>
<td class="extra"> </td>
</tr>
`]]
</table>
<div class="submit">
<input type="submit" value="[[%simplecart.cart.update]]" />
</div>
</form>
This does appear to be documented:
Product Options (TVs)
and to output them:
Modifying the Product Template
It appears that you would just output them normally: [[*myProductOptions]]
Since your template is using placeholders, though, I would try
[[+cart.myProductOptions]] as well. If all else fails, you might try debugging the SimpleCart class and dumping the array of product data before it populates the chunk; there might be a clue in there.
Found (through trial and error) you must use:
[[+product.tv.name_of_tv]]