Formatting HTML Tables for Excel


I have a page in ASP that generates an .xls table; however, when I open the table, all the rows are stuffed into the default column width, which I would like to set.
My table looks something like this:
<table>
<thead>
'A for loop makes a series of th
</thead>
'another loop pulls db values
<tr><td>value1</td><td>value2</td> 'etc </tr>
</table>
I have tried the following to set the width:
width="3.29in"
&nbsp; spam (barbaric but sometimes effective)
width="400px"
style="width:300px"
None of the above seems to work.
Additionally, here is my ASP header code in case it's relevant:
Response.Clear()
Response.Buffer = False
Response.ContentType = "application/vnd.ms-excel"
Response.AddHeader "Content-Disposition", "attachment; filename=blah.xls"
On a side note, when I print a dollar value like this:
<td>$<%=dbvalue%></td>
the cell shows '$ followed by the dollar value, and I am not sure how to get rid of the single quote.

Do you need the thead tag? Something like this should work:
<html>
<body>
<h1>Report Title</h1>
<table >
<tr>
<th style="width : 300px">header1</th>
<th style="width : 100px">header2</th>
<th style="width : 200px">header..</th>
<th style="width : 300px">....</th>
</tr>
<tr class="row1">
<td >value1</td>
<td >value2</td>
<td >value..</td>
<td >....</td>
</tr>
....
Optionally you can put the table row with th tags inside a <thead> tag
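The same layout can be generated programmatically. Below is a minimal sketch in Python (not ASP) that builds a table whose th cells carry an inline width style, as in the answer above. The function name build_table and its parameters are hypothetical, and whether Excel honors the widths is the answer's claim, not something this sketch verifies.

```python
from html import escape

def build_table(headers, widths_px, rows):
    """Return an HTML page with one table whose <th> cells carry an inline width style."""
    # One <th> per header, each with its own pixel width.
    head = "".join(
        f'<th style="width: {w}px">{escape(h)}</th>'
        for h, w in zip(headers, widths_px)
    )
    # One <tr> per data row; values are escaped so they cannot break the markup.
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<html><body><table><tr>{head}</tr>{body}</table></body></html>"

page = build_table(["header1", "header2"], [300, 100], [["value1", "value2"]])
print(page)
```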

Related

Colspan not working properly using Python Pandas

I have some data that I need to convert into an Excel sheet, which needs to look like this in the end:
I've tried the following code:
import pandas as pd
result = pd.read_html(
"""<table>
<tr>
<th colspan="2">Status N</th>
</tr>
<tr>
<td style="font-weight: bold;">Merchant</td>
<td>Count</td>
</tr>
<tr>
<td>John Doe</td>
<td>10</td>
</tr>
</table>"""
)
writer = pd.ExcelWriter('out/test_pd.xlsx', engine='xlsxwriter')
print(result[0])
result[0].to_excel(writer, sheet_name='Sheet1', index=False)
writer.save()
The issue here is that the colspan is not working properly. The output looks like this instead:
Can someone help me with how to use colspan in pandas?
It would be better if I didn't have to use read_html() and could do it directly in Python code, but if that's not possible, I can use read_html().

pandas cannot tell which rows are column titles and which are values unless you mark them explicitly. If you convert the HTML to the standard table structure, pandas can handle it correctly. Use thead and tbody to separate the header from the values, like this:
result = pd.read_html("""
<table>
<thead>
<tr>
<th colspan="2">Status N</th>
</tr>
<tr>
<td style="font-weight: bold;">Merchant</td>
<td>Count</td>
</tr>
</thead>
<tbody>
<tr>
<td>John Doe</td>
<td>10</td>
</tr>
</tbody>
</table>
"""
)
To write the DataFrame to an Excel file, you can use the pandas to_excel method.
result[0].to_excel("out.xlsx")
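Since the question also asks for a way that avoids read_html(), here is a minimal sketch that builds the same two-level header directly with a pandas MultiIndex. This is an alternative to the answer above, not part of it; the values are taken from the question's sample table.

```python
import pandas as pd

# Two header levels: the top level plays the role of the colspan="2" cell.
columns = pd.MultiIndex.from_tuples(
    [("Status N", "Merchant"), ("Status N", "Count")]
)

# One data row, matching the sample table in the question.
df = pd.DataFrame([["John Doe", 10]], columns=columns)
print(df)
```

Passing such a frame to to_excel should write "Status N" as a merged header cell above both columns. In the pandas versions I am aware of, writing MultiIndex columns with index=False is not implemented, so the index column may have to stay in the output.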

Create Pyspark Dataframe from Key Value Pair log file with variable contents

I have a log file in key=value pair format and would like to read the contents into an rdd, process the rdd into a data frame, and perform aggregations/analysis with spark SQL. I can read the raw data to rdd but I haven't been able to find an example of how to process key value pairs into a tabular format.
To complicate matters, the log can and does have missing key value pairs, so the format is variable. I would hope to be able to get around this by having NULL values in rows where that 'column'/key=value is missing once processed to data frame.
Below is an example of the log :
"Date"="2017-07-11T15:55:07-07:00","recordType"="ap_data","apName"="ap1","numClients"="5","version"="2.1"
"Date"="2017-07-11T15:55:07-07:00","recordType"="ap_data","apName"="ap2","numClients"="4","version"="2.1"
"Date"="2017-07-11T15:55:07-07:00","recordType"="ap_data","apName"="ap3","version"="2.1"
Notice the third event is missing the "numClients" key-value pair.
All I've managed to do so far is read the raw content to RDD:
# Initializing PySpark
from pyspark import SparkContext, SparkConf
from pyspark.sql.types import Row
sc = SparkContext.getOrCreate()
# Read raw contents to a new RDD and print first 2 results
raw_data = sc.textFile("log_sample.log")
raw_data.take(2)
Kindly please provide some help with reading key-value pair formatted data and processing to tabular format. Else, if this is not the right approach, I'm open to suggestion(s). Thank you!
Below is the data frame structure I hope to produce:
Date                       recordType  apName  numClients  version
2017-07-11T15:55:07-07:00  ap_data     ap1     5           2.1
2017-07-11T15:55:07-07:00  ap_data     ap2     4           2.1
2017-07-11T15:55:07-07:00  ap_data     ap3     NULL        2.1
EDIT: Apologies, for clarity I'm not trying to produce any HTML; I just wanted to show an example of the tabular result.
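One possible approach to the parsing step, sketched in plain Python so it can be tried without a cluster: extract the "key"="value" pairs with a regular expression and fill in None for any expected key that is missing. The names COLUMNS, PAIR and parse_line are hypothetical, and the pattern assumes values never contain a double quote.

```python
import re

# Column names taken from the sample log in the question.
COLUMNS = ["Date", "recordType", "apName", "numClients", "version"]

# Matches one "key"="value" pair; assumes values contain no double quotes.
PAIR = re.compile(r'"([^"]*)"="([^"]*)"')

def parse_line(line):
    """Turn one log line into a dict with every expected column; missing keys become None."""
    found = dict(PAIR.findall(line))
    return {name: found.get(name) for name in COLUMNS}

# Third sample event from the question, which has no numClients pair.
sample = '"Date"="2017-07-11T15:55:07-07:00","recordType"="ap_data","apName"="ap3","version"="2.1"'
row = parse_line(sample)
print(row["apName"], row["numClients"])
```

In Spark, a function like this could presumably be applied with raw_data.map(parse_line), and the resulting dicts turned into rows for createDataFrame; an explicit schema may be needed so that columns containing None are typed correctly.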

VBA Excel get text inside HTMLObject

I know this is really easy for some of you out there, but I have searched extensively online and cannot find an answer. I need to get the company name inside the a tag (tbody > tr > td > a), here eBay-Tradera.com, and the value in the last td class="bS aR" cell, here 970,80.
<tbody id="matrix1_group0">
<tr class="oR" onmouseover="onMouseOver(this, false)" onmouseout="onMouseOut(this, false)" onclick="onClick(this, false)">
<td class="bS"> </td>
<td>
<a href="aProgramInfoApplyRead.action?programId=175&affiliateId=2014848" title="http://www.tradera.com/" target="_blank">
eBay-Tradera.com
</a>
</td>
<td class="aR">
175</td>
<td class="bS aR">0</td><td class="bS aR">0</td><td class="bS aR">187</td>
<td class="aR">0,00%</td><td class="bS aR">124</td>
<td class="aR">0,00%</td>
<td class="bS aR">26</td>
<td class="aR">20,97%</td>
<td class="bS aR">32</td>
<td class="aR">60,80</td>
<td class="aR">25,81%</td>
<td class="bS aR">5 102,00</td>
<td class="bS aR">0,00</td>
<td class="aR">0,00</td>
<td class="bS aR">
970,80
</td>
</tr>
</tbody>
This is my code, where I only try to get the a tag to start with, but I can't get that to work either:
Set TDelements = document.getElementById("matrix1_group0").document.getElementsbytagname("a").innerHTML
r = 0
C = 0
For Each TDelement In TDelements
Blad1.Range("A1").Offset(r, C).Value = TDelement.innerText
r = r + 1
Next
Thanks in advance. I know this might be too simple, but I hope other people have the same issue and that this will be helpful for them as well. The reason for the "r = r + 1" is that there are many more companies on this list; I just wanted to keep the example as simple as I could. Thanks again!

You will need to specify the element's location in the table. eBay seems to be obfuscating the class names, so we cannot rely on those being consistent. I would not usually rely on table indexes being consistent either, but I don't see any way around this.
I am assuming that this is the HTML document you are searching
<tbody id="matrix1_group0">
<tr class="oR" onmouseover="onMouseOver(this, false)" onmouseout="onMouseOut(this, false)" onclick="onClick(this, false)">
<td class="bS"> </td>
<td>
<a href="aProgramInfoApplyRead.action?programId=175&affiliateId=2014848" title="http://www.tradera.com/" target="_blank">
eBay-Tradera.com <!-- <=== You want this? -->
</a>
</td>
<!-- ... -->
</tr>
<!-- ... -->
</tbody>
We can ignore the rest of the document, as the tbody element has an ID. In short, we assume that
.getElementById("matrix1_group0").getElementsByTagName("TR")
will return a collection of html row objects sorted by their appearance.
Set matrix = document.getElementById("matrix1_group0")
' Collections returned by getElementsByTagName are zero-based
Set firstRow = matrix.getElementsByTagName("TR")(0)
Set firstRowSecondCell = firstRow.getElementsByTagName("TD")(1)
traderaName = firstRowSecondCell.innerText
Of course you could inline this all as
document.getElementById("matrix1_group0").getElementsByTagName("TR")(0).getElementsByTagName("TD")(1).innerText
but that would make debugging harder. Also, if the web page is ever presented to you in a different format, this won't work. eBay is deliberately making it hard for you to scrape data off of it, for security.

With only the HTML you have shown you can use CSS selectors to obtain these:
a[href*='aProgramInfoApplyRead.action?programId']
This selects an a tag whose href attribute contains the string 'aProgramInfoApplyRead.action?programId'. It matches two elements, but the first is the one you want.
VBA:
You can use the .querySelector method of .document to retrieve the first match:
Debug.Print ie.document.querySelector("a[href*='aProgramInfoApplyRead.action?programId']").innerText
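To illustrate what the "attribute contains" selector matches, here is a small sketch in Python (not VBA) using only the standard library's html.parser. It returns the text of the first a tag whose href contains the given substring, which mirrors what querySelector does with a[href*='...']. The class name FirstLinkText is hypothetical, and the HTML is a shortened copy of the snippet from the question.

```python
from html.parser import HTMLParser

class FirstLinkText(HTMLParser):
    """Collect the text of the first <a> whose href contains a given substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.inside = False   # True while we are inside the matching <a>
        self.done = False     # True once the first match has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and not self.done and self.needle in href:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "a" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)

    @property
    def text(self):
        return "".join(self.parts).strip()

html = '''<tbody id="matrix1_group0"><tr><td class="bS"> </td><td>
<a href="aProgramInfoApplyRead.action?programId=175&affiliateId=2014848" title="http://www.tradera.com/" target="_blank">
eBay-Tradera.com
</a></td><td class="aR">175</td></tr></tbody>'''

parser = FirstLinkText("aProgramInfoApplyRead.action?programId")
parser.feed(html)
print(parser.text)
```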

JavaFX hide text of column in tableview

I have a tableview and I want to show an image in the first column. My problem is I can't sort the column then. My idea is to set text in the column too and hide the text so it is only for the correct sorting set. Is there a way to do that? Or what other solutions are possible for my problem?

I think this is a perfect example of what you want to do. Let me know if you still have any issues.
Check here

I would have a look at TableColumn.setCellValueFactory() and TableColumn.setCellFactory(). The former is used to provide the actual cell value (used for sorting!), the latter is used to provide the rendering.
In other words: If you need the sort order, you must not change the content, but only the Cell rendering. The methods mentioned above let you do exactly this.
Hope that helps ...

You could do it with just CSS using text-indent. You would also need to set the image as a CSS background. You did not provide any code for your table, but below is an example:
HTML:
<table width="100%" border="1" cellspacing="1" cellpadding="1">
<tr>
<td class="hidetext image">Text 1</td>
<td>Some text to show</td>
</tr>
<tr>
<td class="hidetext image">Text 2</td>
<td>Some text to show</td>
</tr>
<tr>
<td class="hidetext image">Text 3</td>
<td>Some text to show</td>
</tr>
<tr>
<td class="hidetext image">Text 4</td>
<td>Some text to show</td>
</tr>
</table>
CSS:
.hidetext {text-indent:-9000px}
.image {background:url(http://www.madisoncopy.com/images/jpeg.jpg) no-repeat;}
See how in the left column the text does not show (but it is actually there just indented off the screen).
See this fiddle: http://jsfiddle.net/D297P/

Watir: How to access a table without an ID or NAME

I am trying to write my Watir script to grab the following data (the table header cells and the table row data), but I am having trouble figuring out how to access the table. (Once I get that, the rest is a piece of cake.)
Can anyone come up with something that will help me access the table? It doesn't have a name or an ID...
<div id="income">
<table class="tHe" cellspacing="0">
<thead>
<tr>
<th id="companyLabel" class="tFirst" style="width:30%"> Quarter Ending </th>
<th id="201004" class="tFirst right">Apr 10 </th>
<th id="201001" class="tFirst right">Jan 10 </th>
<th id="200910" class="tFirst right">Oct 09 </th>
<th id="200907" class="tFirst right">Jul 09 </th>
<th id="200904" class="tFirst right">Apr 09 </th>
</tr>
</thead>
<tbody id="revenueBody">
<tr>
<td class="indtr">Totals</dfn></td>
<td class="right">2849.00</td>
<td class="right">3177.00</td>
<td class="right">5950.00</td>
<td class="right">4451.00</td>
<td class="right">3351.00</td>
</tr>
...

ie.table(:class=>'tHe') should work if there's no other tables with the same class name
ie.table(:after?, ie.div(:id, 'income')) should work if there's no other div with id 'income'
or ie.table(:index=>0) - you would need to check your page to see what the correct index value for your table is.

But wait, there is more! :)
browser.div(:id => "income").table(:class => 'tHe')
browser.div(:id => "income").table(:index => 1)
...

There is also XPath if you are stuck.
If you fire up the page and access it through Firebug or your browser's native developer tools, you can find the xpath expression for the table and then plug that into the Watir API call.
I think it was in later versions of Watir 1.5.x that support for advanced page querying came in (basically your problem, where there are no ID tags). This page on the watir wiki should help:
Ways Available To Identify HTML Element
