How to copy column headers in Excel - excel

In Excel, how can I copy, for each row, the headers of the columns that have a Yes in that row, and show them as comma-separated values at the end of the row? For example, as in the Answer column below:
<table width="652">
<tbody>
<tr>
<td width="64"> </td>
<td width="64">petrol</td>
<td width="64">diesel</td>
<td width="64">oil</td>
<td width="64">screw</td>
<td width="64">nut</td>
<td width="64">bolt</td>
<td width="64">engine</td>
<td width="140">Answer</td>
</tr>
<tr>
<td>car</td>
<td>Yes</td>
<td> </td>
<td>Yes</td>
<td>Yes</td>
<td> </td>
<td> </td>
<td>Yes</td>
<td>Petrol,oil,screw,engine</td>
</tr>
<tr>
<td>bus</td>
<td>Yes</td>
<td> </td>
<td>Yes</td>
<td> </td>
<td> </td>
<td> </td>
<td>Yes</td>
<td>Petrol,oil,engine</td>
</tr>
<tr>
<td>bike</td>
<td> </td>
<td>Yes</td>
<td>Yes</td>
<td> </td>
<td>Yes</td>
<td> </td>
<td> </td>
<td>Diesel,oil,nut</td>
</tr>
<tr>
<td>scooter</td>
<td> </td>
<td>Yes</td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td>Diesel</td>
</tr>
</tbody>
</table>

Related

How do I parse this html with Python lxml & xpath that finds the parent table of a specific span id?

Here is the HTML I don't have any control over. This is condensed HTML of the real page.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Little League</title>
</head>
<body>
<table>
<span>lot of unrelated text</span>
</table>
<table>
<span>lot of unrelated text</span>
</table>
<table>
<span>lot of unrelated text</span>
</table>
<table>
<tbody>
<tr>
<td class="rightTD">
<p>
<span id="teams_players">Player Teams</span>
</p>
</td>
</tr>
<tr>
<td>
<table border="1" cellspacing="0" cellpadding="0" class="tableBorder table table-bordered" width="100%">
<tbody>
<tr>
<td>
<table border="0" width="100%" class="tableData">
<tbody>
<tr id="team_listings">
<td colspan="3">Team Listings
<br>
<br>
</td>
</tr>
<tr>
<td>(a) </td>
<td colspan="2">Team Name </td>
</tr>
<tr>
<td></td>
<td colspan="2">
<span class="blue_color">Foxes</span>
</td>
</tr>
<tr>
<td>(b) </td>
<td colspan="2">Team Rank</td>
</tr>
<tr>
<td></td>
<td colspan="2">
<span class="blue_color">1</span>
</td>
</tr>
<tr>
<td>(c) </td>
<td colspan="2">Team Location
</td>
</tr>
<tr>
<td></td>
<td colspan="2">
<table width="100%">
<tbody>
<tr>
<td>City:
<br>
<span class="blue_color">Tualatin</span>
</td>
<td>State:
<br>
<span class="blue_colorLined"></span>
<br>
<span class="blue_color">Oregon</span>
</td>
<td>Country:
<br>
<span class="blue_colorLined"></span>
<br>
<span class="blue_color">United States</span>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<br>
<table border="1" cellspacing="0" cellpadding="0" class="tableBorder table table-bordered" width="100%">
<tbody>
<tr>
<td>
<table border="0" width="100%" class="tableData">
<tbody>
<tr>
<td>(a) </td>
<td colspan="2">Team Name </td>
</tr>
<tr>
<td></td>
<td colspan="2">
<span class="blue_color">Tigers</span>
</td>
</tr>
<tr>
<td>(b) </td>
<td colspan="2">Team Rank</td>
</tr>
<tr>
<td></td>
<td colspan="2">
<span class="blue_color">3</span>
</td>
</tr>
<tr>
<td>(c) </td>
<td colspan="2">Team Location
</td>
</tr>
<tr>
<td></td>
<td colspan="2">
<table width="100%">
<tbody>
<tr>
<td>City:
<br>
<span class="blue_color">Tigard</span>
</td>
<td>State:
<br>
<span class="blue_colorLined"></span>
<br>
<span class="blue_color">Oregon</span>
</td>
<td>Country:
<br>
<span class="blue_colorLined"></span>
<br>
<span class="blue_color">United States</span>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
I am trying to get to the nearest table tag enclosing the span tag with id teams_players.
I tried these but failed:
//table/span[@id="teams_players"]
ancestor::table[span[@id="teams_players"][position() = 1]]
This works, but it is not elegant and I prefer not to hardcode it:
//span[@id="teams_players"]/../../../../..
While //table[@class="tableData"] might seem like it should work, many tables in the HTML share that class and hold unrelated data, so it is ruled out.
Here is the code so far with my attempts (definitely not efficient; once I find a way of fetching both tables, I plan on looping through them to extract the data):
from io import StringIO

from lxml import etree


def parse_team():
    # team data structure
    teams = []
    team_dict = {'team': '', 'rank': '', 'location': {'city': '', 'state': '', 'country': ''}}
    filename = 'team.html'
    with open(filename, encoding="utf8") as fh:
        f = fh.read()
    parser = etree.HTMLParser()
    tree = etree.parse(StringIO(f), parser)
    # fetch the table dom and parse each team table
    # fetch the parent table that contains teams_players span id (failed attempts)
    team_tables = tree.xpath('ancestor::table[span[@id="teams_players"][position() = 1]]')
    print(team_tables)
    root_tables = tree.xpath('//table/span[@id="teams_players"]')
    print("root tables", root_tables)
    # this provides each team table but in full html, the same class is being used for other unrelated data
    name = tree.xpath('//table[@class="tableData"]')
    print(name)
    eachvaltr = name[0].xpath('.//tr')
    teamname = name[0].xpath('.//td[contains(text(),"Team Name")]//parent::tr/following-sibling::tr[1]//span[@class="blue_color"]/text()')
    print("teamname", teamname)
    teamrank = name[0].xpath(
        './/td[contains(text(),"Team Rank")]//parent::tr/following-sibling::tr[1]//span[@class="blue_color"]/text()')
    print("teamrank", teamrank)
    city = name[0].xpath(
        './/td[contains(text(),"City")]//span[@class="blue_color"]/text()')
    state = name[0].xpath(
        './/td[contains(text(),"State")]//span[@class="blue_color"]/text()')
    country = name[0].xpath(
        './/td[contains(text(),"Country")]//span[@class="blue_color"]/text()')
    print(city[0], state[0], country[0])
    team_dict['team'] = teamname
    team_dict['rank'] = teamrank
    team_dict['location']['city'] = city[0]
    team_dict['location']['state'] = state[0]
    team_dict['location']['country'] = country[0]
    print(team_dict)
Desired output is a list of teams where each team is a dict.
[{'team': ['Foxes'], 'rank': ['1'], 'location': {'city': 'Tualatin', 'state': 'Oregon', 'country': 'United States'}}]
//table[.//span[@id="teams_players"]]
or
//span[@id="teams_players"]/ancestor::table
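To show how the second expression could be combined with the extraction the question is after, here is a minimal sketch. It assumes a trimmed-down stand-in for the page (the `SAMPLE_HTML` string, the `VALUE_AFTER_LABEL` template and the `parse_teams` helper are all hypothetical names, not part of the original code) and only pulls the team name and rank. Adding `[1]` to the `ancestor::table` step keeps just the nearest enclosing table, and the per-team tables are then searched only inside it, so other `tableData` tables elsewhere on the page are ignored.

```python
from lxml import etree

# Hypothetical, trimmed-down stand-in for the page in the question.
SAMPLE_HTML = """
<html><body>
<table><tr><td><span>unrelated text</span></td></tr></table>
<table>
  <tr><td class="rightTD"><p><span id="teams_players">Player Teams</span></p></td></tr>
  <tr><td>
    <table class="tableData">
      <tr><td>(a)</td><td>Team Name</td></tr>
      <tr><td></td><td><span class="blue_color">Foxes</span></td></tr>
      <tr><td>(b)</td><td>Team Rank</td></tr>
      <tr><td></td><td><span class="blue_color">1</span></td></tr>
    </table>
    <table class="tableData">
      <tr><td>(a)</td><td>Team Name</td></tr>
      <tr><td></td><td><span class="blue_color">Tigers</span></td></tr>
      <tr><td>(b)</td><td>Team Rank</td></tr>
      <tr><td></td><td><span class="blue_color">3</span></td></tr>
    </table>
  </td></tr>
</table>
</body></html>
"""

# XPath template: find the label cell, then read the value from the next row.
VALUE_AFTER_LABEL = (
    './/td[contains(text(), "{label}")]/parent::tr'
    '/following-sibling::tr[1]//span[@class="blue_color"]/text()'
)


def parse_teams(html):
    root = etree.HTML(html)
    # ancestor is a reverse axis, so [1] selects the nearest enclosing table.
    outer = root.xpath('//span[@id="teams_players"]/ancestor::table[1]')[0]
    teams = []
    # Only look at tableData tables nested inside that outer table.
    for table in outer.xpath('.//table[@class="tableData"]'):
        name = table.xpath(VALUE_AFTER_LABEL.format(label="Team Name"))
        rank = table.xpath(VALUE_AFTER_LABEL.format(label="Team Rank"))
        teams.append({"team": name[0], "rank": rank[0]})
    return teams


print(parse_teams(SAMPLE_HTML))
```

The location fields could presumably be read the same way with the City, State and Country expressions from the question, evaluated against each `table` inside the loop.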

Need the full correct html code for this Table

Image table
Please write the full HTML code for this table. The table image can be seen via the link above.
<table border="1" width="800">
<tr>
<th>Level1</th>
<th>Level2</th>
<th>Level2</th>
<th>Info</th>
<th>Name</th>
</tr>
<tr>
<td rowspan="6">System</td>
</tr>
<tr>
<td rowspan="4">System Apps</td>
<td rowspan="2">System Memory</td>
</tr>
<tr>
<td rowspan="3">SystemEnv</td>
<td rowspan="1">SystemEnv2</td>
<td rowspan="2">Memeory Test</td>
</tr>
</table>
Here is the table code:
<table border="2">
<tr>
<th>Level 1</th>
<th>Level 2</th>
<th>Level 3</th>
<th>info</th>
<th>Name</th>
</tr>
<tr>
<td rowSpan="6">System</td>
<td rowSpan="4">System apps</td>
<td rowSpan="3">SystemEnv</td>
<td>App Text</td>
<td>foo</td>
</tr>
<tr>
<td>App memory</td>
<td>foo</td>
</tr>
<tr>
<td>App test</td>
<td>bar</td>
</tr>
<tr>
<td>Systemenv2</td>
<td>App test</td>
<td>bar</td>
</tr>
<tr>
<td rowSpan="2">System Memory</td>
<td rowSpan="2">Memory test</td>
<td>memory func</td>
<td>foo</td>
</tr>
<tr>
<td>Memory Func</td>
<td>foo</td>
</tr>
</table>

beautifulsoup for looping and getting text and Href

I'm in a bit of a pinch here.
It's a really messy ASP site that I am trying to get data from.
I'm trying to use a for loop to get the href and the text of all the rows of the 4th table on the site, so I first did:
table = soup.findAll('table')[3]
Then from this table I need to get all the text inside the <tr> tags and the hrefs of the <a> tags inside them.
I tried something like this:
for product in table.findAll('tbody'):
    product_title = product.find('tr').text
    product_link = product.find('a')['href']
    print(product_title, product_link)
But I get nothing in return.
The table I'm working on:
<tr bgcolor="#EFEFEF">
<td>
<a href="free.asp?detail=hide&c_id=4342141">
<img align="absmiddle" border="0" hspace="0" src="pic/bullet.gif" vspace="0"/>
</a>
</td>
<td>
4342141
</td>
<td width="10">
</td>
<td>
25.07.2018 09:00
</td>
<td width="10">
</td>
<td>
Ankara
</td>
<td width="10">
-
</td>
<td>
Konya
</td>
<td colspan="2">
</td>
</tr>
<tr bgcolor="#EFEFEF" height="3">
<td colspan="10">
</td>
</tr>
<tr bgcolor="#FFFFFF" height="1">
<td colspan="10">
</td>
</tr>
<tr bgcolor="#DDDDDD" height="6">
<td colspan="10">
</td>
</tr>
<tr bgcolor="#FFFFFF" height="1">
<td colspan="10">
</td>
</tr>
<tr bgcolor="#DEE3E7" height="3">
<td colspan="10">
</td>
</tr>
<tr bgcolor="#DEE3E7">
<td>
<a href="free.asp?detail=hide&c_id=4134123">
<img align="absmiddle" border="0" hspace="0" src="pic/bullet.gif" vspace="0"/>
</a>
</td>
<td>
4134123
</td>
<td width="10">
</td>
<td>
26.07.2018 09:00
</td>
<td width="10">
</td>
<td>
Van
</td>
<td width="10">
-
</td>
<td>
Istanbul
</td>
<td colspan="2">
</td>
</tr>
Instead of extracting text from the tbody of the table, you can get all the tr tags directly.
Based on your snippet, you can use this code to extract the data from the table:
from bs4 import BeautifulSoup

# `text` holds the HTML source of the table shown above
soup = BeautifulSoup(text, 'html.parser')
all_products = []
for tr in soup.find_all('tr'):
    row_text = tr.get_text(separator=' ', strip=True)
    # skip the empty spacer rows
    if row_text:
        a_tag = tr.find('a')
        if a_tag:
            product_link = a_tag.attrs['href']
            all_text = row_text + ' ' + product_link
            all_products.append(all_text.split(' '))
print(all_products)
Output is:
[['4342141', '25.07.2018', '09:00', 'Ankara', '-', 'Konya', 'free.asp?detail=hide&c_id=4342141'], ['4134123', '26.07.2018', '09:00', 'Van', '-', 'Istanbul', 'free.asp?detail=hide&c_id=4134123']]
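A likely reason the original loop returned nothing, assuming the raw markup has no `<tbody>` tags (the snippet in the question shows none): browsers insert `<tbody>` when rendering, but `html.parser` does not, so `findAll('tbody')` finds nothing to iterate over. The sketch below illustrates this on a shortened, hypothetical copy of the table (`SAMPLE_HTML` and `extract_rows` are made-up names) and keeps the row text and the link as separate fields instead of one split list.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened copy of the table from the question.
SAMPLE_HTML = """
<table>
<tr bgcolor="#EFEFEF">
<td><a href="free.asp?detail=hide&c_id=4342141"><img src="pic/bullet.gif"/></a></td>
<td>4342141</td><td>25.07.2018 09:00</td><td>Ankara</td><td>-</td><td>Konya</td>
</tr>
<tr bgcolor="#EFEFEF" height="3"><td colspan="10"></td></tr>
<tr bgcolor="#DEE3E7">
<td><a href="free.asp?detail=hide&c_id=4134123"><img src="pic/bullet.gif"/></a></td>
<td>4134123</td><td>26.07.2018 09:00</td><td>Van</td><td>-</td><td>Istanbul</td>
</tr>
</table>
"""


def extract_rows(table):
    """Return one dict per row that contains a link; spacer rows are skipped."""
    rows = []
    for tr in table.find_all('tr'):
        link = tr.find('a', href=True)
        if link is None:
            continue
        rows.append({
            'text': tr.get_text(' ', strip=True),
            'href': link['href'],
        })
    return rows


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
table = soup.find_all('table')[0]  # on the real page this would be index 3

# html.parser does not add <tbody>, so this count is 0 for the sample.
tbody_count = len(table.find_all('tbody'))
rows = extract_rows(table)
print(tbody_count, rows)
```

Scoping the search to the selected `table` rather than the whole `soup` keeps rows from the other tables on the page out of the result.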

Search for a value and move it to another spreadsheet

I have a small problem using Excel 2010. The problem is simple, but I think I started in the wrong direction. Let's say I have a document with the following structure:
<table>
<thead>
<th> Col A</th>
<th> Col B</th>
<th> Col C</th>
</thead>
<tr>
<td> NONE</td>
<td> TEST</td>
<td> NONE</td>
<td> TEST2</td>
<td> TEST3</td>
<td> TEST4</td>
</tr>
<tr>
<td> NONE</td>
<td> TEST</td>
<td> NONE</td>
<td> TEST2</td>
<td> TEST3</td>
<td> TEST4</td>
</tr>
<tr>
<td> TEST</td>
<td> TEST</td>
<td> NONE</td>
<td> TEST2</td>
<td> TEST3</td>
<td> TEST4</td>
</tr>
<tbody>
</tbody>
</table>
I want to search for NONE and move it to another spreadsheet. I want to do this with a formula. I tried SEARCH, but I don't know how to move the value to another spreadsheet.

Scala find location of string in a string

I have this string:
var htmlString;
Assigned to:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
<html>
<head>
<title>Payment Receipt</title>
<link rel="stylesheet" type="text/css" href="content/PaymentForm.css">
<style type="text/css">
</style>
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>
</head>
<body>
<div id="divPageOuter" class="PageOuter">
<div id="divPage" class="Page">
<!--[1]-->
<div id="divThankYou">
Thank you for your order!
</div>
<hr class="HrTop">
<div id="divReceiptMsg">
You may print this receipt page for your records.
</div>
<div class="SectionBar">
Order Information
</div>
<table id="tablePaymentDetails1Rcpt">
<tr>
<td class="LabelColInfo1R">
Merchant:
</td>
<td class="DataColInfo1R">
<!--Merchant.val-->
Ryan
<!--end-->
</td>
</tr>
<tr>
<td class="LabelColInfo1R">
Description:
</td>
<td class="DataColInfo1R">
<!--x_description.val-->
Rasmussenpayment
<!--end-->
</td>
</tr>
</table>
<table id="tablePaymentDetails2Rcpt" cellspacing="0" cellpadding="0">
<tr>
<td id="tdPaymentDetails2Rcpt1">
<table>
<tr>
<td class="LabelColInfo1R">
Date/Time:
</td>
<td class="DataColInfo1R">
<!--Date/Time.val-->
09-Jul-2012 12:26:46 PM PT
<!--end-->
</td>
</tr>
<tr>
<td class="LabelColInfo1R">
Customer ID:
</td>
<td class="DataColInfo1R">
<!--x_cust_id.val-->
<!--end-->
</td>
</tr>
</table>
</td>
<td id="tdPaymentDetails2Rcpt2">
<table>
<tr>
<td class="LabelColInfo1R">
Invoice Number:
</td>
<td class="DataColInfo1R">
<!--x_invoice_num.val-->
176966244
<!--end-->
</td>
</tr>
</table>
</td>
</tr>
</table>
<hr id="hrBillingShippingBefore">
<table id="tableBillingShipping">
<tr>
<td id="tdBillingInformation">
<div class="Label">
Billing Information
</div>
<div id="divBillingInformation">
Test14 Rasmussen<br>
1234 test st<br>
San Diego, CA 92107 <br>
</div>
</td>
<td id="tdShippingInformation">
<div class="Label">
Shipping Information
</div>
<div id="divShippingInformation">
</div>
</td>
</tr>
</table>
<hr id="hrBillingShippingAfter">
<div id="divOrderDetailsBottomR">
<table id="tableOrderDetailsBottom">
<tr>
<td class="LabelColTotal">
Total:
</td>
<td class="DescrColTotal">
</td>
<td class="DataColTotal">
<!--x_amount.val-->
US $250.00
<!--end-->
</td>
</tr>
</table>
<!-- tableOrderDetailsBottom -->
</div>
<div id="divOrderDetailsBottomSpacerR">
</div>
<div class="SectionBar">
Visa ****0027
</div>
<table class="PaymentSectionTable" cellspacing="0" cellpadding="0">
<tr>
<td class="PaymentSection1">
<table>
<tr>
<td class="LabelColInfo2R">
Date/Time:
</td>
<td class="DataColInfo2R">
<!--Date/Time.1.val-->
09-Jul-2012 12:26:46 PM PT
<!--end-->
</td>
</tr>
<tr>
<td class="LabelColInfo2R">
Transaction ID:
</td>
<td class="DataColInfo2R">
<!--Transaction ID.1.val-->
2173493354
<!--end-->
</td>
</tr>
<tr>
<td class="LabelColInfo2R">
Authorization Code:
</td>
<td class="DataColInfo2R">
<!--x_auth_code.1.val-->
07I3DH
<!--end-->
</td>
</tr>
<tr>
<td class="LabelColInfo2R">
Payment Method:
</td>
<td class="DataColInfo2R">
<!--x_method.1.val-->
Visa ****0027
<!--end-->
</td>
</tr>
</table>
</td>
<td class="PaymentSection2">
<table>
</table>
</td>
</tr>
</table>
<div class="PaymentSectionSpacer">
</div>
</div>
<!-- entire BODY -->
</div>
<div class="PageAfter">
</div>
</body>
</html>
And I want to find the location of "x_auth_code.1.val" in the string. And then I want to obtain a string from the location plus a certain number of characters. The goal would be to return the Authorization code.
You can use indexOfSlice and then slice() from StringOps:
scala> val myString = "Hello World!"
myString: java.lang.String = Hello World!
scala> val index = myString.indexOfSlice("Wo")
index: Int = 6
scala> val slice = myString.slice(index, index+5)
slice: String = World
With your html string:
scala> htmlString.indexOfSlice("x_auth_code.1.val")
res4: Int = 2771
Why aren't you using an XML parser? Don't treat XML as strings -- you'll get bitten if you do.
Here's a regex to do it, but my advice is: DO NOT USE IT! Use xml tools. The captured group includes the whitespace around the value, hence the trim.
"""\Qx_auth_code.1.val\E[^>]*>([^<]*)""".r.findFirstMatchIn(htmlString).map(_.group(1).trim)
