I've got a document from a client which has a GIANT table in it which looks something like this:
<table id="someid">
<tr>
<td>Product</td>
<td class="product1">Product 1</td>
<td class="product2">Product 2</td>
<td class="product3">Product 3</td>
<td class="product4">Product 4</td>
<td class="product5">Product 5</td>
</tr>
<tr>
<td>Boiling Point</td>
<td>72</td>
<td>91</td>
<td>38</td>
<td>21</td>
<td>41</td>
</tr>
[ 45 more rows here]
</table>
Only there are actually 15 products, and instead of "product1" and "product2" the header cells have the actual product names as their preexisting classes.
The client has asked me to add a class to each of the appropriate td elements so that every cell is matched up with its product, e.g. class="product1" added to the 2nd td of every row.
Everything is static... I'm wondering if there's a quick way to do this in vim? Is it possible to tell vim to add a string to a certain position on every 18th line? Or am I stuck manually adding all the classes?
If the relation between your class name and the line number can be described by an expression, you can use :[range]substitute with a sub-replace-expression (s/\=) to replace each match with the result of any expression. For example,
1,10s/<td/\=submatch(0) . ' class="product' . (line('.') % 5 - 1) . '"'
will change
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
to
<tr>
<td class="product1"></td>
<td class="product2"></td>
<td class="product3"></td>
</tr>
<tr>
<td class="product1"></td>
<td class="product2"></td>
<td class="product3"></td>
</tr>
You can adapt those arguments in the above command to suit your needs.
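The arithmetic behind line('.') % 5 - 1 is easier to check outside Vim. Below is a small Python sketch (not part of the original answer; the function name and parameters are hypothetical) that maps a line number to a product number. It assumes every table row spans exactly row_len lines: the <tr> line, one line per cell, and the </tr> line. For the question's table that would be 18 lines (one label cell plus 15 products).

```python
from typing import Optional


def product_number(line: int, first_tr_line: int, row_len: int = 18,
                   label_cells: int = 1) -> Optional[int]:
    """Map a line number to a 1-based product number, or None.

    Hypothetical helper. Assumes each table row spans exactly `row_len`
    lines (<tr>, one <td> per line, </tr>) and that the first <tr> is on
    `first_tr_line`.
    """
    offset = (line - first_tr_line) % row_len  # 0 = the <tr> line
    n = offset - label_cells                   # 1 = first product cell
    products_per_row = row_len - 2 - label_cells
    if 1 <= n <= products_per_row:
        return n
    return None  # <tr>, </tr> or a label cell


# The answer's 10-line example: 5 lines per row, no label column.
print([product_number(l, first_tr_line=1, row_len=5, label_cells=0)
       for l in range(1, 11)])
# [None, 1, 2, 3, None, None, 1, 2, 3, None]

# The question's layout: first <tr> on line 2, 18 lines per row.
print(product_number(4, first_tr_line=2), product_number(18, first_tr_line=2))
# 1 15
```

On the answer's sample this gives product1 to product3 on lines 2-4 and 7-9, the same values line('.') % 5 - 1 produces there; the same offset arithmetic could be written inside the \= expression once the row length and starting line are known.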
For more complex replacements, you can define a helper function in a temporary Vim script such as foo.vim:
function! GetClassName()
let order = line('.') % 5
if order == 1
return 'a'
endif
if order == 2
return 'b'
endif
endfunction
Then, with foo.vim open as the current buffer, source it with :source %.
Next, switch to your file and use the function as follows:
1,10s/^\s*<td/\=submatch(0) . ' class="' . GetClassName() . '"'
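In the question the class really depends on the column rather than the line number: each product cell should get the class of the header cell above it. If running a script is acceptable, that can be simpler than a Vim expression. This is a swapped-in technique, not the answer's: a Python sketch that copies each header cell's class down its column. It assumes one <td> per line, that the first <tr> is the header row carrying the product classes, and that data cells have no class yet; the function name is hypothetical.

```python
import re

TD_OPEN = re.compile(r"<td(\s[^>]*)?>")
CLASS_ATTR = re.compile(r'class="([^"]*)"')


def copy_header_classes(html: str) -> str:
    """Give each data cell the class of the header cell in its column.

    Hypothetical helper. Assumes one <td> per line, that the first <tr> is
    the header row, and that data cells do not have a class attribute yet.
    """
    out = []
    header = None  # header class per column (None = header cell has no class)
    seen = []      # classes collected while reading the header row
    col = 0
    for line in html.splitlines():
        s = line.strip()
        if s.startswith("<tr"):
            col = 0
        elif s.startswith("</tr"):
            if header is None:
                header = seen  # header row finished
        elif TD_OPEN.match(s):
            if header is None:
                m = CLASS_ATTR.search(s)
                seen.append(m.group(1) if m else None)
            elif col < len(header) and header[col] and "class=" not in s:
                # Insert the class right after the first "<td" on the line.
                line = line.replace("<td", f'<td class="{header[col]}"', 1)
            col += 1
        out.append(line)
    return "\n".join(out)


SAMPLE = """<table id="someid">
<tr>
<td>Product</td>
<td class="product1">Product 1</td>
<td class="product2">Product 2</td>
</tr>
<tr>
<td>Boiling Point</td>
<td>72</td>
<td>91</td>
</tr>
</table>"""

print(copy_header_classes(SAMPLE))
```

The label column keeps no class because its header cell has none, so the real product names from the header row end up on every cell in their column.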
<html>
<body>
<table border=1>
<tr>
<th>label</th>
<th>rev</th>
</tr>
<tr>
<td>0</td>
<td>[ story man unnatural feelings pig...] </td>
</tr>
<tr>
<td>0</td>
<td>[ airport starts brand new luxury ...] </td></tr>
<tr>
<td>0</td>
<td>[ film lacked something couldnt pu...] </td></tr>
<tr>
<td>0</td>
<td>[ sorry everyone know supposed art...] </td></tr>
<tr>
<td>0</td>
<td>[ little parents took along theate..]</td></tr>
</table>
</body>
</html>
My dataframe looks like the table above. I tried the code below to stem it:
from nltk.stem.porter import PorterStemmer
ps=PorterStemmer()
da.rev=[ps.stem(word) for word in da.loc[:,'rev']]
but it returns the same dataframe again, and I can't work out what went wrong.
Any help will be dearly appreciated. Thank you for your time.
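Hard to say without seeing your exact code, but your list comprehension iterates over whole cells, so ps.stem receives each review as a single "word" and has very little to change. If each item in the series is a list of strings, you could try this (apply returns a new Series, so assign the result back):
da['rev'] = da['rev'].apply(lambda x: [ps.stem(word) for word in x])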
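If each cell is instead a plain string (the displayed table suggests so, but that is an assumption), split it into words before stemming and join the results back up. A minimal sketch using only the standard library; the stemmer is passed in as a function so the logic can be checked without NLTK, and the function names are hypothetical:

```python
from typing import Callable, Iterable, List


def stem_review(text: str, stem: Callable[[str], str]) -> str:
    """Stem every whitespace-separated word of one review and rejoin them."""
    return " ".join(stem(word) for word in text.split())


def stem_column(values: Iterable[str], stem: Callable[[str], str]) -> List[str]:
    """Apply stem_review to every cell of a column."""
    return [stem_review(value, stem) for value in values]


def toy_stem(word: str) -> str:
    """Toy stand-in stemmer (drops a trailing "s") so this runs without NLTK."""
    return word[:-1] if word.endswith("s") else word


print(stem_column(["story man unnatural feelings pig", "film lacked something"], toy_stem))
```

With NLTK and pandas installed, the same helper would be used as da['rev'] = da['rev'].apply(lambda x: stem_review(x, ps.stem)), with ps = PorterStemmer() as in the question.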
I have data in a table as follows:
<table>
<tr>
<th>Priority</th>
<th>State</th>
</tr>
<tr>
<td>High</td>
<td>Work - QA</td>
</tr>
<tr>
<td>High</td>
<td>In progress</td>
</tr>
<tr>
<td>Low</td>
<td>Investigating</td>
</tr>
<tr>
<td>High</td>
<td>Ready for Deployment - QA</td>
</tr>
<tr>
<td>Critical</td>
<td>Investigating</td>
</tr>
<tr>
<td>Critical</td>
<td>Work - QA</td>
</tr>
<tr>
<td>Critical</td>
<td>Work - Dev</td>
</tr>
</table>
I want to generate a report like this:
Summary Report for DEV
------------------------
Critical 1
High 2
Low 1
Defects in the following states belong to DEV: "Work - Dev", "Investigating" and "In progress". So when I create the summary report, I have to count all the defects in any of those states.
I tried something like this, but it didn't work:
=COUNTIFS(Table1[[#All],[Priority]],"High",Table1[[#All],[State]],{"Investigation", "In progress", "Work - Dev"})
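So close. COUNTIFS with an array constant as a criterion returns one count per state (for "High" and the sample rows, {0,1,0}), so you need to SUM it. Note also that the criteria must match the cell text exactly: the state is "Investigating", not "Investigation":
=SUM(COUNTIFS(Table1[[#All],[Priority]],"High",Table1[[#All],[State]],{"Investigating","In progress","Work - Dev"}))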
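To check the counting logic outside Excel, here is a Python sketch (an illustration of what the formula computes, not an Excel feature; the function names are hypothetical) run on the sample rows from the question. countifs_array yields one count per state like COUNTIFS with an array constant, dev_summary does the SUM step for every priority, and the first print shows that a misspelled criterion such as "Investigation" silently counts zero.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (Priority, State) rows copied from the sample table in the question.
ROWS = [
    ("High", "Work - QA"),
    ("High", "In progress"),
    ("Low", "Investigating"),
    ("High", "Ready for Deployment - QA"),
    ("Critical", "Investigating"),
    ("Critical", "Work - QA"),
    ("Critical", "Work - Dev"),
]
DEV_STATES = ["Work - Dev", "Investigating", "In progress"]


def countifs_array(rows: Iterable[Tuple[str, str]], priority: str,
                   states: List[str]) -> List[int]:
    """One count per state, like COUNTIFS with an array constant."""
    rows = list(rows)
    return [sum(1 for p, s in rows if p == priority and s == state)
            for state in states]


def dev_summary(rows: Iterable[Tuple[str, str]]) -> Counter:
    """Defects per priority whose state belongs to DEV (the SUM step)."""
    return Counter(p for p, s in rows if s in DEV_STATES)


print(countifs_array(ROWS, "Critical", ["Investigation", "In progress", "Work - Dev"]))
print(countifs_array(ROWS, "Critical", DEV_STATES))
print("Summary Report for DEV")
for priority, count in sorted(dev_summary(ROWS).items()):
    print(priority, count)
```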
I know this is really easy for some of you out there, but I have been searching the internet thoroughly and cannot find an answer. I need to get the company name inside tbody > tr > td > a (eBay-Tradera.com) and the value of the last td class="bS aR" in the same row (970,80):
<tbody id="matrix1_group0">
<tr class="oR" onmouseover="onMouseOver(this, false)" onmouseout="onMouseOut(this, false)" onclick="onClick(this, false)">
<td class="bS"> </td>
<td>
<a href="aProgramInfoApplyRead.action?programId=175&affiliateId=2014848" title="http://www.tradera.com/" target="_blank">
eBay-Tradera.com
</a>
</td>
<td class="aR">
175</td>
<td class="bS aR">0</td><td class="bS aR">0</td><td class="bS aR">187</td>
<td class="aR">0,00%</td><td class="bS aR">124</td>
<td class="aR">0,00%</td>
<td class="bS aR">26</td>
<td class="aR">20,97%</td>
<td class="bS aR">32</td>
<td class="aR">60,80</td>
<td class="aR">25,81%</td>
<td class="bS aR">5 102,00</td>
<td class="bS aR">0,00</td>
<td class="aR">0,00</td>
<td class="bS aR">
970,80
</td>
</tr>
</tbody>
This is my code, where I only try to get the a tag to start off with, but I can't get that to work either:
Set TDelements = document.getElementById("matrix1_group0").document.getElementsbytagname("a").innerHTML
r = 0
C = 0
For Each TDelement In TDelements
Blad1.Range("A1").Offset(r, C).Value = TDelement.innerText
r = r + 1
Next
Thanks in advance. I know this might be too simple, but I hope other people have the same issue and will find this helpful as well. The reason for the "r = r + 1" is that there are many more companies in this list; I just wanted to keep the example as simple as I could. Thanks again!
You will need to specify the element's location in the table. Ebay seems to be obfuscating the class names, so we cannot rely on those being consistent. Nor would I usually rely on the elements' positions in the table staying the same, but I don't see any way around this.
I am assuming that this is the HTML document you are searching:
<tbody id="matrix1_group0">
<tr class="oR" onmouseover="onMouseOver(this, false)" onmouseout="onMouseOut(this, false)" onclick="onClick(this, false)">
<td class="bS"> </td>
<td>
<a href="aProgramInfoApplyRead.action?programId=175&affiliateId=2014848" title="http://www.tradera.com/" target="_blank">
eBay-Tradera.com <!-- <=== You want this? -->
</a>
</td>
<!-- ... -->
</tr>
<!-- ... -->
</tbody>
We can ignore the rest of the document because the tbody element has an ID. In short, we assume that
.getElementById("matrix1_group0").getElementsByTagName("TR")
will return a collection of HTML row objects in document order. These collections are zero-based, so the first row is item 0 and its second cell is item 1.
Set matrix = document.getElementById("matrix1_group0")
Set firstRow = matrix.getElementsByTagName("TR")(0)             ' first row (zero-based index)
Set firstRowSecondCell = firstRow.getElementsByTagName("TD")(1)  ' second cell, holds the link
traderaName = firstRowSecondCell.innerText
Of course you could inline this all as
document.getElementById("matrix1_group0").getElementsByTagName("TR")(0).getElementsByTagName("TD")(1).innerText
but that would make debugging harder. Also, if the web page is ever presented to you in a different layout, this won't work. Ebay may well be deliberately making it hard to scrape data off it.
With only the HTML you have shown you can use CSS selectors to obtain these:
a[href*='aProgramInfoApplyRead.action?programId']
This selects an a tag whose href attribute contains the string 'aProgramInfoApplyRead.action?programId'. It matches two elements, but the first is the one you want.
In VBA, you can use the .querySelector method of .document to retrieve the first match:
Debug.Print ie.document.querySelector("a[href*='aProgramInfoApplyRead.action?programId']").innerText
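If scraping from Python rather than VBA is an option, the same "company name plus last value" extraction can be sketched with the standard library's html.parser. This is a swapped-in technique, not what either answer uses. It assumes the row layout shown in the question and still depends on the "bS aR" class names, which the first answer warns may not be stable; the class name MatrixRowParser is hypothetical. It collects one pair per row, which suits the "many more companies" case.

```python
from html.parser import HTMLParser


class MatrixRowParser(HTMLParser):
    """Collect (company name, last "bS aR" value) for each row of the tbody.

    Sketch only: assumes the company name is the text of the row's <a> and
    the wanted number is the text of the last <td class="bS aR"> in the row.
    """

    def __init__(self, tbody_id: str = "matrix1_group0"):
        super().__init__()
        self.tbody_id = tbody_id
        self.rows = []
        self._in_tbody = False
        self._capture = None  # "a" or "td" while collecting text
        self._buf = ""
        self._name = ""
        self._value = ""

    @staticmethod
    def _is_value_cell(tag, attrs):
        return tag == "td" and set((attrs.get("class") or "").split()) == {"bS", "aR"}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tbody" and attrs.get("id") == self.tbody_id:
            self._in_tbody = True
        elif not self._in_tbody:
            return
        elif tag == "tr":
            self._name, self._value = "", ""
        elif tag == "a" or self._is_value_cell(tag, attrs):
            self._capture, self._buf = tag, ""

    def handle_endtag(self, tag):
        if not self._in_tbody:
            return
        if tag == self._capture:
            if tag == "a":
                self._name = self._buf.strip()
            else:
                self._value = self._buf.strip()  # later cells overwrite earlier ones
            self._capture = None
        elif tag == "tr":
            self.rows.append((self._name, self._value))
        elif tag == "tbody":
            self._in_tbody = False

    def handle_data(self, data):
        if self._capture:
            self._buf += data


SAMPLE = """<tbody id="matrix1_group0">
<tr class="oR">
<td class="bS"> </td>
<td>
<a href="aProgramInfoApplyRead.action?programId=175&affiliateId=2014848" title="http://www.tradera.com/" target="_blank">
eBay-Tradera.com
</a>
</td>
<td class="aR">
175</td>
<td class="bS aR">5 102,00</td>
<td class="aR">0,00</td>
<td class="bS aR">
970,80
</td>
</tr>
</tbody>"""

parser = MatrixRowParser()
parser.feed(SAMPLE)
parser.close()
print(parser.rows)
```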
I need to store two text values and use them as numbers for subtraction:
<span id="user-account-balance">593 455,07</span> $
<span id="user-account-balance-points">12454</span> P
I need to subtract both values, but it doesn't work for me:
<tr>
<td>storeText</td>
<td>//a/span[@id='user-account-balance']</td>
<td>a</td>
</tr>
<tr>
<td>storeEval</td>
<td>storedVars['a'].match(/^\d+/);</td>
<td>one</td>
</tr>
<tr>
<td>storeText</td>
<td>//span[@id='user-account-balance-points']</td>
<td>c</td>
</tr>
<tr>
<td>storeEval</td>
<td>storedVars['c'].match(/^\d+/);</td>
<td>two</td>
</tr>
<tr>
<td>store</td>
<td>javascript{storedVars['one']+storedVars['two']}</td>
<td>r</td>
</tr>
<tr>
<td>echo</td>
<td>${r}</td>
<td></td>
</tr>
The result is [info] echo: 59312454. So there are two problems: the first number is cut off at the space, and it doesn't subtract anyway.
So first, your regex
/^\d+/
will only capture the uninterrupted run of digits at the very beginning of the string. You will need to modify the first .match() regex to handle numbers written with spaces as thousands separators and commas as decimal separators. With the g flag, match() returns an array of all matching substrings, so you could probably do something like
storedVars['a'].match(/\d+(,\d\d)?/g).join('').replace(',', '.');
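to capture the currency string and store it in a form that converts cleanly to a number. Then your last expression has to actually subtract: the stored values are not numbers, so + simply concatenates them, which is why you got 59312454. Something like:
<tr>
<td>storeEval</td>
<td>parseFloat(storedVars['one']) - parseInt(storedVars['two'], 10)</td>
<td>r</td>
</tr>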
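The same parsing steps can be checked in Python. A sketch (the function name is hypothetical) that applies the answer's regex idea to the balance from the question and then actually subtracts the points:

```python
import re


def parse_spaced_decimal(text: str) -> float:
    """Parse numbers like "593 455,07" (space thousands, comma decimals).

    Find the digit groups, join them, then turn the decimal comma into a
    dot. The (?:...) group is non-capturing because re.findall would
    otherwise return only the captured group instead of the whole match.
    """
    groups = re.findall(r"\d+(?:,\d\d)?", text)
    return float("".join(groups).replace(",", "."))


balance = parse_spaced_decimal("593 455,07")
points = int("12454")
print(round(balance - points, 2))  # 581001.07
```

In JavaScript, match() with the g flag returns whole matches even when the pattern has a capturing group, so the original /\d+(,\d\d)?/g works there as written.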
This worked for me when the total was stored as $xxx,xxx.xx:
<tr>
<td>storeText</td>
<td>xpath=.//*[@id='table-header']/tbody/tr[1]/td[5]/b</td>
<td>Total1</td>
</tr>
<tr>
<td>storeEval</td>
<td>storedVars['Total1'].replace(/,/g, '').replace('$', '');</td>
<td>Amount1</td>
</tr>
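For the $xxx,xxx.xx format, the detail that matters once totals reach millions is removing every comma, not only the first. A Python sketch of that cleanup (the function name is hypothetical):

```python
import re


def parse_dollar_amount(text: str) -> float:
    """Parse an amount like "$1,234,567.89" by removing every "$" and ","."""
    # re.sub removes all occurrences; JavaScript's String.replace with a
    # plain string pattern only removes the first one.
    return float(re.sub(r"[$,]", "", text))


print(parse_dollar_amount("$1,234,567.89"))  # 1234567.89
```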
I have an ASP page that generates an XLS table. However, when I open the file, all the cells are squeezed into the default column width, and I would like to set the widths myself.
My table looks something like this:
<table>
<thead>
'A for loop makes a series of th
</thead>
'another loop pulls db values
<tr><td>value1</td><td>value2</td> 'etc </tr>
</table>
I have tried the following to set the width:
width="3.29in"
&nbsp; spam (barbaric but sometimes effective)
width="400px"
style="width:300px"
None of them seems to work.
Additionally, here is my ASP header code in case it's relevant:
Response.Clear()
Response.Buffer = False
Response.ContentType = "application/vnd.ms-excel"
Response.AddHeader "Content-Disposition", "attachment; filename=blah.xls"
Also, on a side note, when I print a dollar value like this:
<td>$<%=dbvalue%></td>
the cell shows the value prefixed with '$, and I am not sure how to get rid of the single quote.
Do you need the thead tag? Something like this should work:
<html>
<body>
<h1>Report Title</h1>
<table >
<tr>
<th style="width : 300px">header1</th>
<th style="width : 100px">header2</th>
<th style="width : 200px">header..</th>
<th style="width : 300px">....</th>
</tr>
<tr class="row1">
<td >value1</td>
<td >value2</td>
<td >value..</td>
<td >....</td>
</tr>
....
</table>
</body>
</html>
Optionally, you can put the header row (the tr with the th cells) inside a <thead> tag.
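The same layout can be produced by any server-side code, not just ASP. A Python sketch (hypothetical function name) that emits the answer's structure: a header row of th cells carrying a width style, followed by the data rows, with values HTML-escaped. Whether Excel honors these widths is the answer's claim; this code only builds the markup.

```python
from html import escape
from typing import List, Sequence, Tuple


def excel_html_table(columns: Sequence[Tuple[str, int]],
                     rows: List[Sequence[object]],
                     title: str = "Report Title") -> str:
    """Build the answer's layout: a width-styled <th> row, then data rows."""
    parts = ["<html>", "<body>", f"<h1>{escape(title)}</h1>", "<table>", "<tr>"]
    for name, width_px in columns:
        # The column width is carried by the style attribute on each <th>.
        parts.append(f'<th style="width : {width_px}px">{escape(name)}</th>')
    parts.append("</tr>")
    for row in rows:
        cells = "".join(f"<td>{escape(str(v))}</td>" for v in row)
        parts.append(f'<tr class="row1">{cells}</tr>')
    parts += ["</table>", "</body>", "</html>"]
    return "\n".join(parts)


print(excel_html_table([("header1", 300), ("header2", 100)], [["value1", "value2"]]))
```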