Stemming a list of words in a dataframe

Stemming a list of words in a dataframe - python-3.x

<html>
<body>
<table border=1>
<tr>
<th>label</th>
<th>rev</th>
</tr>
<tr>
<td>0</td>
<td>[ story man unnatural feelings pig...] </td>
</tr>
<tr>
<td>0</td>
<td>[ airport starts brand new luxury ...] </td></tr>
<tr>
<td>0</td>
<td>[ film lacked something couldnt pu...] </td></tr>
<tr>
<td>0</td>
<td>[ sorry everyone know supposed art...] </td></tr>
<tr>
<td>0</td>
<td>[ little parents took along theate..]</td></tr>
</table>
</body>
</html>
My dataframe looks like the one above. I tried the code below to stem it:
from nltk.stem.porter import PorterStemmer
ps=PorterStemmer()
da.rev=[ps.stem(word) for word in da.loc[:,'rev']]
but it returns the same dataframe unchanged, and I can't work out what went wrong.
Any help will be dearly appreciated. Thank you for your time.

Hard to say without seeing your exact code, but if each item in the series is a list of strings, you could try the following (apply() returns a new Series, so assign it back to the column):
da['rev'] = da['rev'].apply(lambda x: [ps.stem(word) for word in x])
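A slightly fuller sketch, assuming a cell may hold either a list of tokens or one plain string (the two sample rows and the helper name stem_tokens are hypothetical). If a cell is a single string, ps.stem() treats the whole review as one word, so at most the end of the string changes; splitting it into words first avoids that.

```python
import pandas as pd
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()


def stem_tokens(cell):
    # A cell may be a list of tokens or one plain string; split strings on whitespace
    tokens = cell.split() if isinstance(cell, str) else cell
    return [ps.stem(word) for word in tokens]


# Hypothetical sample data shaped like the question's 'rev' column
da = pd.DataFrame({
    "label": [0, 0],
    "rev": [["story", "man", "unnatural", "feelings"], "film lacked something"],
})

# apply() returns a new Series, so assign it back to the column
da["rev"] = da["rev"].apply(stem_tokens)
print(da)
```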

Related

Colspan not working properly using Python Pandas

I have some data that I need to convert into an Excel sheet, which needs to look like this in the end:
I've tried the following code:
import pandas as pd
result = pd.read_html(
"""<table>
<tr>
<th colspan="2">Status N</th>
</tr>
<tr>
<td style="font-weight: bold;">Merchant</td>
<td>Count</td>
</tr>
<tr>
<td>John Doe</td>
<td>10</td>
</tr>
</table>"""
)
print(result[0])
# Using ExcelWriter as a context manager saves the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('out/test_pd.xlsx', engine='xlsxwriter') as writer:
    result[0].to_excel(writer, sheet_name='Sheet1', index=False)
The issue is that the colspan is not working properly. The output looks like this instead:
Can someone help me use colspan with pandas?
Ideally I'd build the table directly in Python code without read_html(), but if that's not possible, I can use read_html().

pandas can't tell which rows hold column titles and which hold values, so you have to mark them. If you convert the HTML to the standard structure, pandas handles it correctly: use thead and tbody to separate the header rows from the values, like this:
result = pd.read_html("""
<table>
<thead>
<tr>
<th colspan="2">Status N</th>
</tr>
<tr>
<td style="font-weight: bold;">Merchant</td>
<td>Count</td>
</tr>
</thead>
<tbody>
<tr>
<td>John Doe</td>
<td>10</td>
</tr>
</tbody>
</table>
"""
)
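To write the DataFrame to an Excel file, use the pandas to_excel method. Keep the default index=True here: the two header rows become MultiIndex columns, and pandas does not support writing MultiIndex columns with index=False.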
result[0].to_excel("out.xlsx")
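Since the question would prefer to avoid read_html(), here is a minimal sketch of building the same two-level header directly in Python with pd.MultiIndex.from_tuples (the variable names are illustrative):

```python
import pandas as pd

# Two header levels: "Status N" sits above both sub-columns, like colspan="2"
columns = pd.MultiIndex.from_tuples(
    [("Status N", "Merchant"), ("Status N", "Count")]
)
df = pd.DataFrame([["John Doe", 10]], columns=columns)
print(df)
```

Writing it with df.to_excel("out.xlsx") (default index=True and merge_cells=True) should produce a "Status N" header cell merged across both columns, as with the thead/tbody version.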

Vim Replace On Each Line With Different String

I've got a document from a client which has a GIANT table in it which looks something like this:
<table id="someid">
<tr>
<td>Product</td>
<td class="product1">Product 1</td>
<td class="product2">Product 2</td>
<td class="product3">Product 3</td>
<td class="product4">Product 4</td>
<td class="product5">Product 5</td>
</tr>
<tr>
<td>Boiling Point</td>
<td>72</td>
<td>91</td>
<td>38</td>
<td>21</td>
<td>41</td>
</tr>
[ 45 more rows here]
</table>
Except that there are actually 15 products, and instead of "product1", "product2" and so on, the header cells have the actual product names as their existing classes.
The client has asked me to add a class to each of the appropriate td elements so that it is matched up with its product, e.g. class="product1" added to the 2nd td of every row.
Everything is static... I'm wondering if there's a quick way to do this in vim? Is it possible to tell vim to add a string to a certain position on every 18th line? Or am I stuck manually adding all the classes?

If the relation between the class name and the line number can be written as an expression, you can use :[range]substitute with \= in the replacement, which replaces each match with the result of an arbitrary expression. For example,
1,10s/<td/\=submatch(0) . ' class="product' . (line('.') % 5 - 1) . '"'
will change
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
to
<tr>
<td class="product1"></td>
<td class="product2"></td>
<td class="product3"></td>
</tr>
<tr>
<td class="product1"></td>
<td class="product2"></td>
<td class="product3"></td>
</tr>
You can adapt those arguments in the above command to suit your needs.
For more complex replacements, you can define a helper function in a temporary Vim file such as foo.vim:
function! GetClassName()
let order = line('.') % 5
if order == 1
return 'a'
endif
if order == 2
return 'b'
endif
endfunction
Then source it with :source % while foo.vim is the current buffer.
Next, switch to your file and use it as follows:
1,10s/^\s\+<td/\=submatch(0) . ' class="' . GetClassName() . '"'
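If you would rather script it than rely on line arithmetic, here is a minimal Python sketch of a different technique: it counts cells within each <tr> instead of using line numbers. It assumes one <td> per line as in the question's table, that the first cell of each row is the row label, and it skips cells that already have a class; add_product_classes is a hypothetical helper name.

```python
import re


def add_product_classes(html: str) -> str:
    """Add class="productN" to the Nth product cell of every row.

    Assumes one <td> per line and that the first <td> of each row
    is the row label (e.g. "Boiling Point").
    """
    out = []
    col = 0  # 1-based position of the current <td> within its <tr>
    for line in html.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("<tr"):
            col = 0  # a new row starts; reset the cell counter
        elif stripped.startswith("<td"):
            col += 1
            # Skip the label cell and cells that already carry a class attribute
            if col >= 2 and "class=" not in stripped:
                line = re.sub(r"<td\b", f'<td class="product{col - 1}"', line, count=1)
        out.append(line)
    return "\n".join(out)


sample = """<tr>
<td>Boiling Point</td>
<td>72</td>
<td>91</td>
</tr>"""
print(add_product_classes(sample))
```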

select specific rows with class names

I am parsing an HTML document that has a bunch of rows I want to select. Here are examples of those rows:
<tr class="constantstring-randomvalue1-row" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue1-row'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue1-row" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue1-row'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
What I was trying to do is use BeautifulSoup4's find_all with a regex: find_all(re.compile(regex)).
However, I am unable to come up with a regex that selects all the rows I am interested in.
All the rows I want have a class starting with constantstring-; I don't care what follows. What is the proper way to do this? Should I use re.compile, and if so, what is the correct regex?

If you want to do this with a regex, the following works. I added an extra row at the end to show that it is not picked up.
http://rextester.com/OSSFB8621
from bs4 import BeautifulSoup
import re
html ="""
<tr class="constantstring-randomvalue1-row" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue1-row'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue1-row" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue1-row'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="axcconstantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
"""
bs = BeautifulSoup(html,'lxml')
for tr in bs.find_all("tr", {"class" : re.compile('^(constantstring)')}):
    print(tr)

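Instead of a regex, you can use built-in string methods for the same task. For example:
rows = soup.find_all('tr')
selected_rows = [i for i in rows if str(i).startswith('<tr class="constantstring-')]
If you leave out str(), the condition fails, because a Tag object is not a string. Note that this relies on class being the first attribute of each <tr>, as it is in your rows.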
Hope this helps! Cheers!
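Another option, not taken from either answer above: find_all() also accepts a function for class_, which avoids both the regex and the string-serialization trick. A minimal sketch with shortened sample rows (the helper starts_with_prefix is hypothetical); it uses the built-in html.parser so lxml is not required.

```python
from bs4 import BeautifulSoup

html = """
<table>
<tr class="constantstring-randomvalue1-row" valign="top"><td>1</td></tr>
<tr class="constantstring-randomvalue2-row-2" valign="top"><td>2</td></tr>
<tr class="axcconstantstring-randomvalue2-row-2" valign="top"><td>3</td></tr>
</table>
"""


def starts_with_prefix(css_class):
    # BeautifulSoup tests the tag's class values with this function;
    # it receives None when the tag has no class attribute
    return css_class is not None and css_class.startswith("constantstring-")


soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr", class_=starts_with_prefix)
for tr in rows:
    print(tr["class"])
```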

COUNTIFS function excel with multiple values

I have data in a table as follows:
<table>
<tr>
<th>Priority</th>
<th>State</th>
</tr>
<tr>
<td>High</td>
<td>Work - QA</td>
</tr>
<tr>
<td>High</td>
<td>In progress</td>
</tr>
<tr>
<td>Low</td>
<td>Investigating</td>
</tr>
<tr>
<td>High</td>
<td>Ready for Deployment - QA</td>
</tr>
<tr>
<td>Critical</td>
<td>Investigating</td>
</tr>
<tr>
<td>Critical</td>
<td>Work - QA</td>
</tr>
<tr>
<td>Critical</td>
<td>Work - Dev</td>
</tr>
</table>
I want to generate a report as follows:
Summary Report for DEV
------------------------
Critical 1
High 2
Low 1
Defects in any of the states "Work - Dev", "Investigating" and "In progress" belong to DEV, so the summary report has to count every defect whose State is one of those.
I tried something like this, but it failed:
=COUNTIFS(Table1[[#All],[Priority]],"High",Table1[[#All],[State]],{"Investigation", "In progress", "Work - Dev"})

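So close. Your formula returns an array with one count per state in the array constant, so you need to SUM it. Also, the state in your data is "Investigating", not "Investigation":
=SUM(COUNTIFS(Table1[[#All],[Priority]],"High",Table1[[#All],[State]],{"Investigating","In progress","Work - Dev"}))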
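To see why the SUM is needed, here is a small Python model (not Excel itself) of what COUNTIFS does with an array constant, using the question's sample rows; countifs and dev_count are hypothetical names. With these rows it gives Critical 2, High 1 and Low 1.

```python
# The question's sample table as (Priority, State) pairs
rows = [
    ("High", "Work - QA"),
    ("High", "In progress"),
    ("Low", "Investigating"),
    ("High", "Ready for Deployment - QA"),
    ("Critical", "Investigating"),
    ("Critical", "Work - QA"),
    ("Critical", "Work - Dev"),
]
dev_states = ["Investigating", "In progress", "Work - Dev"]


def countifs(priority, state):
    # Models one COUNTIFS call with a single State criterion
    return sum(1 for p, s in rows if p == priority and s == state)


def dev_count(priority):
    # An array constant makes COUNTIFS return one count per state...
    per_state = [countifs(priority, s) for s in dev_states]
    # ...and SUM collapses them into a single number
    return sum(per_state)


for priority in ["Critical", "High", "Low"]:
    print(priority, dev_count(priority))
```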

JavaFX hide text of column in tableview

I have a TableView and I want to show an image in the first column. My problem is that I then can't sort by that column. My idea is to also set text in the column and hide it, so that it is used only for sorting. Is there a way to do that? Or what other solutions are possible for my problem?

I think this is exactly the example you need. Let me know if you have any issues.
Check here

I would have a look at TableColumn.setCellValueFactory() and TableColumn.setCellFactory(). The former provides the actual cell value (which is used for sorting!); the latter provides the rendering.
In other words: if you need the sort order, you must not change the content, only the cell rendering. The methods mentioned above let you do exactly this.
Hope that helps ...

You could do it with just CSS, using text-indent. You would also need to set the image as a CSS background. You did not provide any code for your table, but below is an example:
HTML:
<table width="100%" border="1" cellspacing="1" cellpadding="1">
<tr>
<td class="hidetext image">Text 1</td>
<td>Some text to show</td>
</tr>
<tr>
<td class="hidetext image">Text 2</td>
<td>Some text to show</td>
</tr>
<tr>
<td class="hidetext image">Text 3</td>
<td>Some text to show</td>
</tr>
<tr>
<td class="hidetext image">Text 4</td>
<td>Some text to show</td>
</tr>
</table>
CSS:
.hidetext {text-indent:-9000px}
.image {background:url(http://www.madisoncopy.com/images/jpeg.jpg) no-repeat;}
Notice that the text in the left column does not show (it is actually there, just indented off the screen).
See this fiddle: http://jsfiddle.net/D297P/
