Sort BeautifulSoup elements by text value - python-3.x

I've searched the internet for how to do this, but the results all seem to address a different kind of question and goal.
Here is the soup I'm working with; you can run this code to test it.
from bs4 import BeautifulSoup
html = '''
<table border="1" cellspacing="0" width="300">
<tbody><tr>
<td width="50%">lesson 1</td>
<td>lesson 7</td>
</tr>
<tr>
<td width="50%">lesson 2</td>
<td>lesson 8</td>
</tr>
<tr>
<td width="50%">lesson 3</td>
<td>lesson 9</td>
</tr>
<tr>
<td width="50%">lesson 4</td>
<td>lesson 10</td>
</tr>
<tr>
<td width="50%">lesson 5</td>
<td>lesson 11</td>
</tr>
<tr>
<td width="50%">lesson 6</td>
<td>lesson 12</td>
</tr>
</tbody></table>
'''
soup = BeautifulSoup(html, "html5lib")  # use any parser you want
for td in soup.find_all("td"):
    print(td)  # this outputs the unsorted <td> tags
Output (unsorted <td> tags):
<td width="50%">lesson 1</td>
<td>lesson 7</td>
<td width="50%">lesson 2</td>
<td>lesson 8</td>
<td width="50%">lesson 3</td>
<td>lesson 9</td>
<td width="50%">lesson 4</td>
<td>lesson 10</td>
<td width="50%">lesson 5</td>
<td>lesson 11</td>
<td width="50%">lesson 6</td>
<td>lesson 12</td>
As you can see, the .text of every <td> tag holds a label such as "lesson 1", "lesson 7", "lesson 2", and so on. What I want is to sort these <td> tags by their text value, using the number, so that the output looks like this:
<td width="50%">lesson 1</td>
<td width="50%">lesson 2</td>
<td width="50%">lesson 3</td>
<td width="50%">lesson 4</td>
<td width="50%">lesson 5</td>
<td width="50%">lesson 6</td>
<td>lesson 7</td>
<td>lesson 8</td>
<td>lesson 9</td>
<td>lesson 10</td>
<td>lesson 11</td>
<td>lesson 12</td>
Thank you so much! I really appreciate your help.

This should do the job and give you a list of the <td> tags sorted by the number in their text.
from bs4 import BeautifulSoup
html = '''
<table border="1" cellspacing="0" width="300">
<tbody><tr>
<td width="50%">lesson 1</td>
<td>lesson 7</td>
</tr>
<tr>
<td width="50%">lesson 2</td>
<td>lesson 8</td>
</tr>
<tr>
<td width="50%">lesson 3</td>
<td>lesson 9</td>
</tr>
<tr>
<td width="50%">lesson 4</td>
<td>lesson 10</td>
</tr>
<tr>
<td width="50%">lesson 5</td>
<td>lesson 11</td>
</tr>
<tr>
<td width="50%">lesson 6</td>
<td>lesson 12</td>
</tr>
</tbody></table>
'''
soup = BeautifulSoup(html, 'html.parser')
# Function takes one <td> tag, gets the text inside it (e.g. "lesson 7"),
# splits it to get the number, and returns that number as an int
# so sorted() compares 7 < 10 numerically instead of as strings
def sort_soup(item):
    text = item.get_text(strip=True)
    data = text.split()
    return int(data[1])

out = soup.find_all('td')
out = sorted(out, key=sort_soup)
print(out)
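Why a numeric key is needed at all: sorting on the raw text would put "lesson 10" before "lesson 2", because strings compare character by character. Below is a slightly more forgiving sketch of the same idea. It assumes the first run of digits in each cell is the number you want, and uses a regular expression instead of split()[1], so extra words or spacing do not break it. The helper name numeric_key is hypothetical.

```python
import re

from bs4 import BeautifulSoup


def numeric_key(tag):
    """Sort key: the first run of digits in the tag's text, as an int.
    Tags whose text has no digits sort after all numbered ones."""
    match = re.search(r"\d+", tag.get_text())
    return int(match.group()) if match else float("inf")


html = """
<table>
<tr><td>lesson 1</td><td>lesson 7</td></tr>
<tr><td>lesson 10</td><td>lesson 2</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
tds = sorted(soup.find_all("td"), key=numeric_key)
print([td.get_text() for td in tds])  # ['lesson 1', 'lesson 2', 'lesson 7', 'lesson 10']
```

sorted() returns a new Python list; it does not reorder the tags inside the soup itself.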

Related

Python multiple if statements for petrol price movement

Situation:
I buy a certain volume of petrol at price X, which is usually sold within 30 days. During these 30 days, I need to adjust my petrol station's gas price according to the average State gas price. If the State price rises 1.2% above my purchase price, I set my price to 1% above my purchase price.
Here is an example:
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-wa1i{font-weight:bold;text-align:center;vertical-align:middle}
.tg .tg-7zrl{text-align:left;vertical-align:bottom}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
<thead>
<tr>
<th class="tg-wa1i">Time</th>
<th class="tg-wa1i">Average state gas price</th>
<th class="tg-wa1i">Av state price change, %</th>
<th class="tg-wa1i">My price change, %</th>
<th class="tg-wa1i">My station gas price</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-7zrl">1</td>
<td class="tg-0lax"> 4.231 </td>
<td class="tg-0lax"> - </td>
<td class="tg-0lax"> - </td>
<td class="tg-0lax"> 4.231 </td>
</tr>
<tr>
<td class="tg-7zrl">2</td>
<td class="tg-0lax"> 4.337 </td>
<td class="tg-0lax"> 2.5%</td>
<td class="tg-0lax"> 2.0%</td>
<td class="tg-0lax"> 4.316 </td>
</tr>
<tr>
<td class="tg-7zrl">3</td>
<td class="tg-0lax"> 4.437 </td>
<td class="tg-0lax"> 4.9%</td>
<td class="tg-0lax"> 4.0%</td>
<td class="tg-0lax"> 4.400 </td>
</tr>
<tr>
<td class="tg-7zrl">4</td>
<td class="tg-0lax"> 4.481 </td>
<td class="tg-0lax"> 5.9%</td>
<td class="tg-0lax"> 5.0%</td>
<td class="tg-0lax"> 4.443 </td>
</tr>
<tr>
<td class="tg-7zrl">5</td>
<td class="tg-0lax"> 4.571 </td>
<td class="tg-0lax"> 8.0%</td>
<td class="tg-0lax"> 8.0%</td>
<td class="tg-0lax"> 4.569 </td>
</tr>
<tr>
<td class="tg-7zrl">6</td>
<td class="tg-0lax"> 4.616 </td>
<td class="tg-0lax"> 9.1%</td>
<td class="tg-0lax"> 9.0%</td>
<td class="tg-0lax"> 4.612 </td>
</tr>
<tr>
<td class="tg-7zrl">7</td>
<td class="tg-0lax"> 4.709 </td>
<td class="tg-0lax"> 11.3%</td>
<td class="tg-0lax"> 11.0%</td>
<td class="tg-0lax"> 4.696 </td>
</tr>
<tr>
<td class="tg-7zrl">8</td>
<td class="tg-0lax"> 4.850 </td>
<td class="tg-0lax"> 14.6%</td>
<td class="tg-0lax"> 14.0%</td>
<td class="tg-0lax"> 4.823 </td>
</tr>
</tbody>
</table>
Here is my Python code, but it is very clunky: if the gas price moves up by 20%, I have to write my_priceX up to 20 times. Is there a more elegant solution?
my_price1 = last_gas_price["price"]*1.01
my_price2 = last_gas_price["price"]*1.02
my_price3 = last_gas_price["price"]*1.03
my_price4 = last_gas_price["price"]*1.04
my_price5 = last_gas_price["price"]*1.05
if last_gas_price > my_price1:
    adj_price(price = my_price1)
if last_gas_price > my_price2:
    adj_price(price = my_price2)
if last_gas_price > my_price3:
    adj_price(price = my_price3)
if last_gas_price > my_price4:
    adj_price(price = my_price4)
if last_gas_price > my_price5:
    adj_price(price = my_price5)
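A loop can replace the hand-written chain of my_priceN variables. The sketch below assumes the rule implied by the example table: raise the price by the largest whole percent N that the State price still exceeds, measured from the purchase price (1.2% gives +1%, 5.9% gives +5%). The function name adjusted_price and the max_steps cap are hypothetical; you would then call your own adj_price once with the result.

```python
def adjusted_price(purchase_price: float, state_price: float, max_steps: int = 20) -> float:
    """Return purchase_price raised by the largest whole percent N (1..max_steps)
    for which state_price is still above purchase_price * (1 + N/100)."""
    new_price = purchase_price
    for n in range(1, max_steps + 1):
        threshold = purchase_price * (1 + n / 100)
        if state_price > threshold:
            new_price = threshold
        else:
            break  # thresholds only grow, so no later step can match
    return new_price


# Rows 2 and 8 of the example table (purchase price 4.231):
print(round(adjusted_price(4.231, 4.337), 3))  # 4.316
print(round(adjusted_price(4.231, 4.850), 3))  # 4.823
```

Raising max_steps covers larger moves without adding any code.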

Check for vertical scrollbar in html local file

I have VBA code that opens a local HTML file using Selenium, like this:
bot.Get "file:///" & Environ("USERPROFILE") & "\Desktop\OutputHTML.html"
Sometimes I get an "Out of memory" error when I try to take a screenshot of a large table (inside the HTML body).
Is it possible to check whether the Selenium browser shows a vertical scrollbar and, if it does, zoom out until the scrollbar disappears?
Here is the code to test with:
Private bot As Selenium.ChromeDriver
Sub Test()
    Dim eleTable As Selenium.WebElement, strPath As String, img As Image
    Set bot = New Selenium.ChromeDriver
    strPath = ThisWorkbook.Path & "\"
    bot.Get "file:///" & Environ("USERPROFILE") & "\Desktop\OutputHTML.html"
    Set eleTable = Nothing
    On Error Resume Next
    Set eleTable = bot.FindElementByCss(".table-striped")
    On Error GoTo 0
    If Not eleTable Is Nothing Then
        Set img = eleTable.ScrollIntoView().TakeScreenshot()
        img.SaveAs strPath & "Output.png"
    Else
        Debug.Print "Error In Taking Screenshot"
    End If
End Sub
And this is an example of the local HTML file:
<bdo dir="rtl"><meta http-equiv="Content-Type" content="text/html; charset=windows-1256"/>
<style type="text/css">
th,td{
text-align: center !important;
}
</style>
<table class="table table-striped">
<tr class="">
<th width="5%" style="text-align:center;">م</th>
<th width="15%" style="text-align:center;">التاريخ</th>
<th style="text-align:center;">العملية</th>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>1</td>
<td dir="ltr">2022-02-23</td>
<td style="padding-right:10px;">اعادة فتح ملف )مغلق(</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>2</td>
<td dir="ltr">2022-02-22</td>
<td style="padding-right:10px;">اغلاق ملف</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>3</td>
<td dir="ltr">2022-02-22</td>
<td style="padding-right:10px;">ايصال توريد</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>4</td>
<td dir="ltr">2022-02-14</td>
<td style="padding-right:10px;">اذن صرف</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>5</td>
<td dir="ltr">2022-02-14</td>
<td style="padding-right:10px;">استمارة صــرف</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>6</td>
<td dir="ltr">2022-01-19</td>
<td style="padding-right:10px;">اعادة فتح ملف )مغلق(</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>7</td>
<td dir="ltr">2022-01-18</td>
<td style="padding-right:10px;">اغلاق ملف</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>8</td>
<td dir="ltr">2022-01-18</td>
<td style="padding-right:10px;">ايصال توريد</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>9</td>
<td dir="ltr">2021-12-27</td>
<td style="padding-right:10px;">اعادة فتح ملف )مغلق(</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>10</td>
<td dir="ltr">2021-12-26</td>
<td style="padding-right:10px;">اغلاق ملف</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>11</td>
<td dir="ltr">2021-12-26</td>
<td style="padding-right:10px;">ايصال توريد</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>12</td>
<td dir="ltr">2021-11-22</td>
<td style="padding-right:10px;">اعادة فتح ملف )مغلق(</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>13</td>
<td dir="ltr">2021-11-21</td>
<td style="padding-right:10px;">اغلاق ملف</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>14</td>
<td dir="ltr">2021-11-21</td>
<td style="padding-right:10px;">ايصال توريد</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>15</td>
<td dir="ltr">2021-11-15</td>
<td style="padding-right:10px;">اذن صرف</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>16</td>
<td dir="ltr">2021-11-15</td>
<td style="padding-right:10px;">استمارة صــرف</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>17</td>
<td dir="ltr">2021-10-19</td>
<td style="padding-right:10px;">اعادة فتح ملف )مغلق(</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>18</td>
<td dir="ltr">2021-10-18</td>
<td style="padding-right:10px;">اغلاق ملف</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>19</td>
<td dir="ltr">2021-10-18</td>
<td style="padding-right:10px;">ايصال توريد</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>20</td>
<td dir="ltr">2021-09-15</td>
<td style="padding-right:10px;">اعادة فتح ملف )مغلق(</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>21</td>
<td dir="ltr">2021-09-14</td>
<td style="padding-right:10px;">اغلاق ملف</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>22</td>
<td dir="ltr">2021-09-14</td>
<td style="padding-right:10px;">ايصال توريد</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>23</td>
<td dir="ltr">2021-09-12</td>
<td style="padding-right:10px;">اذن صرف</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>24</td>
<td dir="ltr">2021-09-12</td>
<td style="padding-right:10px;">استمارة صــرف</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>25</td>
<td dir="ltr">2021-09-12</td>
<td style="padding-right:10px;">ايصال توريد</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>26</td>
<td dir="ltr">2021-08-23</td>
<td style="padding-right:10px;">اعادة فتح ملف )مغلق(</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>27</td>
<td dir="ltr">2021-08-22</td>
<td style="padding-right:10px;">اغلاق ملف</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>28</td>
<td dir="ltr">2021-08-22</td>
<td style="padding-right:10px;">ايصال توريد</td>
</tr>
<tr class="resultRowOdd" align="center" style="line-height:25px;">
<td>29</td>
<td dir="ltr">2021-08-02</td>
<td style="padding-right:10px;">اعادة فتح ملف )مغلق(</td>
</tr>
<tr class="resultRowEven" align="center" style="line-height:25px;">
<td>30</td>
<td dir="ltr">2021-08-01</td>
<td style="padding-right:10px;">اغلاق ملف</td>
</tr>
</table>
</html></bdo>
I tried this line to check for a vertical scrollbar:
Debug.Print bot.ExecuteScript("return document.documentElement.scrollHeight>document.documentElement.clientHeight;")
But it returns False even though there is a scrollbar.
I also tried zooming out with bot.ExecuteScript "document.body.style.zoom='90%';", but then I got a different error: "Element outside the screenshot".
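The False result is probably a quirks-mode effect: the sample file has no <!DOCTYPE>, and in quirks mode document.documentElement.clientHeight reports the height of the <html> element itself rather than the viewport, so it matches scrollHeight. document.scrollingElement returns whichever element actually scrolls (<body> in quirks mode). The sketch below uses the Selenium Python bindings in place of SeleniumBasic (the JavaScript strings are the part to carry over to bot.ExecuteScript); the function name and the step/minimum values are hypothetical, and it does not address the "Element outside the screenshot" error.

```python
# `driver` is any object with execute_script(script, *args),
# e.g. a selenium.webdriver.Chrome instance.

# scrollingElement is <body> in quirks mode and <html> in standards mode.
HAS_VERTICAL_SCROLLBAR_JS = """
var el = document.scrollingElement || document.documentElement;
return el.scrollHeight > el.clientHeight;
"""

# CSS zoom on <body>; the value is passed in as arguments[0], e.g. "90%".
SET_ZOOM_JS = "document.body.style.zoom = arguments[0];"


def zoom_out_until_no_scrollbar(driver, start=100, step=10, minimum=30):
    """Lower the <body> zoom in `step` increments while the page still
    overflows vertically, stopping at `minimum`. Returns the zoom % used."""
    zoom = start
    driver.execute_script(SET_ZOOM_JS, f"{zoom}%")
    while zoom - step >= minimum and driver.execute_script(HAS_VERTICAL_SCROLLBAR_JS):
        zoom -= step
        driver.execute_script(SET_ZOOM_JS, f"{zoom}%")
    return zoom
```

Adding a <!DOCTYPE html> line to the generated file should also make the original documentElement check behave as expected.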

How to make an item always appear last in a list (NetSuite Advanced PDF)

I am trying to make a specific item appear last in the list whenever it appears on an invoice (Advanced PDF, NetSuite).
I was thinking about sorting by item name and adding some ZZZs to the item name, but that's only because I don't know much beyond basic HTML.
Any help would be appreciated.
Below is the table code I am working with:
<table class="itemtable" style="width: 100%; margin-top: 10px;"><!-- start items --><#list record.item as item><#if item_index==0>
<thead>
<tr>
<th align="center" colspan="3">${item.quantity#label}</th>
<th colspan="12">${item.item#label}</th>
<th colspan="3">${item.options#label}</th>
<th align="right" colspan="4">${item.rate#label}</th>
<th align="right" colspan="4">${item.amount#label}</th>
</tr>
</thead>
</#if><tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr>
</#list>
I need to be able to choose a specific item name that always shows at the bottom.
You can use the assign directive to hold the values for your bottom item: skip that item inside the loop, then print it after the loop ends.
<#assign bottomItemName = 'Bottom Item Name'>
<#assign bottomItem = {}>
<#list record.item as item>
<#if item_index==0>
<thead>
<tr>
<th align="center" colspan="3">${item.quantity#label}</th>
<th colspan="12">${item.item#label}</th>
<th colspan="3">${item.options#label}</th>
<th align="right" colspan="4">${item.rate#label}</th>
<th align="right" colspan="4">${item.amount#label}</th>
</tr>
</thead>
</#if>
<#if item.item == bottomItemName>
<#assign bottomItem = bottomItem + item>
<#else>
<tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr>
</#if>
</#list>
<#if bottomItem.item??>
<tr>
<td align="center" colspan="3" line-height="150%">${bottomItem.quantity}</td>
<td colspan="12"><span class="itemname">${bottomItem.item}</span><br />${bottomItem.description}</td>
<td colspan="3">${bottomItem.options}</td>
<td align="right" colspan="4">${bottomItem.rate}</td>
<td align="right" colspan="4">${bottomItem.amount}</td>
</tr>
</#if>
You could also try looping twice: once to print every item except the one you want last, then again to print only that item.
<table class="itemtable" style="width: 100%; margin-top: 10px;"><!-- start items --><#list record.item as item><#if item_index==0>
<thead>
<tr>
<th align="center" colspan="3">${item.quantity#label}</th>
<th colspan="12">${item.item#label}</th>
<th colspan="3">${item.options#label}</th>
<th align="right" colspan="4">${item.rate#label}</th>
<th align="right" colspan="4">${item.amount#label}</th>
</tr>
</thead>
</#if><#if item.item != "Your Item"><tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr></#if>
</#list>
<#list record.item as item><#if item.item == "Your Item"><tr>
<td align="center" colspan="3" line-height="150%">${item.quantity}</td>
<td colspan="12"><span class="itemname">${item.item}</span><br />${item.description}</td>
<td colspan="3">${item.options}</td>
<td align="right" colspan="4">${item.rate}</td>
<td align="right" colspan="4">${item.amount}</td>
</tr></#if>
</#list></table>
I haven't tested the if statement, so good luck

How to extract blue-colored hidden text between table rows using Python BeautifulSoup

I am trying to crawl all the hidden comments in the table rows after rows 2 and 3, but I can't extract them.
I have tried the code below to extract these comments, but it fails. Can someone please help me crack this problem?
from bs4 import BeautifulSoup,Comment
import requests
r =requests.get('http://www.esuppliersindia.com/krishna-agro-traders/aboutus-p17322178-u10731500-swa.html')
soup = BeautifulSoup(r.text,'lxml')
table = soup.find('table',class_='text-listing')
trs = table.find_all('tr')
for tr in trs[2:3]:
    print(tr.text)
for tr in trs[3:4].find_next_sibling('td'):
    print(tr.text)
I am not sure whether these are the comments you are after, but the code below extracts the HTML comments inside the table.
from bs4 import BeautifulSoup,Comment
import requests
r =requests.get('http://www.esuppliersindia.com/krishna-agro-traders/aboutus-p17322178-u10731500-swa.html')
soup = BeautifulSoup(r.text,'lxml')
table = soup.find('table',class_='text-listing')
comments=table.find_all(string=lambda text:isinstance(text,Comment))
print(comments[0].split('</tr>')[0])
for i in range(1, len(comments)):
    print(comments[i])
It prints output like this:
<td align="right" bgcolor="#FFFFFF" class="text-f11-b">No. Of Employees</td>
<td bgcolor="#FFFFFF" class="text-f11">10</td>
<tr>
<td align="right" bgcolor="#FFFFFF" class="text-f11-b">Export Turnover</td>
<td bgcolor="#FFFFFF" class="text-f11"></td>
</tr>
<tr>
<td align="right" valign="top" bgcolor="#FFFFFF" class="text-f11-b">Annual Turnover</td>
<td valign="top" bgcolor="#FFFFFF" class="text-f11">10 </td>
</tr>
<tr>
<td align="right" valign="top" bgcolor="#FFFFFF" class="text-f11-b">Import Turnover</td>
<td valign="top" bgcolor="#FFFFFF" class="text-f11"> </td>
</tr>
<tr>
<td align="right" valign="top" bgcolor="#ffffff" class="text-f11-b">Bankers</td>
<td valign="top" bgcolor="#ffffff" class="text-f11">Hdfc Bank </td>
</tr>
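Printing the raw comments still leaves markup to read. A further step is to parse each comment as HTML and pair the cells up as label/value. The sketch below runs offline on a small SAMPLE modeled on the output printed above (the live page's markup may differ), assumes the <td> cells inside each comment alternate label, value, label, value, and uses a hypothetical helper name, hidden_fields. It uses "html.parser" because that parser keeps bare <td> tags that sit outside a <table>.

```python
from bs4 import BeautifulSoup, Comment

# Offline stand-in for the live page, modeled on the printed output above.
SAMPLE = """
<table class="text-listing">
<tr>
<!--<td align="right" class="text-f11-b">No. Of Employees</td>
<td class="text-f11">10</td>-->
</tr>
<!--<tr>
<td align="right" class="text-f11-b">Annual Turnover</td>
<td class="text-f11">10 </td>
</tr>
<tr>
<td align="right" class="text-f11-b">Bankers</td>
<td class="text-f11">Hdfc Bank </td>
</tr>-->
</table>
"""


def hidden_fields(table):
    """Parse each HTML comment inside `table` as markup and pair up its
    <td> cells as label -> value (cells are assumed to alternate)."""
    fields = {}
    for comment in table.find_all(string=lambda s: isinstance(s, Comment)):
        fragment = BeautifulSoup(str(comment), "html.parser")
        cells = [td.get_text(strip=True) for td in fragment.find_all("td")]
        for label, value in zip(cells[0::2], cells[1::2]):
            fields[label] = value
    return fields


soup = BeautifulSoup(SAMPLE, "html.parser")
print(hidden_fields(soup.find("table", class_="text-listing")))
# {'No. Of Employees': '10', 'Annual Turnover': '10', 'Bankers': 'Hdfc Bank'}
```

Against the real site you would pass the table found from requests.get(...).text instead of SAMPLE.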

Tinymce copy/paste from excel (Plugin BBcode)

I'm using TinyMCE; the implementation is below.
The problem: when I copy some records from Excel and paste them into my TinyMCE field, they are displayed well enough for me (with the paste plugin, it actually shows the fields).
But when I request the value of the field, I get back an HTML table structure that depends on what was pasted. I don't want any HTML; see the expected return below.
Implementation code:
tinyMCE.init({
mode : "exact",
elements: "id",
theme : "advanced",
plugins : "bbcode, inlinepopups",
content_css : "tinymce.css",
entity_encoding : "raw",
remove_linebreaks : false,
forced_root_block: false,
force_br_newlines: true,
invalid_elements : "p, div, span",
force_p_newlines: false,
theme_advanced_buttons1 : $cur_buttons,
theme_advanced_buttons2: "",
theme_advanced_buttons3: "",
init_instance_callback : "tiny_mce_callback"});
Return value from the TinyMCE object:
<table style="border-collapse: collapse;" width="216" border="0"
cellspacing="0" cellpadding="0">
<!--StartFragment-->
<colgroup>
<col style="mso-width-source: userset; mso-width-alt: 1152;" width="27"/>
<col width="55" />
<col style="mso-width-source: userset; mso-width-alt: 2858;"
span="2" width="67"/>
</colgroup>
<tbody>
<tr style="mso-height-source: userset;">
<td class="xl24" width="27" height="12">1</td>
<td class="xl26" width="55">26/05/12</td>
<td class="xl24" width="67">Amsterdam</td>
<td class="xl24" width="67">Casablanca</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">2</td>
<td class="xl25">27/05/12</td>
<td class="xl24">Casablanca</td>
<td class="xl24">Rabat</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">3</td>
<td class="xl25">28/05/12</td>
<td class="xl24">Rabat</td>
<td class="xl24">Fes</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">4</td>
<td class="xl25">29/05/12</td>
<td class="xl24">Fes</td>
<td class="xl24"> </td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">5</td>
<td class="xl25">30/05/12</td>
<td class="xl24">Fes</td>
<td class="xl24">Erg Chebbi</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">6</td>
<td class="xl25">31/05/12</td>
<td class="xl24">Erg Chebbi</td>
<td class="xl24">Dades Vallei</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">7</td>
<td class="xl25">01/06/12</td>
<td class="xl24">Dades Vallei</td>
<td class="xl24">Ouarzazate</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">8</td>
<td class="xl25">02/06/12</td>
<td class="xl24">Ouarzazate</td>
<td class="xl24">Marrakesh</td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">9</td>
<td class="xl25">03/06/12</td>
<td class="xl24">Marrakesh</td>
<td class="xl24"> </td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">10</td>
<td class="xl25">04/06/12</td>
<td class="xl24">Marrakesh</td>
<td class="xl24"> </td>
</tr>
<tr style="mso-height-source: userset;">
<td class="xl24" height="12">11</td>
<td class="xl25">05/06/12</td>
<td class="xl24">Marrakesh</td>
<td class="xl24">Amsterdam</td>
</tr>
<!--EndFragment--></tbody>
</table>
Expected return:
1 26/05/12 Amsterdam Casablanca
2 27/05/12 Casablanca Rabat
3 28/05/12 Rabat Fes
4 29/05/12 Fes
5 30/05/12 Fes Erg Chebbi
6 31/05/12 Erg Chebbi Dades Vallei
7 01/06/12 Dades Vallei Ouarzazate
8 02/06/12 Ouarzazate Marrakesh
9 03/06/12 Marrakesh
10 04/06/12 Marrakesh
11 05/06/12 Marrakesh Amsterdam
You should look at the TinyMCE configuration options for the paste plugin.
Another option is to strip out all unwanted HTML tags and use plain text only (see "TinyMCE Paste As Plain Text").
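If the field's HTML has already reached your server, a different route (not a TinyMCE setting) is to flatten the table there. The sketch below assumes Python with BeautifulSoup on the receiving side; table_to_text is a hypothetical helper that turns each <tr> into one line of its non-empty cell texts, which matches the expected return above.

```python
from bs4 import BeautifulSoup


def table_to_text(html: str, sep: str = " ") -> str:
    """Flatten each <tr> into one line of its non-empty cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(" ", strip=True) for cell in tr.find_all(["td", "th"])]
        lines.append(sep.join(text for text in cells if text))
    return "\n".join(lines)


pasted = """
<table><tbody>
<tr><td class="xl24">1</td><td class="xl26">26/05/12</td>
<td class="xl24">Amsterdam</td><td class="xl24">Casablanca</td></tr>
<tr><td class="xl24">4</td><td class="xl25">29/05/12</td>
<td class="xl24">Fes</td><td class="xl24"> </td></tr>
</tbody></table>
"""
print(table_to_text(pasted))
# 1 26/05/12 Amsterdam Casablanca
# 4 29/05/12 Fes
```

Dropping empty cells collapses the columns, as in the expected return; if column alignment matters, keep the empty strings and pass sep="\t" instead.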
