NodeJS Server pdf Document Generation Issue - node.js

I am working on an API which generates a pdf document based on an EJS template which then gets emailed to some users. When running Locally everything works as expected however when my API is pushed to heroku the pdf is generated with some odd formatting errors. Its like the page is essentially cut down the middle vertically.
Does anyone know a possible cause of this?
I am using ejs, and html-pdf
Here is the code and template
document generation function
ejs.renderFile(path.join(__dirname, '../views/', "offer.ejs"), {offerData}, (err, data) => {
if(err)
reject(err)
else
{
const options = {
"height": "11.25in",
"width": "8.5in",
"header": {
"height": "10mm"
},
"footer": {
"height": "10mm",
}
};
//creates the actual pdf doc for sending
pdf.create(data, options).toFile(`./offer_${inqID}.pdf`, function(err, res) {
if (err)
reject(err)
else
resolve(res.filename)
});
}
});
html pdf template
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Trade-in Disclosure Form</title>
<link href="https://fonts.googleapis.com/css2?family=Roboto:wght#300&display=swap" rel="stylesheet">
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.1/css/bootstrap.min.css" integrity="sha384-VCmXjywReHh4PwowAiWNagnWcLhlEJLA5buUprzK8rxFgeH0kww/aWY76TfkUoSX" crossorigin="anonymous">
<style>
body{
font-family: 'Roboto', sans-serif !important;
}
h1{
text-decoration: underline !important;
}
table.center {
margin-left:auto;
margin-right:auto;
}
td.rows{
padding-bottom: 10px;
text-align: left !important;
}
td.question{
padding-right: 15px;
}
span{
font-weight:bold !important;
}
</style>
</head>
<body style="border: solid black 2px; margin-left: 10px; margin-right:10px;">
<header class="text-center">
<h1 >The Auto Broker</h1>
<h2> Trade-In Vehicle Disclosure Form</h2>
<p style="margin-left: 6em; margin-right: 6em">
This Form has been filled out by the customer requesting a quote. It contains some basic details to help inform your evaluation of the vehicle.
If the customer provides false information you have the right to recind or adjust your offer to the customer; otherwise include the value of
this vehicle in your total value for the quote. Please ensure to carefully review all the data before providing a quote. Quotes are final and
cannot be altered once submitted.
</p>
</header>
<div style="padding-top:20px;">
<div class="form-container container">
<div class="row" style="background-color: black;">
<h3 style="color: aliceblue;" class="ml-2"> Vehicle History</h3>
</div>
<table class="center">
<thead>
<tr>
<th style="padding-bottom: 20px;"> <h4>Question </h4></th>
<th style="padding-bottom: 20px;"> <h4>Response </h4></th>
</tr>
</thead>
<tbody>
<tr>
<td class="rows question">Has the Vehicle been in an accident?</td>
<td class="rows"> <%= questionnaire.inaccident %> </td>
</tr>
<tr>
<td class="rows question">Have any panels been repainted, repaired <br> or replaced?</td>
<td class="rows"> <%= questionnaire.panelrepairs %> </td>
</tr>
<tr>
<td class="rows question">Are you the original owner?</td>
<td class="rows"> <%= questionnaire.originalowner %> </td>
</tr>
<tr>
<td class="rows question">Is this an American Vehicle?</td>
<td class="rows"> <%= questionnaire.usvehicle %> </td>
</tr>
<tr>
<td class="rows question">Has the Vehicle been registered in any other <br> Province/State?</td>
<td class="rows"> <%= questionnaire.outofprovreg %> </td>
</tr>
<tr>
<td class="rows question">Is the Odometer faulty, replaced or rolled back? </td>
<td class="rows"> <%= questionnaire.faultyodometer %> </td>
</tr>
<tr>
<td class="rows question">Does the vehicle have any lights on the dashboard?</td>
<td class="rows"> <%= questionnaire.dashlights %> </td>
</tr>
<tr>
<td class="rows question">Does the vehicle have any factory warranty?</td>
<td class="rows"> <%= questionnaire.factorywarranty %> </td>
</tr>
<tr>
<td class="rows question">Does the vehicle have extended warranty?</td>
<td class="rows"> <%= questionnaire.extwarrenty %> </td>
</tr>
<tr>
<td class="rows question">Was the vehicle ever used as a daily rental, <br> police vehicle, or taxi/limousine?</td>
<td class="rows">
<% if (questionnaire.rental) { %>
Rental <br>
<% } %>
<% if (questionnaire.taxilimo) { %>
Taxi / Limo <br>
<% } %>
<% if (questionnaire.policecar){ %>
Police Vehicle <br>
<% } %>
</td>
</tr>
<tr>
<td class="rows question">Does the vehicle require repairs to the following: <br>
<span style="margin-left: 10px;">Engine </span> <br>
<span style="margin-left: 10px;">Transmission/Powertrain</span> <br>
<span style="margin-left: 10px;">Electrical System </span> <br>
<span style="margin-left: 10px;">Air Conditioning System</span>
</td>
<td class="rows">
<%= questionnaire.enginerepair %> <br>
<%= questionnaire.transrepair %> <br>
<%= questionnaire.electricalrepair %> <br>
<%= questionnaire.acrepairs %> <br>
</td>
</tr>
<tr>
<td class="rows question">Additional Repair Details</td>
<td class="rows"> <%= questionnaire.repairdetails %> </td>
</tr>
<tr>
<td class="rows question">Outstanding Balance</td>
<td class="rows">
<% if (questionnaire.outstandingbalance){ %>
<%= questionnaire.outstandingbalance %>
<% } %>
</td>
</tr>
<tr>
<td class="rows question">Total Current Milage</td>
<td class="rows"> <%= questionnaire.milage %> K.M </td>
</tr>
<tr>
<td class="rows question">V.I.N</td>
<td class="rows"> <%= questionnaire.vin %> </td>
</tr>
</tbody>
</table>
</div>
</div>
</body>
</html>
Proper Img
Broken Img

Was able to solve this issue by adding a single CSS style Rule.
html {
zoom: 0.55;
}
This is most likely required due to the default zoom setting of the html-pdf package. This setting and more can also be configured with options; however, i am unsure if the zoom options can be configured to < 1.

I'm facing the same prblm. I think u have to use inline CSS... u can't use bootstrap. I was facing the same problem. but now it is ok with puppeteer

Related

Retrieve 'tr' by text of children

I want to select a tr by the text it contains, including the text of the children.
My html is as follows:
<table>
<tbody>
<tr>
<td>
<span id="ctl00_ContentPlaceHolder1_GridView1_ctl21_Label4">Sanskrit</span>
</td>
<td>
<span id="ctl00_ContentPlaceHolder1_GridView1_ctl21_Label2">0655-0700 </span>
</td>
<td>
<a id="ctl00_ContentPlaceHolder1_GridView1_ctl21_LinkButtonDownloadPdf" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1$ctl21$LinkButtonDownloadPdf','')" style="color: Navy;
font-weight: bold;">Download</a>
</td>
<td>
<span id="ctl00_ContentPlaceHolder1_GridView1_ctl21_Label3">24 October</span>
</td>
</tr>
<tr>
<td>
<span id="ctl00_ContentPlaceHolder1_GridView1_ctl22_Label4">Sanskrit</span>
</td>
<td>
<span id="ctl00_ContentPlaceHolder1_GridView1_ctl22_Label2">1810-1815 </span>
</td>
<td>
<a id="ctl00_ContentPlaceHolder1_GridView1_ctl22_LinkButtonDownloadPdf" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1$ctl22$LinkButtonDownloadPdf','')" style="color: Navy;
font-weight: bold;">Download</a>
</td>
<td>
<span id="ctl00_ContentPlaceHolder1_GridView1_ctl22_Label3">23 October</span>
</td>
</tr>
</tbody>
</table>
I load it thus _soup = soup(html, "html.parser").
If i run _soup.find("span", text="Sanskrit").parent.parent.text then I get the result '\n\nSanskrit\n\n\n0655-0700 \n\n\nDownload\n\n\n24 October\n\n'
but if i run print(_soup.find("tr", text='\n\nSanskrit\n\n\n0655-0700 \n\n\nDownload\n\n\n24 October\n\n'))
i get None
Issue here is that text needs an exact match - You could regex or use css selectors and :-soup-contains():
soup.select('tr:has(:-soup-contains("Sanskrit"))')
or based on comment, check if your text is in <tr>s text:
for row in soup.select('tr'):
if '\n\nSanskrit\n\n\n0655-0700 \n\n\nDownload\n\n\n24 October\n\n' in row.text:
print(row)
Example
from bs4 import BeautifulSoup
html = '''
<table>
<tbody>
<tr>
<td>
<span id="ctl00_ContentPlaceHolder1_GridView1_ctl21_Label4">Sanskrit</span>
</td>
<td>
<span id="ctl00_ContentPlaceHolder1_GridView1_ctl21_Label2">0655-0700 </span>
</td>
<td>
<a id="ctl00_ContentPlaceHolder1_GridView1_ctl21_LinkButtonDownloadPdf" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1$ctl21$LinkButtonDownloadPdf','')" style="color: Navy;
font-weight: bold;">Download</a>
</td>
<td>
<span id="ctl00_ContentPlaceHolder1_GridView1_ctl21_Label3">24 October</span>
</td>
</tr>
<tr>
<td>
<span id="ctl00_ContentPlaceHolder1_GridView1_ctl22_Label4">Sanskrit</span>
</td>
<td>
<span id="ctl00_ContentPlaceHolder1_GridView1_ctl22_Label2">1810-1815 </span>
</td>
<td>
<a id="ctl00_ContentPlaceHolder1_GridView1_ctl22_LinkButtonDownloadPdf" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1$ctl22$LinkButtonDownloadPdf','')" style="color: Navy;
font-weight: bold;">Download</a>
</td>
<td>
<span id="ctl00_ContentPlaceHolder1_GridView1_ctl22_Label3">23 October</span>
</td>
</tr>
</tbody>
</table>'''
soup = BeautifulSoup(html)
soup.select('tr:has(:-soup-contains("Sanskrit"))')

How do I parse this html with Python lxml & xpath that finds the parent table of a specific span id?

Here is the HTML I don't have any control over. This is condensed HTML of the real page.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Little League</title>
</head>
<body>
<table>
<span>lot of unrelated text</span>
</table>
<table>
<span>lot of unrelated text</span>
</table>
<table>
<span>lot of unrelated text</span>
</table>
<table>
<tbody>
<tr>
<td class="rightTD">
<p>
<span id="teams_players">Player Teams</span>
</p>
</td>
</tr>
<tr>
<td>
<table border="1" cellspacing="0" cellpadding="0" class="tableBorder table table-bordered" width="100%">
<tbody>
<tr>
<td>
<table border="0" width="100%" class="tableData">
<tbody>
<tr id="team_listings">
<td colspan="3">Team Listings
<br>
<br>
</td>
</tr>
<tr>
<td>(a) </td>
<td colspan="2">Team Name </td>
</tr>
<tr>
<td></td>
<td colspan="2">
<span class="blue_color">Foxes</span>
</td>
</tr>
<tr>
<td>(b) </td>
<td colspan="2">Team Rank</td>
</tr>
<tr>
<td></td>
<td colspan="2">
<span class="blue_color">1</span>
</td>
</tr>
<tr>
<td>(c) </td>
<td colspan="2">Team Location
</td>
</tr>
<tr>
<td></td>
<td colspan="2">
<table width="100%">
<tbody>
<tr>
<td>City:
<br>
<span class="blue_color">Tualatin</span>
</td>
<td>State:
<br>
<span class="blue_colorLined"></span>
<br>
<span class="blue_color">Oregon</span>
</td>
<td>Country:
<br>
<span class="blue_colorLined"></span>
<br>
<span class="blue_color">United States</span>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<br>
<table border="1" cellspacing="0" cellpadding="0" class="tableBorder table table-bordered" width="100%">
<tbody>
<tr>
<td>
<table border="0" width="100%" class="tableData">
<tbody>
<tr>
<td>(a) </td>
<td colspan="2">Team Name </td>
</tr>
<tr>
<td></td>
<td colspan="2">
<span class="blue_color">Tigers</span>
</td>
</tr>
<tr>
<td>(b) </td>
<td colspan="2">Team Rank</td>
</tr>
<tr>
<td></td>
<td colspan="2">
<span class="blue_color">3</span>
</td>
</tr>
<tr>
<td>(c) </td>
<td colspan="2">Team Location
</td>
</tr>
<tr>
<td></td>
<td colspan="2">
<table width="100%">
<tbody>
<tr>
<td>City:
<br>
<span class="blue_color">Tigard</span>
</td>
<td>State:
<br>
<span class="blue_colorLined"></span>
<br>
<span class="blue_color">Oregon</span>
</td>
<td>Country:
<br>
<span class="blue_colorLined"></span>
<br>
<span class="blue_color">United States</span>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
I am trying to get to the table tag immediately preceding the span tag with id team_players.
I tried these but failed -
//table/span[#id="teams_players"]
ancestor::table[span[#id="teams_players"][position() = 1]]
This works but is not elegant and I prefer not to hardcode it -
//span[#id="teams_players"]/../../../../..
While //table[#class="tableData"] this might seem like it should work, there are many such tables in the HTML that has the same class with unrelated data. So this is ruled out.
Here is the code so far with my attempts (definitely not efficient, once I find a way of fetching both tables, I plan on looping through them to extract the data -
def parse_team():
# team data structure
teams = []
team_dict = { 'team': '', 'rank': '', 'location': { 'city': '', 'state': '', 'country': '' } }
filename = f'team.html'
f = open(filename, encoding="utf8").read()
parser = etree.HTMLParser()
tree = etree.parse(StringIO(f), parser)
# fetch the table dom and parse each team table
# fetch the parent table that contains teams_players span id
team_tables = tree.xpath('ancestor::table[span[#id="teams_players"][position() = 1]]')
print(team_tables)
root_tables = tree.xpath('//table/span[#id="teams_players"]')
print("root tables", root_tables)
# this provides each team table but in full html, the same class is being used for other unrelated data
name = tree.xpath('//table[#class="tableData"]')
print(name)
eachvaltr = name[0].xpath('.//tr')
teamname = name[0].xpath('.//td[contains(text(),"Team Name")]//parent::tr/following-sibling::tr[1]//span[#class="blue_color"]/text()')
print("teamname", teamname)
teamrank = name[0].xpath(
'.//td[contains(text(),"Team Rank")]//parent::tr/following-sibling::tr[1]//span[#class="blue_color"]/text()')
print("teamrank", teamrank)
city = name[0].xpath(
'.//td[contains(text(),"City")]//span[#class="blue_color"]/text()')
state = name[0].xpath(
'.//td[contains(text(),"State")]//span[#class="blue_color"]/text()')
country = name[0].xpath(
'.//td[contains(text(),"Country")]//span[#class="blue_color"]/text()')
print(city[0], state[0], country[0])
team_dict['team'] = teamname
team_dict['rank'] = teamrank
team_dict['location']['city'] = city[0]
team_dict['location']['state'] = state[0]
team_dict['location']['country'] = country[0]
print(team_dict)
Desired output is a list of teams where each team is a dict.
[{'team': ['Foxes'], 'rank': ['1'], 'location': {'city': 'Tualatin', 'state': 'Oregon', 'country': 'United States'}}]
//table[.//span[#id="teams_players"]]
or
//span[#id="teams_players"]/ancestor::table

Layout for table using Bootstrap 4.2 not aligning as expected

In the process of moving from Bootstrap 3 to Bootstrap 4.2. I have a table inside a Ajax modal screen. Here is the code snippet:
<div class="card-body">
<div class="row">
<table class="table table-hover table-responsive-lg">
<thead>
<tr>
<td class="text-center col-sm-3"><strong>ACTIVITY</strong></td>
<td class="text-center col-sm-3"><strong>QTY</strong></td>
<td class="text-center col-sm-3"><strong>RATE</strong></td>
<td class="text-center col-sm-3"><strong>AMOUNT</strong></td>
</tr>
</thead>
<tbody>
<tr>
<td>
<asp:Label ID="Label1" runat="server" Text="Closing Service" CssClass="form-con"></asp:Label></td>
<td class="text-center ">
<asp:TextBox ID="txtClosing_QTY" runat="server" CssClass="form-control"></asp:TextBox></td>
<td class="text-center ">
<asp:TextBox ID="txtClosing_Rate" runat="server" CssClass="form-control"></asp:TextBox></td>
<td class="text-center ">
<asp:TextBox ID="txtClosing_Total" runat="server" CssClass="form-control" ClientIDMode="Static"></asp:TextBox></td>
</tr>
</tbody>
</table>
</div>
</div>
In my system the columns showing QTY is too small to see the full number, even the number 1.
UPDATE
I changed the headers to col-sm-4 and the results are much the same. Here is an image of the outcome:
UPDATE 2
Suggested adding ROW class to the <tr> results in:
You can try adding class="row" to each tr instead of to the outside div
Example:
<html>
<head>
<meta charset="UTF-8">
<title>Document</title>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.2.1/css/bootstrap.min.css">
</head>
<body>
<div style="width: 572px;">
<div class="card-body">
<div class="">
<table class="table table-hover table-responsive-lg">
<thead>
<tr class="row">
<td class="text-center col-sm-3"><strong>ACTIVITY</strong></td>
<td class="text-center col-sm-3"><strong>QTY</strong></td>
<td class="text-center col-sm-3"><strong>RATE</strong></td>
<td class="text-center col-sm-3"><strong>AMOUNT</strong></td>
</tr>
</thead>
<tbody>
<tr class="row">
<td class="col-sm-3 text-center "><asp:Label ID="Label1" runat="server" Text="Closing Service" CssClass="form-con"></asp:Label></td>
<td class="text-center col-sm-3">
<asp:TextBox ID="txtClosing_QTY" runat="server" CssClass="form-control"></asp:TextBox></td>
<td class="text-center col-sm-3">
<asp:TextBox ID="txtClosing_Rate" runat="server" CssClass="form-control"></asp:TextBox></td>
<td class="text-center col-sm-3">
<asp:TextBox ID="txtClosing_Total" runat="server" CssClass="form-control" ClientIDMode="Static"></asp:TextBox></td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</body>
</html>
I wanted to close this out with a solution that worked for me. It was simple to fix. In the <thead> section of the table I removed the col-sm-3 tag and the table behaved as expected.
Thanks everyone!

ejs template placing things out of order

I am using an ejs template to loop through an array and add rows to a table. Here is the following code in my template:
<div class="col-md-8 col-md-offset-2">
<table class="table table-hover">
<thead>
<tr>
<th>
Title
</th>
<th>
Creator
</th>
</tr>
</thead>
<tbody>
<tr class="createNewBoard">
<th scope="row">
Add New
</th>
<td>
+
</td>
</tr>
<% boards.forEach(function(board) { %>
<a href="/<%= board._id %>">
<tr>
<th scope="row">
<%= board.title %>
</th>
<td>
<%= board.creator %>
</td>
</tr>
</a>
<% }) %>
</tbody>
</table>
</div>
And here is what is produced:
<div class="col-md-8 col-md-offset-2">
<table class="table table-hover">
<thead>
<tr>
<th>
Title
</th>
<th>
Creator
</th>
</tr>
</thead>
<tbody>
<tr class="createNewBoard">
<th scope="row">
Add New
</th>
<td>
+
</td>
</tr>
<tr>
<th scope="row">
Yooo
</th>
<td>
gus.henry#me.com
</td>
</tr>
<tr>
<th scope="row">
Swag
</th>
<td>
gus.henry#me.com
</td>
</tr>
</tbody>
</table>
</div>
Why is it placing the anchor tags at the top, rather than in the table rows? Other than ejs, using bootstrap.
Thanks.
This is to do with the way tables are parsed in compliant browsers (i.e. chrome).
your <a> tag should be inside a <td></td> to be valid HTML, and chrome for one will move it out of the <tbody> when it renders the page.
Because ejs is interpreted by the server, the browser never sees any of the ejs <% %> stuff - so do a 'view source' on your output page and you'll see the <a> tags where you expected them to be.
It looks like you're trying to cause a click on a table row to navigate to your /[board id] page. You can instead do something like the this:
<% boards.forEach(function(board) { %>
<tr onclick="javascript:window.location='/<%= board._id %>';">
<th scope="row">
<%= board.title %>
</th>
<td>
<%= board.creator %>
</td>
</tr>
<% }) %>
Or, if you really want a non-javascript solution, you could position an <a> tag absolutely (so it covers the cell) in each <td> and <th> as follows:
<% boards.forEach(function(board) { %>
<tr>
<th scope="row">
<%= board.title %>
<a class="coveringA" href="/<%= board._id %>"></a>
</th>
<td>
<%= board.creator %>
<a class="coveringA" href="/<%= board._id %>"></a>
</td>
</tr>
<% }) %>
with accompanying css as follows:
table td {
overflow:hidden;
}
.coveringA {
position:absolute;
display:block;
top:0px;
left:0px;
right:0px;
bottom:0px;
background:rgba(0,0,0,0); /* sets an invisible background to ensure its clickable in IE9 */
}

Scala find location of string in a string

I have this string:
var htmlString;
Assigned to:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
<html>
<head>
<title>Payment Receipt</title>
<link rel="stylesheet" type="text/css" href="content/PaymentForm.css">
<style type="text/css">
</style>
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>
</head>
<body>
<div id="divPageOuter" class="PageOuter">
<div id="divPage" class="Page">
<!--[1]-->
<div id="divThankYou">
Thank you for your order!
</div>
<hr class="HrTop">
<div id="divReceiptMsg">
You may print this receipt page for your records.
</div>
<div class="SectionBar">
Order Information
</div>
<table id="tablePaymentDetails1Rcpt">
<tr>
<td class="LabelColInfo1R">
Merchant:
</td>
<td class="DataColInfo1R">
<!--Merchant.val-->
Ryan
<!--end-->
</td>
</tr>
<tr>
<td class="LabelColInfo1R">
Description:
</td>
<td class="DataColInfo1R">
<!--x_description.val-->
Rasmussenpayment
<!--end-->
</td>
</tr>
</table>
<table id="tablePaymentDetails2Rcpt" cellspacing="0" cellpadding="0">
<tr>
<td id="tdPaymentDetails2Rcpt1">
<table>
<tr>
<td class="LabelColInfo1R">
Date/Time:
</td>
<td class="DataColInfo1R">
<!--Date/Time.val-->
09-Jul-2012 12:26:46 PM PT
<!--end-->
</td>
</tr>
<tr>
<td class="LabelColInfo1R">
Customer ID:
</td>
<td class="DataColInfo1R">
<!--x_cust_id.val-->
<!--end-->
</td>
</tr>
</table>
</td>
<td id="tdPaymentDetails2Rcpt2">
<table>
<tr>
<td class="LabelColInfo1R">
Invoice Number:
</td>
<td class="DataColInfo1R">
<!--x_invoice_num.val-->
176966244
<!--end-->
</td>
</tr>
</table>
</td>
</tr>
</table>
<hr id="hrBillingShippingBefore">
<table id="tableBillingShipping">
<tr>
<td id="tdBillingInformation">
<div class="Label">
Billing Information
</div>
<div id="divBillingInformation">
Test14 Rasmussen<br>
1234 test st<br>
San Diego, CA 92107 <br>
</div>
</td>
<td id="tdShippingInformation">
<div class="Label">
Shipping Information
</div>
<div id="divShippingInformation">
</div>
</td>
</tr>
</table>
<hr id="hrBillingShippingAfter">
<div id="divOrderDetailsBottomR">
<table id="tableOrderDetailsBottom">
<tr>
<td class="LabelColTotal">
Total:
</td>
<td class="DescrColTotal">
</td>
<td class="DataColTotal">
<!--x_amount.val-->
US $250.00
<!--end-->
</td>
</tr>
</table>
<!-- tableOrderDetailsBottom -->
</div>
<div id="divOrderDetailsBottomSpacerR">
</div>
<div class="SectionBar">
Visa ****0027
</div>
<table class="PaymentSectionTable" cellspacing="0" cellpadding="0">
<tr>
<td class="PaymentSection1">
<table>
<tr>
<td class="LabelColInfo2R">
Date/Time:
</td>
<td class="DataColInfo2R">
<!--Date/Time.1.val-->
09-Jul-2012 12:26:46 PM PT
<!--end-->
</td>
</tr>
<tr>
<td class="LabelColInfo2R">
Transaction ID:
</td>
<td class="DataColInfo2R">
<!--Transaction ID.1.val-->
2173493354
<!--end-->
</td>
</tr>
<tr>
<td class="LabelColInfo2R">
Authorization Code:
</td>
<td class="DataColInfo2R">
<!--x_auth_code.1.val-->
07I3DH
<!--end-->
</td>
</tr>
<tr>
<td class="LabelColInfo2R">
Payment Method:
</td>
<td class="DataColInfo2R">
<!--x_method.1.val-->
Visa ****0027
<!--end-->
</td>
</tr>
</table>
</td>
<td class="PaymentSection2">
<table>
</table>
</td>
</tr>
</table>
<div class="PaymentSectionSpacer">
</div>
</div>
<!-- entire BODY -->
</div>
<div class="PageAfter">
</div>
</body>
</html>
And I want to find the location of "x_auth_code.1.val" in the string. And then I want to obtain a string from the location plus a certain number of characters. The goal would be to return the Authorization code.
You can use indexOfSlice, and then slice() in StringOps
scala> val myString = "Hello World!"
myString: java.lang.String = Hello World!
scala> val index = myString.indexOfSlice("Wo")
index: Int = 6
scala> val slice = myString.slice(index, index+5)
slice: String = World
With your html string:
scala> htmlString.indexOfSlice("x_auth_code.1.val")
res4: Int = 2771
Why aren't you using an XML parser? Don't treat XML as strings -- you'll get bitten if you do.
Here's a regex to do it, but my advice is: DO NOT USE IT! Use xml tools.
"""\Qx_auth_code.1.val\E[^>]*>([^<]*)""".r.findFirstMatchIn(htmlString).map(_ group 1)

Resources