Looping through nodes created by HtmlAgilityPack - c#-4.0

I need to parse this HTML code using HtmlAgilityPack and C#. I can get the
div class="patent_bibdata" node, but I don't know how to loop through the child nodes.
In this sample there are 6 hrefs, but I need to separate them into two groups: Inventors and Classification. I'm not interested in the last two. There can be any number of hrefs in this div.
As you can see, there is a text label before each of the two groups that says what the hrefs are.
code snippet
using HtmlAgilityPack;

HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load("http://www.google.com/patents/US3748943");
string xpath = "/html/body/table[@id='viewport_table']/tr/td[@id='viewport_td']/div[@class='vertical_module_list_row'][1]/div[@id='overview']/div[@id='overview_v']/table[@id='summarytable']/tr/td/div[@class='patent_bibdata']";
HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
So how would you do this?
<div class="patent_bibdata">
<b>Inventors</b>:
<a href="http://www.google.com/search?tbo=p&tbm=pts&hl=en&q=ininventor:%22Ronald+T.+Lashley%22">
Ronald T. Lashley
</a>,
<a href="http://www.google.com/search?tbo=p&tbm=pts&hl=en&q=ininventor:%22Ronald+T.+Lashley%22">
Ronald T. Lashley
</a><br>
<b>Current U.S. Classification</b>:
84/312.00P;
84/312.00R<br>
<br>
<a href="http://www.google.com/url?id=3eF8AAAAEBAJ&q=http://patft.uspto.gov/netacgi/nph-Parser%3FSect2%3DPTO1%26Sect2%3DHITOFF%26p%3D1%26u%3D/netahtml/PTO/search-bool.html%26r%3D1%26f%3DG%26l%3D50%26d%3DPALL%26RefSrch%3Dyes%26Query%3DPN/3748943&usg=AFQjCNGKUic_9BaMHWdCZtCghtG5SYog-A">
View patent at USPTO</a><br>
<a href="http://www.google.com/url?id=3eF8AAAAEBAJ&q=http://assignments.uspto.gov/assignments/q%3Fdb%3Dpat%26pat%3D3748943&usg=AFQjCNGbD7fvsJjOib3GgdU1gCXKiVjQsw">
Search USPTO Assignment Database
</a><br>
</div>
Wanted result
InventorGroup =
<a href="http://www.google.com/search?tbo=p&tbm=pts&hl=en&q=ininventor:%22Ronald+T.+Lashley%22">
Ronald T. Lashley
</a>
<a href="http://www.google.com/search?tbo=p&tbm=pts&hl=en&q=ininventor:%22Ronald+T.+Lashley%22">
Thomas R. Lashley
</a>
ClassificationGroup
84/312.00P;
84/312.00R
The page I'm trying to scrape: http://www.google.com/patents/US3748943
// Anders
PS: I know that on this page the names of the inventors are the same, but on most pages they are different!

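XPath is your friend! Something like this will get you the inventors' names:
HtmlWeb w = new HtmlWeb();
HtmlDocument doc = w.Load("http://www.google.com/patents/US3748943");
// Every <a> that comes before the first <br> inside the bibdata div is an inventor link
HtmlNodeCollection inventorLinks = doc.DocumentNode.SelectNodes("//div[@class='patent_bibdata']/br[1]/preceding-sibling::a");
// SelectNodes returns null (not an empty collection) when nothing matches
if (inventorLinks != null)
{
    foreach (HtmlNode node in inventorLinks)
    {
        Console.WriteLine(node.InnerText.Trim());
    }
}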
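The same XPath ideas can be tried outside HtmlAgilityPack. The sketch below is a stand-in under stated assumptions: it uses the standard System.Xml `XmlDocument` (which also evaluates XPath 1.0) on a hand-cleaned, well-formed copy of the question's snippet, with the long hrefs shortened to `#` and each `<br>` closed as `<br/>`. On the live page you would run the same expressions through HtmlAgilityPack's `SelectNodes`/`SelectSingleNode` instead. `PatentBibdataXPath`, `Inventors` and `Classifications` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class PatentBibdataXPath
{
    // Well-formed copy of the question's snippet (assumption: only the structure matters here)
    public const string Sample =
        "<div class=\"patent_bibdata\">" +
        "<b>Inventors</b>: " +
        "<a href=\"#\">Ronald T. Lashley</a>, " +
        "<a href=\"#\">Ronald T. Lashley</a><br/>" +
        "<b>Current U.S. Classification</b>: 84/312.00P; 84/312.00R<br/>" +
        "<br/>" +
        "<a href=\"#\">View patent at USPTO</a><br/>" +
        "<a href=\"#\">Search USPTO Assignment Database</a><br/>" +
        "</div>";

    // Inventor names: every <a> that precedes the first <br/> in the div
    public static List<string> Inventors(XmlDocument doc) =>
        doc.SelectNodes("//div[@class='patent_bibdata']/br[1]/preceding-sibling::a")
           .Cast<XmlNode>()
           .Select(n => n.InnerText.Trim())
           .ToList();

    // Classifications: the first text node after the <b> label that mentions "Classification"
    public static List<string> Classifications(XmlDocument doc)
    {
        XmlNode text = doc.SelectSingleNode(
            "//div[@class='patent_bibdata']/b[contains(., 'Classification')]/following-sibling::text()[1]");
        if (text == null)
            return new List<string>();

        // ": 84/312.00P; 84/312.00R" -> ["84/312.00P", "84/312.00R"]
        return text.Value
            .Split(';')
            .Select(s => s.Trim(' ', ':', '\n', '\r', '\t'))
            .Where(s => s.Length > 0)
            .ToList();
    }
}
```

Loading the sample with `doc.LoadXml(PatentBibdataXPath.Sample)` and calling `Inventors(doc)` gives the two inventor names, and `Classifications(doc)` gives `84/312.00P` and `84/312.00R`.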

It's obvious that I don't understand XPath (yet), so I came up with this solution.
Maybe not the smartest one, but it works!
// Anders
using System.Collections.Generic;
using HtmlAgilityPack;

List<string> inventorList = new List<string>();
List<string> classificationList = new List<string>();
string xpath = "/html/body/table[@id='viewport_table']/tr/td[@id='viewport_td']/div[@class='vertical_module_list_row'][1]/div[@id='overview']/div[@id='overview_v']/table[@id='summarytable']/tr/td/div[@class='patent_bibdata']";
HtmlNode bibdata = doc.DocumentNode.SelectSingleNode(xpath);
// Flags that remember which labelled group we are currently inside
bool bInventors = false;
bool bClassification = false;
foreach (HtmlNode node in bibdata.ChildNodes)
{
    string txt = node.InnerText;
    // A label such as <b>Inventors</b> switches the current group
    if (txt.IndexOf("Inventor") > -1)
    {
        bClassification = false;
        bInventors = true;
    }
    if (txt.IndexOf("Classification") > -1)
    {
        bClassification = true;
        bInventors = false;
    }
    // The trailing USPTO links belong to neither group
    if (txt.IndexOf("USPTO") > -1)
    {
        bClassification = false;
        bInventors = false;
    }
    // Compare the tag name exactly; IndexOf("a") would also match tags such as "span" or "table"
    if (node.Name == "a")
    {
        if (bInventors)
        {
            inventorList.Add(node.InnerText.Trim());
        }
        if (bClassification)
        {
            classificationList.Add(node.InnerText.Trim());
        }
    }
}
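A more general version of the loop above groups values by the nearest `<b>` label instead of hard-coding "Inventor", "Classification" and "USPTO". This is a sketch under two assumptions: each labelled group sits on one line that ends with a `<br>`, and values are either `<a>` elements or plain text separated by commas or semicolons. To stay self-contained it walks System.Xml nodes over a well-formed copy of the snippet; HtmlAgilityPack's `HtmlNode` exposes similar `ChildNodes`, `Name` (with `#text` for text nodes) and `InnerText` members, so the loop should carry over. `BibdataGrouper.GroupByLabel` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class BibdataGrouper
{
    // Walks the direct children of the bibdata <div>. A <b> label starts a group,
    // and the next <br/> ends it, so unlabelled lines (the USPTO links) are skipped.
    public static Dictionary<string, List<string>> GroupByLabel(XmlNode container)
    {
        var groups = new Dictionary<string, List<string>>();
        string currentLabel = null;

        foreach (XmlNode child in container.ChildNodes)
        {
            if (child.Name == "b")
            {
                currentLabel = child.InnerText.Trim();
                groups[currentLabel] = new List<string>();
            }
            else if (child.Name == "br")
            {
                currentLabel = null; // end of the labelled line
            }
            else if (currentLabel != null && child.Name == "a")
            {
                groups[currentLabel].Add(child.InnerText.Trim());
            }
            else if (currentLabel != null && child.NodeType == XmlNodeType.Text)
            {
                // Plain-text values such as "84/312.00P; 84/312.00R"; the ": " after the
                // label and the ", " between links trim away to nothing and are dropped
                groups[currentLabel].AddRange(
                    child.Value.Split(',', ';')
                         .Select(s => s.Trim(' ', ':', '\n', '\r', '\t'))
                         .Where(s => s.Length > 0));
            }
        }
        return groups;
    }
}
```

For the question's snippet this yields an "Inventors" entry with the inventor names and a "Current U.S. Classification" entry with the two codes; the USPTO links come after a blank `<br>` with no label in front of them, so they are ignored.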

Related

Razor Pages: Passing query parameters in a link

I tried this in the Razor page:
<div>
@foreach (var cat in Model.Categories)
{
<a asp-page="/Index?catId=@cat.Id">@cat.Name</a>
}
</div>
And in the cs file:
public void OnGet()
{
CurPage = 1;
CatId = -1;
Search = "";
HasCarousel = false;
Title = "All Products";
var queryParams = Request.Query;
foreach(var qp in queryParams)
{
if (qp.Key == "curPage") CurPage = int.Parse(qp.Value);
if (qp.Key == "catId") CatId = int.Parse(qp.Value);
if (qp.Key == "search") Search = qp.Value;
if (qp.Key == "hasCarousel") HasCarousel = bool.Parse(qp.Value);
}
}
But when I click the link, no query parameter is added to the address and Request.Query is empty.
What am I doing wrong? Or what is the right way to pass query parameters to a Razor page via a link?
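To add query parameters to your link, use the asp-route-{name} attributes on your a tag. A route value that isn't part of the target page's route template is appended to the URL as a query string parameter.
In your case it would look like
<a asp-page="Index" asp-route-catId="@cat.Id">@cat.Name</a>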
You're also making it harder than it has to be to receive the query parameters in OnGet(). You can add parameters to the handler method's signature, like
public void OnGet(int catId)
and model binding will fill them in from the query string.
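If you still read Request.Query by hand (for example to keep the defaults from the question), note that `int.Parse` and `bool.Parse` throw on a malformed value such as `?catId=abc`. A minimal sketch using `TryParse` so bad or missing values fall back to the same defaults; it assumes the query has been copied into a plain string dictionary as a stand-in for Request.Query, and `CatalogQuery.Parse` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical holder for the page state from the question, with the same defaults as OnGet
public sealed class CatalogQuery
{
    public int CurPage { get; private set; } = 1;
    public int CatId { get; private set; } = -1;
    public string Search { get; private set; } = "";
    public bool HasCarousel { get; private set; } = false;

    // Reads the known keys; missing or malformed values keep their defaults instead of throwing
    public static CatalogQuery Parse(IReadOnlyDictionary<string, string> query)
    {
        var result = new CatalogQuery();
        if (query.TryGetValue("curPage", out var cp) && int.TryParse(cp, out var curPage))
            result.CurPage = curPage;
        if (query.TryGetValue("catId", out var ci) && int.TryParse(ci, out var catId))
            result.CatId = catId;
        if (query.TryGetValue("search", out var s) && s != null)
            result.Search = s;
        if (query.TryGetValue("hasCarousel", out var hc) && bool.TryParse(hc, out var hasCarousel))
            result.HasCarousel = hasCarousel;
        return result;
    }
}
```

Request.Query matches keys case-insensitively, so if you build the dictionary yourself, creating it with `StringComparer.OrdinalIgnoreCase` keeps that behaviour.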

How do I get an object from an observable item Collection by index?

I need to get the object in an observable collection by index to access a property of the item at that index.
This is a snippet of the code:
public ObservableCollection<TipsModel> TipObjects;
private void LoadContent()
{
TipObjects = new ObservableCollection<TipsModel>();
for (int i = 0; i < 5; i++)
{
TipsModel item = new TipsModel()
{
Image = ImageSource.FromFile("nonindustryIcon.png"),
Title = "Kill energy vampires and save up to $100 a year",
Text = "Seventy-five percentof the electrical use by home electronics occurs when they're at home. \n People not at home means no electricity. Do not stay at home. Go stay on the streets. ",
};
TipObjects.Add(item);
}
foreach (TipsModel item in TipObjects)
{
img = item.Image;
tipTitle = item.Title;
tip = item.Text;
item.Content = CreateContent();
}
slideView.ItemsSource = TipObjects;
}
private void slideView_SlidedToIndex(object sender, Telerik.XamarinForms.Primitives.SlideView.SlideViewSlidedToIndexEventArgs e)
{
var slideId = slideView.Id;
//TipsModel tip = TipObjects.item at index[18];
}
You can just use the indexer as usual:
var tip = TipObjects[18];
ObservableCollection<T> derives from Collection<T>, so it is a normal, fully functional collection that supports indexing.
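You may see the Items property suggested as an alternative (TipObjects.Items[18]), but that won't compile here: Items is inherited from Collection<T> as a protected member, so it is only accessible from inside a class that derives from ObservableCollection<T>. From your page code, stick to the indexer.
Also note that the collection in the question only holds 5 tips, so index 18 would throw an ArgumentOutOfRangeException; use an index between 0 and TipObjects.Count - 1.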
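When the index comes from an event (such as the SlidedToIndex handler in the question), it is worth guarding it against the collection's size. A minimal sketch: `TipsModel` here is a cut-down stand-in for the question's class, and `TipLookup.GetTipOrNull` is a hypothetical helper name. Check the Telerik event-args type for the exact property that carries the index.

```csharp
using System;
using System.Collections.ObjectModel;

// Cut-down stand-in for the question's model (only what this sketch needs)
public class TipsModel
{
    public string Title { get; set; }
}

public static class TipLookup
{
    // Returns the tip at the given index, or null when the index is out of range
    public static TipsModel GetTipOrNull(ObservableCollection<TipsModel> tips, int index)
    {
        if (index < 0 || index >= tips.Count)
            return null;
        return tips[index]; // the ordinary Collection<T> indexer
    }
}
```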

Export Rich Text to plain text c#

Good day to Stackoverflow community,
I am in need of some expert assistance. I have an MVC4 web app that has a few rich text box fields powered by TinyMCE. Up until now the system has been working great. Last week my client informed me that they want to export the data stored in Microsoft SQL Server to Excel to run custom reports.
I am able to export the data to Excel with the code supplied. However, it exports the rich text (HTML) markup rather than plain text, which causes issues when they try to read the content.
Due to a lack of knowledge and/or understanding, I am unable to figure this out. I did read that it is possible to use regex to do this, but I have no idea how to implement it, so I turn to you for assistance.
public ActionResult ExportReferralData()
{
GridView gv = new GridView();
gv.DataSource = db.Referrals.ToList();
gv.DataBind();
Response.ClearContent();
Response.Buffer = true;
Response.AddHeader("content-disposition", "attachment; filename=UnderwritingReferrals.xls");
Response.ContentType = "application/ms-excel";
Response.AddHeader("Content-Type", "application/vnd.ms-excel");
Response.Charset = "";
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter sw = new StringWriter();
HtmlTextWriter htw = new HtmlTextWriter(sw);
gv.RenderControl(htw);
Response.Output.Write(sw.ToString());
Response.Flush();
Response.End();
return RedirectToAction("Index");
}
I would really appreciate any assistance, and thank you in advance.
I have looked for solutions on YouTube and web forums without any success.
Kind Regards
Francois Muller
One option is to massage the data before you write it to the file.
For example, identify an opening tag in your string and replace it with string.Empty.
The matching closing tag can be replaced with string.Empty in the same way.
Once you have identified all the variants of the rich text HTML tags, you can create a list of the tags and replace each of them with a suitable string inside a for loop.
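The tag-list approach described above could look like this sketch. The tag list is only an example of what TinyMCE commonly produces (an assumption; check your stored data), and `RichTextCleaner.RemoveKnownTags` is a hypothetical name. Tags with attributes, such as `<p style="...">`, won't match an exact string, which is why a regular expression or a character scanner is more robust.

```csharp
using System;

public static class RichTextCleaner
{
    // Example tag list only: extend it with whatever variants your stored HTML actually contains
    private static readonly string[] KnownTags =
    {
        "<p>", "</p>", "<strong>", "</strong>", "<em>", "</em>", "<br />", "<br>"
    };

    public static string RemoveKnownTags(string html)
    {
        string text = html;
        // Replace each known tag with an empty string, one tag at a time
        for (int i = 0; i < KnownTags.Length; i++)
        {
            text = text.Replace(KnownTags[i], string.Empty);
        }
        return text;
    }
}
```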
Did you try saving the file as .xlsx and sending it over to the client?
The newer Excel format might handle the data more gracefully.
Add this function to your code, then call it with the HTML string. The returned string will have the HTML tags removed.
Warning: This does not work for all cases and should not be used to process untrusted user input. Please test it with variants of your input string.
public static string StripTagsCharArray(string source)
{
    // Copy every character that is outside a <...> tag into a new buffer.
    // HTML entities such as &nbsp; are left as-is.
    char[] array = new char[source.Length];
    int arrayIndex = 0;
    bool inside = false;
    for (int i = 0; i < source.Length; i++)
    {
        char let = source[i];
        if (let == '<')
        {
            inside = true; // entering a tag
            continue;
        }
        if (let == '>')
        {
            inside = false; // leaving a tag
            continue;
        }
        if (!inside)
        {
            array[arrayIndex] = let;
            arrayIndex++;
        }
    }
    return new string(array, 0, arrayIndex);
}
So I managed to resolve this issue by changing the original code as follows:
As I'm only trying to convert a few columns, I found this works well. It ensures each record is on its own row in Excel and converts the HTML to plain text, allowing users to add column filters in Excel.
I hope this helps anyone else who has a similar issue.
// Assumed signature: the snippet originally started inside the action; RExportFrom/RExportTo are the optional date filters
public ActionResult ExportReferralData(DateTime? RExportFrom, DateTime? RExportTo)
{
    GridView gv = new GridView();
    var From = RExportFrom;
    var To = RExportTo;
    if (RExportFrom == null || RExportTo == null)
    {
        /* The actual code to be used */
        gv.DataSource = db.Referrals.OrderBy(m => m.Date_Logged).ToList();
    }
    else
    {
        gv.DataSource = db.Referrals.Where(m => m.Date_Logged >= From && m.Date_Logged <= To).OrderBy(m => m.Date_Logged).ToList();
    }
    gv.DataBind();
    // Column positions that hold TinyMCE rich text
    int[] richTextColumns = { 20, 21, 22, 37, 50 };
    foreach (GridViewRow row in gv.Rows)
    {
        foreach (int col in richTextColumns)
        {
            // The grid HTML-encodes cell text, so the tags show up as &lt;p&gt; etc.
            if (row.Cells[col].Text.Contains("&lt;"))
            {
                // Replace each encoded tag with a space so adjacent words don't run together
                row.Cells[col].Text = Regex.Replace(row.Cells[col].Text, "&lt;(?<tag>.+?)(&gt;|>)", " ");
            }
        }
    }
    Response.ClearContent();
    Response.Buffer = true;
    // Slashes are not valid in file names, so use dashes in the date; quote the name because it contains a space
    Response.AddHeader("content-disposition", "attachment; filename=\"Referrals " + DateTime.Now.ToString("dd-MM-yyyy") + ".xls\"");
    Response.ContentType = "application/ms-excel";
    Response.ContentEncoding = System.Text.Encoding.UTF8;
    Response.AddHeader("Content-Type", "application/vnd.ms-excel");
    Response.Charset = "";
    Response.Cache.SetCacheability(HttpCacheability.NoCache);
    StringWriter sw = new StringWriter();
    HtmlTextWriter htw = new HtmlTextWriter(sw);
    gv.RenderControl(htw);
    //This code exports the data to Excel with the HTML tags removed, so everything is plain text.
    //HttpUtility.HtmlDecode turns the remaining HTML entities (such as &amp;) back into plain characters.
    //String.Replace renames the headings to more understandable ones than the property names produced by the model.
    //Note: the replacements apply to the whole output, so cell values containing e.g. "Id" are changed too.
    Response.Write(HttpUtility.HtmlDecode(sw.ToString())
        .Replace("Cover_Details", "Referral Detail")
        .Replace("Id", "Identity Number")
        .Replace("Unique_Ref", "Reference Number")
        .Replace("Date_Logged", "Date Logged")
        .Replace("Logged_By", "File Number")
        .Replace("Date_Referral", "Date of Referral")
        .Replace("Referred_By", "Name of Referrer")
        .Replace("UWRules", "Underwriting Rules")
        .Replace("Referred_To", "Name of Referrer")
    );
    Response.Flush();
    // Response.End() normally ends the request by throwing a ThreadAbortException,
    // so the two lines after it may never run.
    Response.End();
    TempData["success"] = "Data successfully exported!";
    return RedirectToAction("Index");
}
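If you would rather clean the values before they reach the grid (for example while building the data source, an assumption about where to hook it in), a general helper can strip tags, decode entities and tidy whitespace in one place. A minimal sketch with a hypothetical `HtmlToText.Convert` name; it strips tags before decoding entities, so content such as `a &lt; b` is not mistaken for a tag. Like the other tag-stripping approaches here, it is a heuristic, not a full HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlToText
{
    // Matches any tag, e.g. <p>, </strong>, <span style="...">
    private static readonly Regex TagPattern = new Regex("<[^>]+>", RegexOptions.Compiled);
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    public static string Convert(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        // 1. Replace each tag with a space so words from adjacent blocks don't merge
        string text = TagPattern.Replace(html, " ");
        // 2. Turn entities such as &amp; and &nbsp; back into characters
        text = WebUtility.HtmlDecode(text);
        // 3. Collapse runs of whitespace (including the non-breaking space from &nbsp;)
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

For example, `HtmlToText.Convert("<p>Hello&nbsp;<strong>world</strong></p>")` returns `"Hello world"`.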

Adding a reusable block of code in Webmatrix

I have created an SQL query which checks whether a user owns a record in the database, by checking whether the PropertyID from the query string and the current UserID return a count of 1. This is the code below, and it works absolutely fine:
@{
Layout = "~/_SiteLayout.cshtml";
WebSecurity.RequireAuthenticatedUser();
var db = Database.Open("StayInFlorida");
var rPropertyId = Request.QueryString["PropertyID"];
var rOwnerId = WebSecurity.CurrentUserId;
var auth = "SELECT COUNT (*) FROM PropertyInfo WHERE PropertyID = @0 and OwnerID = @1";
var qauth = db.QueryValue (auth, rPropertyId, rOwnerId);
}
@if(qauth==0){
<div class="container">
<h1>You do not have permission to access this property</h1>
</div>
}
else {
SHOW CONTENT HERE
}
The problem is that I need to apply this check on at least 10 different pages, maybe more in the future. I'm all for reusable code, but I'm not sure how to write this once and reference it on each page where it's needed. I've tried doing this in the code block of an intermediate nested layout page, but I ran into errors with that. Any suggestions as to what would be the best approach? Or am I going to have to copy and paste this to every page?
The "Razor" way is to use a Function (http://www.mikesdotnetting.com/Article/173/The-Difference-Between-#Helpers-and-#Functions-In-WebMatrix).
Add the following to a file called Functions.cshtml in an App_Code folder:
@functions {
public static bool IsUsersProperty(int propertyId, int ownerId)
{
var db = Database.Open("StayInFlorida");
var sql = @"SELECT COUNT (*) FROM PropertyInfo
WHERE PropertyID = @0 and OwnerID = @1";
var result = db.QueryValue (sql, propertyId, ownerId);
return result > 0;
}
}
Then in your page(s):
@{
Layout = "~/_SiteLayout.cshtml";
WebSecurity.RequireAuthenticatedUser();
var propertyId = Request["PropertyID"].AsInt();
var ownerId = WebSecurity.CurrentUserId;
}
@if(!Functions.IsUsersProperty(propertyId, ownerId)){
<div class="container">
<h1>You do not have permission to access this property</h1>
</div>
}
else {
SHOW CONTENT HERE
}

How to show ID field as readonly in Edit Form, of a sharepoint list?

I need to show the ID field in the Edit Form of a sharepoint list.
Is there a way to do it? I tried a calculated field, with no luck.
I know that I can see the ID field in a view, and also when I show the list in Access mode.
I'm using WSS 3.0.
You can add the ID field to the form using some JavaScript in a Content Editor Web Part (CEWP) on the edit form page.
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js">
</script>
<script type="text/javascript">
$(function() {
    // Get the ID from the query string; parseInt keeps arbitrary query text out of the generated HTML
    var id = parseInt(getQueryString()["ID"], 10);
    if (isNaN(id)) return;
    // Find the form's main table
    var table = $('table.ms-formtable');
    // Add a row with the ID in it
    table.prepend("<tr><td class='ms-formlabel'><h3 class='ms-standardheader'>ID</h3></td>" +
        "<td class='ms-formbody'>" + id + " </td></tr>");
});
function getQueryString() {
    // Map each query-string key to its value
    var assoc = {};
    var queryString = unescape(location.search.substring(1));
    var keyValues = queryString.split('&');
    // Indexed loop: for...in over an array can also visit inherited properties
    for (var i = 0; i < keyValues.length; i++) {
        var key = keyValues[i].split('=');
        assoc[key[0]] = key[1];
    }
    return assoc;
}
</script>
There is an alternative method that doesn't use the jQuery library if you prefer to keep things lightweight.
You can do this quite easily by creating a custom edit form. I usually put it into an HTML table rendered within a web part. There may be a better way of doing that, but it's simple and it works.
The key line you'll want to look at is spFormField.ControlMode. This tells SharePoint how to display the control (Invalid, Display, Edit, New). So what you'll want to do is check whether spField.InternalName == "ID" and, if it is, set the ControlMode to Display. Because ID is a read-only field, make sure your field filter doesn't skip read-only fields before this check is reached.
The rest is just fluff for rendering the rest of the list.
Hope this helps.
HtmlTable hTable = new HtmlTable();
HtmlTableRow hRow = new HtmlTableRow();
HtmlTableCell hCellLabel = new HtmlTableCell();
HtmlTableCell hCellControl = new HtmlTableCell();
SPWeb spWeb = SPContext.Current.Web;
// Get the list we are going to work with
SPList spList = spWeb.Lists["MyList"];
// _view is assumed to be an SPControlMode field on the web part (e.g. SPControlMode.Edit)
// Loop through the fields
foreach (SPField spField in spList.Fields)
{
    bool isIdField = spField.InternalName == "ID";
    // Skip hidden and read-only fields (or hide/show based on your own criteria);
    // ID is read-only, so it has to be let through explicitly
    if (!spField.Hidden && (!spField.ReadOnlyField || isIdField) && spField.Type != SPFieldType.Attachments && spField.StaticName != "ContentType")
    {
        // Create the label field
        FieldLabel spLabelField = new FieldLabel();
        spLabelField.ControlMode = _view;
        spLabelField.ListId = spList.ID;
        spLabelField.FieldName = spField.StaticName;
        // Create the form field
        FormField spFormField = new FormField();
        // Begin: this is your solution here.
        if (isIdField)
        { spFormField.ControlMode = SPControlMode.Display; }
        else
        { spFormField.ControlMode = _view; }
        // End: the end of your solution.
        spFormField.ListId = spList.ID;
        spFormField.FieldName = spField.InternalName;
        // Add the table row
        hRow = new HtmlTableRow();
        hTable.Rows.Add(hRow);
        // Add the cells
        hCellLabel = new HtmlTableCell();
        hRow.Cells.Add(hCellLabel);
        hCellControl = new HtmlTableCell();
        hRow.Cells.Add(hCellControl);
        // Add the control to the table cells
        hCellLabel.Controls.Add(spLabelField);
        hCellControl.Controls.Add(spFormField);
        // Set the css class of the cell for the SharePoint styles
        hCellLabel.Attributes["class"] = "ms-formlabel";
        hCellControl.Attributes["class"] = "ms-formbody";
    }
}
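The field-filtering decision is easy to get wrong because the built-in ID field reports ReadOnlyField = true on a standard list (an assumption worth checking on yours), so a filter that drops read-only fields never reaches the ID check. The sketch below isolates that decision with hypothetical stand-ins (`ListField` for SPField, and a local `FormControlMode` enum mirroring the SPControlMode values Invalid, Display, Edit, New) so it can run without SharePoint.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in mirroring the SPControlMode values (hypothetical, not the SharePoint type)
public enum FormControlMode { Invalid, Display, Edit, New }

// Hypothetical stand-in for the SPField properties the filter reads
public sealed class ListField
{
    public ListField(string internalName, bool hidden, bool readOnlyField)
    {
        InternalName = internalName;
        Hidden = hidden;
        ReadOnlyField = readOnlyField;
    }
    public string InternalName { get; }
    public bool Hidden { get; }
    public bool ReadOnlyField { get; }
}

public static class EditFormPlanner
{
    // Decides which fields get a row on the form and in which mode each control renders
    public static List<KeyValuePair<string, FormControlMode>> Plan(IEnumerable<ListField> fields, FormControlMode formMode)
    {
        var plan = new List<KeyValuePair<string, FormControlMode>>();
        foreach (var field in fields)
        {
            bool isId = field.InternalName == "ID";
            // ID is read-only, so it must be let through explicitly or it never reaches the form
            if (field.Hidden || (field.ReadOnlyField && !isId))
                continue;
            plan.Add(new KeyValuePair<string, FormControlMode>(
                field.InternalName, isId ? FormControlMode.Display : formMode));
        }
        return plan;
    }
}
```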
