Kentico 9 search result transformation - kentico

We've noticed a bug when looking at French search results. in the CMS Desk, i've kept the Page Name in English for the French content. The issue is, these are showing on the French results page.
in the transformation, based off the default one, I present the clickable title like this:
<a href='<%# SearchResultUrl() %>' data-type="title" target="_blank" ><%#SearchHighlight(HTMLHelper.HTMLEncode(CMS.ExtendedControls.ControlsHelper.RemoveDynamicControls(DataHelper.GetNotEmpty(Eval("Title"), ""))), "<span class='highLight'>", "</span>")%></a>
Here's my thinking, if the Menu Caption is filled out, use that rather than title. How do i output DocumentMenuCaption without adjust the search fields on the menu page type?
I think my logic is, check if DocumentMenuCaption is emtpy, if it use, use Title.

You should be able to continue using GetNotEmpty and just pass in the DocumentMenuCaption first, something like this:
<%# GetNotEmpty(GetSearchValue("DocumentMenuCaption");Eval("Title")) %>
You may or may not need the "GetSearchValue" function, but that allows you to grab values from the object that may not be available in the default set of columns for the search results.
Alternatively, you should be able to use the IfEmpty() method:
<%# IfEmpty(GetSearchValue("DocumentMenuCaption"), Eval("Title"), GetSearchValue("DocumentMenuCaption")) %>
Both transformation methods taken from here (double check syntax on "GetNotEmpty" as there are different ways it's implemented: https://docs.kentico.com/k9/developing-websites/loading-and-displaying-data-on-websites/writing-transformations/reference-transformation-methods
You can read more about the search transformations here: https://docs.kentico.com/k9/configuring-kentico/setting-up-search-on-your-website/displaying-search-results-using-transformations

Related

How to filter text after webscraping

So I'm trying to webscrape this website that provides novels for free, for example this page: https://www.wuxiaworld.com/novel/martial-world/mw-chapter-1
I'm trying to only extract the title and the body of the chapter. Finding the title is easy enough since its in h4, however the body of the chapter is not separated by any specific div tags so I cannot just isolate it. I was wondering how I'd do this. The closest Ive gotten to just having the text is this.
Ps. Im new to webscraping, sorry if my question is unclear or stupid.
I tried to identify if the body of text was under any exclusive div tag but it wasn't, so i tried to call it under whatever the closest div tag was, this still returned a lot of useless and unwanted text.
edit : #koro, there's more than one instance of fr-view being used so it doesn't isolate the text. fr-view class also appears before the chapter text.
I'm not versed in webscraping but upon reviewing the page source html I see that <div class="fr-view"> only precedes the body text on the novel pages. If you start the logging after the scraper identifies this line you should be able to stop at the very next <a href="/novel..... tag to only have the novel text included.
Some of the pages I see also include footnotes with some extra information, these include an <a href=#footnote....> tag, so if you would like to keep the footnotes included I would search for <a href=/novel...> and NOT <a href=...>
P.S. I only looked at 4 pages and while they all appear to have the same format that I've pointed out above it's still possible that you may run into issues, but that's definitely something you can a bridge you can cross when you get there!

Kentico - Dynamic Page Title Not Display Correctly in Smart Search Results

I used {% CurrentDocument.DocumentName %} in the Metadata > page title field. The Title tag displays ok when viewing the article itself on the browser; however, when searching through Smart Search, the results output something like below in place of the Title. I'm not sure why, is there a way to fix this? Thanks!
{% CurrentDocument.DocumentName |(user)myLogin|(hash)9f2b69705f777e8a884a107dfb72f681d8eb99867b6967514dbdca851b7f4309%}
Note: This is for hundreds of article pages, and inheriting Page Title from Parent by using the macro work best for me.
What is your transformation for search results? How do you retrieve that value?
I can see two possible approaches to solve your issue:
go to page type -> search fields and select DocumentName as a value for Title field
adjust search results transformation and use <%# GetSearchValue("DocumentName") %> instead of <%# Eval("Title") %>
This is most likely because the user who signed the macro is not in the system any longer. I'd change the macro to simply read:
{%CurrentDocument.DocumentName#%}
Having the # at the end will say the macro does not need to be signed.

Kentico 9 transformation page/file URL

In my custom page type, you can select an uploaded file. That's fine, but in my ascx transformation, i'm having a hard time getting the URL. The field is 'Process'.
Here's what i currently have.
<%# IfEmpty(Eval("Process"),"N/A","<a href=" + Eval("Process") +" target='blank' class='icon download'>Download</a>")%>
When rendered, the html is this:
Download
I'm missing something.
You can use either of the 2 methods below. Both have their downfalls though.
<a href="<%# GetFileUrl("Process", "Something") %>"Link here<a/> this will
Downfall with this is if there is no value in the "Process" field, it will return an invalid URL. So I tend to use something a little better (but not much)
Item to download
This will create a valid URL with some invalid properties to it. Meaning if there is no value in the Process field, it will return 00000000-0000-0000-0000-000000000000. If the NodeAlias field is empty, it will return "download". So again, not 100% fool-proof but it works well in most cases.
Update
Check out this link:
https://devnet.kentico.com/articles/options-for-file-fields-in-structured-data
I think the piece you need in here is in the "CMS.File page type" section:
This is the link to the picture
Check out transformation methods reference
You can use <%#GetImage(Eval("Process"))%>. This will return an Image tag. There are a couple other parameters for sizing if you want to use those.
See the "Transformation reference" link on your Trasnformation editor, it goes to all the available transformation methods you can use.
In it it shows:
This will generate an actual image tag. If however you want a link, it usually is
/getattachment/<%# Eval("TheImage")%>/ImageFileNameCanBeAnythingThough.jpg
example:
/getattachment/1936c69d-a28c-428a-90a7-09154918da0f/Christmas.jpg

Advanced Orchard Theming - Object Model vs Placement.info?

I have been tasked with converting a design heavy, fairly advanced HTML template for a site into an Orchard theme and I am struggling with the best way to accomplish certain things. The theme is built on bootstrap and is a modern responsive HTML template like you might find on ThemeForest or something. The site will have a number of content types (staff members, portfolio items, partners, etc.) and will need a number of templates. The content types will have a large number of fields (upwards of a dozen) inside of custom content parts.
Based on what I have read the proper way to do theming in Orchard is using placement.info in combination with alternates, wrappers, etc. This gracefully handles if parts or properties are added/removed. However, this technique is quickly becoming overwhelming, since I have to declare the name and order of every field/part in the placement.info for every content type, and every display type of that content type. Each field of each content type then needs to be wrapped in very specific html. This creates an issue because a single page can be split out into potentially a couple dozen views, with HTML tags opening in one view and closing in another.
The best work around for this I have found is to basically ignore the placement.info file and build templates just by traversing the object model. So basically, for a portfolio page, I would copy in the template HTML I have and then replace the text values with values from the model. This might look something like:
<li class="#Display(Model.ContentItem.PortfolioPart.PortfolioCore.Value.ToLower())">
<a href="#Url.ItemDisplayUrl(contentItem)" >
#foreach (var media in Model.ContentItem.PortfolioPart.PortfolioImage.MediaParts)
{
<img src="#Display(media.MediaUrl)" />
}
<span class="type">#Display(Model.ContentItem.PortfolioPart.PortfolioCoreArea.Value)</span>
<span class="portfolio-item-content">
<span class="header">#Display(Model.ContentItem.TitlePart.Title)</span>
<span class="body">
<p>
#Display(Model.ContentItem.PortfolioPart.PortfolioTagline.Value)
</p>
</span>
</span>
</a>
</li>
The benefit with this method is that I can apply all of the values in a couple of views and it's more readable. Obviously the problem with this is that if any properties or parts are removed, the template breaks.
Is there a way in Orchard to have the best of both worlds? I can't have a wrapper or template for every field - this would end up potentially hundreds of fields by the end. I also might need to display content types in multiple places with different views - each field would then require a whole new set of wrappers or alternates for every projection.
Please let me know if I'm missing anything or if there is a better way to do this besides manually traversing to the properties I need. I need a way to be able to easily plug in properties into very specific html.
My final thought was to use very specific templates for custom content types using the object model but still provide good general templates/placement.info file so that general Orchard content is flexible but the custom content types have to stay how they are.
Side thought - I guess another option would be to wrap any code that accesses a property directly in a try catch block or some kind of error handler helper, but that doesn't seem like a "best practice".
I think the techniques in this article are what you're looking for: http://weblogs.asp.net/bleroy/archive/2013/02/13/easy-content-templates-for-orchard-take-2.aspx

How can I hide certain text from search engines?

In my WordPress blog, I have "Posted ? days ago" on every post. I have 10 posts on my homepage. So according to most keyword analysis tools, "days ago" is a top keyword on my blog, but I don't want it to be. How can I hide those words from search engines?
I don't want to use Javascript. I can easily use PHP and the $_SERVER variable, but I'm afraid I might get penalized for cloaking. Is there a HTML tag or an attribute like rel="nofollow" that I can use?
From Is there any way to have search engines not index a certain section of a page?
Supposedly you can add the class
robots-nocontent to elements on your
page, like this:
<div class="robots-nocontent">
<p>Ignore this stuff.</p>
</div>
Yahoo respects this, though I
don't know if other search engines
respect this. It appears Google is
not supporting this at this time.
I suspect if you load your content via
ajax you would get the same effect of
it not being present on the page.
and
There's no general way to do that and
personally I wouldn't bother with it.
Search engines are pretty good at
recognizing relevant content on a
page, and even though that content
might show up in the keywords that
search engines have found, it doesn't
mean that it would make the page
relevant for those keywords.
If you have a page about "Fish" and a
page about "Dogs" (that has the link
to the page about "Fish" somewhere in
the sidebar), search engines will
generally be able to recognize that
the page about "Fish" is much more
relevant for "Fish" than the page
about "Dogs" that mentions "Fish" in
the sidebar. It's possible that both
pages might be found at some point,
but generally given that mostly one
page from the site is shown in the
search results, that's not something
worth worrying about.
There's no need to be fancy with that,
and search engines are likely to just
get more confused if you try (eg if
you use JavaScript to hide the
content, you never know when search
engines will start to find that
content regardless). Similarly, using
iframes with robots.txt disallows or
AJAX will frequently degrade the
quality of your pages to users (slow
it down or make it less usable on a
variety of devices), so unless there
is a very, very strong & proven reason
that you need to do this, I would
strongly recommend not bothering with
it.
What I have found on wiki:
For Yandex:
<!--noindex-->Don't index this text.<!--/noindex-->
For Yahoo:
<div class="robots-nocontent">Don't index this text.</div>
For Google:
<!--googleoff: index--> Don't index this text.<!--googleon: index-->
Linksku, I'm fairly sure you shouldn't be worried about that particular piece of text. Our algorithms do a relatively good job detecting boilerplate text. As far as I can tell from your question, this text is boilerplate and we likely already know that.
As for detecting Googlebot and don't serving this text for it, you're right, that would be cloaking and you should never do it. In this case if you hide that text from us, we will also have a hard time detecting it's boilerplate and you would end up doing exactly what you're trying to avoid :)
I worked this out and posted it up at: http://www.scivillage.com/thread-2580.html
This should work, however more testing of it and feedback would be appreciated.
.x:before{
content:attr(title);
display:inline;
}
<ul>
<li><span class="x" title="Homepage"></span></li>
<li><span class="x" title="Contact" /></li>
</ul>
(I kept the class name short to reduce mark-up creep)
The search engines should ignore HTML tags with empty values when comes to looking for keywords, this should mean that it ignores what is written in the title attribute. (It assumes that the value is what's important, if it's empty then there is no point checking the attributes)
It was suggested that it's possible to negate having the closing tag in HTML5 due reduced strictness, however there is counter suggestions that end tags are still required.
I'd suggest not using it directly on a (anchor) tags since they can be used for sitemaps (using #), so it's means they would like have the Title spidered.
Although it is possible that it might assume any title content is there to inflate keywords through hidden elements, however I can not confirm this.
To exclude specific text from Google search results you can add data-nosnippet attribute.
https://developers.google.com/search/reference/robots_meta_tag#data-nosnippet-attr
From google documentation
You can also prevent certain parts of the page text content from being shown in a snippet by using data-nosnippet.
HTML:
<div class="hasHiddenText">_</div>
It is important that you leave a non-whitespace character between the element with a hidden text.
External CSS:
.hasHiddenText{
content: "Your hidden text here...";
/*This ovewrites the default content of the div but it isn't supported by all browsers.*/
}
.hasHiddenText::before{
content: " Your hidden text here...";
/*Places a hidden text above the div.*/
}
The "hidden text" pertains to content hidden to all search engines but visible to visitors.
You can also use nextline and all sorts of Unicode characters by escaping them with \uXXXX. To display linebreak characters correctly, be sure to add the
white-space:pre-line;
property.

Resources