I have a problem I have been trying to address for a client. We have four sets of single pages that load content from a database using PHP, based on a GET string that is provided. The generated pages are well optimized for SEO, with alt tags on images and content that we need to be able to search using a search feature.
Now I had assumed (and everyone knows what assuming gets you) that these pages would by default be searchable by the Concrete5 built-in search feature. But it doesn't work: if I search for a word that I know is definitely on one of these pages, even multiple times, no results are found.
How can I make Concrete5 search these pages? If it's not doable by default or by a plugin, then can someone please offer some advice on how to fix this? This is an important feature and must be completed.
EDIT: See my comment below. I still need some help or direction here, as CSE isn't much of an option.
EDIT2: It may be viable for me to install a crawler and a custom search engine to address my problems. I was thinking of a spider. Any other suggestions on that, or other options, are much appreciated!
Unfortunately C5 doesn't provide a way to do this -- the only way to tap into the search index is with blocks. And even if you created a phony block just to pass content from the single_page through to the search index, there's no way to say that some content is from one URL while other content is from another URL (which you'd need to do, since your single_page controller is handling many different URLs).
I don't know of a way to achieve what you want to do (and it appears that nobody else does either -- http://www.concrete5.org/community/forums/customizing_c5/make-content-in-single-pages-searchable/ ), other than building your own internal search engine.
EDIT: I just did some digging, and thought that perhaps you could manually insert records into the PageSearchIndex table and specify the searchable content and the desired path there -- but this won't work because it relies on one cID (collection id, a.k.a. page id) per entry -- so you'd only be able to insert one record for the top-level single_page path.
I think the simplest solution here would be to create your own searching infrastructure for your single_pages (for example, a function in the controller that returns an array of page paths and the searchable content for each one), then override the search block and perform an additional search of your single_pages, combining the results on the search results page. Or just use Google Custom Search for your site, which will actually crawl the pages and hence find your various single_page URLs: https://www.google.com/cse/
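As a rough illustration of that first option, here's a minimal, untested sketch -- the my_products table, the /products path, and both function names are hypothetical, not part of the C5 API:

// In your single_page controller: a hypothetical helper returning every
// virtual URL this controller serves, with its searchable text.
function getSearchablePages() {
    $pages = array();
    $db = Loader::db();
    $rows = $db->GetAll('SELECT id, title, body FROM my_products');
    foreach ($rows as $row) {
        $pages[] = array(
            'path' => '/products?id=' . $row['id'],
            'content' => $row['title'] . ' ' . $row['body'],
        );
    }
    return $pages;
}

// In your overridden search block you could then merge naive substring
// matches into the normal block results:
function searchSinglePages($query) {
    $results = array();
    foreach (getSearchablePages() as $page) {
        if (stripos($page['content'], $query) !== false) {
            $results[] = $page;
        }
    }
    return $results;
}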
Best of luck.
I have not tested this, but maybe you can put a getSearchableContent() function in the single page's controller, like you do for blocks. This would return the string to be searched. It would look something like this:
function getSearchableContent() {
    // Compose the search string from the queried content,
    // e.g. by loading the same database record the page displays.
    $searchstring = ''; // built from your dynamic content
    return $searchstring;
}
But I don't know if this works for dynamic content. If not, I'd look into C5's search index core classes and try to extend them for your project.
I have this requirement:
We have a journal article and we wish to have sections whose content differs for internal and external users of the application.
We are able to hide content from rendering by implementing a custom template on the Web Content Display portlet and using a simple custom field on the user which helps us classify them.
Having said that, when we search something as an external user, the search portlet can fetch an article where the search text is part of the internal-user content, and due to the above-mentioned template that content is not visible.
In short, from the user's perspective the resulting article does not match the search term.
I am seeking some pointers on whether there is a mechanism to ensure that when an external user searches, we only search the dynamic elements of the document that match their user type.
We have thousands of such articles, and creating multiple copies of the same article does not seem a viable solution, so any pointers would be a great help.
Liferay version: 6.2 GA4 CE
Thanks!
AJ
First of all: not finding a search term in a document can be a sign of well-working synonym resolution in the search engine. It's questionable whether this behaviour is always wrong or only wrong in this particular case. Remember Google bombs?
That being said, I believe that this architecture of half-visible documents is flawed from the beginning. Ideally I'd suggest changing it, for example by splitting the information into two articles, so that you can use the standard permissions to resolve visibility. If you link both, you can determine which article or template to use. It's not an ideal solution, but might be a workaround.
Another workaround might be to change Liferay's indexer component and index two different versions of the article, with two different permissions. Of course, you'll have to change the search side as well, so that you find each article at most once, even though it's now in the search engine twice.
Again -- not ideal, but it might be the quickest fix you can get right now without changing the underlying architecture. However, changing the underlying architecture is my actual recommendation.
I need to produce facet blocks from two vocabularies on my site. I am using Views and a patched version of Views Infinite Scroll to generate the search page, using my search index, and I have tweaked everything I could in the facet display settings to see if I could produce the requested results, to no avail.
I do not need keyword searches. I need to show all taxonomy terms in each facet at all times, and to be able to select a single criterion at a time from each vocabulary. So, never more than one selection at a time from each facet block.
Why am I using Solr to store data and generate my search page if I do not need keyword search and am trying to go against the native working of Solr facets, I hear you say? Performance: that is the reason why I am using Solr to store and serve the results. I have even gone as far as pushing rendered nodes to the index with the help of the somewhat obscure search_api_solr_view_modes module.
I could take two separate routes:
Create a custom block, load all the taxonomy terms, alter the output of the term links to point to the view and provide the TID as the view argument. The active filter data could be obtained from the view arguments. I know how to do that (see the sketch after this list), but it feels like the wrong way to go about it: if I am working with Solr, I should be using a facet, not a custom block.
Build a custom facet block that has this exact behaviour. After reading a lot of documentation, I got kind of discouraged about the possibility of doing this simply, without having to develop a Facet plugin, which is kind of out of my league.
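For route 1, here is roughly what I have in mind -- a sketch only, assuming Drupal 7's taxonomy API; the module name, the 'search-page' view path, and the block callback are all made up:

// Hypothetical block content callback: list every term in a vocabulary
// as a link to the search view, passing the TID as the view argument.
function mymodule_facet_block_content($vocabulary_machine_name, $active_tid = NULL) {
  $vocabulary = taxonomy_vocabulary_machine_name_load($vocabulary_machine_name);
  $items = array();
  // taxonomy_get_tree() lists every term, even ones with no indexed nodes,
  // which gives me the persistent links I need.
  foreach (taxonomy_get_tree($vocabulary->vid) as $term) {
    $options = array();
    if ($term->tid == $active_tid) {
      $options['attributes']['class'][] = 'active';
    }
    // 'search-page' is the made-up path of the view; the TID is its argument.
    $items[] = l($term->name, 'search-page/' . $term->tid, $options);
  }
  return theme('item_list', array('items' => $items));
}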
Any advice is appreciated.
Here is a screenshot of the interface I have to produce.
http://imageshack.com/a/img834/9836/kr0i.png
Each taxonomy term has to be persistent, i.e., produce a link even if there are no nodes indexed under that term.
Selecting a term in one of the vocabularies will deselect previously selected terms.
Clicking on the x next to a term will remove it from the active search criteria.
Have a look at this: https://drupal.org/project/ajax_facets This might get you to where you need to be, sans your infinite scroll. There is a YouTube video that goes with it: http://www.youtube.com/watch?v=pBj3OkXLyWs
I'd appreciate hearing whether it works, as I haven't tried it myself.
I'm developing a webapp that will need to download the HTML from a website and then iterate through the code to find a specific but ever-changing value (in our case, the price of the product).
For this, I was thinking about asking the user (upon installation and setup) to provide the system with a few lines of HTML from the page (containing the price), and from then on, every time we need to fetch the price, we would search for those lines and find the price.
Now, I believe this is a horrible and slow way of doing it, but since there are no rules and the HTML can be totally different from one website to another (even the same website might change), I couldn't find a better way.
One improvement I thought about was to iterate through the document the first time and record the line at which we find the code. On subsequent fetches we would then start the search from a few lines before the expected location. Any thoughts on how I can improve on this?
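To illustrate, a minimal PHP sketch of that caching idea (the snippet matching is simplified to a plain substring check; the function name and parameters are made up):

// Search for a known HTML snippet, trying a window around the line where
// it was found last time before falling back to a full scan.
function find_snippet_line($html, $snippet, $last_line = null, $window = 20) {
    $lines = explode("\n", $html);
    if ($last_line !== null) {
        $start = max(0, $last_line - $window);
        $end = min(count($lines) - 1, $last_line + $window);
        for ($i = $start; $i <= $end; $i++) {
            if (strpos($lines[$i], $snippet) !== false) {
                return $i; // cache this line number for the next run
            }
        }
    }
    // Not where we expected it: scan the whole document.
    foreach ($lines as $i => $line) {
        if (strpos($line, $snippet) !== false) {
            return $i;
        }
    }
    return false;
}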
I posted this question on https://cstheory.stackexchange.com/ but they commented that it's not on topic and that I should post it here.
I have the code for the above and can post it if needed; I'm simply thinking that there must be a better, faster way of doing this.
This is actually something I tried for a project recently (using BeautifulSoup and Python). The solution that worked for me was to work out CSS selectors (which map closely to jQuery selectors) that targeted the elements containing the values I was looking for. In my case I was able to narrow the full document down to just the elements that contained what I was looking for, but if you couldn't get exactly what you were after, you could combine this with some extra logic, like a test to see whether it looks like a price (via regex), or a check of what it sits next to.
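If you're in PHP, a rough equivalent of that approach using DOMXPath might look like this (the "price" class and the regex are assumptions for illustration, not something from your site):

// Load the fetched HTML and target the price element directly,
// instead of scanning the document line by line.
$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate real-world broken HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// XPath equivalent of a CSS selector like "span.price" (made-up class).
$nodes = $xpath->query('//span[contains(@class, "price")]');
if ($nodes->length > 0) {
    $text = trim($nodes->item(0)->textContent);
    // Extra sanity check that the text actually looks like a price.
    if (preg_match('/\d+([.,]\d{2})?/', $text, $m)) {
        echo "Price found: " . $m[0];
    }
}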
We are using Liferay as a classic CMS, meaning that we compose pages using web content articles. There is an issue with Liferay's internal search that I could not yet find a proper answer for:
Because web content articles are pretty much only building blocks for pages, we don't want the search to show them as distinct items. The user should only get a list of pages that contain their search keywords, taking into account all the articles placed on each page.
At the moment we see two different approaches, and both come with certain problems we have not been able to solve yet:
Idea 1
We modify the journal indexer and try to obtain all URLs of the pages where the article has been placed (how?). Then we add them to the document to be indexed. In the search results we can then access the URLs and collect them. In the end we make sure every URL is only shown once.
Idea 2
At some point Liferay renders the entire page before sending it to the browser. If we could somehow hook an indexer in there, we could index the entire page and then limit the search to these special "page documents". Getting the fully rendered page would be the main issue: either we would have to run a crawler to trigger this indexing regularly, or we would need to find a way to trigger page rendering from within an indexer, or something like that.
I have been carrying this problem around for quite a while now and still have not found an idea good enough to spend time trying it out. If any of you have input on those two ideas, or maybe an entirely different approach, I would be extremely grateful.
I'll just answer myself, because by now we have found a suitable solution to our problem:
In addition to the default search portlet, there is also a "Web Content Search" portlet shipped with Liferay. It seems to have been part of Liferay for quite a while, but it's somewhat hard to find because there is hardly any documentation for it (I only found the Liferay wiki page, which isn't really anything at all). It searches only within web content articles and shows links to the pages rather than a link to an isolated view of the article. It has far fewer configuration options than the default search portlet, however; pretty much all it allows you to change is whether articles actually have to be placed on at least one page to show up in the results.
So there is no need for any kind of custom indexer or any other "hack" -- all we need to do is use the correct portlet. We will only need to write a hook that changes the appearance of the results page.
What you ask is interesting, but your ideas are going in the wrong direction.
Idea 2 in particular is wrong, because you cannot do indexing work while a page is being rendered. Think about performance alone.
In Liferay, pages and assets are not directly linked: pages have portlets, and portlets display assets (web content and more).
Liferay's indexing refers to and scans asset content, not the rendered output of those assets. Think about permissions: the same page can display different content depending on the user who is looking.
bye
Currently we're running Revolution 2.2. On site_content, we have some tags that are run for crawling Twitter. I want to start tracking the number of results for each tag as results come in, to determine which tags don't return many results, etc.
So I was thinking that I should create a new table (twitter_data) with a foreign key linking it to the search tag ID, which is stored in site_content.
What is the best path to accomplish this? Should I create my table and then run the reverse schema tool, as outlined here?
http://rtfm.modx.com/display/revolution20/Reverse+Engineer+xPDO+Classes+from+Existing+Database+Table#ReverseEngineerxPDOClassesfromExistingDatabaseTable-CreatingaMySQLtable
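If I go that route, I assume using the reverse-engineered class would look roughly like this (the package, class, and column names are hypothetical):

// Register the hypothetical package generated from the new table,
// then save one result count per tag.
$modx->addPackage('twitterdata', MODX_CORE_PATH . 'components/twitterdata/model/');

$record = $modx->newObject('TwitterData');
$record->fromArray(array(
    'tag_id' => $tagId,            // FK to the tag stored on site_content
    'result_count' => $resultCount,
    'fetched_on' => time(),
));
$record->save();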
I also found this, but not sure if this is what I should be looking into:
http://rtfm.modx.com/display/revolution20/Using+Custom+Database+Tables+in+your+3rd+Party+Components
Probably not -- if you can avoid modifying the core MODX schema, do so. An external table may be your best option, but it requires a fair bit of work.
Though if you can explain what you mean by 'tags' a little better [HTML tags? snippets? content tags? not sure what you mean], there may be other options. For example, one of our clients wanted to count page hits [and didn't want to use Google to do it], so all we did was create a template variable bound to each page they wanted to count, and then update that variable from a plugin set to fire on the onpageload or onpagerender event [I don't remember exactly which, or what it was called].
Basically, you may be able to do this by writing a plugin rather than trying to extend anything or add snippets/chunks.
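For what it's worth, a minimal sketch of that kind of counter plugin -- the 'hits' TV is an assumption, and you'd need to check which page-load event to attach it to in your setup:

// Hypothetical MODX Revolution plugin body, attached to a page-load event:
// increments a numeric 'hits' template variable on the current resource.
$resource = $modx->resource;
if ($resource) {
    $hits = (int) $resource->getTVValue('hits');
    $resource->setTVValue('hits', $hits + 1);
}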