I have just finished migrating an old TemplaVoila site to TYPO3 6.1 and set up Indexed Search (much like it was in 4.7), but I can't get Indexed Search to index any content on any page. I would like to know whether this extension actually works with a TemplaVoila page and what I may have overlooked in its setup.
indexed_search is a core extension and always works on the current version. If you are using MySQL, it's also recommended to install indexed_search_mysql.
To activate indexing, just set the option:
config {
#indexed_search
index_enable = 1
}
And check the results in "Web > Info > Indexed Search". There are also scheduler tasks to clean up indexes.
Actually Merec's answer is wrong. You will have to set
page.config.index_enable = 1
for it to work.
I use TYPO3 7.6.10.
I have a crawler that indexes all pages and they are shown in the search results, but the crawler is not indexing the content of the pages.
Do I have to write something in the configuration?
This tutorial by Xavier Perseguers tells you everything you need to do to index pages and records with Indexed Search. It was made for an older version of TYPO3 (as you can see from the screenshots) but it should work for newer releases too.
I don't understand the security patch from last week: https://typo3.org/teams/security/security-bulletins/typo3-core/typo3-core-sa-2016-022/. I have an old TYPO3 6.2 installation. I have truncated all cf_* tables and opened the pages with UIDs 2-6, without a cHash. As a result I see 13 cf_cache_hash entries.
Now I have opened a detail page from a listing page in the frontend. I see some parameters in the URL like action, controller, the UID of the currently displayed record and of course a cHash.
Then I copied these parameters (excluding id=x) to the URLs of my pages 2-6. In cf_cache_hash there are still 13 records, so there is no cache flooding.
Or how do I have to interpret this quote:
Links with a valid cHash argument lead to newly generated page cache
entries. Because the cHash is not bound to a specific page, attackers
could use valid cHash arguments for multiple pages, leading to
additional useless page cache entries.
Next problem:
If extensions like realurl are used, it is required to flush their
caches (and TYPO3 caches as well)
Can you please tell me WHICH tables I/we should clear?
tx_realurl_urldecodecache
tx_realurl_urlencodecache
are maybe OK. But what about tx_realurl_pathcache? Of course I can clear that, but what about older entries from an earlier realurl configuration? If I truncate that table, these old entries are not valid anymore and will not be built again, so old search engine results become invalid.
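For those two I would simply run, directly on the DB:
TRUNCATE TABLE tx_realurl_urldecodecache;
TRUNCATE TABLE tx_realurl_urlencodecache;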
Question from one of our customers: is it enough to clear the system cache in the backend, or should he click on "Clear all cache" in the Install Tool? IMO it is not enough, and the tables have to be truncated directly in the DB. Right?
Next one:
This means if such URLs are indexed by a search engine, visitors from
this search engine will end up on a not properly working page.
Hey, cool. And now? What is the solution? Keep it as it is? IMO it depends on an Install Tool setting called pageNotFoundOnCHashError. Right?
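For reference, I mean this toggle (settable in the Install Tool; sketched here as it would appear in LocalConfiguration.php):
// if true, an invalid cHash triggers the pageNotFound handler instead of rendering the page uncached
$GLOBALS['TYPO3_CONF_VARS']['FE']['pageNotFoundOnCHashError'] = true;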
Please tell us what to do and please add some more details how to handle that.
Stefan
For me it boils down to (after installing the updated TYPO3 version):
If you don't use realurl: enable
$GLOBALS['TYPO3_CONF_VARS']['FE']['cHashIncludePageId'] = true;
and you are probably "done". Of course all old Google hits will be broken, but on a "public" site it's quite probable you never cared about Google anyway if you didn't run realurl (or something similar).
If you use realurl 1.x on TYPO3 6.2:
Don't enable the config (there'll probably never be a proper patch)
Two options:
take the risk of a DDoS
use the 1.x version from https://github.com/mogic-le/typo3-realurl
If I understand it correctly, it will set TYPO3 to no_cache mode if there is no hit in the caching table; while that is a performance issue, it will prevent cache table entries from being created (as a side effect)
If you run 7.6+ and realurl 2:
Wait for realurl 2.1 (and take the risk?)
Change the caching framework to something like memcached, see the sketch after this list (it's somewhat suggested between the lines: if you have a caching backend that cannot be used for a DDoS, you don't really have to care)
Use the fork from helhum (though I think that won't help you one bit regarding old links)
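A rough sketch of what the memcached switch could look like in typo3conf/AdditionalConfiguration.php (the server address and the choice of cache are assumptions on my part; cache_hash can be moved the same way):
// move the page cache out of the database so flooded entries don't pile up in cf_cache_pages
$GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations']['cache_pages']['backend'] =
    \TYPO3\CMS\Core\Cache\Backend\MemcachedBackend::class;
$GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations']['cache_pages']['options'] = [
    'servers' => ['localhost:11211'],
];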
realurl >= 2.1.0 supports this core option, but it is recommended to update to at least 2.1.4 because that fixes various other cHash issues.
We updated our Sitecore CMS from version 6.3 to 6.6 SP2. This Sitecore version has the Intranet Module installed. Everything is working fine, but the Lucene Search doesn't seem to work properly.
There are two indexes defined. One for the whole content tree and one for the media library. The search only delivers results with media items (images, PDFs), but no pages. With the tool Luke I'm able to look into the indexes and I see the items there. But they are not in the search results on the website anymore.
I rebuilt the search indexes using the Sitecore Control Panel, but that didn't help.
As I said, it was working fine on Sitecore 6.3, but not on the updated 6.6 SP2.
Any idea what could be the problem?
Thanks in advance :)
Here is a blog post about Troubleshooting Sitecore Lucene search and indexing.
In short:
Check if items are indexed correctly using Luke.
Check if a MatchAll query returns page items:
var pageItems = SearchManager.GetIndex("your_index_name").CreateSearchContext()
    .Search(new MatchAllDocsQuery(), int.MaxValue)
    .FetchResults(0, int.MaxValue).Select(r => r.GetObject<Item>());
Check included templates:
<include hint="list:IncludeTemplate">
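If in doubt, the list should contain the template GUIDs of your page templates; a rough sketch (the child element name and GUID are placeholders, not from the actual setup):
<include hint="list:IncludeTemplate">
  <!-- one entry per template that should be indexed; element names are arbitrary -->
  <contentPage>{11111111-1111-1111-1111-111111111111}</contentPage>
</include>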
It turned out that 3 fields (_sclsMedia, _sclsSearchable and _scLang) were missing from the content Lucene index, and that was causing the search not to function. So I removed the 3 fields from the code in my solution, and now I get search results again.
The question is why those 3 fields were lost during the update from Sitecore 6.3 to 6.6.
I'd like to get a better idea of what people are searching for when they're using our website.
Just curious, what's the best way to monitor what's being entered into the search field in Plone 4? I saw this product — http://plone.org/products/ifsearchmonitor — but it's an old one. Has anyone used it with Plone 4 or know of something similar?
Okay I don't know why it took me so long to realize this, but it's built into Google Analytics. Here are instructions: https://support.google.com/analytics/answer/1012264?hl=en
And the search query parameters I used for Plone are: @@search, SearchableText, advanced_search
Using Google Analytics' site search won't track users using the livesearch (i.e. without pressing enter and submitting to the @@search view).
For awstats I use this extra section to track both:
# updated version for plone 4.3
# /livesearch_reply?q=testsuche
# /@@search?SearchableText=testsuche
# /@@updated_search?SearchableText=testsuche
# livesearches shown as q=, normal searches with just the phrase
ExtraSectionName1="Plone search queries"
ExtraSectionCodeFilter1="200 304"
ExtraSectionCondition1="URL,\/@@search||URL,\/search||URL,\/@@updated_search||URL,\/livesearch_reply"
ExtraSectionFirstColumnTitle1="Search:"
#ExtraSectionFirstColumnValues1="QUERY_STRING,SearchableText=([^&]+)||QUERY_STRING,q=([^&]+)"
ExtraSectionFirstColumnValues1="QUERY_STRING,SearchableText=([^&]+)||QUERY_STRING,(q=[^&]+)"
ExtraSectionFirstColumnFormat1="%s"
ExtraSectionStatTypes1=PL
ExtraSectionAddAverageRow1=0
ExtraSectionAddSumRow1=1
MaxNbOfExtra1=100
MinHitExtra1=1
If you want to track the livesearch in Google Analytics, you'll need to use event tracking: https://developers.google.com/analytics/devguides/collection/analyticsjs/events
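A rough sketch of what that could look like with analytics.js, assuming jQuery is available and the livesearch input still has the stock Plone 4 id searchGadget:
// send a GA event once the visitor pauses typing in the livesearch box
var liveSearchTimer;
jQuery('#searchGadget').keyup(function () {
    var term = this.value;
    clearTimeout(liveSearchTimer);
    liveSearchTimer = setTimeout(function () {
        if (term.length > 2) {
            ga('send', 'event', 'Search', 'livesearch', term);
        }
    }, 1000);
});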
I was trying to build a 'site search' for a simple HTTP site.
I have a site, let's call it www.mycompany.com, that is pure HTML.
Is there an easy way to use Solr to index the entire site and build a full-text search, with Solr as the engine?
I googled for a bit and could not find anything specific of the type:
Do A
Do B
...
profit!
Let me also know if I am a bit off about what Solr is for :P
Thanks in advance.
Solr is only for indexing and searching text; it does not have a crawler, since that is out of the project's scope.
However, take a look at Nutch, which is a crawler and not too hard to set up initially.
Nutch and Solr can be integrated if you need some Solr-specific feature to search the index.
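Very roughly, with a recent Nutch 1.x and a Solr core already created (the paths, core name and round count are just examples, and the crawl script's arguments differ between Nutch versions):
# seed the crawler with the site's start URL, crawl 2 rounds and index into Solr
mkdir -p urls
echo "http://www.mycompany.com/" > urls/seed.txt
bin/crawl -i -D solr.server.url=http://localhost:8983/solr/corename urls/ crawl/ 2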
$ bin/solr create -c corename
$ bin/post -c corename https://siteurl.com -recursive 2 -delay 1
This would do a basic index of the site, but it would not be the best. If you want simple, then there it is. It can be done.
I think this only works on Solr 5+.
Two other options you might want to look at are Crawl Anywhere and Heritrix.