TYPO3: indexed_search - index all pages without visiting first

Normally, indexed_search only indexes pages that have already been visited. If no user has visited a page, there are no search results for it. Is there a way to index all pages without visiting them first, or do I have to use another extension?
TYPO3: v4.2.8
indexed_search: v2.11.1

Use the crawler extension.
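The crawler extension builds a queue of pages and indexes them from the command line, so nothing depends on visitors. The cron side could look something like this (a sketch from the 4.x-era setup; the PHP and site paths are placeholders, and the extension requires a _cli_crawler backend user, so check the crawler manual for your install):

# Process the crawler queue every 15 minutes via TYPO3's CLI dispatcher.
*/15 * * * * /usr/bin/php /var/www/site/typo3/cli_dispatch.phpsh crawler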

StephenKing's answer would be the preferred solution. A less sophisticated method is setting up a cron job that calls wget to crawl the pages at a defined interval:
0 * * * * wget -nv -nc -nd -np -r -l0 -P/var/tmp/ -erobots=off --accept=htm,html,php --wait=1 --delete-after http://www.example.com/ >/dev/null 2>&1
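For reference, a commented version of the same call, wrapped in a small script so the crontab line stays short (a sketch; the script name and location are arbitrary):

#!/bin/sh
# reindex.sh: crawl the site so indexed_search sees every page.
# -nv            terse logging
# -nc            skip files that already exist locally
# -nd            do not create a local directory hierarchy
# -np            never ascend to the parent directory
# -r -l0         recurse with no depth limit
# -P/var/tmp/    store temporary files under /var/tmp
# -erobots=off   ignore robots.txt
# --accept       fetch only page-like file types
# --wait=1       pause one second between requests
# --delete-after delete each file once it has been fetched
wget -nv -nc -nd -np -r -l0 -P/var/tmp/ -erobots=off \
     --accept=htm,html,php --wait=1 --delete-after \
     "http://www.example.com/"

Make it executable with chmod +x and point the crontab entry at it: 0 * * * * /usr/local/bin/reindex.sh >/dev/null 2>&1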

Related

How to suppress cron email feedback?

I have a cron every two minutes (*/2 * * * *) firing the following command...
wget "http://www.example.com/wp-cron.php?import_key=my_key_stringimport_id=16&action=trigger"
Trouble is, it is emailing me every two minutes, and also creating copious tiny files on the server, one each time.
I have tried several things. I know there is plenty of info out there about suppressing email feedback from cron.
cPanel's Cron page, where my crons are set, makes clear: "If you do not want an email to be sent for an individual cron job, you can redirect the command’s output to /dev/null. For example: mycommand >/dev/null 2>&1"
But when I did it like this...
wget -O "http://www.example.com/wp-cron.php?import_key=my_key_stringimport_id=16&action=trigger" >/dev/null 2>&1
... the cron stopped functioning.
(I believed the -O was necessary to redirect the output.)
What is the proper way to formulate this?
To suppress mail from cron, you can add a MAILTO line above the entry in your crontab:
MAILTO=""
*/2 * * * * command
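Applied to the command from the question, the whole crontab stanza would look something like this (a sketch; -q silences wget's log, and -O /dev/null discards the downloaded body so no files accumulate):

MAILTO=""
*/2 * * * * wget -q -O /dev/null "http://www.example.com/wp-cron.php?import_key=my_key_string&import_id=16&action=trigger"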
This seems to do the trick...
wget --quiet -O /dev/null "http://www.example.com/wp-cron.php?import_key=my_key_string&import_id=16&action=trigger"
i.e. add --quiet, and give -O a file name such as /dev/null (with -O followed directly by the URL, wget treats the URL as the output file, is left with no URL to fetch, and fails, which is why the earlier attempt stopped working).
Answer found elsewhere on Stack Overflow.
Bit confused how --quiet and -O co-exist. (They are independent: --quiet silences wget's log messages, while -O only controls where the downloaded body is written.)

Crontab doesn't work with wget

I set the following crontab while logged in as the root user.
* * * * * wget -q --spider http://mytargeturl.com/my/url/
The code is on the same server but owned by another user (and I couldn't set a crontab for that user). I have to request the page with wget because of the complexity of the MVC link system.
When I run:
crontab -l -u root
I can see this crontab setting.
What could be the reason that the crontab doesn't work?
Thanks.
Your syntax looks fine and this should work. Check /var/log/cron to make sure that it is indeed running, and if so, consider logging the command's output to a file and then inspect the file to pinpoint where the problem may be.
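For example, something like this (a sketch; the log path is an arbitrary choice), with -q dropped so errors are visible:

* * * * * wget --spider http://mytargeturl.com/my/url/ >> /var/log/wget-cron.log 2>&1

If the log file never appears, cron is not executing the job at all; if it appears and contains an error, wget runs but the request itself is failing.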

Status: 301 Moved Permanently ActiveCollab

I tried to set up a cron job on my server for ActiveCollab.
I use this:
*/5 * * * * php "/home/bbb/public_html/tasks/frequently.php" RnuFA > /dev/null
but it always returns this error message:
Status: 301 Moved Permanently
Location: https://mywebsite.com/
Content-type: text/html
I've tried to execute the command through SSH and it worked properly.
Can someone tell me what configuration on my server needs to be checked for this kind of issue?
Thank you
The official recommendation is to use cURL to trigger scheduled tasks, not PHP executed from the command line. Currently it is just a recommendation, but upcoming releases will stop shipping the /tasks folder, so you will have to use cURL.
There are many environments (more than we expected) where the web server uses one PHP to prepare pages and a different PHP runs via the command line interface (CLI). This causes all sorts of problems, so we decided to support only one way of triggering tasks: via URL.
Bottom line: use cURL. The documentation is here:
https://activecollab.com/help/books/self-hosted-edition/scheduled-tasks-setup.html
Here are sample commands:
*/3 * * * * /usr/bin/curl -s -L "http://url/of/frequently?code=XyZty" > /dev/null
0 * * * * /usr/bin/curl -s -L "http://url/of/hourly?code=XyZty" > /dev/null
0 12 * * * /usr/bin/curl -s -L "http://url/of/daily?code=XyZty" > /dev/null
0 7 * * * /usr/bin/curl -s -L "http://url/of/paper?code=XyZty" > /dev/null
but make sure to check the Administration > Scheduled Tasks page of your ActiveCollab for the exact URLs that you need to trigger.
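Incidentally, the Status: 301 in the question suggests the server redirects http:// to https://; the -L flag in the sample commands above handles that by following the redirect. As a quick check from a shell (a sketch, reusing the placeholder URL from the samples):

# Follow redirects, discard the body, print the final status code and URL.
curl -s -L -o /dev/null -w '%{http_code} %{url_effective}\n' "http://url/of/frequently?code=XyZty"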

Cron output to nothing

I've noticed that my cron jobs are creating index.html files on my server. The command I'm using is wget http://www.example.com 2>&1. I've also tried including --reject "index.html*".
How can I prevent the output from creating index.html files?
--2013-07-21 16:03:01-- http://www.example.com
Resolving example.com... 192.0.43.10
Connecting to www.example.com|192.0.43.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]
Saving to: `index.html.9'
0K 0.00 =0s
2013-07-21 16:03:03 (0.00 B/s) - `index.html.9' saved [0/0]
Normally, the whole point of running wget is to create an output file. A URL like http://www.example.com typically resolves to http://www.example.com/index.html, so by creating index.html, the wget command is just doing its job.
If you want to run wget and discard the downloaded file, you can use:
wget -q -O /dev/null http://www.example.com
Here, -q suppresses wget's log messages (equivalently, -o /dev/null would send the log to /dev/null); -O /dev/null discards the downloaded file.
If you want to be sure that anything wget writes to stdout or stderr is discarded:
wget -q -O /dev/null http://www.example.com >/dev/null 2>&1
In a comment, you say that you're using the wget command to "trigger items on your cron controller" using CodeIgniter. I'm not familiar with CodeIgniter, but downloading and discarding an HTML file seems inefficient. I suspect (and hope) that there's a cleaner way to do whatever you're trying to do.
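One such cleaner way, if the CodeIgniter app lives on the same machine as the cron daemon, would be CodeIgniter's CLI mode, which calls a controller method directly with no HTTP request and no file to discard. A sketch, assuming a hypothetical controller named cron with a method run (the path and names are placeholders):

# Invoke the cron controller's run() method directly through PHP's CLI.
0 * * * * /usr/bin/php /var/www/myapp/index.php cron run >/dev/null 2>&1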

Using wget in a crontab to run a PHP script

I set up a cron job on my Ubuntu server. Basically, I just want this job to call a PHP page on another server. This PHP page will then clean up some stuff in a database. So I thought it was a good idea to call this page with wget and then send the result to /dev/null, because I don't care about the output of this page at all; I just want it to do its database cleaning job.
So here is my crontab:
0 0 * * * /usr/bin/wget -q --post-data 'pass=mypassword' http://www.mywebsite.com/myscript.php > /dev/null 2>&1
(I post a password to make sure no one but me can run the script.) It works like a charm, except that each time it runs, wget writes an empty page into my user directory: the result of downloading the PHP page.
I don't understand why the result isn't sent to /dev/null. Any idea what the problem is here?
Thank you very much!
What wget writes to STDOUT is its log: connection attempts, download progress, and so on. The downloaded file itself is saved to disk, which is why redirecting STDOUT doesn't stop the empty pages from appearing.
If you don't want it to store the saved file, use the -O file parameter:
/usr/bin/wget -q -O /dev/null --post-data 'pass=mypassword' http://www.mywebsite.com/myscript.php > /dev/null 2>&1
Check out the wget manpage. You'll also find the -q option for completely disabling output to STDOUT (but of course, redirecting the output as you do works too).
wget -O /dev/null ....
should do the trick
You can mute wget's log output with the --quiet option:
wget --quiet http://example.com
Note that --quiet only silences the log; it does not stop wget from saving the downloaded file, so combine it with -O /dev/null if you want to discard the file as well.
