Ruby script extracts wrong value when executed with crontab - linux

OS: Amazon Linux
I have a Ruby script that connects to a site and uses an XPath query to find the div block containing the stats counter I want to parse.
Then it compares the number from the site with the current value in the database; if the number has increased, it sends me an email.
The problem is that when I run the script from the current directory, it works.
The script parses the block of text which contains a value.
I extract the value with Regex like this (/\d/)
...
But when the script is executed by crontab, it gets some strange value like
...041704300440043504330438044104420440043804400430432043004304304304304304404370430432043004420435043043504390447043504400435043704320430044804430430...
I don't know how to debug it because the script works when I run it manually, but produces this strange value when executed by crontab.
The text on the site is Russian, encoded with Windows-1251.
Maybe there is something wrong with that.
I have set # encoding: utf-8 in the .rb file.

That could be an environment problem, which could include bad paths, etc. You can compare your ENV from the command-line to the environment when launched by crontab.
Try:
ruby -rpp -e 'pp ENV' > /tmp/crontab_env.out
from crontab, then:
ruby -rpp -e 'pp ENV' > /tmp/cmd_env.out
from the command-line, then:
vimdiff /tmp/*env.out
or use a regular editor.
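If the diff is noisy, it can help to reduce both dumps to variable names first. A minimal sketch, assuming the dumps were made with plain env (one NAME=value line per variable, no multi-line values) rather than with the pp ENV command above; the function name names_only_in and the file names in the comments are made up for illustration:

```shell
#!/bin/sh
# names_only_in A B: print variable names that appear in env dump A but not in B.
# Assumes both files contain one NAME=value line per variable (output of `env`).
names_only_in() {
    a_names=$(mktemp) || return 1
    b_names=$(mktemp) || return 1
    cut -d= -f1 "$1" | sort -u > "$a_names"
    cut -d= -f1 "$2" | sort -u > "$b_names"
    comm -23 "$a_names" "$b_names"   # keep column 1 only: names unique to A
    rm -f "$a_names" "$b_names"
}

# Example usage (paths are placeholders):
#   env > /tmp/cmd_env.out                     # from the command line
#   * * * * * env > /tmp/crontab_env.out       # temporary crontab entry
#   names_only_in /tmp/cmd_env.out /tmp/crontab_env.out
```

Anything this prints is set in your interactive shell but missing under cron, which makes it a candidate for the variable the script silently depends on.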

If you're using RVM, note that it is typically only available to interactive shells. There's a whole section in the RVM manual dedicated to this topic: RVM: Ruby Version Manager - Using Cron with RVM
It could be that this is simply a problem of the wrong Ruby version, including its Gems, being used. Try removing the hashbang line in your script, and calling it like this in your crontab:
1 0 * * * /usr/local/rvm/bin/ruby-1.9.3-p362 /path/to/script.rb
This should make sure the proper environment is loaded with the Ruby binary.
If the actual problem is that RVM isn't even available for non-interactive scripts, you could also go one step further and do what your shell does when it's loading RVM (scroll to the right, this is a long line):
1 0 * * * /bin/bash -l -c 'source "$HOME/.rvm/scripts/rvm" && rvm use 1.9.3-p362 && ruby /path/to/script.rb'

Problems with cron jobs are often caused by having the wrong environment. The script probably depends on an environment variable that's set when you start an interactive shell (through ~/.profile, ~/.bashrc or similar), but not when your program is started directly, by cron.
Get a list of environment variables and their current values by typing env. Add a cron job that simply runs env. Compare the outputs and chip away until you find the culprit.
I'd say LANG and friends are a good place to start. Get a list of language and encoding-related environment variables by typing locale.
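Rather than waiting for cron to fire, you can approximate its environment by hand with env -i. This is only a sketch: the set of variables passed through below is an assumption (cron implementations differ in what they provide), and run_like_cron is a made-up helper name:

```shell
#!/bin/sh
# run_like_cron CMD: run CMD under /bin/sh with an almost empty environment,
# roughly what a cron job sees. The variable set below is an assumption;
# real cron implementations differ in what they provide.
run_like_cron() {
    env -i \
        HOME="${HOME:-/}" \
        LOGNAME="$(id -un)" \
        PATH=/usr/bin:/bin \
        SHELL=/bin/sh \
        /bin/sh -c "$1"
}

# Show which locale-related variables survive (none are passed through here):
run_like_cron 'echo "LANG=${LANG:-unset} LC_ALL=${LC_ALL:-unset}"'
```

If the script misbehaves under run_like_cron the same way it does under cron, setting the missing variable explicitly (for example a LANG line at the top of the crontab) is one thing to try; which locale name is right depends on what is installed on the machine.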

Related

Why would one include ". ./.profile" in a crontab entry in Unix?

I want to know the use of . ./.profile whenever we execute cron jobs. I have seen many scripts having this included. The question is, what is the use and what if I don't add it?
Example:
00 1-22 * * 1-5 . ./.profile ; /global/u1/sie/rox/Scripts/Calls.ksh >/dev/null 2>&1
. somefile is the POSIX-compliant equivalent to the bash builtin source: Running source somefile in bash, or . somefile in any POSIX-compliant shell, executes every command inside that script in that existing shell.
In terms of why this is useful in a crontab: cron runs with a very minimal environment -- it may not even have a PATH set, and is unlikely to have many other facilities. If your scripts depend on environment variables being present, it can be necessary to either specify them in the crontab or to source in (that is, execute in the existing shell) a script which defines them.
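The difference between sourcing a file and executing it can be seen in a small experiment. This is just an illustrative sketch; DEMO_VAR and the temporary file are made up for the demonstration:

```shell
#!/bin/sh
# Demonstrates that `. file` runs the file in the current shell,
# while `sh file` runs it in a child process.
unset DEMO_VAR
settings=$(mktemp) || exit 1
echo 'DEMO_VAR=from_file' > "$settings"

sh "$settings"                      # child process: the assignment is lost
after_child=${DEMO_VAR:-unset}

. "$settings"                       # current shell: the assignment persists
after_source=${DEMO_VAR:-unset}

rm -f "$settings"
echo "after sh: $after_child"       # after sh: unset
echo "after dot: $after_source"     # after dot: from_file
```

This is why a crontab entry has to source .profile rather than run it: only sourcing leaves the variables set for the command that follows.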
That said, I advise against this idiom:
.profile is used by login sessions -- sessions with a user interacting with the shell in real-time -- and folks intending to customize their interactive session's behavior are liable to make modifications without keeping scheduled jobs in mind.
It's not obvious by reading your crontab which environment variables ~/.profile will or won't set, and thus difficult to reason about the state of the environment.
Instead, you should set environment variables at the top of your crontab:
PATH=/bin:/usr/bin:/usr/local/bin
VARNAME=VALUE
# ...etc...
0 1-22 * * 1-5 /global/u1/sie/rox/Scripts/Calls.ksh >/dev/null 2>&1
The profile files are the shell startup files: you can add code to them that will run as soon as the shell starts up. ~/.profile is the profile file for Ksh and the Bourne shell, ~/.bash_profile is for Bash, and ~/.login is for Tcsh and Csh.
When a script sources the profile, it's because it needs something from it, e.g. PATH settings or even specific commands that it might not otherwise have access to. In this case, since cron runs in a minimal environment, the script pulls in .profile because it depends on something defined there.

What is the correct execution command for a crontab job?

Set up
I have several bashfiles on my computer which I want to run periodically.
I can run the bashfiles manually in Terminal (Mac OS), e.g. cd'ing myself to the correct folder and subsequently executing,
./France_run.txt
gives the desired result.
Problem
I do not want to run the bashfiles manually.
I've created cronjobs in crontab, e.g.
0 0 * * 2 /Users/mypath/France_run.txt
which should run each Tuesday at 00:00. However, nothing happens.
Am I only referring to the file and missing a 'run this script' command? Or is it something else?
You may be only referring to the file, and it's probably logging an error somewhere (usually /var/log/messages, or in the mail file of the root user, which is disabled by default on Macs).
The thing about running scripts through cron is that it runs under a different environment. When you normally log in to a Bash session, certain environment variables get automatically set, so the system automatically checks for things like a path (locations in the file system where executables can be found). Different Unix like systems handle this situation slightly differently...I can't recall the details of how Macs deal with it, but on some systems, I've had to explicitly provide the full path to, for example, the Bash executable in order to get stuff to work.
The interpreter for such scripts is usually /bin/bash, or /bin/sh, or something like that. So when going through a Bash session, if you call /Users/mypath/France_run.txt and that file is an executable Bash script (e.g. the first line is something like #!/bin/bash and the file's executable bit is set), then the system knows to automatically run something like /bin/bash /Users/mypath/France_run.txt.
In the context of cron, however, you don't automatically get those conveniences, so you may have to spell out just about everything (i.e. specify the full paths to all binaries or executables). Again, this is not always the case. I just looked at a Debian system where I created some cron jobs to run scripts, and I didn't have to call /bin/bash there, but I do recall having to do something like that in the past on a Mac.
So your cron job may just need to specify the full path to the Bash binary:
0 0 * * 2 /bin/bash /Users/mypath/France_run.txt
And if France_run.txt makes any calls to system binaries (like ls), you may need to fully qualify those as well (/bin/ls instead of just ls).
Also, depending on how the script is written, it may even be necessary to cd into the directory of the script, as if you were running it manually:
0 0 * * 2 cd /Users/mypath; /bin/bash ./France_run.txt
(cd is a Bash built-in, so there's no path to specify there)
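An alternative to putting the cd in the crontab line is to let the script change into its own directory. A minimal sketch of what the top of a script like France_run.txt could look like, assuming it is started by path (as in the cron entries above) so that $0 points at the file:

```shell
#!/bin/bash
# Switch to the directory that contains this script, so relative paths
# resolve the same way under cron as when run by hand from that directory.
script_dir=$(cd "$(dirname -- "$0")" && pwd) || exit 1
cd "$script_dir" || exit 1

# ... rest of the job; relative paths are now relative to $script_dir ...
```

With this in place, the crontab entry only needs the full path to the script and does not depend on cron's working directory.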

Import PATH environment variable into Bash script launched with cron

When creating Bash scripts, I have always had a line right at the start defining the PATH environment variable. I recently discovered that this doesn't make the script very portable as the PATH variable is different for different versions of Linux (in my case, I moved the script from Arch Linux to Ubuntu and received errors as various executables weren't in the same places).
Is it possible to copy the PATH environment variable defined by the login shell into the current Bash script?
EDIT:
I see that my question has caused some confusion resulting in some thinking that I want to change the PATH environment variable of the login shell with a bash script, which is the exact opposite of what I want.
This is what I currently have at the top of one of my Bash scripts:
#!/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
# Test if an internet connection is present
wget -O /dev/null google.com
I want to replace that second line with something that copies the value of PATH from the login shell into the script environment:
#!/bin/bash
PATH=$(command that copies value of PATH from login shell)
# Test if an internet connection is present
wget -O /dev/null google.com
EDIT 2: Sorry for the big omission on my part. I forgot to mention that the scripts in question are being run on a schedule through cron. Cron creates its own environment for running the scripts, which does not use the environment variables of the login shell or modify them. I just tried running the following script in cron:
#!/bin/bash
echo $PATH >> /home/user/output.txt
The result is as follows. As you can see, the PATH variable used by cron is different to the login shell:
user@ubuntu_router:~$ cat output.txt
/usr/bin:/bin
user@ubuntu_router:~$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
Don't touch the user's PATH at all unless you have a specific reason. Not doing anything will (basically) accomplish what you ask.
You don't have to do anything to get the user's normal PATH since every process inherits the PATH and all other environment variables automatically.
If you need to add something nonstandard to the PATH, the usual approach is to prepend (or append) the new directory to the user's existing PATH, like so:
PATH=/opt/your/random/dir:$PATH
The environment of cron jobs is pretty close to the system's "default" (for some definition of "default") though interactive shells may generally run with a less constrained environment. But again, the fix for that is to add any missing directories to the current value at the beginning of the script. Adding directories which don't exist on this particular system is harmless, as is introducing duplicate directories.
I've managed to find the answer to my question:
PATH=$PATH:$(sed -n '/PATH=/s/^.*=// ; s/\"//gp' '/etc/environment')
This command will grab the value assigned to PATH by Linux from the environment file and append it to the PATH used by Cron.
I used the following resources to help find the answer:
How to grep for contents after pattern?
https://help.ubuntu.com/community/EnvironmentVariables#System-wide_environment_variables
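Note that the sed one-liner above only prints lines that contain a double quote, so it relies on the PATH line in /etc/environment being quoted, and it would also print any other quoted line in that file. A slightly more defensive sketch, assuming the file has a line starting with PATH= (the helper name path_from_env_file is made up):

```shell
#!/bin/sh
# path_from_env_file FILE: print the value of the last PATH= line in FILE,
# with any double quotes removed. Prints nothing if there is no such line.
path_from_env_file() {
    sed -n 's/^PATH=//p' "$1" | tr -d '"' | tail -n 1
}

# Possible use at the top of a cron-launched script (file location as in the
# answer above; it may not exist on other distributions):
#   extra=$(path_from_env_file /etc/environment)
#   [ -n "$extra" ] && PATH=$PATH:$extra
```

Checking that the result is non-empty before appending avoids leaving a trailing colon in PATH, which the shell treats as the current directory.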

Can the Linux "at" command run in environments other than /bin/sh?

I need a scheduler (for one time only actions) for a site I'm coding (in php), and I had two ideas:
1- Run a php script with crontab and verify against a database of scheduled actions and execute ones that are older than current time.
2- Schedule various tasks with the "at" command.
The second option seems much better and simpler, so that's what I'm trying to do. However, I haven't found a way to tell "at" to run a command using the PHP interpreter, and so far I've been creating a .sh script, which contains a single command, which is to run a file through the php interpreter. That is far from the optimal setting, and I wish I could just execute the php code directly through "at", something like:
at -e php -f /path/to/phpscript time
Is it possible? I haven't found anything about using environments other than bash in either the man or online.
You can start phpscript with a #!/usr/bin/php line (or wherever your php interpreter is installed) and make /path/to/phpscript executable. This is exactly what the #! syntax is for.
Just so it's clear, your phpscript would look like this:
#!/usr/bin/php
...your code goes here
The command you specify to at is executed by /bin/sh, but sh can invoke any command, executed directly or by any specified interpreter.
The following works on my Ubuntu 12.04 system with the bash shell:
$ cat hello.php
#!/usr/bin/php
<?php
echo "Hello, PHP\n";
?>
$ echo "$PWD/hello.php > hello.php.out" | at 16:11
warning: commands will be executed using /bin/sh
job 4 at Sat Aug 25 16:11:00 2012
$ date
Sat Aug 25 16:11:05 PDT 2012
$ cat hello.php.out
Hello, PHP
$
In some cases, you'll have to do some extra work to set environment variables correctly (it's not necessary for this simple case). Quoting the man page:
For both at and batch, commands are read from standard input or the
file specified with the -f option and executed. The working directory,
the environment (except for the variables BASH_VERSINFO, DISPLAY,
EUID, GROUPS, SHELLOPTS, TERM, UID, and _) and the umask are retained
from the time of invocation.
As at is currently implemented as a setuid program, other environment
variables (e.g. LD_LIBRARY_PATH or LD_PRELOAD) are also not exported.
This may change in the future. As a workaround, set these variables
explicitly in your job.
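One way to follow the man page's workaround is to generate the job text so that it sets the variable itself before running the command. This is only a sketch: make_at_job is a made-up helper, /opt/myapp/lib is a placeholder, and it assumes the directory name contains no single quotes:

```shell
#!/bin/sh
# make_at_job LIBDIR COMMAND: print job text that sets LD_LIBRARY_PATH
# explicitly before running COMMAND. Assumes LIBDIR contains no single quotes.
make_at_job() {
    printf "LD_LIBRARY_PATH='%s'\nexport LD_LIBRARY_PATH\n%s\n" "$1" "$2"
}

# Hypothetical usage; /opt/myapp/lib is a placeholder directory:
#   make_at_job /opt/myapp/lib 'php /path/to/phpscript' | at now + 1 minute
```

Because at reads the job from standard input and runs it with /bin/sh, the assignment and export lines are executed inside the job itself, independently of what at chose to carry over.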

Cron does not run from /root

If I run a script from /home/<user>/<dir>/script.sh, as root, the cron works pretty well. But If I run the script from /root/<dir>/script.sh (as root, again), the cron does not seem to work.
Having run afoul of various default $PATHs in the past when using 'cron', I always spell out the full absolute path for each executable file and each target file. I always assume that 'cron' has NO $PATH set and has NO current working directory.
In other words, don't use a command like
"myprocess abc*.txt"
but write it in full, like
"/usr/local/bin/myprocess /home/jvs/abc*.txt".
Alternatively, create a bash script which does the job, and call that bash script with a full absolute path, such as
"/usr/local/bin/myprocess_abc_txts".
If you need to have some flexibility in the script, use environment variables which are set specifically within the bash script you call with 'cron'.
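Such a wrapper could look roughly like the following. It is a sketch, not a drop-in script: run_job is a made-up name, and the PATH value is an assumption that should be replaced with the directories the job actually needs:

```shell
#!/bin/sh
# run_job WORKDIR COMMAND [ARGS...]: run COMMAND from WORKDIR with a fixed PATH.
# The body runs in a subshell, so neither the cd nor the PATH change leaks out.
# The PATH value is an assumption; list the directories your job needs.
run_job() (
    PATH=/usr/local/bin:/usr/bin:/bin
    export PATH
    cd "$1" || exit 1
    shift
    "$@"
)

# Hypothetical call at the end of the wrapper. The glob is quoted so that it
# is expanded by the inner sh, after the cd into /home/jvs:
#   run_job /home/jvs sh -c 'myprocess abc*.txt'
```

The crontab entry then only has to name the wrapper by its absolute path; everything else the job relies on is stated in one place inside the script.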
I think you need to add a little more information. I'd guess it is a permissions thing though. Add the permissions of the file, the directories, and the line in your crontab so we can help. Also, if you are putting this in /root, are you running this in root's crontab?
Remember the environment, especially when the script is run by cron rather than by root. When cron runs something, you probably have very little of your environment set, unlike when you run a command via at. It is also not clear what your current directory will be. So, for commands that will be run by cron, use a script (as you're already doing) and make sure it sets enough of the environment for it to run. And make sure your environment-setting code is not interactive!
On my machines, I have a mechanism such that the cron entry reads (for example):
23 1 * * 1-5 /usr/bin/ksh /work1/jleffler/bin/Cron/weekday
The weekday script in the Cron directory is a link to a standard script that first sets the environment and then runs the command /work1/jleffler/bin/weekday (in this case - it uses the name of the command to determine what to run).
The actual script in the Cron directory is:
: "$Id: runcron.sh,v 2.1 2001/02/27 00:53:22 jleffler Exp $"
#
# Commands to be performed by Cron (no debugging options)
# Set environment -- not done by cron (usually switches HOME)
. $HOME/.cronfile
base=`basename $0`
cmd=${REAL_HOME:-/real/home}/bin/$base
if [ ! -x $cmd ]
then cmd=${HOME}/bin/$base
fi
exec $cmd ${@:+"$@"}
I've been using it a while now - this version since 2001 - and it works a treat for me. I'm using a basic (Sun Solaris 10) implementation of cron; there may be new features in new versions of cron on other platforms to make some of this unnecessary. (The $REAL_HOME stuff is a weirdness of mine; pretend it says $HOME - though that makes some of the script unnecessary for you.) The .cronfile is responsible for the environment setting - it does quite a lot, but that's my problem, not yours.
It could be because the script looks for directories or files by relative path, and these are found when it runs from under /home/ but not from /root: /root is not /home/root, and it is not laid out like a user's home folder under /home/.
Can you check and see if it is looking for relative files, or post the script?
On another note, why don't you just set it to run from a user's homefolder then?
Another way to run a shell script is to place it in the /usr/bin directory and simply run the command bash yourscript.sh, without spelling out the /usr/bin/ directory.

Resources