Using '~' (a.k.a. tilde) in a Snakemake input directive does not seem to work?

This is the first time I've tried to use a "~" in my input.
It works when I convert the "~" to an absolute path.
Proof they are the same file:
(CentOS5-Compatible) [tboyarski@login2 6-bamMetrics]$ ls -lh ~/share/references/rRNA.ensg72.hg19.interval_list
-rw-rw-r-- 1 fcchan users 24K Dec 12 2013 /home/tboyarski/share/references/rRNA.ensg72.hg19.interval_list
(CentOS5-Compatible) [tboyarski@login2 6-bamMetrics]$ ls -lh /genesis/extscratch/clc/references/rRNA.ensg72.hg19.interval_list
-rw-rw-r-- 1 fcchan users 24K Dec 12 2013 /genesis/extscratch/clc/references/rRNA.ensg72.hg19.interval_list
(CentOS5-Compatible) [tboyarski@login2 6-bamMetrics]$ ls -lh /home/tboyarski/share/references/rRNA.ensg72.hg19.interval_list
-rw-rw-r-- 1 fcchan users 24K Dec 12 2013 /home/tboyarski/share/references/rRNA.ensg72.hg19.interval_list
Doesn't work:
rule intervalList:
    input:
        "~/share/references/rRNA.ensg72.hg19.interval_list"
Works:
rule intervalList:
    input:
        "/home/tboyarski/share/references/rRNA.ensg72.hg19.interval_list"
I've only tried using it in the input directive at this time. I might spend a moment to see if it works in an output directive (not something I need, just curious).
Thoughts?
EDIT
@alvits was able to point out that ~ is converted by the shell to ${HOME} before it gets evaluated. It would seem that when providing ~ to Snakemake, this conversion does not occur. An easy alternative is to just always use ${HOME}, which works on both MacOSX and Linux :).
I was able to use the following successfully:
rule intervalList:
    input:
        "${HOME}/share/references/rRNA.ensg72.hg19.interval_list"
Second EDIT
A user pointed out that what I thought was a solution was in fact not one.
The solution is to not use either "~" or "${HOME}" in the input directive.

You may try wrapping the path in os.path.expanduser (available after import os in your snakefile):
On Unix and Windows, return the argument with an initial component of ~ or ~user replaced by that user's home directory.
(Quoted from the documentation: https://docs.python.org/3/library/os.path.html#os.path.expanduser)
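As a concrete sketch (using the path from the question), the wrapping would look like this:

```python
# os.path.expanduser replaces a leading "~" with the user's home directory;
# paths without a leading "~" are returned unchanged.
import os

path = os.path.expanduser("~/share/references/rRNA.ensg72.hg19.interval_list")
```

The expanded path can then be used directly in the input directive.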

I'm going to close this question.
I ended up going with @alvits' suggestion.
I chose it because ${HOME} is what the tilde represents, so it makes sense to provide it in its expanded form.
Wrapping with expanduser, although a possibility, is overkill for what is required.
Thank you everyone for your help!
2017/06/09 - EDIT: Don't use "~" or "${HOME}" in Snakemake inputs.
After seeing the comment by Johannes Köster, I double-checked what I had done, and I was unable to reproduce it.
I'm sorry for the confusion. Thank you so much for catching this error, J.K.
I'm not sure what I had set up before. I ended up changing the way the file is being used, and even with git, I cannot reproduce it.

Related

how to download batch of data with linux command line?

For example I want to download data from:
http://nimbus.cos.uidaho.edu/DATA/OBS/
with the link:
http://nimbus.cos.uidaho.edu/DATA/OBS/pr_1979.nc
to
http://nimbus.cos.uidaho.edu/DATA/OBS/pr_2015.nc
How can I write a script to download all of them? With wget? And how do I loop over the links from 1979 to 2015?
wget can take a file as input which contains one URL per line.
wget -ci url_file
-i : input file
-c : resume functionality
So all you need to do is put the URLs in a file and use that file with wget.
A simple loop like the one in Jeff Puckett II's answer will be sufficient for your particular case, but if you happen to deal with more complex situations (random URLs), this method may come in handy.
Probably something like a for loop iterating over a predefined series.
Untested code:
for i in {1979..2015}; do
    wget http://nimbus.cos.uidaho.edu/DATA/OBS/pr_$i.nc
done
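The same enumeration can be done in Python with only the standard library; the download call is left commented out so this stays a sketch (the URL pattern is taken from the question):

```python
# Build the list of URLs for 1979-2015; range's end is exclusive, hence 2016.
urls = [f"http://nimbus.cos.uidaho.edu/DATA/OBS/pr_{year}.nc"
        for year in range(1979, 2016)]

# To actually download them:
# import urllib.request
# for url in urls:
#     urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])
```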

How can you hide passwords in command line arguments for a process in linux

There is quite a common issue in the unix world: when you start a process with parameters, one of them being sensitive, other users can read it just by executing ps -ef (for example, mysql -u root -psecret_pw).
The most frequent recommendation I found was simply not to do that: never run processes with sensitive parameters; instead, pass this information some other way.
However, I found that some processes have the ability to change their parameter line after they have processed the parameters, looking for example like this in the process list:
xfreerdp -decorations /w:1903 /h:1119 /kbd:0x00000409 /d:HCG /u:petr.bena /parent-window:54526138 /bpp:24 /audio-mode: /drive:media /media /network:lan /rfx /cert-ignore /clipboard /port:3389 /v:cz-bw47.hcg.homecredit.net /p:********
Note the /p:******** parameter, where the password was removed somehow.
How can I do that? Is it possible for a process in linux to alter the argument list it received? I assume that simply overwriting the char **argv I get in the main() function wouldn't do the trick. I suppose that maybe changing some files in the /proc pseudo-filesystem might work?
"Hiding" like this does not work. At the end of the day, there is a time window in which your password is perfectly visible, so this is a total non-starter, even if it is not completely useless.
The way to go is to pass the password in an environment variable.
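As a minimal sketch of that approach (the variable name MYAPP_PASSWORD is made up for illustration):

```python
import os

def get_password(var="MYAPP_PASSWORD"):  # hypothetical variable name
    """Read the secret from the environment instead of argv, so that it
    never shows up in `ps -ef` output."""
    pw = os.environ.get(var)
    if pw is None:
        raise RuntimeError(f"{var} is not set")
    return pw
```

The caller exports the variable before starting the process, e.g. MYAPP_PASSWORD=... ./myapp. Note that environment variables are still readable via /proc/&lt;pid&gt;/environ by the same user and root, so this narrows, rather than eliminates, exposure.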

Lua pattern: exclusion doesn't work at the end of a string

Maybe exclusion is not the correct term, but I'm talking about using the following in Lua's string.find() function:
[^exclude]
It doesn't seem to work if the character is followed by nothing, i.e. it's the last character in the string.
More specifically, I'm getting a list of running processes and attempting to parse them internally with Lua.
root@OpenWrt:/# ps | grep mpd
5427 root 21620 S mpd /etc/mpd2.conf
5437 root 25660 S mpd
This wouldn't be an issue if I could expect a \n every time, but sometimes ps doesn't list itself, which creates this issue. I want to match:
5437 root 25660 S mpd
From this I will extract the PID for a kill command. I'm running an OpenWRT build that doesn't support regex or exact options on killall otherwise I'd just do that.
(%d+ root%s+%d+ S%s+mpd[^ ])
The above pattern does not work, unfortunately. I believe it's because there is no character after the last character of the last line. I have also tried these:
(%d+ root%s+%d+ S%s+mpd$)
The above pattern returns nil.
(%d+ root%s+%d+ S%s+mpd[^ ]?)
The above pattern returns the first process (5427).
Maybe there is a better way to go about this, or just a simple pattern change I can make to get it to work, but I can't seem to find one that will only grab the right process. I can't go off PID or VSZ since they are variable. Maybe I'll have to see if I can compile OpenWRT with better killall support.
Anyways, thanks for taking the time to read this, and if this is a duplicate I'm sorry but I couldn't find anything similar to my predicament. Any suggestions are greatly appreciated!
Given:
local s = [[5427 root 21620 S mpd /etc/mpd2.conf
5437 root 25660 S mpd]]
The following pattern
string.match(s,"(%d+)%s+root%s+%d+%s+S%s+mpd[%s]-$")
matches the second line and, since only (%d+) is captured, returns:
5437
whereas this:
string.match(s,"(%d+%s+root%s+%d+%s+S%s+mpd[%s]%p?[%w%p]+)")
returns:
5427 root 21620 S mpd /etc/mpd2.conf

Automated ACL Check in shell script

I'd like to get some ideas from you on how to implement this. Let me explain my problem a little:
Scenario:
We have a system that must have some specific ACLs set in order to run. So, before starting it, it would be great if I could run a sort of pre-check to verify that everything was set correctly.
Goal:
Create a script that checks those ACLs before starting the system, alerting if one of them is wrong, based on a list of files/folders and their ACLs.
Problems:
Since the getfacl result is not a simple return value, the only way I found to do such a check was to parse the result and analyse each piece of it, which is not as elegant as I'd like.
I doubt many of you have had to check ACLs like this, but I'm sure you can contribute to my cause :)
Thanks everybody in advance.
How about using the Python module pylibacl?
>>> import posix1e
>>> acl1 = posix1e.ACL(file="file1.txt")
>>> print(acl1)
user::rw-
group::r--
other::r--
Since the getfacl result is not a simple return value, the only way I found to do such a check was to parse the result and analyse each piece of it, which is not as elegant as I'd like.
What exactly are you trying to do? If you're just comparing the result of calling getfacl to a desired ACL, it should be easy. For example, assuming that you have stored your desired ACL in a file named acl-i-want, you could do something like this:
getfacl /path > acl-i-have
if ! diff -q acl-i-have acl-i-want; then
    echo "ACLs are different."
fi
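If you'd rather do the comparison in Python, here is a small sketch that treats getfacl output as an order-insensitive set of entries (the helper names are made up):

```python
# Parse getfacl-style text into a set of ACL entries, skipping blank
# lines and '#' comment lines, so that entry order doesn't matter.
def parse_acl(text):
    return {line.strip() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")}

def acls_match(have, want):
    return parse_acl(have) == parse_acl(want)
```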

Compare two websites and see if they are "equal"?

We are migrating web servers, and it would be nice to have an automated way to check some of the basic site structure to see if the rendered pages are the same on the new server as on the old server. I was just wondering if anyone knew of anything to assist in this task?
Get the formatted output of both sites (here we use w3m, but lynx can also work):
w3m -dump http://google.com 2>/dev/null > /tmp/1.html
w3m -dump http://google.de 2>/dev/null > /tmp/2.html
Then use wdiff; it can give you a percentage of how similar the two texts are.
wdiff -nis /tmp/1.html /tmp/2.html
It can also be easier to see the differences using colordiff.
wdiff -nis /tmp/1.html /tmp/2.html | colordiff
Excerpt of output:
Web Images Vidéos Maps [-Actualités-] Livres {+Traduction+} Gmail plus »
[-iGoogle |-]
Paramètres | Connexion
Google [hp1] [hp2]
[hp3] [-Français-] {+Deutschland+}
[ ] Recherche
avancéeOutils
[Recherche Google][J'ai de la chance] linguistiques
/tmp/1.html: 43 words 39 90% common 3 6% deleted 1 2% changed
/tmp/2.html: 49 words 39 79% common 9 18% inserted 1 2% changed
(He actually put google.com into French... funny.)
The common % values show how similar the two texts are. Plus you can easily see the differences by word (instead of by line, which can be cluttered).
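If you'd rather compute a similarity figure programmatically, Python's difflib can produce a word-level ratio comparable to wdiff's common % (the function name is made up for this sketch):

```python
import difflib

def similarity(a, b):
    # SequenceMatcher.ratio() returns a float in [0, 1]; comparing word
    # lists (rather than raw strings) mirrors wdiff's word-level view.
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()
```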
The catch is how to check the 'rendered' pages. If the pages don't have any dynamic content, the easiest way is to generate hashes for the files using the md5 or sha1 commands and check them against the new server.
If the pages have dynamic content, you will have to download the site using a tool like wget
wget --mirror http://thewebsite/thepages
and then use diff as suggested by Warner, or do the hash thing again. I think diff may be the best way to go, since even a change of one character will mess up the hash.
I've created the following PHP code that does what Weboide suggests here. Thanks Weboide!
the paste is here:
http://pastebin.com/0V7sVNEq
Using the open source tool recheck-web (https://github.com/retest/recheck-web), there are two possibilities:
Create a Selenium test that checks all of your URLs on the old server, creating Golden Masters. Then run that test on the new server and see how they differ.
Use the free and open source Chrome extension (https://github.com/retest/recheck-web-chrome-extension), which internally uses recheck-web to do the same: https://chrome.google.com/webstore/detail/recheck-web-demo/ifbcdobnjihilgldbjeomakdaejhplii
For both solutions, you currently need to list all relevant URLs manually. In most situations this shouldn't be a big problem. recheck-web will compare the rendered websites and show you exactly where they differ (e.g. different font, different meta tags, even different link URLs). And it gives you powerful filters to let you focus on what is relevant to you.
Disclaimer: I have helped create recheck-web.
Copy the files to the same server in /tmp/directory1 and /tmp/directory2 and run the following command:
diff -r /tmp/directory1 /tmp/directory2
For all intents and purposes, you can put them in your preferred location with your preferred naming convention.
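The same recursive comparison can be scripted with Python's standard-library filecmp module; a sketch assuming the two directory trees above:

```python
import filecmp
import os

def trees_equal(a, b):
    """Recursively compare two directory trees by name and file content."""
    cmp = filecmp.dircmp(a, b)
    if cmp.left_only or cmp.right_only or cmp.diff_files or cmp.funny_files:
        return False
    # Recurse into subdirectories present on both sides.
    return all(trees_equal(os.path.join(a, d), os.path.join(b, d))
               for d in cmp.common_dirs)
```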
Edit 1
You could potentially use lynx -dump or wget and run a diff on the results.
Short of rendering each page, taking screen captures, and comparing those screenshots, I don't think it's possible to compare the rendered pages.
However, it is certainly possible to compare the downloaded website after downloading recursively with wget.
wget [option]... [URL]...
-m
--mirror
Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
The next step would then be to do the recursive diff that Warner recommended.
