Perl - basic STDIN issue - linux

I'm new to Perl.
I have the following code, whose purpose is to extract the software name by text parsing.
The software name in this case is "ddd":
print "Please provide full installation path (Ex:/a/b/c/ddd)\n";
my $installPath = <STDIN>;
#going to extract software name
my @soft = split '/', $installPath;
my $softName = print "@soft[4]\n";
print "$softName\n";
But instead of getting "ddd" as the software name, I get:
ddd
1
I don't understand where the '1' comes from.
Thanks for the help.

The error comes from this:
my $softName = print "@soft[4]\n";
#              ^^^^^
The function print returns 1 (true) when it succeeds, which it does here. The 1 is assigned to your variable, which you then print.
print "$softName\n";
Short recap:
my $installPath = <STDIN>;            # "/a/b/c/ddd"
my @soft = split '/', $installPath;   # 5th element is "ddd"
my $softName = print "@soft[4]\n";    # this prints "ddd", but "1" is returned
#              ^^^^^ print returns 1, which is assigned to $softName
print "$softName\n";                  # "1" is printed
What you want is:
my $softName = $soft[4];
This simply takes the 5th element of the array. Use $, not @, when referring to a single element; use @ when referring to a slice (multiple elements).
A better way to do what you are trying to do is using File::Basename:
use File::Basename;
my $softName = basename($installPath);
File::Basename is a core module in Perl 5.

my $softName = print "@soft[4]\n"; is a bad way of treating an array, and this is what is causing the issue.
When referencing an array as a whole, @ should be used. By writing @soft[4] you do point at a particular value in the array, but you are still referring to it in list context (it is a one-element slice). Since $softName is a scalar that wants a single value, Perl tries its best to figure out what you meant. To make it clear to Perl that you are referencing a specific item in the array and not the array as a whole, use $ instead. Perl will understand because you also specify [4].
In addition, what is being assigned to $softName is not that array value, but the result of the print which is the status code (this is where the "1" comes from).
To correct your code, change that line to:
my $softName = $soft[4];

Related

What did I do wrong

import sys
super_heroes = {'Iron Man' : 'Tony Stark',
'Superman' : 'Clark Kent',
'Batman' : 'Bruce Wayne',
}
print ('Who is your favorite Superhero?')
name = sys.stdin.readline()
print ('Do you know that his real name is', super_heroes.get(name))
I'm writing a simple program that should read an input, look it up in a dictionary, and print the result after a string, but when run it prints:
"
Who is your favorite Superhero?
Iron Man
Do you know that his real name is None
"
even though the input is in my dictionary.
Your input has a newline at the end of the line.
Try the following to resolve it:
name = sys.stdin.readline().strip()
sys.stdin.readline() returns the input including the trailing newline character, which is not what you expect. Replace sys.stdin.readline() with input() (or raw_input() in Python 2), which are more Pythonic ways to read a value from the user and do not include the newline character.
In Python 2, raw_input() is preferable because it always returns a string, whereas input() evaluates what the user types. In Python 3, raw_input() no longer exists and input() always returns a string.
To go a little bit further, you can then add a test if name in super_heroes: to perform specific actions when the favorite superhero name is not in your dictionary (instead of printing None). Here is an example:
super_heroes = {'Iron Man' : 'Tony Stark',
                'Superman' : 'Clark Kent',
                'Batman' : 'Bruce Wayne',
                }
print ('Who is your favorite Superhero?')
name = input()  # use raw_input() in Python 2
if name in super_heroes:
    print ('Do you know that his real name is', super_heroes[name], '?')
else:
    print ('I do not know this superhero...')
sys.stdin.readline() keeps the line break at the end of the user input, so you may want to remove it before looking up your superhero:
name = name.replace('\n','')
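The difference between the raw and the stripped input can be checked without typing anything. The following is a minimal sketch that simulates the keyboard input with io.StringIO (an assumption made for illustration; the code in the question reads from the real sys.stdin). The names fake_stdin, raw_name, and clean_name are hypothetical.

```python
import io

super_heroes = {
    'Iron Man': 'Tony Stark',
    'Superman': 'Clark Kent',
    'Batman': 'Bruce Wayne',
}

# Stand-in for sys.stdin (assumption): the user typed "Iron Man" and pressed Enter.
fake_stdin = io.StringIO('Iron Man\n')

raw_name = fake_stdin.readline()   # keeps the trailing newline
clean_name = raw_name.strip()      # surrounding whitespace, including '\n', removed

raw_lookup = super_heroes.get(raw_name)      # the key 'Iron Man\n' does not exist
clean_lookup = super_heroes.get(clean_name)  # the key 'Iron Man' exists

print(repr(raw_name), '->', raw_lookup)
print(repr(clean_name), '->', clean_lookup)
```

Printing with repr() makes the hidden newline visible, which is usually the quickest way to spot this kind of mismatch.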

set function with file- python3

I have a text file with the content given below:
Credit
Debit
21/12/2017
09:10:00
I have written Python code to convert the text into a set and discard \n.
with open('text_file_name', 'r') as file1:
    same = set(file1)
print (same)
print (same.discard('\n'))
For the first print statement, print (same), I get the correct result:
{'Credit\n','Debit\n','21/12/2017\n','09:10:00\n'}
But for the second print statement, print (same.discard('\n')), I get None.
Can anybody help me figure out why I am getting None? I am using same.discard('\n') to discard \n in the set.
Note:
I am trying to understand the discard function with respect to set.
The discard method only removes an element that is equal to its argument. Since your set doesn't contain an element that is just \n, there is nothing to discard. What you are looking for is a map that strips the \n from each element, like so:
set(map(lambda x: x.rstrip('\n'), same))
which will return {'Credit', 'Debit', '09:10:00', '21/12/2017'} as the set. This sample works by using the map builtin, which applies its first argument to each element in the set. The first argument in our map usage is lambda x: x.rstrip('\n'), which simply removes any occurrences of \n on the right-hand side of each string.
discard removes the given element from the set only if it is present in it.
In addition, the method doesn't return any value (so printing its result shows None), because it changes the set it was called on in place.
with open('text_file_name', 'r') as file1:
    same = set(file1)
print (same)
# Drop the last character (the newline) from every element that ends with one.
same = {elem[:-1] for elem in same if elem.endswith('\n')}
print (same)
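Both points can be checked directly. The following is a minimal sketch that uses a hard-coded set shaped like the one in the question (an assumption, so that no file is needed); the variable names are hypothetical.

```python
# Hypothetical set, shaped like the one read from the file in the question.
same = {'Credit\n', 'Debit\n', '21/12/2017\n', '09:10:00\n'}

# No element is equal to '\n', so discard() changes nothing and returns None.
result_missing = same.discard('\n')
size_after_missing = len(same)

# An element that is present is removed in place; the return value is still None.
result_present = same.discard('Debit\n')
size_after_present = len(same)

# Stripping the newline from every element requires building a new set.
stripped = {elem.rstrip('\n') for elem in same}

print(result_missing, size_after_missing)
print(result_present, size_after_present)
print(sorted(stripped))
```

Because discard() works in place, call it on its own line and print the set afterwards instead of printing the call itself.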
There are 4 elements in the set, and none of them are newline.
It would be more usual to use a list in this case: a list preserves order, while a set is not guaranteed to preserve order and also discards duplicate lines. Perhaps you have your reasons.
You seem to be looking for rstrip('\n'). Consider processing the file in this way:
s = set()  # note: {} would create an empty dict, which has no add()
with open('text_file_name') as file1:
    for line in file1:
        s.add(line.rstrip('\n'))
s.discard('Credit')
print(s) # This displays 3 elements, without trailing newlines.

str.format places last variable first in print

The purpose of this script is to parse a text file (sys.argv[1]), extract certain strings, and print them in columns. I start by printing the header. Then I open the file, and scan through it, line by line. I make sure that the line has a specific start or contains a specific string, then I use regex to extract the specific value.
The matching and extraction work fine.
My final print statement doesn't work properly.
import re
import sys
print("{}\t{}\t{}\t{}\t{}".format("#query", "target", "e-value",
                                  "identity(%)", "score"))
with open(sys.argv[1], 'r') as blastR:
    for line in blastR:
        if line.startswith("Query="):
            queryIDMatch = re.match('Query= (([^ ])+)', line)
            queryID = queryIDMatch.group(1)
            queryID.rstrip
        if line[0] == '>':
            targetMatch = re.match('> (([^ ])+)', line)
            target = targetMatch.group(1)
            target.rstrip
        if "Score = " in line:
            eValue = re.search(r'Expect = (([^ ])+)', line)
            trueEvalue = eValue.group(1)
            trueEvalue = trueEvalue[:-1]
            trueEvalue.rstrip()
            print('{0}\t{1}\t{2}'.format(queryID, target, trueEvalue), end='')
The problem occurs when I try to print the columns. When I print the first 2 columns, it works as expected (except that it's still printing new lines):
#query target e-value identity(%) score
YAL002W Paxin1_129011
YAL003W Paxin1_167503
YAL005C Paxin1_162475
YAL005C Paxin1_167442
The 3rd column is a number in scientific notation like 2e-34
But when I add the 3rd column, eValue, it breaks down:
#query target e-value identity(%) score
YAL002W Paxin1_129011
4e-43YAL003W Paxin1_167503
1e-55YAL005C Paxin1_162475
0.0YAL005C Paxin1_167442
0.0YAL005C Paxin1_73182
I have removed all newlines, as far as I know, using the rstrip() method.
At least three problems:
1) queryID.rstrip and target.rstrip are missing the call parentheses ().
2) Something like trueEvalue.rstrip() doesn't mutate the string; you would need
trueEvalue = trueEvalue.rstrip()
if you want to keep the change.
3) This might be a problem, but without seeing your data I can't be 100% sure. The r in rstrip stands for "right". If trueEvalue is 4e-43\n, then trueEvalue.rstrip() would indeed be free of newlines. But your values seem to be something like \n4e-43. If you simply use .strip(), newlines will be removed from both sides.
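Points 2 and 3 can be illustrated in isolation. The following is a minimal sketch using a made-up value with a newline on each side (an assumption; the real data from the question is not shown), and hypothetical variable names.

```python
value = '\n4e-43\n'  # hypothetical value with a newline on both sides

value.rstrip()       # the result is thrown away; strings are immutable
unchanged = value    # still has both newlines

right_only = value.rstrip()  # removes trailing whitespace only
both_sides = value.strip()   # removes leading and trailing whitespace

print(repr(unchanged))
print(repr(right_only))
print(repr(both_sides))
```

The same applies to every str method: the cleaned string only exists if the return value is assigned to a name.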

How do I print a hash in perl?

How do I print $stopwords? It seems to be a string ($) but when I print it I get: "HASH(0x8B694)" with the memory address changing on each run.
I am using Lingua::StopWords and I simply want to print the stop words that it's using so I know for sure what stop words are there. I would like to print these to a file.
Do I need to dereference $stopwords somehow?
Here is the code:
use Lingua::StopWords qw( getStopWords );
open(TEST, ">results_stopwords.txt") or die("Unable to open requested file.");
my $stopwords = getStopWords('en');
print $stopwords;
I've tried:
my @temp = $stopwords;
print "@temp";
But that doesn't work. Help!
Last note: I know there is a list of stop words for Lingua::StopWords, but I am using the English list (en) and I want to make absolutely sure which stop words I am using. That is why I want to print them, ideally to a file; I should already know how to do the file part.
$ doesn't mean string. It means a scalar, which could be a string, number or reference.
$stopwords is a hash reference. To use it as a hash, you would use %$stopwords.
Use Data::Dumper as a quick way to print the contents of a hash (pass by reference):
use Data::Dumper;
...
print Dumper($stopwords);
To dereference a hashref:
%hash = %{$hashref}; # makes a copy
So, to iterate over the keys and values:
while (($key, $value) = each %{$hashref}) {
    print "$key => $value\n";
}
or (less efficient, but useful for didactic purposes):
for $key (keys %{$hashref}) {
    print "$key => $hashref->{$key}\n";
}
Have a look at Data::Printer as a nice alternative to Data::Dumper. It will give you pretty-printed output as well as information on methods which the object provides (if you're printing an object). So, whenever you don't know what you've got:
use Data::Printer;
p( $some_thing );
You'll be surprised at how handy it is.
getStopWords returns a hashref — a reference to a hash — so you would dereference it by prepending %. And you actually only want its keys, not its values (which are all 1), so you would use the keys function. For example:
print "$_\n" foreach keys %$stopwords;
or
print join(' ', keys %$stopwords), "\n";
You can also skip the temporary variable $stopwords, but then you need to wrap the getStopWords call in curly-brackets {...} so Perl can tell what's going on:
print join(' ', keys %{getStopWords('en')}), "\n";

matlab iterative filenames for saving

This question is about MATLAB:
I'm running a loop, and in each iteration a new set of data is produced that I want to save in a new file. I also overwrite old files by changing the name. It looks like this:
name_each_iter = strrep(some_source,'.string.mat','string_new.(j).mat')
What I'm struggling with here is the iteration, so that I obtain the files:
...string_new.1.mat
...string_new.2.mat
etc.
I tried various combinations of () [] {} as well as 'string_new.'j'.mat' (which gave a syntax error).
How can it be done?
Strings are just vectors of characters. So if you want to iteratively create filenames here's an example of how you would do it:
for j = 1:10
    filename = ['string_new.' num2str(j) '.mat'];
    disp(filename)
end
The above code will create the following output:
string_new.1.mat
string_new.2.mat
string_new.3.mat
string_new.4.mat
string_new.5.mat
string_new.6.mat
string_new.7.mat
string_new.8.mat
string_new.9.mat
string_new.10.mat
You could also generate all file names in advance using NUM2STR:
>> filenames = cellstr(num2str((1:10)','string_new.%02d.mat'))
filenames =
'string_new.01.mat'
'string_new.02.mat'
'string_new.03.mat'
'string_new.04.mat'
'string_new.05.mat'
'string_new.06.mat'
'string_new.07.mat'
'string_new.08.mat'
'string_new.09.mat'
'string_new.10.mat'
Now access the cell array contents as filenames{i} in each iteration.
sprintf is very useful for this:
for ii=5:12
    filename = sprintf('data_%02d.mat',ii)
end
This assigns the following strings to filename:
data_05.mat
data_06.mat
data_07.mat
data_08.mat
data_09.mat
data_10.mat
data_11.mat
data_12.mat
Notice the zero padding. sprintf in general is useful if you want parameterized formatted strings.
To create a name based on an already existing file, you can use regexp to detect the '_new.(number).mat' part and change the string depending on what regexp finds:
original_filename = 'data.string.mat';
% Index where '_new.<number>.mat' starts (dots escaped to match literally)
im = regexp(original_filename,'_new\.\d+\.mat');
if isempty(im) % original file, no _new.(j) detected
    newname = [original_filename(1:end-4) '_new.1.mat'];
else
    % '_new.' is 5 characters long, so the number starts at im(end)+5
    num = str2double(original_filename(im(end)+5:end-4));
    newname = sprintf('%s_new.%d.mat',original_filename(1:im(end)-1),num+1);
end
This does exactly that, and produces:
data.string_new.1.mat
data.string_new.2.mat
data.string_new.3.mat
...
data.string_new.9.mat
data.string_new.10.mat
data.string_new.11.mat
when the above code is applied repeatedly to its own output, starting with 'data.string.mat'.

Resources