RPy2 and Bioconductor: gene expression example - rpy2

I'm trying to work out how to do the classic gene expression example in Bioconductor using Rpy2, and I'm stuck right at the beginning. How do you load the data? In R we do:
> library('ALL')
> library('limma')
> data('ALL')
In Python we do:
>>> import rpy2.robjects as robjects
>>> from rpy2.robjects.packages import importr
>>> base = importr('base')
>>> ALL = importr('ALL')
What is the Python equivalent of:
> data('ALL')
I can't find an example in the extension docs that loads package data. I thought the following might be it, but the result seems to have the wrong class: when passed to featureNames, its signature is "character":
>>> data = robjects.r('data(ALL)')
>>> data.rclass
<rpy2.rinterface.SexpVector - Python:0x1004b3828 / R:0x10376d558>
>>> featureNames = robjects.r('featureNames')
>>> featureNames(data)
Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "featureNames", for signature "character"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.6/site-packages/rpy2-2.2.2dev_20110818-py2.6-macosx-10.6-universal.egg/rpy2/robjects/functions.py", line 82, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/Library/Python/2.6/site-packages/rpy2-2.2.2dev_20110818-py2.6-macosx-10.6-universal.egg/rpy2/robjects/functions.py", line 34, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "featureNames", for signature "character"
UPDATE: I think I have it now:
>>> import rpy2.robjects as robjects
>>> from rpy2.robjects.packages import importr
>>> base = importr('base')
>>> ALL = importr('ALL')
>>> data = robjects.r('data(ALL)')
>>> data.rclass
<rpy2.rinterface.SexpVector - Python:0x258b190 / R:0xdf86c8>
>>> data = robjects.globalenv['ALL']
>>> data
<RS4 - Python:0x2591490 / R:0x29a2134>
>>> data.rclass
<rpy2.rinterface.SexpVector - Python:0x258b3b0 / R:0xdf85c8>
>>> featureNames = robjects.r('featureNames')
>>> featureNames(data)
<StrVector - Python:0x23e4f08 / R:0x304fc00>
['1000..., '1001..., '1002..., ..., 'AFFX..., 'AFFX..., 'AFFX...]
>>> exprs = robjects.r['exprs']
>>> e = exprs(data)
>>> e
<Matrix - Python:0x23b2da0 / R:0x84d8000>
[7.597323, 5.046194, 3.900466, ..., 3.095670, 3.342961, 3.842535]
>>>

This problem should be solved in a Python package containing Bioconductor extensions to rpy2.


How to select a row in a pandas DataFrame datetime index using a datetime variable?

I am not a professional programmer at all and am slowly accumulating some experience in Python.
This is the issue I encounter.
On my dev machine I have Python 3.7 installed with pandas version 0.24.4, and
the following sequence worked perfectly fine.
>>> import pandas as pd
>>> df = pd.Series(range(3), index=pd.date_range("2000", freq="D", periods=3))
>>> df
2000-01-01 0
2000-01-02 1
2000-01-03 2
Freq: D, dtype: int64
>>> import datetime
>>> D = datetime.date(2000,1,1)
>>> df[D]
0
In the production environment the pandas version is 1.1.4, and the sequence described no longer works.
>>> df[D]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/core/series.py", line 882, in __getitem__
return self._get_value(key)
File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/core/series.py", line 989, in _get_value
loc = self.index.get_loc(label)
File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 622, in get_loc
raise KeyError(key)
KeyError: datetime.date(2000, 1, 1)
Then, unexpectedly, converting D to a string made the following command work:
>>> df[str(D)]
0
Any idea why this behaviour changed between versions?
Is this behaviour a bug, or is it permanent?
Should I convert every datetime variable used for selection into a string, or is there a more robust way to do this?
It depends on the version. For a more robust solution, use datetime.datetime objects (not datetime.date) to match a DatetimeIndex:
import datetime
D = datetime.datetime(2000,1,1)
print(df[D])
0
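A related option, sketched here as an assumption rather than taken from the answer, is to keep datetime.date values in your code and convert them with pd.Timestamp at lookup time. pd.Timestamp turns a date into midnight of that day, which is how pd.date_range(freq="D") stores its labels. The helper name select_by_date is hypothetical.

```python
import datetime

import pandas as pd

# The same Series as in the question (it is called df there)
df = pd.Series(range(3), index=pd.date_range("2000", freq="D", periods=3))


def select_by_date(series, day):
    """Hypothetical helper: look up a DatetimeIndex label from a datetime.date.

    pd.Timestamp(day) converts the date to a Timestamp at midnight,
    which matches the daily labels produced by pd.date_range.
    """
    return series.loc[pd.Timestamp(day)]


D = datetime.date(2000, 1, 1)
print(select_by_date(df, D))  # 0
```

Converting explicitly keeps the lookup independent of whether a given pandas version accepts bare datetime.date keys.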

AttributeError: module 'readline' has no attribute 'set_completer_delims'

>>> import pdb
>>> x = [1,2,3,4,5]
>>> y = 6
>>> z = 7
>>> r1 = y+z
>>> r1
13
>>> r2 = x+y
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "int") to list
>>> pdb.set_trace()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/pdb.py", line 1585, in set_trace
Pdb().set_trace(sys._getframe().f_back)
File "/usr/lib/python3.6/pdb.py", line 156, in __init__
readline.set_completer_delims(' \t\n`##$%^&*()=+[{]}\\|;:\'",<>?')
AttributeError: module 'readline' has no attribute 'set_completer_delims'
>>>
What is the problem? The error occurs when running Python 3.6.
I am just trying to use pdb on Cygwin.
(Other libraries work fine.)
In my case the problem was fixed by installing pyreadline:
pip install pyreadline
Please try it.
More info: https://github.com/winpython/winpython/issues/544
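The traceback shows pdb calling readline.set_completer_delims on a readline module that lacks it. One way to see which readline your interpreter actually picks up is the diagnostic sketch below. Our assumption (not stated in the answer) is that a missing set_completer_delims means a different or incomplete readline module is being loaded. The helper name describe_readline is hypothetical.

```python
import importlib.util


def describe_readline():
    """Hypothetical diagnostic: report where 'readline' would be loaded from
    and whether it provides set_completer_delims (the call pdb makes)."""
    spec = importlib.util.find_spec("readline")
    if spec is None:
        return None, False  # no readline module available at all
    import readline  # import only after confirming the module exists
    return spec.origin, hasattr(readline, "set_completer_delims")


origin, has_delims = describe_readline()
print("readline loaded from:", origin)
print("has set_completer_delims:", has_delims)
```

If the reported path is not the readline you expect, that module is the likely culprit.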

Unable to get the method 'learn' from Pool.py

I am running the following code:
import sys
sys.path.append('/home/stepfourward/naivebayes/Naive-Bayes/')
from NaiveBayes import *
import os
DClasses = ["python", "java", "hadoop", "django", "datascience", "php"]
base = "learn/"
p = Pool()
for i in DClasses:
p.learn(base + i, i)
The NaiveBayes package contains Pool.py, which defines the method learn():
def learn(self, directory, dclass_name):
"""
directory is a path, where the files of the class with the name dclass_name can be found
"""
x = DocumentClass(self.__vocabulary)
dir = os.listdir(directory)
for file in dir:
d = Document(self.__vocabulary)
print(directory + "/" + file)
d.read_document(directory + "/" + file, learn=True)
x = x + d
self.__document_classes[dclass_name] = x
x.SetNumberOfDocs(len(dir))
But when I call p.learn(base + i, i) as in the code above, I get an AttributeError:
AttributeError: 'Pool' object has no attribute 'learn'
How can I fix this error? Thanks.
Here are the correct steps to use this NaiveBayes library, after you have cloned the repo into a folder named Naive-Bayes (as explained elsewhere):
What you do:
import sys
sys.path.append('Naive-Bayes/') # your own path here
from NaiveBayes import * # NO error here
p = Pool()
produces an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'Pool' is not defined
What you should do:
import sys
sys.path.append('Naive-Bayes/')
from NaiveBayes.Pool import Pool # correct import
p = Pool() # runs OK now
DClasses = ["python", "java", "hadoop", "django", "datascience", "php"]
base = "learn/"
for i in DClasses:
p.learn(base + i, i)
At this point (but not before), I get an expected error, simply because your directories (e.g. learn/python) do not exist on my machine:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/home/herc/SO/Naive-Bayes/NaiveBayes/Pool.py", line 29, in learn
dir = os.listdir(directory)
FileNotFoundError: [Errno 2] No such file or directory: 'learn/python'
but this clearly shows that the Pool class and its learn method in Pool.py can indeed be accessed.
Tested with Python 3.4.3 on Ubuntu.
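The reason the submodule import is needed can be reproduced without the real repo. The sketch below builds a throwaway package, assuming (we have not checked the repo) that its __init__.py does not re-export Pool. With an empty __init__.py and no __all__, a star import from the package brings in nothing from Pool.py, while importing the class from the submodule works. The package name demo_pkg and the helper build_demo_package are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Create a hypothetical package demo_pkg/ with an empty __init__.py
    and a submodule Pool.py defining a class Pool with a learn() method."""
    pkg = Path(root) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")  # re-exports nothing
    (pkg / "Pool.py").write_text(
        "class Pool:\n"
        "    def learn(self, directory, dclass_name):\n"
        "        return directory, dclass_name\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(tmp)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Star import from the package: with an empty __init__.py and
        # no __all__, names defined in the submodule are not pulled in.
        namespace = {}
        exec("from demo_pkg import *", namespace)
        star_has_pool = "Pool" in namespace

        # Importing the class from the submodule works.
        from demo_pkg.Pool import Pool
        result = Pool().learn("learn/python", "python")
    finally:
        # Undo the temporary path and module entries
        sys.path.remove(tmp)
        for name in ("demo_pkg.Pool", "demo_pkg"):
            sys.modules.pop(name, None)

print(star_has_pool)  # False
print(result)         # ('learn/python', 'python')
```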

Keep getting error `TypeError: 'float' object is not callable` when trying to run a file using the numpy library

I intend to perform a Newton Raphson iteration on some data I read in from a file. I use the following function in my python program.
def newton_raphson(r1, r2):
guess1 = 2 * numpy.log(2) / (numpy.pi() * (r1 + r2))
I call this function like this:
if answer == "f": # if data is in file
fileName = input("What is the name of the file you want to open?")
dataArray = extract_data_from_file(fileName)
resistivityArray = []
for i in range(0, len(dataArray[0])):
resistivity_point = newton_raphson(dataArray[0][i], dataArray[1][i])
resistivityArray += [resistivity_point]
When I run the program and enter my file name, I get `TypeError: 'float' object is not callable`. Everything I've read online suggests this is due to a missing operator somewhere in my code, but I can't see where. Why do I keep getting this error and how do I avoid it?
numpy.pi is not a function, it is a constant:
>>> import numpy
>>> numpy.pi
3.141592653589793
Remove the () call from it:
def newton_raphson(r1, r2):
guess1 = 2 * numpy.log(2) / (numpy.pi * (r1 + r2))
as that is causing your error:
>>> numpy.pi()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'float' object is not callable
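A quick way to confirm this kind of mistake is to check the type of the name and whether it is callable. The sketch below shows numpy.pi as a float, reproduces the error, and gives the corrected first line of newton_raphson as a standalone function. The name initial_guess is hypothetical; the rest of the Newton-Raphson iteration is not shown in the question, so it is left out here.

```python
import numpy

# numpy.pi is an ordinary Python float, not a function
print(type(numpy.pi).__name__)                  # float
print(callable(numpy.pi), callable(numpy.log))  # False True

# Calling it reproduces the error from the question
try:
    numpy.pi()
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'float' object is not callable


def initial_guess(r1, r2):
    """Hypothetical name for the corrected first line of newton_raphson:
    numpy.pi is used as a constant, without ()."""
    return 2 * numpy.log(2) / (numpy.pi * (r1 + r2))
```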

QTreeWidgetItem not hashable in python3

I need to port some Python 2 code to Python 3.
In it, a dict keyed by QTreeWidgetItem objects is created. In Python 2 this works fine, as the object is hashable. But in Python 3 you get an error because __hash__ is not implemented:
$ python2
>>> from PyQt5 import QtWidgets
>>> x = QtWidgets.QTreeWidgetItem()
>>> foo = {x: 23}
>>> hash(x)
-9223363252877437056
$ python3
>>> from PyQt5 import QtWidgets
>>> x = QtWidgets.QTreeWidgetItem()
>>> hash(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'QTreeWidgetItem'
Is this a bug, or is there a reason for it that I am missing?
The PyQt5 documentation does not mention anything in this direction, and for QTreeWidgetItem, there is only the C++ doc available, which does not help in this python specific case.
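One general Python 3 rule that produces exactly this symptom: a class that defines __eq__ without also defining __hash__ gets __hash__ set to None, so its instances become unhashable. Whether this is what happens inside PyQt5's QTreeWidgetItem is an assumption we have not verified. The sketch uses a plain stand-in class (Item is hypothetical) and shows a common workaround of keying the dict by id().

```python
class Item:
    """Hypothetical stand-in for a wrapped class that defines equality."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return isinstance(other, Item) and self.text == other.text


# Defining __eq__ without __hash__ makes Python 3 set __hash__ to None
print(Item.__hash__ is None)  # True

try:
    hash(Item("a"))
except TypeError as exc:
    error_text = str(exc)
print(error_text)  # unhashable type: 'Item'

# Workaround: key the dict by object identity instead of the object itself
a = Item("a")
lookup = {id(a): 23}
print(lookup[id(a)])  # 23
```

Note that id() values can be reused once an object is garbage-collected, so the items must stay alive while the dict is in use.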
