String index out of range & encoding utf-8 (Python 3)

I always get this error...
Traceback (most recent call last):
File "C:/Users/01/Desktop 3/Projects/univ/number.py", line 11, in <module>
print(line[5])
IndexError: string index out of range
I just want to read information from a txt file
readFile = open("utf.txt", encoding="utf-8").read()
for line in readFile:
    print(line[5])
I set txt encoding to "utf-8", my IDE also has the same encoding set
One more thing to consider: the file is written in Russian.

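Your readFile variable is a string, because .read() returns the whole file as one string. Iterating over a string yields one character at a time, so your line variable holds a single character. line[5] then asks for the sixth element of a one-character string, which raises IndexError. To get whole lines, iterate over the file object itself (or use .readlines()) instead of calling .read().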
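A minimal sketch of that fix, assuming the goal is to print the sixth character of every line. The helper name sixth_chars and the sample text are made up for illustration; io.StringIO stands in for open("utf.txt", encoding="utf-8") so the snippet runs without the file.

```python
import io


def sixth_chars(lines):
    """Return the sixth character of each line that is long enough."""
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if len(line) > 5:  # guard against short or empty lines
            result.append(line[5])
    return result


# Iterating over a file object yields whole lines, not single characters.
# io.StringIO is a stand-in for the real file (hypothetical sample text).
sample = io.StringIO("Привет, мир\nда\nПроверка\n")
chars = sixth_chars(sample)
print(chars)
```

The length check matters even after the iteration is fixed: any line shorter than six characters would still raise IndexError without it.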

Related

how can i open a random text file in python?

I am trying to make a Python program that randomly selects a text file and outputs its contents.
When I run the code, I get this error:
Traceback (most recent call last):
File "//fileserva/home$/K59046/Documents/project/project.py", line 8, in
o = open(text, "r")
TypeError: expected str, bytes or os.PathLike object, not tuple
this is the code that i have written
import os
import random
os.chdir('N:\Documents\project\doodoo')
a = os.getcwd()
print("current dir is",a)
file = random.randint(1, 4)
text = (file,".txt")
o = open(text, "r")
print (o.read())
can somebody tell me what i am doing wrong?
As your error message says, your text variable is a tuple, not a string. You can use f-strings or string concatenation to solve this:
# string concatenation
text = str(file) + ".txt"
# f-strings
text = f"{file}.txt"
Your variable text is not what you expect. You currently create a tuple that could look like this: (2, ".txt"). If you want a string like "2.txt", you need to concatenate the two parts:
text = str(file) + ".txt"
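Beyond fixing the tuple, one option is to choose among the files that actually exist rather than building a name from a random number. The sketch below assumes that approach; read_random_file is a hypothetical helper, and the demo uses a throwaway directory in place of the asker's project folder.

```python
import random
import tempfile
from pathlib import Path


def read_random_file(directory):
    """Pick one .txt file in `directory` at random; return (name, contents)."""
    candidates = sorted(Path(directory).glob("*.txt"))
    if not candidates:
        raise FileNotFoundError(f"no .txt files in {directory}")
    chosen = random.choice(candidates)
    return chosen.name, chosen.read_text(encoding="utf-8")


# Demo: a temporary directory holding 1.txt .. 4.txt (made-up contents).
with tempfile.TemporaryDirectory() as tmp:
    for n in range(1, 5):
        Path(tmp, f"{n}.txt").write_text(f"contents of file {n}", encoding="utf-8")
    name, contents = read_random_file(tmp)
    print(name, "->", contents)
```

This avoids a FileNotFoundError when a numbered file is missing, at the cost of listing the directory first.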

Parsing xml file from url to a astropy votable without downloading

From http://svo2.cab.inta-csic.es/theory/fps/ you can get the transmission curves for many filters used in astronomical observations. I would like to get these data by opening the url with the corresponding xml file (for each filter), parse it to astropy's votable that helps to read the table data easily.
I have managed to do this by reading the file, decoding it as UTF-8 and saving it locally as an xml file. Opening the local file then works fine, as is obvious from the following example.
However, I do not want to save the file and open it again. When I tried to skip that step with votable = parse(xml_file), it raised OSError: File name too long, because parse() treats the whole string as a file name.
from urllib.request import urlopen
from astropy.io.votable import parse
fltr = 'http://svo2.cab.inta-csic.es/theory/fps/fps.php?ID=2MASS/2MASS.H'
url = urlopen(fltr).read()
xml_file = url.decode('UTF-8')
with open('tmp.xml','w') as out:
    out.write(xml_file)
votable = parse('tmp.xml')
data = votable.get_first_table().to_table(use_names_over_ids=True)
print(votable)
print(data["Wavelength"])
The output in this case is:
<VOTABLE>... 1 tables ...</VOTABLE>
Wavelength
AA
----------
12890.0
13150.0
...
18930.0
19140.0
Length = 58 rows
Indeed according to the API documentation, votable.parse's first argument is either a filename or a readable file-like object. It doesn't specify this exactly, but apparently the file also has to be seekable meaning that it can be read with random access.
The HTTPResponse object returned by urlopen is indeed a file-like object with a .read() method, so in principle it might be possible to pass directly to parse(), but this is how I found out it has to be seekable:
>>> fltr = 'http://svo2.cab.inta-csic.es/theory/fps/fps.php?ID=2MASS/2MASS.H'
>>> u = urlopen(fltr)
>>> parse(u)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "astropy/io/votable/table.py", line 135, in parse
_debug_python_based_parser=_debug_python_based_parser) as iterator:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "astropy/utils/xml/iterparser.py", line 157, in get_xml_iterator
with _convert_to_fd_or_read_function(source) as fd:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "astropy/utils/xml/iterparser.py", line 63, in _convert_to_fd_or_read_function
with data.get_readable_fileobj(fd, encoding='binary') as new_fd:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "astropy/utils/data.py", line 210, in get_readable_fileobj
fileobj.seek(0)
io.UnsupportedOperation: seek
So you need to wrap the data in a seekable file-like object. Along the lines of what @keflavich wrote, you can use io.BytesIO (io.StringIO won't work, as explained below).
It turns out that there's no reason to explicitly decode the UTF-8 data to unicode. I'll spare the example, but after trying it myself it turns out parse() works on raw bytes (which I find a bit odd, but okay). So you can read the entire contents of the URL into an io.BytesIO which is just an in-memory file-like object that supports random access:
>>> u = urlopen(fltr)
>>> s = io.BytesIO(u.read())
>>> v = parse(s)
WARNING: W42: None:2:0: W42: No XML namespace specified [astropy.io.votable.tree]
>>> v.get_first_table().to_table(use_names_over_ids=True)
<Table masked=True length=58>
Wavelength Transmission
AA
float32 float32
---------- ------------
12890.0 0.0
13150.0 0.0
... ...
18930.0 0.0
19140.0 0.0
This is, in general, the way in Python to do something with some data as though it were a file, without writing an actual file to the filesystem.
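The same pattern can be tried without astropy or a network connection. The sketch below is only an illustration: the payload is a made-up stand-in for the bytes returned by urlopen(fltr).read(), and the standard library's xml.etree.ElementTree is swapped in for astropy's parse().

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for urlopen(fltr).read(); not real SVO data.
payload = b"<VOTABLE><TABLE name='demo'/></VOTABLE>"

buffer = io.BytesIO(payload)  # in-memory, file-like, supports random access
assert buffer.seekable()

first_bytes = buffer.read(8)  # consume part of the stream
buffer.seek(0)                # rewind; this is the call HTTPResponse rejects

# A parser that accepts a binary file object can read straight from the buffer.
root = ET.parse(buffer).getroot()
print(first_bytes, root.tag)
```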
Note, however, that this won't work if the entire file can't fit in memory. In that case you might still need to write it out to disk. But if it's just for some temporary processing and you don't want to litter your disk with files like the tmp.xml in your example, you can always use the tempfile module to, among other things, create temporary files that are automatically deleted once they're no longer in use.
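A rough sketch of the tempfile route, assuming the data has already been downloaded as bytes. The payload is a placeholder, and the parsing step is left out; the point is only that the file is seekable while open and gone afterwards.

```python
import os
import tempfile

payload = b"<VOTABLE></VOTABLE>"  # placeholder for the downloaded bytes

with tempfile.NamedTemporaryFile(suffix=".xml") as tmp:
    tmp.write(payload)
    tmp.flush()
    tmp.seek(0)                     # the temp file supports random access
    tmp_path = tmp.name
    round_trip = tmp.read()         # a parser would read from tmp here
    existed_inside = os.path.exists(tmp_path)

# Leaving the with-block closes the file, which also deletes it.
exists_after = os.path.exists(tmp_path)
print(existed_inside, exists_after)
```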

select random item (and not pick this one for the next random pick up) from list created from a file

I'm trying to build a program that will pick random items (each one only once) from a list that was imported from a file.
NB: I put only one item per line in my file (no space, no comma, nothing other than a simple word).
I have a code like that for now:
file = open('file.txt', 'r')
myList = file.readlines()
myList[0]
rand_item = random.choice(myList)
print (rand_item)
I am just at the beginning of my program, so I'm testing every new step that I make. Here, I'd like to display a random item from my list (itself imported from a file).
I have this message when I try to run my program:
Traceback (most recent call last):
File "C:/Users/Julien/Desktop/test final.py", line 16, in <module>
rand_item = random.choice(listEmployee)
AttributeError: 'builtin_function_or_method' object has no attribute 'choice'
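
No answer is recorded for this question. The error message says that the name random refers to a built-in function rather than the module, which typically happens after something like from random import random; the import line is not shown in the question, so this is a guess. The sketch below assumes a plain import random and picks every item exactly once by shuffling a copy of the list. pick_each_once and the sample names are hypothetical, and io.StringIO stands in for file.txt.

```python
import io
import random


def pick_each_once(items, rng=random):
    """Yield every item exactly once, in random order."""
    remaining = list(items)   # copy, so the caller's list is untouched
    rng.shuffle(remaining)
    while remaining:
        yield remaining.pop()


# One word per line, as described in the question (made-up data).
source = io.StringIO("alice\nbob\ncarol\n")
names = [line.strip() for line in source if line.strip()]

picked = list(pick_each_once(names))
print(picked)
```

Stripping each line also removes the trailing newline that readlines() would otherwise leave on every item.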

pandas to_numeric couldn't convert string values to integers

I am trying to use pandas.to_numeric to convert a series to ints.
df['numeric_col'] = pd.to_numeric(df['numeric_col'], errors='raise')
I got errors,
Traceback (most recent call last):
File "/home/user_name/script.py", line 86, in execute
data = module(**module_args).execute(data)
File "/home/user_name/script.py", line 62, in execute
invoices['numeric_invoice_no'] = pd.to_numeric(invoices['numeric_invoice_no'], errors='raise')
File "/usr/local/lib/python3.5/dist-packages/pandas/core/tools/numeric.py", line 126, in to_numeric
coerce_numeric=coerce_numeric)
File "pandas/_libs/src/inference.pyx", line 1052, in pandas._libs.lib.maybe_convert_numeric (pandas/_libs/lib.c:56638)
ValueError: Integer out of range. at position 106759
if I change it to,
df['numeric_col'] = pd.to_numeric(df['numeric_col'], errors='coerce')
the values in numeric_col will not convert to ints, i.e. they are still strings.
if I changed to,
df['numeric_col'] = df['numeric_col'].astype(int)
I got error,
OverflowError: Python int too large to convert to C long
so I have to change it to,
df['numeric_col'] = df['numeric_col'].astype(float)
then there was no error generated.
The size of the series is about 994572, the strings in the column are like 52333612273, 56032860 or 02031757.
I am wondering what are the issues with to_numeric and astype here.
I am running Python 3.5 on Linux mint 18.1 64-bit.
Your numeric strings may contain commas (,), or the column may still contain null values (NaN). Try removing the commas (replace them with an empty string) using the
.replace() method
and then drop or fill the null values with
.fillna() or .replace or .dropna()
before using
df['DataFrame Column'] = df['DataFrame Column'].astype(int)
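The two error messages ("Integer out of range" and "Python int too large to convert to C long") also suggest that at least one value may not fit in a 64-bit integer, which would explain why only the float conversion succeeds. This is an assumption, not something the question confirms. A plain-Python sketch for locating such values before converting (find_unconvertible and the sample data are hypothetical; the last two sample entries are invented to trigger each case):

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def find_unconvertible(values):
    """Return (position, value, reason) for strings that cannot become int64."""
    problems = []
    for pos, raw in enumerate(values):
        text = raw.strip().replace(",", "")  # assumed cleanup: drop commas
        try:
            number = int(text)
        except ValueError:
            problems.append((pos, raw, "not an integer"))
            continue
        if not INT64_MIN <= number <= INT64_MAX:
            problems.append((pos, raw, "outside int64 range"))
    return problems


sample = ["52333612273", "56032860", "02031757", "1,234", "", "99999999999999999999"]
problems = find_unconvertible(sample)
for pos, raw, reason in problems:
    print(pos, repr(raw), reason)
```

Note that converting a string such as 02031757 to an integer drops the leading zero, so if these are identifiers rather than quantities, keeping them as strings may be the safer choice.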

Storing the Output to a FASTA file

from Bio import SeqIO
from Bio import SeqRecord
from Bio import SeqFeature
for rec in SeqIO.parse("C:/Users/Siva/Downloads/sequence.gp","genbank"):
    if rec.features:
        for feature in rec.features:
            if feature.type =="Region":
                seq1 = feature.location.extract(rec).seq
                print(seq1)
                SeqIO.write(seq1,"region_AA_output1.fasta","fasta")
I am trying to write the output to a FASTA file but I am getting an error. Can anybody help me?
This is the error I got:
Traceback (most recent call last):
File "C:\Users\Siva\Desktop\region_AA.py", line 10, in <module>
SeqIO.write(seq1,"region_AA_output1.fasta","fasta")
File "C:\Python34\lib\site-packages\Bio\SeqIO\__init__.py", line 472, in write
count = writer_class(fp).write_file(sequences)
File "C:\Python34\lib\site-packages\Bio\SeqIO\Interfaces.py", line 211, in write_file
count = self.write_records(records)
File "C:\Python34\lib\site-packages\Bio\SeqIO\Interfaces.py", line 196, in write_records
self.write_record(record)
File "C:\Python34\lib\site-packages\Bio\SeqIO\FastaIO.py", line 190, in write_record
id = self.clean(record.id)
AttributeError: 'str' object has no attribute 'id'
First, you're trying to write a plain sequence as a fasta record. A fasta record consists of a sequence plus an ID line (prepended by ">"). You haven't provided an ID, so the fasta writer has nothing to write. You should either write the whole record, or turn the sequence into a fasta record by adding an ID yourself.
Second, even if your approach wrote anything, it's continually overwriting each new record into the same file. You'd end up with just the last record in the file.
A simpler approach is to store everything in a list, and then write the whole list when you're done with the loop. For example:
new_fasta = []
for rec in SeqIO.parse("C:/Users/Siva/Downloads/sequence.gp","genbank"):
    if rec.features:
        for feature in rec.features:
            if feature.type =="Region":
                seq1 = feature.location.extract(rec).seq
                # Use an appropriate string for id
                new_fasta.append('>%s\n%s' % (rec.id, seq1))
with open('region_AA_output1.fasta', 'w') as f:
    f.write('\n'.join(new_fasta))
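The manual formatting above puts each sequence on a single line. FASTA files conventionally wrap sequence lines, so a small helper can do that when building each entry. This is a sketch using only the standard library; format_fasta, the record ID and the sequence are made up, and the width of 60 is a common convention rather than a requirement.

```python
import textwrap


def format_fasta(record_id, sequence, width=60):
    """Return one FASTA entry: a '>' header line plus the wrapped sequence."""
    body = "\n".join(textwrap.wrap(str(sequence), width))
    return f">{record_id}\n{body}\n"


# Made-up 90-residue sequence; a Bio.Seq object would work via str().
entry = format_fasta("demo_region_1", "MKT" * 30, width=60)
print(entry, end="")
```

Because each entry already ends with a newline, the entries can be written with "".join(...) instead of '\n'.join(...).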
