Simple Moving Average on a .cat file with Python 3.6 - python-3.x

I'm trying to write a program that performs a moving average on a .cat file with ~500 float values, then saves the result to another file. The code works fine if I give in input an array like x=[1,2,3...] but when I try with the file I get the error message:
TypeError: unsupported operand type(s) for *: 'float' and '_io.TextIOWrapper'
May someone please help me?
import numpy as np
def movingaverage (values, window):
weights = np.repeat(1.0,window)/window
sma = np.convolve(values,weights,'valid')
return sma
with open('Relative_flux.cat','r') as f:
data=movingaverage(f,3)
print(data)

f is a file handle, not the contents of the files. The contents must first be read, then formatted into an array of floats, before being handed to your function, which expects an array of floats.
Assuming the file is formatted in the way you mention in your comment:
data=movingaverage([float(x) for x in f.read().split()], 3)
read() reads the whole content of the file and returns it as a string.
split() splits the string at all whitespaces
[float(x) for x in [...]) applies the conversion to float to every string, returning an array of floats.
This code will throw an exception if any of the entries in the file cannot be converted to float, or if the format is not consistently floating point numbers separated by whitespaces.

Your object f is an open file rather than an array of floating point values. You need to read lines from the file and load the floating point values into an array, which depends on the specific file format you're using.

Related

Error in reading a ascii encoded csv file?

I have a csv file named Qid-NamedEntityMapping.csv having data like this:
Q1000070 b'Myron V. George'
Q1000296 b'Fred (footballer, born 1979)'
Q1000799 b'Herbert Greenfield'
Q1000841 b'Stephen A. Northway'
Q1001203 b'Buddy Greco'
Q100122 b'Kurt Kreuger'
Q1001240 b'Buddy Lester'
Q1001867 b'Fyodor Stravinsky'
The second column is 'ascii' encoded, and when I am reading the file using the following code, then also it not being read properly:
import chardet
import pandas as pd
def find_encoding(fname):
r_file = open(fname, 'rb').read()
result = chardet.detect(r_file)
charenc = result['encoding']
return charenc
my_encoding = find_encoding('datasets/KGfacts/Qid-
NamedEntityMapping.csv')
df = pd.read_csv('datasets/KGfacts/Qid-
NamedEntityMapping.csv',error_bad_lines=False, encoding=my_encoding)
But the output looks like this:
Also, I tried to use encoding='UTF-8'. but still, the output is the same.
What can be done to read it properly?
Looks like you have an improperly saved TSV file. Once you circumvent the TAB problem (as suggested in my comment), you can convert the column with names to a more suitable representation.
Let's assume that the second column of the dataframe is called "names". The b'XXX' thing is probably a bytes [mis]representation of a string. Convert it to a bytes object with ast.literal_eval and then decode to a string:
import ast
df["names"].apply(ast.literal_eval).apply(bytes.decode)
#0 Myron...
#1 Fred...
Last but not least, your problem has almost nothing to do with encodings or charsets.
Your issue looks like the CSV is actually tab separated; so you need to have sep='\t' in the read_csv function. It's reading everything else as a single column, except "born 1979" in the first row, as that is the only cell with a comma in it.

Iterate over images with pattern

I have thousands of images which are labeled IMG_####_0 where the first image is IMG_0001_0.png the 22nd is IMG_0022_0.png, the 100th is IMG_0100_0.png etc. I want to perform some tasks by iterating over them.
I used this fnames = ['IMG_{}_0.png'.format(i) for i in range(150)] to iterate over the first 150 images but I get this error FileNotFoundError: [Errno 2] No such file or directory: '/Users/me/images/IMG_0_0.png' which suggests that it is not the correct way to do it. Any ideas about how to capture this pattern while being able to iterate over the specified number of images i.e in my case from IMG_0001_0.png to IMG_0150_0.png
fnames = ['IMG_{0:04d}_0.png'.format(i) for i in range(1,151)]
print(fnames)
for fn in fnames:
try:
with open(fn, "r") as reader:
# do smth here
pass
except ( FileNotFoundError,OSError) as err:
print(err)
Output:
['IMG_0000_0.png', 'IMG_0001_0.png', ..., 'IMG_0148_0.png', 'IMG_0149_0.png']
Dokumentation: string-format()
and format mini specification.
'{:04d}' # format the given parameter with 0 filled to 4 digits as decimal integer
The other way to do it would be to create a normal string and fill it with 0:
print(str(22).zfill(10))
Output:
0000000022
But for your case, format language makes more sense.
You need to use a format pattern to get the format you're looking for. You don't just want the integer converted to a string, you specifically want it to always be a string with four digits, using leading 0's to fill in any empty space. The best way to do this is:
'IMG_{:04d}_0.png'.format(i)
instead of your current format string. The result looks like this:
In [2]: 'IMG_{:04d}_0.png'.format(3)
Out[2]: 'IMG_0003_0.png'
generate list of possible names and try if exist is slow and horrible way to iterate over files.
try look to https://docs.python.org/3/library/glob.html
so something like:
from glob import iglob
filenames = iglob("/path/to/folder/IMG_*_0.png")

Python file i/0: error -- How to write to a file when the filename given is a string?

I have a function that accepts d (dictionary that must be sorted asciibetically by key,) and filename (file that may or may not exist.) I have to have exact format written to this file and the function must return None.
Format:
Every key-value pair of the dictionary should be output as: a string that starts with key, followed by ":", a tab, then the integers from the value list. Every integer should be followed by a "," and a tab except for the very last one, which should be followed by a newline.
The issue is when I go to close the file and run my testers, it tells me this error:
'str' object has no attribute 'close'
Obviously that means my file isn't a file, it's a string. How do I fix this?
Here is my current functions that work together to accept the dictionary, sort the dictionary, open/create file for writing, write the dictionary to the file in specified format that can be read as a string, and then close the file:
def format_item(key,value):
return key+ ":\t"+",\t".join(str(x) for x in value)
def format_dict(d):
return sorted(format_item(key,value) for key, value in d.items())
def store(d,filename):
with open(filename, 'w') as f:
f.write("\n".join(format(dict(d))))
filename.close()
return None
Example of expected output:
IN: d = {'orange':[1,3],'apple':[2]}"
OUT: store(d,"out.txt")
the file contents should be read as this string: "apple:\t2\norange:\t1,\t3\n"
You have actually set the file handle to f but you are trying to close filename.
so your close command should be f.close()

a bytes-like object is required, not 'str': typeerror in compressed file

I am finding substring in compressed file using following python script. I am getting "TypeError: a bytes-like object is required, not 'str'". Please any one help me in fixing this.
from re import *
import re
import gzip
import sys
import io
import os
seq={}
with open(sys.argv[1],'r') as fh:
for line1 in fh:
a=line1.split("\t")
seq[a[0]]=a[1]
abcd="AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"
print(a[0],"\t",seq[a[0]])
count={}
with gzip.open(sys.argv[2]) as gz_file:
with io.BufferedReader(gz_file) as f:
for line in f:
for b in seq:
if abcd in line:
count[b] +=1
for c in count:
print(c,"\t",count[c])
fh.close()
gz_file.close()
f.close()
and input files are
TruSeq2_SE AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
the second file is compressed text file. The line "if abcd in line:" shows the error.
The "BufferedReader" class gives you bytestrings, not text strings - you can directly compare both objects in Python3 -
Since these strings just use a few ASCII characters and are not actually text, you can work all the way along with byte strings for your code.
So, whenever you "open" a file (not gzip.open), open it in binary mode (i.e.
open(sys.argv[1],'rb') instead of 'r' to open the file)
And also prefix your hardcoded string with a b so that Python uses a binary string inernally: abcd=b"AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG" - this will avoid a similar error on your if abcd in line - though the error message should be different than the one you presented.
Alternativally, use everything as text - this can give you more methods to work with the strings (Python3's byte strigns are somewhat crippled) presentation of data when printing, and should not be much slower - in that case, instead of the changes suggested above, include an extra line to decode the line fetched from your data-file:
with io.BufferedReader(gz_file) as f:
for line in f:
line = line.decode("latin1")
for b in seq:
(Besides the error, your progam logic seens to be a bit faulty, as you don't actually use a variable string in your innermost comparison - just the fixed bcd value - but I suppose you can fix taht once you get rid of the errors)

Skipping over array elements of certain types

I have a csv file that gets read into my code where arrays are generated out of each row of the file. I want to ignore all the array elements with letters in them and only worry about changing the elements containing numbers into floats. How can I change code like this:
myValues = []
data = open(text_file,"r")
for line in data.readlines()[1:]:
myValues.append([float(f) for f in line.strip('\n').strip('\r').split(',')])
so that the last line knows to only try converting numbers into floats, and to skip the letters entirely?
Put another way, given this list,
list = ['2','z','y','3','4']
what command should be given so the code knows not to try converting letters into floats?
You could use try: except:
for i in list:
try:
myVal.append(float(i))
except:
pass

Resources