pandas to_numeric couldn't convert string values to integers - python-3.x

I am trying to use pandas.to_numeric to convert a series to ints.
df['numeric_col'] = pd.to_numeric(df['numeric_col'], errors='raise')
I got the following error:
Traceback (most recent call last):
File "/home/user_name/script.py", line 86, in execute
data = module(**module_args).execute(data)
File "/home/user_name/script.py", line 62, in execute
invoices['numeric_invoice_no'] = pd.to_numeric(invoices['numeric_invoice_no'], errors='raise')
File "/usr/local/lib/python3.5/dist-packages/pandas/core/tools/numeric.py", line 126, in to_numeric
coerce_numeric=coerce_numeric)
File "pandas/_libs/src/inference.pyx", line 1052, in pandas._libs.lib.maybe_convert_numeric (pandas/_libs/lib.c:56638)
ValueError: Integer out of range. at position 106759
If I change it to
df['numeric_col'] = pd.to_numeric(df['numeric_col'], errors='coerce')
the values in numeric_col are not converted to ints, i.e. they are still strings.
If I change it to
df['numeric_col'] = df['numeric_col'].astype(int)
I get the error
OverflowError: Python int too large to convert to C long
so I had to change it to
df['numeric_col'] = df['numeric_col'].astype(float)
and then no error was raised.
The series has about 994572 entries; the strings in the column look like 52333612273, 56032860 or 02031757.
I am wondering what the issues are with to_numeric and astype here.
I am running Python 3.5 on Linux mint 18.1 64-bit.

Maybe you have a comma (,) within your numeric string values, or the column of your dataframe still contains null values (NaN). Try replacing the commas with an empty string using the
.replace() method
and then drop or fill in the null values with
.fillna(), .replace() or .dropna()
before using
df['DataFrame Column'] = df['DataFrame Column'].astype(int)
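Both "Integer out of range" and "Python int too large to convert to C long" suggest that at least one string holds an integer that does not fit into a 64-bit integer, which would also explain why astype(float) succeeds. A minimal sketch for locating such values, assuming the column holds digit strings; the helper name find_out_of_range and the sample data are made up for illustration and are not part of pandas:

```python
# Hypothetical helper, not part of pandas: list entries that do not fit in int64.
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def find_out_of_range(values):
    """Return (position, text) pairs whose integer value falls outside int64."""
    offenders = []
    for position, text in enumerate(values):
        try:
            number = int(text)  # Python ints are arbitrary precision
        except (TypeError, ValueError):
            continue  # not an integer string (e.g. NaN or "1,234"); skip it here
        if not INT64_MIN <= number <= INT64_MAX:
            offenders.append((position, text))
    return offenders


# Made-up sample in the style of the question's column.
sample = ["52333612273", "56032860", "02031757", "123456789012345678901234567890"]
offenders = find_out_of_range(sample)
print(offenders)
```

Passing df['numeric_col'] instead of the list should work the same way, since iterating over a Series yields its values. Note that converting to int also drops leading zeros such as the one in 02031757, so an ID-like column may be better left as strings.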

Related

Converting date format in apache logs to ISO format In Python 3

I have been trying to convert the date format in apache logs to ISO format in Python 3, but I can't seem to get it to work.
I can get it to work if I only include the days, months, and year but not in combination with hours, minutes, and seconds.
Text = "25/Jan/2000:14:00:01"
Date = dateutil.parser.parse(Text)
Date = Date.isoformat()
print(Date)
# I receive the following error messages below
Traceback (most recent call last):
File "Z:/Test.py", line 3, in <module>
Date = dateutil.parser.parse(Text)
File "Z:\Python\lib\site-packages\dateutil\parser\_parser.py", line 1356, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "Z:\Python\lib\site-packages\dateutil\parser\_parser.py", line 648, in parse
raise ValueError("Unknown string format:", timestr)
ValueError: ('Unknown string format:', '25/Jan/2000:14:00:01')
I have also tried using datetime module (datetime.datetime.strftime()) but the same problem appears.
Could somebody help me out?
This is because the parse() method expects the string format for the date to be of this form: 25/Jan/2000 14:00:01. You need to replace the : after the year with a space.
Doing something like this would work:
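from dateutil.parser import parse
Text = "25/Jan/2000:14:00:01"
# Replace only the first colon (the one after the year) with a space
Date = parse(Text.replace(":", " ", 1))
Date = Date.isoformat()
print(Date)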
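As an alternative that avoids dateutil, the standard library's datetime.strptime can parse the Apache layout directly when given an explicit format string. A minimal sketch, assuming the timestamp has no timezone offset and that month abbreviations are in English (the %b directive is locale dependent); the name apache_to_iso is made up for illustration:

```python
from datetime import datetime

# Day/abbreviated month/year, then a colon, then hours:minutes:seconds.
APACHE_FORMAT = "%d/%b/%Y:%H:%M:%S"


def apache_to_iso(text):
    """Parse an Apache-style timestamp and return it in ISO 8601 format."""
    return datetime.strptime(text, APACHE_FORMAT).isoformat()


iso_text = apache_to_iso("25/Jan/2000:14:00:01")
print(iso_text)
```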

Simple Moving Average on a .cat file with Python 3.6

I'm trying to write a program that performs a moving average on a .cat file with ~500 float values, then saves the result to another file. The code works fine if I pass an array like x=[1,2,3...] as input, but when I try it with the file I get the error message:
TypeError: unsupported operand type(s) for *: 'float' and '_io.TextIOWrapper'
Could someone please help me?
import numpy as np
def movingaverage(values, window):
    weights = np.repeat(1.0, window) / window
    sma = np.convolve(values, weights, 'valid')
    return sma
with open('Relative_flux.cat', 'r') as f:
    data = movingaverage(f, 3)
print(data)
f is a file handle, not the contents of the file. The contents must first be read, then converted into an array of floats, before being handed to your function, which expects an array of floats.
Assuming the file is formatted in the way you mention in your comment:
data=movingaverage([float(x) for x in f.read().split()], 3)
read() reads the whole content of the file and returns it as a string.
split() splits the string at any run of whitespace.
[float(x) for x in ...] applies the conversion to float to every string, returning a list of floats.
This code will throw an exception if any of the entries in the file cannot be converted to float, or if the format is not consistently floating point numbers separated by whitespace.
Your object f is an open file rather than an array of floating point values. You need to read lines from the file and load the floating point values into an array, which depends on the specific file format you're using.
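Putting both answers together, a minimal end-to-end sketch could look like the following. It assumes the file holds whitespace-separated floats, and it swaps np.convolve for a plain-Python sliding window (which should give the same values as a 'valid' convolution with uniform weights) so that it runs without NumPy. The helper names and the temporary demo file are made up for illustration:

```python
import os
import tempfile


def read_floats(path):
    """Read the whole file and convert every whitespace-separated token to float."""
    with open(path, "r") as handle:
        return [float(token) for token in handle.read().split()]


def moving_average(values, window):
    """Average each run of `window` consecutive values (no padding at the ends)."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Demo on a small made-up file standing in for Relative_flux.cat.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "Relative_flux.cat")
    with open(path, "w") as handle:
        handle.write("1.0 2.0 3.0\n4.0 5.0\n")
    smoothed = moving_average(read_floats(path), 3)

print(smoothed)
```

With NumPy available, the list returned by read_floats can be passed straight to the movingaverage function from the question.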

Pandas ast.literal_eval crashes with non-basic datatypes while reading lists from csv

I have a Pandas dataframe that was saved as a csv using the command:
file.to_csv(filepath + name + ".csv", encoding='utf-8', sep=";", quoting=csv.QUOTE_NONNUMERIC)
I know that while saving, all of the columns in the dataframe are converted to String format, but I've managed to convert them back using
raw_table['synset'] = raw_table['synset'].map(ast.literal_eval)
This seems to work fine when the lists in the columns contain numbers or text, but not when they contain a different datatype. A case in point is the column "synset", which has the following values (every line represents a different row in the column, and empty lists are included):
[Synset('report.n.03')]
[]
[Synset('application.n.04')]
[]
[Synset('legal_profession.n.01')]
[Synset('demeanor.n.01')]
[Synset('demeanor.n.01')]
[Synset('outgrowth.n.01')]
These values come from the nltk package and are Synset objects.
Trying to evaluate these using ast.literal_eval() causes the following crash:
File "/.../site-packages/pandas/core/series.py", line 2158, in map
new_values = map_f(values, arg)
File "pandas/_libs/src/inference.pyx", line 1574, in pandas._libs.lib.map_infer
File "/.../ast.py", line 85, in literal_eval
return _convert(node_or_string)
File "/.../ast.py", line 61, in _convert
return list(map(_convert, node.elts))
File "/.../ast.py", line 84, in _convert
raise ValueError('malformed node or string: ' + repr(node))
ValueError: malformed node or string: <_ast.Call object at 0x7f1918146ac8>
Any help on how I could resolve this problem?
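ast.literal_eval only accepts Python literals (strings, numbers, lists, dicts and the like), so a call such as Synset('report.n.03') is rejected as a malformed node. One possible workaround, sketched under the assumption that every cell looks like the rows shown above, is to pull the synset names out with a regular expression; the helper name extract_synset_names is made up for illustration:

```python
import re

# Matches Synset('name') and captures the name between the quotes.
SYNSET_NAME = re.compile(r"Synset\('([^']+)'\)")


def extract_synset_names(cell):
    """Return the synset names found in one CSV cell, e.g. ['report.n.03']."""
    return SYNSET_NAME.findall(cell)


# Rows copied from the column shown in the question.
rows = ["[Synset('report.n.03')]", "[]", "[Synset('demeanor.n.01')]"]
names = [extract_synset_names(row) for row in rows]
print(names)
```

The same function could be applied with raw_table['synset'].map(extract_synset_names). If the Synset objects themselves are needed, each name can then be passed to nltk.corpus.wordnet.synset(), which requires the WordNet corpus to be installed.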

How to get sum of floats in python?

I have a file in which each line contains a Persian sentence, a tab, a Persian word, another tab, and an English word. I also have a dictionary with keys and float values. For each line, I have to find the words that are also in the dictionary and get their values, take the logarithm of each value, and finally sum the logarithms for each line separately. The problem is that when I try to calculate the sum, this error occurs: TypeError: 'float' object is not iterable. How can I fix it?
import math
probabilities = {"شور": 0.02, "نمک": 0.05,"زندگی": 0.07, "غذاهای": 0.01, "غذای": 0.05}
filename = "F.txt"
for line in open(filename, encoding="utf-8"):
    list_line = line.split("\t")
    words = list_line[0].split()
    for key, value in probabilities.items():
        for word in words:
            if word == key:
                result = sum(float(math.log(value)))
                print(word, result, end=" ")
    print()
When I run it, this error appears:
Traceback (most recent call last):
File "C:\example.py", line 14, in <module>
result = sum(float(math.log(value)))
TypeError: 'float' object is not iterable
F.txt (https://www.dropbox.com/s/ag5at9iuuln2x02/F.txt?dl=0):
شور ورود دانشگاه جالب توجه شور passion
۱۳ راهکار شور اشتیاق واقعی زندگی شور passion
نمک موجود ذائقه غذاهای شور عادت شور salty
از مضرات نمک غذای شور بدانید شور salty
I have to calculate the sum for each line separately and end up with just one number per line.
Your code is very wrong indeed (you may skip to point #4):
1. your dictionary has a syntax error with the quotes
2. you're splitting a file handle, not lines
3. you create a double loop to search for keys when you already have a dictionary
4. you just need result += float(math.log(value)) (initialize result to 0 outside the loop); sum is for iterables.
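A minimal sketch of the per-line sum, assuming each line is "sentence, tab, Persian word, tab, English word" as described in the question. Here sum() receives a generator (an iterable) rather than a single float, and the dictionary is looked up directly instead of being looped over. The function name line_log_sum is made up, and the sample lines are taken from the question's F.txt with the tabs written explicitly:

```python
import math

probabilities = {"شور": 0.02, "نمک": 0.05, "زندگی": 0.07, "غذاهای": 0.01, "غذای": 0.05}


def line_log_sum(line, probabilities):
    """Sum log(probability) over the sentence words that appear in the dictionary."""
    sentence = line.split("\t")[0]  # the part before the first tab
    return sum(math.log(probabilities[word])
               for word in sentence.split()
               if word in probabilities)


# Two lines in the layout of F.txt; reading the real file would replace this list.
sample_lines = [
    "شور ورود دانشگاه جالب توجه\tشور\tpassion",
    "نمک موجود ذائقه غذاهای شور عادت\tشور\tsalty",
]
totals = [line_log_sum(line, probabilities) for line in sample_lines]
for total in totals:
    print(total)
```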

string index out of range & encoding utf-8 (python3)

I always get this error...
Traceback (most recent call last):
File "C:/Users/01/Desktop 3/Projects/univ/number.py", line 11, in <module>
print(line[5])
IndexError: string index out of range
I just want to read information from a txt file:
readFile = open("utf.txt", encoding="utf-8").read()
for line in readFile:
    print(line[5])
I set the txt file's encoding to "utf-8", and my IDE has the same encoding set.
One more thing to consider: the file is written in Russian.
Your readFile variable is a string (because of the .read() method). When you iterate over it, you get one character at a time (this is your line variable). You then try to print the sixth element of that single character, so of course you get an IndexError.
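A minimal sketch of one way to get the intended behaviour: iterate over the file object itself, which yields whole lines, and skip lines that are too short to have a sixth character. The function name sixth_characters and the temporary demo file are made up for illustration:

```python
import os
import tempfile


def sixth_characters(path):
    """Return the character at index 5 of every line that is long enough."""
    found = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # iterating a file object yields lines, not characters
            line = line.rstrip("\n")
            if len(line) > 5:  # guard against short or empty lines
                found.append(line[5])
    return found


# Demo on a small made-up file with Russian text standing in for utf.txt.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "utf.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("Привет мир\nда\n")
    sixth = sixth_characters(path)

print(sixth)
```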
