Getting an error: 'bool' object is not iterable - python-3.x

I want to count the number of occurrences that satisfy the condition below,
but it only shows the value as True or False and I am not able to count it.
I don't know where I am going wrong; please have a look at my code below.
from openpyxl import load_workbook
import openpyxl as xl
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd
import os
import xlwings as xw
import datetime
filelist_patch=[f for f in os.listdir() if f.endswith(".xlsx") and 'SEL' in f.upper() and '~' not in f.upper()]
print(filelist_patch[0])
wb = xl.load_workbook(filelist_patch[0],read_only=True,data_only=True)
wb_device=wb["de_exp"]
cols_device = [0,9,14,18,19,20,21,22,23,24,25]
#######################average count in vuln##############################
for row in wb_device.iter_rows(max_col=25):
    cells = [cell.value for (idx, cell) in enumerate(row) if (
        idx in cols_device and cell.value is not None)]
    os_de = cells[1]
    qca_de = cells[2]
    file_data = ((os_de == "cl") & (qca_de == 'Q'))
    print(sum(file_data))
I am getting a TypeError:
TypeError Traceback (most recent call last)
<ipython-input-70-735a490062da> in <module>
30 file_data =((os_de=="client") & (qca_de=='Q'))   (here I want to count the number of occurrences that are True)
---> 31 print(sum(file_data))
32
33
TypeError: 'bool' object is not iterable

Your question is very hard to read. Please use proper grammar and punctuation. I understand that you are probably not a native speaker, but that is no excuse not to form proper sentences that start with a capital letter and end with a period.
Please sort your own thoughts before you ask a question, and then write them down in multiple short and concise sentences.
Nonetheless, I'll try to guess what you are trying to say.
90% of your code is unrelated to the question, so I'll try to reformulate your question. If my guess is incorrect, my answer will of course be worthless, and I'd ask you to reword your question to be more precise.
Reworded Question
Question: How to count the number of true statements in a number of conditions?
Details: Given a number of conditions (like os_de=="client" and qca_de=='Q'), how do I count the number of correct ones among them?
Attempt:
# Dummy data, to make this a short and reproducible example:
cells = ["", "cl", "D"]
os_de = cells[1]
qca_de = cells[2]
file_data = ((os_de=="cl") & (qca_de=='Q'))
print(sum(file_data))
Expected result value: 1
Actual result:
TypeError Traceback (most recent call last)
<ipython-input-70-735a490062da> in <module>
30 file_data = ((os_de=="client") & (qca_de=='Q'))
---> 31 print(sum(file_data))
32
33
TypeError: 'bool' object is not iterable
Answer
Both (os_de=="client") and (qca_de=='Q') are booleans.
Combining them with & still yields a single boolean, so when you call sum() on it, Python rightfully complains that a single boolean is not something it can iterate over and add up.
sum() can only be applied to an iterable of numbers.
You are almost there, though. Instead of combining them with &, put them in a list.
# Dummy data, to make this a short and reproducible example:
cells = ["", "cl", "D"]
os_de = cells[1]
qca_de = cells[2]
file_data = [(os_de=="cl"), (qca_de=='Q')]
print(sum(file_data))
Which prints 1, as expected: https://ideone.com/96Rghq
Try to include an ideone.com link in your questions in the future; this forces you to make your example code complete, simple and reproducible.
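If the original goal was to count how many worksheet rows satisfy both conditions (rather than counting the conditions for a single row), a minimal sketch along the same lines would look like the following. It reuses wb_device and cols_device from the question and is untested against the actual file, so treat it as an assumption about the column layout:
# Count rows where both conditions hold; each matching row adds 1.
match_count = 0
for row in wb_device.iter_rows(max_col=25):
    cells = [cell.value for (idx, cell) in enumerate(row)
             if idx in cols_device and cell.value is not None]
    if len(cells) > 2 and cells[1] == "cl" and cells[2] == "Q":
        match_count += 1
print(match_count)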

Related

How to specify string format with reference to the variable type at the time of formatting?

If there is a variable a whose type is unknown until it is created, how can I account for the type when using Python string formatting, specifically at the time of formatting?
Example:
import numpy as np
import datetime as dt
a = dt.datetime(2020,1,1)
print(f'{a:%d/%m/%Y}')
>>>
01/01/2020
However if the variable is changed, then this will produce an error:
a = np.nan
print(f'{a:%d/%m/%Y}')
>>>
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-70-02594a38949f> in <module>
1 a = np.nan
----> 2 print(f'{a:%d/%m/%Y}')
ValueError: Invalid format specifier
So I am attempting to implement something like:
print(f'{a:%d/%m/%Y if not np.isnan(a) else ,.0f}')
>>>
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-78-52c24b6abb16> in <module>
----> 1 print(f'{a:%d/%m/%Y if not np.isnan(a) else ,.0f}')
ValueError: Invalid format specifier
However, as we can see, it attempts to format the variable before evaluating the expression. Perhaps the syntax needs correcting, or another approach is required entirely. However, I specifically want to decide which formatter to deploy, and to deploy it, at the time of printing the f-string.
Thanks!
I realise that the following approach is possible:
print(f'{a:%d/%m/%Y}' if isinstance(a,dt.datetime) else f'{a}')
However, the syntax is still too cumbersome, especially if this is to be deployed in multiple locations. Effectively, I am searching for a streamlined way to format the variable if it is of a type compatible with the format and, if not, fall back to no format specifier.
Thanks!
What about a custom function, so that you can handle all the types you need in a single point in your code?
import numbers
import datetime as dt
import numpy as np

def get_formatter(var):
    """Get the format of a variable to be used in an f-string, based on its type"""
    if isinstance(var, (dt.datetime, dt.date)):
        return '%d/%m/%Y'
    elif isinstance(var, numbers.Number):
        if np.isfinite(var):
            return '.3f'
    return ''
You can then use the function to format the variable; for instance:
for v in [
    dt.datetime(2020, 1, 1),
    np.nan,
    5,
    5.555555,
]:
    print(f'{v:{get_formatter(v)}}')
will produce:
01/01/2020
nan
5.000
5.556
I'd recommend defining your own custom formatter if you have many of these cases to handle.
For example:
import string
import datetime as dt

class CustomFormatter(string.Formatter):
    def format_field(self, value, format_spec):
        if isinstance(value, float):
            return str(value)
        elif isinstance(value, dt.datetime):
            return value.__format__(format_spec)
        return super().format_field(value, format_spec)
In this custom formatter, you can ignore all the format specifiers for floats and call the __format__() method for datetimes, or change them to anything else you want. To use it, you just call:
print(fmt.format("{:%d/%m/%Y}",a))
This is useful when you'd like to use it like this:
a = dt.datetime(2020, 1, 1)
b = np.nan
fmt = CustomFormatter()
for value in [a, b]:
    print(fmt.format("The value is: {:%d/%m/%Y}", value))
The output will be:
The value is: 01/01/2020
The value is: nan
which means you do not need to change your specifier for datetime, float or any other type, and you do not even need to check the types at the call site, because you have already done that once inside your class.
You could make an inline conditional format like so:
print(f'{a:{"%d/%m/%Y" if not np.isnan(a) else ",.0f"}}')
So basically you needed an extra pair of curly brackets; however, I think the solution proposed by @PieCot is much cleaner.
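For completeness, a small variation on the same trick (a sketch, not part of the answer above) that uses the isinstance check from the question instead of np.isnan, since np.isnan cannot be applied to datetime objects:
import numpy as np
import datetime as dt

# Same nested-braces trick, but the condition works for both datetimes and NaN.
for a in [dt.datetime(2020, 1, 1), np.nan]:
    print(f'{a:{"%d/%m/%Y" if isinstance(a, dt.datetime) else ",.0f"}}')
# Prints 01/01/2020 and then nan.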

Pandas read_csv not working when items missing from last column

I'm having problems reading in the following three seasons of data (all the seasons after these ones load without problem).
import pandas as pd
import itertools
alphabets = ['a','b', 'c', 'd']
keywords = [''.join(i) for i in itertools.product(alphabets, repeat = 3)]
col_names = keywords[:57]
seasons = [2002, 2003, 2004]
for season in seasons:
    df = pd.read_csv("https://www.football-data.co.uk/mmz4281/{}{}/E0.csv".format(str(season)[-2:], str(season+1)[-2:]), names=col_names).dropna(how='all')
This gives the following error:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 57 fields in line 337, saw 62
I have looked on Stack Overflow for problems that have a similar error code (see below), but none seem to offer a solution that fits my problem.
Python Pandas Error tokenizing data
I'm pretty sure the error is caused when there is missing data in the last column; however, I don't know how to fix it. Can someone please explain how to do this?
Thanks
Baz
UPDATE:
The amended code now works for seasons 2002 and 2003. However 2004 is now producing a new error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte
Following the answer below from Serge Ballesta option 2:
UnicodeDecodeError when reading CSV file in Pandas with Python
df = pd.read_csv("https://www.football-data.co.uk/mmz4281/{}{}/E0.csv".format(str(season)[-2:], str(season+1)[-2:]), names=col_names, encoding = "latin1").dropna(how='all')
With the above amendment the code also works for season=2004.
I still have two questions though:
Q1.) How can I find which character/s were causing the problem in season 2004?
Q2.) Is it safe to use the 'latin1' encoding for every season even though they were originally encoded as 'utf-8'?
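For Q1, a minimal sketch (an assumption, not taken from any answer here): download the raw bytes for the 2004 season and let UTF-8 decoding report exactly where it fails, since the UnicodeDecodeError carries the offending byte and its position.
import urllib.request

# URL built the same way as in the question, for season 2004.
url = "https://www.football-data.co.uk/mmz4281/0405/E0.csv"
raw = urllib.request.urlopen(url).read()
try:
    raw.decode("utf-8")
    print("File decodes cleanly as UTF-8")
except UnicodeDecodeError as exc:
    print(exc)  # reports the offending byte (0xa0) and its position
    context = raw[max(exc.start - 30, 0):exc.start + 30]
    print(context)  # surrounding bytes, to see which field contains the character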

Python 3 OutOfBoundsDatetime: Out of bounds nanosecond timestamp: (Workaround)

Encountered an error today involving importing a CSV file with dates. The file has known quality issues and in this case one entry was "3/30/3013" due to a data entry error.
Reading other entries about the OutOfBoundsDatetime error, the upper limit of pandas' nanosecond timestamps maxes out at 4/11/2262. The suggested solution was to fix the formatting of the dates. In my case the date format is correct, but the data itself is wrong.
Applying numpy logic:
df['Contract_Signed_Date'] = np.where(df['Contract_Signed_Date'] > '12/16/2017',
                                      df['Alt_Date'], df['Contract_Signed_Date'])
Essentially, if the file's 'Contract Signed Date' is greater than today (12/16/2017), I want to use the Alt_Date column instead. It seems to work, except that when it hits the year 3013 entry it errors out. What's a good pythonic way around the out-of-bounds error?
Perhaps hideously unpythonic but it appears to do what you want.
Input, file arthur.csv:
input_date,var1,var2
3/30/3013,2,34
02/2/2017,17,35
Code:
import pandas as pd
from io import StringIO

target_date = '2017-12-17'

for_pandas = StringIO()
print('input_date,var1,var2,alt_date', file=for_pandas)  # new header

with open('arthur.csv') as arthur:
    next(arthur)  # skip header in csv
    for line in arthur:
        line_items = line.rstrip().split(',')
        date = '{:4s}-{:0>2s}-{:0>2s}'.format(*list(reversed(line_items[0].split('/'))))
        if date > target_date:
            output = '{},{},{},{}'.format(*['NaT', line_items[1], line_items[2], date])
        else:
            output = '{},{},{},{}'.format(*[date, line_items[1], line_items[2], 'NaT'])
        print(output, file=for_pandas)

for_pandas.seek(0)
df = pd.read_csv(for_pandas, parse_dates=['input_date', 'alt_date'])
print(df)
Output:
  input_date  var1  var2    alt_date
0        NaT     2    34  3013-30-03
1 2017-02-02    17    35         NaT
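An alternative, more pandas-native sketch (a suggestion, not part of the answer above): pd.to_datetime with errors='coerce' turns out-of-bounds dates into NaT instead of raising, after which the fallback to Alt_Date can be done with plain pandas operations. The column names come from the question; the data frame here is dummy data.
import pandas as pd

# Dummy frame with the question's column names, just to make the sketch runnable.
df = pd.DataFrame({'Contract_Signed_Date': ['3/30/3013', '02/02/2017'],
                   'Alt_Date': ['12/01/2017', '01/15/2017']})

# errors='coerce' turns unparseable or out-of-bounds dates (such as 3/30/3013) into NaT.
signed = pd.to_datetime(df['Contract_Signed_Date'], format='%m/%d/%Y', errors='coerce')
alt = pd.to_datetime(df['Alt_Date'], format='%m/%d/%Y', errors='coerce')

# Keep the signed date only where it parsed and is not past the cut-off; otherwise use Alt_Date.
cutoff = pd.Timestamp('2017-12-16')
df['Contract_Signed_Date'] = signed.where(signed.notna() & (signed <= cutoff), alt)
print(df)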

ValueError, though a check has already been performed for this

Getting a little stuck with NaN data. This program trawls through a folder on an external hard drive, loads each txt file as a dataframe, and should read the very last value of the last column. As some of the last rows do not complete for whatever reason, I have chosen to take the row before (or that's what I hope to have done). Here is the code; I have commented the lines that I think are causing the trouble:
#!/usr/bin/env python3
import glob
import math
import pandas as pd
import numpy as np

def get_avitime(vbo):
    try:
        df = pd.read_csv(vbo,
                         delim_whitespace=True,
                         header=90)
        row = next(df.iterrows())
        t = df.tail(2).avitime.values[0]
        return t
    except:
        pass

def human_time(seconds):
    secs = seconds/1000
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return '%02d:%02d:%02d' % (hours, mins, secs)

def main():
    path = 'Z:\\VBox_Backup\\**\\*.vbo'
    events = {}
    customers = {}
    for vbo_path in glob.glob(path, recursive=True):
        path_list = vbo_path.split('\\')
        event = path_list[2].upper()
        customer = path_list[3].title()
        avitime = get_avitime(vbo_path)
        if not avitime:  # this is to check there is a number
            continue
        else:
            if event not in events:
                events[event] = {customer: avitime}
                print(event)
            elif customer not in events[event]:
                events[event][last_customer] = human_time(events[event][last_customer])
                print(events[event][last_customer])
                events[event][customer] = avitime
            else:
                total_time = events[event][customer]
                total_time += avitime
                events[event][customer] = total_time
            last_customer = customer
    events[event][customer] = human_time(events[event][customer])
    df_events = pd.DataFrame(events)
    df.to_csv('event_track_times.csv')

main()
I put in a line to check for a value, but I am guessing that NaN is not a null value, hence it hasn't quite worked.
(C:\Users\rob.kinsey\AppData\Local\Continuum\Anaconda3) c:\Users\rob.kinsey\Programming>python test_single.py
BARCELONA
03:52:42
02:38:31
03:21:02
00:16:35
00:59:00
00:17:45
01:31:42
03:03:03
03:16:43
01:08:03
01:59:54
00:09:03
COTA
04:38:42
02:42:34
sys:1: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
04:01:13
01:19:47
03:09:31
02:37:32
03:37:34
02:14:42
04:53:01
LAGUNA_SECA
01:09:10
01:34:31
01:49:27
03:05:34
02:39:03
01:48:14
SILVERSTONE
04:39:31
01:52:21
02:53:42
02:10:44
02:11:17
02:37:11
01:19:12
04:32:21
05:06:43
SPA
Traceback (most recent call last):
File "test_single.py", line 56, in <module>
main()
File "test_single.py", line 41, in main
events[event][last_customer] = human_time(events[event][last_customer])
File "test_single.py", line 23, in human_time
The output starts out correctly, apart from the sys:1 warning (at least it carries on), until the final error that stalls the program completely. How can I get past this NaN issue? All the variables I am working with should be of float type or should have been ignored; everything should be either a string or a float until the time conversion, which works with integers.
OK, even though no one answered, I am compelled to answer my own question, as I am not convinced I am the only person who has had this problem.
There are three main reasons for receiving NaN in a data frame, and most of them revolve around infinity, such as using 'inf' as a value or dividing by zero, which also produces NaN as a result. The wiki page was the most helpful for me in solving this issue:
https://en.wikipedia.org/wiki/NaN
One other important point about NaN is that it works a little like a virus, in that anything that touches it in any calculation will result in NaN, so the problem can get exponentially worse. What you are actually dealing with is missing data, and until you realise that is what it is, NaN is frustrating: it is a value, not an error, yet any mathematical operation on it will end in NaN. BEWARE!!
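A small illustration (a sketch, not from the original post) of why a plain truthiness check such as "if not avitime:" does not catch NaN, and which explicit checks do:
import math
import numpy as np
import pandas as pd

avitime = float('nan')  # stand-in for a value read from one of the broken files

print(bool(avitime))        # True  - NaN is truthy, so `if not avitime:` does not skip it
print(avitime == avitime)   # False - NaN never compares equal, even to itself
print(math.isnan(avitime))  # True
print(np.isnan(avitime))    # True
print(pd.isna(avitime))     # True  - also handles None and pandas NaT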
The reason on this occasion was that a specific line number was used to pick up the headers when reading in the csv file. Although that worked for the majority of these files, some of them had the headers I was after on a different line; as a result, the headers imported into the data frame were either part of the data itself or a null value. Consequently, trying to access a column in the data frame by header name resulted in NaN, and, as discussed earlier, this proliferated through the program, causing a few problems which I had used workarounds to combat. One of those workarounds is actually acceptable, which is to add this line:
df = df.fillna(0)
after the first definition of the df variable, in this case:
df = pd.read_csv(vbo,
                 delim_whitespace=True,
                 header=90)
The bottom line is that if you are receiving this value, the best thing really is to work out why you are getting NaN in the first place, then it is easier to make an informed decision as to whether or not replacing NaN with '0' is a viable choice.
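For example, a quick way to see where the NaN values are coming from before deciding whether fillna(0) is viable (a sketch with dummy data, not the original .vbo files):
import numpy as np
import pandas as pd

# Dummy data standing in for one of the loaded files, just to show the checks.
df = pd.DataFrame({'avitime': [1200.0, np.nan, 3400.0],
                   'velocity': [10.0, 20.0, np.nan]})

print(df.isna().sum())            # how many NaN values each column contains
print(df[df['avitime'].isna()])   # the rows where avitime is missing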
I sincerely hope this helps anyone who finds it.
Regards
iFunction

TypeError: 'builtin_function_or_method' object is not subscriptable

from random import randint
random_number = (randint (33, 126))
print (random_number)
print (chr[random_number])
I am generating a random number from 33 to 126 and trying to turn that random number into its ASCII character equivalent.
However, this error keeps showing up: TypeError: 'builtin_function_or_method' object is not subscriptable
I believe what you're trying to do is change the random number to a character. Replace:
for i in (random_number):
print [chr[i]]
with this:
print(chr(random_number))
random_number here is a single integer. A for loop needs an iterable ("from x to y"), so one number isn't enough to tell the system what to loop over. You could use something like for i in range(33, random_number):, but from your description, I don't know why you're using a for loop at all. Use print(chr(random_number)) instead.
Note the use of parentheses instead of square brackets: chr is a function, so you should be calling it, not subscripting it.
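Putting it together, the corrected version of the snippet from the question would be:
from random import randint

random_number = randint(33, 126)   # a random printable ASCII code point
print(random_number)
print(chr(random_number))          # chr is a function, so call it with parentheses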
