Python string with currency formatting - python-3.x

I am attempting to retrieve a dollar amount from my database, concatenate it with other information, and have it print with 2 decimal places. I have tried so many different configurations I have lost track, but I continue to get only one decimal place (or an error). Can someone please show me where I am losing it?
#get payments on the permits
for i in range(len(IDS)):
    paydate = date.today() - timedelta(30)
    query = '''select `AmountPaid_credit`, `PayComment`, `DateReceived`, `WPTSinvoiceNo`
               from Payments
               where `NPDES_ID` = ? and `DateReceived` >= ?'''
    PayStrings.append("\tNo Payment Recieved")
    for row in cur.execute(query, (IDS[i], paydate)):
        WPTSInv.append(row[3])
        d = row[2].strftime('%m/%d/%Y')
        #create a payment string
        PayStrings[i] = "\t$" + str(row[0]) + " - " + d + " - " + row[1]
returns this
$2000.0 - 02/09/2017 - 2017 APPLICATION FEE
but I need this
$2000.00 - 02/09/2017 - 2017 APPLICATION FEE

As an alternative to Haifeng's answer, you could do the following
from decimal import Decimal
str(round(Decimal(row[0]), 2))
Edit: I would also like to add that there is a locale module and that it is probably a better solution, even if it is a little bit more work. You can see how to use it at this question here: Currency formatting in Python
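For example, a minimal sketch using locale (the 'en_US.UTF-8' locale name is an assumption and has to be available on your system):
import locale
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')  # locale name is an assumption
print(locale.currency(2000.0, grouping=True))   # '$2,000.00'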

In your case, you just need to format the value with two decimal places:
>>> value = "2000.0"
>>> "{:.2f}".format(float(value))
'2000.00'
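Applied to the original loop, the same formatting would look something like this (a sketch; it assumes row[0] is a number or a numeric string):
PayStrings[i] = "\t${:.2f} - {} - {}".format(float(row[0]), d, row[1])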

Remove non meaningful characters in pandas dataframe

I am trying to remove all
\xf0\x9f\x93\xa2, \xf0\x9f\x95\x91\n\, \xe2\x80\xa6,\xe2\x80\x99t
type characters from the strings below in a Python pandas column. Although the text starts with b', it's a string.
Text
_____________________________________________________
"b'Hello! \xf0\x9f\x93\xa2 End Climate Silence is looking for volunteers! \n\n1-2 hours per week. \xf0\x9f\x95\x91\n\nExperience doing digital research\xe2\x80\xa6
"b'I doubt if climate emergency 8s real, I think people will look ba\xe2\x80\xa6 '
"b'No, thankfully it doesn\xe2\x80\x99t. Can\xe2\x80\x99t see how cheap to overtourism in the alan alps can h\xe2\x80\xa6"
"b'Climate Change Poses a WidelllThreat to National Security "
"b""This doesn't feel like targeted propaganda at all. I mean states\xe2\x80\xa6"
"b'berates climate change activist who confronted her in airport\xc2\xa0
The above content is in a pandas dataframe as a column.
I am trying
string.encode('ascii', errors='ignore')
and regex, but without luck. It would be helpful if I could get some suggestions.
Your string looks like a byte string, but it isn't one, so encode/decode doesn't help. Try something like this:
>>> df['text'].str.replace(r'\\x[0-9a-f]{2}', '', regex=True)
0 b'Hello! End Climate Silence is looking for v...
1 b'I doubt if climate emergency 8s real, I thin...
2 b'No, thankfully it doesnt. Cant see how cheap...
3 b'Climate Change Poses a WidelllThreat to Nati...
4 b""This doesn't feel like targeted propaganda ...
5 b'berates climate change activist who confront...
Name: text, dtype: object
Note that you still have to clean up the unbalanced single/double quotes and remove the leading 'b' character.
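A fuller cleanup might look like this (a sketch; the column name 'text' follows the example above, and the quote-stripping pattern is an assumption about your data):
df['text'] = (
    df['text']
    .str.replace(r'\\x[0-9a-f]{2}', '', regex=True)     # drop the escaped byte sequences
    .str.replace(r'''^b["']|["']+$''', '', regex=True)  # strip the leading b'/b" and trailing quotes
)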
You could go through your strings and keep only ascii characters:
my_str = "b'Hello! \xf0\x9f\x93\xa2 End Climate Silence is looking for volunteers! \n\n1-2 hours per week. \xf0\x9f\x95\x91\n\nExperience doing digital research\xe2\x80\xa6"
new_str = "".join(c for c in my_str if c.isascii())
print(new_str)
Note that .encode('ascii', errors='ignore') doesn't change the string it's applied to; it returns a new bytes object, so decode it back if you want a str. This should work:
new_str = my_str.encode('ascii', errors='ignore').decode('ascii')
print(new_str)

How to calculate average datetime timestamps in python3

I have some code whose performance I have timestamped, and I want to measure the average time it took to run it on multiple computers, but I just can't figure out how to use the datetime module in Python.
Here is how my procedure looks:
1) I have the code, which simply writes a log into a text file, where the timestamp looks like
t1 = datetime.datetime.now()
...
t2 = datetime.datetime.now()
stamp = t2 - t1
That stamp variable is just written into, say, log.txt, so in the log file it looks like 0:07:23.160896; it seems to be in %H:%M:%S.%f format.
2) Then I run a second Python script which reads in the log.txt file, and it reads the 0:07:23.160896 value as a string.
The problem is I don't know how to work with this value, because if I import it as a datetime it will also append an imaginary year, month, and day to it, which I don't want. I simply want to work with hours, minutes, seconds, and microseconds so I can add them up or take an average.
For example, in LibreOffice I can just add 0:07:23.160896 to 0:00:48.065130, which gives 0:08:11.226026, and then divide by 2, which gives 0:04:05.613013; I just can't do that in Python, or I don't know how to do it.
I have tried everything, but neither datetime.datetime nor datetime.timedelta allows simple multiplication and division like that. If I just do y = datetime.datetime.strptime('0:07:23.160896', '%H:%M:%S.%f') it gives out 1900-01-01 00:07:23.160896, and I can't just take y*2 like that; it doesn't allow arithmetic operations, plus if I convert it into a timedelta it will also multiply the year, which is ridiculous. I simply want to add, subtract, and multiply times.
Please help me find a way to do this, and not just for 2 variables but ideally a way to calculate the average of an entire list of timestamps, like average(['0:07:23.160896', '0:00:48.065130', '0:00:14.517086', ...]).
I simply want a way to calculate the average of many timestamps and print its average in the same format, just as you can select a column in LibreOffice and use the AVERAGE() function, which gives the average timestamp of that column.
As you have done, first read the string into a datetime object using strptime: t = datetime.datetime.strptime(single_time, '%H:%M:%S.%f')
After that, convert the time part of your date string into a timedelta, so you can easily do arithmetic with times: tdelta = datetime.timedelta(hours=t.hour, minutes=t.minute, seconds=t.second, microseconds=t.microsecond)
Now you can easily calculate with the timedelta objects and, at the end of the calculations, convert back into a string with str(taverage).
import datetime

times = ['0:07:23.160896', '0:00:48.065130', '0:12:22.324251']

# convert the time strings into timedeltas and sum them up
tsum = datetime.timedelta()
count = 0
for single_time in times:
    t = datetime.datetime.strptime(single_time, '%H:%M:%S.%f')
    tdelta = datetime.timedelta(hours=t.hour, minutes=t.minute,
                                seconds=t.second, microseconds=t.microsecond)
    tsum = tsum + tdelta
    count = count + 1

taverage = tsum / count
average_time = str(taverage)
print(average_time)
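If you want the average() helper the question asks for, here is a small sketch along the same lines (it assumes every entry matches the %H:%M:%S.%f format):
import datetime

def average(timestamps):
    # parse each string, collect timedeltas, then divide the sum by the count
    deltas = []
    for s in timestamps:
        t = datetime.datetime.strptime(s, '%H:%M:%S.%f')
        deltas.append(datetime.timedelta(hours=t.hour, minutes=t.minute,
                                         seconds=t.second, microseconds=t.microsecond))
    return str(sum(deltas, datetime.timedelta()) / len(deltas))

print(average(['0:07:23.160896', '0:00:48.065130', '0:00:14.517086']))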

Adding a number in one column to a date in another in a pandas dataframe

My first Python project that didn't print 'Hello World', so be gentle. I've tried answers from similar questions but they don't seem to work.
I'm working with an Excel file, parsing as pandas dataframe.
I have a calculated column that works out the number of days to be added to a date later on. The days-to-add column is built as below, with 'choices' being a list of integers. This seems to work fine.
choices = [0,0,925,778,567,608, 638,730]
df['Days_to_add'] = np.select(conditions, choices, default=0)
I now want to add this to an existing date column, to return a new column with the new date. So far I've tried this, but Jupyter says it's deprecated and will raise a TypeError in a future version:
df["Estimated Start"] = pd.to_timedelta(df["Date1"]) + df['Days_to_add']
Also tried this:
df['Estimated_Start'] = df.Max_Dec_Date + pd.DateOffset(df['Days_to_add'])
And something else that told me to use a timedelta index, and something else that pointed to a timedelta range. I think the problem is something to do with trying to add an integer to a series?
No success with any of it. Help?
Your date column is a datetime, not a timedelta,
so the addition should go like this:
df["Estimated Start"] = pd.to_datetime(df["Date1"]) + pd.to_timedelta(df['Days_to_add'], unit='D')

Pandas .rolling.corr using date/time offset

I am having a bit of an issue with pandas's rolling function and I'm not quite sure where I'm going wrong. If I mock up two test series of numbers:
df_index = pd.date_range(start='1990-01-01', end ='2010-01-01', freq='D')
test_df = pd.DataFrame(index=df_index)
test_df['Series1'] = np.random.randn(len(df_index))
test_df['Series2'] = np.random.randn(len(df_index))
Then it's easy to have a look at their rolling annual correlation:
test_df['Series1'].rolling(365).corr(test_df['Series2']).plot()
which produces a sensible-looking plot.
All good so far. If I then try to do the same thing using a datetime offset:
test_df['Series1'].rolling('365D').corr(test_df['Series2']).plot()
I get a wildly different (and obviously wrong) result.
Is there something wrong with pandas or is there something wrong with me?
Thanks in advance for any light you can shed on this troubling conundrum.
It's very tricky; I think the behavior of an integer window and an offset window is different:
"New in version 0.19.0 are the ability to pass an offset (or convertible) to a .rolling() method and have it produce variable sized windows based on the passed time window. For each time point, this includes all preceding values occurring within the indicated time delta. This can be particularly useful for a non-regular time frequency index."
You should check out the docs on Time-aware Rolling.
r1 = test_df['Series1'].rolling(window=365) # has default `min_periods=365`
r2 = test_df['Series1'].rolling(window='365D') # has default `min_periods=1`
r3 = test_df['Series1'].rolling(window=365, min_periods=1)
r1.corr(test_df['Series2']).plot()
r2.corr(test_df['Series2']).plot()
r3.corr(test_df['Series2']).plot()
This code produces plots of a similar shape for r2.corr().plot() and r3.corr().plot(), but note that the calculated values are still not identical; compare r2.corr(test_df['Series2']) with r3.corr(test_df['Series2']) to see the differences.
I think for a regular time-frequency index, you should just stick with r1.
This is mainly because the results of rolling(365) and rolling('365D') are different.
For example
sub = test_df.head()
sub['Series2'].rolling(2).sum()
Out[15]:
1990-01-01 NaN
1990-01-02 -0.355230
1990-01-03 0.844281
1990-01-04 2.515529
1990-01-05 1.508412
sub['Series2'].rolling('2D').sum()
Out[16]:
1990-01-01 -0.043692
1990-01-02 -0.355230
1990-01-03 0.844281
1990-01-04 2.515529
1990-01-05 1.508412
Since rolling(365) produces a lot of NaNs (there is no value until the window is full), the correlations of the two series computed the two ways end up quite different.
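A quick way to see the difference (a sketch, reusing test_df from above) is to count how many NaNs each window definition leaves in the rolling correlation:
c_int = test_df['Series1'].rolling(365).corr(test_df['Series2'])
c_off = test_df['Series1'].rolling('365D').corr(test_df['Series2'])
print(c_int.isna().sum(), c_off.isna().sum())  # the integer window leaves far more leading NaNs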

Python Math - Floating with financial output comes out incorrect

In a small portion of code for a retail auditing calculator, I'm attempting to allow the input of a retail value and multiply it by up to 2 entered quantities. The expected (intended) result is $X * Y = $Z.
I've attempted to modify the code a couple of ways and seem to be stuck on how this math is (isn't) working correctly.
I've attempted a number of different configurations in the code and the most I've achieved is the following:
#Retail value of item, whole number (i.e. $49.99 entered as 4999)
rtlVAL = input("Retail Value: ")
#Quantity of Items - can be multiplied for full stack items, default if no entry is '1'
qt1 = float(input("Quantity 1: ")) #ex. 4
qt2 = float(input("Quantity 2: ") or "1") #ex " "
#Convert the Retail Value to financial format (i.e. 4999 to $49.99)
rtl = float("{:.2}".format (rtlVAL))
# Screen Output
qtyVAL = int(qt1)*int(qt2)
print("$" + str(qtyVAL*rtl))
The entered values are:
Retail Value: 4999
Quantity 1: 4
Quantity 2: (blank)
The expected calculation is 49.99 * 4 * 1 (because no entry defaults to a value of 1) and the expected result is $199.96.
The result of this code is $196.0, so not only is it the wrong conclusion but it's missing the two decimal places.
I'm not entirely certain why the math comes up wrong in context to expectation.
What am I missing here?
On line 9, I've tried the following:
rtl = float("{:.2f}".format (rtlVAL))
rtl = int("{:.2f}".format (rtlVAL))
The return was
ValueError: Unknown format code 'f' for object of type 'str'
if I change line 13 to:
print("$" + float(qtyVAL*rtl))
I get
TypeError: must be str, not float
Using either of the prior alterations in conjunction with the latter will return the ValueError.
Python 3.4 and 3.6
I did search a few other SO questions regarding Python, math, floating point, and formatting, but those questions were asking for and presenting something far more advanced and entangled than this, so I wasn't able to glean an answer I could apply here, or they applied mainly to Python 2.7, where raw_input() takes the place of Python 3.x's input() (wrapped as int(input()) to step out of the str value, as far as I understand for this purpose).
I did not see this as a duplicate, but if I missed something in that I do apologize - it isn't intentional.
No need to mess around with number formats:
rtl = float(rtlVAL)/100
Just divide the retail value by 100 to get the dollar value
EDIT:
Incidentally, the reason it was coming up with 196 is that your format spec "{:.2}" on the string rtlVAL keeps only its first two characters (49 in your case), and you then multiplied by that.
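Putting that together with the two-decimal output formatting from the question, a corrected version might look like this (a sketch; input validation is omitted):
rtlVAL = input("Retail Value: ")            # e.g. 4999, meaning $49.99
qt1 = float(input("Quantity 1: "))          # e.g. 4
qt2 = float(input("Quantity 2: ") or "1")   # blank defaults to 1

rtl = float(rtlVAL) / 100                   # 4999 -> 49.99
qtyVAL = int(qt1) * int(qt2)
print("${:.2f}".format(qtyVAL * rtl))       # -> $199.96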
