Formatting printed output with even spacing - python-3.x

I am trying to have all the values aligned at the same level when printing the integers and strings specified below, so that the spacing is consistent throughout. How would I be able to do that?
print('Consecutive monthly {} results: {}\tIndexes: {} - {}'.format('positive',14,'2019-04-01 00:00:00','2020-06-01 00:00:00'))
print('Consecutive monthly {} results: {}\tIndexes: {} - {}'.format('negative',2,'2018-02-01 00:00:00','2020-06-01 00:00:00'))
Output
Consecutive monthly positive results: 14 Indexes: 2019-04-01 00:00:00 - 2020-06-01 00:00:00
Consecutive monthly negative results: 2 Indexes: 2018-02-01 00:00:00 - 2018-03-01 00:00:00
Expected Output:
Consecutive monthly positive results: 14   Indexes: 2019-04-01 00:00:00 - 2020-06-01 00:00:00
Consecutive monthly negative results: 2    Indexes: 2018-02-01 00:00:00 - 2018-03-01 00:00:00

You can specify the format for the decimal number like this:
print('Consecutive monthly {} results: {:<2d}\tIndexes: {} - {}'.format('positive',14,'2019-04-01 00:00:00','2020-06-01 00:00:00'))
print('Consecutive monthly {} results: {:<2d}\tIndexes: {} - {}'.format('negative',2,'2018-02-01 00:00:00','2020-06-01 00:00:00'))
The ':<2d' in the format string left-justifies the number (<) within a minimum field width of 2 characters (2), formatted as a decimal integer (d).
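As a small illustration (the variable names are just for the example), the same format spec also works inside an f-string:
sign, count = 'positive', 14
start, end = '2019-04-01 00:00:00', '2020-06-01 00:00:00'
print(f'Consecutive monthly {sign} results: {count:<2d}\tIndexes: {start} - {end}')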

Related

subtract second datetime row from first datetime row of a column if another column shows duplicate values

I have a dataframe with two columns, Order date and Customer (where each duplicated customer appears exactly twice and the data has been sorted). I want to subtract the Order date of the second occurrence of a Customer from the first Order date. Order date is in datetime format.
Here is a sample of the table.
Context: I'm trying to calculate the time it takes for a customer to make a second order.
Order date Customer
4260 2022-11-11 16:29:00 (App admin)
8096 2022-10-22 12:54:00 (App admin)
996 2021-09-22 20:30:00 10013
946 2021-09-14 15:16:00 10013
3499 2022-04-20 12:17:00 100151
... ... ...
2856 2022-03-21 13:49:00 99491
2788 2022-03-18 12:15:00 99523
2558 2022-03-08 12:07:00 99523
2580 2022-03-04 16:03:00 99762
2544 2022-03-02 15:40:00 99762
I have tried deleting by index but it returns just the first two values.
The expected output should be another dataframe with just the Customer name and the difference between the second and first Order dates of each duplicated customer, in minutes.
expected output:
| Customer    | difference in minutes |
| ----------- | --------------------- |
| 1232        | 445.0                 |
| (App Admin) | 3432.0                |
| 1145        | 2455.0                |
| 6653        | 32.0                  |
You can use groupby:
df['Order date'] = pd.to_datetime(df['Order date'])
out = (df.groupby('Customer', as_index=False)['Order date']
         .agg(lambda x: (x.iloc[0] - x.iloc[-1]).total_seconds() / 60)
         .query('`Order date` != 0'))
print(out)
# Output:
Customer Order date
0 (App admin) 29015.0
1 10013 11834.0
4 99523 14408.0
5 99762 2903.0
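If you also want the result to match the expected column names, a possible follow-up (a sketch, not part of the original answer; counts is just an intermediate name) is to keep only customers that actually have a second order and rename the column:
counts = df.groupby('Customer')['Order date'].transform('size')
out = (df[counts > 1]
         .groupby('Customer', as_index=False)['Order date']
         .agg(lambda x: (x.iloc[0] - x.iloc[-1]).total_seconds() / 60)
         .rename(columns={'Order date': 'difference in minutes'}))
print(out)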

Pandas - converting out of order string date time

I have a DataFrame column that has string values for date/time (Input data). I need to convert it into a semi-timestamp format (Desired output data). There are rows that are blank and need to remain blank. I use quotes for illustrative purposes. I am using strptime but getting an error (see below).
Input data (String):
Mar 8 12:00 PM 2020
' '
Mar 8 1:00 PM 2020
Mar 8 6:00 PM 2020
Mar 9 8:00 AM 2020
Desired output data:
3/8/2020 12:00:00
' '
3/8/2020 13:00:00
3/8/2020 18:00:00
3/9/2020 08:00:00
Code:
import datetime as dt
df['date'].apply(lambda x: dt.datetime.strptime(x, '%b %d %H:%M %p %Y'))
Error:
ValueError: time data '' does not match format '%b %d %H:%M %p %Y'
How can I rewrite this code to get the desired output?
This works for me: to_datetime with a format similar to yours, but with %I to select hours in 12-hour format; errors='coerce' is also added so that any value that doesn't match the format becomes a missing value (NaT):
df['date'] = pd.to_datetime(df['date'], format='%b %d %I:%M %p %Y', errors='coerce')
print (df)
date
0 2020-03-08 12:00:00
1 NaT
2 2020-03-08 13:00:00
3 2020-03-08 18:00:00
4 2020-03-09 08:00:00
Last for custom format use Series.dt.strftime with Series.replace:
df['date'] = (pd.to_datetime(df['date'], format='%b %d %I:%M %p %Y', errors='coerce')
.dt.strftime('%m/%d/%y %H:%M:%S')
.replace('NaT', ''))
print (df)
date
0 03/08/20 12:00:00
1
2 03/08/20 13:00:00
3 03/08/20 18:00:00
4 03/09/20 08:00:00
Or first replace multiple spaces with a single space:
df['date'] = (pd.to_datetime(df['date'].replace('\s+', ' ', regex=True), format='%b %d %I:%M %p %Y', errors='coerce')
.dt.strftime('%m/%d/%y %H:%M:%S')
.replace('NaT', ''))
print (df)
date
0 03/08/20 12:00:00
1
2 03/08/20 13:00:00
3 03/08/20 18:00:00
4 03/09/20 08:00:00
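If you prefer to keep the strptime/apply approach from the question, here is a hedged sketch that skips the blank rows, collapses repeated spaces, and uses %I (12-hour) instead of %H since %p is present:
import datetime as dt

fmt = '%b %d %I:%M %p %Y'
df['date'] = df['date'].apply(
    lambda x: dt.datetime.strptime(' '.join(x.split()), fmt) if x.strip() else ''
)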

String does not contain a date

I have a DataFrame with this column:
Mi_Meteo['Time_Instant'].head():
0 2013/11/14 17:00
1 2013/11/14 18:00
2 2013/11/14 19:00
3 2013/11/14 20:00
4 2013/11/14 21:00
Name: Time_Instant, dtype: object
After doing some inspection, this is what I realised:
Mi_Meteo['Time_Instant'].value_counts():
2013/12/09 02:00 33
2013/12/01 22:00 33
2013/12/11 10:00 33
2013/12/05 09:00 33
.
.
.
.
2013/11/16 02:00 21
2013/11/07 10:00 11
2013/11/17 22:00 11
DateTIme 3
So I stripped it:
Mi_Meteo['Time_Instant'] = Mi_Meteo['Time_Instant'].str.rstrip('DateTIme')  # otherwise converting raises 'Unknown string format'
And then I tried to convert it:
Mi_Meteo['Time_Instant'] = pd.to_datetime(Mi_Meteo['Time_Instant'])
But I get this error:
String does not contain a date.
Any suggestion would be much appreciated, thank you all.
A bit late, why don't you use this:
Mi_Meteo['Time_Instant'] = pd.to_datetime(Mi_Meteo['Time_Instant'], errors='coerce')
The pandas.to_datetime documentation describes the errors parameter:
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
    If 'raise', then invalid parsing will raise an exception.
    If 'coerce', then invalid parsing will be set as NaT.
    If 'ignore', then invalid parsing will return the input.
I got the same error - it turns out that two of my dates were empty: ' '.
To find the row index of the problematic dates I used the following list comprehension:
badRows = [n for n,x in enumerate(df['DATE'].tolist()) if x.strip() in ['']]
This returned a list, containing the indices of the rows in the 'DATE' column that were causing the problems:
[745672, 745673]
You can then delete these rows in place:
df.drop(df.index[badRows], inplace=True)
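The same clean-up can also be done without the list comprehension; a small sketch, assuming the 'DATE' column holds strings:
blank = df['DATE'].str.strip() == ''   # True for rows that are empty or whitespace-only
df = df[~blank]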
I'm having trouble reproducing your error, so I cannot be sure if this will fix the issue you have. If not then please try to provide a minimum sample of code/data that reproduces your error.
This is what I tried to reproduce your situation:
lzt = ['2013/11/16 02:00 ',
       '2013/11/07 10:00 ',
       '2013/11/17 22:00 ',
       'DateTIme',
       'DateTIme',
       'DateTIme']
ser = pd.Series(lzt)
ser = ser.str.rstrip('DateTIme')
ser = pd.to_datetime(ser)
But as I said, I got no error, so either we have different versions of pandas or there's something else wrong with your data. Using rstrip leaves some blank string data:
0 2013/11/16 02:00
1 2013/11/07 10:00
2 2013/11/17 22:00
3
4
5
which for me gives NaT (not a time) when I run pd.to_datetime on it:
Out[34]:
0 2013-11-16 02:00:00
1 2013-11-07 10:00:00
2 2013-11-17 22:00:00
3 NaT
4 NaT
5 NaT
dtype: datetime64[ns]
I'd say it's better practice to remove the unwanted rows altogether:
ser = ser[ser != 'DateTIme']
Out[39]:
0 2013-11-16 02:00:00
1 2013-11-07 10:00:00
2 2013-11-17 22:00:00
dtype: datetime64[ns]
See if that works, otherwise please give enough information to reproduce the error.
There are two possible solutions to this:
Either you can make the error disappear by using 'coerce' in the errors argument of pd.to_datetime() as follows:
Mi_Meteo['Time_Instant'] = pd.to_datetime(Mi_Meteo['Time_Instant'], errors='coerce')
Or, if you want to know which dates hold the unparsable values, you can search for them by converting one value at a time, as follows. This will work regardless of the type or format of the wrong value:
dates = []
wrong_dates = []
for i in Mi_Meteo['Time_Instant'].unique():
    try:
        date = pd.to_datetime(i)
        dates.append(i)
    except Exception:
        wrong_dates.append(i)
The wrong_dates list will hold all the wrong values, while dates will hold all the right ones.
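The two approaches can also be combined; a sketch (parsed and wrong_values are illustrative names) that locates the unparsable values without an explicit loop, with the caveat that rows that were already missing would show up here too:
parsed = pd.to_datetime(Mi_Meteo['Time_Instant'], errors='coerce')
wrong_values = Mi_Meteo.loc[parsed.isna(), 'Time_Instant']
print(wrong_values)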

Generate an interval-based time series using Spark SQL

I am new to Spark SQL. I want to generate the following series of start times and end times, with an interval of 5 seconds, for the current date. So let's say I am running my job on 1st Jan 2018: I want a series of start times and end times that are 5 seconds apart. There will be 17280 records for one day.
START TIME | END TIME
-----------------------------------------
01-01-2018 00:00:00 | 01-01-2018 00:00:04
01-01-2018 00:00:05 | 01-01-2018 00:00:09
01-01-2018 00:00:10 | 01-01-2018 00:00:14
.
.
01-01-2018 23:59:55 | 01-01-2018 23:59:59
01-02-2018 00:00:00 | 01-01-2018 00:00:05
I know I can generate this DataFrame using a Scala for loop. My constraint is that I can only use queries to do this.
Is there any way I can create this data structure using select * constructs?
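One possible direction, offered only as a hedged sketch: Spark 2.4+ ships the built-in sequence() and explode() SQL functions, which can generate the series entirely inside a query. The snippet below assumes an existing SparkSession named spark; the alias t and the show() call are just for illustration.
series_df = spark.sql("""
    SELECT start_time,
           start_time + INTERVAL 4 SECONDS AS end_time
    FROM (
        SELECT explode(
                   sequence(
                       CAST(current_date() AS TIMESTAMP),
                       CAST(current_date() AS TIMESTAMP) + INTERVAL 1 DAY - INTERVAL 5 SECONDS,
                       INTERVAL 5 SECONDS)) AS start_time) t
""")
series_df.show(3, truncate=False)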

Convert incomplete 12h datetime-like strings into appropriate datetime type

I've got a pandas Series containing datetime-like strings in 12h format, but without the am/pm abbreviations. It covers an entire month of data:
40 01/01/2017 11:51:00
41 01/01/2017 11:51:05
42 01/01/2017 11:55:05
43 01/01/2017 11:55:10
44 01/01/2017 11:59:30
45 01/01/2017 11:59:35
46 02/01/2017 12:00:05
47 02/01/2017 12:00:10
48 02/01/2017 12:13:20
49 02/01/2017 12:13:25
50 02/01/2017 12:24:50
51 02/01/2017 12:24:55
52 02/01/2017 12:33:30
Name: TS, dtype: object
(318621,) # shape
My goal is to convert it to datetime format, so as to obtain the appropriate Unix timestamp values and to do comparisons/arithmetic with other datetime data that is, this time, in 24h format. So I already tried this:
pd.to_datetime(df.TS, format = '%d/%m/%Y %I:%M:%S') # %I for 12h format
Which outputs:
64 2017-01-02 00:46:50
65 2017-01-02 00:46:55
66 2017-01-02 01:01:00
67 2017-01-02 01:01:05
68 2017-01-02 01:05:00
But the am/pm information is not taken into account. I know that, as a rule, the am/pm markers first have to be present in the strings; then one can use datetime.strptime() or pd.to_datetime() to parse them with the %p directive.
So I wanted to know if there's another way to deal with this issue through the datetime or pandas datetime modules, or do I have to manually add the 'am/pm' abbreviations before parsing?
You have data at 5-second intervals throughout multiple days. The desired end format is like this (with the AM/PM marker we need to add, because pandas cannot possibly guess it, since it looks at one value at a time):
31/12/2016 11:59:55 PM
01/01/2017 12:00:00 AM
01/01/2017 12:00:05 AM
01/01/2017 11:59:55 AM
01/01/2017 12:00:00 PM
01/01/2017 12:59:55 PM
01/01/2017 01:00:00 PM
01/01/2017 01:00:05 PM
01/01/2017 11:59:55 PM
02/01/2017 12:00:00 AM
First, we can parse the whole thing without AM/PM info, as you already showed:
ts = pd.to_datetime(df.TS, format = '%d/%m/%Y %I:%M:%S')
We have a small problem: 12:00:00 is parsed as noon, not midnight. Let's normalize that:
ts[ts.dt.hour == 12] -= pd.Timedelta(12, 'h')
Now we have times from 00:00:00 to 11:59:55, twice per day.
Next, note that the transitions are always at 00:00:00. We can easily detect these, as well as the first instance of each date:
import datetime

twelve = ts.dt.time == datetime.time(0, 0, 0)
newdate = ts.dt.date.diff() > pd.Timedelta(0)
midnight = twelve & newdate
noon = twelve & ~newdate
Next, build an offset series, which should be easy to inspect for correctness:
import numpy as np

offset = pd.Series(np.nan, ts.index, dtype='timedelta64[ns]')
offset[midnight] = pd.Timedelta(0)
offset[noon] = pd.Timedelta(12, 'h')
offset.fillna(method='ffill', inplace=True)
And finally:
ts += offset
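Putting the pieces together, a self-contained sketch of the steps above. The sample values are made up, dt.normalize() is used here instead of dt.date so the diff stays in timedelta64, and ffill() replaces the fillna call; note that rows before the first 12:00:00 transition have no offset to forward-fill.
import datetime
import numpy as np
import pandas as pd

ts = pd.to_datetime(pd.Series(['01/01/2017 12:00:00',   # noon on 1 Jan
                               '01/01/2017 12:00:05',
                               '01/01/2017 11:59:55',
                               '02/01/2017 12:00:00',   # midnight, 2 Jan
                               '02/01/2017 12:00:05',
                               '02/01/2017 11:59:55']),
                    format='%d/%m/%Y %I:%M:%S')
ts[ts.dt.hour == 12] -= pd.Timedelta(12, 'h')             # fold 12:xx down to 00:xx

twelve = ts.dt.time == datetime.time(0, 0, 0)
newdate = ts.dt.normalize().diff() > pd.Timedelta(0)      # first row of a new calendar day
midnight = twelve & newdate
noon = twelve & ~newdate

offset = pd.Series(np.nan, ts.index, dtype='timedelta64[ns]')
offset[midnight] = pd.Timedelta(0)
offset[noon] = pd.Timedelta(12, 'h')
offset = offset.ffill()

print(ts + offset)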
