I have a file with about 200 lines of dates, each in YYYYMMDD format. How can I separate out each month's data so I can get the average for each month?
This is the best I've been able to figure out:
Dates = line.split()
Year = Dates[0][0:4]
Month = Dates[0][4:6]
Date = Dates[0][6:8]
Assuming your file looks similar to this:
20131001 20131005 20130101 20130202
20130109 20130702 20130503 20130701
20130712 20130401 20131101 20131123
Here is what I would do to get a list of all the months in the file:
with open('dates.txt') as f:
    lines = f.readlines()
months = [date[4:6] for line in lines for date in line.split()]
print(months)
To deal with the dates as actual datetime objects, use the datetime.strptime method to convert the date strings to datetime objects.
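A minimal sketch of the strptime approach, grouping the sample dates above by (year, month). The file as shown contains only dates, so `monthly_averages` assumes a hypothetical layout where each date comes paired with a numeric value; both function names are illustrative, not from the original.

```python
from collections import defaultdict
from datetime import datetime

def group_by_month(tokens):
    """Map (year, month) -> list of datetime objects parsed from YYYYMMDD strings."""
    groups = defaultdict(list)
    for token in tokens:
        d = datetime.strptime(token, "%Y%m%d")  # e.g. "20131001" -> 2013-10-01
        groups[(d.year, d.month)].append(d)
    return dict(groups)

def monthly_averages(pairs):
    """Average a numeric value per (year, month).

    `pairs` is a hypothetical layout: (YYYYMMDD string, number) tuples,
    e.g. if each line of the file also carried a measurement.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for token, value in pairs:
        d = datetime.strptime(token, "%Y%m%d")
        key = (d.year, d.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

sample = """20131001 20131005 20130101 20130202
20130109 20130702 20130503 20130701
20130712 20130401 20131101 20131123"""

by_month = group_by_month(sample.split())
for (year, month), dates in sorted(by_month.items()):
    print(year, month, len(dates))
```

With a real file, `group_by_month(open('dates.txt').read().split())` would do the same grouping.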
Related
I have a 'some.xlsb' file with about 10 columns, 2 of which are DateTime columns.
When I load it with pandas, the DateTime columns are parsed into a different form.
For example:
the DateTime value 4/10/2021 11:50:24 AM is read as 44296.5.
Below is the code I tried.
import pandas as pd

goods_df = pd.read_excel('some.xlsb',
                         engine='pyxlsb', sheet_name='goods_df')
goods_df_header = goods_df.iloc[1]
goods_df.columns = goods_df_header  # set the header row as the df header
goods_df = goods_df[2:]
goods_df.head(2)
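When you read an .xlsb file with pandas, you get Excel's raw date float instead of a datetime, because Excel stores datetimes as floating-point serial numbers and the pyxlsb engine does not convert them back.
According to Microsoft, 44296.5 is a serial date: the integer part counts days in Excel's 1900 date system (serial 1 is 1 January 1900) and the fractional part is the time of day, so .5 means noon.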
You need to convert this into a Unix epoch value (the number of seconds since 1 January 1970 00:00:00 UTC) and then into a date. Serial 25569 is 1 January 1970, so:
a = datetime.datetime.fromtimestamp((<datevalue from excel> - 25569) * 86400, tz=datetime.timezone.utc).strftime("%m/%d/%Y")
Alternatively, save the .xlsb as .xlsx and read that; the DateTime columns then come back as datetime objects.
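A sketch of the same conversion without going through Unix time: add the serial as a number of days to Excel's effective epoch, 30 December 1899. That epoch holds for serials of 61 and above (1 March 1900 onward), because Excel counts the nonexistent 29 February 1900 as serial 60. The function name is illustrative.

```python
from datetime import datetime, timedelta

# Effective epoch of Excel's 1900 date system for serials >= 61
# (Excel treats 1900 as a leap year, so serial 60 is a fake 29 Feb 1900).
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial):
    """Convert an Excel serial date-time (days + fraction of a day) to a datetime."""
    result = EXCEL_EPOCH + timedelta(days=serial)
    # Round to the nearest second to hide float noise such as 11:50:23.999999
    return (result + timedelta(microseconds=500_000)).replace(microsecond=0)

print(excel_serial_to_datetime(44296.5))  # 2021-04-10 12:00:00
```

In pandas, `pd.to_datetime(goods_df[col].astype(float), unit="D", origin="1899-12-30")` applies the same rule to a whole column, where `col` stands for whichever DateTime column name applies.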
How can I separate the date and time from a datetime column whose values are a date and a time joined by a "T"?
I am trying INT(datetime column) to get the date, and datetime column - INT(datetime column) to get the time.
Your formula cannot work because your data is a text string (note that it contains a letter, the "T"), not a number.
So first convert the string into a "real" time with:
=SUBSTITUTE(A2,"T"," ")
You can then use:
Date: =INT(SUBSTITUTE(A2,"T"," "))
Time: =MOD(SUBSTITUTE(A2,"T"," "),1)
and be sure to format the results as desired.
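The INT/MOD split works because, once SUBSTITUTE's text is coerced to a serial number, the whole days and the fraction of a day are separate parts of one number. A small Python sketch of that arithmetic (function names are illustrative; Excel itself is not involved):

```python
import math
from datetime import timedelta

def split_excel_serial(serial):
    """Mirror Excel's INT() and MOD(x, 1) on a serial date-time."""
    day_part = math.floor(serial)   # like INT(): whole days, i.e. the date
    fraction = serial - day_part    # like MOD(serial, 1): fraction of a day, i.e. the time
    return day_part, fraction

def fraction_to_hms(fraction):
    """Turn a fraction of a day into an H:MM:SS string."""
    return str(timedelta(seconds=round(fraction * 86400)))

day, frac = split_excel_serial(44296.5)
print(day, fraction_to_hms(frac))  # 44296 12:00:00
```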
If your column holds true date values, use this for the date:
=TEXT(A1,"yyyy-mm-dd")
For the time:
=TEXT(A1,"hh:mm:ss")
If the data is a text string (for example, the output of a TEXT() function), try these formulas instead:
for date =TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"T","</s><s>")&"</s></t>","//s[1]"),"yyyy-mm-dd")
for time =TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"T","</s><s>")&"</s></t>","//s[last()]"),"hh:mm:ss")
For date
=LEFT(A2,FIND("T",A2)-1)
For time
=RIGHT(A2,LEN(A2)-FIND("T",A2))
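The LEFT/RIGHT/FIND formulas split the text at the "T". A rough Python equivalent, assuming hypothetical values such as "2021-04-10T11:50:24" (the original format was only shown in a screenshot):

```python
from datetime import datetime

def split_on_t(value):
    """Like LEFT(A2,FIND("T",A2)-1) and RIGHT(A2,LEN(A2)-FIND("T",A2))."""
    date_part, _sep, time_part = value.partition("T")
    return date_part, time_part

def parse_iso(value):
    """Parse the whole value as a datetime instead of splitting text."""
    return datetime.fromisoformat(value)

print(split_on_t("2021-04-10T11:50:24"))  # ('2021-04-10', '11:50:24')
```

One difference: when there is no "T", FIND returns #VALUE!, while `partition` returns the whole string as the date part and an empty time part.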
I am looking to read data from a column in my CSV file.
All of the data in this column are dates (DD/MM/YYYY).
I want my program to read the Dates column, and if a date is within 3 days of the current date, I want to assign variables to all of the values in that row.
Ex.
Date,Name,LaterDate
1/1/19,John Smith, 2/21/19
If I run my program on 2/19/2019, I want an email sent that says "John Smith's case is closing on 2/21/2019."
I understand how to send an email. The part that I get stuck on is:
Reading the CSV column specifically.
If the date is within 3 days,
Assign variables to the values in the ROW,
Use those variables to send a custom email.
I see a lot of "Use Pandas" but I might need the individual steps broken down.
Thank you.
First things first, you need to read all the values of the CSV file and store them in a variable (old_df). Then save all the dates in a Series (dates). Next, create an empty DataFrame with the same columns (new_df). From there, loop over each date in dates along with its index i. Turn date into a datetime object with the datetime library, then compute the number of days between the current date and date. Take the absolute value of days so you always get a positive number, and if it is 3 or less, add the row at index i of old_df to new_df.
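import pandas as pd
from datetime import datetime

# skipinitialspace strips the blank after each comma, e.g. " 2/21/19"
old_df = pd.read_csv('example.csv', skipinitialspace=True)
dates = old_df['LaterDate']
new_df = pd.DataFrame(columns=['Date', 'Name', 'LaterDate'])
for i, date in enumerate(dates):
    # parse M/D/YY strings such as "2/21/19"
    date = datetime.strptime(date, '%m/%d/%y')
    days = (datetime.now() - date).days
    if abs(days) <= 3:
        # DataFrame.append was removed in pandas 2.0; add the row by label instead
        new_df.loc[len(new_df)] = old_df.loc[i, :]
print(new_df)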
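Since the question asks for the individual steps, here is a sketch of the same filter using only the standard library `csv` module, ending with the custom message text (sending the email is left out). The function name and the fixed `today` argument are illustrative assumptions, and the M/D/YY format follows the example row rather than the DD/MM/YYYY mentioned in the question.

```python
import csv
import io
from datetime import date, datetime

def closing_soon_messages(csv_text, today, window_days=3):
    """Return one message per row whose LaterDate is within window_days of today."""
    messages = []
    # skipinitialspace drops the blank after each comma, e.g. " 2/21/19"
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        # Step 1: read the date column for this row (M/D/YY, as in the example)
        later = datetime.strptime(row["LaterDate"], "%m/%d/%y").date()
        # Step 2: keep only rows within the window (before or after today)
        if abs((later - today).days) <= window_days:
            # Step 3: assign the row's values to variables
            name = row["Name"]
            closing = f"{later.month}/{later.day}/{later.year}"
            # Step 4: build the custom email text
            messages.append(f"{name}'s case is closing on {closing}.")
    return messages

sample = "Date,Name,LaterDate\n1/1/19,John Smith, 2/21/19\n"
print(closing_soon_messages(sample, date(2019, 2, 19)))
```

For a real file, read it first, e.g. `with open('example.csv', newline='') as f: msgs = closing_soon_messages(f.read(), date.today())`.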
I have a mixed format column in a dataframe from a pd.read_csv(). There is a lot of information out there about datetime handling but I didn't find anything for this specific problem:
Two datetime types:
Custom dd/mm/yyyy hh:mm that shows up in excel as such: 10/03/2018 07:18
General that shows up in Excel as such: 8/13/2018 2:28:34 PM
I used :
df.Last_Updated = pd.to_datetime(df['Last_Updated'])
df = df.sort_values('Last_Updated').drop_duplicates(['Name'], keep='last')
But I get a mixed result where the custom format comes back as another datetime type:
yyyy-mm-dd hh:mm:ss and shows up in my Excel export as 2017-11-22 19:54:35
On checking, it changes the dd/mm/yyyy hh:mm format (02/09/2018 17:55:44) to yyyy-mm-dd hh:mm:ss (2018-02-09 17:55:44). Since I have to exclude entries that are older than a cutoff, this causes errors: in this particular case, a computer whose last connection was in September is reported as having it in February.
Does anyone know a way to unify the datetime format?
Date format:
from notepad:
X = "10/2/2018 10:07:31 PM"
Y = "8/13/2018 2:28:34 PM"
from CSV (and by opening the .txt via Excel):
X = 10/02/2018 22:07 PM
Y = 8/13/2018 2:28:34 PM
after datetime applied in code:
X = 02/10/2018 22:07:31
Y = 13/08/2018 14:28:34
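One way to unify the column is to parse it with explicit formats instead of letting the parser guess day versus month. The sketch below assumes the raw CSV text looks like the Notepad values above (month first, 12-hour clock with AM/PM) and adds the Excel-style dd/mm/yyyy hh:mm display as a fallback; both format strings and the function name are assumptions drawn from the samples shown.

```python
from datetime import datetime

# Assumed formats, based on the sample values above:
#   raw CSV text (as seen in Notepad): month/day/year, 12-hour clock with AM/PM
#   Excel-style custom display:         day/month/year, 24-hour clock, no seconds
FORMATS = ("%m/%d/%Y %I:%M:%S %p", "%d/%m/%Y %H:%M")

def parse_last_updated(text):
    """Parse a Last_Updated value with explicit formats, trying each in order."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")

print(parse_last_updated("10/2/2018 10:07:31 PM"))  # 2018-10-02 22:07:31
```

Applied to the DataFrame, this would be `df['Last_Updated'] = df['Last_Updated'].map(parse_last_updated)`. If every raw value has the Notepad layout, `pd.to_datetime(df['Last_Updated'], format='%m/%d/%Y %I:%M:%S %p')` does the same in one call.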
I have a series of dates and some corresponding values. The format of the data in Excel is "Custom" dd/mm/yyyy hh:mm.
When I try to convert this column into an array in Matlab, in order to use it as the x axis of a plot, I use:
a = datestr(xlsread('filename.xlsx',1,'A:A'), 'dd/mm/yyyy HH:MM');
But I get "Empty string: 0-by-16".
Therefore I am not able to convert it into a date array using the function datenum.
Where am I making a mistake? Edit: switching from hh:mm to HH:MM doesn't work either. When I try only
a = xlsread('filename.xlsx',1,'A2')
I get: a = []
According to the documentation of datestr the syntax for minutes, months and hours is as follows:
HH -> Hour in two digits
MM -> Minute in two digits
mm -> Month in two digits
Therefore you have to change the format string in the call to datestr. Also, because Excel and MATLAB count serial date numbers from different origins, you have to add an offset of 693960 to the numbers returned by xlsread.
dateval = xlsread('test.xls',1,'A:A') + 693960;
datestring = datestr(dateval, 'dd/mm/yyyy HH:MM');
This will read the first column (A) of the first sheet (1) in the Excel-file. For better performance you can specify the range explicitly (for example 'A1:A20').
The code converts...
... to:
datestring =
22/06/2015 16:00
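As a cross-check of the 693960 offset outside MATLAB, a Python sketch. It relies on MATLAB's datenum 1 being 1 January of year 0 while Python's date ordinal 1 is 1 January of year 1, so the two differ by 366 days. The serial 42177 is an illustrative value, not the one from the original file.

```python
from datetime import date

EXCEL_TO_MATLAB = 693960   # offset used in the answer above
MATLAB_TO_PYTHON = 366     # datenum counts from year 0; Python ordinal 1 is 0001-01-01

def excel_serial_to_datenum(serial):
    """Shift an Excel serial date to a MATLAB datenum."""
    return serial + EXCEL_TO_MATLAB

def datenum_to_date(datenum):
    """Convert the whole-day part of a MATLAB datenum to a Python date."""
    return date.fromordinal(int(datenum) - MATLAB_TO_PYTHON)

print(datenum_to_date(excel_serial_to_datenum(42177)))  # 2015-06-22
```

Serial 0 maps to 1899-12-30, the effective epoch of Excel's 1900 date system, which is where the offset comes from.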
Edit: The following code should work for your provided Excel-file:
% read from file
tbl = readtable('data.xls','ReadVariableNames',false);
dateval = tbl.(1);
dateval = dateval + 693960;
datestring = datestr(dateval)
% plot with dateticks as x-axis
plot(dateval,tbl.(2))
datetick('x','mmm/yy')
%datetick('x','dd/mmm/yy') % this is maybe better than only the months
Minutes need to be called with a capital M to distinguish them from months.
Use a=datestr(xlsread('filename.xlsx',1,'A:A'),'dd/mm/yyyy HH:MM')
Edit: Corrected my original answer, where I had mixed up the cases needed.
I tried with this. It works but it is slow and I am not able to plot the dates at the end. Anyway:
tbl = readtable('filename.xlsx');   % named tbl to avoid shadowing the built-in table function
dates = tbl(:,1);
dates = table2array(dates);
dates = datenum(dates);
dates = datestr(dates);