Convert csv time format - python-3.x

I am trying to convert the time format in a CSV file from "21-03-2019 00:10:00" to "2019-03-21 00:10:00". I have spent hours on this and it still doesn't work, so I hope someone can point out where I went wrong.
I am using Python 3.
import pandas as pd
import datetime
data = pd.read_csv('/Users/dongmintian994410/Downloads/Data/FM02.csv', header=0)
for i in range(0, len(data)):
    row = data.iloc[i]['Date Time']
Now I can print out each row's value from the 'Date Time' column, but I don't know how to continue.
I would like to convert the time format from "21-03-2019 00:10:00" to "2019-03-21 00:10:00".

from datetime import datetime
import pandas as pd
data = pd.read_csv('/Users/dongmintian994410/Downloads/Data_Capiatalwater/FM002 2.csv', header=0)
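for i in range(0, len(data)):
    row = data.iloc[i]['Date Time']
    datetime_str = row
    # The format string must match the separators actually used in the file
    datetime_object = datetime.strptime(datetime_str, '%d/%m/%Y %H:%M:%S')
    # Printing a datetime object uses the "YYYY-MM-DD HH:MM:SS" layout
    print(datetime_object)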
Finally, I figured this out.
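For reference, the same conversion can probably be done without a Python-level loop by parsing the whole column at once. The following is only a minimal sketch: the inline CSV text and the Value column are made up for illustration, and it assumes the dates use dashes as in "21-03-2019 00:10:00".

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file
csv_text = (
    "Date Time,Value\n"
    "21-03-2019 00:10:00,1.5\n"
    "22-03-2019 13:45:30,2.0\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# Parse the day-first strings into real datetimes (format assumed to use dashes)
parsed = pd.to_datetime(data["Date Time"], format="%d-%m-%Y %H:%M:%S")

# Write them back as "YYYY-MM-DD HH:MM:SS" strings
data["Date Time"] = parsed.dt.strftime("%Y-%m-%d %H:%M:%S")
print(data["Date Time"].tolist())
```

If the file uses slashes instead, only the format string passed to pd.to_datetime needs to change.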

Related

Pandas plotting graph with timestamp

pandas 0.23.4
python 3.5.3
I have some code that looks like below
import pandas as pd
from datetime import datetime
from matplotlib import pyplot
def dateparse():
    return datetime.strptime("2019-05-28T00:06:20,927", '%Y-%m-%dT%H:%M:%S,%f')
series = pd.read_csv('sample.csv', delimiter=";", parse_dates=True,
                     date_parser=dateparse, header=None)
series.plot()
pyplot.show()
The CSV file looks like below
2019-05-28T00:06:20,167;2070
2019-05-28T00:06:20,426;147
2019-05-28T00:06:20,927;453
2019-05-28T00:06:22,688;2464
2019-05-28T00:06:27,260;216
As you can see 2019-05-28T00:06:20,167 is the timestamp with milliseconds and 2070 is the value that I want plotted.
When I run this, the graph is displayed, but the x-axis shows plain numbers, which is a bit odd. I was expecting to see actual timestamps (as in MS Excel). Can someone tell me what I am doing wrong?
You did not set the datetime column as the index. Also, you don't need a date parser; just pass the columns you want to parse:
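import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO  # pd.compat.StringIO is not available in newer pandas versions

dfstr = '''2019-05-28T00:06:20,167;2070
2019-05-28T00:06:20,426;147
2019-05-28T00:06:20,927;453
2019-05-28T00:06:22,688;2464
2019-05-28T00:06:27,260;216'''
df = pd.read_csv(StringIO(dfstr), sep=';',
                 header=None, parse_dates=[0])
# Column 0 holds the parsed timestamps, column 1 the values
plt.plot(df[0], df[1])
plt.show()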
Alternatively, set the timestamp column as the index and plot the value column:
df.set_index(0)[1].plot()
This gives a slightly better plot.
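If automatic parsing does not recognise the comma before the milliseconds, the timestamps can be parsed with an explicit format instead. This is a rough sketch that reuses the format string from the question; the column names timestamp and value are assumptions introduced here for readability.

```python
from io import StringIO

import pandas as pd

sample = """2019-05-28T00:06:20,167;2070
2019-05-28T00:06:20,426;147
2019-05-28T00:06:20,927;453"""

# Read the raw columns first; the names are hypothetical
df = pd.read_csv(StringIO(sample), sep=";", header=None,
                 names=["timestamp", "value"])

# Parse with the explicit format from the question (comma before the fraction)
df["timestamp"] = pd.to_datetime(df["timestamp"],
                                 format="%Y-%m-%dT%H:%M:%S,%f")

# A datetime index is what lets the plot show timestamps on the x-axis
df = df.set_index("timestamp")
print(df.index.dtype)
```

Calling df["value"].plot() on the result should then label the x-axis with timestamps rather than row numbers.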

Keyerror in time/Date Components of datetime - what to do?

I am using a pandas DataFrame with datetime indexing. I know from the xarray documentation that datetime components can be accessed as ds['date.year'], where ds is the xarray DataArray, date is the date index, and year is the component of the dates. The xarray documentation points to datetime components, which in turn lead to DatetimeIndex in the pandas documentation. So I thought of doing the same with pandas, as I really like this feature.
However, it is not working for me. Here is what I did so far:
# Import required modules
import pandas as pd
import numpy as np
# Create DataFrame (name: df)
df=pd.DataFrame({'Date': ['2017-04-01','2017-04-01',
                          '2017-04-02','2017-04-02'],
                 'Time': ['06:00:00','18:00:00',
                          '06:00:00','18:00:00'],
                 'Active': [True,False,False,True],
                 'Value': np.random.rand(4)})
# Combine str() information of Date and Time and format to datetime
df['Date']=pd.to_datetime(df['Date'] + ' ' + df['Time'],format = '%Y-%m-%d %H:%M:%S')
# Make the combined data the index
df = df.set_index(df['Date'])
# Erase the rest, as it is not required anymore
df = df.drop(['Time','Date'], axis=1)
# Show me the first day (row selection by date string goes through .loc in newer pandas versions)
df.loc['2017-04-01']
Ok, so this shows me only the first entries. So far, so good.
However
df['Date.year']
results in KeyError: 'Date.year'
I would expect an output like
array([2017,2017,2017,2017])
What am I doing wrong?
EDIT:
I have a workaround that lets me carry on, but I am still not satisfied, as it doesn't answer my question. Instead of a pandas DataFrame I used an xarray Dataset, and now this works:
# Load modules
import pandas as pd
import numpy as np
import xarray as xr
# Prepare time array
Date = ['2017-04-01','2017-04-01', '2017-04-02','2017-04-02']
Time = ['06:00:00','18:00:00', '06:00:00','18:00:00']
time = [Date[i] + ' ' + Time[i] for i in range(len(Date))]
time = pd.to_datetime(time,format = '%Y-%m-%d %H:%M:%S')
# Create Dataset (name: ds)
ds=xr.Dataset({'time': time,
               'Active': [True,False,False,True],
               'Value': np.random.rand(4)})
ds['time.year']
which gives:
<xarray.DataArray 'year' (time: 4)>
array([2017, 2017, 2017, 2017])
Coordinates:
* time (time) datetime64[ns] 2017-04-01T06:00:00 ... 2017-04-02T18:00:00
In terms of what you're doing wrong, you are
a) trying to access an index as if it were a column (a Series), and
b) chaining commands inside a string: df['Date'] is a single column, whereas df['Date.year'] looks for a column literally called 'Date.year'.
If your datetime is the index, use .year on the index; if it is a Series, use .dt.year.
df.index.year
#or assuming your dtype is a proper datetime (your code indicates it is)
df.Date.dt.year
hope that helps bud.
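To make the two access paths concrete, here is a small self-contained sketch. It uses the dates from the question, but the Timestamp column name and the fixed Value numbers are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2017-04-01", "2017-04-01", "2017-04-02", "2017-04-02"],
    "Time": ["06:00:00", "18:00:00", "06:00:00", "18:00:00"],
    "Value": [0.1, 0.2, 0.3, 0.4],
})

# Combine the two string columns into one datetime column (hypothetical name)
df["Timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                 format="%Y-%m-%d %H:%M:%S")

# While the datetimes are a column (Series), use the .dt accessor
years_from_column = df["Timestamp"].dt.year

# Once they are the index, the components are attributes of the index itself
indexed = df.set_index("Timestamp")
years_from_index = indexed.index.year

# Partial-string row selection on the datetime index
first_day = indexed.loc["2017-04-01"]
print(list(years_from_index))
```

Both years_from_column and years_from_index hold the year of each row; only the access path differs.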

Formatting duration as "hh:mm:ss" and write to pandas dataframe and to save it as CSV file

I imported data from a CSV file into a pandas DataFrame.
Then I created a duration column by taking the difference of two datetime columns, as follows:
df['Drive Time'] = df['Delivered Time'] - df['Pickup Time']
Now I want to save it back to a CSV file, but I want the 'Drive Time' column to be displayed in "hh:mm:ss" format when I open the file in Excel. The code I used is below:
import pandas as pd
import numpy as np
df = pd.read_csv("1554897620.csv", parse_dates = ['Pickup Time', 'Delivered Time'])
df['Drive Time'] = df['Delivered Time'] - df['Pickup Time']
df.to_csv(index=False)
df.to_csv('test.csv', index=False)
In conclusion, I want to save the Drive Time column in the format "hh:mm:ss" when exporting to CSV.
If you know that the Drive Time is always shorter than 24 hours, you can use this trick:
import pandas as pd
import numpy as np
df = pd.DataFrame(columns=['Delivered Time', 'Pickup Time'])
df['Delivered Time'] = pd.date_range('2019-01-01 13:00', '2019-01-02 13:00', periods=12)
df['Pickup Time'] = pd.date_range('2019-01-01 12:00', '2019-01-02 12:00', periods=12)
df['Drive Time'] = df['Delivered Time'] - df['Pickup Time']
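# Trick: add the timedelta to the epoch (1970-01-01) to get a datetime column, which supports strftime.
# (Calling pd.to_datetime directly on a timedelta column may raise a TypeError in newer pandas versions.)
df['Drive Time'] = (pd.Timestamp(0) + df['Drive Time']).dt.strftime("%H:%M:%S")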
df.to_csv('test.csv')
By transforming the timedelta to a datetime data type, you can use its strftime method.
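If a duration could reach 24 hours or more, the strftime trick would wrap around to "00:...". One possible alternative is to build the string from the total number of seconds. This is only a sketch: format_timedelta_hms is a hypothetical helper, and it assumes non-negative durations with no missing values.

```python
import pandas as pd


def format_timedelta_hms(durations: pd.Series) -> pd.Series:
    """Format a timedelta Series as hh:mm:ss strings (hours may exceed 23).

    Assumes non-negative durations and no missing values.
    """
    total = durations.dt.total_seconds().astype("int64")
    hours = (total // 3600).astype(str).str.zfill(2)
    minutes = ((total % 3600) // 60).astype(str).str.zfill(2)
    seconds = (total % 60).astype(str).str.zfill(2)
    return hours + ":" + minutes + ":" + seconds


drive_time = pd.Series(pd.to_timedelta(["01:00:00", "1 days 01:02:03"]))
formatted = format_timedelta_hms(drive_time)
print(formatted.tolist())
```

The resulting column contains plain strings, so it is written to the CSV file exactly as formatted.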

Matplotlib plot displays date which is not in the pandas dataframe

I have a simple data-frame which has a list of 10 days. Out of those 10 days 1 is a holiday. I have assigned a value (1) to each date in the data frame. I am trying to plot the data. The 5th of January is supposed to be a holiday, but it gets printed on the plot even though I have excluded it during the creation of the dictionary using which the data frame was created. My code is as follows.
from datetime import date, timedelta
import matplotlib.pyplot as plt
from pandas import DataFrame, to_datetime
START_DATE = date(2019,1,1)
DATE_LIST = []
for i in range(10):
    DATE = START_DATE + timedelta(i)
    if DATE != date(2019,1,5):
        DATE_LIST.append(DATE)
DATE_DICTIONARY = {}.fromkeys(DATE_LIST, 1)
DATAFRAME = DataFrame({"Value":DATE_DICTIONARY})
DATAFRAME.reindex(to_datetime(DATAFRAME.index)).plot(legend=False)
plt.show()
When the condition
if DATE != date(2019,1,5):
is removed, so that the 5th of January is also added to DATE_LIST, the output looks like this:
[Figure: plot with January 5th included]
I want exactly the same x-axis format, but without the 5th of January. However, when I run my code, this is what I get:
[Figure: plot with January 5th excluded]
On a continuous date axis, it's not surprising that the 5 is still present between 4 and 6, regardless of whether that date actually appears in your data.
If you want to treat the dates as categories, you can convert them to strings and plot them with matplotlib.
from datetime import date, timedelta
import matplotlib.pyplot as plt
from pandas import DataFrame, to_datetime
START_DATE = date(2019,1,1)
DATE_LIST = []
for i in range(10):
    DATE = START_DATE + timedelta(i)
    if DATE != date(2019,1,5):
        DATE_LIST.append(DATE)
DATE_DICTIONARY = {}.fromkeys(DATE_LIST, 1)
df = DataFrame({"Value":DATE_DICTIONARY})
df = df.reindex(to_datetime(df.index))
plt.plot(df.index.astype(str), df.values)
plt.setp(plt.gca().get_xticklabels(), rotation=45, ha="right")
plt.tight_layout()
plt.show()

Reindexing A Pandas Data Frame gives only Nans

I wanted to follow the tutorial Python for Finance, Part I: Yahoo & Google Finance API, pandas, and matplotlib. As the Google API is not working properly, I used IEX data. When I try to reindex the pandas DataFrame to have consecutive dates, all values that were available before are replaced with NaN. I think the problem has something to do with the two indexes having different types. I am quite new to programming and have no clue how to solve this problem after reading the pandas documentation.
The code looks like this.
Modules used
matplotlib==2.2.3
pandas_datareader==0.6.0
googlefinance.client==1.3.0
pandas==0.23.4
googlefinance==0.7
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
import pandas_datareader as pdr
import matplotlib.pyplot as plt
from datetime import datetime
# Define the instruments to download. We would like to see Apple, Microsoft and the S&P500 index.
#tickers = ["AAPL","MSFT","SPY"]
# We would like all available data from 01/01/2017 until 9/1/2018.
start_date = datetime(2017,1,1)
end_date = datetime(2018,9,1)
# Use pandas_datareader's DataReader to load the desired data.
aapl_data = pdr.DataReader("SPY",'iex',start_date,end_date, pause = 5)
#ms_data = pdr.DataReader('SPY','iex',start_date,end_date,pause = 5)
print(aapl_data.tail(9))
#print(ms_data.tail(9))
print(aapl_data.index.dtype)
close = aapl_data
# Getting all weekdays between start_date and end_date
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')
print(all_weekdays.dtype)
# How do we align the existing prices in adj_close with our new set of dates?
# All we need to do is reindex close using all_weekdays as the new index
close = close.reindex(all_weekdays)
# Reindexing will insert missing values (NaN) for the dates that were not present
print(close)
Both indexes need to have datetime dtype.
Convert the index of the downloaded data with pd.to_datetime:
aapl_data = pdr.DataReader("SPY",'iex',start_date,end_date, pause = 5)
aapl_data.index = pd.to_datetime(aapl_data.index) # converts index to datetime dtype
close = aapl_data
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')
close = close.reindex(all_weekdays)
print(aapl_data.index.dtype)
# Output:
datetime64[ns]
print(all_weekdays.dtype)
# Output:
datetime64[ns]
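The effect can be reproduced without downloading anything. The sketch below uses a tiny made-up price table whose index holds date strings, which is assumed to mirror what the IEX reader returned; the close values are invented.

```python
import pandas as pd

# Hypothetical prices indexed by date *strings* (object dtype, not datetime)
prices = pd.DataFrame(
    {"close": [100.0, 101.5, 102.25]},
    index=["2018-08-29", "2018-08-30", "2018-08-31"],
)
all_weekdays = pd.date_range(start="2018-08-27", end="2018-08-31", freq="B")

# Strings never match Timestamp labels, so every row comes back as NaN
misaligned = prices.reindex(all_weekdays)

# After converting the index, labels match and only the missing days are NaN
prices.index = pd.to_datetime(prices.index)
aligned = prices.reindex(all_weekdays)
print(aligned)
```

Here misaligned contains only NaN, while aligned keeps the three known prices and has NaN for the two weekdays that were not in the data.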
