I have a pandas dataframe covering about 3 years at a resolution of 6 seconds, and I want to group the data into 24-hour bins and plot each day using matplotlib in a loop.
This is my dataframe's head:
timestamp consumption
0 2012-11-11 12:00:03 468
1 2012-11-11 12:00:09 476
2 2012-11-11 12:00:16 463
3 2012-11-11 12:00:22 449
4 2012-11-11 12:00:28 449
It contains the power consumption of a house from 2012 to 2015. After pre-processing, the dataframe starts at about 12 pm on the first day. I need to plot the whole dataframe in 24-hour intervals, and each plot must represent a single day that starts at about 12 pm and ends at about 12 pm of the next day.
So, I need about 1500 plots that show the power consumption of each day starting from 12 pm, for about 1500 days of my dataframe.
Thanks in advance.
Update: The reason I want to plot 1500 days separately is that I want to check each night's power consumption and label the occupant's sleep pattern. I take each day from 12 pm to 12 pm so that a complete sleep cycle fits in one plot. After preparing the labels, I'll be able to use them as training and test data for classification.
Consider this not only an answer but also a suggestion. First, convert the column 'timestamp' into the index (DatetimeIndex)
df.set_index(df['timestamp'], inplace=True, drop=True)
Then, get all the unique days that happen in your DataFrame
unique_days = sorted(set(df.index.to_period('D').strftime('%Y-%m-%d')))
We then squeeze the DataFrame into a Series
del df['timestamp']
df = df.squeeze()
Now, just plot unique days in your series in separate subplots.
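import matplotlib.pyplot as plt
# Sort the days: a bare set() has no guaranteed order
unique_days = sorted(set(df.index.to_period('D').strftime('%Y-%m-%d')))
# Set the figure size once here; squeeze=False keeps axes 2-D even for a single day
fig, axes = plt.subplots(nrows=len(unique_days), ncols=1, figsize=(50, 10), squeeze=False)
for row, day in enumerate(unique_days):
    # Partial-string indexing selects every reading of that calendar day
    df.loc[day].plot(ax=axes[row, 0])
plt.show()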
Now, it's time for you to play around with the parameters of plots so that you can customize them to your needs.
This is kind of a strange request. If we knew what your end objective is, it might be easier to understand, but I'm going to assume you want to plot and then save figures for each of the days.
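import pandas as pd
import matplotlib.pyplot as plt
# Shift by 12 hours so each noon-to-noon window falls on one calendar date
# (the date on which the window ends)
df['day'] = (df['timestamp'] + pd.Timedelta('12h')).dt.date
for day in df['day'].unique():
    mask = (df['day'] == day)
    #<the code for the plot that you want>
    # plt.plot takes x and y positionally, not as x=/y= keywords; the full
    # timestamp is used because time-of-day values would wrap at midnight
    plt.plot(df.loc[mask, 'timestamp'], df.loc[mask, 'consumption'])
    plt.savefig('filename'+str(day)+'.png')
    plt.close()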
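As an alternative way to form the noon-to-noon windows, here is a minimal sketch that bins with pd.Grouper and a 12-hour offset. The synthetic data below is made up to stand in for the real dataframe, and the `windows` dictionary is only an illustrative name; the column names follow the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (assumption): 3 days of 6-second
# readings that start at noon, like the pre-processed dataframe
ts = pd.date_range("2012-11-11 12:00:00", periods=3 * 14400, freq="6s")
rng = np.random.default_rng(0)
df = pd.DataFrame({"timestamp": ts, "consumption": rng.integers(400, 500, len(ts))})

# 24-hour bins shifted by 12 hours, so every bin runs from noon to noon
groups = df.groupby(pd.Grouper(key="timestamp", freq="24h", offset="12h"))

# Map each window's start time to its rows, skipping windows with no data
windows = {start: chunk for start, chunk in groups if not chunk.empty}

for start, chunk in windows.items():
    print(start, len(chunk))
```

Each `chunk` can then be plotted and saved inside the loop in the same way as in the answer above, using `start` to build the file name.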
I have data on 5 runners for 1 week: the start time and end time in a race. I want to graph how long it took each runner to finish the race.
When I subtract the cells for start and end time, I get a result such as 3:00 AM instead of 3 hours. I think this is why I'm having an issue creating a stacked column chart.
How can I subtract 2 timestamps and get just the hours instead of another timestamp?
I want the date on the x axis and the time to complete the race on the y axis.
Thank you.
Here is a sample of the data:
https://imgur.com/xRyDynl
I want a stacked column chart that has dates on the x axis, so 1/1 will have 1 bar that has 30 mins for racer 1, 35 mins for racer 2, and 40 mins for racer 3.
I would like to create a series of simulated values by resampling from empirical observations. The data I have are time series at 1-minute frequency. The simulations should cover an arbitrary number of days with the same times each day. The twist is that I need to sample conditional on the time, i.e. when sampling for a time of 8:00, it should be more probable to sample a value around 8:00 (but not limited to 8:00) from the original series.
I have made a small sketch to show how the draw distribution changes depending on which time a value is simulated for:
I.e. for T=0 it is more probable to draw a value from the actual distribution where the time of day is close to 0 and not probable to draw a value from the original distribution at the time of day of T=n/2 or later, where n is the number of unique timestamps in a day.
Here is a code snippet to generate sample data (I am aware that there is no need to sample conditional on this test data, but it is just to show the structure of the data)
import numpy as np
import pandas as pd
# Create a test data frame (only for illustration)
df = pd.DataFrame(index=pd.date_range(start='2020-01-01', end='2020-12-31', freq='min'))
df['MyValue'] = np.random.normal(0, scale=1, size=len(df))
print(df)
MyValue
2020-01-01 00:00:00 0.635688
2020-01-01 00:01:00 0.246370
2020-01-01 00:02:00 1.424229
2020-01-01 00:03:00 0.173026
2020-01-01 00:04:00 -1.122581
...
2020-12-30 23:56:00 -0.331882
2020-12-30 23:57:00 -2.463465
2020-12-30 23:58:00 -0.039647
2020-12-30 23:59:00 0.906604
2020-12-31 00:00:00 -0.912604
[525601 rows x 1 columns]
# Objective: Create a new time series, where each time the values are
# drawn conditional on the time of the day
I have not been able to find an answer on here that fits my requirements. Any help is appreciated.
I consider this sentence:
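need to sample conditional on the time, i.e. when sampling for a time of 8:00, it should be more probable to sample a value around 8:00 (but not limited to 8:00) from the original series.
Then, assuming the standard deviation is one sixth of the day (given your drawing), you can draw the time of day to sample from as shown below. Here current_time_sample is the position of the target time within the day and total_samples is the number of timestamps per day, so the draw is a time-of-day position to pull an observation from, not the simulated value itself.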
value = np.random.normal(loc=current_time_sample, scale=total_samples/6)
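To make the idea concrete, here is a minimal sketch of one simulated day. Several things in it are assumptions rather than part of the question or the answer: the one-week test series, the helper name `simulate_day`, wrapping draws around midnight with a modulo, and picking the source day uniformly at random.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Illustrative data (assumption): one week of 1-minute observations
idx = pd.date_range("2020-01-01", periods=7 * 1440, freq="min")
obs = pd.Series(rng.normal(size=len(idx)), index=idx)

n = 1440  # number of unique timestamps in a day
# Rows are observed days, columns are minutes of the day
by_minute = obs.to_numpy().reshape(-1, n)


def simulate_day(by_minute, rng, sd=n / 6):
    """Draw one simulated day, one value per minute of the day."""
    n_days, n_slots = by_minute.shape
    target = np.arange(n_slots)
    # Source minute ~ Normal(target minute, sd), rounded to an integer and
    # wrapped around midnight (the wrap-around is an assumption)
    source = np.rint(rng.normal(loc=target, scale=sd)).astype(int) % n_slots
    # Pick a random observed day for every draw
    day = rng.integers(0, n_days, size=n_slots)
    return by_minute[day, source]


sim = simulate_day(by_minute, rng)
print(sim[:5])
```

Calling `simulate_day` repeatedly gives as many simulated days as needed. If draws should not cross midnight, the modulo could be swapped for clipping or a truncated distribution.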
I have a database that contains all flight data for 2019. I want to plot a time series where the y-axis is the number of flights that are delayed ('DEP_DELAY_NEW') and the x-axis is the day of the week.
The day of the week column is an integer, i.e. 1 is Monday, 2 is Tuesday etc.
# only select delayed flights
delayed_flights = df_airports_clean[df_airports_clean['ARR_DELAY_NEW'] >0]
delayed_flights['DAY_OF_WEEK'].value_counts()
1 44787
7 40678
2 33145
5 29629
4 27991
3 26499
6 24847
Name: DAY_OF_WEEK, dtype: int64
How do I convert the above into a time series? Additionally, how do I change the integer for the day of the week into a string (i.e. 'Monday' instead of '1')? I couldn't find the answer to those questions in this forum. Thank you.
Let's break down the problem into two parts.
Converting the num_delayed columns into a time series
I am not sure what you mean by a time series here, but the code below should work well for plotting.
delayed_flights = df_airports_clean[df_airports_clean['ARR_DELAY_NEW'] > 0]
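# value_counts() sorts by count, so sort by day number to get weekday order
delayed_series = delayed_flights['DAY_OF_WEEK'].value_counts().sort_index()
# Rename the Series before converting; passing columns= to pd.DataFrame would select a non-existent column
delayed_df = delayed_series.rename('NUM_DELAYS').to_frame()
delayed_array = delayed_df['NUM_DELAYS'].values
delayed_array contains the delayed flight counts ordered by day of the week (Monday first).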
Converting the day in int into a weekday
You can easily do this by using the calendar module.
>>> import calendar
>>> calendar.day_name[0]
'Monday'
Note that calendar.day_name[0] is always Monday; setfirstweekday only changes how the calendar module lays out weeks, not this indexing.
In your case, your day integers are 1-indexed and hence you would need to subtract 1 to make it 0-indexed. Another easy workaround would be to define a dictionary with keys as day_int and values as weekday.
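A minimal sketch of that conversion, using the counts printed in the question. The variable names `counts` and `by_day` are only illustrative.

```python
import calendar

import pandas as pd

# Counts from the question, keyed by 1-indexed day of week (1 = Monday)
counts = pd.Series(
    {1: 44787, 7: 40678, 2: 33145, 5: 29629, 4: 27991, 3: 26499, 6: 24847}
)

# Put the days in weekday order, then replace each integer with its name;
# subtract 1 because calendar.day_name is 0-indexed
by_day = counts.sort_index()
by_day.index = [calendar.day_name[d - 1] for d in by_day.index]

print(by_day)
```

With matplotlib installed, `by_day.plot(kind='bar')` would then draw the counts with the weekday names on the x-axis.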
I have written code to customize my x ticks; a snippet is below:
arr_label = ['sum_msg_len','log_count','info_hit','debug_hit','error_hit']
for label in arr_label:
    fig = plt.figure(figsize=(15,6))
    axes = fig.add_axes([1,1,1,1])
    axes.xaxis.set_major_locator(plt.LinearLocator(30))
    axes.tick_params(axis='x', labelsize=6)
    axes.plot(df.index, df[label], 'g', label=label)
    axes.legend()
    fig.autofmt_xdate()
    fig.savefig('images_indv/'+app_index+"_"+label+".png", bbox_inches='tight')
    #fig.close()
    fig.clf()
My requirement is this: I have timestamps spaced one minute apart, and I want to plot timestamp vs. each of 'sum_msg_len', 'log_count', 'info_hit', 'debug_hit' and 'error_hit', one by one.
The problem is the x ticks: I want a specified number of ticks to appear within the range of the data I am plotting.
Earlier, when I was not specifying any locator, all the timestamps overlapped and could not be read properly. But when I use a locator, it labels the x-axis without any relation to the plotted values.
For example, if I use LinearLocator(30), the labels show only the first 00 to 29 minutes, and if I use LinearLocator(50), they show only the first 00 to 49 minutes, with no change to the y-axis values. Plots of both are below. I also tried other locators such as MultipleLocator and MaxNLocator, but the issue persists.
In short, I just want the graph plotted from 21 July 00:00:00 to 22 July 00:00:00, which is 1440 entries, with around 30-40 intermediate timestamps labelled on the plot.
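One possible approach, sketched under the assumption that the index can be converted to real datetimes with pd.to_datetime (the question does not show its dtype), is to use a date-aware locator from matplotlib.dates instead of LinearLocator. The data, the year and the 40-minute spacing below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: one value per minute from 21 July 00:00 to 22 July 00:00
idx = pd.date_range("2020-07-21", "2020-07-22", freq="min")
df = pd.DataFrame({"log_count": np.arange(len(idx))}, index=idx)

# If the real index holds strings, convert it first (a no-op here)
df.index = pd.to_datetime(df.index)

fig, ax = plt.subplots(figsize=(15, 6))
ax.plot(df.index, df["log_count"], "g", label="log_count")
ax.set_xlim(df.index[0], df.index[-1])

# One tick every 40 minutes gives roughly 36 ticks over 24 hours
ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=40))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.tick_params(axis="x", labelsize=6)
ax.legend()
fig.autofmt_xdate()

n_ticks = len(ax.get_xticks())
print(n_ticks)
plt.close(fig)
```

Because the locator works in date units, each tick label reflects the timestamp at that position, and changing `interval` changes the tick density without touching the plotted data.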
Greetings everyone,
I have data in an Excel file and I want to draw a plot in Matlab in which the y-axis represents the time, with a starting time of 10:45 for 24 hours, i.e. from 10:00 am to 10:00 am the next day. The x-axis represents the Excel file data (the values of the frequencies during the 24 hours).
How can I put the different times on the y-axis, shown in time format (00:00 am/pm), using Matlab?
If I use this code: ylim(subplot2,[1 24]) and xlim(subplot2,[170 230]), the data is plotted, but the y-axis shows only the hours from 1 to 24, and I need the y-axis to run from 10:45 am (the starting time) to 10:45 am over a 24-hour interval.
You can create custom tick labels by specifying tick strings with the command:
time_cells = {'10:45','11:45',...,'9:45','10:45'};
set(gca, 'YTickLabel', time_cells)
Where gca is the handle of your current plot (axes), and time_cells is a cell array containing all your required tick labels (without the ellipsis). It is probably easiest to generate this using a for-loop to create the numbers you want, and then num2str to convert them to the strings you need.