I need to make a diagram that shows the curves of different ceramic firing schedules. They should all be plotted in one diagram against a time-relative axis, so that the different durations are shown correctly. I don't seem to be able to achieve this.
What I have is the following:
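First table:
Pendelen   Temp. per uur   Stooktemp.   Stooktijd 4   Stooktijd Cum.4
           95              120          1:15:47       1,26
           205             537          2:02:03       3,30
           80              620          1:02:15       4,33
           150             1075         3:02:00       7,37
           50              1196         2:25:12       9,79
           10              1196         0:10:00       9,95
Total                                   9:57:17
(Dutch column names: Temp. per uur = temperature rise per hour, Stooktemp. = firing temperature, Stooktijd = firing time, Stooktijd Cum. = cumulative firing time in hours.)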
Second table:
Pendelen   Temp. per uur   Stooktemp.   Stooktijd 5   Stooktijd Cum.5
           140             540          3:51:26       3,86
           65              650          1:41:32       5,55
           140             1095         3:10:43       8,73
           50              1222         2:32:24       11,27
Total                                   11:16:05
The lines in the diagram should represent the 'stooktijd cum.' for both programs 4 and 5 (the cumulative time needed to fire the kiln up from the previous temperature in the schedule). One should be able to see in the diagram that program 5 takes more time to reach its end temperature.
What I achieved is nothing more than a diagram with two lines, both plotted only at the 'stooktijd cum.4' points from program 4. The image shows a screenshot of this diagram.
But as you can see, this doesn't make it look like program 5 takes more time to reach its end. I would like it to show something like this:
Create this table:
Time (h)   p4     p5
0                 10
3.86              540
5.55              650
8.73              1095
11.27             1222
0          0
1.26       120
3.3        537
4.33       620
7.37       1075
9.79       1196
9.95       1196
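Select all > F11 > Design > Change Chart Type > Scatter with Straight Lines and Markers. A scatter chart takes its x values from the first column, so each program is plotted at its own cumulative times.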
Here's my tryout :
Please share if it works/not. ( :
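Outside Excel, the same data layout can be sketched in Python. This is only an illustration under assumptions: the helper names to_hours and schedule_points are made up here, and the start temperatures (0 for p4, 10 for p5) simply mirror the table in this answer. It converts each H:MM:SS step duration to hours, accumulates them, and pairs every cumulative time with its target temperature, so each program gets its own x values.

```python
from itertools import accumulate


def to_hours(duration):
    """Convert an 'H:MM:SS' string to decimal hours."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours + minutes / 60 + seconds / 3600


def schedule_points(steps, start_temp=0):
    """Return (cumulative_hours, temperature) pairs for one firing program.

    steps is a list of (duration, target_temperature) tuples.
    """
    cumulative = accumulate(to_hours(duration) for duration, _ in steps)
    points = [(0.0, start_temp)]
    points += [(round(t, 2), temp) for t, (_, temp) in zip(cumulative, steps)]
    return points


# Step durations and target temperatures copied from the question's tables.
program_4 = [("1:15:47", 120), ("2:02:03", 537), ("1:02:15", 620),
             ("3:02:00", 1075), ("2:25:12", 1196), ("0:10:00", 1196)]
program_5 = [("3:51:26", 540), ("1:41:32", 650),
             ("3:10:43", 1095), ("2:32:24", 1222)]

p4 = schedule_points(program_4)
p5 = schedule_points(program_5, start_temp=10)

for name, points in (("p4", p4), ("p5", p5)):
    print(name, points)
```

Each list can then be fed to any XY (scatter) plot as its own series; because p5 ends at a later x value than p4, its line extends further to the right.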
I have attached dataset
          Time  podId  Batt (avg)  Temp (avg)
0   2019-10-07   9999         6.1   71.271053
1   2019-10-08   9999         6.0   71.208285
2   2019-10-09   9999         5.9   77.896628
3   2019-10-10   9999         5.8   78.709279
4   2019-10-11   9999         5.7   71.849283
59  2019-12-05   8888         5.5   76.548780
60  2019-12-06   8888         5.4   73.975295
61  2019-12-07   8888         5.3   76.209434
62  2019-12-08   8888         5.2   76.717481
63  2019-12-09   8888         5.1   70.433920
I imported it using batt2 = pd.read_csv('battV2.csv')
I need to determine when a battery change occurs, i.e. when Batt (avg) increases from the previous row. I am able to do this with diff, in this manner: batt2['Vdiff'] = batt2['Batt (avg)'].diff(-1)
Now, for each podId, I need to sum the Vdiff column between battery changes, i.e. between two negative Vdiff values.
I also need to average Temp (avg) over the same range.
Finally, I need to count Time to determine the number of days between battery changes.
Thanks.
There are a couple of steps involved:
Import data
Be aware that I have changed your dataset a bit to provide a valid test case for your requirements (in your given dataset, Batt_avg never increases).
from io import StringIO
import pandas as pd
data = StringIO('''Time podId Batt_avg Temp_avg
0 2019-10-07 9999 6.1 71.271053
1 2019-10-08 9999 6.0 71.208285
2 2019-10-09 9999 5.9 77.896628
3 2019-10-10 9999 5.8 78.709279
4 2019-10-11 9999 5.7 71.849283
5 2019-10-12 9999 6.0 71.208285
6 2019-10-13 9999 5.9 77.896628
7 2019-10-14 9999 5.8 78.709279
8 2019-10-15 9999 5.7 71.849283
59 2019-12-05 8888 5.5 76.548780
60 2019-12-06 8888 5.4 73.975295
61 2019-12-07 8888 5.3 76.209434
62 2019-12-08 8888 5.2 76.717481
63 2019-12-09 8888 5.1 70.433920''')
df = pd.read_csv(data, sep=r'\s+')  # split columns on runs of whitespace
Determine changes in battery voltage
As you have already found out, you can do this with diff(). However, I am not certain that df.Batt_avg.diff(-1) satisfies your requirement of "when Batt (avg) increases from previous row": for a given row, diff(-1) is the current value minus the next row's value, i.e. the upcoming change multiplied by -1. If you need the negated change relative to the previous row, use -df.Batt_avg.diff() instead.
df['Batt_avg_diff'] = df.Batt_avg.diff(-1)
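To make the direction of diff() concrete, here is a small sketch on made-up voltages (the numbers and the variable names batt, backward, forward and changed are assumptions for illustration only). It puts both variants side by side.

```python
import pandas as pd

# Hypothetical voltages: a slow drain, then a battery change before the last row.
batt = pd.Series([6.1, 6.0, 5.7, 6.0])

backward = batt.diff()    # current row minus the PREVIOUS row
forward = batt.diff(-1)   # current row minus the NEXT row

# A battery change shows up as a rise relative to the previous row.
changed = backward > 0

print(pd.DataFrame({"batt": batt, "diff()": backward,
                    "diff(-1)": forward, "changed": changed}))
```

With diff(-1), the negative value sits on the last row before the change; with diff(), the positive value sits on the first row with the new battery. Which one you want depends on which row should open a new group.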
Group data and apply the aggregation functions
You can express your grouping conditions as df.podId.diff().fillna(0.0) != 0 for the podIds and df.Batt_avg_diff.fillna(0.0) < 0 for the condition "between battery changes, i.e. between two negative Vdiff values" - either of these will trigger a new group. Use cumsum() on the triggers to create the groups. Then you can use groupby() to act on these groups and transform() to expand the results to the dimensions of the original dataframe.
df['group'] = ((df.podId.diff().fillna(0.0) != 0) | (df.Batt_avg_diff.fillna(0.0) < 0)).cumsum()
df['Batt_avg_diff_sum'] = df.Batt_avg_diff.groupby(df.group).transform('sum')
df['Temp_avg_mean'] = df.Temp_avg.groupby(df.group).transform('mean')
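The trigger-plus-cumsum idea can be seen in isolation on a tiny frame. Note that this sketch is a variation, not the exact code above: it assumes a battery change is flagged on the first row with the higher voltage (diff() > 0), and it takes the difference per podId so that values from two pods are never compared. The sample values are invented.

```python
import pandas as pd

# Hypothetical readings for two pods; pod 1 gets a fresh battery on its 4th row.
df = pd.DataFrame({
    "podId":    [1, 1, 1, 1, 1, 2, 2],
    "Batt_avg": [6.0, 5.9, 5.8, 6.1, 6.0, 5.5, 5.4],
})

# Per-pod change relative to the previous row, so values never mix across pods.
rise = df.groupby("podId")["Batt_avg"].diff() > 0
new_pod = df["podId"] != df["podId"].shift()

# Each True starts a new group; the running sum of booleans numbers the groups.
df["group"] = (new_pod | rise).cumsum()
print(df)
```

Every row between two triggers ends up with the same group number, which is what groupby() needs.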
Datetime calculations
For the final step, you need to first convert the string to datetime to allow date operations. Then you can use groupby operations to get the max and min in each group, and take the delta.
df.Time = pd.to_datetime(df.Time)
df['Time_days'] = df.Time.groupby(df.group).transform('max') - df.Time.groupby(df.group).transform('min')
Note: if you do not need or want the aggregate data in the original dataframe, just apply the functions directly (without transform):
df_group = pd.DataFrame()
df_group['Batt_avg_diff_sum'] = df.Batt_avg_diff.groupby(df.group).sum()
df_group['Temp_avg_mean'] = df.Temp_avg.groupby(df.group).mean()
df_group['Time_days'] = df.Time.groupby(df.group).max() - df.Time.groupby(df.group).min()
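As an alternative way to build such a summary, named aggregation can produce all columns in one groupby call. The sketch below assumes a frame that already has a group column; the sample values and the output column names (temp_mean, batt_drop, first_day, last_day, days) are made up for illustration. Here batt_drop is the first voltage minus the last voltage inside one group, which is what the row-to-row drops within that group add up to.

```python
import pandas as pd

# Hypothetical frame that already carries a 'group' label per battery cycle.
df = pd.DataFrame({
    "Time": pd.to_datetime(["2019-10-07", "2019-10-08", "2019-10-09",
                            "2019-10-10", "2019-10-11"]),
    "Temp_avg": [70.0, 72.0, 74.0, 80.0, 82.0],
    "Batt_avg": [6.1, 6.0, 5.9, 6.1, 6.0],
    "group": [1, 1, 1, 2, 2],
})

summary = df.groupby("group").agg(
    temp_mean=("Temp_avg", "mean"),
    batt_drop=("Batt_avg", lambda s: s.iloc[0] - s.iloc[-1]),
    first_day=("Time", "min"),
    last_day=("Time", "max"),
)
# Whole days between the first and last reading of each cycle.
summary["days"] = (summary["last_day"] - summary["first_day"]).dt.days
print(summary)
```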
I am following this section. I realize the code was written for Python 2, but their chart has xticks with dates along the 'Start Date' axis and mine does not: my chart only shows the axis label 'Start Date' with no dates. I tried converting the column from object to datetime; that makes the dates show up, but it breaks the graph below it and the line goes missing:
Graph
# Set as_index=False to keep the 0,1,2,... index. Then we'll take the mean of the polls on that day.
poll_df = poll_df.groupby(['Start Date'],as_index=False).mean()
# Let's go ahead and see what this looks like
poll_df.head()
   Start Date  Number of Observations  Obama  Romney  Undecided  Difference
0  2009-03-13                    1403     44      44         12        0.00
1  2009-04-17                     686     50      39         11        0.11
2  2009-05-14                    1000     53      35         12        0.18
3  2009-06-12                     638     48      40         12        0.08
4  2009-07-15                     577     49      40         11        0.09
Great! Now plotting the Difference versus time should be straight forward.
# Plotting the difference in polls between Obama and Romney
fig = poll_df.plot('Start Date','Difference',figsize=(12,4),marker='o',linestyle='-',color='purple')
Notebook is here
I would like to convert 'bytes' data into a Pandas dataframe.
The data looks like this (few first lines):
(b'#Settlement Date,Settlement Period,CCGT,OIL,COAL,NUCLEAR,WIND,PS,NPSHYD,OCGT'
b',OTHER,INTFR,INTIRL,INTNED,INTEW,BIOMASS\n2017-01-01,1,7727,0,3815,7404,3'
b'923,0,944,0,2123,948,296,856,238,\n2017-01-01,2,8338,0,3815,7403,3658,16,'
b'909,0,2124,998,298,874,288,\n2017-01-01,3,7927,0,3801,7408,3925,0,864,0,2'
b'122,998,298,816,286,\n2017-01-01,4,6996,0,3803,7407,4393,0,863,0,2122,998'
The column headers appear at the top. Each subsequent line is a timestamp followed by numbers.
Is there a straightforward way to do this?
Thank you very much
#Paula Livingstone:
This seems to work:
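import pandas as pd
s = str(bytes_data, 'utf-8')
with open("data.txt", "w") as file:  # the with block closes (and flushes) the file before it is read back
    file.write(s)
df = pd.read_csv('data.txt')
Maybe this can be done without using a file in between.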
I had the same issue and found this library https://docs.python.org/2/library/stringio.html from the answer here: How to create a Pandas DataFrame from a string
Try something like:
from io import StringIO
s=str(bytes_data,'utf-8')
data = StringIO(s)
df=pd.read_csv(data)
You can also use BytesIO directly:
from io import BytesIO
df = pd.read_csv(BytesIO(bytes_data))
This will save you the step of transforming bytes_data to a string
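A self-contained sketch of the BytesIO route, using the first two complete rows of the sample from the question (joined into one bytes object here). The final rename step is an optional extra, not part of the suggestion above.

```python
from io import BytesIO

import pandas as pd

# First rows of the question's sample, joined into one bytes object.
bytes_data = (
    b"#Settlement Date,Settlement Period,CCGT,OIL,COAL,NUCLEAR,WIND,PS,NPSHYD,OCGT"
    b",OTHER,INTFR,INTIRL,INTNED,INTEW,BIOMASS\n"
    b"2017-01-01,1,7727,0,3815,7404,3923,0,944,0,2123,948,296,856,238,\n"
    b"2017-01-01,2,8338,0,3815,7403,3658,16,909,0,2124,998,298,874,288,\n"
)

# read_csv accepts any file-like object, including an in-memory bytes buffer.
df = pd.read_csv(BytesIO(bytes_data))

# Optional tidy-up: drop the leading '#' from the first header.
df = df.rename(columns=lambda name: name.lstrip("#"))
print(df.dtypes)
```

Because every data row ends with a comma, the last column (BIOMASS) is read as empty for these rows.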
Ok cool, your input formatting is quite awkward but the following works:
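import pandas as pd
with open('file.txt', 'r') as myfile:
    data = myfile.read().replace('\n', '')  # read in file as a single string
# strip the b'...' wrappers, rejoin the pieces, then split rows on the literal \n and fields on commas
df = pd.Series(" ".join(data.strip(' b\'').strip('\'').split('\' b\'')).split('\\n')).str.split(',', expand=True)
print(df)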
this produces the following:
                  0                  1     2    3     4        5     6   7  \
0  #Settlement Date  Settlement Period  CCGT  OIL  COAL  NUCLEAR  WIND  PS
1        2017-01-01                  1  7727    0  3815     7404  3923   0
2        2017-01-01                  2  8338    0  3815     7403  3658  16
3        2017-01-01                  3  7927    0  3801     7408  3925   0

        8     9     10     11      12      13     14       15
0  NPSHYD  OCGT  OTHER  INTFR  INTIRL  INTNED  INTEW  BIOMASS
1     944     0   2123    948     296     856    238
2     909     0   2124    998     298     874    288
3     864     0   2122    998     298     816    286     None
In order for this to work you will need to ensure that your input file contains only a collection of complete rows. For this reason I removed the partial row for the purposes of the test.
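As you have said that the data source is an HTTP GET request, the initial read could take place using pandas.read_html, provided the response is an HTML page containing a table. If the response body is plain CSV, as the sample suggests, pandas.read_csv can read it directly from the URL or from a file-like object.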
More detail on this can be found here. Note specifically the section on io (io : str or file-like).
I'm just trying to get used to gnuplot. I searched a few pages on this site looking for the answer and read the documentation (4.6), but still haven't found it. Say I have a data file like this:
0.0 0
1.0 25
2.0 55
3.0 110
4.0 456
5.0 554
6.0 345
and I want to label all the data points on the plot. How do I do this? I tried this suggestion: plot 'exp.dat' u 1:2 w labels point offset character 0,character 1 tc rgb "blue", but it didn't work. It gave me a "Not enough columns for this style" error. I'm sure it's something I'm doing, but I'm not sure what. Any help would be appreciated. Thanks.
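I think you are missing the strings for the labels: the labels style needs a third column that holds the text to print at each point. You can generate it from the y value: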
flabel(y)=sprintf("y=%.2f", y)
plot '-' u 1:2:(flabel($2)) w labels point offset character 0,character 1 tc rgb "blue"
0.0 0
1.0 25
2.0 55
3.0 110
4.0 456
5.0 554
6.0 345
e