How to display dataframe without break in Tkinter app - python-3.x

I'm using the code below to display a pandas DataFrame, but in my Tkinter app the columns are wrapped into blocks and split with '\':
def on_show_frame(self, event):
    T = tk.Text(self, height=30, width=120, wrap=None)
    T.grid(row=2, column=4, columnspan=14, rowspan=14, padx=10, pady=10)
    T.insert(tk.END, self.controller.df)
Is there an option I can pass to tk.Text to fix this, so the data is displayed without breaks?
Here is how it looks in my app; after all the rows, the display continues with the next block of columns:
datetime mean std sum 1 2 3 4 5 6 7 8 9 \
0 2017-07-12 08:00:00 1.805556 2.447383 65 8 9 1 0 0 2 0 0 0
1 2017-07-12 08:01:00 0.833333 1.133893 30 0 0 1 0 0 1 0 0 0
2 2017-07-12 08:02:00 1.027778 1.182881 37 0 0 1 0 0 2 0 2 0
3 2017-07-12 08:03:00 0.944444 1.286067 34 0 0 1 0 0 0 0 1 0
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 \
0 0 0 3 1 0 3 0 1 1 1 0 0 1 3 0 0 2 0
1 0 1 2 4 0 3 1 2 0 1 0 0 2 4 0 2 2 0
2 1 1 3 0 0 3 1 2 1 2 1 0 2 3 0 2 0 0
3 1 1 2 1 0 3 1 3 0 1 1 0 2 4 0 0 4 0
etc.
238 1 1 0 0 0 2 0 0 0 0 1 0 1 0 0 0 0 0
239 1 1 0 0 2 0 0 0 0 0 2 0 3 0 0 0 0 0
240 0 4 0 0 0 1 0 0 0 0 1 0 2 0 0 0 1 0
28 29 30 31 32
0 0 3 1 1 2
1 0 1 1 1 1
Thank you in advance for any help.
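A minimal sketch, assuming the backslash breaks come from pandas itself rather than from the Text widget: inserting the DataFrame object puts its printed representation into the widget, and that repr is wrapped at pandas' `display.width`. `DataFrame.to_string()` does not wrap columns by default, so each row stays on one line. Note also that `wrap=None` leaves the widget's default wrapping in place; Tk expects the string `"none"`. The helper names `frame_text` and `show_dataframe` and the scrollbar placement are hypothetical.

```python
import pandas as pd


def frame_text(df: pd.DataFrame) -> str:
    # to_string() renders every row and column without line wrapping,
    # so no "\" continuation blocks are produced.
    return df.to_string()


def show_dataframe(parent, df: pd.DataFrame):
    # Imported here so frame_text() can be used without a display.
    import tkinter as tk

    # wrap="none" (a string) disables wrapping; wrap=None would be ignored.
    text = tk.Text(parent, height=30, width=120, wrap="none")
    xscroll = tk.Scrollbar(parent, orient="horizontal", command=text.xview)
    text.configure(xscrollcommand=xscroll.set)
    text.grid(row=2, column=4, columnspan=14, rowspan=14, padx=10, pady=10)
    # Hypothetical placement: the row just below the Text widget's span.
    xscroll.grid(row=16, column=4, columnspan=14, sticky="ew")
    text.insert(tk.END, frame_text(df))
    return text
```

Inside `on_show_frame`, this would be called as `show_dataframe(self, self.controller.df)`; the horizontal scrollbar lets you reach the columns that no longer fit in 120 characters.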

Related

How to write a LAMBDA function in Excel for this recursive calculation

I'm trying to come up with a LAMBDA formula that captures the following recursive calculation:
Column A has 40 rows with integers between 1 and 40. Column B divides each integer in column A by 6 and rounds it up. Column C divides each integer in column B by 6 and rounds it up. This continues until the integer is 1 or less, and then I want the sum of the full row for a given integer. So, for example, for the number 25 in column A, I get 6 (5 from column B and 1 from column C). For the number 40 in column A, I get 10 (7 from column B, 2 from column C, 1 from column D).
Is it possible to come up with a LAMBDA function that would get me the correct output for a given number in column A? I don't want to use VBA - just want to use the LAMBDA function for this.
Sample of the worksheet:
Data  Column 1  Column 2  Column 3  Column 4  Sum
1     0         0         0         0         1
2     1         0         0         0         1
3     1         0         0         0         1
4     1         0         0         0         1
5     1         0         0         0         1
6     1         0         0         0         1
7     2         1         0         0         3
8     2         1         0         0         3
9     2         1         0         0         3
10    2         1         0         0         3
11    2         1         0         0         3
12    2         1         0         0         3
13    3         1         0         0         4
14    3         1         0         0         4
15    3         1         0         0         4
16    3         1         0         0         4
17    3         1         0         0         4
18    3         1         0         0         4
19    4         1         0         0         5
20    4         1         0         0         5
21    4         1         0         0         5
22    4         1         0         0         5
23    4         1         0         0         5
24    4         1         0         0         5
25    5         1         0         0         6
26    5         1         0         0         6
27    5         1         0         0         6
28    5         1         0         0         6
29    5         1         0         0         6
30    5         1         0         0         6
31    6         1         0         0         7
32    6         1         0         0         7
33    6         1         0         0         7
34    6         1         0         0         7
35    6         1         0         0         7
36    6         1         0         0         7
37    7         2         1         0         10
Use BYROW and SCAN:
=BYROW(A1:A40,LAMBDA(c,SUM(SCAN(c,SEQUENCE(,4,6,0),LAMBDA(a,b,IF(a=1,0,ROUNDUP(a/b,0)))))))
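To check the formula's arithmetic outside Excel, here is a small Python sketch (the helper name `row_sum` is hypothetical) that mirrors the SCAN above: `SEQUENCE(,4,6,0)` supplies four divisors of 6, each step divides the previous result by 6 and rounds up, and a value of 1 is mapped to 0 exactly as `IF(a=1,0,...)` does.

```python
import math


def row_sum(n: int, divisor: int = 6, steps: int = 4) -> int:
    """Mirror SUM(SCAN(n, {6,6,6,6}, LAMBDA(a,b, IF(a=1,0,ROUNDUP(a/b,0)))))."""
    total, a = 0, n
    for _ in range(steps):
        # Like IF(a=1,0,ROUNDUP(a/b,0)): once a reaches 1 it contributes 0.
        a = 0 if a == 1 else math.ceil(a / divisor)
        total += a
    return total


print(row_sum(25), row_sum(40))  # 6 10
```

Because the 1 → 0 rule is applied first, `row_sum(1)` returns 0, whereas the sample table lists a sum of 1 for that row.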

How to return first item when the items in the pandas dataframe window are the same?

I am a Python beginner.
I have the following pandas DataFrame with only two columns: "Time" and "Input".
I want to loop over the "Input" column with a window size w = 3 (three consecutive values). For each window, if all the items in that window are 1s, the first item should stay 1 and the remaining values should be changed to 0.
index Time Input
0 11 0
1 22 0
2 33 0
3 44 1
4 55 1
5 66 1
6 77 0
7 88 0
8 99 0
9 1010 0
10 1111 1
11 1212 1
12 1313 1
13 1414 0
14 1515 0
My intended output is as follows:
index Time Input What_I_got What_I_Want
0 11 0 0 0
1 22 0 0 0
2 33 0 0 0
3 44 1 1 1
4 55 1 1 0
5 66 1 1 0
6 77 1 1 1
7 88 1 0 0
8 99 1 0 0
9 1010 0 0 0
10 1111 1 1 1
11 1212 1 0 0
12 1313 1 0 0
13 1414 0 0 0
14 1515 0 0 0
What should I do to get the desired output? Am I missing something in my code?
import pandas as pd
import re
pd.Series(list(re.sub('111', '100', ''.join(df.Input.astype(str))))).astype(int)
Out[23]:
0 0
1 0
2 0
3 1
4 0
5 0
6 1
7 0
8 0
9 0
10 1
11 0
12 0
13 0
14 0
dtype: int32
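The regex approach joins the column into one string, so it only works while every value is a single character. A loop that follows the question's window description more literally could look like this sketch; `keep_first_of_runs` is a hypothetical helper, and it assumes the column holds only 0/1 integers. Like `re.sub`, it scans left to right and skips past a window once it has matched, so windows never overlap.

```python
import pandas as pd


def keep_first_of_runs(s: pd.Series, w: int = 3) -> pd.Series:
    values = s.tolist()
    out = values.copy()
    i = 0
    while i <= len(values) - w:
        if all(v == 1 for v in values[i:i + w]):
            # Keep the first 1 of the window, zero the other w - 1 items,
            # then jump past the whole window (non-overlapping, like re.sub).
            out[i + 1:i + w] = [0] * (w - 1)
            i += w
        else:
            i += 1
    return pd.Series(out, index=s.index)


s = pd.Series([0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0])
print(keep_first_of_runs(s).tolist())
# [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
```

On the Input values from the intended-output table, this reproduces the What_I_Want column and the regex result.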

How to create a confusion matrix from an incomplete dataframe in python

I have a dataframe which looks like this:
I1 I2 V
0 1 1 300
1 1 5 7
2 1 9 3
3 2 2 280
4 2 3 4
5 5 1 5
6 5 5 400
I1 and I2 represent indexes, while V represents values.
Index pairs whose value is 0 have been omitted, but I'd like to get a confusion matrix showing all the values, i.e. something like this:
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
How can I do it?
Thanks in advance!
Use set_index with unstack to reshape, reindex to append the missing rows and columns, and rename_axis to remove the axis names:
r = range(1, 10)
df = (df.set_index(['I1','I2'])['V']
.unstack(fill_value=0)
.reindex(index=r, columns=r, fill_value=0)
.rename_axis(None)
.rename_axis(None, axis=1))
print (df)
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
Detail:
print (df.set_index(['I1','I2'])['V']
.unstack(fill_value=0))
I2 1 2 3 5 9
I1
1 300 0 0 7 3
2 0 280 4 0 0
5 5 0 0 400 0
Alternative solution with pivot, if all values are integers:
r = range(1, 10)
df = (df.pivot(index='I1', columns='I2', values='V')
.fillna(0)
.astype(int)
.reindex(index=r, columns=r, fill_value=0)
.rename_axis(None)
.rename_axis(None, axis=1))
print (df)
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
Option 1: Using NumPy:
In [150]: size = df[['I1', 'I2']].values.max()
In [151]: arr = np.zeros((size, size))
In [152]: arr[df.I1-1, df.I2-1] = df.V
In [153]: idx = np.arange(1, size+1)
In [154]: pd.DataFrame(arr, index=idx, columns=idx).astype(int)
Out[154]:
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
Option 2: Using scipy.sparse.csr_matrix
In [178]: from scipy.sparse import csr_matrix
In [179]: size = df[['I1', 'I2']].values.max()
In [180]: idx = np.arange(1, size+1)
In [181]: pd.DataFrame(csr_matrix((df['V'], (df['I1']-1, df['I2']-1)), shape=(size, size)).toarray(), index=idx, columns=idx)
Out[181]:
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0

Sumproduct ranges excluding rows with sum=0 or row containing blank cells

Here is a sample data set:
1 2 3 4 5 6 7 8 9 10 11 12
a 20 0 9 0 0 0 0 0 0 0 5 9
a 0 10 0 0 0 0 0 0 0 0 0 10
a 20 10 0 0 0 0 0 0 0 0 0 18
a 0 10 7 0 0 0 0 0 0 0 18
a 0 0 2 0 0 0 0 5 4.5 0 0 18
a 0 10 8 0 0 0 0 0 0 0 5 8
b 0 10 6 0 0 0 0 0 0 0 0 0
b 10 0 9 0 0 0 0 0 0 0 0 5
b 0 10 9 3.5 0 0 0 0 0 0 0
b 0 10 5 0 0 0 0 0 0 0 5 8
b 10 8 6 0 0 0 0 0 0 0 5 10
b 0 15 24 0 5 0 0 0 0 0 5 9
c 0 0 8 0 4.5 0 0 5 0 0 0 0
c 0 0 0 0 0 0 0 0 0 0 0 0
c 10 10 27 0 0 0 0 0 0 0 0
c 5 20 5 0 10 0 0 0 0 0 0 0
c 10 10 10 0 0 0 0 0 0 0 0 10
d 0 0 0 0 0 0 0 0 0 0 0 0
d 10 5 5 0 0 0 0 0 0 5 10
I have to calculate the circular vector length (r) of each type: a, b, c and d.
Rows that contain a blank cell, or that contain only 0s (like the second c row), make the formula I am using return an error, which forces me to first calculate r for each row individually with this formula.
For the first a (in column O):
=IF(OR(COUNTBLANK(B2:M2)>0,SUM(B2:M2)=0),"",SQRT(SUMPRODUCT(B2:M2,B$1:M$1)^2+SUMPRODUCT(B2:M2,B$1:M$1)^2)/SUM(B2:M2))
Then for average over all a with:
=IF(SUMIFS(O$2:O$20,A$2:A$20,Q2)=0,"",AVERAGEIFS(O$2:O$20,A$2:A$20,Q2))
What I need is a way to combine both formulas so that Excel:
First checks for matching type: a, b, c, d
Excludes rows with blanks or sum=0
Sumproduct each remaining row
Average over all a's for example
Any help is greatly appreciated
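The four steps above can be sketched in Python to make the intended logic explicit. Two assumptions, since the question's formula does not show how angles are derived: the per-row value uses the standard mean resultant length, r = sqrt((Σ w·cos θ)² + (Σ w·sin θ)²) / Σ w, and column k (1 to 12) sits at angle 2πk/12. Blank cells are represented as `None`; the helper names are hypothetical.

```python
import math
from collections import defaultdict


def row_r(weights, period=12):
    # ASSUMPTION: column k (1-based) corresponds to angle 2*pi*k/period.
    angles = [2 * math.pi * k / period for k in range(1, len(weights) + 1)]
    c = sum(w * math.cos(t) for w, t in zip(weights, angles))
    s = sum(w * math.sin(t) for w, t in zip(weights, angles))
    return math.hypot(c, s) / sum(weights)


def mean_r_by_type(rows):
    """rows: iterable of (type, weights); None marks a blank cell."""
    per_type = defaultdict(list)
    for kind, weights in rows:
        # Step 2: skip rows with a blank cell or an all-zero sum.
        if any(w is None for w in weights) or sum(weights) == 0:
            continue
        # Steps 1 and 3: group by type and compute r for each remaining row.
        per_type[kind].append(row_r(weights))
    # Step 4: average r over the kept rows of each type.
    return {kind: sum(rs) / len(rs) for kind, rs in per_type.items()}
```

A type whose rows are all excluded simply does not appear in the result, which plays the role of the `""` returned by the `IF(SUMIFS(...)=0,"",...)` guard.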

gnuplot: scale axis of matrix plot

I have a matrix, output.dat:
0 0 0 0 0 0 3 7 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 11 16 6 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 7 8 4 16 4 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 2 5 11 3 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 3 1 9 10 9 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 12 28 13 11 5 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 6 17 33 14 2 0 5 2 0 0 0 0 0 0 0
0 0 0 0 0 1 13 15 11 6 0 0 5 7 0 0 0 0 0 0
0 0 0 0 0 0 3 3 8 3 0 0 0 3 0 0 0 0 0 0
0 0 0 0 0 0 0 8 8 8 1 2 1 3 2 0 0 0 0 0
0 0 0 0 0 0 0 1 17 10 4 7 4 12 3 0 0 0 0 0
0 0 0 0 0 3 2 3 6 22 9 5 8 5 1 0 0 0 0 0
0 0 0 0 0 1 5 7 10 35 4 6 6 9 4 0 0 0 0 0
0 0 0 0 0 2 12 12 30 52 23 11 8 7 5 1 0 0 0 0
0 0 0 0 1 7 25 16 33 30 26 16 21 19 5 2 0 0 0 0
0 0 0 0 0 0 12 36 19 22 28 19 30 17 9 0 0 0 0 0
0 0 0 0 0 11 18 12 37 32 27 26 33 21 10 12 3 0 0 0
0 0 0 0 0 11 14 23 44 59 45 26 28 9 3 7 0 0 0 0
0 0 0 0 0 0 0 8 19 23 22 11 34 32 25 7 0 0 0 0
0 0 0 0 0 0 0 0 4 8 9 16 21 26 20 11 12 4 6 2
Using this in a bash script results in a perfectly fine looking plot of the matrix:
echo "set terminal png font arial 30 size 1600,1200;
set output 'output.png';set xrange [1:20];set yrange [1:20];set xlabel 'x';set ylabel 'y';
set pm3d map;set pm3d interpolate 0,0;splot 'output.dat' matrix" | gnuplot
However, I'd like the x- and y-axes to show "0...1" instead of "1...20". If I simply change the xrange from [1:20] to [0:1], no data is plotted. Scaling the data doesn't work, and using xticlabels (at least as I understand it) hasn't changed the axes either.
How can I change the x and y to say "0...1" instead of "1...20"?
I don't know what you tried, but scaling in the using statement works fine:
echo "set terminal pngcairo;set autoscale fix; set tics out nomirror;
set xlabel 'x';set ylabel 'y'; set pm3d map interpolate 0,0;
splot 'output.dat' matrix using (\$1/19.0):(\$2/19.0):3" | gnuplot > output.png
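Why 19: with `matrix`, gnuplot uses the zero-based column and row indices as the x and y coordinates, so a 20 × 20 matrix spans indices 0 to 19, and dividing each index by N − 1 = 19 maps that range onto [0, 1]:

```latex
x' = \frac{i}{N - 1}, \qquad i = 0, 1, \dots, N - 1, \qquad N = 20 \;\Rightarrow\; x' \in \left\{0, \tfrac{1}{19}, \tfrac{2}{19}, \dots, 1\right\}
```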
