Rescaling the plot of a tree with gnuplot


I am using the following code in gnuplot to draw a tree from different inputs.
### tree diagram with gnuplot
reset session
#ID Parent Name Colors shape
# put datablock into strings
IDs = Parents = Names = Colors = Shape = ""
set table $Dummy
plot "tmp.dat" u (IDs = IDs.strcol(1)." "): \
(Parents = Parents.strcol(2)." "): \
(Names = Names.strcol(3)." "): \
(Colors = Colors.strcol(4)." "): \
(Shape = Shape.strcol(5)." ") w table
unset table
# Top node has no parent ID "NaN"
Start(n) = int(sum [i=1:words(Parents)] (word(Parents,i) eq "NaN" ? int(word(IDs,i)) : 0))
# get list index by ID
ItemIdx(s,n) = n == n ? (tmp=NaN, sum [i=1:words(s)] ((word(s,i)) == n ? (tmp=i,0) : 0), tmp) : NaN
# get parent of ID n
Parent(n) = word(Parents,ItemIdx(IDs,n))
# get level of ID n, recursive function
Level(n) = n == n ? Parent(n)>0 ? Level(Parent(n))-1 : 0 : NaN
# get number of children of ID n
ChildCount(n) = int(sum [i=1:words(Parents)] (word(Parents,i)==n))
# Create child list of ID n
ChildList(n) = (Ch = " ", sum [i=1:words(IDs)] (word(Parents,i)==n ? (Ch = Ch.word(IDs,i)." ",1) : (Ch,0) ), Ch )
# m-th child of ID n
Child(n,m) = word(ChildList(n),m)
# List of leaves, recursive function
LeafList(n) = (LL="", ChildCount(n)==0 ? LL=LL.n." " : sum [i=1:ChildCount(n)] (LL=LL.LeafList(Child(n,i)), 0),LL)
# create list of all leaves
LeafAll = LeafList(Start(0))
# get x-position of ID n, recursive function
XPos(n) = ChildCount(n) == 0 ? ItemIdx(LeafAll,n) : (sum [i=1:ChildCount(n)](XPos(Child(n,i))))/(ChildCount(n))
# create the tree datablock for plotting
set print $Tree
do for [j=1:words(IDs)] {
n = int(word(IDs,j))
print sprintf("% 3d % 7.2f % 4d % 5s % 8s", n, XPos(n), Level(n), word(Names,j), word(Colors,j))
}
set print
print $Tree
# get x and y distance from ID n to its parent
dx(n) = XPos(Parent(int(n))) - XPos(int(n))
dy(n) = Level(Parent(int(n))) - Level(int(n))
unset border
unset tics
set offsets 0.25, 0.25, 0.25, 0.25
array shape[words(IDs)] # pointtype 6 = circle, pointtype 4 = square
array color[words(IDs)]
do for [i=1:words(IDs)] {
color[i] = int(word(Colors,i))
shape[i] = int(word(Shape,i))
print sprintf("color[%2d] = %d",i,color[i])
}
plot $Tree u 2:3:(dx($1)):(dy($1)) w vec nohead ls -1 not,\
"" u 2:3:(shape[$1]+1):(color[$1]) w p pt variable ps 6 lc rgb variable not, \
"" u 2:3:(shape[$1]) w p pt variable ps 6 lw 1.5 lc rgb "black" not, \
"" u 2:3:4 w labels offset 0,0.1 center not
### end of code
For a small dataset like this one, the output is perfect:
1 2.00 0 y_{45} 0xFE1034
2 1.00 -1 - 0x118C4B
3 2.99 -1 y_{37} 0xFE1034
4 2.00 -2 - 0xC6C1C1
5 3.98 -2 y_{13} 0xFE1034
6 3.00 -3 - 0x118C4B
7 4.97 -3 y_{14} 0xFE1034
8 4.00 -4 - 0x118C4B
9 5.94 -4 y_{20} 0xFE1034
10 5.00 -5 - 0xC6C1C1
11 6.88 -5 y_{27} 0xFE1034
12 6.00 -6 - 0xC6C1C1
13 7.75 -6 y_{41} 0xFE1034
14 7.00 -7 - 0xC6C1C1
15 8.50 -7 y_{54} 0xFE1034
16 8.00 -8 - 0xC6C1C1
17 9.00 -8 - 0xC6C1C1
But for larger datasets the tree becomes cramped, the nodes overlap, and the plot looks ugly.
Moreover, when there are more than a few hundred nodes, as in the data below, I get a stack overflow error and the plot does not appear. The error comes from this line:
LeafAll = LeafList(Start(0))
Any help with this will be appreciated.
1 NaN y_{295} 0xFE1034 6
2 1 x_{0} 0x33B2FF 6
3 1 y_{1285} 0xFE1034 6
4 2 - 0xC6C1C1 8
5 2 - 0xC6C1C1 8
6 3 x_{3} 0x33B2FF 6
7 3 y_{18} 0xFE1034 6
8 6 - 0xC6C1C1 8
9 6 - 0xC6C1C1 8
10 7 x_{13} 0x33B2FF 6
11 7 y_{21} 0xFE1034 6
12 10 - 0xC6C1C1 8
13 10 - 0xC6C1C1 8
14 11 x_{10} 0x33B2FF 6
15 11 y_{50} 0xFE1034 6
16 14 - 0xC6C1C1 8
17 14 - 0xC6C1C1 8
18 15 - 0x118C4B 4
19 15 y_{62} 0xFE1034 6
20 19 - 0xC6C1C1 8
21 19 y_{48} 0xFE1034 6
22 21 x_{41} 0x33B2FF 6
23 21 y_{1839} 0xFE1034 6
24 22 - 0xC6C1C1 8
25 22 - 0xC6C1C1 8
26 23 - 0xC6C1C1 8
27 23 y_{44} 0xFE1034 6
28 27 x_{12} 0x33B2FF 6
29 27 y_{15} 0xFE1034 6
30 28 - 0xC6C1C1 8
31 28 - 0xC6C1C1 8
32 29 x_{58} 0x33B2FF 6
33 29 y_{127} 0xFE1034 6
34 32 - 0xC6C1C1 8
35 32 - 0xC6C1C1 8
36 33 - 0xC6C1C1 8
37 33 y_{60} 0xFE1034 6
38 37 - 0xC6C1C1 8
39 37 y_{1825} 0xFE1034 6
40 39 - 0xC6C1C1 8
41 39 y_{1878} 0xFE1034 6
42 41 - 0xC6C1C1 8
43 41 y_{33} 0xFE1034 6
44 43 - 0xC6C1C1 8
45 43 y_{3} 0xFE1034 6
46 45 - 0xC6C1C1 8
47 45 y_{1435} 0xFE1034 6
48 47 - 0xC6C1C1 8
49 47 y_{218} 0xFE1034 6
50 49 - 0xC6C1C1 8
51 49 y_{20} 0xFE1034 6
52 51 - 0xC6C1C1 8
53 51 y_{13} 0xFE1034 6
54 53 - 0xC6C1C1 8
55 53 y_{47} 0xFE1034 6
56 55 - 0xC6C1C1 8
57 55 y_{2321} 0xFE1034 6
58 57 - 0xC6C1C1 8
59 57 y_{28} 0xFE1034 6
60 59 - 0xC6C1C1 8
61 59 y_{52} 0xFE1034 6
62 61 - 0xC6C1C1 8
63 61 y_{2410} 0xFE1034 6
64 63 - 0xC6C1C1 8
65 63 y_{1751} 0xFE1034 6
66 65 - 0xC6C1C1 8
67 65 y_{186} 0xFE1034 6
68 67 - 0xC6C1C1 8
69 67 y_{1850} 0xFE1034 6
70 69 - 0xC6C1C1 8
71 69 y_{491} 0xFE1034 6
72 71 - 0xC6C1C1 8
73 71 y_{23} 0xFE1034 6
74 73 - 0xC6C1C1 8
75 73 y_{0} 0xFE1034 6
76 75 x_{52} 0x33B2FF 6
77 75 y_{1110} 0xFE1034 6
78 76 - 0xC6C1C1 8
79 76 - 0xC6C1C1 8
80 77 - 0xC6C1C1 8
81 77 y_{57} 0xFE1034 6
82 81 - 0xC6C1C1 8
83 81 y_{12} 0xFE1034 6
84 83 - 0xC6C1C1 8
85 83 y_{1269} 0xFE1034 6
86 85 - 0xC6C1C1 8
87 85 y_{1278} 0xFE1034 6
88 87 - 0x118C4B 4
89 87 y_{63} 0xFE1034 6
90 89 - 0xC6C1C1 8
91 89 y_{1338} 0xFE1034 6
92 91 - 0xC6C1C1 8
93 91 y_{1271} 0xFE1034 6
94 93 - 0xC6C1C1 8
95 93 y_{41} 0xFE1034 6
96 95 - 0xC6C1C1 8
97 95 y_{65} 0xFE1034 6
98 97 - 0x118C4B 4
99 97 y_{1630} 0xFE1034 6
100 99 - 0xC6C1C1 8
101 99 y_{2068} 0xFE1034 6
102 101 - 0xC6C1C1 8
103 101 y_{2532} 0xFE1034 6
104 103 - 0xC6C1C1 8
105 103 y_{1760} 0xFE1034 6
106 105 - 0xC6C1C1 8
107 105 y_{188} 0xFE1034 6
108 107 - 0xC6C1C1 8
109 107 y_{2405} 0xFE1034 6
110 109 - 0xC6C1C1 8
111 109 y_{1867} 0xFE1034 6
112 111 - 0xC6C1C1 8
113 111 y_{1482} 0xFE1034 6
114 113 - 0xC6C1C1 8
115 113 y_{79} 0xFE1034 6
116 115 - 0xC6C1C1 8
117 115 y_{11} 0xFE1034 6
118 117 - 0xC6C1C1 8
119 117 y_{5226} 0xFE1034 6
120 119 - 0xC6C1C1 8
121 119 y_{354} 0xFE1034 6
122 121 - 0xC6C1C1 8
123 121 y_{2748} 0xFE1034 6
124 123 - 0xC6C1C1 8
125 123 y_{27} 0xFE1034 6
126 125 - 0xC6C1C1 8
127 125 y_{426} 0xFE1034 6
128 127 - 0xC6C1C1 8
129 127 y_{12571} 0xFE1034 6
130 129 - 0xC6C1C1 8
131 129 y_{5089} 0xFE1034 6
132 131 - 0xC6C1C1 8
133 131 y_{2490} 0xFE1034 6
134 133 - 0xC6C1C1 8
135 133 y_{1752} 0xFE1034 6
136 135 - 0xC6C1C1 8
137 135 y_{1874} 0xFE1034 6
138 137 - 0xC6C1C1 8
139 137 y_{370} 0xFE1034 6
140 139 - 0xC6C1C1 8
141 139 y_{1453} 0xFE1034 6
142 141 - 0xC6C1C1 8
143 141 y_{2756} 0xFE1034 6
144 143 - 0xC6C1C1 8
145 143 y_{545} 0xFE1034 6
146 145 - 0xC6C1C1 8
147 145 y_{36} 0xFE1034 6
148 147 - 0xC6C1C1 8
149 147 y_{2409} 0xFE1034 6
150 149 - 0xC6C1C1 8
151 149 y_{96} 0xFE1034 6
152 151 - 0xC6C1C1 8
153 151 y_{82} 0xFE1034 6
154 153 - 0xC6C1C1 8
155 153 y_{1788} 0xFE1034 6
156 155 - 0xC6C1C1 8
157 155 y_{2812} 0xFE1034 6
158 157 - 0xC6C1C1 8
159 157 y_{10357} 0xFE1034 6
160 159 - 0xC6C1C1 8
161 159 y_{1801} 0xFE1034 6
162 161 - 0xC6C1C1 8
163 161 y_{55} 0xFE1034 6
164 163 - 0xC6C1C1 8
165 163 y_{2868} 0xFE1034 6
166 165 - 0xC6C1C1 8
167 165 y_{453} 0xFE1034 6
168 167 - 0xC6C1C1 8
169 167 y_{31} 0xFE1034 6
170 169 - 0xC6C1C1 8
171 169 y_{1281} 0xFE1034 6
172 171 - 0xC6C1C1 8
173 171 y_{17} 0xFE1034 6
174 173 - 0xC6C1C1 8
175 173 y_{1748} 0xFE1034 6
176 175 - 0xC6C1C1 8
177 175 y_{58} 0xFE1034 6
178 177 - 0xC6C1C1 8
179 177 y_{2420} 0xFE1034 6
180 179 - 0xC6C1C1 8
181 179 y_{7128} 0xFE1034 6
182 181 - 0xC6C1C1 8
183 181 y_{11164} 0xFE1034 6
184 183 - 0xC6C1C1 8
185 183 y_{1820} 0xFE1034 6
186 185 - 0xC6C1C1 8
187 185 y_{1713} 0xFE1034 6
188 187 - 0xC6C1C1 8
189 187 y_{387} 0xFE1034 6
190 189 - 0xC6C1C1 8
191 189 y_{5253} 0xFE1034 6
192 191 - 0xC6C1C1 8
193 191 y_{1699} 0xFE1034 6
194 193 - 0xC6C1C1 8
195 193 - 0xC6C1C1 8

The depth of gnuplot's evaluation stack is capped at 250 to prevent runaway recursion. To increase it you would have to edit the source and recompile the program. If you really want to do that, the relevant definition is here:
[gnuplot-5.2.8/src] grep -n -A 3 -B 3 STACK_DEPTH eval.h
44-
45-#include <stdio.h> /* for FILE* */
46-
47:#define STACK_DEPTH 250 /* maximum size of the execution stack */
48-#define MAX_AT_LEN 150 /* max number of entries in action table */
49-
50-/* These are used by add_action() to index the subroutine list ft[] in eval.c */
I have not looked at your recursion algorithm very closely, but I would think it possible to re-order the evaluation so that the subtree information is computed bottom-up rather than top-down. In that direction it may become purely an iteration rather than a recursive descent.
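As a rough illustration of the bottom-up idea, here is a minimal Python sketch (not gnuplot, and not the original poster's code). It computes the same quantities as LeafList, XPos and Level with an explicit stack and one reverse pass, so the depth of the tree never runs into a recursion limit. The function name layout and the use of None for the root's parent are assumptions made for this example.

```python
from collections import defaultdict


def layout(ids, parents):
    """Return {id: (x, level)} for a tree given as parallel ID/parent lists.

    The root is the node whose parent is None (hypothetical convention
    for this sketch; the question's data marks it with "NaN").
    """
    children = defaultdict(list)
    root = None
    for node, parent in zip(ids, parents):
        if parent is None:
            root = node
        else:
            children[parent].append(node)

    # Pass 1: preorder walk with an explicit stack. Assigns levels
    # (0 for the root, -1 for its children, ...) and numbers the
    # leaves 1, 2, 3, ... from left to right.
    level = {root: 0}
    x = {}
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        if not children[node]:
            x[node] = len(x) + 1
        for child in reversed(children[node]):
            level[child] = level[node] - 1
            stack.append(child)

    # Pass 2: in reversed preorder every child is visited before its
    # parent, so each inner node can average its children's x values.
    for node in reversed(order):
        if children[node]:
            kids = children[node]
            x[node] = sum(x[c] for c in kids) / len(kids)

    return {node: (x[node], level[node]) for node in ids}


positions = layout([1, 2, 3, 4, 5], [None, 1, 1, 3, 3])
print(positions)
```

Because both passes are plain loops, a chain of several thousand nested nodes is handled the same way as a shallow tree. The resulting coordinates could then be written out for plotting.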
On the other hand you also say that larger trees don't fit into a single plot. So another approach may be to split the tree at a depth that both fits on the page and doesn't exceed the stack depth. Then you restart the process over again for each node that was truncated, and mark that node with an arrow or annotation or other indication like "subtree continued in figure 1b". Here I have hand-mangled your large figure to show the idea

Related

Gnuplot fit error - singular matrix in Givens()

So I want to fit a function with a dataset using gnuplot. In the file "cn20x2012", at the lines [1:300] I have this data:
1 -7.576723949519277e-06
2 4.738414366971162e-05
3 2.5908117324519247e-05
4 7.233786749999952e-06
5 4.94720225240387e-06
6 -1.857620375000113e-06
7 5.697280584855734e-06
8 -1.867760712716345e-05
9 6.64096591257211e-05
10 2.756199717307687e-05
11 4.7755705550480866e-05
12 6.590865376225963e-05
13 4.1522206877403805e-05
14 3.145294946394234e-05
15 5.9346948090625035e-05
16 5.405458204471163e-05
17 0.0001484469089218749
18 0.00011236895265264405
19 0.00010798644697620197
20 8.656723035552881e-05
21 0.00019917737876442313
22 0.00022625750686778835
23 0.00023183354141658626
24 0.0003373178915148073
25 0.00032313619574999994
26 0.0003451188893915866
27 0.0003303809005983172
28 0.0003534148565745192
29 0.00039690566743750015
30 0.0004182810016802884
31 0.00045198626877403865
32 0.00047311462195192373
33 0.0004962054400408655
34 0.0004969566757524037
35 0.0005561838221274039
36 0.0005353567324539659
37 0.00052834133201923
38 0.0005980226227637016
39 0.0005446277144831731
40 0.0005960780049278846
41 0.0006076488594567314
42 0.000710219997610289
43 0.0006714079307259616
44 0.0006990041531870184
45 0.000694646402266827
46 0.0006910307645889419
47 0.0007918124250492787
48 0.0007699669760728367
49 0.0007850042712259613
50 0.0007735240355776444
51 0.0008333605652980768
52 0.0007914544977620185
53 0.0008254284036610573
54 0.0008578590784536057
55 0.0008597165395913466
56 0.0009350752655120189
57 0.0009355867078822116
58 0.0009413161534519229
59 0.001003045837043269
60 0.0009530084342740383
61 0.000981287851927885
62 0.000986143934318509
63 0.00096895140692548
64 0.0010671633388319713
65 0.0010884129846995196
66 0.0010974424039567304
67 0.0011198829067163459
68 0.0010649422789374995
69 0.0010909547135769227
70 0.0010858300892451934
71 0.00114890178018774
72 0.0011503018930817308
73 0.0012209814370937495
74 0.001264080502711538
75 0.0012453762294132222
76 0.0012725116258625
77 0.0012649334953990384
78 0.0012195748153341352
79 0.0013151443892213466
80 0.0013003322635283651
81 0.0013099768888799042
82 0.0013227992394807694
83 0.0013325137669168274
84 0.001356943212587259
85 0.0014541924819278852
86 0.0014094004314177883
87 0.0014273633669975969
88 0.0014393176087403859
89 0.0014372794673365393
90 0.0015051545220959143
91 0.0015432813234807683
92 0.0015832276965293275
93 0.001540622433288461
94 0.0016007491118125
95 0.0016195978358533654
96 0.0016447077023067317
97 0.0016350138695504803
98 0.0017352804136629807
99 0.001731106189370192
100 0.0017407015898704323
101 0.0017367582300937506
102 0.0018164239404875008
103 0.0017829769448653838
104 0.0018303930988165871
105 0.0017893320000211548
106 0.0018727349292259614
107 0.0018745909637668267
108 0.0018425366172147846
109 0.0019053739892581727
110 0.0018849885474855762
111 0.0018689524590103368
112 0.0019431807910961535
113 0.001951890517350962
114 0.0019308973497776446
115 0.0019990349471177894
116 0.002009245176572116
117 0.0020004240575882213
118 0.002020795320423557
119 0.0020148423748725963
120 0.002070277553975961
121 0.002112121992170673
122 0.002081609846093749
123 0.0020899822853341346
124 0.002214996736841347
125 0.002210968677028846
126 0.002204230691923077
127 0.0022059340675168264
128 0.002244672249610577
129 0.002243725570633895
130 0.002198417606970913
131 0.002326686848007212
132 0.002298981945014423
133 0.002412905193465384
134 0.0023317473012668287
135 0.0023255737818221145
136 0.0024042900543605767
137 0.0023814333208341345
138 0.002414946342495192
139 0.002451134140336538
140 0.002435468088014424
141 0.002541540709086779
142 0.0024759180712812523
143 0.002562872725209133
144 0.002554363054353367
145 0.002525350243064904
146 0.0026228594448966342
147 0.002640361090600963
148 0.0026968734518557683
149 0.002687729582449518
150 0.0026799173813848555
151 0.002751626483175481
152 0.0026916526068317286
153 0.002682602742860577
154 0.0027658840884567304
155 0.0028385319315024035
156 0.002733288245524039
157 0.002805041072350961
158 0.002798724552451201
159 0.00284738398885577
160 0.002833892571264423
161 0.0028506943730673084
162 0.0028578405825413463
163 0.0028141271324870197
164 0.0029047532288887
165 0.002916689246838943
166 0.003006111659274039
167 0.0030388357088942325
168 0.0030117903270181707
169 0.003023639132084136
170 0.0030182642660336535
171 0.0029788478969250015
172 0.003086049268993511
173 0.0030530940010240377
174 0.00309287048297596
175 0.0030892688902187473
176 0.0032070964353437493
177 0.0031308958387163454
178 0.003262165689711538
179 0.0032348496648947093
180 0.003334092027257212
181 0.0032702121678230764
182 0.0032887867663149036
183 0.00333782536743269
184 0.0033132179587812513
185 0.003400563164048078
186 0.003322215536028365
187 0.0033691419445264436
188 0.00340692471343654
189 0.003370118822997599
190 0.003414042435545674
191 0.003460621729710913
192 0.003487680921019232
193 0.0034814484875360595
194 0.003528280852358173
195 0.0035260558732403864
196 0.0035947047098653846
197 0.003583761358336538
198 0.003589446784643749
199 0.0035488957604610572
200 0.0036106514596322115
201 0.003633161542855769
202 0.003596668943564904
203 0.003621647520017789
204 0.0037260161142259616
205 0.0036873544761057684
206 0.003693311409786057
207 0.0037485618958747594
208 0.0037277801700697126
209 0.003731768419286058
210 0.0037200943660144225
211 0.0037368698886754786
212 0.0038266932486634626
213 0.003786905602120193
214 0.0038484308669038464
215 0.003837662506102065
216 0.003877989966946875
217 0.0038711451977908673
218 0.0039796825709810125
219 0.003955763375971154
220 0.003983664920576924
221 0.004019112007471154
222 0.003996646585913461
223 0.004061509550884613
224 0.004015245551199519
225 0.004009779120920672
226 0.004148229009661058
227 0.0040645974335312505
228 0.0041522345293678545
229 0.004216267765944711
230 0.004191517977733654
231 0.004280319721466346
232 0.004210795761447114
233 0.004258393462563462
234 0.004267925011272355
235 0.00427713419340625
236 0.004323331966394231
237 0.004361159201735935
238 0.004351708975694715
239 0.004359997178644953
240 0.00437384325853894
241 0.004375188742463941
242 0.004424559629495192
243 0.004461955226487498
244 0.004489655863850963
245 0.0045503420149230756
246 0.0045185560829999975
247 0.004506067166336778
248 0.004585396025798076
249 0.004530840472406252
250 0.0045934151490120215
251 0.004602146584228363
252 0.004643262102497593
253 0.004707265035608172
254 0.004766505116052884
255 0.004744165929896635
256 0.0047756718030625015
257 0.004802170611427885
258 0.004896239463478368
259 0.0048845448341901425
260 0.004845213594302884
261 0.004915008781204327
262 0.004838528640802884
263 0.0048121374747617796
264 0.004895357859576925
265 0.0048793476575266816
266 0.004958465852682693
267 0.005007965180538941
268 0.0049839032653341345
269 0.005068383734646637
270 0.00498556504900495
271 0.005014623260019232
272 0.005066327855785335
273 0.0050290740743365375
274 0.005152934708140861
275 0.005174238921781968
276 0.005123581464772355
277 0.005155969777822114
278 0.005169396608004327
279 0.00516497090489663
280 0.005145110646115385
281 0.005209611399110575
282 0.005163211771749997
283 0.005181044847507209
284 0.005281641245183894
285 0.005323840847189907
286 0.005230924322329326
287 0.005256136984014422
288 0.005374876757439424
289 0.0053137727444009615
290 0.005468482116127402
291 0.005453857539401205
292 0.005417081656274039
293 0.005393994523838937
294 0.005506909240446873
295 0.005449365350307692
296 0.005551215606367787
297 0.005505932791992786
298 0.0055918512302572145
299 0.005663100163579326
300 0.0056382443690432705
When I do
f(x) = a/b*(1-exp(-b*x))
fit[1:300] f(x) "cn20x2012" using 1:2 via a,b
The curve fits perfectly. But when I try to fit the curve with
a/b*(1-exp(-b*x/(3e-26)))
I get the error message "singular matrix in Givens()". Note that I've only added a constant to the exponential part of the function.
What can I do to fit the function with the constant 3e-26?
I'm using gnuplot 5.2 patchlevel 8 on linux

Adding that constant makes the values of exp(-b*x/(3e-26)) so close to zero that the term (1-exp(-b*x/(3e-26))) differs from 1 by less than the precision available in IEEE double-precision floating point. So you are essentially fitting the function g(x) = a/b, which is a very poor fit to your data. With only the ratio a/b left in the model, a and b cannot be determined independently, which is consistent with the singular-matrix error.
Since you already have a good fit using your original function f(x), perhaps you can explain what your goal is to change the function to something else? What question are you trying to answer?
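The loss of precision described above is easy to check numerically. The short Python sketch below evaluates the modified model at a few x values. The parameter values a=1e-5 and b=1e-3 are purely illustrative assumptions, not fitted results.

```python
import math


def model(x, a, b, scale=3e-26):
    """The modified fit function a/b*(1-exp(-b*x/scale))."""
    return a / b * (1 - math.exp(-b * x / scale))


# Illustrative parameters only; any b that is not itself of order 1e-26
# gives the same picture.
a, b = 1e-5, 1e-3

print(math.exp(-b * 1 / 3e-26))   # the exponential underflows to 0.0
values = [model(x, a, b) for x in (1, 100, 300)]
print(values)                     # the same number three times: a/b
```

Since the model returns the same value for every x in the data range, the fit has no way to tell a change in a from a change in b.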

pandas read_csv not reading entire file

I have a really strange problem and don't know how to solve it.
I am using Ubuntu 18.04.2 together with Python 3.7.3 64-bit and use VScode as an editor.
I am reading data from a database and writing it to a CSV file with csv.writer:
import pandas as pd
import csv

with open(raw_path + station + ".csv", "w+") as f:
    file = csv.writer(f)
    # Write header into csv
    colnames = [par for par in param]
    file.writerow(colnames)
    # Write data into csv
    for row in data:
        file.writerow(row)
This works perfectly fine, it provides a .csv file with all the data I read from the database up to the current timestep. However in a later working step I have to read this data to a pandas dataframe and merge it with another pandas dataframe. I read the files like this:
data1 = pd.read_csv(raw_path + file1, sep=',')
data2 = pd.read_csv(raw_path + file2, sep=',')
And then merge the data like this:
comb_data = pd.merge(data1, data2, on="datumsec", how="left").fillna(value=-999)
For 5 out of 6 locations that I do this, everything works perfectly fine: the combined dataset has the same length as the two separate ones. However, for one location pd.read_csv does not seem to read the CSV files properly. I checked whether the problem is already in the database readout, but everything is OK there. I can open both files with Sublime and they have the same length, but when I read them with pandas.read_csv one shows fewer lines. The best part is that this problem appears totally at random. Sometimes it works and reads the entire file, sometimes not. And it occurs at different locations in the file: sometimes it stops after approx. 20000 entries, sometimes at 45000, sometimes somewhere else.
Here is an overview of my test output when I print the lengths of the files:
print(len(data1)): 57105
print(len(data2)): 57105
Both values are taken directly after the readout from the database, before writing anything anywhere.
After saving the data as CSV as described above and opening it in Excel or Sublime, I can confirm that the data contains 57105 rows. Everything is where it is supposed to be.
However, if I try to read the data with pd.read_csv:
print(len(data1)): 48612
print(len(data2)): 57105
Both values are taken after reading the data back in from the CSV file.
data1 48612
datumsec tl rf ff dd ffx
0 1538352000 46 81 75 288 89
1 1538352600 47 79 78 284 93
2 1538353200 45 82 79 282 93
3 1538353800 44 84 71 284 91
4 1538354400 43 86 77 288 96
5 1538355000 43 85 78 289 91
6 1538355600 46 80 79 286 84
7 1538356200 51 72 68 285 83
8 1538356800 52 71 68 281 73
9 1538357400 48 75 68 276 80
10 1538358000 45 78 62 271 76
11 1538358600 42 82 66 273 76
12 1538359200 43 81 70 274 78
13 1538359800 44 80 68 275 78
14 1538360400 45 78 66 279 72
15 1538361000 45 78 67 282 73
16 1538361600 43 79 63 275 71
17 1538362200 43 81 69 280 74
18 1538362800 42 80 70 281 76
19 1538363400 43 78 69 285 77
20 1538364000 43 78 71 285 77
21 1538364600 44 75 61 288 71
22 1538365200 45 73 56 290 62
23 1538365800 45 72 44 297 57
24 1538366400 44 73 51 286 57
25 1538367000 43 76 61 281 70
26 1538367600 40 79 66 284 73
27 1538368200 39 78 70 291 76
28 1538368800 38 80 71 287 81
29 1538369400 36 81 74 285 81
... ... .. ... .. ... ...
48582 1567738800 7 100 0 210 0
48583 1567739400 6 100 0 210 0
48584 1567740000 5 100 0 210 0
48585 1567740600 6 100 0 210 0
48586 1567741200 4 100 0 210 0
48587 1567741800 4 100 0 210 0
48588 1567742400 5 100 0 210 0
48589 1567743000 4 100 0 210 0
48590 1567743600 4 100 0 210 0
48591 1567744200 4 100 0 209 0
48592 1567744800 4 100 0 209 0
48593 1567745400 5 100 0 210 0
48594 1567746000 6 100 0 210 0
48595 1567746600 5 100 0 210 0
48596 1567747200 5 100 0 210 0
48597 1567747800 5 100 0 210 0
48598 1567748400 5 100 0 210 0
48599 1567749000 6 100 0 210 0
48600 1567749600 6 100 0 210 0
48601 1567750200 5 100 0 210 0
48602 1567750800 4 100 0 210 0
48603 1567751400 5 100 0 210 0
48604 1567752000 6 100 0 210 0
48605 1567752600 7 100 0 210 0
48606 1567753200 6 100 0 210 0
48607 1567753800 5 100 0 210 0
48608 1567754400 6 100 0 210 0
48609 1567755000 7 100 0 210 0
48610 1567755600 7 100 0 210 0
48611 1567756200 7 100 0 210 0
[48612 rows x 6 columns]
datumsec tl rf schnee ival6
0 1538352000 115 61 25 107
1 1538352600 115 61 25 107
2 1538353200 115 61 25 107
3 1538353800 115 61 25 107
4 1538354400 115 61 25 107
5 1538355000 115 61 25 107
6 1538355600 115 61 25 107
7 1538356200 115 61 25 107
8 1538356800 115 61 25 107
9 1538357400 115 61 25 107
10 1538358000 115 61 25 107
11 1538358600 115 61 25 107
12 1538359200 115 61 25 107
13 1538359800 115 61 25 107
14 1538360400 115 61 25 107
15 1538361000 115 61 25 107
16 1538361600 115 61 25 107
17 1538362200 115 61 25 107
18 1538362800 115 61 25 107
19 1538363400 115 61 25 107
20 1538364000 115 61 25 107
21 1538364600 115 61 25 107
22 1538365200 115 61 25 107
23 1538365800 115 61 25 107
24 1538366400 115 61 25 107
25 1538367000 115 61 25 107
26 1538367600 115 61 25 107
27 1538368200 115 61 25 107
28 1538368800 115 61 25 107
29 1538369400 115 61 25 107
... ... ... ... ... ...
57075 1572947400 -23 100 -2 -999
57076 1572948000 -23 100 -2 -999
57077 1572948600 -22 100 -2 -999
57078 1572949200 -23 100 -2 -999
57079 1572949800 -24 100 -2 -999
57080 1572950400 -23 100 -2 -999
57081 1572951000 -21 100 -1 -999
57082 1572951600 -21 100 -1 -999
57083 1572952200 -23 100 -1 -999
57084 1572952800 -23 100 -1 -999
57085 1572953400 -22 100 -1 -999
57086 1572954000 -23 100 -1 -999
57087 1572954600 -22 100 -1 -999
57088 1572955200 -24 100 0 -999
57089 1572955800 -24 100 0 -999
57090 1572956400 -25 100 0 -999
57091 1572957000 -26 100 -1 -999
57092 1572957600 -26 100 -1 -999
57093 1572958200 -27 100 -1 -999
57094 1572958800 -25 100 -1 -999
57095 1572959400 -27 100 -1 -999
57096 1572960000 -29 100 -1 -999
57097 1572960600 -28 100 -1 -999
57098 1572961200 -28 100 -1 -999
57099 1572961800 -27 100 -1 -999
57100 1572962400 -29 100 -2 -999
57101 1572963000 -29 100 -2 -999
57102 1572963600 -29 100 -2 -999
57103 1572964200 -30 100 -2 -999
57104 1572964800 -28 100 -2 -999
[57105 rows x 5 columns]
To me there is no obvious reason in the data why it should have problems reading the entire file and obviously there are none, considering that sometimes it reads the entire file and sometimes not.
I am really clueless about this. Do you have any idea how to cope with that and what could be the problem?

I finally solved my problem and, as expected, it was not within the file itself. I am using multiprocessing to run the named functions and some other things in parallel. Reading from the database and writing to the CSV file happen in one process, and reading from the CSV file happens in another. The second process (reading from CSV) therefore did not know that the CSV file was still being written, and read only what was already available in the file. No exception was thrown when it opened the file, even though the file was still open in a different process.
I thought I already took care of this but obviously not thoroughly enough, excluding every possible case.
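The answer does not say how the race was finally fixed. One common way to keep a reader from seeing a half-written file is to write to a temporary file and rename it into place only when it is complete. The sketch below shows that pattern. The helper name write_csv_atomically is hypothetical, and the approach is an assumption about what would fit here, not the poster's actual fix.

```python
import csv
import os
import tempfile


def write_csv_atomically(path, header, rows):
    """Write a CSV to a temp file in the target directory, then rename it.

    os.replace swaps the file in one step (atomic on POSIX when both
    paths are on the same filesystem), so a concurrent reader sees either
    the old complete file or the new complete file, never a partial one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if writing or renaming failed.
        os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    target = os.path.join(out_dir, "station.csv")
    write_csv_atomically(target, ["datumsec", "tl"], [[1538352000, 46]])
    print(open(target).read())
```

The reading process can then call pd.read_csv on the final path as before. If the reader must also wait until the file exists at all, the two processes still need some explicit signal, such as joining the writer process first.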

I had exactly the same problem with a different application and also did not understand what was wrong, because sometimes it worked and sometimes it didn't.
In a for loop, I was extracting the last two rows of a dataframe that I was creating in the same file. Sometimes the extracted rows were not the last two at all, but most of the time it worked fine. I guess the program started extracting the last two rows before the writing process was done.
I paused the script for half a second to make sure the writing process is done:
import time
time.sleep(0.5)
However, I don't think this is a very elegant solution, since it might not be sufficient if, for instance, somebody with a slower computer uses the script.
Vroni, how did you solve this in the end? Is there a way to specify that a particular process must not run in parallel with other tasks? I did not define anything about parallel processing in my program, so if this is the cause, I think it is done automatically.

group-by values obtained from splitting indexes

I need to find the max of two columns (p_1_logreg, p_2_logreg) where the comparison should be limited only to 14 rows.
My csv file
I tried to slice my index into:
int1_str1_str2_int2_str3_int4
The max should be found between rows where int1, str1, str2 int2 and str3 are fixed, and only the int4 would change (from index 0 to index 13, and so on).
I tried to fix each element at a time and use groupby, but I couldn't iterate over int4 value only.
Here is the code to find the max for column p_1_label, but the result is not what I am looking for.
max_1_row=raw_prob.loc[raw_prob.groupby(raw_prob['id'].str.split('_').str[1])['p_1_'+label].idxmax()]
max_1_row=max_1_row.loc[raw_prob.groupby(raw_prob['id'].str.split('_').str[3])['p_1_'+label].idxmax()]
max_1_row=max_1_row.loc[raw_prob.groupby(raw_prob['id'].str.split('_').str[5])['p_1_'+label].idxmax()]
Any ideas?

I think you need DataFrameGroupBy.idxmax, grouping by the id with the trailing _<number> replaced by an empty string, and then selecting the rows with loc:
df = pd.read_csv('myProb.csv', index_col=[0])
idx = df.drop(columns='id').groupby(df['id'].str.replace(r'_\d+$', '', regex=True)).idxmax()
print (idx.head(15))
p_0_logreg p_1_logreg p_2_logreg
id
6_PanaCleanerJune_sub_12_ICA 2 9 6
6_PanaCleanerJune_sub_13_ICA 17 19 23
6_PanaCleanerJune_sub_14_ICA 34 37 33
6_PanaCleanerJune_sub_15_ICA 52 51 43
6_PanaCleanerJune_sub_17_ICA 66 67 69
6_PanaCleanerJune_sub_18_ICA 82 79 76
6_PanaCleanerJune_sub_19_ICA 89 87 90
6_PanaCleanerJune_sub_20_ICA 98 103 104
6_PanaCleanerJune_sub_21_ICA 114 117 112
6_PanaCleanerJune_sub_22_ICA 129 133 127
6_PanaCleanerJune_sub_23_ICA 145 146 143
6_PanaCleanerJune_sub_24_ICA 155 166 161
6_PanaCleanerJune_sub_25_ICA 176 173 174
6_PanaCleanerJune_sub_26_ICA 186 191 189
6_PanaCleanerJune_sub_27_ICA 202 203 209
df1 = df.loc[idx['p_1_logreg']]
print (df1.head(15))
id p_0_logreg p_1_logreg p_2_logreg
9 6_PanaCleanerJune_sub_12_ICA_10 0.013452 0.985195 0.001353
19 6_PanaCleanerJune_sub_13_ICA_6 0.051184 0.948816 0.000000
37 6_PanaCleanerJune_sub_14_ICA_10 0.013758 0.979351 0.006890
51 6_PanaCleanerJune_sub_15_ICA_10 0.076056 0.923944 0.000000
67 6_PanaCleanerJune_sub_17_ICA_12 0.051060 0.947660 0.001280
79 6_PanaCleanerJune_sub_18_ICA_10 0.051184 0.948816 0.000000
87 6_PanaCleanerJune_sub_19_ICA_4 0.078162 0.917751 0.004087
103 6_PanaCleanerJune_sub_20_ICA_6 0.076400 0.921263 0.002337
117 6_PanaCleanerJune_sub_21_ICA_6 0.155002 0.791753 0.053245
133 6_PanaCleanerJune_sub_22_ICA_8 0.000000 0.998623 0.001377
146 6_PanaCleanerJune_sub_23_ICA_7 0.017549 0.973995 0.008457
166 6_PanaCleanerJune_sub_24_ICA_13 0.025215 0.974785 0.000000
173 6_PanaCleanerJune_sub_25_ICA_6 0.025656 0.960220 0.014124
191 6_PanaCleanerJune_sub_26_ICA_10 0.098872 0.895526 0.005602
203 6_PanaCleanerJune_sub_27_ICA_8 0.066493 0.932470 0.001037
df2 = df.loc[idx['p_2_logreg']]
print (df2.head(15))
id p_0_logreg p_1_logreg p_2_logreg
6 6_PanaCleanerJune_sub_12_ICA_7 0.000000 0.000351 0.999649
23 6_PanaCleanerJune_sub_13_ICA_10 0.000000 0.000351 0.999649
33 6_PanaCleanerJune_sub_14_ICA_6 0.080748 0.000352 0.918900
43 6_PanaCleanerJune_sub_15_ICA_2 0.017643 0.000360 0.981996
69 6_PanaCleanerJune_sub_17_ICA_14 0.882449 0.000290 0.117261
76 6_PanaCleanerJune_sub_18_ICA_7 0.010929 0.000360 0.988711
90 6_PanaCleanerJune_sub_19_ICA_7 0.010929 0.000351 0.988720
104 6_PanaCleanerJune_sub_20_ICA_7 0.006714 0.000360 0.992925
112 6_PanaCleanerJune_sub_21_ICA_1 0.869393 0.000339 0.130269
127 6_PanaCleanerJune_sub_22_ICA_2 0.000000 0.000351 0.999649
143 6_PanaCleanerJune_sub_23_ICA_4 0.017218 0.000360 0.982421
161 6_PanaCleanerJune_sub_24_ICA_8 0.369685 0.000712 0.629603
174 6_PanaCleanerJune_sub_25_ICA_7 0.307056 0.000496 0.692448
189 6_PanaCleanerJune_sub_26_ICA_8 0.850195 0.000368 0.149437
209 6_PanaCleanerJune_sub_27_ICA_14 0.000000 0.000351 0.999649
Detail:
print (df['id'].str.replace(r'_\d+$', '', regex=True).head(15))
0 6_PanaCleanerJune_sub_12_ICA
1 6_PanaCleanerJune_sub_12_ICA
2 6_PanaCleanerJune_sub_12_ICA
3 6_PanaCleanerJune_sub_12_ICA
4 6_PanaCleanerJune_sub_12_ICA
5 6_PanaCleanerJune_sub_12_ICA
6 6_PanaCleanerJune_sub_12_ICA
7 6_PanaCleanerJune_sub_12_ICA
8 6_PanaCleanerJune_sub_12_ICA
9 6_PanaCleanerJune_sub_12_ICA
10 6_PanaCleanerJune_sub_12_ICA
11 6_PanaCleanerJune_sub_12_ICA
12 6_PanaCleanerJune_sub_12_ICA
13 6_PanaCleanerJune_sub_12_ICA
14 6_PanaCleanerJune_sub_13_ICA
Name: id, dtype: object
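To make the grouping key easier to follow, here is the same idea in plain Python without pandas: strip the trailing _<number> from each id, then keep the row with the largest p_1_logreg per stripped key. The four rows are copied from the output above. The names group_key and best_p1 are made up for this sketch.

```python
import re

# (id, p_1_logreg, p_2_logreg), taken from the printed output above
rows = [
    ("6_PanaCleanerJune_sub_12_ICA_10", 0.985195, 0.001353),
    ("6_PanaCleanerJune_sub_12_ICA_7", 0.000351, 0.999649),
    ("6_PanaCleanerJune_sub_13_ICA_6", 0.948816, 0.000000),
    ("6_PanaCleanerJune_sub_13_ICA_10", 0.000351, 0.999649),
]


def group_key(row_id):
    """Drop the final _<digits> so all rows of one block share a key."""
    return re.sub(r"_\d+$", "", row_id)


# For each key, remember the row with the largest p_1_logreg.
best_p1 = {}
for row in rows:
    key = group_key(row[0])
    if key not in best_p1 or row[1] > best_p1[key][1]:
        best_p1[key] = row

for key, row in best_p1.items():
    print(key, "->", row[0], row[1])
```

The pandas version does the same thing in one step: the stripped id is the groupby key and idxmax returns the index of the winning row in each group.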

Gnuplot log plot y-axis

I have a .txt file with some values as a function of iteration count, and I am trying to plot it on a log scale. I have managed to do this with the following code:
plot 'solchange.txt' using 1:(log($2)) with lines
The x axis is perfect and the shape is perfect, but my y axis is weird. I want it to show 10^-2, 10^-3 and so on. How can this be done?
What does -16 even mean? My values stop at 10^-7.
solchange.txt
1 0.20870164249629861
2 3.0540936828943599E-002
3 2.1622388854567132E-002
4 1.7070994407582529E-002
5 1.4155375579083168E-002
6 1.2069370098131457E-002
7 1.0482626276465484E-002
8 9.2258609277672127E-003
9 8.2010529631910967E-003
10 7.3466561929682317E-003
11 6.6216556909214075E-003
12 5.9973025525987822E-003
13 5.4526144028317000E-003
14 4.9718850942694140E-003
15 4.5432279303643033E-003
16 4.1576291151408026E-003
17 3.8082604242292567E-003
18 3.4899438987894341E-003
19 3.1987266873885617E-003
20 2.9315478643644408E-003
21 2.6859845807917955E-003
22 2.4600648490906499E-003
23 2.2521338021345080E-003
24 2.0607609516045851E-003
25 1.8846776035151558E-003
26 1.7227356558349102E-003
27 1.5738810584753488E-003
28 1.4371370123238449E-003
29 1.3115934299711522E-003
30 1.1964002798033561E-003
31 1.0907632352794312E-003
32 9.9394061400687704E-004
33 9.0524097544450455E-004
34 8.2402100116123200E-004
35 7.4968344624489966E-004
36 6.8167505353953529E-004
37 6.1948438470904935E-004
38 5.6263955830880997E-004
39 5.1070590513836635E-004
40 4.6328356186664012E-004
41 4.2000502958110253E-004
42 3.8053272709547871E-004
43 3.4455657088288370E-004
44 3.1179161480603731E-004
45 2.8197578322736866E-004
46 2.5486773011298286E-004
47 2.3024485386036100E-004
48 2.0790149243582034E-004
49 1.8764731601648640E-004
50 1.6930592515617369E-004
51 1.5271365239171797E-004
52 1.3771855529436799E-004
53 1.2417958039480804E-004
54 1.1196587112561468E-004
55 1.0095618950593192E-004
56 9.1038420860557225E-005
57 8.2109133101636398E-005
58 7.4073166362883126E-005
59 6.6843234255131376E-005
60 6.0339523889261799E-005
61 5.4489287395236962E-005
62 4.9226422450250433E-005
63 4.4491043040285547E-005
64 4.0229044229763214E-005
65 3.6391666180476002E-005
66 3.2935063208632334E-005
67 2.9819883516037614E-005
68 2.7010864597994382E-005
69 2.4476448415957416E-005
70 2.2188419389643915E-005
71 2.0121567231942521E-005
72 1.8253375701611941E-005
73 1.6563737530794070E-005
74 1.5034695117064744E-005
75 1.3650206056065907E-005
76 1.2395932219765905E-005
77 1.1259050839256858E-005
78 1.0228085910393529E-005
79 9.2927581834059692E-006
80 8.4438520049430650E-006
81 7.6730973352498108E-006
82 6.9730653504927742E-006
83 6.3370761471119759E-006
84 5.7591171842940984E-006
85 5.2337712233988323E-006
86 4.7561526453150822E-006
87 4.3218511441380815E-006
88 3.9268819066553586E-006
89 3.5676414890935086E-006
90 3.2408686962536598E-006
91 2.9436098531714777E-006
92 2.6731879338693330E-006
93 2.4271750801384794E-006
94 2.2033681015580162E-006
95 1.9997666008135691E-006
96 1.8145534130165238E-006
97 1.6460770886038423E-006
98 1.4928361827763054E-006
99 1.3534651455843205E-006
100 1.2267216317977253E-006
101 1.1114750729309016E-006
102 1.0066963732594143E-006
103 9.1144860808961906E-007
104 8.2487861805891419E-007
105 7.4620940497668528E-007
106 6.7473324671810456E-007
107 6.0980545750098300E-007
108 5.5083872884395882E-007
109 4.9729799312444450E-007
110 4.4869575892107771E-007
111 4.0458787177585429E-007
112 3.6456965996078949E-007
113 3.2827242845258689E-007
114 2.9536026832557155E-007
115 2.6552715221570336E-007
116 2.3849428917172011E-007
117 2.1400771520012352E-007
118 1.9183609798172768E-007
119 1.7176873612963911E-007
120 1.5361373552607444E-007
121 1.3719634701231387E-007
122 1.2235745044898369E-007
123 1.0895217281928909E-007

log(x) is natural log. You need to use log10(x) if you want base 10.
Another, probably better way would be to use a logarithmic y axis like so:
set format y '%g'
set logscale y
plot 'solchange.txt' using 1:2 with lines
Use help set format to figure out how to change the y-axis tics.
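The -16 in the question follows directly from the natural logarithm. A quick check in Python (used here only as a calculator) on the last value in solchange.txt shows the difference between the two bases:

```python
import math

last = 1.0895217281928909e-07  # final value in solchange.txt

natural = math.log(last)    # natural log, about -16.03
base10 = math.log10(last)   # base-10 log, about -6.96

print(natural, base10)
```

So the y axis was showing ln(y), which reaches about -16 for values near 10^-7, while log10 gives the exponents the asker expected.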

Have one query regarding sum if formula

I am working in Excel with the SUMIF formula. My data is as follows:
Region Opr Qty Cost Combo(col B&A)
192 114 50 500 104192
192 104 453 548 104192
192 114 125 54654 114192
192 114 155 1545 114192
192 124 12 1553 124192
192 134 12222 1554545 134192
192 174 256 15478 174192
192 104 12 1555 104192
192 104 210 1156 104192
192 114 47 448953 114192
192 114 29 59479 114192
192 124 124 32451 124192
192 134 114 290240 134192
4192 10 210 115656 104192
4192 10 47 44896 104192
4192 11 29 12866 114192
4192 11 549 290240 114192
4192 12 124 59480 124192
4192 13 114 61343 134192
4192 17 310 45339 174192
4192 10 56 32451 104192
4192 10 103 82483 104192
4192 11 685 111380 114192
4192 11 646 201858 114192
4192 12 26 6489 124192
4192 13 87 44543 134192
As you can see, the last column gives the same combined value for rows whose operator and region are not the same. I want to do a SUMIF against Region, but it returns wrong values.

You can try SUMPRODUCT:
=SUMPRODUCT(((B2:B27&A2:A27)*1<>E2:E27)*1)
Each row where the concatenation of column B and column A is not equal to the Combo counts as 1, and SUMPRODUCT then adds all the 1s together.
Change the range accordingly.
The *1 converts any text to a number.
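For readers who find the array formula hard to parse, the same check can be written out in Python. This is only a sketch of the logic, using five rows copied from the table in the question. The variable names are made up for the example.

```python
# (Region, Opr, Combo) rows copied from the question's table
rows = [
    (192, 114, 104192),
    (192, 104, 104192),
    (192, 114, 114192),
    (4192, 10, 104192),
    (4192, 11, 114192),
]

# Equivalent of ((B&A)*1<>E)*1 summed over the range: concatenate
# Opr and Region as text, convert to a number, compare with Combo.
mismatches = sum(int(f"{opr}{region}") != combo for region, opr, combo in rows)
print(mismatches)

# Different (Region, Opr) pairs can produce the same Combo value.
pairs_by_combo = {}
for region, opr, combo in rows:
    pairs_by_combo.setdefault(int(f"{opr}{region}"), set()).add((region, opr))
print(pairs_by_combo[104192])
```

The second part shows why the Combo column is ambiguous as a key: 104 joined with 192 and 10 joined with 4192 both give 104192.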
