How do I compare values in two dataframes in an efficient way - python-3.x
I am new to Python, pandas and Stack Overflow, so I will appreciate any help. I have two pandas DataFrames. The first one is in ascending order (values from 0 to 100 in steps of 0.1). The second one has 26,000 values from 2.3 to 38.5 in no particular order, and some of its values are repeated. For each value in the first DataFrame, I want to find, efficiently, how many values in the second DataFrame are less than or equal to that value.
My code below does it in 45 seconds, but I'd like it to be done in around 10.
Thanks in advance:
Code:
import numpy as np

def get_CDF2(df1, df2):
    x = df1  # The first dataframe is already sorted in ascending order
    y = np.sort(df2, axis=0)  # Sort the columns of the second dataframe in ascending order
    df_res = []  # keep the results here
    yi = iter(y)  # Use an iterator to move over y
    yindex = 0
    flag = 0  # Flag; when set to 1 no comparison is done
    y_val = next(yi)
    for value in x:
        if flag >= 1:
            df_res.append(largest_ind)  # append the number of y values <= value
        else:
            # Search through y to find the index of an item bigger than value
            while y_val <= value and yindex < len(y) - 1:
                y_val = next(yi)  # Point at the next value in df2
                yindex += 1  # Keep track of how many y values are <= value
            # If, for some value in df1, we iterate through the entire df2 and all of
            # its values are smaller, the rest of the values in df1 will behave the same
            # since df1 is in ascending order, so there is no need to iterate again:
            # just set flag to 1.
            if yindex == len(y) - 1 and y_val <= float(value):
                flag = 1
                largest_ind = yindex + 1
                df_res.append(largest_ind)  # append the number of y values <= value
            else:
                df_res.append(yindex)  # append the number of y values <= value
    return df_res
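The loop in get_CDF2 is a two-pointer merge: once the second sequence is sorted, both sequences are walked only once. A compact pure-Python sketch of the same idea, assuming plain lists instead of DataFrames (the function name count_leq_sorted is hypothetical):

```python
def count_leq_sorted(thresholds, values):
    """For each threshold (thresholds must be ascending), count values <= it.

    After sorting `values`, the loop advances a single pointer j, so the
    merge itself is O(len(thresholds) + len(values)).
    """
    sorted_vals = sorted(values)
    counts = []
    j = 0  # number of sorted values already known to be <= the current threshold
    for t in thresholds:
        while j < len(sorted_vals) and sorted_vals[j] <= t:
            j += 1
        counts.append(j)
    return counts


print(count_leq_sorted([0.0, 7.0, 10.5, 40.0], [12.993, 7.011, 10.437, 33.890, 6.073]))
# [0, 1, 3, 5]
```

Because the pointer never moves backwards, the "flag" bookkeeping from the original is not needed: once every value has been counted, j simply stays at len(values).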
df1:
0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
0.9, 1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7,
1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6,
2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5,
3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4,
4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1, 5.2, 5.3,
5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1, 6.2,
6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7. , 7.1,
7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8. ,
8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9,
9. , 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8,
9.9, 10. , 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7,
10.8, 10.9, 11. , 11.1, 11.2, 11.3, 11.4, 11.5, 11.6,
11.7, 11.8, 11.9, 12. , 12.1, 12.2, 12.3, 12.4, 12.5,
12.6, 12.7, 12.8, 12.9, 13. , 13.1, 13.2, 13.3, 13.4,
13.5, 13.6, 13.7, 13.8, 13.9, 14. , 14.1, 14.2, 14.3,
14.4, 14.5, 14.6, 14.7, 14.8, 14.9, 15. , 15.1, 15.2,
15.3, 15.4, 15.5, 15.6, 15.7, 15.8, 15.9, 16. , 16.1,
16.2, 16.3, 16.4, 16.5, 16.6, 16.7, 16.8, 16.9, 17. ,
17.1, 17.2, 17.3, 17.4, 17.5, 17.6, 17.7, 17.8, 17.9,
18. , 18.1, 18.2, 18.3, 18.4, 18.5, 18.6, 18.7, 18.8,
18.9, 19. , 19.1, 19.2, 19.3, 19.4, 19.5, 19.6, 19.7,
19.8, 19.9, 20. , 20.1, 20.2, 20.3, 20.4, 20.5, 20.6,
20.7, 20.8, 20.9, 21. , 21.1, 21.2, 21.3, 21.4, 21.5,
21.6, 21.7, 21.8, 21.9, 22. , 22.1, 22.2, 22.3, 22.4,
22.5, 22.6, 22.7, 22.8, 22.9, 23. , 23.1, 23.2, 23.3,
23.4, 23.5, 23.6, 23.7, 23.8, 23.9, 24. , 24.1, 24.2,
24.3, 24.4, 24.5, 24.6, 24.7, 24.8, 24.9, 25. , 25.1,
25.2, 25.3, 25.4, 25.5, 25.6, 25.7, 25.8, 25.9, 26. ,
26.1, 26.2, 26.3, 26.4, 26.5, 26.6, 26.7, 26.8, 26.9,
27. , 27.1, 27.2, 27.3, 27.4, 27.5, 27.6, 27.7, 27.8,
27.9, 28. , 28.1, 28.2, 28.3, 28.4, 28.5, 28.6, 28.7,
28.8, 28.9, 29. , 29.1, 29.2, 29.3, 29.4, 29.5, 29.6
df2:
0 12.993
1 12.054
2 21.957
3 10.917
4 33.890
5 10.597
6 22.911
7 7.431
8 10.437
9 19.165
10 12.169
11 14.847
12 10.093
13 10.795
14 14.419
15 27.199
16 15.045
17 12.764
18 7.766
19 18.066
20 10.254
21 16.922
22 7.011
23 10.322
24 11.619
25 25.719
26 18.142
27 14.557
28 26.367
29 13.443
30 17.318
31 10.971
32 6.073
33 20.050
34 11.863
35 25.619
36 18.326
37 30.830
38 13.130
39 11.734
40 14.457
41 22.659
42 16.479
43 17.845
44 23.712
45 16.670
46 10.322
47 16.250
48 20.920
49 17.479
50 15.526
51 15.732
52 19.836
53 10.513
54 24.818
55 10.933
56 14.785
57 25.253
58 15.732
59 14.290
60 23.979
61 24.788
62 12.420
63 21.324
64 9.658
65 24.307
66 17.601
67 12.352
68 18.089
69 23.353
70 12.718
71 18.707
72 9.147
73 17.494
74 8.743
75 22.407
76 16.227
77 15.396
78 16.807
79 26.733
80 14.084
81 19.516
82 15.106
83 21.187
84 13.008
85 13.618
86 16.266
87 19.706
88 6.591
89 14.999
90 16.449
91 18.883
92 15.243
93 15.976
94 18.242
95 16.662
96 6.691
97 16.952
98 25.940
99 23.018
100 29.365
101 14.564
102 15.625
103 9.727
104 7.652
105 12.726
106 7.263
107 19.943
108 17.540
109 7.469
110 10.360
111 17.898
112 20.393
113 7.011
114 15.999
115 12.985
116 16.624
117 18.753
118 12.520
119 13.488
120 17.959
121 16.433
122 14.518
123 12.909
124 19.752
125 9.277
126 25.566
127 19.272
128 10.360
129 22.148
130 20.294
131 18.402
132 17.631
133 17.341
134 13.672
135 19.600
136 20.653
137 15.999
138 15.480
139 30.655
140 15.426
141 16.067
142 29.838
143 13.099
144 12.184
145 15.693
146 26.031
147 16.052
148 8.087
149 16.754
150 17.029
151 16.601
152 9.956
153 20.363
154 11.215
155 15.106
156 13.809
157 23.178
158 21.484
159 13.359
160 31.860
161 14.564
162 19.737
163 19.424
164 29.556
165 15.678
166 22.148
167 28.389
168 21.309
169 22.262
170 11.314
171 8.018
172 24.551
173 14.740
174 15.716
175 24.269
176 20.042
177 15.968
178 11.337
179 27.618
180 22.522
181 19.066
182 9.323
183 20.622
184 13.092
185 15.464
186 21.171
187 11.604
188 19.050
189 15.823
190 33.859
191 15.106
192 13.549
193 17.296
194 13.740
195 12.054
196 10.955
197 21.164
198 14.427
199 9.719
200 12.176
201 9.742
202 21.278
203 20.515
204 18.265
205 9.666
206 13.870
207 15.968
208 13.313
209 16.517
210 18.417
211 15.419
212 20.523
213 15.655
214 26.977
215 13.084
216 31.349
217 29.854
218 13.008
219 11.306
220 22.384
221 20.798
222 17.433
223 12.916
224 11.284
225 20.248
226 9.803
227 10.376
228 9.315
229 14.976
230 16.327
231 9.590
232 16.830
233 23.979
234 11.558
235 13.183
236 18.776
237 20.416
238 9.163
239 10.345
240 28.252
241 22.888
242 20.538
243 6.912
244 24.040
245 8.682
246 31.929
247 14.908
248 19.195
249 17.112
250 18.379
251 15.869
252 13.794
253 14.129
254 12.458
255 10.795
256 25.291
257 26.382
258 20.881
Try this. It will add a column called check to df1. The column will contain the count of the values in df2 that are <= each value in df1.
df1['check'] = df1[0].apply(lambda x: df2[df2[0] <= x].size)
You may have to replace the [0] with the names of the first column in your data frames.
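A vectorized alternative, not part of the answer above: sort df2 once and let numpy.searchsorted find, for every threshold at once, how many sorted values are <= it (side="right" places the insertion point after equal values, so repeated values are counted). This sketch assumes single-column frames labeled 0, as in the answer, and uses a few stand-in values rather than the full data:

```python
import numpy as np
import pandas as pd

# Stand-ins for the question's data (column label 0, as in the answer above).
df1 = pd.DataFrame({0: [0.0, 7.0, 10.5, 20.0, 40.0]})  # thresholds
df2 = pd.DataFrame({0: [12.993, 7.011, 10.437, 33.890, 6.073, 7.011]})  # unsorted, one repeat

sorted_vals = np.sort(df2[0].to_numpy())
# For each threshold t, side="right" returns the number of sorted values <= t.
df1["check"] = np.searchsorted(sorted_vals, df1[0].to_numpy(), side="right")
print(df1["check"].tolist())  # [0, 1, 4, 5, 6]
```

Unlike the apply version, this avoids one full scan of df2 per row of df1: the cost is one sort plus a binary search per threshold, and df1 does not even need to be sorted.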
Related
Retaining bad_lines identified by pandas in the output file instead of skipping those lines
I have to convert text files into CSVs after processing the contents of each text file as a pandas DataFrame. Below is the code I am using; out_txt is my input text file and out_csv is my output CSV file.

df = pd.read_csv(out_txt, sep=r'\s', header=None, on_bad_lines='warn', encoding="ANSI")
df = df.replace(r'[^\w\s]|_]/()|~"{}="', '', regex=True)
df.to_csv(out_csv, header=None)

If on_bad_lines='warn' is not declared, the CSV files are not created. But if I use this option, the bad lines are (obviously) skipped, with the warning:

Skipping line 6: Expected 8 fields in line 7, saw 9. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

I would like to retain these bad lines in the CSV. I have highlighted the detected bad lines in the image below (my input text file). Below is the content of the text file that is being saved. In this content I would like to remove characters like #, &, (, ).

75062 220 8 6 110 220 250 <1
75063 260 5 2 584 878 950 <1
75064 810 <2 <2 456 598 3700 <1
75065 115 5 2 96 74 5000 <1
75066 976 <5 2 5 68 4200 <1
75067 22 210 4 348 140 4050 <1
75068 674 5 4 - 54 1130 3850 <1
75069 414 5 y) 446 6.6% 2350 <1
75070 458 <5 <2 548 82 3100 <1
75071 4050 <5 2 780 6430 3150 <1
75072 115 <7 <1 64 5.8% 4050 °#&4«x<i1
75073 456 <7 4 46 44 3900 <1
75074 376 <7 <2 348 3.8% 2150 <1
75075 378 <6 y) 30 40 2000 <1
I would split on \s later with str.split rather than in read_csv:

df = (
    pd.read_csv(out_txt, header=None, encoding='ANSI')
      .replace(r'[^\w\s]|_]/()|~"{}="', '', regex=True)
      .squeeze().str.split(expand=True)
)

Another variant (skipping everything that comes in between the numbers):

df = (
    pd.read_csv(out_txt, header=None, encoding='ANSI')
      [0].str.findall(r"\b(\d+)\b")
      .str.join(" ").str.split(expand=True)
)

Output:

print(df)
         0     1    2   3      4     5     6  7
0   375020  1060  115  38    440   350  7800  1
1   375021   920   80  26    310   290  5000  1
2   375022  1240  110  28    460   430  5900  1
3   375023   830  150  80    650   860  6200  1
4   375024   185  175  96    800  1020  2400  1
5   375025   680  370  88   1700  1220   172  1
6   375026   550  290  72   2250  1460   835  2
7   375027   390  120  60   1620  1240   158  1
8   375028   630  180  76    820  1360   180  1
9   375029   460  280  66    380   790  3600  1
10  375030   660  260  62  11180  1040   300  1
11  375031   530  200  84   1360  1060   555  1
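A minimal sketch of the same idea run on three rows copied from the question, so it needs no file: each physical line is kept whole as one row (an in-memory pd.Series stands in for pd.read_csv), and only then cleaned and split, so no line can be rejected as "bad". The character class below is an assumption that differs from the question's regex: it keeps "." so that values like 6.6 are not turned into 66.

```python
import pandas as pd

# Three rows from the question; the first is the "bad" line that has 9
# whitespace-separated fields because of the stray "-".
raw = """75068 674 5 4 - 54 1130 3850 <1
75069 414 5 y) 446 6.6% 2350 <1
75072 115 <7 <1 64 5.8% 4050 °#&4«x<i1"""

# One row per physical line: nothing is split yet, so nothing is skipped.
lines = pd.Series(raw.splitlines())

# Remove every character that is not a word character, whitespace or ".".
cleaned = lines.str.replace(r"[^\w\s.]", "", regex=True)

# Split on runs of whitespace; rows with fewer fields would be padded with None.
df = cleaned.str.split(expand=True)
print(df)
```

Because the split happens after the cleanup, the "-" that made line 7 too long disappears before the fields are counted, and all three rows end up with 8 columns.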
Gnuplot fit error - singular matrix in Givens()
So I want to fit a function with a dataset using gnuplot. In the file "cn20x2012", at the lines [1:300] I have this data: 1 -7.576723949519277e-06 2 4.738414366971162e-05 3 2.5908117324519247e-05 4 7.233786749999952e-06 5 4.94720225240387e-06 6 -1.857620375000113e-06 7 5.697280584855734e-06 8 -1.867760712716345e-05 9 6.64096591257211e-05 10 2.756199717307687e-05 11 4.7755705550480866e-05 12 6.590865376225963e-05 13 4.1522206877403805e-05 14 3.145294946394234e-05 15 5.9346948090625035e-05 16 5.405458204471163e-05 17 0.0001484469089218749 18 0.00011236895265264405 19 0.00010798644697620197 20 8.656723035552881e-05 21 0.00019917737876442313 22 0.00022625750686778835 23 0.00023183354141658626 24 0.0003373178915148073 25 0.00032313619574999994 26 0.0003451188893915866 27 0.0003303809005983172 28 0.0003534148565745192 29 0.00039690566743750015 30 0.0004182810016802884 31 0.00045198626877403865 32 0.00047311462195192373 33 0.0004962054400408655 34 0.0004969566757524037 35 0.0005561838221274039 36 0.0005353567324539659 37 0.00052834133201923 38 0.0005980226227637016 39 0.0005446277144831731 40 0.0005960780049278846 41 0.0006076488594567314 42 0.000710219997610289 43 0.0006714079307259616 44 0.0006990041531870184 45 0.000694646402266827 46 0.0006910307645889419 47 0.0007918124250492787 48 0.0007699669760728367 49 0.0007850042712259613 50 0.0007735240355776444 51 0.0008333605652980768 52 0.0007914544977620185 53 0.0008254284036610573 54 0.0008578590784536057 55 0.0008597165395913466 56 0.0009350752655120189 57 0.0009355867078822116 58 0.0009413161534519229 59 0.001003045837043269 60 0.0009530084342740383 61 0.000981287851927885 62 0.000986143934318509 63 0.00096895140692548 64 0.0010671633388319713 65 0.0010884129846995196 66 0.0010974424039567304 67 0.0011198829067163459 68 0.0010649422789374995 69 0.0010909547135769227 70 0.0010858300892451934 71 0.00114890178018774 72 0.0011503018930817308 73 0.0012209814370937495 74 0.001264080502711538 75 0.0012453762294132222 76 
0.0012725116258625 77 0.0012649334953990384 78 0.0012195748153341352 79 0.0013151443892213466 80 0.0013003322635283651 81 0.0013099768888799042 82 0.0013227992394807694 83 0.0013325137669168274 84 0.001356943212587259 85 0.0014541924819278852 86 0.0014094004314177883 87 0.0014273633669975969 88 0.0014393176087403859 89 0.0014372794673365393 90 0.0015051545220959143 91 0.0015432813234807683 92 0.0015832276965293275 93 0.001540622433288461 94 0.0016007491118125 95 0.0016195978358533654 96 0.0016447077023067317 97 0.0016350138695504803 98 0.0017352804136629807 99 0.001731106189370192 100 0.0017407015898704323 101 0.0017367582300937506 102 0.0018164239404875008 103 0.0017829769448653838 104 0.0018303930988165871 105 0.0017893320000211548 106 0.0018727349292259614 107 0.0018745909637668267 108 0.0018425366172147846 109 0.0019053739892581727 110 0.0018849885474855762 111 0.0018689524590103368 112 0.0019431807910961535 113 0.001951890517350962 114 0.0019308973497776446 115 0.0019990349471177894 116 0.002009245176572116 117 0.0020004240575882213 118 0.002020795320423557 119 0.0020148423748725963 120 0.002070277553975961 121 0.002112121992170673 122 0.002081609846093749 123 0.0020899822853341346 124 0.002214996736841347 125 0.002210968677028846 126 0.002204230691923077 127 0.0022059340675168264 128 0.002244672249610577 129 0.002243725570633895 130 0.002198417606970913 131 0.002326686848007212 132 0.002298981945014423 133 0.002412905193465384 134 0.0023317473012668287 135 0.0023255737818221145 136 0.0024042900543605767 137 0.0023814333208341345 138 0.002414946342495192 139 0.002451134140336538 140 0.002435468088014424 141 0.002541540709086779 142 0.0024759180712812523 143 0.002562872725209133 144 0.002554363054353367 145 0.002525350243064904 146 0.0026228594448966342 147 0.002640361090600963 148 0.0026968734518557683 149 0.002687729582449518 150 0.0026799173813848555 151 0.002751626483175481 152 0.0026916526068317286 153 0.002682602742860577 154 0.0027658840884567304 155 
0.0028385319315024035 156 0.002733288245524039 157 0.002805041072350961 158 0.002798724552451201 159 0.00284738398885577 160 0.002833892571264423 161 0.0028506943730673084 162 0.0028578405825413463 163 0.0028141271324870197 164 0.0029047532288887 165 0.002916689246838943 166 0.003006111659274039 167 0.0030388357088942325 168 0.0030117903270181707 169 0.003023639132084136 170 0.0030182642660336535 171 0.0029788478969250015 172 0.003086049268993511 173 0.0030530940010240377 174 0.00309287048297596 175 0.0030892688902187473 176 0.0032070964353437493 177 0.0031308958387163454 178 0.003262165689711538 179 0.0032348496648947093 180 0.003334092027257212 181 0.0032702121678230764 182 0.0032887867663149036 183 0.00333782536743269 184 0.0033132179587812513 185 0.003400563164048078 186 0.003322215536028365 187 0.0033691419445264436 188 0.00340692471343654 189 0.003370118822997599 190 0.003414042435545674 191 0.003460621729710913 192 0.003487680921019232 193 0.0034814484875360595 194 0.003528280852358173 195 0.0035260558732403864 196 0.0035947047098653846 197 0.003583761358336538 198 0.003589446784643749 199 0.0035488957604610572 200 0.0036106514596322115 201 0.003633161542855769 202 0.003596668943564904 203 0.003621647520017789 204 0.0037260161142259616 205 0.0036873544761057684 206 0.003693311409786057 207 0.0037485618958747594 208 0.0037277801700697126 209 0.003731768419286058 210 0.0037200943660144225 211 0.0037368698886754786 212 0.0038266932486634626 213 0.003786905602120193 214 0.0038484308669038464 215 0.003837662506102065 216 0.003877989966946875 217 0.0038711451977908673 218 0.0039796825709810125 219 0.003955763375971154 220 0.003983664920576924 221 0.004019112007471154 222 0.003996646585913461 223 0.004061509550884613 224 0.004015245551199519 225 0.004009779120920672 226 0.004148229009661058 227 0.0040645974335312505 228 0.0041522345293678545 229 0.004216267765944711 230 0.004191517977733654 231 0.004280319721466346 232 0.004210795761447114 233 0.004258393462563462 
234 0.004267925011272355 235 0.00427713419340625 236 0.004323331966394231 237 0.004361159201735935 238 0.004351708975694715 239 0.004359997178644953 240 0.00437384325853894 241 0.004375188742463941 242 0.004424559629495192 243 0.004461955226487498 244 0.004489655863850963 245 0.0045503420149230756 246 0.0045185560829999975 247 0.004506067166336778 248 0.004585396025798076 249 0.004530840472406252 250 0.0045934151490120215 251 0.004602146584228363 252 0.004643262102497593 253 0.004707265035608172 254 0.004766505116052884 255 0.004744165929896635 256 0.0047756718030625015 257 0.004802170611427885 258 0.004896239463478368 259 0.0048845448341901425 260 0.004845213594302884 261 0.004915008781204327 262 0.004838528640802884 263 0.0048121374747617796 264 0.004895357859576925 265 0.0048793476575266816 266 0.004958465852682693 267 0.005007965180538941 268 0.0049839032653341345 269 0.005068383734646637 270 0.00498556504900495 271 0.005014623260019232 272 0.005066327855785335 273 0.0050290740743365375 274 0.005152934708140861 275 0.005174238921781968 276 0.005123581464772355 277 0.005155969777822114 278 0.005169396608004327 279 0.00516497090489663 280 0.005145110646115385 281 0.005209611399110575 282 0.005163211771749997 283 0.005181044847507209 284 0.005281641245183894 285 0.005323840847189907 286 0.005230924322329326 287 0.005256136984014422 288 0.005374876757439424 289 0.0053137727444009615 290 0.005468482116127402 291 0.005453857539401205 292 0.005417081656274039 293 0.005393994523838937 294 0.005506909240446873 295 0.005449365350307692 296 0.005551215606367787 297 0.005505932791992786 298 0.0055918512302572145 299 0.005663100163579326 300 0.0056382443690432705

When I do

f(x) = a/b*(1-exp(-b*x))
fit [1:300] f(x) "cn20x2012" using 1:2 via a,b

the curve fits perfectly. But when I try to fit the curve with a/b*(1-exp(-b*x/3e-26)), I get the error message. Note that I've only added a constant to the exponential part of the function.
What can I do to fit the function with the constant 3e-26? I'm using gnuplot 5.2 patchlevel 8 on Linux.
Adding that constant makes exp(-b*x/3e-26) so close to zero that the term (1-exp(-b*x/3e-26)) differs from 1 by less than the precision available in IEEE double-precision floating point. So you are essentially fitting the function g(x) = a/b, which is a very poor fit to your data. Since you already have a good fit using your original function f(x), perhaps you can explain why you want to change the function to something else? What question are you trying to answer?
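A quick numeric check of this explanation, written in Python rather than gnuplot. It assumes starting values a = b = 1 (the question does not show the actual starting values) and estimates the partial derivatives of the model by central differences:

```python
import math

C = 3e-26  # the constant added to the exponent in the question


def model(x, a, b):
    """a/b * (1 - exp(-b*x/C)), the function that fails to fit."""
    return a / b * (1.0 - math.exp(-b * x / C))


a, b = 1.0, 1.0      # assumed starting values
xs = range(1, 301)   # the fit range [1:300]

# exp(-b*x/C) underflows to exactly 0.0 for every x, so the model is flat in x.
values = [model(x, a, b) for x in xs]
print(set(values))  # {1.0}

# Jacobian columns by central differences (the step h is an arbitrary choice).
h = 1e-6
d_a = [(model(x, a + h, b) - model(x, a - h, b)) / (2 * h) for x in xs]
d_b = [(model(x, a, b + h) - model(x, a, b - h)) / (2 * h) for x in xs]

# The ratio d_b/d_a is identical for every x: the two columns are proportional,
# so the least-squares system for (a, b) is rank-deficient.
ratios = {round(db / da, 6) for da, db in zip(d_a, d_b)}
print(ratios)  # {-1.0}
```

Proportional Jacobian columns are consistent with the "singular matrix in Givens()" message. Since a/b*(1-exp(-b*x/C)) with a = C*A and b = C*k is identically A/k*(1-exp(-k*x)), one option is to keep the fit of the original f(x) and multiply the fitted a and b by 3e-26 afterwards.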
Rescaling the plot of a tree with gnuplot
I am using the following code in gnuplot to draw a tree from different inputs.

### tree diagram with gnuplot
reset session

#ID Parent Name Colors shape
# put datablock into strings
IDs = Parents = Names = Colors = Shape = ""
set table $Dummy
    plot "tmp.dat" u (IDs = IDs.strcol(1)." "): \
                     (Parents = Parents.strcol(2)." "): \
                     (Names = Names.strcol(3)." "): \
                     (Colors = Colors.strcol(4)." "): \
                     (Shape = Shape.strcol(5)." ") w table
unset table

# Top node has no parent ID "NaN"
Start(n) = int(sum [i=1:words(Parents)] (word(Parents,i) eq "NaN" ? int(word(IDs,i)) : 0))

# get list index by ID
ItemIdx(s,n) = n == n ? (tmp=NaN, sum [i=1:words(s)] ((word(s,i)) == n ? (tmp=i,0) : 0), tmp) : NaN

# get parent of ID n
Parent(n) = word(Parents,ItemIdx(IDs,n))

# get level of ID n, recursive function
Level(n) = n == n ? Parent(n)>0 ? Level(Parent(n))-1 : 0 : NaN

# get number of children of ID n
ChildCount(n) = int(sum [i=1:words(Parents)] (word(Parents,i)==n))

# Create child list of ID n
ChildList(n) = (Ch = " ", sum [i=1:words(IDs)] (word(Parents,i)==n ? (Ch = Ch.word(IDs,i)." ",1) : (Ch,0) ), Ch )

# m-th child of ID n
Child(n,m) = word(ChildList(n),m)

# List of leaves, recursive function
LeafList(n) = (LL="", ChildCount(n)==0 ? LL=LL.n." " : sum [i=1:ChildCount(n)] (LL=LL.LeafList(Child(n,i)), 0),LL)

# create list of all leaves
LeafAll = LeafList(Start(0))

# get x-position of ID n, recursive function
XPos(n) = ChildCount(n) == 0 ? ItemIdx(LeafAll,n) : (sum [i=1:ChildCount(n)](XPos(Child(n,i))))/(ChildCount(n))

# create the tree datablock for plotting
set print $Tree
do for [j=1:words(IDs)] {
    n = int(word(IDs,j))
    print sprintf("% 3d % 7.2f % 4d % 5s % 8s", n, XPos(n), Level(n), word(Names,j), word(Colors,j))
}
set print
print $Tree

# get x and y distance from ID n to its parent
dx(n) = XPos(Parent(int(n))) - XPos(int(n))
dy(n) = Level(Parent(int(n))) - Level(int(n))

unset border
unset tics
set offsets 0.25, 0.25, 0.25, 0.25

array shape[words(IDs)]    # pointtype 6 = circle, pointtype 4 = square
array color[words(IDs)]
do for [i=1:words(IDs)] {
    color[i] = int(word(Colors,i))
    shape[i] = int(word(Shape,i))
    print sprintf("color[%2d] = %d",i,color[i])
}

plot $Tree u 2:3:(dx($1)):(dy($1)) w vec nohead ls -1 not,\
     "" u 2:3:(shape[$1]+1):(color[$1]) w p pt variable ps 6 lc rgb variable not, \
     "" u 2:3:(shape[$1]) w p pt variable ps 6 lw 1.5 lc rgb "black" not, \
     "" u 2:3:4 w labels offset 0,0.1 center not
### end of code

For a small dataset like this one, the output works perfectly:

  1    2.00    0  y_{45}  0xFE1034
  2    1.00   -1       -  0x118C4B
  3    2.99   -1  y_{37}  0xFE1034
  4    2.00   -2       -  0xC6C1C1
  5    3.98   -2  y_{13}  0xFE1034
  6    3.00   -3       -  0x118C4B
  7    4.97   -3  y_{14}  0xFE1034
  8    4.00   -4       -  0x118C4B
  9    5.94   -4  y_{20}  0xFE1034
 10    5.00   -5       -  0xC6C1C1
 11    6.88   -5  y_{27}  0xFE1034
 12    6.00   -6       -  0xC6C1C1
 13    7.75   -6  y_{41}  0xFE1034
 14    7.00   -7       -  0xC6C1C1
 15    8.50   -7  y_{54}  0xFE1034
 16    8.00   -8       -  0xC6C1C1
 17    9.00   -8       -  0xC6C1C1

But for larger datasets the tree becomes cramped, the nodes overlap, and it looks ugly. Moreover, when there are more than a few hundred nodes, like below, I get a stack overflow error and the plot does not appear. The error comes from this line:

LeafAll = LeafList(Start(0))

Any help with this will be appreciated.
1 NaN y_{295} 0xFE1034 6 2 1 x_{0} 0x33B2FF 6 3 1 y_{1285} 0xFE1034 6 4 2 - 0xC6C1C1 8 5 2 - 0xC6C1C1 8 6 3 x_{3} 0x33B2FF 6 7 3 y_{18} 0xFE1034 6 8 6 - 0xC6C1C1 8 9 6 - 0xC6C1C1 8 10 7 x_{13} 0x33B2FF 6 11 7 y_{21} 0xFE1034 6 12 10 - 0xC6C1C1 8 13 10 - 0xC6C1C1 8 14 11 x_{10} 0x33B2FF 6 15 11 y_{50} 0xFE1034 6 16 14 - 0xC6C1C1 8 17 14 - 0xC6C1C1 8 18 15 - 0x118C4B 4 19 15 y_{62} 0xFE1034 6 20 19 - 0xC6C1C1 8 21 19 y_{48} 0xFE1034 6 22 21 x_{41} 0x33B2FF 6 23 21 y_{1839} 0xFE1034 6 24 22 - 0xC6C1C1 8 25 22 - 0xC6C1C1 8 26 23 - 0xC6C1C1 8 27 23 y_{44} 0xFE1034 6 28 27 x_{12} 0x33B2FF 6 29 27 y_{15} 0xFE1034 6 30 28 - 0xC6C1C1 8 31 28 - 0xC6C1C1 8 32 29 x_{58} 0x33B2FF 6 33 29 y_{127} 0xFE1034 6 34 32 - 0xC6C1C1 8 35 32 - 0xC6C1C1 8 36 33 - 0xC6C1C1 8 37 33 y_{60} 0xFE1034 6 38 37 - 0xC6C1C1 8 39 37 y_{1825} 0xFE1034 6 40 39 - 0xC6C1C1 8 41 39 y_{1878} 0xFE1034 6 42 41 - 0xC6C1C1 8 43 41 y_{33} 0xFE1034 6 44 43 - 0xC6C1C1 8 45 43 y_{3} 0xFE1034 6 46 45 - 0xC6C1C1 8 47 45 y_{1435} 0xFE1034 6 48 47 - 0xC6C1C1 8 49 47 y_{218} 0xFE1034 6 50 49 - 0xC6C1C1 8 51 49 y_{20} 0xFE1034 6 52 51 - 0xC6C1C1 8 53 51 y_{13} 0xFE1034 6 54 53 - 0xC6C1C1 8 55 53 y_{47} 0xFE1034 6 56 55 - 0xC6C1C1 8 57 55 y_{2321} 0xFE1034 6 58 57 - 0xC6C1C1 8 59 57 y_{28} 0xFE1034 6 60 59 - 0xC6C1C1 8 61 59 y_{52} 0xFE1034 6 62 61 - 0xC6C1C1 8 63 61 y_{2410} 0xFE1034 6 64 63 - 0xC6C1C1 8 65 63 y_{1751} 0xFE1034 6 66 65 - 0xC6C1C1 8 67 65 y_{186} 0xFE1034 6 68 67 - 0xC6C1C1 8 69 67 y_{1850} 0xFE1034 6 70 69 - 0xC6C1C1 8 71 69 y_{491} 0xFE1034 6 72 71 - 0xC6C1C1 8 73 71 y_{23} 0xFE1034 6 74 73 - 0xC6C1C1 8 75 73 y_{0} 0xFE1034 6 76 75 x_{52} 0x33B2FF 6 77 75 y_{1110} 0xFE1034 6 78 76 - 0xC6C1C1 8 79 76 - 0xC6C1C1 8 80 77 - 0xC6C1C1 8 81 77 y_{57} 0xFE1034 6 82 81 - 0xC6C1C1 8 83 81 y_{12} 0xFE1034 6 84 83 - 0xC6C1C1 8 85 83 y_{1269} 0xFE1034 6 86 85 - 0xC6C1C1 8 87 85 y_{1278} 0xFE1034 6 88 87 - 0x118C4B 4 89 87 y_{63} 0xFE1034 6 90 89 - 0xC6C1C1 8 91 89 y_{1338} 0xFE1034 6 92 91 - 0xC6C1C1 8 93 91 
y_{1271} 0xFE1034 6 94 93 - 0xC6C1C1 8 95 93 y_{41} 0xFE1034 6 96 95 - 0xC6C1C1 8 97 95 y_{65} 0xFE1034 6 98 97 - 0x118C4B 4 99 97 y_{1630} 0xFE1034 6 100 99 - 0xC6C1C1 8 101 99 y_{2068} 0xFE1034 6 102 101 - 0xC6C1C1 8 103 101 y_{2532} 0xFE1034 6 104 103 - 0xC6C1C1 8 105 103 y_{1760} 0xFE1034 6 106 105 - 0xC6C1C1 8 107 105 y_{188} 0xFE1034 6 108 107 - 0xC6C1C1 8 109 107 y_{2405} 0xFE1034 6 110 109 - 0xC6C1C1 8 111 109 y_{1867} 0xFE1034 6 112 111 - 0xC6C1C1 8 113 111 y_{1482} 0xFE1034 6 114 113 - 0xC6C1C1 8 115 113 y_{79} 0xFE1034 6 116 115 - 0xC6C1C1 8 117 115 y_{11} 0xFE1034 6 118 117 - 0xC6C1C1 8 119 117 y_{5226} 0xFE1034 6 120 119 - 0xC6C1C1 8 121 119 y_{354} 0xFE1034 6 122 121 - 0xC6C1C1 8 123 121 y_{2748} 0xFE1034 6 124 123 - 0xC6C1C1 8 125 123 y_{27} 0xFE1034 6 126 125 - 0xC6C1C1 8 127 125 y_{426} 0xFE1034 6 128 127 - 0xC6C1C1 8 129 127 y_{12571} 0xFE1034 6 130 129 - 0xC6C1C1 8 131 129 y_{5089} 0xFE1034 6 132 131 - 0xC6C1C1 8 133 131 y_{2490} 0xFE1034 6 134 133 - 0xC6C1C1 8 135 133 y_{1752} 0xFE1034 6 136 135 - 0xC6C1C1 8 137 135 y_{1874} 0xFE1034 6 138 137 - 0xC6C1C1 8 139 137 y_{370} 0xFE1034 6 140 139 - 0xC6C1C1 8 141 139 y_{1453} 0xFE1034 6 142 141 - 0xC6C1C1 8 143 141 y_{2756} 0xFE1034 6 144 143 - 0xC6C1C1 8 145 143 y_{545} 0xFE1034 6 146 145 - 0xC6C1C1 8 147 145 y_{36} 0xFE1034 6 148 147 - 0xC6C1C1 8 149 147 y_{2409} 0xFE1034 6 150 149 - 0xC6C1C1 8 151 149 y_{96} 0xFE1034 6 152 151 - 0xC6C1C1 8 153 151 y_{82} 0xFE1034 6 154 153 - 0xC6C1C1 8 155 153 y_{1788} 0xFE1034 6 156 155 - 0xC6C1C1 8 157 155 y_{2812} 0xFE1034 6 158 157 - 0xC6C1C1 8 159 157 y_{10357} 0xFE1034 6 160 159 - 0xC6C1C1 8 161 159 y_{1801} 0xFE1034 6 162 161 - 0xC6C1C1 8 163 161 y_{55} 0xFE1034 6 164 163 - 0xC6C1C1 8 165 163 y_{2868} 0xFE1034 6 166 165 - 0xC6C1C1 8 167 165 y_{453} 0xFE1034 6 168 167 - 0xC6C1C1 8 169 167 y_{31} 0xFE1034 6 170 169 - 0xC6C1C1 8 171 169 y_{1281} 0xFE1034 6 172 171 - 0xC6C1C1 8 173 171 y_{17} 0xFE1034 6 174 173 - 0xC6C1C1 8 175 173 y_{1748} 0xFE1034 6 176 175 - 
0xC6C1C1 8 177 175 y_{58} 0xFE1034 6 178 177 - 0xC6C1C1 8 179 177 y_{2420} 0xFE1034 6 180 179 - 0xC6C1C1 8 181 179 y_{7128} 0xFE1034 6 182 181 - 0xC6C1C1 8 183 181 y_{11164} 0xFE1034 6 184 183 - 0xC6C1C1 8 185 183 y_{1820} 0xFE1034 6 186 185 - 0xC6C1C1 8 187 185 y_{1713} 0xFE1034 6 188 187 - 0xC6C1C1 8 189 187 y_{387} 0xFE1034 6 190 189 - 0xC6C1C1 8 191 189 y_{5253} 0xFE1034 6 192 191 - 0xC6C1C1 8 193 191 y_{1699} 0xFE1034 6 194 193 - 0xC6C1C1 8 195 193 - 0xC6C1C1 8
The depth of gnuplot's evaluation stack is capped at 250 to prevent runaway recursion. To increase it you would have to edit the source and recompile the program. If you really want to do that, the relevant definition is here:

[gnuplot-5.2.8/src] grep -n -A 3 -B 3 STACK_DEPTH eval.h
44-
45-#include <stdio.h>		/* for FILE* */
46-
47:#define STACK_DEPTH 250		/* maximum size of the execution stack */
48-#define MAX_AT_LEN 150		/* max number of entries in action table */
49-
50-/* These are used by add_action() to index the subroutine list ft[] in eval.c */

I have not looked at your recursion algorithm very closely, but I would think it possible to reorder the evaluation so that the subtree information is computed bottom-up rather than top-down. In that direction it may become a pure iteration rather than a recursive descent.

On the other hand, you also say that larger trees don't fit into a single plot. So another approach may be to split the tree at a depth that both fits on the page and doesn't exceed the stack depth. Then restart the process for each node that was truncated, and mark that node with an arrow, an annotation or some other indication like "subtree continued in figure 1b". Here I have hand-mangled your large figure to show the idea.
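A minimal sketch of the bottom-up idea, written in Python rather than gnuplot, so it would run as a preprocessing step that produces the x and level columns of $Tree (that workflow and the function name tree_layout are assumptions, not part of the answer). It mirrors the question's LeafList, XPos and Level definitions, but uses an explicit stack for the top-down pass and a reversed pre-order for the bottom-up pass, so no recursion depth limit applies:

```python
from collections import defaultdict


def tree_layout(rows):
    """Return {node: (x, level)} for a tree given as (node, parent) pairs.

    The root has parent None. As in LeafList/XPos/Level, a leaf's x is its
    1-based position among the leaves in depth-first order, an inner node's
    x is the mean x of its children, and each level is the parent's level - 1.
    """
    children = defaultdict(list)
    root = None
    for node, parent in rows:
        if parent is None:
            root = node
        else:
            children[parent].append(node)  # keeps file order, like ChildList

    # Top-down pass with an explicit stack instead of recursion.
    level = {root: 0}
    order = []  # depth-first pre-order
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for child in reversed(children[node]):  # first child is popped first
            level[child] = level[node] - 1
            stack.append(child)

    leaves = [n for n in order if not children[n]]
    x = {leaf: i + 1 for i, leaf in enumerate(leaves)}

    # Bottom-up pass: in reversed pre-order every child precedes its parent.
    for node in reversed(order):
        if children[node]:
            x[node] = sum(x[c] for c in children[node]) / len(children[node])

    return {n: (x[n], level[n]) for n in order}


print(tree_layout([(1, None), (2, 1), (3, 1), (4, 2), (5, 2)]))
# {1: (2.25, 0), 2: (1.5, -1), 4: (1, -2), 5: (2, -2), 3: (3, -1)}
```

For the large input in the question, the rows would come from the first two columns, with the root's NaN parent mapped to None; a chain thousands of levels deep is handled the same way as a shallow tree.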
Generating all the combinations of 7 columns in a dataframe and adding the corresponding columns to generate new columns
I have a dataframe that looks similar to this:

Wave  A   B   C
340   77  70  15
341   80  73  15
342   83  76  16
343   86  78  17

I want to generate columns that contain all the possible combinations of the existing columns. I showed 3 columns here, but in my actual data I have 7 columns and therefore 127 combinations in total. The desired output is as follows:

Wave  A   B   C   AB   AC  AD  BC  ...  ABC
340   77  70  15  147  92  ...
341   80  73  15  153  95  ...
342   83  76  16  159  99  ...

I implemented a quite inefficient version where the user inputs the combinations (AB, AC, etc.) and a new column is created with the row-wise sum. This seems almost impossible to do for 127 combinations, especially with descriptive column names.
Create a list of all combinations with chain + combinations from itertools, then sum the appropriate columns:

from itertools import combinations, chain

cols = [*df.iloc[:, 1:]]  # every column label except 'Wave'
l = list(chain.from_iterable(combinations(cols, n + 2) for n in range(len(cols))))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')]
for items in l:
    df[''.join(items)] = df.loc[:, list(items)].sum(axis=1)

   Wave   A   B   C   AB   AC  BC  ABC
0   340  77  70  15  147   92  85  162
1   341  80  73  15  153   95  88  168
2   342  83  76  16  159   99  92  175
3   343  86  78  17  164  103  95  181
You need to get all the combinations first, then build a mapping (a dict or Series) from each original column to the names of the combinations it belongs to:

import itertools
import pandas as pd

l = df.columns[1:].tolist()
l1 = [list(map(list, itertools.combinations(l, i))) for i in range(len(l) + 1)]
d = [dict.fromkeys(y, ''.join(y)) for x in l1 for y in x]
maps = pd.Series(d).apply(pd.Series).stack()
df.set_index('Wave', inplace=True)
# use reindex to put the columns of the new df in the order of the maps keys
df = df.reindex(columns=maps.index.get_level_values(1))
# assign the combination names as column labels; the order is the same, so they line up
df.columns = maps.tolist()
# sum the columns that share a label (df.sum(level=0, axis=1) no longer works in pandas 2.x)
df.T.groupby(level=0, sort=False).sum().T

       A   B   C   AB   AC  BC  ABC
Wave
340   77  70  15  147   92  85  162
341   80  73  15  153   95  88  168
342   83  76  16  159   99  92  175
343   86  78  17  164  103  95  181
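A variant of the first answer's idea, not taken from either answer: build every new column in a dict first and attach them all with a single pd.concat call. With 7 source columns that is 120 new columns, and inserting them one at a time can make pandas emit a PerformanceWarning about a highly fragmented DataFrame. The sample df below is the question's data:

```python
from itertools import chain, combinations

import pandas as pd

df = pd.DataFrame({"Wave": [340, 341, 342, 343],
                   "A": [77, 80, 83, 86],
                   "B": [70, 73, 76, 78],
                   "C": [15, 15, 16, 17]})

cols = list(df.columns[1:])  # every column except "Wave"
# All subsets of size 2..len(cols); the size-1 subsets are the original columns.
subsets = chain.from_iterable(combinations(cols, k) for k in range(2, len(cols) + 1))

# Compute each row-wise sum, then attach all new columns in one step.
new_cols = {"".join(s): df[list(s)].sum(axis=1) for s in subsets}
out = pd.concat([df, pd.DataFrame(new_cols)], axis=1)
print(out)
```

The column names come straight from "".join, so descriptive names (longer than one letter) would simply be concatenated; a separator such as "+".join may be easier to read in that case.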
A question about the SUMIF formula
I am working in Excel with the SUMIF formula. My data is as follows:

Region  Opr  Qty    Cost     Combo (col B&A)
192     114  50     500      104192
192     104  453    548      104192
192     114  125    54654    114192
192     114  155    1545     114192
192     124  12     1553     124192
192     134  12222  1554545  134192
192     174  256    15478    174192
192     104  12     1555     104192
192     104  210    1156     104192
192     114  47     448953   114192
192     114  29     59479    114192
192     124  124    32451    124192
192     134  114    290240   134192
4192    10   210    115656   104192
4192    10   47     44896    104192
4192    11   29     12866    114192
4192    11   549    290240   114192
4192    12   124    59480    124192
4192    13   114    61343    134192
4192    17   310    45339    174192
4192    10   56     32451    104192
4192    10   103    82483    104192
4192    11   685    111380   114192
4192    11   646    201858   114192
4192    12   26     6489     124192
4192    13   87     44543    134192

If you look at the last column, it gives the same combination result even though the operator and region are not always the same (for example, 104 & 192 and 10 & 4192 both give 104192). I want to do a SUMIF against Region, but it returns wrong values.
You can try SUMPRODUCT:

=SUMPRODUCT(((B2:B27&A2:A27)*1<>E2:E27)*1)

If the concatenation of columns B and A is not equal to the Combo, the row counts as 1; SUMPRODUCT then adds all the 1s together. Change the ranges accordingly. The *1 converts the concatenated text to a number.