Gnuplot log plot y-axis

I have a .txt file with some values as a function of iteration count, and I am trying to plot it on a logarithmic scale. I have managed to do this with the following code: plot 'solchange.txt' using 1:(log($2)) with lines
The x axis is perfect and the shape is perfect, but my y axis is weird. I want the labels to be, say, 10^-2, 10^-3 and so on. How can this be done?
What does -16 even mean? My values stop at 10^-7.
solchange.txt
1 0.20870164249629861
2 3.0540936828943599E-002
3 2.1622388854567132E-002
4 1.7070994407582529E-002
5 1.4155375579083168E-002
6 1.2069370098131457E-002
7 1.0482626276465484E-002
8 9.2258609277672127E-003
9 8.2010529631910967E-003
10 7.3466561929682317E-003
11 6.6216556909214075E-003
12 5.9973025525987822E-003
13 5.4526144028317000E-003
14 4.9718850942694140E-003
15 4.5432279303643033E-003
16 4.1576291151408026E-003
17 3.8082604242292567E-003
18 3.4899438987894341E-003
19 3.1987266873885617E-003
20 2.9315478643644408E-003
21 2.6859845807917955E-003
22 2.4600648490906499E-003
23 2.2521338021345080E-003
24 2.0607609516045851E-003
25 1.8846776035151558E-003
26 1.7227356558349102E-003
27 1.5738810584753488E-003
28 1.4371370123238449E-003
29 1.3115934299711522E-003
30 1.1964002798033561E-003
31 1.0907632352794312E-003
32 9.9394061400687704E-004
33 9.0524097544450455E-004
34 8.2402100116123200E-004
35 7.4968344624489966E-004
36 6.8167505353953529E-004
37 6.1948438470904935E-004
38 5.6263955830880997E-004
39 5.1070590513836635E-004
40 4.6328356186664012E-004
41 4.2000502958110253E-004
42 3.8053272709547871E-004
43 3.4455657088288370E-004
44 3.1179161480603731E-004
45 2.8197578322736866E-004
46 2.5486773011298286E-004
47 2.3024485386036100E-004
48 2.0790149243582034E-004
49 1.8764731601648640E-004
50 1.6930592515617369E-004
51 1.5271365239171797E-004
52 1.3771855529436799E-004
53 1.2417958039480804E-004
54 1.1196587112561468E-004
55 1.0095618950593192E-004
56 9.1038420860557225E-005
57 8.2109133101636398E-005
58 7.4073166362883126E-005
59 6.6843234255131376E-005
60 6.0339523889261799E-005
61 5.4489287395236962E-005
62 4.9226422450250433E-005
63 4.4491043040285547E-005
64 4.0229044229763214E-005
65 3.6391666180476002E-005
66 3.2935063208632334E-005
67 2.9819883516037614E-005
68 2.7010864597994382E-005
69 2.4476448415957416E-005
70 2.2188419389643915E-005
71 2.0121567231942521E-005
72 1.8253375701611941E-005
73 1.6563737530794070E-005
74 1.5034695117064744E-005
75 1.3650206056065907E-005
76 1.2395932219765905E-005
77 1.1259050839256858E-005
78 1.0228085910393529E-005
79 9.2927581834059692E-006
80 8.4438520049430650E-006
81 7.6730973352498108E-006
82 6.9730653504927742E-006
83 6.3370761471119759E-006
84 5.7591171842940984E-006
85 5.2337712233988323E-006
86 4.7561526453150822E-006
87 4.3218511441380815E-006
88 3.9268819066553586E-006
89 3.5676414890935086E-006
90 3.2408686962536598E-006
91 2.9436098531714777E-006
92 2.6731879338693330E-006
93 2.4271750801384794E-006
94 2.2033681015580162E-006
95 1.9997666008135691E-006
96 1.8145534130165238E-006
97 1.6460770886038423E-006
98 1.4928361827763054E-006
99 1.3534651455843205E-006
100 1.2267216317977253E-006
101 1.1114750729309016E-006
102 1.0066963732594143E-006
103 9.1144860808961906E-007
104 8.2487861805891419E-007
105 7.4620940497668528E-007
106 6.7473324671810456E-007
107 6.0980545750098300E-007
108 5.5083872884395882E-007
109 4.9729799312444450E-007
110 4.4869575892107771E-007
111 4.0458787177585429E-007
112 3.6456965996078949E-007
113 3.2827242845258689E-007
114 2.9536026832557155E-007
115 2.6552715221570336E-007
116 2.3849428917172011E-007
117 2.1400771520012352E-007
118 1.9183609798172768E-007
119 1.7176873612963911E-007
120 1.5361373552607444E-007
121 1.3719634701231387E-007
122 1.2235745044898369E-007
123 1.0895217281928909E-007

log(x) is natural log. You need to use log10(x) if you want base 10.
Another, probably better way would be to use a logarithmic y axis like so:
set format y '%g'
set logscale y
plot 'solchange.txt' using 1:2 with lines
Use help set format to figure out how to change the y-axis tics.
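The mysterious -16 on the original axis falls directly out of the base mix-up: gnuplot's log() is the natural log, so the smallest value in the file (about 1e-7) maps to roughly -16, not -7. A quick Python check (illustrative only, not part of the gnuplot script):

```python
import math

# The original plot used log($2), which is the natural log.
# ln(1e-7) = -7 * ln(10) is about -16.1, exactly the -16 seen on the axis.
print(math.log(1e-7))    # ~ -16.118
print(math.log10(1e-7))  # -7.0
```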

Related

Calculate weighted average results for multiple columns based on another dataframe in Pandas

Let's say we have students' score data df1 and credit data df2 as follows:
df1:
stu_id major Python English C++
0 U202010521 computer 56 81 82
1 U202010522 management 92 56 64
2 U202010523 management 95 88 81
3 U202010524 BigData&AI 79 53 74
4 U202010525 computer 53 71 -1
5 U202010526 computer 78 96 53
6 U202010527 BigData&AI 69 63 74
7 U202010528 BigData&AI 86 57 82
8 U202010529 BigData&AI 81 100 85
9 U202010530 BigData&AI 79 67 80
df2:
class credit
0 Python 2
1 English 4
2 C++ 3
I need to calculate the weighted average of each student's scores.
df2['credit_ratio'] = df2['credit']/9
Out:
class credit credit_ratio
0 Python 2 0.222222
1 English 4 0.444444
2 C++ 3 0.333333
That is, for U202010521, the weighted score will be 56*0.22 + 81*0.44 + 82*0.33 = 75.02. I need to add each student's weighted_score as a new column. How can I do that in Pandas?
Try with set_index + mul then sum on axis=1:
df1['weighted_score'] = (
df1[df2['class']].mul(df2.set_index('class')['credit_ratio']).sum(axis=1)
)
df1:
stu_id major Python English C++ weighted_score
0 U202010521 computer 56 81 82 75.777778
1 U202010522 management 92 56 64 66.666667
2 U202010523 management 95 88 81 87.222222
3 U202010524 BigData&AI 79 53 74 65.777778
4 U202010525 computer 53 71 -1 43.000000
5 U202010526 computer 78 96 53 77.666667
6 U202010527 BigData&AI 69 63 74 68.000000
7 U202010528 BigData&AI 86 57 82 71.777778
8 U202010529 BigData&AI 81 100 85 90.777778
9 U202010530 BigData&AI 79 67 80 74.000000
Explanation:
By setting the index of df2 to class, multiplication will now align correctly with the columns of df1:
df2.set_index('class')['credit_ratio']
class
Python 0.222222
English 0.444444
C++ 0.333333
Name: credit_ratio, dtype: float64
Select the specific columns from df1 using the values from df2:
df1[df2['class']]
Python English C++
0 56 81 82
1 92 56 64
2 95 88 81
3 79 53 74
4 53 71 -1
5 78 96 53
6 69 63 74
7 86 57 82
8 81 100 85
9 79 67 80
Multiply to apply the weights:
df1[df2['class']].mul(df2.set_index('class')['credit_ratio'])
Python English C++
0 12.444444 36.000000 27.333333
1 20.444444 24.888889 21.333333
2 21.111111 39.111111 27.000000
3 17.555556 23.555556 24.666667
4 11.777778 31.555556 -0.333333
5 17.333333 42.666667 17.666667
6 15.333333 28.000000 24.666667
7 19.111111 25.333333 27.333333
8 18.000000 44.444444 28.333333
9 17.555556 29.777778 26.666667
Then sum across the columns of each row (axis=1) to get the total value.
df1[df2['class']].mul(df2.set_index('class')['credit_ratio']).sum(axis=1)
0 75.777778
1 66.666667
2 87.222222
3 65.777778
4 43.000000
5 77.666667
6 68.000000
7 71.777778
8 90.777778
9 74.000000
dtype: float64
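The one-liner can be verified end to end with a small self-contained sketch; the two-student frame below is trimmed from the question's data:

```python
import pandas as pd

# Trimmed reproduction of the question's data (first two students).
df1 = pd.DataFrame({
    "stu_id": ["U202010521", "U202010522"],
    "Python": [56, 92], "English": [81, 56], "C++": [82, 64],
})
df2 = pd.DataFrame({"class": ["Python", "English", "C++"], "credit": [2, 4, 3]})
df2["credit_ratio"] = df2["credit"] / df2["credit"].sum()

# Index the weights by class name so mul() aligns them with df1's columns,
# then sum each row to get the weighted score.
df1["weighted_score"] = (
    df1[df2["class"]].mul(df2.set_index("class")["credit_ratio"]).sum(axis=1)
)
print(df1["weighted_score"].round(6).tolist())  # [75.777778, 66.666667]
```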
I do it in several steps; the complete workflow is below:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(
"""stu_id major Python English C++
U202010521 computer 56 81 82
U202010522 management 92 56 64
U202010523 management 95 88 81
U202010524 BigData&AI 79 53 74
U202010525 computer 53 71 -1
U202010526 computer 78 96 53
U202010527 BigData&AI 69 63 74
U202010528 BigData&AI 86 57 82
U202010529 BigData&AI 81 100 85
U202010530 BigData&AI 79 67 80"""), sep=r"\s+")
df2 = pd.read_csv(StringIO(
"""class credit
Python 2
English 4
C++ 3"""), sep=r"\s+")
df2['credit_ratio'] = df2['credit']/9
df3 = df.melt(id_vars=["stu_id", "major"])
df3["credit_ratio"] = df3["variable"].map(df2[["class", "credit_ratio"]].set_index("class").to_dict()["credit_ratio"])
df3["G"] = df3["value"] * df3["credit_ratio"]
>>> df3.groupby("stu_id")["G"].sum()
stu_id
U202010521 75.777778
U202010522 66.666667
U202010523 87.222222
U202010524 65.777778
U202010525 43.000000
U202010526 77.666667
U202010527 68.000000
U202010528 71.777778
U202010529 90.777778
U202010530 74.000000
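The melt route can also be made self-contained, and finished off by mapping the per-student sums back onto the wide frame (the map-back step is an addition of mine, since the question asked for a new column; data trimmed to two students):

```python
import pandas as pd

# Trimmed reproduction of the question's data (first two students).
df = pd.DataFrame({
    "stu_id": ["U202010521", "U202010522"],
    "major": ["computer", "management"],
    "Python": [56, 92], "English": [81, 56], "C++": [82, 64],
})
df2 = pd.DataFrame({"class": ["Python", "English", "C++"], "credit": [2, 4, 3]})
df2["credit_ratio"] = df2["credit"] / 9

# Long format: one row per (student, class) pair.
df3 = df.melt(id_vars=["stu_id", "major"])
df3["credit_ratio"] = df3["variable"].map(df2.set_index("class")["credit_ratio"])
df3["G"] = df3["value"] * df3["credit_ratio"]

# Sum per student, then attach the result to the wide frame as a column.
weighted = df3.groupby("stu_id")["G"].sum()
df["weighted_score"] = df["stu_id"].map(weighted)
print(df["weighted_score"].round(6).tolist())  # [75.777778, 66.666667]
```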

how to sample from a data frame along with the original indexing?

I have a pandas dataframe and would like to use .sample(frac=0.1) and create a subset of the dataset. However, I would like to have the original indexing maintained in the subset. Is this possible?
If you need the original index values, just use your solution; sample preserves the original index by default:
df = pd.DataFrame({'a':range(10, 100)})
print (df.sample(frac=0.1))
a
60 70
64 74
70 80
63 73
40 50
57 67
77 87
30 40
66 76
If you need default index values, add DataFrame.reset_index with drop=True:
print (df.sample(frac=0.1).reset_index(drop=True))
a
0 87
1 92
2 47
3 81
4 68
5 75
6 14
7 80
8 34
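Both behaviours can be checked with a short sketch (random_state is added only so the sketch is reproducible; it is not needed for the index-preserving behaviour):

```python
import pandas as pd

df = pd.DataFrame({"a": range(10, 100)})

# sample() keeps each sampled row's original index label by default.
subset = df.sample(frac=0.1, random_state=1)
assert set(subset.index).issubset(set(df.index))
assert (df.loc[subset.index, "a"] == subset["a"]).all()

# reset_index(drop=True) renumbers the subset 0..n-1 instead.
renumbered = subset.reset_index(drop=True)
assert list(renumbered.index) == list(range(len(subset)))
```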

What can I do to run this BASIC Program?

I recently dug out an old book of mine, The Hawaiian Computer Mystery, published in 1985. There is a fragment of code in BASIC on page 81,
1 For N = 7 to 77
2 Print N, SQR(N) - INT (SQR [N] )
3 Next N
4 End
I can sort of see what it should do, but I can't get it to run. There's apparently an error in the second line, but I can't figure out what.
Assuming that the goal is to print the digits after the decimal point of the square root of each number, the issue is with the square brackets: they must be round. The following code:
1 For N = 7 to 77
2 Print N, SQR(N) - INT (SQR (N) )
3 Next N
4 End
(with a blank line at the end)
will produce the following result:
7 .64575124
8 .8284271
9 0
10 .1622777
11 .31662488
12 .46410155
13 .60555124
14 .7416575
15 .87298346
16 0
17 1.23105526E-1
18 .2426405
19 .35889912
20 .47213602
21 .5825758
22 .69041586
23 .7958317
24 .89897966
25 0
26 .09901953
27 .19615221
28 .29150248
29 .38516474
30 .47722578
31 .5677643
32 .65685415
33 .7445626
34 .8309517
35 .91608
36 0
37 .08276272
38 .16441393
39 .24499798
40 .3245554
41 .40312433
42 .48074055
43 .5574384
44 .63324976
45 .7082038
46 .78233004
47 .8556547
48 .9282031
49 0
50 .07106781
51 .14142847
52 .21110249
53 .28010988
54 .34846926
55 .41619825
56 .483315
57 .54983425
58 .6157732
59 .68114567
60 .7459669
61 .8102498
62 .8740077
63 .93725395
64 0
65 6.2257767E-2
66 .1240387
67 .18535233
68 .24621105
69 .30662346
70 .36660004
71 .42614937
72 .485281
73 .5440035
74 .60232544
75 .6602545
76 .71779823
77 .77496433
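For comparison, the same loop in Python (an equivalent sketch, not from the book); the zeros at the perfect squares 9, 16, 25, ... match the BASIC output:

```python
import math

# Python equivalent of the corrected BASIC loop:
# print N and the fractional part of sqrt(N) for N = 7 .. 77.
for n in range(7, 78):
    print(n, math.sqrt(n) - int(math.sqrt(n)))
```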

pandas read_csv not reading entire file

I have a really strange problem and don't know how to solve it.
I am using Ubuntu 18.04.2 together with Python 3.7.3 64-bit and use VScode as an editor.
I am reading data from a database and write it to a csv file with csv.writer
import pandas as pd
import csv
with open(raw_path + station + ".csv", "w+") as f:
    file = csv.writer(f)
    # Write header into csv
    colnames = [par for par in param]
    file.writerow(colnames)
    # Write data into csv
    for row in data:
        file.writerow(row)
This works perfectly fine, it provides a .csv file with all the data I read from the database up to the current timestep. However in a later working step I have to read this data to a pandas dataframe and merge it with another pandas dataframe. I read the files like this:
data1 = pd.read_csv(raw_path + file1, sep=',')
data2 = pd.read_csv(raw_path + file2, sep=',')
And then merge the data like this:
comb_data = pd.merge(data1, data2, on="datumsec", how="left").fillna(value=-999)
For 5 out of 6 locations everything works perfectly fine: the combined dataset has the same length as the two separate ones. However, for one location pd.read_csv seems not to read the csv files properly. I checked whether the problem is already in the database readout, but everything is OK there: I can open both files with Sublime and they have the same length. However, when I read them with pandas.read_csv, one shows fewer lines. The best part is that this problem appears totally randomly. Sometimes it works and reads the entire file, sometimes not. AND it occurs at different locations in the file: sometimes it stops after approx. 20000 entries, sometimes at 45000, sometimes somewhere else, just totally randomly.
Here is an overview of my test output when I print all the lengths of the files
print(len(data1)): 57105
print(len(data2)): 57105
both values directly after read out from database, before writing it anywhere..
After saving the data as csv as described above and opening it in excel or sublime or anything I can confirm that the data contains 57105 rows. Everything is where it is supposed to be.
However if I try to read the data as with pd.read_csv
print(len(data1)): 48612
print(len(data2)): 57105
both values after reading in the data from the csv file
data1 48612
datumsec tl rf ff dd ffx
0 1538352000 46 81 75 288 89
1 1538352600 47 79 78 284 93
2 1538353200 45 82 79 282 93
3 1538353800 44 84 71 284 91
4 1538354400 43 86 77 288 96
5 1538355000 43 85 78 289 91
6 1538355600 46 80 79 286 84
7 1538356200 51 72 68 285 83
8 1538356800 52 71 68 281 73
9 1538357400 48 75 68 276 80
10 1538358000 45 78 62 271 76
11 1538358600 42 82 66 273 76
12 1538359200 43 81 70 274 78
13 1538359800 44 80 68 275 78
14 1538360400 45 78 66 279 72
15 1538361000 45 78 67 282 73
16 1538361600 43 79 63 275 71
17 1538362200 43 81 69 280 74
18 1538362800 42 80 70 281 76
19 1538363400 43 78 69 285 77
20 1538364000 43 78 71 285 77
21 1538364600 44 75 61 288 71
22 1538365200 45 73 56 290 62
23 1538365800 45 72 44 297 57
24 1538366400 44 73 51 286 57
25 1538367000 43 76 61 281 70
26 1538367600 40 79 66 284 73
27 1538368200 39 78 70 291 76
28 1538368800 38 80 71 287 81
29 1538369400 36 81 74 285 81
... ... .. ... .. ... ...
48582 1567738800 7 100 0 210 0
48583 1567739400 6 100 0 210 0
48584 1567740000 5 100 0 210 0
48585 1567740600 6 100 0 210 0
48586 1567741200 4 100 0 210 0
48587 1567741800 4 100 0 210 0
48588 1567742400 5 100 0 210 0
48589 1567743000 4 100 0 210 0
48590 1567743600 4 100 0 210 0
48591 1567744200 4 100 0 209 0
48592 1567744800 4 100 0 209 0
48593 1567745400 5 100 0 210 0
48594 1567746000 6 100 0 210 0
48595 1567746600 5 100 0 210 0
48596 1567747200 5 100 0 210 0
48597 1567747800 5 100 0 210 0
48598 1567748400 5 100 0 210 0
48599 1567749000 6 100 0 210 0
48600 1567749600 6 100 0 210 0
48601 1567750200 5 100 0 210 0
48602 1567750800 4 100 0 210 0
48603 1567751400 5 100 0 210 0
48604 1567752000 6 100 0 210 0
48605 1567752600 7 100 0 210 0
48606 1567753200 6 100 0 210 0
48607 1567753800 5 100 0 210 0
48608 1567754400 6 100 0 210 0
48609 1567755000 7 100 0 210 0
48610 1567755600 7 100 0 210 0
48611 1567756200 7 100 0 210 0
[48612 rows x 6 columns]
datumsec tl rf schnee ival6
0 1538352000 115 61 25 107
1 1538352600 115 61 25 107
2 1538353200 115 61 25 107
3 1538353800 115 61 25 107
4 1538354400 115 61 25 107
5 1538355000 115 61 25 107
6 1538355600 115 61 25 107
7 1538356200 115 61 25 107
8 1538356800 115 61 25 107
9 1538357400 115 61 25 107
10 1538358000 115 61 25 107
11 1538358600 115 61 25 107
12 1538359200 115 61 25 107
13 1538359800 115 61 25 107
14 1538360400 115 61 25 107
15 1538361000 115 61 25 107
16 1538361600 115 61 25 107
17 1538362200 115 61 25 107
18 1538362800 115 61 25 107
19 1538363400 115 61 25 107
20 1538364000 115 61 25 107
21 1538364600 115 61 25 107
22 1538365200 115 61 25 107
23 1538365800 115 61 25 107
24 1538366400 115 61 25 107
25 1538367000 115 61 25 107
26 1538367600 115 61 25 107
27 1538368200 115 61 25 107
28 1538368800 115 61 25 107
29 1538369400 115 61 25 107
... ... ... ... ... ...
57075 1572947400 -23 100 -2 -999
57076 1572948000 -23 100 -2 -999
57077 1572948600 -22 100 -2 -999
57078 1572949200 -23 100 -2 -999
57079 1572949800 -24 100 -2 -999
57080 1572950400 -23 100 -2 -999
57081 1572951000 -21 100 -1 -999
57082 1572951600 -21 100 -1 -999
57083 1572952200 -23 100 -1 -999
57084 1572952800 -23 100 -1 -999
57085 1572953400 -22 100 -1 -999
57086 1572954000 -23 100 -1 -999
57087 1572954600 -22 100 -1 -999
57088 1572955200 -24 100 0 -999
57089 1572955800 -24 100 0 -999
57090 1572956400 -25 100 0 -999
57091 1572957000 -26 100 -1 -999
57092 1572957600 -26 100 -1 -999
57093 1572958200 -27 100 -1 -999
57094 1572958800 -25 100 -1 -999
57095 1572959400 -27 100 -1 -999
57096 1572960000 -29 100 -1 -999
57097 1572960600 -28 100 -1 -999
57098 1572961200 -28 100 -1 -999
57099 1572961800 -27 100 -1 -999
57100 1572962400 -29 100 -2 -999
57101 1572963000 -29 100 -2 -999
57102 1572963600 -29 100 -2 -999
57103 1572964200 -30 100 -2 -999
57104 1572964800 -28 100 -2 -999
[57105 rows x 5 columns]
To me there is no obvious reason in the data why pandas should have problems reading the entire file, and apparently there is none, considering that sometimes it reads the entire file and sometimes not.
I am really clueless about this. Do you have any idea what the problem could be and how to cope with it?
I finally solved my problem, and as expected it was not within the file itself. I am using multiprocessing to run the named functions and some other things in parallel. The reading from the database plus writing to the csv file, and the reading from the csv file, are performed in two different processes. Therefore the second process (reading from csv) did not know that the csv file was still being written, and read only what was already available in it. Because the file had been opened by a different process, no exception was thrown when opening it.
I thought I had already taken care of this, but obviously not thoroughly enough to exclude every possible case.
I had exactly the same problem with a different application and also did not understand what was wrong, because sometimes it worked and sometimes it didn't.
In a for loop, I was extracting the last two rows of a dataframe that I was creating in the same file. Sometimes the extracted rows were not the last two at all, but most of the time it worked fine. I guess the program started extracting the last two rows before the writing process was done.
I paused the script for half a second to make sure the writing process is done:
import time
time.sleep(0.5)
However, I don't think this is a very elegant solution, since it might not be sufficient if somebody runs the script on a slower computer.
Vroni, how did you solve this in the end? Is there a way to specify that a certain task must not run in parallel with other tasks? I did not define anything about parallel processing in my program, so if this is the cause it must be happening automatically.

Reading in multidigit command line parameter

I'm learning J and have modified a tutorial into a jconsole script invoked by ./knight.j N to return as output the Knight's tour for an NxN board.
#!/usr/local/bin/j
kmoves=: 3 : 0
t=. (>,{;~i.y) +"1/ _2]\2 1 2 _1 1 2 1 _2 _1 2 _1 _2 _2 1 _2 _1
(*./"1 t e. i.y) <##"1 y#.t
)
ktour=: 3 : 0
M=. >kmoves y
p=. k=. 0
b=. 1 $~ *:y
for. i.<:*:y do.
b=. 0 k}b
p=. p,k=. ((i.<./) +/"1 b{~j{M){j=. ({&b # ]) k{M
end.
assert. ~:p
(,~y)$/:p
)
echo ktour 0".>2}.ARGV
exit''
However, I'm having difficulty in handling ARGV for numbers greater than 9. The script works correctly with single digit input:
$ ./knight.j 8
0 25 14 23 28 49 12 31
15 22 27 50 13 30 63 48
26 1 24 29 62 59 32 11
21 16 51 58 43 56 47 60
2 41 20 55 52 61 10 33
17 38 53 42 57 44 7 46
40 3 36 19 54 5 34 9
37 18 39 4 35 8 45 6
But fails on double digit input:
$ ./knight.j 10
|length error: kmoves
| (*./"1 t e.i.y)<##"1 y #.t
ARGV
┌─────────────────┬──────────┬──┐
│/Users/v64/.bin/j│./knight.j│10│
└─────────────────┴──────────┴──┘
It works if I separate the digits of the parameter into different arguments:
$ ./knight.j 1 0
0 17 96 67 14 19 84 35 12 21
99 64 15 18 97 68 13 20 37 34
16 1 98 95 66 85 36 83 22 11
63 92 65 86 81 94 69 72 33 38
2 87 90 93 76 71 82 39 10 23
91 62 53 78 89 80 75 70 73 32
44 3 88 61 52 77 40 59 24 9
47 50 45 54 79 60 27 74 31 58
4 43 48 51 6 41 56 29 8 25
49 46 5 42 55 28 7 26 57 30
ARGV
┌─────────────────┬──────────┬─┬─┐
│/Users/v64/.bin/j│./knight.j│1│0│
└─────────────────┴──────────┴─┴─┘
I understand conceptually why this works, but I can't figure out how to modify the script to accept "10" as a single argument.
Thanks for the additional information on ARGV. I think the issue is that 0 ". > 2 }. ARGV is a list of length 1 when '10' is in the third box, but an atom with empty shape when '9' is in the third box.
ARGV=: '/Users/v64/.bin/j';'./knight.j';'10'
ARGV
┌─────────────────┬──────────┬──┐
│/Users/v64/.bin/j│./knight.j│10│
└─────────────────┴──────────┴──┘
$ 0 ".>2}. ARGV NB. 1 item list
1
0 ".>2}. ARGV
10
ARGV=: '/Users/v64/.bin/j';'./knight.j';'9'
$ 0 ".>2}. ARGV NB. atom with empty shape
0 ".>2}. ARGV
9
You can change the shape of the '10' result by using {. on the length 1 list to make it an atom and I think you will find that your verb now works for double digits.
ARGV=: '/Users/v64/.bin/j';'./knight.j';'10'
ARGV
┌─────────────────┬──────────┬──┐
│/Users/v64/.bin/j│./knight.j│10│
└─────────────────┴──────────┴──┘
$ {. 0 ".>2}. ARGV NB. Atom with empty shape
{. 0 ".>2}. ARGV
10
I don't imagine this was the reason you expected, but it does happen from time to time that results that look like atoms are actually 1-item lists, which can result in length errors.
Hope this helps.
