Wrong number of data blocks given by stats command - gnuplot

I am using the following Gnuplot script with Gnuplot version 4.6 patchlevel 5:
##### Prologue #####
clear   # erases the current screen or output device
reset   # all graph-related options take on their default values

##### Plot options #####
set style data lines
set surface
set dgrid3d 64,64 qnorm 2
set hidden3d
set ticslevel 0.8
set isosample 40,40
set view 60, 30, 1, 1
set contour base

##### Plot data #####
stats 'modele.out'
do for [i=0:int(STATS_blocks-1)] {
    set key title 'density at t'.i
    splot 'modele.out' every :::i::i using 2:3:5 notitle
    pause 0.5
}
and the following data file:
# Time is 0.000000000000000E+000
0.0000 0.0000 0.0000 0.9787 0.0213
0.0000 0.0000 0.3333 0.9077 0.0923
0.0000 0.0000 0.6667 0.9738 0.0262
0.0000 0.0000 1.0000 0.9673 0.0327
0.0000 0.3333 0.0000 0.9044 0.0956
0.0000 0.3333 0.3333 0.9246 0.0754
0.0000 0.3333 0.6667 0.9181 0.0819
0.0000 0.3333 1.0000 0.9089 0.0911
0.0000 0.6667 0.0000 0.9348 0.0652
0.0000 0.6667 0.3333 0.9372 0.0628
0.0000 0.6667 0.6667 0.9933 0.0067
0.0000 0.6667 1.0000 0.9273 0.0727
0.0000 1.0000 0.0000 0.9909 0.0091
0.0000 1.0000 0.3333 0.9771 0.0229
0.0000 1.0000 0.6667 0.9014 0.0986
0.0000 1.0000 1.0000 0.9312 0.0688
# Time is 50.0000000000000
50.0000 0.0000 0.0000 0.1036 0.8370
50.0000 0.0000 0.3333 0.1036 0.9093
50.0000 0.0000 0.6667 0.1031 0.9368
50.0000 0.0000 1.0000 0.1042 0.8378
50.0000 0.3333 0.0000 0.1034 0.9556
50.0000 0.3333 0.3333 0.1039 0.9127
50.0000 0.3333 0.6667 0.1041 0.9761
50.0000 0.3333 1.0000 0.1041 0.9587
50.0000 0.6667 0.0000 0.1033 0.9432
50.0000 0.6667 0.3333 0.1043 0.9503
50.0000 0.6667 0.6667 0.1087 0.5931
50.0000 0.6667 1.0000 0.1057 0.9579
50.0000 1.0000 0.0000 0.1044 0.8390
50.0000 1.0000 0.3333 0.1046 0.9101
50.0000 1.0000 0.6667 0.1062 0.9597
50.0000 1.0000 1.0000 0.1063 0.8494
# Time is 100.000000000000
100.0000 0.0000 0.0000 0.0997 0.8433
100.0000 0.0000 0.3333 0.0998 0.9123
100.0000 0.0000 0.6667 0.0995 0.9501
100.0000 0.0000 1.0000 0.0999 0.8442
100.0000 0.3333 0.0000 0.0999 0.9593
100.0000 0.3333 0.3333 0.1000 0.9157
100.0000 0.3333 0.6667 0.1000 0.9794
100.0000 0.3333 1.0000 0.1002 0.9612
100.0000 0.6667 0.0000 0.0997 0.9534
100.0000 0.6667 0.3333 0.1000 0.9542
100.0000 0.6667 0.6667 0.1001 0.6028
100.0000 0.6667 1.0000 0.1004 0.9584
100.0000 1.0000 0.0000 0.1000 0.8448
100.0000 1.0000 0.3333 0.1002 0.9143
100.0000 1.0000 0.6667 0.1005 0.9571
100.0000 1.0000 1.0000 0.1006 0.8490
I don't understand why the stats command reports that I have only 1 data block. In my opinion it should be 3. Is the file badly formatted?

stats gives you the number of indexable blocks in your data file. These blocks are separated by pairs of blank records (i.e. two blank lines).
If you did plot 'modele.out' index 0 you would find that it plotted all of your data points, whereas index 1 would give you an error. There is only one (indexable) block in your data.
The solution:
- separate your blocks by two blank lines, and
- change your splot command to splot 'modele.out' index i using 2:3:5 notitle
When you are using splot, a single blank line separates each row (or "datablock", to use the term in the manual). This isn't the same thing as a block! In all other contexts (as far as I'm aware) there are two blank lines between blocks (or "indexable blocks", to use the term in the manual).
Update
As suggested by Christoph in the comments, if you want to keep your file in the same format and are sure that there are no blank lines at the end, you can change your loop to this:
do for [i=0:STATS_blank] {
and use your original splot line (with every, rather than index).
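For instance, if you generate modele.out yourself, you can write the two blank lines between time blocks as you go. Here is a minimal Python sketch; the file name and grid layout are assumptions based on the data shown above, and the values are random stand-ins:

import numpy as np

times = [0.0, 50.0, 100.0]
with open('modele.out', 'w') as f:
    for t in times:
        f.write(f"# Time is {t}\n")
        for y in np.linspace(0, 1, 4):
            for z in np.linspace(0, 1, 4):
                d = np.random.rand()
                f.write(f"{t:.4f} {y:.4f} {z:.4f} {1 - d:.4f} {d:.4f}\n")
        f.write("\n\n")  # two blank lines make each time step an indexable block

With the file written this way, stats 'modele.out' reports STATS_blocks = 3 and the index-based splot loop works as intended.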

Related

How to solve a NaN error when giving columns specific names

I have many text files that include data as follows:
350.0 2.1021 0.0000 1.4769 0.0000
357.0 2.0970 0.0000 1.4758 0.0000
364.0 2.0920 0.0000 1.4747 0.0000
371.0 2.0874 0.0000 1.4737 0.0000
I need to give each column a specific name (e.g. a, b, c, d, e):
a b c d e
350.0 2.1021 0.0000 1.4769 0.0000
357.0 2.0970 0.0000 1.4758 0.0000
364.0 2.0920 0.0000 1.4747 0.0000
371.0 2.0874 0.0000 1.4737 0.0000
After that I will split the columns and use them separately.
I wrote this code:
import glob
import pandas as pd

input_files = glob.glob('input/*.txt')
for file_name in input_files:
    data = pd.read_csv(file_name)
    columns_list = ["a", "b", "c", "d", "e"]
    data_list = pd.DataFrame(data, columns=columns_list)
    print(data_list)
The result is:
a b c d e
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
Could you please help me?
You can assign the column names while reading the CSV file:
data = pd.read_csv(file_name, names=columns_list)
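Note that the sample rows are whitespace-separated rather than comma-separated, so you will likely also need to set the delimiter. A minimal sketch of the whole loop under that assumption:

import glob
import pandas as pd

columns_list = ["a", "b", "c", "d", "e"]
for file_name in glob.glob('input/*.txt'):
    # sep=r'\s+' splits on runs of whitespace; names= assigns the headers
    data = pd.read_csv(file_name, sep=r'\s+', names=columns_list)
    print(data)
    # each column is now addressable by name, e.g. data['a']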

Why are non-appearing classes shown in the classification report?

I'm working on NER and using sklearn.metrics.classification_report to calculate micro and macro F1 scores. It printed a table like:
              precision    recall  f1-score   support

           0     0.0000    0.0000    0.0000         0
           3     0.0000    0.0000    0.0000         0
           4     0.8788    0.9027    0.8906       257
           5     0.9748    0.9555    0.9650      1617
           6     0.9862    0.9888    0.9875      1156
           7     0.9339    0.9138    0.9237       835
           8     0.8542    0.7593    0.8039       216
           9     0.8945    0.8575    0.8756       702
          10     0.9428    0.9382    0.9405      1668
          11     0.9234    0.9139    0.9186      1661

    accuracy                         0.9285      8112
   macro avg     0.7388    0.7230    0.7305      8112
weighted avg     0.9419    0.9285    0.9350      8112
Obviously the predicted labels contain '0' and '3', but there is no '0' or '3' in the true labels. Why does the classification report show these two classes, which don't have any samples? And what can I do to prevent zero-support classes from being shown? It seems that these two classes have a great impact on the macro F1 score.
You can use the following snippet to ensure that the classification report only includes labels that are present in y_true:
from sklearn.metrics import classification_report
import numpy as np

y_true = [0, 1, 2, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1, 42]
print(classification_report(y_true, y_pred, labels=np.unique(y_true)))
which outputs:
              precision    recall  f1-score   support

           0       0.50      1.00      0.67         1
           1       0.00      0.00      0.00         1
           2       1.00      0.50      0.67         4

   micro avg       0.60      0.50      0.55         6
   macro avg       0.50      0.50      0.44         6
weighted avg       0.75      0.50      0.56         6
As you can see, the label 42, which is present in the predictions, is not shown because it has no support in y_true.
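The same label restriction works when computing the macro F1 directly. A minimal sketch reusing the toy data above (sklearn will warn about the ill-defined label 42 in the unrestricted case):

from sklearn.metrics import f1_score
import numpy as np

y_true = [0, 1, 2, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1, 42]

# averaged over the union of labels, including the spurious 42
print(f1_score(y_true, y_pred, average='macro'))
# averaged only over labels that actually occur in y_true
print(f1_score(y_true, y_pred, labels=np.unique(y_true), average='macro'))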

Plot variable 1 against variable 2 with curves grouped by variable 3 in Python

I have this very simple problem which I can't figure out in Python. I have three columns in a dataset. The first is made of integers (from 0 to 19), the second of dates in Y/M/D format, and the third of numbers ranging from negative to positive values (mostly 0s, but around 200 negative and positive values in total).
My dataset looks like this:
Groups date values
0 2020-02-22 0.0000
2020-02-23 0.0000
2020-02-26 0.0000
2020-03-28 0.0000
2020-04-13 1.3433
2020-04-14 0.0000
2020-04-15 0.0000
2020-04-16 0.0000
2020-04-17 -1.3933
2020-04-28 0.0000
2020-05-31 0.0000
2020-06-15 0.0000
2020-08-02 0.0000
1 2020-02-21 0.0000
2020-02-22 0.0000
2020-02-23 0.0000
2020-02-24 0.0000
2020-02-25 0.0000
2020-04-29 0.0000
2020-06-01 0.4404
2020-06-02 0.4404
2020-06-07 0.0000
2 2020-02-22 0.0000
2020-02-23 0.0000
2020-02-24 0.0000
2020-02-28 0.0000
2020-03-01 0.0000
2020-03-07 0.0000
2020-03-08 0.0000
2020-03-14 0.0000
I want to plot curves grouped by column Groups, with the dates on the x axis and the third column ("values") on the y axis. In other words, I want a curve for each of the 20 possible groups (0 to 19) which goes up/down depending on the values of the third column, "value" (the 0s, positive, and negative numbers), all the while keeping the dates on the x axis.
I know how to do this very easily with ggplot in R, but this project is all Python-based and for some reason I just can't find out how to do it there.
Thanks for the help.
It looks like Groups and date are the two levels of your dataframe's index, in which case you can do:
df['values'].unstack('Groups').plot()
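If Groups and date are ordinary columns rather than index levels, a pivot gets you the same plot. A minimal sketch with made-up rows matching the question's column names:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Groups': [0, 0, 1, 1],
    'date': pd.to_datetime(['2020-02-22', '2020-04-13',
                            '2020-02-21', '2020-06-01']),
    'values': [0.0, 1.3433, 0.0, 0.4404],
})

# one line per group: dates on the x axis, values on the y axis
df.pivot(index='date', columns='Groups', values='values').plot()
plt.ylabel('values')
plt.show()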

How to grep rows that have value less than 0.2 in a specific column?

ID RT EZ Z0 Z1 Z2 RHO PHE
1889 UN NA 1.0000 0.0000 0.0000 0.8765 -1
1890 UN NA 1.0000 0.0000 0.0000 0.4567 -1
1891 UN NA 1.0000 0.0000 0.0000 0.0012 -1
1892 UN NA 1.0000 0.0000 0.0000 0.1011 -1
I would like to grep all the IDs that have a value less than 0.2 in the 'RHO' column, with the other columns included for the selected rows.
Use awk directly by saying awk '$field < value':
$ awk '$7<0.2' file
1891 UN NA 1.0000 0.0000 0.0000 0.0012 -1
1892 UN NA 1.0000 0.0000 0.0000 0.1011 -1
As RHO is column 7, it checks that field.
If you just want to print a specific column, say awk '$field < value {print $another_field}'. For the ID:
$ awk '$7<0.2 {print $1}' file
1891
1892
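For comparison, if you would rather do this in Python, the same filter is a one-liner in pandas. A minimal sketch, assuming the file is whitespace-delimited with the header row shown above:

import pandas as pd

df = pd.read_csv('file', sep=r'\s+')   # header row supplies the column names
print(df.loc[df['RHO'] < 0.2, 'ID'])   # IDs of rows with RHO < 0.2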

Problems with texture coordinates in OBJ format

As far as I know, texture coordinates should be in the range [0,1], but in the following OBJ file the texture coordinates seem to be in the range [0,2]:
vt 2.0000 2.0000 0.0000
vt 1.7500 2.0000 0.0000
vt 1.7500 1.9750 0.0000
vt 2.0000 1.9750 0.0000
vt 1.7500 1.9500 0.0000
vt 2.0000 1.9500 0.0000
vt 1.7500 1.9250 0.0000
vt 2.0000 1.9250 0.0000
vt 1.7500 1.9000 0.0000
vt 2.0000 1.9000 0.0000
vt 1.5000 2.0000 0.0000
vt 1.5000 1.9750 0.0000
vt 1.5000 1.9500 0.0000
vt 1.5000 1.9250 0.0000
vt 1.5000 1.9000 0.0000
vt 1.2500 2.0000 0.0000
vt 1.2500 1.9750 0.0000
vt 1.2500 1.9500 0.0000
vt 1.2500 1.9250 0.0000
vt 1.2500 1.9000 0.0000
vt 1.0000 2.0000 0.0000
vt 1.0000 1.9750 0.0000
vt 1.0000 1.9500 0.0000
vt 1.0000 1.9250 0.0000
Why can the texture coordinates here be greater than 1? Can anybody explain it to me? Thanks!
(Screenshot: the rendered texture looks weird.)
UV texture values outside [0,1] are expected to be tiled.
I've never seen this explicitly stated in any of the informal OBJ specs floating about, but it mimics the behavior of OpenGL.
Here is a relevant quote from the OpenGL Red Book:
You can assign texture coordinates outside the range [0,1] and have them either clamp or repeat in the texture map. With repeating textures, if you have a large plane with texture coordinates running from 0.0 to 10.0 in both directions, for example, you'll get 100 copies of the texture tiled together on the screen. During repeating, the integer part of texture coordinates is ignored, and copies of the texture map tile the surface. For most applications where the texture is to be repeated, the texels at the top of the texture should match those at the bottom, and similarly for the left and right edges.
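In code, this behavior is selected by the texture wrap mode. A minimal PyOpenGL sketch (assuming a 2D texture has already been created and bound):

from OpenGL.GL import (glTexParameteri, GL_TEXTURE_2D,
                       GL_TEXTURE_WRAP_S, GL_TEXTURE_WRAP_T,
                       GL_REPEAT, GL_CLAMP_TO_EDGE)

# GL_REPEAT tiles the texture: only the fractional part of a coordinate
# is used, so vt 1.75 samples the same texel as vt 0.75
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT)
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT)

# GL_CLAMP_TO_EDGE would instead clamp coordinates into [0,1]:
# glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE)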
