I have the following glyph path:
<glyph glyph-name="right-nav-workflow" unicode="" d="M251 101c0 65 0 131 0 196 26 0 52 0 78 0 0 27 0 54 0 81-29 7-52 23-67 50-11 20-14 41-10 64 8 45 48 79 92 81 49 1 88-29 102-79 2 0 4 0 6 0 20 0 40 0 60 0 5 0 8 1 11 5 32 32 64 64 96 96 2 1 3 3 4 4 42-42 84-84 127-127-1 0-2-2-4-4-32-32-65-65-98-98-2-2-4-6-4-10 0-21 0-43 0-64 30-8 53-24 67-52 12-21 15-44 9-68-11-46-55-78-102-75-48 3-88 42-92 91-4 48 28 91 78 104 0 1 1 3 1 4 0 21 0 42 0 62 0 3-2 5-3 7-28 28-55 55-83 83-1 1-3 3-5 3-22 0-45 0-68 0-5-19-13-36-27-50-14-14-31-23-50-27 0-27 0-53 0-81 26 0 52 0 78 0 0-65 0-130 0-196-65 0-131 0-196 0z m157 156c-40 0-79 0-118 0 0-39 0-78 0-117 40 0 79 0 118 0 0 39 0 78 0 117z m275-58c0 33-26 59-59 59-32 0-59-26-59-59 0-33 27-59 60-59 32 0 58 26 58 59z m-334 216c33 0 59 27 59 59 0 33-26 59-59 59-33 0-59-26-59-59 0-33 26-59 59-59z m275-11c23 23 46 46 69 69-23 23-47 46-69 68-23-22-46-46-69-69 23-22 46-45 69-68z" horiz-adv-x="1000" />
How can I convert this into SVG? I tried saving it in a text file with an .svg extension, but it does not seem to work.
Take the raw drawing data, put it in a path element, remove the glyph-specific attributes, and add an appropriate viewBox. Now you have something that works as an inline SVG. If you want to save it as a standalone SVG file, you also need to add a namespace declaration (xmlns="http://www.w3.org/2000/svg") on the svg element.
<svg height="400px" width="400px" viewBox="0 0 1000 1000">
<path transform="scale(1,-1) translate(0,-650)" fill="none" stroke="red" stroke-width="1" d="M251 101c0 65 0 131 0 196 26 0 52 0 78 0 0 27 0 54 0 81-29 7-52 23-67 50-11 20-14 41-10 64 8 45 48 79 92 81 49 1 88-29 102-79 2 0 4 0 6 0 20 0 40 0 60 0 5 0 8 1 11 5 32 32 64 64 96 96 2 1 3 3 4 4 42-42 84-84 127-127-1 0-2-2-4-4-32-32-65-65-98-98-2-2-4-6-4-10 0-21 0-43 0-64 30-8 53-24 67-52 12-21 15-44 9-68-11-46-55-78-102-75-48 3-88 42-92 91-4 48 28 91 78 104 0 1 1 3 1 4 0 21 0 42 0 62 0 3-2 5-3 7-28 28-55 55-83 83-1 1-3 3-5 3-22 0-45 0-68 0-5-19-13-36-27-50-14-14-31-23-50-27 0-27 0-53 0-81 26 0 52 0 78 0 0-65 0-130 0-196-65 0-131 0-196 0z m157 156c-40 0-79 0-118 0 0-39 0-78 0-117 40 0 79 0 118 0 0 39 0 78 0 117z m275-58c0 33-26 59-59 59-32 0-59-26-59-59 0-33 27-59 60-59 32 0 58 26 58 59z m-334 216c33 0 59 27 59 59 0 33-26 59-59 59-33 0-59-26-59-59 0-33 26-59 59-59z m275-11c23 23 46 46 69 69-23 23-47 46-69 68-23-22-46-46-69-69 23-22 46-45 69-68z"/>
</svg>
Update: Robert points out that the SVG Fonts spec uses a y axis that points up (0 at the bottom), the inverse of the normal SVG convention (0 at the top), so you also need to flip the drawing vertically, i.e. mirror it across the x axis, with scale(1,-1).
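The steps above (keep only the outline, add a viewBox and the namespace, flip the y axis) can be scripted. This is a minimal sketch using Python's standard xml.etree.ElementTree; glyph_to_svg is a hypothetical helper, the 650 offset is simply the value used in the snippet above, and em_size=1000 assumes the font uses 1000 units per em. For a real font, the ascent and units-per-em of its font-face element are a better starting point than a hand-picked offset.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize the SVG namespace as the default xmlns="..." (a global ElementTree setting).
ET.register_namespace("", SVG_NS)


def glyph_to_svg(glyph_xml, y_offset=650, em_size=1000, fill="black"):
    """Wrap the outline of an SVG-font <glyph> element in a standalone <svg> document.

    y_offset and em_size are assumptions (see the lead-in), not values read from the font.
    """
    glyph = ET.fromstring(glyph_xml)
    d = glyph.get("d")
    if not d:
        raise ValueError("glyph has no 'd' attribute")
    width = glyph.get("horiz-adv-x", str(em_size))  # advance width -> viewBox width
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "viewBox": f"0 0 {width} {em_size}",
        "width": "400",
        "height": "400",
    })
    # Only the outline is copied; glyph-name, unicode, etc. are dropped.
    ET.SubElement(svg, f"{{{SVG_NS}}}path", {
        "d": d,
        "fill": fill,
        # Font glyphs use y-up coordinates; flip them for SVG's y-down system.
        "transform": f"scale(1,-1) translate(0,{-y_offset:g})",
    })
    return ET.tostring(svg, encoding="unicode")
```

Writing the returned string to a file such as icon.svg gives a standalone document; unlike the inline snippet, it already carries the xmlns declaration.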
I have the following DataFrame called df_cam_cb_days:
3m 6m 9m 1y 18m 24m Effective
2021-03-30 49 49 49 49 49 49 2021-03-31
2021-05-13 40 44 44 44 44 44 2021-05-14
2021-06-08 0 26 26 26 26 26 2021-06-09
2021-07-14 0 36 36 36 36 36 2021-07-15
2021-08-31 0 26 48 48 48 48 2021-09-01
2021-10-13 0 0 43 43 43 43 2021-10-14
2021-12-14 0 0 27 62 62 62 2021-12-15
2022-01-26 0 0 0 43 43 43 2022-01-27
2022-03-30 0 0 0 14 63 63 2022-03-31
2022-05-11 0 0 0 0 42 42 2022-05-12
2022-06-08 0 0 0 0 28 28 2022-06-09
2022-07-13 0 0 0 0 35 35 2022-07-14
2022-08-31 0 0 0 0 27 49 2022-09-01
2022-10-12 0 0 0 0 0 42 2022-10-13
2022-12-14 0 0 0 0 0 63 2022-12-15
2023-01-25 0 0 0 0 0 42 2023-01-26
2023-02-10 0 0 0 0 0 15 2023-02-11
and I have the following function that receives the DataFrame and an array:
import numpy as np

mon_policy = np.array([.5, .75, .75, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

# Returns a NumPy array with the breakeven rate for each tenor
def cam_be_mon(mp, df):
    columns = ['3m', '6m', '9m', '1y', '18m', '24m']
    days_array = df[columns].sum(axis=0).values  # total days per tenor
    data_array = df[columns].values.T            # shape: (tenors, meetings)
    c = np.log(mp / 36000 + 1)                   # per-day log-growth for each policy rate
    be = np.dot(data_array, c)                   # accumulated log-growth per tenor
    be = (np.exp(be) - 1) * 36000 / days_array
    return be

target = np.array([.3525, .415, .475, .56, .715, .916366])
cam_be_mon(mon_policy, df_cam_cb_days)
As written, the function returns: array([0.61281788, 0.76943154, 0.84886388, 0.88890188, 0.92955637, 0.95151633])
I need to find the values of mon_policy for which the function returns target, or the closest match if no exact solution exists.
I found the answer with scipy.optimize
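Because cam_be_mon only applies exp to a sum that is linear in log(1 + mp/36000), each target breakeven can also be mapped back to one linear equation, so an iterative optimizer is not strictly needed. The sketch below is an alternative to scipy.optimize, not the method used above: DAYS, breakeven, solve and fit_policy are hypothetical names, and DAYS is the 3m..24m block of df_cam_cb_days copied by hand. With 17 unknowns and 6 equations there are infinitely many exact solutions; taking the minimum-norm one in log-growth space is an arbitrary choice made here for illustration, and the rates it returns are not guaranteed to be sensible (for example nonnegative or monotone).

```python
import math

# Days of each policy window (rows, one per meeting) counted towards each
# tenor (columns: 3m, 6m, 9m, 1y, 18m, 24m), copied from df_cam_cb_days.
DAYS = [
    [49, 49, 49, 49, 49, 49], [40, 44, 44, 44, 44, 44], [0, 26, 26, 26, 26, 26],
    [0, 36, 36, 36, 36, 36], [0, 26, 48, 48, 48, 48], [0, 0, 43, 43, 43, 43],
    [0, 0, 27, 62, 62, 62], [0, 0, 0, 43, 43, 43], [0, 0, 0, 14, 63, 63],
    [0, 0, 0, 0, 42, 42], [0, 0, 0, 0, 28, 28], [0, 0, 0, 0, 35, 35],
    [0, 0, 0, 0, 27, 49], [0, 0, 0, 0, 0, 42], [0, 0, 0, 0, 0, 63],
    [0, 0, 0, 0, 0, 42], [0, 0, 0, 0, 0, 15],
]
N_TENORS = len(DAYS[0])
TOTALS = [sum(row[j] for row in DAYS) for j in range(N_TENORS)]


def breakeven(mp):
    """Pure-Python equivalent of cam_be_mon(mp, df_cam_cb_days)."""
    result = []
    for j in range(N_TENORS):
        log_growth = sum(row[j] * math.log(r / 36000 + 1) for row, r in zip(DAYS, mp))
        result.append((math.exp(log_growth) - 1) * 36000 / TOTALS[j])
    return result


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_policy(target):
    """Minimum-norm policy path (in log-growth space) that reproduces target exactly."""
    # Undo the outer exp: for each tenor j,
    #   sum_i DAYS[i][j] * x_i = log(1 + target_j * TOTALS[j] / 36000),
    # where x_i = log(1 + mp_i / 36000) is linear in the unknowns.
    y = [math.log(1 + t * d / 36000) for t, d in zip(target, TOTALS)]
    A = [[row[j] for row in DAYS] for j in range(N_TENORS)]  # 6 x 17
    gram = [[sum(p * q for p, q in zip(A[i], A[k])) for k in range(N_TENORS)]
            for i in range(N_TENORS)]
    w = solve(gram, y)  # x = A^T (A A^T)^-1 y is the minimum-norm solution
    x = [sum(A[j][i] * w[j] for j in range(N_TENORS)) for i in range(len(DAYS))]
    return [36000 * (math.exp(xi) - 1) for xi in x]
```

If you need bounds or a particular shape for the path, a constrained solver such as scipy.optimize.least_squares (which accepts a bounds argument) applied to cam_be_mon(mp, df) - target is the more flexible route; the linearized form above can still provide a starting point.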
I am trying to plot the result of an experiment with the following gnuplot script:
set encoding iso_8859_1
set key right top font "Helvetica,17"
# set key left top font "Helvetica,18"
# set key at 100,1.25 bottom center font "Helvetica,17"
# set ylabel "Percentage of non-completed batches" font "Helvetica,18"
set ylabel "Extra frames" font "Helvetica,18"
set xlabel "Dispersion [%]" font "Helvetica,18"
unset key
set xtics font "Helvetica,18"
set ytics font "Helvetica,18"
set terminal postscript eps enhanced color #size 6.5in,3in #"Helvetica" 16 #
set output "contour.eps"
set palette rgbformulae 33,13,10
$complete << EOD
0 0 2.000400080016007
0 5 1.160696417850715
0 10 0.48028817290374226
0 15 0.5003001801080598
0 20 0.5201040208041574
0 25 0.9803921568627416
0 30 2.360472094418886
0 35 14.942988597719541
0 40 28.6972183309986
0 45 34.78783026421137
0 50 39.771817453963166
0 55 47.28837302381429
0 60 62.3124624924985
0 65 74.25940752602081
0 70 79.45589117823565
0 75 86.93477390956382
0 80 91.47829565913183
0 85 94.21884376875374
0 90 95.75915183036608
0 95 97.47949589917984
0 100 98.33966793358671
20 0 0
20 5 0
20 10 0
20 15 0
20 20 0
20 25 0
20 30 0
20 35 0
20 40 0
20 45 0
20 50 0
20 55 0
20 60 0.22022022022022414
20 65 0.26026026026025884
20 70 1.6012810248198561
20 75 7.301460292058415
20 80 13.165266106442575
20 85 22.03321993195918
20 90 33.13325330132053
20 95 42.997198879551824
20 100 52.041633306645316
40 0 0
40 5 0
40 10 0
40 15 0
40 20 0
40 25 0
40 30 0
40 35 0
40 40 0
40 45 0
40 50 0
40 55 0
40 60 0
40 65 0.020016012810253336
40 70 0.08006405124099114
40 75 1.9619619619619666
40 80 7.703081232492992
40 85 12.264905962384953
40 90 22.168867547018813
40 95 31.81909145487293
40 100 43.17727090836334
60 0 0
60 5 0
60 10 0
60 15 0
60 20 0
60 25 0
60 30 0
60 35 0
60 40 0
60 45 0
60 50 0
60 55 0
60 60 0
60 65 0
60 70 0.02000400080015563
60 75 0.7201440288057581
60 80 5.484387510008004
60 85 9.263705482192874
60 90 18.170902541524914
60 95 28.665733146629325
60 100 38.21528611444578
80 0 0
80 5 0
80 10 0
80 15 0
80 20 0
80 25 0
80 30 0
80 35 0
80 40 0
80 45 0
80 50 0
80 55 0
80 60 0
80 65 0
80 70 0
80 75 0.4604604604604656
80 80 4.02080416083217
80 85 7.78155631126225
80 90 15.835835835835832
80 95 25.690276110444177
80 100 34.90792634107286
100 0 0
100 5 0
100 10 0
100 15 0
100 20 0
100 25 0
100 30 0
100 35 0
100 40 0
100 45 0
100 50 0
100 55 0
100 60 0
100 65 0
100 70 0
100 75 0.16009605763458445
100 80 3.220644128825767
100 85 7.145716573258609
100 90 13.985594237695075
100 95 22.83370022013208
100 100 32.0328131252501
120 0 0
120 5 0
120 10 0
120 15 0
120 20 0
120 25 0
120 30 0
120 35 0
120 40 0
120 45 0
120 50 0
120 55 0
120 60 0
120 65 0
120 70 0
120 75 0.08001600320064473
120 80 2.8411364545818274
120 85 5.841168233646732
120 90 12.782556511302257
120 95 21.73303982389434
120 100 30.550550550550547
140 0 0
140 5 0
140 10 0
140 15 0
140 20 0
140 25 0
140 30 0
140 35 0
140 40 0
140 45 0
140 50 0
140 55 0
140 60 0
140 65 0
140 70 0
140 75 0.02000800320127727
140 80 2.601040416166467
140 85 5.584467574059248
140 90 12.027216329797874
140 95 20.12012012012012
140 100 29.372871168102588
160 0 0
160 5 0
160 10 0
160 15 0
160 20 0
160 25 0
160 30 0
160 35 0
160 40 0
160 45 0
160 50 0
160 55 0
160 60 0
160 65 0
160 70 0
160 75 0.02000800320127727
160 80 1.8021625951141318
160 85 5.321064212842563
160 90 10.024009603841533
160 95 18.60744297719088
160 100 29.040656919687557
180 0 0
180 5 0
180 10 0
180 15 0
180 20 0
180 25 0
180 30 0
180 35 0
180 40 0
180 45 0
180 50 0
180 55 0
180 60 0
180 65 0
180 70 0
180 75 0.020012007204317506
180 80 1.8603720744148844
180 85 4.90196078431373
180 90 10.864345738295322
180 95 18.687474989996
180 100 27.09083633453382
200 0 0
200 5 0
200 10 0
200 15 0
200 20 0
200 25 0
200 30 0
200 35 0
200 40 0
200 45 0
200 50 0
200 55 0
200 60 0
200 65 0
200 70 0
200 75 0.020012007204317506
200 80 1.3605442176870763
200 85 4.520904180836172
200 90 9.601920384076813
200 95 17.363472694538906
200 100 26.485297059411884
220 0 0
220 5 0
220 10 0
220 15 0
220 20 0
220 25 0
220 30 0
220 35 0
220 40 0
220 45 0
220 50 0
220 55 0
220 60 0
220 65 0
220 70 0
220 75 0.020012007204317506
220 80 1.241489787745298
220 85 4.520904180836172
220 90 8.805283169901944
220 95 17.18687474989996
220 100 25.825165033006602
240 0 0
240 5 0
240 10 0
240 15 0
240 20 0
240 25 0
240 30 0
240 35 0
240 40 0
240 45 0
240 50 0
240 55 0
240 60 0
240 65 0
240 70 0
240 75 0
240 80 1.0806483890334229
240 85 4.740948189637928
240 90 8.30332132853141
240 95 16.643328665733147
240 100 25.245049009801956
260 0 0
260 5 0
260 10 0
260 15 0
260 20 0
260 25 0
260 30 0
260 35 0
260 40 0
260 45 0
260 50 0
260 55 0
260 60 0
260 65 0
260 70 0
260 75 0
260 80 1.040208041608326
260 85 4.181672669067627
260 90 8.641728345669131
260 95 15.606242496998801
260 100 25.415249149489693
280 0 0
280 5 0
280 10 0
280 15 0
280 20 0
280 25 0
280 30 0
280 35 0
280 40 0
280 45 0
280 50 0
280 55 0
280 60 0
280 65 0
280 70 0
280 75 0
280 80 0.9603841536614643
280 85 3.8007601520304024
280 90 7.74464678807284
280 95 15.395395395395395
280 100 24.16966786714686
300 0 0
300 5 0
300 10 0
300 15 0
300 20 0
300 25 0
300 30 0
300 35 0
300 40 0
300 45 0
300 50 0
300 55 0
300 60 0
300 65 0
300 70 0
300 75 0
300 80 0.860860860860857
300 85 3.923923923923922
300 90 7.401480296059216
300 95 14.811849479583671
300 100 25.27516509905944
320 0 0
320 5 0
320 10 0
320 15 0
320 20 0
320 25 0
320 30 0
320 35 0
320 40 0
320 45 0
320 50 0
320 55 0
320 60 0
320 65 0
320 70 0
320 75 0
320 80 1.00040016006403
320 85 3.7222333400039997
320 90 7.923169267707086
320 95 15.398478173808572
320 100 22.864572914582915
340 0 0
340 5 0
340 10 0
340 15 0
340 20 0
340 25 0
340 30 0
340 35 0
340 40 0
340 45 0
340 50 0
340 55 0
340 60 0
340 65 0
340 70 0
340 75 0
340 80 0.8001600320064028
340 85 3.640728145629124
340 90 7.488986784140971
340 95 14.165666266506605
340 100 23.024604920984192
360 0 0
360 5 0
360 10 0
360 15 0
360 20 0
360 25 0
360 30 0
360 35 0
360 40 0
360 45 0
360 50 0
360 55 0
360 60 0
360 65 0
360 70 0
360 75 0
360 80 0.7204322593556078
360 85 3.3806761352270454
360 90 6.742697078831528
360 95 14.943910256410254
360 100 22.08883553421368
380 0 0
380 5 0
380 10 0
380 15 0
380 20 0
380 25 0
380 30 0
380 35 0
380 40 0
380 45 0
380 50 0
380 55 0
380 60 0
380 65 0
380 70 0
380 75 0
380 80 0.5404323458767069
380 85 3.5007001400280013
380 90 6.8254603682946335
380 95 14.522904580916185
380 100 22.246696035242287
400 0 0
400 5 0
400 10 0
400 15 0
400 20 0
400 25 0
400 30 0
400 35 0
400 40 0
400 45 0
400 50 0
400 55 0
400 60 0
400 65 0
400 70 0
400 75 0
400 80 0.48067294211896483
400 85 3.3633633633633586
400 90 6.242496998799519
400 95 14.068441064638781
400 100 23.324664932986593
420 0 0
420 5 0
420 10 0
420 15 0
420 20 0
420 25 0
420 30 0
420 35 0
420 40 0
420 45 0
420 50 0
420 55 0
420 60 0
420 65 0
420 70 0
420 75 0
420 80 0.4003202562049668
420 85 3.141256502601042
420 90 6.761352270454091
420 95 13.568140884530722
420 100 22.98919567827131
440 0 0
440 5 0
440 10 0
440 15 0
440 20 0
440 25 0
440 30 0
440 35 0
440 40 0
440 45 0
440 50 0
440 55 0
440 60 0
440 65 0
440 70 0
440 75 0
440 80 0.200080032012806
440 85 3.0218130878527094
440 90 6.4051240992794245
440 95 13.4453781512605
440 100 21.024204840968196
460 0 0
460 5 0
460 10 0
460 15 0
460 20 0
460 25 0
460 30 0
460 35 0
460 40 0
460 45 0
460 50 0
460 55 0
460 60 0
460 65 0
460 70 0
460 75 0
460 80 0.4601840736294549
460 85 2.7005401080216096
460 90 6.841368273654735
460 95 12.70508203281312
460 100 20.92837134853942
480 0 0
480 5 0
480 10 0
480 15 0
480 20 0
480 25 0
480 30 0
480 35 0
480 40 0
480 45 0
480 50 0
480 55 0
480 60 0
480 65 0
480 70 0
480 75 0
480 80 0.3800760152030458
480 85 3.0224179343474766
480 90 5.542216886754703
480 95 13.522704540908181
480 100 21.87750200160128
500 0 0
500 5 0
500 10 0
500 15 0
500 20 0
500 25 0
500 30 0
500 35 0
500 40 0
500 45 0
500 50 0
500 55 0
500 60 0
500 65 0
500 70 0
500 75 0
500 80 0.46027616569942476
500 85 2.901741044626771
500 90 6.162464985994398
500 95 13.265306122448983
500 100 19.603524229074885
EOD
set contour base
set cntrparam level incremental 0, 10, 100
unset surface
set table 'cont.dat'
splot '$complete'
unset table
plot [0:100][0:150] '$complete' using 2:1:3 with image , 'cont.dat' w l lt -1 lw 1.5
The resulting figure has the isolines rotated relative to the image. How can I rotate the isolines so that they match?
I would also like to know how to plot the data with discrete colors instead of the blurred gradient, so that I can compare which of the two represents the data block better.
Regards
Your image is plotted with using 2:1:3 (column 2 on x, column 1 on y), but cont.dat is plotted without a using clause, so gnuplot defaults to using 1:2. That is why x and y are switched (rotated) in the isolines.
If you add/change the following lines:
set palette maxcolors 10
plot [0:100][0:150] '$complete' using 2:1:3 with image , 'cont.dat' u 2:1 w l lt -1 lw 1.5
You should then get isolines aligned with the image, and the image drawn with 10 discrete colors instead of a smooth gradient.
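If you would rather fix the table file once than remember u 2:1 in every plot command, the first two columns of cont.dat can be swapped with a short script. This is a sketch assuming the table written by set table consists of whitespace-separated x y z lines plus # comment lines and blank lines (the blank lines separate contour segments and must be kept); swap_xy, swap_file and the output file name are hypothetical.

```python
def swap_xy(lines):
    """Swap the first two columns of a gnuplot table, keeping comments and blank lines."""
    out = []
    for line in lines:
        text = line.rstrip("\n")
        cols = text.split()
        if len(cols) < 2 or cols[0].startswith("#"):
            out.append(text)  # blank lines separate contour segments; keep them as-is
            continue
        cols[0], cols[1] = cols[1], cols[0]  # any extra columns are left untouched
        out.append(" ".join(cols))
    return out


def swap_file(src="cont.dat", dst="cont_swapped.dat"):
    """Write a copy of src with x and y exchanged (file names are placeholders)."""
    with open(src) as fin:
        swapped = swap_xy(fin)
    with open(dst, "w") as fout:
        fout.write("\n".join(swapped) + "\n")
```

The swapped file can then be plotted with plain w l, without a using clause.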
For the code:
dataset = pd.read_csv("/Users/Akshita/Desktop/EE660/donor_raw_data_medmean.csv", header=None, names=None)
# Separate data and label
X_label = dataset[1:19373][0]
X_data = dataset[1:19373]
print(X_data[X_label==1])
I get the following output (there are actually about 4,000 samples with label=1):
0 1 2 3 4 5 6 7 8 9 ... 51 52 53 54 55 56 57 58 \
16386 1 17 60 0 1 0 0 0 0 1 ... 0 20 20 20 5 10 15 15
16396 1 137 60 0 1 0 0 0 0 1 ... 15 25 10 15 6 14 16 120
16399 1 89 54 0 1 0 0 0 0 1 ... 10 15 5 15 6 14 16 79
16402 1 89 75 0 1 0 0 0 0 1 ... 25 35 10 35 6 13 15 79
..
..
19356 1 101 80 1 0 0 1 0 0 2 ... 25 30 5 28 7 16 18 101
19363 1 65 70 1 0 0 1 0 0 1 ... 7 12 5 10 4 8 20 63
19372 1 29 70 0 0 0 1 0 0 2 ... 0 25 25 25 4 9 24 24
..
[859 rows x 61 columns]
and for
print(X_data[X_label==0])
I get the following output (there are about 15,000 samples with label=0):
0 1 2 3 4 5 6 7 8 9 ... 51 52 53 54 55 56 57 58 \
16384 0 17 74 0 1 0 0 0 0 1 ... 0 15 15 15 4 10 17 17
16385 0 17 60 0 1 0 0 0 0 2 ... 0 15 15 15 4 11 17 17
16387 0 29 67 0 1 0 0 0 0 1 ... 0 20 20 20 5 11 23 28
16388 0 53 60 0 1 0 0 0 0 1 ... 5 30 25 30 5 11 26 52
16389 0 65 49 0 1 0 0 0 0 1 ... 30 35 5 27 6 13 16 56
..
..
19369 0 137 77 1 0 1 0 0 0 1 ... 9 10 1 10 6 13 21 130
19370 0 29 60 1 0 0 1 0 0 1 ... 0 15 15 15 3 9 23 23
19371 0 129 78 1 0 0 1 0 0 2 ... 20 25 5 25 7 24 8 129
What am I doing wrong?
Ignoring columns 1 and 2 (i.e. looking only at the remaining columns), I would like to count the unique even values (ignoring odd ones) in the following data set.
I have tried:
awk '{ a[$3, $4, $5, $6, $7]++ } END { for (b in a) { cnt+=1 } {print cnt}}' file
I get 76, but that is not the value I expect.
> 0 0
> 1 0 0
> 2 0 2
> 3 0 0 6
> 4 0 0 8
> 5 0 0 10
> 6 0 2 14
> 7 0 2 16
> 8 0 0 6 20
> 9 0 0 8 24
> 10 0 0 8 26
> 11 0 0 10 32
> 12 0 0 10 34
> 13 0 2 14 40
> 14 0 2 16 42
> 15 0 0 8 24 48
> 16 0 0 8 24 50
> 17 0 0 8 26 56
> 18 0 0 10 32 60
> 19 0 0 10 34 64
> 20 0 0 10 34 66
> 21 0 2 14 40 72
> 22 0 0 8 24 48 76
> 23 0 0 8 24 50 82
> 24 0 0 8 26 56 88
> 25 0 0 8 26 56 90
> 26 0 0 10 32 60 96
> 27 0 0 10 32 60 98
> 28 0 0 10 34 64 104
> 29 0 0 10 34 64 106
> 30 0 0 10 34 66 112
> 31 0 0 10 34 66 114
> 0 1
> 1 1 2 5
> 2 1 2
> 3 1 2 12 23 19
> 4 1 2 12 23
> 5 1 2 12
> 6 1 2 12 28
> 7 1 2 12 28 36
> 8 1 2 12 30 47 45
> 9 1 2 12 30 47
> 10 1 2 12 30
> 11 1 2 12 30 52
> 12 1 2 12 28 38
> 13 1 2 12 28 38 62
> 14 1 2 12 28 38 62 68
> 15 1 2 12 30 54 75
> 16 1 2 12 30 54
> 17 1 2 12 30 54 78
> 18 1 2 12 30 54 78 84
> 19 1 2 12 30 54 78 84 92
> 20 1 2 12 28 38 62 70
> 21 1 2 12 28 38 62 70 108
> 22 1 2 12 30 54 80
> 23 1 2 12 30 54 78 86
> 24 1 2 12 30 54 78 86 120
> 25 1 2 12 30 54 78 84 94
> 26 1 2 12 30 54 78 84 94 124
> 27 1 2 12 30 54 78 84 92 102
> 28 1 2 12 30 54 78 84 92 102 128
> 29 1 2 12 28 38 62 70 110
> 30 1 2 12 28 38 62 70 110 130
> 31 1 2 12 28 38 62 70 108 116
> 0 2
> 1 2 2 5
> 2 2 2
> 3 2 2 5 6
> 4 2 2 5 6 18
> 5 2 2 5 6 18 22
> 6 2 2 14
> 7 2 2 16
> 8 2 2 5 6 20
> 9 2 2 5 6 20 44
> 10 2 2 5 6 18 26
> 11 2 2 5 6 18 22 32
> 12 2 2 5 6 18 22 32 58
> 13 2 2 14 40
> 14 2 2 16 42
> 15 2 2 5 6 20 44 50 75
> 16 2 2 5 6 20 44 50
> 17 2 2 5 6 18 26 56
> 18 2 2 5 6 18 22 32 60
> 19 2 2 14 40 72 109 101
> 20 2 2 14 40 72 109
> 21 2 2 14 40 72
> 22 2 2 5 6 20 44 50 80
> 23 2 2 5 6 20 44 50 80 118
> 24 2 2 5 6 20 44 50 80 118 120
> 25 2 2 5 6 20 44 50 80 118 120 122
> 26 2 2 14 40 72 109 101 102 127
> 27 2 2 14 40 72 109 101 102
> 28 2 2 14 40 72 109 101 104
> 29 2 2 14 40 72 116 133 131
> 30 2 2 14 40 72 116 133
> 31 2 2 14 40 72 116
> 0 3
> 1 3 0
> 2 3 0 4
> 3 3 0 6
> 4 3 0 6 18
> 5 3 0 6 18 22
> 6 3 0 4 16 37
> 7 3 0 4 16
> 8 3 0 6 20
> 9 3 0 6 18 26 47
> 10 3 0 6 18 26
> 11 3 0 6 18 22 32
> 12 3 0 6 18 22 32 58
> 13 3 0 4 16 42 69
> 14 3 0 4 16 42
> 15 3 0 6 18 26 47 48
> 16 3 0 6 18 26 47 48 74
> 17 3 0 6 18 26 56
> 18 3 0 6 18 22 32 60
> 19 3 0 6 18 22 32 58 64
> 20 3 0 6 18 22 32 58 66
> 21 3 0 6 18 22 32 58 66 108
> 22 3 0 6 18 26 47 48 76
> 23 3 0 6 18 26 56 86
> 24 3 0 6 18 26 56 88
> 25 3 0 6 18 26 56 90
> 26 3 0 6 18 22 32 60 96
> 27 3 0 6 18 22 32 60 98
> 28 3 0 6 18 22 32 58 64 104
> 29 3 0 6 18 22 32 58 64 106
> 30 3 0 6 18 22 32 58 66 112
> 31 3 0 6 18 22 32 58 66 114
> 0 4
> 1 4 0
> 2 4 2
> 3 4 0 6
> 4 4 0 8
> 5 4 0 10
> 6 4 2 16 37
> 7 4 2 16
> 8 4 0 8 24 45
> 9 4 0 8 24
> 10 4 0 8 26
> 11 4 0 8 26 52
> 12 4 2 16 37 38
> 13 4 2 16 42 69
> 14 4 2 16 42
> 15 4 0 8 24 48
> 16 4 0 8 24 50
> 17 4 0 8 26 56
> 18 4 0 8 26 52 60
> 19 4 2 16 37 38 64
> 20 4 2 16 42 69 70
> 21 4 2 16 42 69 72
> 22 4 0 8 24 48 76
> 23 4 0 8 24 50 82
> 24 4 0 8 26 56 88
> 25 4 0 8 26 52 60 94
> 26 4 0 8 26 52 60 96
> 27 4 0 8 26 52 60 98
> 28 4 2 16 37 38 64 104
> 29 4 2 16 42 69 70 110
> 30 4 2 16 42 69 70 112
> 31 4 2 16 42 69 70 114
You can try this awk command to count unique rows while ignoring the 1st and 2nd columns (the remaining fields of each line are compared as a whole):
awk '{$1=$2=""; !seen[$0]++} END{print length(seen)}' file
130
If you want to count unique individual values, excluding the 1st and 2nd columns and ignoring odd numbers, use:
awk '{for (i=3; i<=NF; i++) !($i%2) && !seen[$i]++} END{print length(seen)}' file
63
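Both awk counts can be cross-checked with a short Python sketch. count_unique_rows and count_unique_even are hypothetical helper names; the sketch assumes whitespace-separated integer fields and, like the awk commands, skips the first two fields of every line.

```python
def count_unique_rows(lines, skip=2):
    """Like awk '{$1=$2=""; !seen[$0]++} END{print length(seen)}'."""
    return len({tuple(line.split()[skip:]) for line in lines})


def count_unique_even(lines, skip=2):
    """Like awk '{for (i=3; i<=NF; i++) !($i%2) && !seen[$i]++} END{print length(seen)}'."""
    seen = set()
    for line in lines:
        for field in line.split()[skip:]:
            value = int(field)  # assumes every remaining field is an integer
            if value % 2 == 0:
                seen.add(value)
    return len(seen)
```

To run it on a file, pass the open file object (iterating a file yields its lines), e.g. count_unique_even(open("file")).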
Hi, I have a table that looks like this:
chr10 84890986 84891021 2 17.5 2 93 0 61 48 2 48 0 1.16 GA
chr10 84897562 84897613 2 25.5 2 100 0 102 50 49 0 0 1 AC
chr10 84899819 84899844 2 12.5 2 100 0 50 0 0 52 48 1 GT
chr10 84905282 84905318 6 5.8 6 87 6 54 80 19 0 0 0.71 AAAAAC
chr10 84955235 84955267 2 16 2 100 0 64 50 0 0 50 1 AT
chr10 84972254 84972288 2 17 2 93 0 59 2 0 47 50 1.16 GT
chr10 85011399 85011478 3 25.7 3 80 12 63 58 1 40 0 1.06 GAA
chr10 85011461 85011525 3 20.7 3 87 6 74 39 0 60 0 0.97 GAG
chr10 85014721 85014841 5 23.8 5 78 8 66 0 69 0 29 1 TTCCC
chr10 85021530 85021701 5 38.4 5 84 13 53 74 0 24 0 0.85 AAGAG
chr10 85045413 85045440 3 9 3 100 0 54 66 33 0 0 0.92 CAA
chr10 85059334 85059364 5 6 5 92 0 51 20 3 0 76 0.92 ATTTT
chr10 85072010 85072038 2 14 2 100 0 56 50 50 0 0 1 CA
chr10 85072037 85072077 4 10 4 84 10 55 25 22 0 52 1.47 ATCT
chr10 85084308 85084338 6 5 6 91 0 51 83 13 3 0 0.77 CAAAAA
chr10 85096597 85096640 3 14.7 3 95 4 79 69 30 0 0 0.88 AAC
chr10 85151154 85151190 6 6.5 6 87 12 51 0 11 0 88 0.5 TTTCTT
chr10 85168255 85168320 4 16.2 4 100 0 130 50 0 49 0 1 AGGA
chr10 85173155 85173184 2 14.5 2 100 0 58 48 0 0 51 1 TA
chr10 85196836 85196861 2 12.5 2 100 0 50 52 48 0 0 1 AC
chr10 85215511 85215546 2 17.5 2 100 0 70 51 48 0 0 1 AC
chr10 85225048 85225075 2 13.5 2 100 0 54 51 48 0 0 1 AC
chr10 85242322 85242357 2 17.5 2 93 0 61 0 2 48 48 1.16 TG
chr10 85245934 85245981 4 11 4 79 20 51 27 2 0 70 0.99 ATTT
chr10 85249139 85249230 5 18.8 5 88 6 116 0 60 0 39 0.97 TTCCC
chr10 85251100 85251153 5 11 5 97 2 92 0 0 37 62 0.96 GTTTG
chr10 85268725 85268752 4 6.8 4 100 0 54 0 25 0 74 0.83 CTTT
chr10 85268767 85268798 4 7.8 4 100 0 62 0 0 22 77 0.77 TTTG
chr10 85269189 85269239 6 8.8 6 79 16 54 84 2 12 2 0.8 AAAAGA
chr10 85330217 85330253 2 18 2 100 0 72 0 0 50 50 1 TG
chr10 85332256 85332314 4 15 4 82 7 75 70 1 27 0 0.97 AAGA
chr10 85337969 85337996 2 13.5 2 100 0 54 0 0 48 51 1 TG
chr10 85344795 85344957 2 75.5 2 83 12 198 45 4 3 45 1.42 TA
chr10 85349732 85349765 5 6.8 5 93 6 59 84 15 0 0 0.61 AAAAC
chr10 85353082 85353109 5 5.4 5 100 0 54 0 22 18 59 1.38 CTGTT
I want to extract all rows that have exactly 3 characters in the last column. My attempt so far is:
grep -E "['ACTG']['ACTG']['ACTG']{1,3}$"
But this returns every row whose last column has 3 or more characters. I have tried many different combinations, but nothing gives me what I want. Any ideas?
If you would like to try awk, you can do:
awk '$NF~/\<...\>/' file
chr10 85011399 85011478 3 25.7 3 80 12 63 58 1 40 0 1.06 GAA
chr10 85011461 85011525 3 20.7 3 87 6 74 39 0 60 0 0.97 GAG
chr10 85045413 85045440 3 9 3 100 0 54 66 33 0 0 0.92 CAA
chr10 85096597 85096640 3 14.7 3 95 4 79 69 30 0 0 0.88 AAC
It tests whether the last field ($NF) is a word of exactly 3 characters; \< and \> are GNU awk word-boundary operators.
This regex would also do: awk '$NF~/^...$/'
Or, if you need to match specific characters (this needs GNU awk 4.x, or the --re-interval switch on older versions):
awk '$NF~/^[ACTG]{3}$/' file
Using grep
grep -E " [ACTG]{3}$" file
chr10 85011399 85011478 3 25.7 3 80 12 63 58 1 40 0 1.06 GAA
chr10 85011461 85011525 3 20.7 3 87 6 74 39 0 60 0 0.97 GAG
chr10 85045413 85045440 3 9 3 100 0 54 66 33 0 0 0.92 CAA
chr10 85096597 85096640 3 14.7 3 95 4 79 69 30 0 0 0.88 AAC
You need the space to mark the start of the last column, and {3} to match 3 and only 3 characters.
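The effect of the leading space can be checked with Python's re module, which treats these two patterns the same way grep -E does. This is a small sketch; the three rows are shortened, made-up lines in the question's format.

```python
import re

rows = [
    "chr10 84890986 84891021 GA",
    "chr10 85011399 85011478 GAA",
    "chr10 85021530 85021701 AAGAG",
]

# Without a left delimiter, the pattern also matches the last 3 letters of longer fields.
no_delimiter = re.compile(r"[ACTG]{3}$")
# With a space in front, the whole last field must consist of exactly 3 letters.
with_delimiter = re.compile(r" [ACTG]{3}$")

loose = [r.split()[-1] for r in rows if no_delimiter.search(r)]
exact = [r.split()[-1] for r in rows if with_delimiter.search(r)]
print(loose)  # ['GAA', 'AAGAG']
print(exact)  # ['GAA']
```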
If you want to print the rows that have exactly three characters in the last column, you could use the following grep command:
grep -E " [ACTG]{3}$"
[ACTG]{3} matches exactly three characters from the given list.
You have to grep either " ['ACTG']['ACTG']['ACTG']$" or " ['ACTG']{3}$".
Currently, you are grepping 3 to 5 'ACTG' characters.
Also, the quotes are unnecessary: ['ACTG'] means "match any one character between the []", i.e. any of ', A, C, T or G (the quote is matched too), so just grep " [ACTG]{3}$".
Be sure to use a delimiter for the left part (a space ' ', a tab \t if the file is tab-delimited, a word boundary \b, or \W).
If all your lines end with [ACTG]+, you can even just grep -E "\W.{3}$"
Another way that you could do this would be using awk:
$ awk '$NF ~ /^[ACTG][ACTG][ACTG]$/' file
chr10 85011399 85011478 3 25.7 3 80 12 63 58 1 40 0 1.06 GAA
chr10 85011461 85011525 3 20.7 3 87 6 74 39 0 60 0 0.97 GAG
chr10 85045413 85045440 3 9 3 100 0 54 66 33 0 0 0.92 CAA
chr10 85096597 85096640 3 14.7 3 95 4 79 69 30 0 0 0.88 AAC
This prints all lines whose last field exactly matches 3 of the characters "A", "C", "T" or "G".
Two hours late, but here is one way in awk.
It can easily be edited for different lengths and fields:
awk 'length($NF)==3' file
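For comparison, the same length test can be sketched in Python. filter_by_field_length is a hypothetical helper whose length and field parameters play the role of the 3 and $NF in the awk one-liner; like length($NF)==3, it checks only the length, not which characters the field contains.

```python
def filter_by_field_length(lines, length=3, field=-1):
    """Keep lines whose chosen whitespace-separated field has exactly `length` characters.

    field=-1 means the last field, like awk's $NF.
    """
    kept = []
    for line in lines:
        try:
            value = line.split()[field]
        except IndexError:
            continue  # too few fields (e.g. an empty line): never a match
        if len(value) == length:
            kept.append(line)
    return kept
```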
While looking for answers myself, I found that a Perl-compatible regex (grep -P) gives much more compact code.
This does the job, assuming the file is tab-delimited: grep -P '\t...$'
$ cat roi_new.bed | grep -P "\t...$"
chr10 81038152 81038182 3 9.7 3 92 7 51 30 0 0 70 0.88 TTA
chr10 81272294 81272320 3 8.7 3 100 0 52 0 30 69 0 0.89 GGC
chr10 81287690 81287720 3 10 3 100 0 60 66 33 0 0 0.92 CAA