Separating the coordinates for each cluster in DBSCAN using python


The script below writes the coordinates of each cluster to a separate .txt file, but I want to change the content of those files.
Currently the coordinates are printed as follows:
0.64 0.30 0.29
0.27 0.24 0.92
0.34 0.62 0.92
0.05 0.48 0.60
0.26 0.77 0.62
0.15 0.23 0.14
0.35 0.26 0.64
But I need each line to be printed as below, with all these extra integers, letters and words:
HETATM 1 O HOH 1 W 0.64 0.30 0.29 1.00 43.38
HETATM 2 O HOH 2 W 0.27 0.24 0.92 1.00 43.38
HETATM 3 O HOH 3 W 0.34 0.62 0.92 1.00 43.38
HETATM 4 O HOH 4 W 0.05 0.48 0.60 1.00 43.38
HETATM 5 O HOH 5 W 0.15 0.23 0.14 1.00 43.38
HETATM 6 O HOH 6 W 0.15 0.23 0.14 1.00 43.38
HETATM 7 O HOH 7 W 0.15 0.23 0.14 1.00 43.38
HETATM 8 O HOH 8 W 0.15 0.23 0.14 1.00 43.38
HETATM 9 O HOH 9 W 0.15 0.23 0.14 1.00 43.38
HETATM 10 O HOH 10 W 0.15 0.23 0.14 1.00 43.38
This is similar to the PDB (.pdb) file format used for proteins.
Does anybody know how to do this?
Below is my script:
from collections import Counter, defaultdict
import numpy as np
from sklearn.cluster import DBSCAN
data = np.random.rand(500, 3)
db = DBSCAN(eps=0.12, min_samples=1).fit(data)
labels = db.labels_
print(Counter(labels))  # number of points in each cluster
# group the points by cluster label
clusters = defaultdict(list)
for i, c in enumerate(db.labels_):
    clusters[c].append(data[i])
# write one file per cluster, three coordinates per line
for k, v in clusters.items():
    np.savetxt('cluster{}.txt'.format(k), v, delimiter=",", fmt="%1.2f %1.2f %1.2f")
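Since the target is described as the PDB format, note that programs reading .pdb files usually expect fixed column positions rather than single spaces between fields. A minimal sketch, assuming the wwPDB HETATM column layout (record name in columns 1-6, serial number 7-11, atom name 13-16, residue name 18-20, chain ID 22, residue number 23-26, x/y/z as 8.3f in columns 31-54, occupancy 55-60, B-factor 61-66, element symbol 77-78). The helper names pdb_hetatm_line and write_cluster_pdb are hypothetical, and treating the W from the example as the chain ID is an assumption.

```python
def pdb_hetatm_line(serial, x, y, z, atom_name="O", res_name="HOH",
                    chain_id="W", res_seq=None, occupancy=1.00,
                    temp_factor=43.38, element="O"):
    """Format one HETATM record with fixed PDB column positions (78 chars)."""
    if res_seq is None:
        res_seq = serial  # one water per line, as in the example output
    # Atom names shorter than 4 characters conventionally start in column 14.
    name = " " + atom_name.ljust(3) if len(atom_name) < 4 else atom_name
    # Assumption: the "W" from the example is used as the chain identifier.
    template = (
        "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}" + " " * 3
        + "{:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}" + " " * 10 + "{:>2}"
    )
    return template.format("HETATM", serial, name, "", res_name, chain_id,
                           res_seq, "", x, y, z, occupancy, temp_factor,
                           element)


def write_cluster_pdb(points, path):
    """Write one HETATM record per (x, y, z) point, serials starting at 1."""
    with open(path, "w") as fh:
        for serial, (x, y, z) in enumerate(points, start=1):
            fh.write(pdb_hetatm_line(serial, x, y, z) + "\n")
        fh.write("END\n")
```

With the clusters dict from the script above, write_cluster_pdb(v, 'cluster{}.pdb'.format(k)) inside the final loop would produce one fixed-width file per cluster.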

You can modify the two for loops this way:
for i, c in enumerate(db.labels_):
    l = np.concatenate([['HETATM {}'.format(i), 'O HOH {} W'.format(i)], data[i], [1.00, 43.38]], axis=0)
    clusters[c].append(l)
for k, v in clusters.items():
    np.savetxt('cluster{}.txt'.format(k), v, delimiter=",", fmt='%s')
Each line then starts with the index of the sample in the whole dataset (counting from 0), for example:
HETATM 2,O HOH 2 W,0.27035681984544035,0.25141288216432167,0.44097961252275675,1.0,43.38
HETATM 21,O HOH 21 W,0.2905981520836243,0.2680383230921106,0.47545544921372906,1.0,43.38
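The output above is comma-separated, keeps full floating-point precision and numbers lines by dataset index, so it does not quite match the layout asked for. A minimal sketch that reproduces the space-separated layout from the question with plain string formatting instead of np.concatenate; format_water_lines and write_clusters are hypothetical names, and the constant 1.00 and 43.38 columns are simply copied from the question's example. Serial numbers restart at 1 in each cluster file.

```python
def format_water_lines(points, occupancy=1.00, b_factor=43.38):
    """Return one space-separated HETATM-style line per (x, y, z) point.

    Serial numbers start at 1 and all values are printed with two decimals,
    matching the example layout in the question.
    """
    lines = []
    for serial, (x, y, z) in enumerate(points, start=1):
        lines.append(
            "HETATM {0} O HOH {0} W {1:.2f} {2:.2f} {3:.2f} {4:.2f} {5:.2f}".format(
                serial, x, y, z, occupancy, b_factor
            )
        )
    return lines


def write_clusters(clusters, name_pattern="cluster{}.txt"):
    """Write each cluster's points to its own file (one file per DBSCAN label)."""
    for label, points in clusters.items():
        with open(name_pattern.format(label), "w") as fh:
            fh.write("\n".join(format_water_lines(points)) + "\n")
```

With the clusters dict built by the first loop of the question's script, calling write_clusters(clusters) would take the place of the np.savetxt loop.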

Related

Adding two columns based on the match of column values of dataframe1 and colnames of dataframe2

I have two tibbles in R like these:
portfolio
MAR PLC KIN AMN
1 Fin It Sov 567
2 Cdi Fr Mnc 782
3 Hlt De Pse 312
4 Uti It Sov 234
...
and cases
It Fr De Fin Cdi Hlt Uti
1 0.11 0.21 0.56 0.43 0.89 0.26 0.77
2 0.92 0.03 0.44 0.52 0.78 0.24 0.86
3 0.14 0.42 0.83 0.03 0.22 0.75 0.65
4 0.83 0.31 0.06 0.42 0.89 0.07 0.48
5 0.12 0.29 0.51 0.95 0.38 0.81 0.76
...
I would like to add two columns to the first tibble based on the combination of portfolio$MAR and portfolio$PLC: for each row of the second tibble, the two new columns should hold the values from the columns whose names match MAR and PLC. Something like this:
df_result
MAR PLC KIN AMN cases(MAR) cases(PLC)
1 Fin It Sov 567 0.43 0.11
2 Fin It Sov 567 0.52 0.92
3 Fin It Sov 567 0.03 0.14
4 Fin It Sov 567 0.42 0.83
5 Fin It Sov 567 0.95 0.12
6 Cdi Fr Mnc 782 0.89 0.21
7 Cdi Fr Mnc 782 0.78 0.03
8 Cdi Fr Mnc 782 0.22 0.42
9 Cdi Fr Mnc 782 0.89 0.31
10 Cdi Fr Mnc 782 0.38 0.29
11 Hlt De Pse 312 0.26 0.56
...
12 Uti It Sov 234 0.76 0.12
I tried left_join, but I really don't think it is the right way to proceed.
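The expected result pairs every row of portfolio with every row of cases (five case rows give five output rows per portfolio row) and then picks the value from the cases column named by MAR and by PLC. Since there is no shared key column, this is a cross join followed by a lookup rather than a keyed join. The logic is sketched below in Python, the language used elsewhere on this page, rather than R; portfolio and cases are assumed to be lists of dicts, and add_case_columns is a hypothetical name.

```python
def add_case_columns(portfolio, cases):
    """Pair every portfolio row with every cases row (a cross join) and add
    the values found in the cases columns named by that row's MAR and PLC."""
    result = []
    for row in portfolio:
        for case in cases:
            new_row = dict(row)  # copy so the input rows are not modified
            new_row["cases(MAR)"] = case[row["MAR"]]
            new_row["cases(PLC)"] = case[row["PLC"]]
            result.append(new_row)
    return result
```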

How to log a table of metrics into mlflow

I am trying to decide whether mlflow is the right place to store my metrics for model tracking. According to the docs, log_metric takes a single key-value pair and log_metrics takes a dict of key-value pairs. I am wondering how to log a table like the one below into mlflow so that it can be visualized meaningfully:
             precision    recall  f1-score   support
      class1       0.89      0.98      0.93       174
      class2       0.96      0.90      0.93        30
      class3       0.96      0.90      0.93        30
      class4       1.00      1.00      1.00         7
      class5       0.93      1.00      0.96        13
      class6       1.00      0.73      0.85        15
      class7       0.95      0.97      0.96        39
      class8       0.80      0.67      0.73         6
      class9       0.97      0.86      0.91        37
     class10       0.95      0.81      0.88        26
     class11       0.50      1.00      0.67         5
     class12       0.93      0.89      0.91        28
     class13       0.73      0.84      0.78        19
     class14       1.00      1.00      1.00         6
     class15       0.45      0.83      0.59         6
     class16       0.97      0.98      0.97       245
     class17       0.93      0.86      0.89       206
    accuracy                           0.92       892
   macro avg       0.88      0.90      0.88       892
weighted avg       0.93      0.92      0.92       892
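MLflow metrics are scalar key-value pairs, so one way to log a per-class table is to flatten it into one metric per cell. A minimal sketch, assuming the report is available as a nested dict, for example from scikit-learn's classification_report(y_true, y_pred, output_dict=True), which returns one dict per class and per average plus a scalar "accuracy" entry; flatten_report is a hypothetical helper.

```python
def flatten_report(report, sep="_"):
    """Flatten a nested classification-report dict into {name: float} pairs.

    Nested entries such as report["class1"]["precision"] become
    "class1_precision"; scalar entries such as report["accuracy"] keep their
    name. Spaces in labels ("macro avg") are replaced with sep.
    """
    flat = {}
    for label, stats in report.items():
        label = str(label).replace(" ", sep)
        if isinstance(stats, dict):
            for metric, value in stats.items():
                flat[label + sep + str(metric).replace(" ", sep)] = float(value)
        else:
            flat[label] = float(stats)
    return flat
```

Inside an active run (with mlflow.start_run():), mlflow.log_metrics(flatten_report(report)) would then log keys such as class1_precision and macro_avg_f1-score, which can be compared across runs in the UI.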

Get the value from other values if value is NaN [duplicate]

I am trying to create a column that contains, for each row, the minimum over a few selected columns. For example:
A0 A1 A2 B0 B1 B2 C0 C1
0 0.84 0.47 0.55 0.46 0.76 0.42 0.24 0.75
1 0.43 0.47 0.93 0.39 0.58 0.83 0.35 0.39
2 0.12 0.17 0.35 0.00 0.19 0.22 0.93 0.73
3 0.95 0.56 0.84 0.74 0.52 0.51 0.28 0.03
4 0.73 0.19 0.88 0.51 0.73 0.69 0.74 0.61
5 0.18 0.46 0.62 0.84 0.68 0.17 0.02 0.53
6 0.38 0.55 0.80 0.87 0.01 0.88 0.56 0.72
Here I am trying to create a column which contains the minimum for each row of columns B0, B1, B2.
The output would look like this:
A0 A1 A2 B0 B1 B2 C0 C1 Minimum
0 0.84 0.47 0.55 0.46 0.76 0.42 0.24 0.75 0.42
1 0.43 0.47 0.93 0.39 0.58 0.83 0.35 0.39 0.39
2 0.12 0.17 0.35 0.00 0.19 0.22 0.93 0.73 0.00
3 0.95 0.56 0.84 0.74 0.52 0.51 0.28 0.03 0.51
4 0.73 0.19 0.88 0.51 0.73 0.69 0.74 0.61 0.51
5 0.18 0.46 0.62 0.84 0.68 0.17 0.02 0.53 0.17
6 0.38 0.55 0.80 0.87 0.01 0.88 0.56 0.72 0.01
Here is part of the code, but it is not doing what I want it to do:
for i in range(0, 2):
    df['Minimum'] = df.loc[0, 'B' + str(i)].min()

This is a one-liner: you just need to pass axis=1 to min so that it works across the columns rather than down them:
df['Minimum'] = df.loc[:, ['B0', 'B1', 'B2']].min(axis=1)
If you need to use this solution for different numbers of columns, you can use a for loop or list comprehension to construct the list of columns:
n_columns = 3  # B0, B1, B2
cols_to_use = ['B' + str(i) for i in range(n_columns)]
df['Minimum'] = df.loc[:, cols_to_use].min(axis=1)

For my tasks, a universal and flexible approach is the following:
df['Minimum'] = df[['B0', 'B1', 'B2']].apply(lambda x: min(x.iloc[0], x.iloc[1], x.iloc[2]), axis=1)
The target column 'Minimum' is assigned the result of the lambda function applied to the selected columns ['B0', 'B1', 'B2']. Inside the lambda, x is one row (a Series), so its elements are accessed by position with x.iloc[0], x.iloc[1], and so on. Be sure to specify axis=1, so that the function is applied row by row.
This is very convenient when you need to make complex calculations.
However, it is likely slower than the vectorized min(axis=1), because the lambda is called once per row in Python.
As for selecting the columns, in addition to the for-loop method, I can suggest using a filter like this:
cols_to_use = list(filter(lambda f: 'B' in f, df.columns))
This applies a lambda function to the list of DataFrame columns and keeps every column whose name contains the letter 'B'.
After that, the first example can be written as follows:
cols_to_use = list(filter(lambda f: 'B' in f, df.columns))
df['Minimum'] = df[cols_to_use].apply(lambda x: min(x), axis=1)
although once the columns are pre-selected, the vectorized version is preferable:
df['Minimum'] = df[cols_to_use].min(axis=1)
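As a quick check against the expected output, here is the question's DataFrame with the row-wise minimum. It selects the columns with DataFrame.filter(regex=...), a stricter alternative to the 'B' in f test, which would also keep any column with a B anywhere in its name. The pattern ^B\d+$ (B followed only by digits) is an assumption about how the columns are named.

```python
import pandas as pd

# Data copied from the question's example.
df = pd.DataFrame(
    {
        "A0": [0.84, 0.43, 0.12, 0.95, 0.73, 0.18, 0.38],
        "A1": [0.47, 0.47, 0.17, 0.56, 0.19, 0.46, 0.55],
        "A2": [0.55, 0.93, 0.35, 0.84, 0.88, 0.62, 0.80],
        "B0": [0.46, 0.39, 0.00, 0.74, 0.51, 0.84, 0.87],
        "B1": [0.76, 0.58, 0.19, 0.52, 0.73, 0.68, 0.01],
        "B2": [0.42, 0.83, 0.22, 0.51, 0.69, 0.17, 0.88],
        "C0": [0.24, 0.35, 0.93, 0.28, 0.74, 0.02, 0.56],
        "C1": [0.75, 0.39, 0.73, 0.03, 0.61, 0.53, 0.72],
    }
)

# Keep only columns named "B" followed by digits, then take the minimum of each row.
df["Minimum"] = df.filter(regex=r"^B\d+$").min(axis=1)
```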

SED change last column text

I would like to know how to change the letter A to C in the last column using sed.
Example input:
HETATM 18 H UNK 0 12.447 20.851 23.373 0.00 0.00 0.167 HD
HETATM 19 C UNK 0 11.406 19.947 21.942 0.00 0.00 0.033 A
HETATM 20 C UNK 0 10.684 20.899 21.181 0.00 0.00 0.030 A
HETATM 21 C UNK 0 9.503 20.541 20.507 0.00 0.00 0.019 A
HETATM 22 C UNK 0 9.032 19.211 20.545 0.00 0.00 0.032 A
HETATM 23 C UNK 0 9.772 18.248 21.264 0.00 0.00 0.019 A
HETATM 24 C UNK 0 10.946 18.613 21.948 0.00 0.00 0.030 A
HETATM 25 C UNK 0 7.833 18.846 19.889 0.00 0.00 0.253 C
HETATM 26 O UNK 0 7.856 18.994 18.642 0.00 0.00 -0.267 OA
Output:
HETATM 18 H UNK 0 12.447 20.851 23.373 0.00 0.00 0.167 HD
HETATM 19 C UNK 0 11.406 19.947 21.942 0.00 0.00 0.033 C
HETATM 20 C UNK 0 10.684 20.899 21.181 0.00 0.00 0.030 C
HETATM 21 C UNK 0 9.503 20.541 20.507 0.00 0.00 0.019 C
HETATM 22 C UNK 0 9.032 19.211 20.545 0.00 0.00 0.032 C
HETATM 23 C UNK 0 9.772 18.248 21.264 0.00 0.00 0.019 C
HETATM 24 C UNK 0 10.946 18.613 21.948 0.00 0.00 0.030 C
HETATM 25 C UNK 0 7.833 18.846 19.889 0.00 0.00 0.253 C
HETATM 26 O UNK 0 7.856 18.994 18.642 0.00 0.00 -0.267 OA
I tried sed like this:
sed 's/[A*]$/C/'
But the last line of the output looks like this (the A in OA is replaced too):
HETATM 26 O UNK 0 7.856 18.994 18.642 0.00 0.00 -0.267 OC

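The attempt fails because [A*] is a bracket expression that matches a single character, either A or a literal *, so the A at the end of OA matches too. A simple sed approach: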
sed 's/\<A[[:space:]]*$/C/' file
\< - start-of-word boundary (supported by GNU sed), so A only matches as a standalone field and the A in OA is skipped
[[:space:]]* - matches optional trailing whitespace before the end of the line ($)
The output:
HETATM 18 H UNK 0 12.447 20.851 23.373 0.00 0.00 0.167 HD
HETATM 19 C UNK 0 11.406 19.947 21.942 0.00 0.00 0.033 C
HETATM 20 C UNK 0 10.684 20.899 21.181 0.00 0.00 0.030 C
HETATM 21 C UNK 0 9.503 20.541 20.507 0.00 0.00 0.019 C
HETATM 22 C UNK 0 9.032 19.211 20.545 0.00 0.00 0.032 C
HETATM 23 C UNK 0 9.772 18.248 21.264 0.00 0.00 0.019 C
HETATM 24 C UNK 0 10.946 18.613 21.948 0.00 0.00 0.030 C
HETATM 25 C UNK 0 7.833 18.846 19.889 0.00 0.00 0.253 C
HETATM 26 O UNK 0 7.856 18.994 18.642 0.00 0.00 -0.267 OA
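For comparison, a sketch of the same edit in Python with the re module, assuming the whole file is processed as one string. Here \b plays the role of sed's \<, and [ \t]* is used instead of \s* so that the newline at the end of each line is never swallowed; last_column_a_to_c is a hypothetical name.

```python
import re

# A standalone "A" (word boundary before it, like sed's \<) at the end of a
# line, optionally followed by spaces or tabs. MULTILINE makes $ match at
# the end of every line, not just the end of the text.
LAST_FIELD_A = re.compile(r"\bA[ \t]*$", re.MULTILINE)


def last_column_a_to_c(text):
    """Replace a final standalone A field with C on every line of text."""
    return LAST_FIELD_A.sub("C", text)
```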

VIM replacing text in 2 columns

Below is part of a column-sensitive file, lines 23 to 34. Please look at columns 25 and 26. Lines 23 to 28 are correct, because the numbers are supposed to be sequential.
HETATM 21 O HOH 7 -1.609 5.551 -4.296 1.00 0.00 WAT O
HETATM 22 H HOH 7 -1.594 5.971 -3.395 1.00 0.00 WAT H
HETATM 23 H HOH 7 -1.048 4.730 -4.281 1.00 0.00 WAT H
HETATM 24 O HOH 8 -4.693 5.472 -0.557 1.00 0.00 WAT O
HETATM 25 H HOH 8 -3.881 4.900 -0.521 1.00 0.00 WAT H
HETATM 26 H HOH 8 -4.819 5.805 -1.485 1.00 0.00 WAT H
HETATM 27 O HOH 1 0.289 -5.035 5.663 1.00 0.00 WAT O
HETATM 28 H HOH 10 0.241 -4.604 -5.564 1.00 0.00 WAT H
HETATM 29 H HOH 1 -0.399 -5.750 5.605 1.00 0.00 WAT H
HETATM 30 O HOH 11 -1.741 -5.167 0.877 1.00 0.00 WAT O
HETATM 31 H HOH 0 -2.612 -4.754 0.636 1.00 0.00 WAT H
HETATM 32 H HOH 0 -1.819 -5.599 1.769 1.00 0.00 WAT H
However, columns 25 and 26 in lines 29 to 34 (and also in lines beyond 34, which are not included here) need to be edited. They hold the ID number of the water molecules in the file. Columns 25 and 26 in lines 29-31 are supposed to be ' 9' instead of ' 1' or '10', and columns 25 and 26 in lines 32-34 are supposed to be '10' instead of '11' or ' 0'. All lines after 34 suffer from a similar problem, and I also want to change the contents of columns 25 and 26 to '11', '12', etc. for each group of 3 lines. So the final result is expected to look like this:
HETATM 21 O HOH 7 -1.609 5.551 -4.296 1.00 0.00 WAT O
HETATM 22 H HOH 7 -1.594 5.971 -3.395 1.00 0.00 WAT H
HETATM 23 H HOH 7 -1.048 4.730 -4.281 1.00 0.00 WAT H
HETATM 24 O HOH 8 -4.693 5.472 -0.557 1.00 0.00 WAT O
HETATM 25 H HOH 8 -3.881 4.900 -0.521 1.00 0.00 WAT H
HETATM 26 H HOH 8 -4.819 5.805 -1.485 1.00 0.00 WAT H
HETATM 27 O HOH 9 0.289 -5.035 5.663 1.00 0.00 WAT O
HETATM 28 H HOH 9 0.241 -4.604 -5.564 1.00 0.00 WAT H
HETATM 29 H HOH 9 -0.399 -5.750 5.605 1.00 0.00 WAT H
HETATM 30 O HOH 10 -1.741 -5.167 0.877 1.00 0.00 WAT O
HETATM 31 H HOH 10 -2.612 -4.754 0.636 1.00 0.00 WAT H
HETATM 32 H HOH 10 -1.819 -5.599 1.769 1.00 0.00 WAT H
So far I couldn't come up with a good pattern to replace those wrong numbers with 9, 10, etc. It would be great if I could fix all these groups of 3 lines with a single vim command instead of going group by group, as there are 50-60 groups with this problem. What I did earlier was simply :26,28s/HOH 1/HOH 8, and this is clearly not the most efficient way.
Sorry for not being clear in my first attempt at the question, but your help would be appreciated. Thank you.

Your question is not clear, but from what I understand, selecting a rectangular block in visual mode might help you. Press Ctrl-V on OS X or Linux, or Ctrl-Q on Windows (in normal mode).

Actually, I'd like to thank everyone for your time, and sorry for the confusion. I found a way to do it with Python's string formatting: the pattern is really fuzzy and I'm not used to regex patterns, so I couldn't figure out a simple way to do it in Vim.
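The string-formatting code itself isn't shown, so here is a minimal sketch of that idea, assuming the file is fixed-width, the water ID occupies exactly columns 25-26 (1-based), and every water is a group of 3 consecutive HETATM lines (O, H, H). The names renumber_waters and renumber_file, and the file paths, are hypothetical.

```python
def renumber_waters(lines, first_line, first_id, atoms_per_water=3):
    """Rewrite the water ID in columns 25-26 (1-based) from first_line onward.

    Each group of atoms_per_water consecutive HETATM lines gets the next ID,
    starting at first_id. Lines before first_line are returned unchanged.
    """
    out = list(lines)
    atom_count = 0
    for idx in range(first_line - 1, len(out)):
        line = out[idx]
        if not line.startswith("HETATM"):
            continue  # leave TER, END and other records untouched
        water_id = first_id + atom_count // atoms_per_water
        if water_id > 99:
            raise ValueError("water ID {} does not fit in columns 25-26".format(water_id))
        # Right-justify the ID in two characters, e.g. " 9" or "10".
        out[idx] = line[:24] + "{:>2}".format(water_id) + line[26:]
        atom_count += 1
    return out


def renumber_file(src_path, dst_path, first_line, first_id):
    """Read src_path, renumber the waters and write the result to dst_path."""
    with open(src_path) as fh:
        lines = fh.read().splitlines()
    with open(dst_path, "w") as fh:
        fh.write("\n".join(renumber_waters(lines, first_line, first_id)) + "\n")
```

For the file in the question, renumber_file("waters.pdb", "waters_fixed.pdb", first_line=29, first_id=9) would set lines 29-31 to 9, lines 32-34 to 10, and so on.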
