I'm trying to create a range between two variables. The variables contain string and number characters.
For example P9160-P9163 or P360-P369.
The P is not constant and could be any character or group of characters, but I'm trying to generate a list that contains all the values in between.
I tried working with ASCII character codes, but that didn't work for me.
Any thoughts?
import re
x = 'P9160'
y = 'P9163'
# split each value into its letter prefix and its numeric part
x = re.match(r"([a-z]+)([0-9]+)", x, re.I)
y = re.match(r"([a-z]+)([0-9]+)", y, re.I)
for i in range(int(x.groups()[1]), int(y.groups()[1])+1):
    print("{}{}".format(x.groups()[0], i))
Using a precompiled, reusable regex pattern and a generator expression keeps the code tidier and can also help performance.
import re
x = 'P9160'
y = 'P9173'
# reusable regex pattern
regex = re.compile(r"([a-zA-Z]+)(\d+)")
x, y = regex.match(x), regex.match(y)
# generator expression
xy = (x.groups()[0]+str(i) for i in range(int(x.groups()[1]), int(y.groups()[1])+1))
# list of all values from the generator
print(list(xy))
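The same idea can be wrapped in a small function. This is only a sketch: the name expand_range is hypothetical, and it assumes both values share the same letter prefix and that leading zeros in the first value (for example P007) should be kept.

```python
import re

# precompiled pattern: a run of letters followed by a run of digits
_PATTERN = re.compile(r"([a-zA-Z]+)(\d+)")

def expand_range(start, end):
    """Return every value from start to end, inclusive (hypothetical helper)."""
    first, last = _PATTERN.fullmatch(start), _PATTERN.fullmatch(end)
    if not first or not last:
        raise ValueError("expected letters followed by digits")
    if first.group(1) != last.group(1):
        raise ValueError("both values must share the same prefix")
    prefix = first.group(1)
    width = len(first.group(2))  # assumption: keep the zero padding of the start value
    lo, hi = int(first.group(2)), int(last.group(2))
    return [f"{prefix}{i:0{width}d}" for i in range(lo, hi + 1)]

print(expand_range('P9160', 'P9163'))
# ['P9160', 'P9161', 'P9162', 'P9163']
```

If the prefixes differ, the sketch raises a ValueError rather than guessing what the range should mean.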
I want to define an array of variables x, where some will be integer variables and some real (continuous) variables. For instance, I have three sets:
model = pyo.AbstractModel()
model.N = pyo.Set()
model.NL = pyo.Set()
model.NN = pyo.Set()
NL and NN are mutually exclusive sets whose union is N.
I would like to define the following variables:
model.x = pyo.Var(model.N, within = pyo.Integers) # if x in NL
model.x = pyo.Var(model.N, within = pyo.Reals) # if x in NN
I can of course rename xL and xN, but is it possible to have a single variable set x with subset dependent domains?
Thank you very much.
Yes. There are several ways to accomplish this:
The domain (or within) argument can take a rule:
def x_domain(m, i):
    if i in m.NL:
        return pyo.Integers
    else:
        return pyo.Reals
model.x = pyo.Var(model.N, within=x_domain)
You can set the Var to one domain and then update the domain after the fact:
model.x = pyo.Var(model.N, within=pyo.Reals)
for i in model.NL:
    model.x[i].domain = pyo.Integers
From this
import sympy as sp
x,y,z = sp.symbols("x y z")
sp.Eq(x,y/z)
To this
#variables = array
#equation = ????
def solver(variables,equation):
    #Looping through variables array and converting variables to sympy objects
    for var in variables:
        var = sp.symbols(var)
    #Generate sympy Equation
    equation = sp.Eq(equation)
variables = [x,y,z]
equation = x,y/z #invalid code
solver(variables,equation)
I'm creating a function that takes an equation with n variables and n-1 values, solves for the missing variable symbolically, and then returns a numerical answer using the values provided.
I only included the small portion of code where I'm having trouble: I don't understand how to pass an equation into the function. Any solutions or pointers would be greatly appreciated. Thanks.
There are several layers of potential confusion here concerning Python variables and SymPy objects (Symbols) used for variables.
Here is an example of what you are saying:
from sympy import var, Eq, solve
# 3 variables
syms = x, y, z = var('x:z')
# 2 values
vals = {x:1, y:2}
# an equation
eq = Eq(x, y/z)
# solve for the missing value symbolically
missing = set(syms) - set(vals) # == {z}
solve(eq, missing)
[y/x]
# solve for the missing value after substituting in the known values
solve(eq.subs(vals))
[2]
You could write a solver that accepts an equation and the specified values, figures out which symbol is missing, and returns its value, by doing something like this:
>>> def solver(eq, **vals):
...     from sympy.core.containers import Dict
...     from sympy.solvers.solvers import solve
...     free = eq.free_symbols
...     vals = Dict(vals)
...     x = free - set(vals)
...     if len(x) != 1:
...         raise ValueError('specify all but one of the values for %s' % free)
...     x = x.pop()
...     return solve(eq.subs(vals), x, dict=True)
...
>>> solver(eq, x=1, z=2)
[{y: 2}]
Does that give you some ideas of how to continue?
I'm trying to change the characters of x to upper or lower case depending on whether they are in r or c. The problem is that I can't get all the changed characters into one string.
import unittest
def fun_exercise_6(x):
    y = []
    r = 'abcdefghijkl'
    c = 'mnopqrstuvwxz'
    for i in range(len(x)):
        if(x[i] in r):
            y += x[i].lower()
        elif(x[i] in c):
            y += x[i].upper()
    return y
class TestAssignment1(unittest.TestCase):
    def test1_exercise_6(self):
        self.assertTrue(fun_exercise_6("osso") == "OSSO")
    def test2_exercise_6(self):
        self.assertTrue(fun_exercise_6("goat") == "gOaT")
    def test3_exercise_6(self):
        self.assertTrue(fun_exercise_6("bag") == "bag")
    def test4_exercise_6(self):
        self.assertTrue(fun_exercise_6("boat") == "bOaT" )
if __name__ == '__main__':
    unittest.main()
Using a list, as you are doing, is probably the best approach while you work out whether each character should be upper- or lowercased. You can then join the list with str's join method. In your case, the return statement could look like this:
return ''.join(y)
This joins a collection of strings (your individual characters) into one new string, using the string you call join on ('') as the separator.
For example, ''.join(['a', 'b', 'c']) returns 'abc'.
This is generally a better solution than making y a string, because strings are immutable. If y were a string, each append could create a whole new string and copy the existing characters into it. Accumulating the characters in a list and joining them once at the end avoids that repeated copying.
If you define y as an empty string (y = "") instead of an empty list, you will get y back as one string. When you declare y = [] and use +=, you add each character as a separate item of a list rather than appending it to a string.
You can't compare a list and a string for equality and get True:
"abc" == ["a", "b", "c"] # False
So the initial value of y in the fun_exercise_6 function must be "" (or the list must be joined before it is returned).
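Putting the suggestions together, a corrected version might look like the sketch below. The variable names lower_set, upper_set and chars are mine, not from the question, and characters that appear in neither set (such as 'y') are dropped, exactly as in the original loop.

```python
def fun_exercise_6(x):
    lower_set = 'abcdefghijkl'   # characters to lowercase (r in the question)
    upper_set = 'mnopqrstuvwxz'  # characters to uppercase (c in the question)
    chars = []
    for ch in x:
        if ch in lower_set:
            chars.append(ch.lower())
        elif ch in upper_set:
            chars.append(ch.upper())
        # characters in neither set are skipped, as in the original code
    return ''.join(chars)  # join once at the end so a str is returned

print(fun_exercise_6("goat"))
# gOaT
```

Because the function now returns a str, the comparisons in the unit tests can succeed.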
I'm trying to extract these sequences from a file into separate lists or arrays in Python.
My data looks like:
>gene_FST
AGTGGGTAATG--TGATG...GAAATTTG
>gene_FPY
AGT-GG..ATGAAT---AAATGAAAT--G
I would like to have
seq1 = [AGTGGGTAATG--TGATG...GAAATTTG]
seq2 = [AGT-GG..ATGAAT---AAATGAAAT--G]
My plan is to later compare the contents of the lists.
I would appreciate any advice.
So far, here's what I have done:
f = open (r"C:\Users\Olukayode\Desktop\my_file.txt", 'r') #first r - before the normal string it converts normal string to raw string
def parse_fasta(lines):
    seq = []
    seq1 = []
    seq2 = []
    head = []
    data = ''
    for line in lines:
        if line.startswith('>'):
            if data:
                seq.append(data)
                data = ''
            head.append(line[1:])
        else:
            data+= line.rstrip()
    seq.append(data)
    return seq
h = parse_fasta(f)
print(h)
print(h[0])
print(h[1])
gives:
['AGTGGGTAATG--TGATG...GAAATTTG', 'AGT-GG..ATGAAT---AAATGAAAT--G']
AGTGGGTAATG--TGATG...GAAATTTG
AGT-GG..ATGAAT---AAATGAAAT--G
I think I just figured it out: I can put each string from the list containing both sequences into its own separate list, if that is possible.
If you want to get the exact results you were looking for in your original question, i.e.
seq1 = [AGTGGGTAATG--TGATG...GAAATTTG]
seq2 = [AGT-GG..ATGAAT---AAATGAAAT--G]
you can do it in a variety of ways. Instead of changing anything you already have though, you can just convert your data into a dictionary and print the dictionary items.
your code block...
h = parse_fasta(f)
sDict = {}
for i in range(len(h)):
    sDict["seq"+str(i+1)] = [h[i]]
for seq, data in sDict.items():
    print(seq, "=", data)
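Since the plan is to compare the sequences later, it may be more convenient to key each sequence by its header line instead of by a generated name. The sketch below is one way to do that. The function name parse_fasta_records and the in-memory sample list are illustrative assumptions; an open file object can be passed in place of the list.

```python
def parse_fasta_records(lines):
    """Map each header (without '>') to its sequence (hypothetical helper)."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            name = line[1:]
            records[name] = ''
        elif name is not None:
            records[name] += line  # sequences may span several lines
    return records

# sample data copied from the question; normally this would be an open file
sample = [
    ">gene_FST\n",
    "AGTGGGTAATG--TGATG...GAAATTTG\n",
    ">gene_FPY\n",
    "AGT-GG..ATGAAT---AAATGAAAT--G\n",
]
records = parse_fasta_records(sample)
seq1, seq2 = records["gene_FST"], records["gene_FPY"]

# positions at which the two aligned sequences differ
mismatches = [i for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]
print(mismatches)
```

The position-by-position comparison assumes the sequences are already aligned to the same length, as they are in the sample.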
I'd like to record the location of differences from both strings in a list (to remove them) ... preferably recording the highest separation point for each section, as these areas will have dynamic content.
Compare these
total chars 178. Two unique sections
t1 = 'WhereTisthetotalnumberofght5y5wsjhhhhjhkmhm Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentresultsduetodifferinglevelsofapxxxxxxxproximation,although'
and
total chars 211. Two unique sections
t2 = 'WhereTisthetotalnumberofdofodfgjnjndfgu><rgregw><sssssuguyguiygis>gggs<GS,Gs Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentrexxxxxxxsultsduetodifferinglevelsofapproximation,although'
I know difflib can do this but the output is bad.
I'd like to store (in a list) the char positions, preferably the larger separation values.
pattern location
t1 = 'WhereTisthetotalnumberof 24 ght5y5wsjhhhhjhkmhm 43 Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentresultsduetodifferinglevelsofap 151 xxxxxxx 158 proximation,although'
t2 = 'WhereTisthetotalnumberof 24 dofodfgjnjndfgu><rgregw><sssssuguyguiygis>gggs<GS,Gs 76 Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentre 155 xxxxxxx 162 sultsduetodifferinglevelsofapproximation,although'
output:
output list = [24, 76, 151, 162]
Update
Response to @Olivier's post
Positions of all y's, marked with ***
t1
WhereTisthetotalnumberofght5***y***5wsjhhhhjhkmhm Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentresultsduetodifferinglevelsofapxxxxxxxproximation,although
t2
WhereTisthetotalnumberofdofodfgjnjndfgu><rgregw><sssssugu***y***gui***y***gis>gggs<GS,Gs Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentrexxxxxxxsultsduetodifferinglevelsofapproximation,although
output after matcher.get_matching_blocks()
and string = ''.join([t1[a:a+n] for a, _, n in blocks])
WhereTisthetotalnumberof***y*** Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentresultsduetodifferinglevelsofapproximation,although
Using difflib is probably your best bet, as you are unlikely to come up with a more efficient solution than the algorithms it provides. What you want is SequenceMatcher.get_matching_blocks. Here is what it returns, according to the documentation.
Return list of triples describing matching subsequences. Each triple
is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The
triples are monotonically increasing in i and j.
Here is a way you could use this to reconstruct a string from which you removed the delta.
from difflib import SequenceMatcher
x = "abc_def"
y = "abc--ef"
matcher = SequenceMatcher(None, x, y)
blocks = matcher.get_matching_blocks()
# blocks: [Match(a=0, b=0, size=3), Match(a=5, b=5, size=2), Match(a=7, b=7, size=0)]
string = ''.join([x[a:a+n] for a, _, n in blocks])
# string: "abcef"
Edit: It was also pointed out that there is a problem when you have two strings like these:
t1 = 'WordWordaayaaWordWord'
t2 = 'WordWordbbbybWordWord'
Then the above code would return 'WordWordyWordWord'. This is because get_matching_blocks will catch the 'y' that is present in both strings between the expected blocks. One way around this is to filter the returned blocks by length.
string = ''.join([x[a:a+n] for a, _, n in blocks if n > 1])
If you want more complex analysis of the returned blocks you could also do the following.
def block_filter(substring):
    """Outputs True if the substring is to be merged, False otherwise"""
    ...
string = ''.join([x[a:a+n] for a, _, n in blocks if block_filter(x[a:a+n])])
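If you need the positions of the differing regions rather than the cleaned string, SequenceMatcher.get_opcodes reports them directly. The sketch below is one possible approach, not the only one. The helper name differing_spans is hypothetical, and autojunk=False is my assumption: it turns off the automatic junk heuristic that difflib applies when the second sequence has 200 or more items, which t2 (211 characters) would trigger.

```python
from difflib import SequenceMatcher

def differing_spans(a, b):
    """Return (a_start, a_end, b_start, b_end) for each region where a and b differ."""
    # autojunk=False is an assumption: keep popular characters from being treated as junk
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (i1, i2, j1, j2)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != 'equal'  # keep only 'replace', 'delete' and 'insert' regions
    ]

print(differing_spans("abc_def", "abc--ef"))
# [(3, 5, 3, 5)]
```

Each tuple gives the slice bounds in both strings, so a[a_start:a_end] and b[b_start:b_end] are the parts that differ. Short chance matches such as the single 'y' discussed above will still split a region in two, so you may want to merge spans that are separated by only one or two characters.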