(New to Python after years of R use)
Say I have a string:
dna = "gctacccgtaatacgtttttttttt"
And I want to pre-define the indices of interest:
window_1 = 0:3
window_2 = 1:4
window_3 = 2:5
This is invalid Python syntax, so I tried:
window_1 = range(0, 3)
This does not work when I use the window_1 variable as a string index:
dna[window_1]
I get a "string indices must be integers" error.
I have tried numerous things, such as wrapping range() in int() and/or list(), but nothing works.
Thanks
A slice such as 0:3 is only valid inside square brackets, directly after the sequence being indexed, so window_1, window_2, and window_3 have to be taken from your variable dna itself. Also keep in mind that Python indexing is zero-based: the first character of your dna sequence (g) is at position 0, and the second character (c) is at position 1.
dna = "gctacccgtaatacgtttttttttt"
window_1 = dna[0:3]
window_2 = dna[1:4]
window_3 = dna[2:5]
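If the goal is to define the windows ahead of time and apply them to a string later, as the question asks, one option is Python's built-in slice object, which is the object form of the start:stop syntax. A minimal sketch, reusing the variable names from the question:

```python
dna = "gctacccgtaatacgtttttttttt"

# slice(start, stop) is the object form of the start:stop indexing syntax,
# so it can be stored in a variable and applied later.
window_1 = slice(0, 3)
window_2 = slice(1, 4)
window_3 = slice(2, 5)

codons = [dna[w] for w in (window_1, window_2, window_3)]
print(codons)  # ['gct', 'cta', 'tac']
```

A range object is not accepted as a string index, which is why dna[range(0, 3)] raises the error from the question.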
How can I iterate over a string in Python (get each character from the string, one at a time, each time through a loop)?
As Johannes pointed out,
for c in "string":
    # do something with c
You can iterate over pretty much anything in Python using the for loop construct.
For example, open("file.txt") returns a file object (and opens the file); iterating over it iterates over the lines in that file
with open(filename) as f:
    for line in f:
        # do something with line
If that seems like magic, well it kinda is, but the idea behind it is really simple.
There's a simple iterator protocol that can be applied to any kind of object to make the for loop work on it.
Simply implement an iterator that defines a __next__() method (called next() in Python 2), and implement an __iter__ method on a class to make it iterable. (The __iter__ method, of course, should return an iterator object, that is, an object that defines __next__().)
See official documentation
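As a rough illustration of the protocol described above, here is a small Python 3 sketch. The Countdown class is a made-up example, not something from the standard library; it acts as its own iterator by returning self from __iter__.

```python
class Countdown:
    """Hypothetical example class: yields start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):  # this method is named next() in Python 2
        if self.current <= 0:
            # Raising StopIteration tells the for loop to stop.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


result = [n for n in Countdown(3)]
print(result)  # [3, 2, 1]
```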
If you need access to the index as you iterate through the string, use enumerate():
>>> for i, c in enumerate('test'):
...     print(i, c)
...
0 t
1 e
2 s
3 t
Even easier:
for c in "test":
    print(c)
Just to make a more comprehensive answer, the C way of iterating over a string can apply in Python, if you really wanna force a square peg into a round hole.
s = "test"  # avoid naming the variable str, which would shadow the built-in type
i = 0
while i < len(s):
    print(s[i])
    i += 1
But then again, why do that when strings are inherently iterable?
for c in s:
    print(c)
You can also do the job with a for loop over the indexes:
# suppose you have a variable name
name = "Mr.Suryaa"
for index in range(len(name)):
    print(name[index])  # just like C and C++
The output is (one character per line)
M r . S u r y a a
However, since a string is itself a sequence, you can iterate over name directly
for e in name:
    print(e)
This produces the same result, looks better, and works with any iterable such as a list, a tuple, or a dictionary (for a dictionary, the loop yields its keys).
We have used two built-in functions (BIFs in the Python community):
1) range() - creates the indexes
Example
for i in range(5):
produces 0, 1, 2, 3, 4
2) len() - returns the length of the given string
If you would like to use a more functional approach to iterating over a string (perhaps to transform it somehow), you can split the string into characters, apply a function to each one, then join the resulting list of characters back into a string.
A string is inherently a sequence of characters, hence map will iterate over the string (its second argument), applying the function (its first argument) to each character.
For example, here I use a simple lambda approach since all I want to do is a trivial modification to the character: here, to increment each character value:
>>> ''.join(map(lambda x: chr(ord(x)+1), "HAL"))
'IBM'
or more generally:
>>> ''.join(map(my_function, my_string))
where my_function takes a char value and returns a char value.
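To make the general form concrete, here is a small sketch with a named function in place of the lambda. The helper shift_char is a hypothetical name chosen for illustration; it does the same increment as the lambda above.

```python
def shift_char(ch):
    """Hypothetical helper: return the character one code point after ch."""
    return chr(ord(ch) + 1)


# map applies shift_char to each character; join glues the results together.
shifted = ''.join(map(shift_char, "HAL"))
print(shifted)  # IBM

# The same transformation written as a generator expression.
shifted_again = ''.join(shift_char(ch) for ch in "HAL")
```

In Python 3, map returns a lazy iterator rather than a list, which join consumes directly.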
Several answers here use range. In Python 2, xrange is generally better because it produces its values lazily rather than building a fully instantiated list, which matters where memory or iterables of widely varying lengths can be an issue. In Python 3, range already behaves this way and xrange no longer exists.
You can also do the following:
txt = "Hello World!"
print(*txt, sep='\n')
This does not use an explicit loop; the print function takes care of it internally.
* unpacks the string so that each character is passed to print as a separate argument
sep='\n' ensures that each character is printed on a new line
The output will be (the space between the two words prints as a blank-looking line, not shown here):
H
e
l
l
o
W
o
r
l
d
!
If you do need a loop statement, then as others have mentioned, you can use a for loop like this:
for x in txt: print (x)
If you ever run into a situation where you need to get the next character using __next__(), remember to create a string iterator and call __next__() on that, not on the original string (a str does not have a __next__() method).
In this example, when I find a [ character I keep consuming the following characters until I find ], so I need to use __next__();
a plain for loop over the string wouldn't help here.
myString = "'string' 4 '['RP0', 'LC0']' '[3, 4]' '[3, '4']'"
processedInput = ""
word_iterator = myString.__iter__()
for idx, char in enumerate(word_iterator):
    if char == "'":
        continue
    processedInput += char
    if char == '[':
        next_char = word_iterator.__next__()
        while next_char != "]":
            processedInput += next_char
            next_char = word_iterator.__next__()
        else:
            processedInput += next_char
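The same idea can also be written with the built-in iter() and next() functions instead of calling the dunder methods directly. The sketch below is a simplified variant, not the code above: the input string and the names chars and collected are made up for illustration, and next() is given a default of None so that a missing closing bracket does not raise StopIteration.

```python
text = "a[bc]d"
chars = iter(text)  # explicit iterator over the string
collected = []

for ch in chars:
    if ch == "[":
        # Pull characters from the same iterator until the closing bracket.
        inner = ""
        nxt = next(chars, None)  # None signals that the string ran out
        while nxt is not None and nxt != "]":
            inner += nxt
            nxt = next(chars, None)
        collected.append(inner)

print(collected)  # ['bc']
```

Because the for loop and the inner while loop share one iterator, the characters consumed inside the brackets are not visited again by the outer loop.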
I'm trying to reimplement, in a different way, something I can already do with some custom MATLAB functions. Suppose we have this string: 'AAAAAAAAAAAaaaaaaaaaaaTTTTTTTTTTTTTTTTsssssssssssTTTTTTTTTT'. I know how to remove every lowercase substring with
regexprep(String, '[a-z]*', '')
But I want to understand how to get the indices of these substrings and use them to check and remove the substrings, perhaps with a for loop, so I'm investigating how to do that.
regexp gives the indices:
[Start,End] = regexp(Seq,'[a-z]{1,}');
but I haven't managed to figure out how to use them to check these sequences and eliminate them.
With the indexing approach you get several start and end indices (two in your example), so you need a loop to remove the corresponding sections from the string. You should remove them from last to first, otherwise indices that haven't been used yet will become invalid as you remove sections:
x = 'AAAAAAAAAAAaaaaaaaaaaaTTTTTTTTTTTTTTTTsssssssssssTTTTTTTTTT'; % input
y = x; % initialize result
[Start, End] = regexp(x, '[a-z]{1,}');
for k = numel(Start):-1:1 % note: from last to first
    y(Start(k):End(k)) = []; % remove section
end
I'm extremely new to python and I have no idea why this code gives me this output. I tried searching around for an answer but couldn't find anything because I'm not sure what to search for.
An explain-like-I'm-5 explanation would be greatly appreciated
astring = "hello world"
print(astring[3:7:2])
This gives me : "l"
Also
astring = "hello world"
print(astring[3:7:3])
gives me : "lw"
I can't wrap my head around why.
This is string slicing in Python.
Slicing is similar to regular string indexing, but it returns a section of the string rather than a single character.
Using two parameters in a slice, such as [a:b], returns the characters starting at index a up to, but not including, index b.
For example:
"abcdefg"[2:6] would return "cdef"
The third parameter is the step: the slice takes one character, then jumps ahead by that many positions to find the next one. For example, [2:6:2] returns every second character beginning at index 2 and stopping before index 6.
i.e. "abcdefg"[2:6:2] returns "ce", because it only takes every second character.
In your case, astring[3:7:3], the slice begins at index 3 (the second l) and moves forward 3 characters (the third parameter) to w at index 6. The next step would land on index 9, which is past the stop index 7, so the slice ends there and returns lw. Likewise, astring[3:7:2] takes indices 3 and 5, giving l followed by the space between the two words, which is why it looks like just l.
In fact when using only two parameters, the third defaults to 1, so astring[2:5] is the same as astring[2:5:1].
Python Central has some more detailed explanations of cutting and slicing strings in python.
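To check the explanation above against the strings from the question, here is a short sketch. It uses repr() so that the trailing space in the step-2 result is visible, and range() to list which indices each slice visits.

```python
astring = "hello world"

two_params = astring[3:7]    # indices 3, 4, 5, 6
step_two = astring[3:7:2]    # indices 3 and 5 (index 5 is the space)
step_three = astring[3:7:3]  # indices 3 and 6

print(repr(two_params))   # 'lo w'
print(repr(step_two))     # 'l '
print(repr(step_three))   # 'lw'

# range() takes the same start, stop, step and shows the indices visited.
print(list(range(3, 7, 2)))  # [3, 5]
print(list(range(3, 7, 3)))  # [3, 6]
```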
I have a feeling you are overcomplicating this slightly.
Since the string astring is set statically, you could more easily do the following:
# Set one variable per character of the word (hyphens are not allowed in names)
letter_1 = "h"
letter_2 = "e"
letter_3 = "l"
letter_4 = "l"
letter_5 = "o"
letter_6 = " "
letter_7 = "w"
letter_8 = "o"
letter_9 = "r"
letter_10 = "l"
letter_11 = "d"
# Choose which of the characters to print
print(letter_3 + letter_6 + letter_3)
This way it's much more easily readable to human users and it should mitigate your error.
My variable x has the value "000032403" and I want to remove the leading zeros but keep the others. How can I do that?
Note: please suggest something that doesn't depend on knowing how many zeros there are at the beginning, because in my program this value comes from the user.
You can use the lstrip() method of the string class like this
>>> x = "000032403"
>>> x.lstrip("0")
'32403'
This will "return a copy of the string with leading characters removed".
Here's a link to the docs
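One edge case may be worth handling when the value comes from a user: a string made up only of zeros is stripped down to an empty string. A minimal sketch, where strip_leading_zeros is a hypothetical helper name and keeping a single "0" is an assumption about what the program wants:

```python
def strip_leading_zeros(text):
    """Hypothetical helper: drop leading zeros but keep a lone '0'."""
    # lstrip("0") returns '' for an all-zero string, so fall back to "0".
    return text.lstrip("0") or "0"


print(strip_leading_zeros("000032403"))  # 32403
print(strip_leading_zeros("0000"))       # 0
```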
I want my program to ask the user to input a 3D point, and it is supposed to keep prompting the user until the user inputs the point (0,0,0). The problem I am having with this loop is being caused by the statement "point = [int(y) for y in input().split()]". Whenever the loop reaches this statement, it quits. I have tried placing this statement in different places, but it does the same thing no matter where I put it. If I take the statement out, the loop works. I need to change the coordinates inputted by the user to integers, so I cannot leave the statement out. Is there something else I can do to change the coordinates to integers that won't affect the loop?
point = ""
pointList = [[]]  # pointList will be a list that contains lists
while True:
    if point == "0,0,0":
        break
    else:
        point = input("Enter a point in 3D space:")
        point = [int(y) for y in input().split()]
        pointList.append(point)
print(pointList)
From the docs:
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
In short, it splits on whitespace, which doesn't include commas. What you're looking for is str.split(',').
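A minimal sketch of how the loop might look with a comma split. The helper names parse_point and read_points are hypothetical, and read_points takes a list of strings in place of repeated input() calls so that the example can run without a user. Note also that the code in the question calls input() twice per pass, so the text read at the prompt is discarded.

```python
def parse_point(text):
    """Hypothetical helper: turn '1,2,3' into [1, 2, 3]."""
    # int() tolerates surrounding spaces, so '4, 5, 6' also works.
    return [int(part) for part in text.split(',')]


def read_points(lines):
    """Collect points until 0,0,0 is seen; lines stands in for user input."""
    points = []
    for line in lines:
        point = parse_point(line)
        points.append(point)
        if point == [0, 0, 0]:
            break
    return points


print(read_points(["1,2,3", "4, 5, 6", "0,0,0", "7,8,9"]))
# [[1, 2, 3], [4, 5, 6], [0, 0, 0]]
```

In the interactive program, each string would come from a single input("Enter a point in 3D space:") call, and the comparison is made after the conversion to integers rather than against the raw text.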
I suggest making it more robust with respect to the user input. While regular expressions should not be overused, I believe they are a good fit for this situation: you can define a regular expression for all allowed separators and then use its split method. It is also more usual to represent a point as a tuple. The loop can contain the exit condition directly. The exit condition could also be something other than entering a point of zeros (not shown in the example). Try the following code:
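#!python3
import re

# The separator: a comma with optional whitespace around it, or plain
# whitespace. The pattern must not be able to match an empty string,
# because re.split() in Python 3.7+ also splits on empty matches.
rexsep = re.compile(r'\s*,\s*|\s+')  # can be extended if needed

points = []   # the list of points
point = None  # init
while point != (0, 0, 0):
    s = input('Enter a point in 3D space: ')
    try:
        # The regular expression is used for splitting thus allowing
        # more complex separators like spaces, commas, commas and spaces,
        # whatever - you never know your user ;)
        x, y, z, *rest = [int(e) for e in rexsep.split(s.strip())]
        point = (x, y, z)
        points.append(point)
    except ValueError:
        # Raised by int() on non-numeric text and by unpacking too few values.
        print('Some error.')
print(points)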