I have two lists:
limits = [0.09090909,0.18181818,0.27272727]
res = [0.001,0.002,0.0923,0.0978,0.19374,0.21838]
The limits list specifies the ranges. I want to see how many values in res are less than the first value in limits, the second value, and so on. In a separate list, I want to store the last index of res whose value is less than 0.0909, the last index of res whose value is less than 0.1818, and so on.
So the result would be:
track = [1,3,5]
But my code is not doing that. My code so far is:
index = 0 ##this variable keeps track of where we are in limits list.
counter = 0 ## keeps track of indices in res list
for each_position in res:
    if each_position <= limits[index]:
        counter += 1
    else:
        track.append(counter)
        index += 1
What I get from this code is [2,3] whereas the output should be [1,3,5].
Help would be appreciated.
Modify your loop a bit. Instead of looping through the results first, loop through the limits. Then, for each limit, check the results. Whenever you encounter a value in res larger than the limit, append the previous index to the track list, break out of the inner loop, and continue with the remaining limits.
track = []
for limit in limits:
    for index, value in enumerate(res):
        if value > limit:
            track.append(index - 1)
            break
        # reached the end of res without exceeding the limit
        if len(res)-1 == index:
            track.append(index)
Edit: my apologies. If you do not find a value in res that is larger than a limit, then store the last index of res.
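If res is already sorted in ascending order, as it is in the question, the same result can probably be obtained with the standard bisect module instead of a nested loop. This is only a sketch under that assumption: bisect_left returns the position of the first value that is not less than the limit, so subtracting 1 gives the last index whose value is below it (or -1 when no value is below the limit).

```python
from bisect import bisect_left

limits = [0.09090909, 0.18181818, 0.27272727]
res = [0.001, 0.002, 0.0923, 0.0978, 0.19374, 0.21838]  # assumed sorted ascending

# For each limit, find the last index in res whose value is less than the limit.
# bisect_left gives the insertion point before any equal values, so index - 1
# is the last strictly smaller element (-1 means there is none).
track = [bisect_left(res, limit) - 1 for limit in limits]

print(track)
```

With the data from the question this prints [1, 3, 5].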
I'm extremely new to programming in general and have only been learning Python for 1 week.
For a class, I have to analyze a text DNA sequence, something like this:
CTAGATAGATAGATAGATAGATGACTA
for these specific keys: AGAT,AATG,TATC
I have to keep track of the largest number of consecutive repetitions for each, disregarding all but the highest number of repetitions.
I've been poring over previous Stack Overflow answers and I saw groupby() suggested as a way to do this. I'm not exactly sure how to use groupby for my specific implementation needs, though.
It seems like I will have to read the text sequence from a file into a list. Can I import what is essentially a text string into a list? Do I have to separate all of the characters by commas? Will groupby work on a string?
It also looks like groupby would give me the highest incidence of consecutive repetitions, but in the form of a list. How would I get the highest result out of that list so it can be stored somewhere else, without me, the programmer, having to look at the result? Will groupby return the highest number of consecutive repeats first in the list? Or will they be placed in the order in which they occurred in the list?
Is there a function I can use to isolate and return the sequence with the highest repetition incidence, so that I can compare that with the dictionary file I've been provided with?
Frankly, I really could use some help breaking down the groupby function in general.
My assignment recommended possibly using a slice to accomplish this, and that seemed somehow more daunting to try, but if that's the way to go, please let me know, and I wouldn't turn down a nudge in the right direction on how in the heck to do that.
Thank you in advance for any and all wisdom on this.
Here's a similar solution to the previous post, but may have better readability.
# The DNA Sequence
DNA = "CTAGATAGATAGATAGATAGATGACTAGCTAGATAGATAGATAGATAGATGACTAGAGATAGATAGATCTAG"
# All Sequences of Interest
elements = {"AGAT", "AATG", "TATC"}
# Add Elements to A Dictionary
maxSeq = {}
for element in elements:
    maxSeq[element] = 0
# Find Max Sequence for Each Element
for element in elements:
    i = 0
    curCount = 0
    # Ensure DNA Length Not Reached
    while i+4 <= len(DNA):
        # Sequence Not Being Tracked
        if curCount == 0:
            # Sequence Found
            if DNA[i: i + 4] == element:
                curCount = 1
                i += 4
            # Sequence Not Found
            else: i += 1
        # Sequence Is Being Tracked
        else:
            # Sequence Found
            if DNA[i: i + 4] == element:
                curCount += 1
                i += 4
            # Sequence Not Found
            else:
                # Check If Previous Max Was Beat
                if curCount > maxSeq[element]:
                    maxSeq[element] = curCount
                # Reset Count
                curCount = 0
                i += 1
    #Check If Sequence Was Being Tracked At End
    if curCount > maxSeq[element]: maxSeq[element] = curCount
#Display
print(maxSeq)
Output:
{'AGAT': 5, 'TATC': 0, 'AATG': 0}
This doesn't seem like a groupby problem, since you want multiple groups of the same key. It would be easier to just scan the list for key counts.
# all keys (keys are four chars each)
seq = "CTAGATAGATAGATAGATAGATGACTAGCTAGATAGATAGATAGATAGATGACTAGAGATAGATAGATCTAG"
# split key string into list of keys: ["CTAG","ATAG","ATAG","ATAG", ....]
lst = [seq[i:i+4] for i in (range(0,len(seq),4))]
lst.append('X') # the while loop only tallies when next key found, so add fake end key
# these are the keys we care about and want to store the max consecutive counts
dicMax = { 'AGAT':0, 'AATG':0, 'TATC':0, 'ATAG':0 } #dictionary of keys and max consecutive key count
# the while loop starts at the 2nd entry, so set variables based on first entry
cnt = 1
key = lst[0] #first key in list
if (key in dicMax): dicMax[key] = 1 #store first key in case it's the max for this key
ctr = 1 # start at second entry in key list (we always compare to previous entry so can't start at 0)
while ctr < len(lst): #all keys in list
    if (lst[ctr] != lst[ctr-1]): #if this key is different from previous key in list
        if (key in dicMax and cnt > dicMax[key]): #if we care about this key and current count is larger than stored count
            dicMax[key] = cnt #store current count as max count for this key
        #set variables for next key in list
        cnt = 0
        key = lst[ctr]
    ctr += 1 #list counter
    cnt += 1 #counter for current key
print(dicMax) # max consecutive count for each key
Raiyan Chowdhury suggested that the sequences may overlap, so dividing the base sequence into four character strings may not work. In this case, we need to search for each string individually.
Note that this algorithm is not efficient, but readable to a new programmer.
seq = "CTAGATAGATAGATAGATAGATGACTAGCTAGATAGATAGATAGATAGATGACTAGAGATAGATAGATCTAG"
dicMax = { 'AGAT':0, 'AATG':0, 'TATC':0, 'ATAG':0 } #dictionary of keys and max consecutive key count
for key in dicMax: #each key, could divide and conquer here so all keys run at same time
    for ctr in range(1,9999): #keep adding key to itself ABC > ABCABC > ABCABCABC
        s = key * ctr #create string by repeating key "ABC" * 2 = "ABCABC"
        if (s in seq): # if repeated key found in full sequence
            dicMax[key]=ctr # set max (repeat) count for this key
        else:
            break # exit inner for #done with this key
print(dicMax) #max consecutive key counts
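As an alternative to building ever-longer repeated strings, the longest run of a key can also be found with a regular expression. The following is only a sketch, not the assignment's intended method: the helper name longest_run is made up for illustration, and it uses a lookahead so that a run starting at any position in the sequence is considered, even if runs overlap.

```python
import re

def longest_run(seq, key):
    """Return the largest number of back-to-back repeats of key in seq."""
    # (?=(...)) is a zero-width lookahead, so finditer tries every position.
    # The inner (?:key)+ greedily grabs as many consecutive copies as it can.
    pattern = re.compile(r'(?=((?:%s)+))' % re.escape(key))
    runs = (len(m.group(1)) // len(key) for m in pattern.finditer(seq))
    return max(runs, default=0)  # default=0 covers "key never appears"

seq = "CTAGATAGATAGATAGATAGATGACTAGCTAGATAGATAGATAGATAGATGACTAGAGATAGATAGATCTAG"
for key in ("AGAT", "AATG", "TATC"):
    print(key, longest_run(seq, key))
```

For this sequence the counts agree with the earlier output: 5 for AGAT and 0 for the other two keys.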
I want to make my algorithm more efficient by deleting the items it has already sorted, but I don't know how to do that efficiently. The only way I found was to rewrite the whole list.
l = [] #Here you put your list
sl = [] # this is to store the list when it is sorted
a = 0 # variable to store which numbers he already looked for
while True: # loop
    if len(sl) == len(l): #if their size is matching it will stop
        print(sl) # print the sorted list
        break
    a = a + 1
    if a in l: # check if it is in list
        sl.append(a) # add to sorted list
        #here i want it to be deleted from the list.
The variable a is a little awkward. It starts at 0 and increments by 1 until it matches an element of the list l.
Imagine if l = [1000000, 1200000, -34]. Then your algorithm will first run for 1000000 iterations without doing anything, just incrementing a from 0 to 1000000. Then it will append 1000000 to sl. Then it will run again 200000 iterations without doing anything, just incrementing a from 1000000 to 1200000.
And then it will keep incrementing a looking for the number -34, which is below zero...
I understand the idea behind your variable a is to select the elements from l in order, starting from the smallest element. There is a function that does that: it's called min(). Try using that function to select the smallest element from l, and append that element to sl. Then delete this element from l; otherwise, the next call to min() will select the same element again instead of selecting the next smallest element.
Note that min() has a disadvantage: it returns the value of the smallest element, but not its position in the list. So it's not completely obvious how to delete the element from l after you've found it with min(). An alternative is to write your own function that returns both the element, and its position. You can do that with one loop: in the following piece of code, i refers to a position in the list (0 is the position of the first element, 1 the position of the second, etc) and a refers to the value of that element. I left blanks and you have to figure out how to select the position and value of the smallest element in the list.
....
for i, a in enumerate(l):
    if ...:
        ...
...
If you managed to do all this, congratulations! You have implemented "selection sort". It's a well-known sorting algorithm. It is one of the simplest. There exist many other sorting algorithms.
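For reference, the simpler min()-based variant described earlier can be sketched as follows. It assumes list.remove() is acceptable for the deletion step (it removes the first element equal to the given value), and it works on a copy so the caller's list is left untouched. The name selection_sort is just an illustrative choice; the enumerate-based version is still left as the exercise.

```python
def selection_sort(values):
    """Sort by repeatedly moving the smallest remaining element to the result."""
    remaining = list(values)  # copy, so the original list is not modified
    result = []
    while remaining:
        smallest = min(remaining)   # value of the smallest remaining element
        result.append(smallest)
        remaining.remove(smallest)  # delete its first occurrence
    return result

print(selection_sort([1000000, 1200000, -34]))
```

Both min() and remove() scan the list, so each step costs time proportional to the number of remaining elements.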
I am trying to create a recursive function to sort the list from low to high.
The following code doesn't work:
less = []
greater = []
def quicksort(array):
    if len(array)<2:
        return array
    else:
        pivot = array[0]
        for i in array[1:]:
            if i <= pivot:
                less.append(i)
            else:
                greater.append(i)
        return quicksort(less)+[pivot]+quicksort(greater)
print(quicksort([1,3,2,7,8]))
But when I use the code from a book, it works. Could you explain why?
def quicksort(array):
    if len(array)<2:
        return array
    else:
        pivot = array[0]
        less = [i for i in array[1:] if i <= pivot]
        greater = [i for i in array[1:] if i > pivot]
        return quicksort(less)+[pivot]+quicksort(greater)
print(quicksort([1,3,2,7,8]))
You're using global less and greater lists, so you're going to end up building up the lists bigger and bigger and bigger, repeating your inputs many times (roughly proportional to the number of times you recursively call quicksort). less and greater keep growing until you blow the stack depth limit or run out of memory and Python dies to protect you from yourself.
Worse, you preserve state across calls, so the second and subsequent things you quicksort end up including garbage from the prior sort operations, even if they're on inputs so short you could "sort" them trivially. Your code would work if you made less/greater local, initializing them afresh in each call:
def quicksort(array):
    if len(array)<2:
        return array
    else:
        pivot = array[0]
        less = [] # Local!
        greater = [] # Local!
        for i in array[1:]:
            if i <= pivot:
                less.append(i)
            else:
                greater.append(i)
        return quicksort(less)+[pivot]+quicksort(greater)
I have a list of about 20-30 items [strings].
I'm able to print them out in my program just fine - but I'd like to save some space, and merge items that are shorter...
So basically, if I have 2 consecutive items whose combined length is less than 30, I want to join those two items as a single entry in the list, with a / between them.
I'm not coming up with a simple way of doing this.
I don't care if I do it in the same list, or make a new list of items... it's all happening inside 1 function...
You need to loop through the list and keep joining items till they satisfy your requirement (size 30). Then add them to a new list when an element grows that big.
l=[] # your new list
buff=yourList[0] if len(yourList)>0 else "" # hold strings till they reach desired length
for i in range(1,len(yourList)):
    # check if concatenating will exceed the size or not
    t=yourList[i]
    if (len(buff) + len(t) + 1) <= 30:
        buff+="/"+t
    else:
        l.append(buff)
        buff=t
l.append(buff) # since last element is yet to be inserted
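The same buffering idea can be wrapped in a small function so it is easy to try on sample data. This is a sketch under a couple of assumptions: the name merge_short and the limit parameter are made up for illustration, the limit counts the "/" separator as in the loop above, and an empty input returns an empty list rather than a list containing one empty string.

```python
def merge_short(items, limit=30, sep="/"):
    """Join consecutive strings with sep while the joined length stays <= limit."""
    if not items:
        return []
    merged = []
    buff = items[0]  # holds the entry currently being built
    for text in items[1:]:
        if len(buff) + len(sep) + len(text) <= limit:
            buff += sep + text   # still fits, keep growing this entry
        else:
            merged.append(buff)  # entry is full, start a new one
            buff = text
    merged.append(buff)          # the last entry has not been stored yet
    return merged

print(merge_short(["apple", "banana", "x" * 31, "kiwi", "fig"]))
```

Here "apple" and "banana" are merged, the 31-character string stays on its own, and "kiwi" and "fig" are merged.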
You can use the list's extend method as follows:
a = [1,2,3]
b = [4,5,6]
a.append('/')
a.extend(b)
You just need to check the sizes of the two lists a and b as per your requirements.
I hope I understood your problem!
This code worked for me. You can check whether it's what you wanted; it's a bit lengthy, but it works.
list1 = yourListOfElements
for elem in list1:
    try: # Needs try/except otherwise last iteration would throw an IndexError
        listaAUX = [] # Auxiliary list to check length and join smaller elements. You can probably eliminate this using list slicing
        listaAUX.append(elem)
        listaAUX.append(list1[list1.index(elem)+1])
        if len(listaAUX[0]) + len(listaAUX[1]) < 30:
            concatenated = '/'.join(listaAUX)
            print(concatenated)
        else:
            print(elem)
    except IndexError:
        print(elem)
I am trying to count the matches from a regex search I am running on a dataset, but my count is not working. I was wondering what I am doing wrong and how I can get an accurate count. I should have around 1500 matches, but I keep getting an error that says "'int' object is not iterable".
import re
with open ('Question 1 Logfile.txt' , 'r') as h:
    results = []
    count = []
    for line in h.readlines():
        m = re.search(r'(((May|Apr)(\s*)\w+\s\w{2}:\w{2}:\w{2}))', line)
        t = re.search(r'(((invalid)(\s(user)\s\w+)))',line)
        i = re.search(r'(((from)(\s\w+.\w+.\w+.\w+)))', line)
        if m and t and i:
            count += 1
            print(m.group(1),' - ',i.group(4),' , ',t.group(4))
print(count)
You want to increment the number of times you satisfy a condition over a series of loop iterations. The confusion here seems to be how exactly to do that, and what variable to increment.
Here's a small example that captures the difficulty you've encountered, as described in OP and in OP comments. It's meant as a learning example, but it does also provide a couple of options for a solution.
count = []
count_int = 0
for _ in range(2):
    try:
        count += 1
    except TypeError as e:
        print("Here's the problem with trying to increment a list with an integer:")
        print(str(e))
    print("We can, however, increment a list with additional lists:")
    count += [1]
    print("Count list: {}\n".format(count))
    print("Most common is to increment an integer count by 1, for each loop iteration:")
    count_int +=1
    print("count_int: {}\n\n".format(count_int))
print("It's also possible to check the length of a list you incremented by one element per loop iteration:")
print(len(count))
Output:
"""
Here's the problem with trying to increment a list with an integer:
'int' object is not iterable
We can, however, increment a list with additional lists:
Count list: [1]
Most common is to increment an integer count by 1, for each loop iteration:
count_int: 1
Here's the problem with trying to increment a list with an integer:
'int' object is not iterable
We can, however, increment a list with additional lists:
Count list: [1, 1]
Most common is to increment an integer count by 1, for each loop iteration:
count_int: 2
It's also possible to check the length of a list you incremented by one element per loop iteration:
2
"""
Hope that helps. Good luck learning Python!
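Applied to the log-counting question, the integer-counter fix might look roughly like the sketch below. Everything specific here is an assumption for illustration: the function name count_invalid_logins, the simplified patterns, and the three sample lines are invented, since the real log file's format isn't shown. The point is only that count starts as 0 (an int), not as a list.

```python
import re

# Simplified, hypothetical patterns; adjust them to the real log format.
DATE_RE = re.compile(r'(?:May|Apr)\s+\d+\s\d{2}:\d{2}:\d{2}')
USER_RE = re.compile(r'invalid user (\w+)')
HOST_RE = re.compile(r'from (\d+\.\d+\.\d+\.\d+)')

def count_invalid_logins(lines):
    """Count lines that contain a date, an 'invalid user' entry and a source IP."""
    count = 0  # an integer, so count += 1 works
    for line in lines:
        if DATE_RE.search(line) and USER_RE.search(line) and HOST_RE.search(line):
            count += 1
    return count

# Made-up sample lines standing in for the real file.
sample_lines = [
    "May  3 06:25:11 server sshd[1]: invalid user test from 192.168.1.5",
    "Apr 28 23:59:59 server sshd[2]: invalid user admin from 10.0.0.7",
    "Apr 29 00:00:03 server sshd[3]: Accepted password for alice from 10.0.0.8",
]
print(count_invalid_logins(sample_lines))
```

With a real file, the open file handle can be passed in place of sample_lines, since iterating over a file yields its lines.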