Regex Output Count - python-3.x

I am trying to count the output of a regex search I am conducting on a dataset but for some reason my count is off by a lot. I was wondering what I am doing wrong and how I can get an official count. I should have around 1500 matches but I keep getting an error that says "'int' object is not iterable".
import re
with open('Question 1 Logfile.txt', 'r') as h:
    results = []
    count = []
    for line in h.readlines():
        m = re.search(r'(((May|Apr)(\s*)\w+\s\w{2}:\w{2}:\w{2}))', line)
        t = re.search(r'(((invalid)(\s(user)\s\w+)))', line)
        i = re.search(r'(((from)(\s\w+.\w+.\w+.\w+)))', line)
        if m and t and i:
            count += 1
            print(m.group(1), ' - ', i.group(4), ' , ', t.group(4))
    print(count)

You want to increment the number of times you satisfy a condition over a series of loop iterations. The confusion here seems to be how exactly to do that, and what variable to increment.
Here's a small example that captures the difficulty you've encountered, as described in OP and in OP comments. It's meant as a learning example, but it does also provide a couple of options for a solution.
count = []
count_int = 0
for _ in range(2):
    try:
        count += 1
    except TypeError as e:
        print("Here's the problem with trying to increment a list with an integer")
        print(str(e))
    print("We can, however, increment a list with additional lists:")
    count += [1]
    print("Count list: {}\n".format(count))
    print("Most common solution: increment int count by 1 per loop iteration:")
    count_int += 1
    print("count_int: {}\n\n".format(count_int))
print("It's also possible to check the length of a list you incremented by one element per loop iteration:")
print(len(count))
Output:
"""
Here's the problem with trying to increment a list with an integer
'int' object is not iterable
We can, however, increment a list with additional lists:
Count list: [1]

Most common solution: increment int count by 1 per loop iteration:
count_int: 1


Here's the problem with trying to increment a list with an integer
'int' object is not iterable
We can, however, increment a list with additional lists:
Count list: [1, 1]

Most common solution: increment int count by 1 per loop iteration:
count_int: 2


It's also possible to check the length of a list you incremented by one element per loop iteration:
2
"""
Hope that helps. Good luck learning Python!
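Applied to the original question, the fix is to start count at 0 instead of []. The following is only a minimal sketch: the sample lines, the simplified patterns, and the helper name count_invalid_logins are assumptions made for illustration, not the asker's real log file or regexes.

```python
import re

# Hypothetical sample lines standing in for 'Question 1 Logfile.txt'.
SAMPLE_LINES = [
    "May 10 12:34:56 host sshd[1]: Failed password for invalid user admin from 192.168.1.1 port 22 ssh2",
    "May 10 12:35:01 host sshd[2]: Accepted password for alice from 10.0.0.5 port 22 ssh2",
    "Apr  3 01:02:03 host sshd[3]: Failed password for invalid user root from 10.0.0.7 port 22 ssh2",
]

# Simplified patterns (an assumption; the question uses more deeply nested groups).
TIMESTAMP = re.compile(r'(?:May|Apr)\s+\w+\s\w{2}:\w{2}:\w{2}')
USER = re.compile(r'invalid user (\w+)')
SOURCE = re.compile(r'from (\S+)')

def count_invalid_logins(lines):
    """Count lines that contain a timestamp, an invalid user and a source."""
    count = 0  # an int, not a list, so `count += 1` works
    for line in lines:
        m = TIMESTAMP.search(line)
        t = USER.search(line)
        i = SOURCE.search(line)
        if m and t and i:
            count += 1
    return count

print(count_invalid_logins(SAMPLE_LINES))  # 2
```

The only change that matters for the error in the question is the initial value of count; everything else is scaffolding to make the sketch runnable.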

Related

I need the code to stop after break and it should not print max(b)

Rahul was learning about numbers in a list. He came across the term ground of a number.
A ground of a number is defined as the number which is just smaller than or equal to the number given to you. Hence he started solving some assignments related to it. He got stuck on some questions. Your task is to help him.
O(n) time complexity
O(n) Auxiliary space
Input Description:
First line contains two numbers ‘n’ denoting the number of integers and ‘k’ whose ground is to be checked. Next line contains n space separated numbers.
Output Description:
Print the index of val. Print -1 if there is no equal or nearly equal number.
Sample Input :
7 3
1 2 3 4 5 6 7
Sample Output :
2
n,k = 7,3
a= [1,2,3,4,5,6,7]
b=[]
for i in range(n):
    if k==a[i]:
        print(i)
        break
    elif a[i]<k:
        b.append(i)
print(max(b))
I've found a solution; feel free to chime in if you have a different view.
n,k = 7,12
a= [1,2,3,4,5,6,7]
b=[]
for i in range(n):
    if k==a[i]:
        print(i)
        break
    elif a[i]<k:
        b.append(i)
else:
    print(max(b))
From what I understand, these are the conditions in your question:
If the number is found, print its index and break.
If the number is not found, print the last index whose value is less than k.
First, hard-coding the iteration length of your list is error-prone. Iterate over the list itself, like this:
k = 3
a = [1,7,2,2,5,1,7]
finalindex = -1 # stays -1 if no value is equal to or less than k
for i, val in enumerate(a):
    if val == k:
        finalindex = i # exact match: record its index and stop early
        break
    elif val >= k:
        continue # value is larger than k, so skip it
    finalindex = i # reached only when val is less than k
print(finalindex) # either the index of the value found OR, if not found,
                  # the latest index whose value is less than k
Basically, you don't need to print twice, because the two cases are mutually exclusive: either you find the value and print its index, or you don't find it and print the latest index whose value is less than k. You don't need max() because the index only increases, so the latest index is already the maximum.
Another thing I noticed: if the else in your answer belongs to the if/elif chain rather than to the for loop, max(b) is printed once for every element larger than k, which is redundant. Attached to the for loop, it runs only once, when the loop finishes without hitting break.
else:
    print(max(b))
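For comparison, here is a sketch that follows the problem statement's definition more literally, assuming "ground" means the largest value that does not exceed k. The function name ground_index is hypothetical. Unlike the loop above, it compares values rather than taking the latest index, so it also handles unsorted input.

```python
def ground_index(values, k):
    """Return the index of the largest value <= k, or -1 if there is none."""
    best = -1  # index of the best candidate seen so far
    for i, val in enumerate(values):
        if val == k:
            return i  # exact match: nothing can be closer
        if val < k and (best == -1 or val > values[best]):
            best = i  # closer to k than the previous candidate
    return best

print(ground_index([1, 2, 3, 4, 5, 6, 7], 3))   # 2
print(ground_index([1, 2, 3, 4, 5, 6, 7], 12))  # 6
print(ground_index([5, 6, 7], 3))               # -1
```

Returning the result instead of printing inside the loop avoids the double-print problem entirely, since there is exactly one exit value per call.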

Shortest code to return current index number in string in 'for n in 'word': loop

I have a question about strings. I thought that this code:
for n in 'banana':
    print(n)
would return this:
0
1
2
3
4
5
But, of course, it doesn't. It returns the value at each position in the string, not the position number. In order for me to understand this better, I thought it might help to write the simplest possible program to achieve the output I thought I'd get:
count = 0
for n in 'banana':
    print(count)
    count += 1
This works, but surely there's a more direct way to access the position number that the current iteration is looking at? Can't see any methods that would achieve this directly though.
These are all equivalent:
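# 1. Manual counter
i = 0
for n in 'banana':
    print(i)
    i += 1
# 2. enumerate() yields (index, character) pairs
for i, w in enumerate('banana'):
    print(i)
# 3. Iterate over the index range directly
for i in range(len('banana')):
    print(i)
# 4. Unpack the range into a single print call
print(*range(len('banana')), sep='\n')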
As posted in the other answer, enumerate() works:
for idx, character in enumerate('myword'):
    print(f"Index={idx} character={character}")
It is worth pointing out that Python treats strings as sequences of characters. "abc"[0] returns 'a'. Similarly, when you ask for each element of a sequence, Python gives you the element itself, not its index; returning the index would be counterintuitive.
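A small sketch of the difference between plain iteration and enumerate(), using the same 'banana' string as the question. The variable names are illustrative only.

```python
word = 'banana'

# Plain iteration yields the characters themselves.
chars = [ch for ch in word]

# enumerate() yields (index, character) pairs; start= shifts the first index.
pairs = list(enumerate(word))
pairs_from_one = list(enumerate(word, start=1))

print(chars[:3])          # ['b', 'a', 'n']
print(pairs[:3])          # [(0, 'b'), (1, 'a'), (2, 'n')]
print(pairs_from_one[0])  # (1, 'b')
```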

How to count number of substrings in python, if substrings overlap?

The str.count() method returns the number of times a substring occurs in a string, but it counts only non-overlapping occurrences, so it undercounts when matches overlap.
Let's say my input is:
^_^_^-_-
I want to find how many times ^_^ occurs in the string.
mystr=input()
happy=mystr.count('^_^')
sad=mystr.count('-_-')
print(happy)
print(sad)
Output is:
1
1
I am expecting:
2
1
How can I achieve the desired result?
New Version
You can solve this problem without writing any explicit loops using regex. As @abhijith-pk's answer cleverly suggests, you can search for the first character only, with the remainder being placed in a positive lookahead, which will allow you to make the match with overlaps:
import re
def count_overlapping(string, pattern):
    regex = '{}(?={})'.format(re.escape(pattern[:1]), re.escape(pattern[1:]))
    # Consume iterator, get count with minimal memory usage
    return sum(1 for _ in re.finditer(regex, string))
Using [:1] and [1:] for the indices allows the function to handle the empty string without special processing, while using [0] and [1:] for the indices would not.
Old Version
You can always write your own routine using the fact that str.find allows you to specify a starting index. This routine will not be very efficient, but it should work:
def count_overlapping(string, pattern):
    count = 0
    start = -1
    while True:
        start = string.find(pattern, start + 1)
        if start < 0:
            return count
        count += 1
Usage
Both versions return identical results. A sample usage would be:
>>> mystr = '^_^_^-_-'
>>> count_overlapping(mystr, '^_^')
2
>>> count_overlapping(mystr, '-_-')
1
>>> count_overlapping(mystr, '')
9
>>> count_overlapping(mystr, 'x')
0
Notice that the empty string is found len(mystr) + 1 times. I consider this to be intuitively correct because it is effectively between and around every character.
You can use regex for a quick and dirty solution:
import re
mystr='^_^_^-_-'
print(len(re.findall(r'\^(?=_\^)',mystr)))
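The hard-coded pattern above can be generalized by wrapping the whole escaped substring in a lookahead. This is a sketch, not the answerer's code; the helper name count_overlapping_lookahead is hypothetical. A lookahead is zero-width, so it consumes no characters and matches starting at adjacent positions are all counted.

```python
import re

def count_overlapping_lookahead(string, pattern):
    """Count occurrences of pattern in string, including overlapping ones."""
    # re.escape makes characters such as '^' literal; (?=...) matches without
    # consuming input, so the scan can find a match at every start position.
    return len(re.findall('(?={})'.format(re.escape(pattern)), string))

print(count_overlapping_lookahead('^_^_^-_-', '^_^'))  # 2
print(count_overlapping_lookahead('^_^_^-_-', '-_-'))  # 1
print(count_overlapping_lookahead('12121990', '121'))  # 2
```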
You need something like this:
def count_substr(string,substr):
    n=len(substr)
    count=0
    for i in range(len(string)-n+1):
        if string[i:i+n] == substr:
            count+=1
    return count
mystr=input()
print(count_substr(mystr,'121'))
Input: 12121990
Output: 2

merging some entries in a python list based on length of items

I have a list of about 20-30 items [strings].
I'm able to print them out in my program just fine - but I'd like to save some space, and merge items that are shorter...
So basically, if I have 2 consecutive items whose combined length is less than 30, I want to join those two items as a single entry in the list, with a / between them.
I'm not coming up with a simple way of doing this.
I don't care if I do it in the same list, or make a new list of items... it's all happening inside 1 function...
You need to loop through the list and keep joining items till they satisfy your requirement (size 30). Then add them to a new list when an element grows that big.
l=[] # your new list
buff=yourList[0] if len(yourList)>0 else "" # hold strings till they reach desired length
for i in range(1,len(yourList)):
    # check if concatenating will exceed the size or not
    t=yourList[i]
    if (len(buff) + len(t) + 1) <= 30:
        buff+="/"+t
    else:
        l.append(buff)
        buff=t
l.append(buff) # since last element is yet to be inserted
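The same buffering loop can be wrapped in a function, which makes it easier to try on sample data. This is a sketch under assumptions: the name merge_short and the limit and sep parameters are hypothetical, and like the loop above it keeps merging as long as the joined string stays within the limit, rather than merging strictly in pairs.

```python
def merge_short(items, limit=30, sep='/'):
    """Greedily join consecutive strings while the result fits in `limit`."""
    merged = []
    buff = None  # string currently being built up
    for item in items:
        if buff is None:
            buff = item
        elif len(buff) + len(sep) + len(item) <= limit:
            buff += sep + item  # still fits: keep accumulating
        else:
            merged.append(buff)  # would overflow: flush and start over
            buff = item
    if buff is not None:
        merged.append(buff)  # flush the last pending entry
    return merged

print(merge_short(['aaa', 'bbb', 'x' * 28, 'cc']))
# ['aaa/bbb', 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx', 'cc']
```

Using None as the initial buffer means an empty input list returns an empty list instead of a list containing one empty string.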
You can use the extend method of list as follows:
a = [1,2,3]
b = [4,5,6]
a.append('/')
a.extend(b)
You just need to check the sizes of the two lists a and b as per your requirements.
I hope I understood your problem!
This code worked for me. You can check whether it's what you wanted; it's a bit lengthy, but it works.
list1 = yourListOfElements
for elem in list1:
    try: # Needs try/except otherwise last iteration would throw an IndexError
        listaAUX = [] # Auxiliary list to check length and join smaller elements. You can probably eliminate this using list slicing
        listaAUX.append(elem)
        listaAUX.append(list1[list1.index(elem)+1])
        if len(listaAUX[0]) + len(listaAUX[1]) < 30:
            concatenated = '/'.join(listaAUX)
            print(concatenated)
        else:
            print(elem)
    except IndexError:
        print(elem)

python 3.x for loop not stopping at desired value

I'm reading a beginner's book written for Python 2.x, but I decided to follow it using 3.5.
The book is about data wrangling, and while reading an Excel file using the library xlrd it gives a quick example of how counters work:
count = 0
for i in range(1000):
    if count < 10:
        print i
    count += 1
print 'Count: ', count
First of all, I know that in Python 3.x print is actually print(), and I also learned that range(1000) in 2.x is not the same as in 3.x.
So I managed to run the code without errors, but not with the desired result:
count = 0
my_list = list(range(1000))
for i in my_list:
    if count < 50:
        print(i)
    count += 1
    print(count)
The result was the numbers from 1001 to 2000, clearly not what I meant it to do. That made me wonder whether += works the same in 3.x, but I couldn't find much information, so I tried the (at least to me) logical way:
count = 0
my_list = list(range(1000))
for i in my_list:
    if count < 50:
        print(i)
    count = count + 1
    print(count)
But now the result is all numbers from 0 to 1000, with all numbers from 1 to 49 repeated once. So I changed count to just i, but it made no difference.
Clearly none of my attempts stopped at 50...
I appreciate all input in advance.
I think your confusion comes from incrementing count at the end of your for loop.
The way you have it written, with for k in range(5), count goes from 0 all the way up to 5, because it is incremented after k is used on each iteration and so ends up one past k.
Running this code will show that count ends up higher than your iterating variable k in the loop.
print("Hello World!")
count = 0
for k in range(5):
    print("k: "+str(k))
    print("count: "+str(count))
    count+=1
    print("countNow: "+str(count))
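If the goal is to actually stop once the limit is reached, one option is to break out of the loop and report the count once, after the loop. This is a minimal Python 3 sketch, assuming that is the intent; the helper name first_values is hypothetical and is not from the book.

```python
def first_values(iterable, limit):
    """Collect values until `limit` of them have been seen, then stop."""
    seen = []
    count = 0
    for i in iterable:
        if count >= limit:
            break  # stop iterating once the limit is reached
        seen.append(i)
        count += 1
    return seen, count

values, count = first_values(range(1000), 10)
print(values)           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print('Count:', count)  # Count: 10
```

Note that both print calls sit outside the loop; printing count inside the loop body is what interleaves extra numbers with the values of i.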
