match a pattern in text and read it to different variables

I have the following pattern: <num1>-<num2> <char a-z>: <string>. For example, 1-3 z: zztop
I'd like to parse them to n1=1, n2=3, c='z', s='zztop'
Of course I can do this easily with splitting but is there a more compact way to do that in Python?

Using re.finditer with a regex having named capture groups:
import re

inp = "1-3 z: zztop"
# Raw string so that \w and \s reach the regex engine unchanged
r = re.compile(r'(?P<n1>[0-9]+)-(?P<n2>[0-9]+) (?P<c>\w+):\s*(?P<s>\w+)')
output = [m.groupdict() for m in r.finditer(inp)]
print(output) # [{'n1': '1', 'n2': '3', 'c': 'z', 's': 'zztop'}]
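If each input is a single line and the numbers should end up as integers, as in the question, re.fullmatch may be a closer fit than re.finditer. The following is only a sketch: the helper name parse_line is made up for illustration, and the pattern assumes exactly one lowercase letter followed by ": " as in the example.

```python
import re

# Hypothetical pattern for "<num1>-<num2> <char a-z>: <string>"
LINE_RE = re.compile(r'(?P<n1>\d+)-(?P<n2>\d+) (?P<c>[a-z]): (?P<s>\w+)')

def parse_line(line):
    """Return (n1, n2, c, s) with the two numbers converted to int."""
    m = LINE_RE.fullmatch(line)
    if m is None:
        raise ValueError(f"unexpected format: {line!r}")
    return int(m.group('n1')), int(m.group('n2')), m.group('c'), m.group('s')

n1, n2, c, s = parse_line("1-3 z: zztop")
print(n1, n2, c, s)  # 1 3 z zztop
```

Unpacking the returned tuple gives the four separate variables directly, and a line that does not fit the format fails loudly instead of being silently skipped.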

Related

Splitting a sequence of number into a list (Python)

I wanted to ask you how can I split in Python for example this string '20020050055' into a list of integer that looks like [200, 200, 500, 5, 5].
I was thinking about enumerate but do you have any example of a better solution to accomplish this example? thanks

One approach, using a regex find all:
import re

inp = '20020050055'
matches = re.findall(r'[1-9]0*', inp)
print(matches) # ['200', '200', '500', '5', '5']
If, for some reason, you can't use regular expressions, here is an iterative approach:
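inp = '20020050055'
matches = []
num = ''
for i in inp:
    if i != '0':
        # A non-zero digit starts a new match; store the previous one first
        if num != '':
            matches.append(num)
        num = i
    else:
        # Zeroes extend the current match
        num = num + i
matches.append(num)
print(matches) # ['200', '200', '500', '5', '5']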
The idea here is to build out each match one digit at a time. When we encounter a non zero digit, we start a new match. For zeroes, we keep concatenating them until reaching the end of the input or the next non zero digit.
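Both snippets above produce strings, while the question asks for integers. A minimal sketch of the extra conversion step, assuming the input contains only digits and does not start with a zero (leading zeroes would be skipped by this pattern):

```python
import re

inp = '20020050055'
# Each match is a non-zero digit followed by its trailing zeroes
numbers = [int(m) for m in re.findall(r'[1-9]0*', inp)]
print(numbers)  # [200, 200, 500, 5, 5]
```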

Retrieving repeated items in a list comprehension

Given a sorted list, I would like to retrieve the first repeated item in the list using list comprehension.
So I ran the line below:
list=['1', '2', '3', 'a', 'a', 'b', 'c']
print(k for k in list if k==k+1)
I expected the output "a". But instead I got:
<generator object <genexpr> at 0x0021AB30>
I'm pretty new at this, would someone be willing to clarify why this doesn't work?

You seem to be confusing list elements with their indices: k is an element of the list, so k+1 does not refer to the next item. In addition, print was handed a generator object, which is why you saw <generator object ...> instead of a value.
A generator expression that yields every item of a list xs equal to its predecessor would look like this:
g = (xs[k] for k in range(1, len(xs)) if xs[k] == xs[k - 1])
Since you are interested only in the first such item, you could write
next(xs[k] for k in range(1, len(xs)) if xs[k] == xs[k - 1])
but this raises StopIteration if there is no such item.
As general advice, prefer simple, readable functions over clever long one-liners,
especially when you are new to the language. Your task could be accomplished as follows:
def first_duplicate(xs):
    # Compare each item with its predecessor; returns None if nothing repeats
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            return xs[k]

chars = ['1', '2', '3', 'a', 'a', 'b', 'c']
print(first_duplicate(chars)) # 'a'
P.S. Avoid using list as a variable name: it shadows the built-in type.

If you want just the first repeated item in the list you can use the next function with a generator expression that iterates through the list zipped with itself but with an offset of 1 to compare adjacent items:
next(a for a, b in zip(lst, lst[1:]) if a == b)
so that given lst = ['1', '2', '3', 'a', 'a', 'b', 'c'], the above returns: 'a'.
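When the list may contain no repeated item, next raises StopIteration. One way around that is to pass a default as the second argument to next. The sketch below wraps this in a function whose name, first_adjacent_duplicate, is chosen only for illustration:

```python
def first_adjacent_duplicate(lst, default=None):
    """Return the first item equal to its successor, or `default` if none."""
    # The extra parentheses are needed because next() takes two arguments here
    return next((a for a, b in zip(lst, lst[1:]) if a == b), default)

print(first_adjacent_duplicate(['1', '2', '3', 'a', 'a', 'b', 'c']))  # a
print(first_adjacent_duplicate(['1', '2', '3']))  # None
```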

What's the one liner to split a string to dictionary with default value in python3?

I have an input string input_str = 'a=1;b=2;c' and I want to split it into a dictionary such as {'a': '1', 'b': '2', 'c': '.'}. This is what I tried:
input_str = 'a=1;b=2;c'
default = '.'
output = dict(s.split('=') if '=' in s else {s ,default} for s in input_str.split(';'))
print(output)
{'a': '1', 'b': '2', '.': 'c'}
# Output I want:
{'a': '1', 'b': '2', 'c': '.'}
The following code works, but I was looking for a one-liner with a dict comprehension.
my_result = {}
input_str = 'a=1;b=2;c'
for s in input_str.split(';'):
    if '=' in s:
        key, val = s.split('=')
        my_result[key] = val
    else:
        my_result[s] = '.'
I noticed that the else branch of my one-liner, {s ,default}, is treated as a set. How do I turn it into a key/value pair for the dictionary?

As you noted, {s, default} defines a set, and the iteration order of a set is not guaranteed, so the key and the value can come out swapped.
All you need to do to remedy this is to use a list instead.
dict(s.split('=', 1) if '=' in s else [s, default] for s in input_str.split(';'))
Note that this is unlikely to be very useful in real life unless you have very restricted requirements. What happens if you want to include a value that contains a ';' character?
Passing 1 as the second argument to the first split() call means each item is split at most once, no matter how many '=' characters it contains.
Without it, parsing an input such as a=bad=value;b=2 would raise a ValueError, because dict() would receive a three-element sequence where it expects a key/value pair.
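Since the question asked specifically for a dict comprehension, here is one possible sketch using str.partition, which always returns three parts (text before the separator, the separator itself, text after it). This is an alternative to the dict(...) call above, not a replacement for it:

```python
input_str = 'a=1;b=2;c'
default = '.'

# partition('=') splits on the first '=' only; sep is '' when no '=' is present
output = {
    key: value if sep else default
    for key, sep, value in (s.partition('=') for s in input_str.split(';'))
}
print(output)  # {'a': '1', 'b': '2', 'c': '.'}
```

Because partition only splits on the first '=', an item like a=bad=value yields the key 'a' and the value 'bad=value' rather than an error.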

Concat multiple CSV rows into 1 in python

I am trying to concatenate CSV rows. I tried converting the rows to lists with pandas, but 'nan' values get appended because some cells are empty.
I also tried using zip, but it concatenates the column values instead.
with open(i) as f:
    lines = f.readlines()
    res = ""
    for i, j in zip(lines[0].strip().split(','), lines[1].strip().split(',')):
        res += "{} {},".format(i, j)
    print(res.rstrip(','))
    for line in lines[2:]:
        print(line)
I have data as below,
Input data:-
Input CSV Data
Expected Output:-
Output CSV Data
The real data has more than 3 rows; only a sample is given here.
Suggest a way which will achieve the above task without creating a new file. Please point to any specific function or sample code.

This assumes your first line contains the correct amount of columns. It will read the whole file, ignore empty data ( ",,,,,," ) and accumulate enough data points to fill one row, then switch to the next row:
Write test file:
with open("f.txt", "w") as f:
    f.write("""Circle,Year,1,2,3,4,5,6,7,8,9,10,11,12
abc,2018,,,,,,,,,,,,
2.2,8.0,6.5,9,88,,,,,,,,,,
55,66,77,88,,,,,,,,,,
5,3.2,7
def,2017,,,,,,,,,,,,
2.2,8.0,6.5,9,88,,,,,,,,,,
55,66,77,88,,,,,,,,,,
5,3.2,7
""")
Process test file:
data = [] # all data
temp = [] # data storage until enough found, then put into data
with open("f.txt", "r") as r:
    # get header and its length
    title = r.readline().rstrip().split(",")
    lenTitel = len(title)
    data.append(title)
    # process all remaining lines of the file
    for l in r:
        t = l.rstrip().split(",") # read one line's data
        temp.extend(x for x in t if x) # this eliminates all empty ,, pieces even in between
        # if enough data accumulated, put as sublist into data, keep rest
        if len(temp) >= lenTitel:
            data.append(temp[:lenTitel])
            temp = temp[lenTitel:]
# keep any leftover cells that did not fill a whole row
if temp:
    data.append(temp)
print(data)
Output:
[['Circle', 'Year', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12'],
['abc', '2018', '2.2', '8.0', '6.5', '9', '88', '55', '66', '77', '88', '5', '3.2', '7'],
['def', '2017', '2.2', '8.0', '6.5', '9', '88', '55', '66', '77', '88', '5', '3.2', '7']]
Remarks:
The file can't have leading blank lines, otherwise the header length will be wrong.
Blank lines in between do no harm.
You cannot have "empty" cells inside a record: they get eliminated.
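The same idea (flatten all non-empty cells, then regroup them by the header width) can also be sketched with the standard csv module. The function name merge_rows and the shortened sample data are made up for illustration, and the remarks above about empty cells still apply:

```python
import csv
import io

def merge_rows(text):
    """Return the header plus data rows rebuilt from the non-empty cells."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    width = len(header)
    # Flatten every remaining line, skipping empty cells
    cells = [cell for row in reader for cell in row if cell]
    # Regroup the flat list into rows as wide as the header
    rows = [cells[i:i + width] for i in range(0, len(cells), width)]
    return [header] + rows

sample = "Circle,Year,1,2\nabc,2018,,\n2.2,8.0\ndef,2017,,\n5,7\n"
print(merge_rows(sample))
# [['Circle', 'Year', '1', '2'], ['abc', '2018', '2.2', '8.0'], ['def', '2017', '5', '7']]
```

To use it on a real file, read the file's contents into a string first, or pass the open file object to csv.reader directly.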

As long as nothing weird is going on in the files, something like this should work:
with open(i) as f:
    result = []
    for line in f:
        result += line.strip().split(',')
print(result)

How do I join portions of a list in Python

Trying to join only portions of a large list that has numbers in it. For example:
h = ['9 This is the way this is the way 10 to program a string 11 to program a string']
##I've tried...
h[0].split()
z = []
h = ['9', 'This', 'is', 'the', 'way', 'this', 'is', 'the', 'way', '10', 'to', 'program', 'a', 'string', '11', 'to', 'program', 'a', 'string']
for i in h:
    while i != '10':
        z.append(i)
But the program runs an infinite loop. I've also tried if statements, if i != '10' then z.append(i). Basically, I have large portions of scripture that is in a list as a single string and I'd like to quickly extract the verses and put them in their own separate list. Thank you
Edit: I've tried...
h= ['9 nfnf dhhd snsn nana na 10 hfhf gkg utu 11 oeoe ldd sss', 'kgk hfh']
y = h[0].split()
print (y)
z = []
for i in y:
    if i != "10":
        z.append(i)
        break
print (z)
Output is the split list and 'z' prints '9' only. I've also changed the break to the correct indentation for the 'for' loop

First of all, use the result you get from h[0].split(), for example with h = h[0].split().
Now, let's get to the loop. It runs forever because the for loop picks the first i, which is "9", and the inner while i != "10" keeps appending that same i to z. Since i never changes inside the while, it never equals "10". I think what you want here is:
for i in h:
    if i != "10":
        z.append(i)
    else:
        break
This appends every value of h to z until i equals "10". Let me know if you need more help and I'll be happy to edit!

Try this to extract all the numbers into z:
h = ['9 This is the way this is the way 10 to program a string 11 to program a string']
h = h[0].split()
z = []
for i in h:
    try:
        z.append(int(i)) # int() is safer than eval() on arbitrary text
    except ValueError:
        pass # not a number, skip it
print(z)
Output:
[9, 10, 11]
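Neither answer puts each verse into its own entry, which is what the question ultimately asks for. One possible sketch, assuming every purely numeric token marks the start of a new verse and that verse text itself contains no standalone numbers (the function name split_verses is hypothetical):

```python
def split_verses(text):
    """Map each verse number to the words that follow it."""
    verses = {}
    current = None
    for token in text.split():
        if token.isdigit():
            # A numeric token starts a new verse
            current = int(token)
            verses[current] = []
        elif current is not None:
            verses[current].append(token)
    return {number: " ".join(words) for number, words in verses.items()}

h = ['9 This is the way this is the way 10 to program a string 11 to program a string']
print(split_verses(h[0]))
# {9: 'This is the way this is the way', 10: 'to program a string', 11: 'to program a string'}
```

Words that appear before the first verse number are ignored in this sketch.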
