Extracting names and values from a line using regex

I want to extract names and values from a line and store them in a dictionary as key-value pairs, using a regular expression in Python.
Example:
A has 50000 rupees and B has 15000 rupees.C has 7854 rupees and D has 10000 rupees
The result should look like {'A':50000,'B':15000,'C':7854,'D':10000}. Also, an integer cannot have more than 5 digits.

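You can use this pattern: ([a-zA-Z])(?=\shas\s(\d{1,5})\b)
The lookahead captures one to five digits after "has". The trailing \b makes the pattern skip numbers with more than five digits instead of truncating them, and {1,5} (rather than {,5}) prevents an empty match.
Code:
import re
pattern = r'([a-zA-Z])(?=\shas\s(\d{1,5})\b)'
text = 'A has 50000 rupees and B has 15000 rupees.C has 7854 rupees and D has 10000 rupees'
kv = {}
for key, value in re.findall(pattern, text):
    kv[key] = int(value)  # convert the captured digits to an integer
print(kv)
Output:
{'A': 50000, 'B': 15000, 'C': 7854, 'D': 10000}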
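As an alternative, here is a minimal sketch that avoids the lookahead and builds the dictionary in one comprehension. It assumes names may be longer than one letter. The names PAIR_RE and extract_pairs are made up for this illustration and are not part of the original answer.

```python
import re

# One or more letters, whitespace, "has", whitespace, then 1 to 5 digits
# that must end at a word boundary (so 6-digit numbers are rejected).
PAIR_RE = re.compile(r'\b([A-Za-z]+)\shas\s(\d{1,5})\b')

def extract_pairs(line):
    """Return {name: amount} for every 'NAME has NUMBER' pair in the line."""
    return {name: int(amount) for name, amount in PAIR_RE.findall(line)}

text = 'A has 50000 rupees and B has 15000 rupees.C has 7854 rupees and D has 10000 rupees'
print(extract_pairs(text))
```

If a name appears twice in the line, the later amount overwrites the earlier one, because dictionary keys are unique.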

Related

Change a dataframe column value based on the current value?

I have a pandas dataframe with several columns, and one of them contains string values. I need to change these strings to an acceptable value based on the current value. The dataframe is relatively large (40.000 x 32).
I've written a small function that takes the string to be changed as a parameter and looks up what it should be changed to.
import pandas as pd
df = pd.DataFrame({
    'A': ['Script','Scrpt','MyScript','Sunday','Monday','qwerty'],
    'B': ['Song','Blues','Rock','Classic','Whatever','Something']})
def lut(txt):
    my_lut = {'Script' : ['Script','Scrpt','MyScript'],
              'Weekday' : ['Sunday','Monday','Tuesday']}
    for key, value in my_lut.items():
        if txt in value:
            return key
    return 'Unknown'
The desired output should be:
   A        B
0  Script   Song
1  Script   Blues
2  Script   Rock
3  Weekday  Classic
4  Weekday  Whatever
5  Unknown  Something
I can't figure out how to apply this to the dataframe.
I've struggled over this for some time now so any input will be appreciated
Regards,

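You can loop over the lookup dictionary, replace the matching values with .loc, and then mark everything that is not a known category as "Unknown":
import pandas as pd
df = pd.DataFrame({
    'A': ['Script','Scrpt','MyScript','Sunday','sdfsd','qwerty'],
    'B': ['Song','Blues','Rock','Classic','Whatever','Something']})
dic = {'Weekday': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'], 'Script': ['Script','Scrpt','MyScript']}
for k, v in dic.items():
    for item in v:
        df.loc[df.A == item, 'A'] = k  # replace each known raw value with its category
df.loc[~df.A.isin(list(dic)), 'A'] = "Unknown"  # anything that is not a category name
print(df['A'].tolist())
Output:
['Script', 'Script', 'Script', 'Weekday', 'Unknown', 'Unknown']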
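Another way to look at the problem is to invert the lookup table once, so that each raw string maps directly to its category. The sketch below shows the idea in plain Python on the column values from the question. The names categories, reverse_lut and categorize are invented for this example.

```python
# Category -> list of raw strings, as in the question's lut() function.
categories = {
    'Script': ['Script', 'Scrpt', 'MyScript'],
    'Weekday': ['Sunday', 'Monday', 'Tuesday'],
}

# Invert the table once: each raw string points at its category.
reverse_lut = {raw: category
               for category, raws in categories.items()
               for raw in raws}

def categorize(values, default='Unknown'):
    """Map each raw value to its category, or to the default if unknown."""
    return [reverse_lut.get(value, default) for value in values]

column_a = ['Script', 'Scrpt', 'MyScript', 'Sunday', 'Monday', 'qwerty']
print(categorize(column_a))
```

With pandas, the same inverted dictionary could presumably be applied to the column directly, for example with df['A'].map(reverse_lut).fillna('Unknown'), which avoids one .loc assignment per raw value.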

Task : Find unique elements in an array. Count their occurrences. Find the numbers that occur less than 10 times in an array of 5000 elements

I tried a few solutions :
1.)
uniqueValues, indexList,occurCount = np.unique(desired_array,
return_index=True, return_counts=True)
print(uniqueValues,indexList,occurCount)
However the indexList only gives first occurrence of a number. For example : if num 33 occurred at 20,56,3000, indexList would only show that it occurred at 20. Since 33 occurs less than 10 times, i.e 3 times, I need all the locations.
2.) I decided to use dictionary to find all the index locations. But this is not working.
for i in range(5000):
    if not d.get(i):
        d[desired_array[i]]=[i]
    else:
        indices = d[desired_array[i]]
        indices.append(i)

This job screams for collections.Counter:
from collections import Counter
desired_array = [1, 2, 3, 1, 3, 5, 3]
result = Counter(desired_array)
print(result)
This will print out the unique elements and the count of occurrences:
Counter({3: 3, 1: 2, 2: 1, 5: 1})
You can replace
for i in range(1250):
    var = desired_array[i]
    if not d.get(var):
        d[var] = []
        # print(var)
    s = d[var]
    s.append(i)
with
for i in range(1250):
    var = desired_array[i]
    d.setdefault(var, []).append(i)
According to the documentation for dict.setdefault(key, default):
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
To write a csv file it's best to use the standard csv.writer:
import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
If you want to write the key/value pairs of your dict d to the csv file, you need to write something like:
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for k, v in d.items():
        writer.writerow((k, v))
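Putting the two pieces together, a rough sketch of the original task (all positions of the values that occur fewer than 10 times) might look like the following. The function name rare_value_positions and its limit parameter are hypothetical, and the sample list is a small stand-in for the 5000-element array.

```python
from collections import Counter

def rare_value_positions(values, limit=10):
    """Return {value: [indices]} for values occurring fewer than `limit` times."""
    counts = Counter(values)          # value -> number of occurrences
    positions = {}
    for index, value in enumerate(values):
        if counts[value] < limit:
            # setdefault creates the list on first sight of the value
            positions.setdefault(value, []).append(index)
    return positions

sample = [33, 7, 7, 33, 7, 33, 5]
print(rare_value_positions(sample, limit=3))
```

The list is scanned twice (once to count, once to collect indices), so the work grows linearly with the length of the input.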

sort values of lists inside dictionary based on length of characters

d = {'A': ['A11117',
'33465'
'17160144',
'A11-33465',
'3040',
'A11-33465 W1',
'nor'], 'B': ['maD', 'vern', 'first', 'A2lRights']}
I have a dictionary d and I would like to sort the values based on length of characters. For instance, for key A the value A11-33465 W1 would be first because it contains 12 characters followed by 'A11-33465' because it contains 9 characters etc. I would like this output:
d = {'A': ['A11-33465 W1',
'A11-33465',
'17160144',
'A11117',
'33465',
'3040',
'nor'],
'B': ['A2lRights',
'first',
'vern',
'maD']}
(I understand that dictionaries are not able to be sorted but I have examples below that didn't work for me but the answer contains a dictionary that was sorted)
I have tried the following
python sorting dictionary by length of values
print(' '.join(sorted(d, key=lambda k: len(d[k]), reverse=True)))
Sort a dictionary by length of the value
sorted_items = sorted(d.items(), key = lambda item : len(item[1]))
newd = dict(sorted_items[-2:])
How do I sort a dictionary by value?
import operator
sorted_x = sorted(d.items(), key=operator.itemgetter(1))
But none of them gives me what I am looking for.
How do I get my desired output?

You are not sorting the dict, you are sorting the lists inside it. The simplest approach is a loop that sorts each list in place:
for k, lst in d.items():
    lst.sort(key=len, reverse=True)
This will turn d into:
{'A': ['3346517160144', 'A11-33465 W1', 'A11-33465', 'A11117', '3040', 'nor'],
'B': ['A2lRights', 'first', 'vern', 'maD']}
Note that '3346517160144' appears because the comma after '33465' is missing in your list, so Python concatenates the two adjacent string literals into one.
If you want to keep the original data intact, use a comprehension like:
sorted_d = {k: sorted(lst, key=len, reverse=True) for k, lst in d.items()}
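For reference, here is the comprehension run on the question's data, assuming the missing comma after '33465' was a typo and the two strings were meant to be separate entries. Under that assumption the result matches the desired output in the question.

```python
# The question's data, with a comma added after '33465' (assumed typo fix).
d = {'A': ['A11117', '33465', '17160144', 'A11-33465', '3040',
           'A11-33465 W1', 'nor'],
     'B': ['maD', 'vern', 'first', 'A2lRights']}

# Build a new dict; each list is sorted by string length, longest first.
sorted_d = {key: sorted(words, key=len, reverse=True)
            for key, words in d.items()}
print(sorted_d)
```

Because Python's sort is stable, strings of equal length keep their original relative order.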

return dictionary of file names as keys and word lists with words unique to file as values

I am trying to write a function to extract only words unique to each key and list them in a dictionary output like {"key1": "unique words", "key2": "unique words", ... }. I start out with a dictionary. To test with I created a simple dictionary:
d = {1:["one", "two", "three"], 2:["two", "four",
"five"], 3:["one","four", "six"]}
My output should be:
{1:"three",
2:"five",
3:"six"}
I am thinking maybe split in to separate lists
def return_unique(dct):
    Klist = list(dct.keys())
    Vlist = list(dct.values())
    aList = []
    for i in range(len(Vlist)):
        for j in Vlist[i]:
            if
What I'm stuck on is how do I tell Python to do this: if Vlist[i][j] is not in the rest of Vlist then aList.append(Vlist[i][j]).
Thank you.

You can try something like this:
def return_unique(data):
    all_values = []
    for i in data.values():  # Collect every word from every list
        all_values = all_values + i
    unique_values = set([x for x in all_values if all_values.count(x) == 1])  # Words that occur exactly once overall
    for key, value in data.items():  # For Python 3.x (for Python 2.x use data.iteritems())
        for item in value:
            if item in unique_values:  # This word belongs to this key only
                data[key] = item  # Overwrites the list; only the last unique word is kept
    return data
d = {1:["one", "two", "three"], 2:["two", "four", "five"], 3:["one","four", "six"]}
print(return_unique(d))
Result: {1: 'three', 2: 'five', 3: 'six'}

Since a key may have more than one unique word associated with it, it makes sense for the values in the new dictionary to be a container type object to hold the unique words.
The set difference operator returns the difference between 2 sets:
>>> a = set([1, 2, 3])
>>> b = set([2, 4, 6])
>>> a - b
{1, 3}
We can use this to get the values unique to each key. Packaging these into a simple function yields:
def unique_words_dict(data):
    res = {}
    values = []
    for k in data:
        for g in data:
            if g != k:
                values += data[g]
        res[k] = set(data[k]) - set(values)
        values = []
    return res
>>> d = {1:["one", "two", "three"],
...      2:["two", "four", "five"],
...      3:["one","four", "six"]}
>>> unique_words_dict(d)
{1: {'three'}, 2: {'five'}, 3: {'six'}}
If you only had to do this once, then you might be interested in the less efficient but more concise dictionary comprehension:
>>> from functools import reduce
>>> {k: set(d[k]) - set(reduce(lambda a, b: a+b, [d[g] for g in d if g!=k], [])) for k in d}
{1: {'three'}, 2: {'five'}, 3: {'six'}}
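If the dictionary is large, rebuilding the list of "all other values" for every key can get slow. A possible alternative, sketched here with a hypothetical function name unique_words_by_key, is to count once how many keys each word appears under and keep the words whose count is 1.

```python
from collections import Counter

def unique_words_by_key(data):
    """Return {key: set of words that appear under this key only}."""
    # Count in how many keys each word appears; set() ignores repeats
    # of the same word inside a single list.
    key_counts = Counter(word
                         for words in data.values()
                         for word in set(words))
    return {key: {word for word in words if key_counts[word] == 1}
            for key, words in data.items()}

d = {1: ["one", "two", "three"],
     2: ["two", "four", "five"],
     3: ["one", "four", "six"]}
print(unique_words_by_key(d))
```

This makes two passes over the data instead of one pass per key, and like the set-difference version it returns sets so that a key can hold several unique words.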

Reading input as dictionary in python

I'm trying to read input in the form of a dictionary and apply simple math operations to the values of the dictionary. My code reads:
d = {}
bal = 0
text = input().split(",")  # split the input text on commas and store it in the list 'text'
print(text)
for i in range(5):
    text1 = text[i].split(" ")  # split each item on the space and store it in the list 'text1'
    d[text1[0]] = int(text1[1])  # assign the 1st item to the key and the 2nd item to the value of the dictionary
print(d)
for key in d:
    if key == 'D':
        bal = bal + int(d[key])
        # print(d[key])
    elif key == 'W':
        bal = bal - int(d[key])
print(bal)
Input: W 300,W 200,D 100,D 400,D 600
Output: {'D': 600, 'W': 200}
400
Expected output: {'W':300,'W':200,'D':100,'D':400,'D':600}
600
Issue: the code always keeps only the 2nd and the last values. For example, in the above case the output is
{'D': 600, 'W': 200}
400
Can someone let me know what the issue with the for loop is?
Thanks in advance

You can do this in a simpler way while keeping your own approach. @Rakesh and @Sabesh made good suggestions. Dictionary keys are unique and must be hashable, so assigning to a key that already exists overwrites its previous value; that is why only the last W and the last D survive in your output. You can read more in the Python interactive console by executing help(dict).
You can also check https://docs.python.org/2/library/collections.html#collections.defaultdict . There you'll find a number of examples of how to use dictionaries efficiently.
>>> d = {}
>>> text = 'W 300,W 200,D 100,D 400,D 600'
>>> for item in text.split(","):
...     arr = item.split()
...     d.setdefault(arr[0], []).append(arr[1])
...
>>> d
{'W': ['300', '200'], 'D': ['100', '400', '600']}
>>> w = [int(n) for n in d['W']]
>>> dep = [int(n) for n in d['D']]  # separate name, so the dictionary d is not overwritten
>>> bal = sum(dep) - sum(w)
>>> bal
600
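If only the final balance is needed, the dictionary can be skipped entirely. The following is a minimal sketch that walks the transactions in order; the function name balance is made up for this example, and it assumes every entry has the form "D amount" or "W amount".

```python
def balance(line):
    """Sum deposits (D) and subtract withdrawals (W) from a comma-separated line."""
    total = 0
    for entry in line.split(','):
        kind, amount = entry.split()   # e.g. 'W 300' -> ('W', '300')
        if kind == 'D':
            total += int(amount)
        elif kind == 'W':
            total -= int(amount)
    return total

print(balance('W 300,W 200,D 100,D 400,D 600'))
```

Because nothing is stored under a repeated key, no transaction is lost.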
