Extracting names and values from a line using regex - python-3.x

I want to extract names and values from a line and store them in a dictionary as key-value pairs, using a regular expression in Python.
E.g.:
A has 50000 rupees and B has 15000 rupees.C has 7854 rupees and D has 10000 rupees
It should look like {'A':50000,'B':15000,'C':7854,'D':10000}. Also, an integer cannot have more than 5 digits.

You can use this pattern: ([a-zA-Z])(?=\shas\s(\d{,5}))
Code:
import re
pattern = r'([a-zA-Z])(?=\shas\s(\d{,5}))'
text = 'A has 50000 rupees and B has 15000 rupees.C has 7854 rupees and D has 10000 rupees'
kv = {}
for key, value in re.findall(pattern, text):
    kv[key] = value
print(kv)
Output:
{'A': '50000', 'B': '15000', 'C': '7854', 'D': '10000'}
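Note that re.findall returns the captured groups as strings, so the values above are '50000' rather than the integer 50000 the question asks for. Below is a minimal sketch that converts them with int(). It assumes each name is a single letter, and it swaps \d{,5} for \d{1,5}\b so that an empty match never reaches int() and a six-digit number is skipped rather than silently truncated. The helper name extract_amounts is hypothetical.

```python
import re

# Assumption: a name is one letter and an amount has 1 to 5 digits.
# \d{1,5}\b refuses to match inside a longer number instead of truncating it.
PATTERN = re.compile(r'\b([A-Za-z])\s+has\s+(\d{1,5})\b')


def extract_amounts(text):
    """Return {name: amount} with each amount converted to int (hypothetical helper)."""
    return {name: int(amount) for name, amount in PATTERN.findall(text)}


text = 'A has 50000 rupees and B has 15000 rupees.C has 7854 rupees and D has 10000 rupees'
print(extract_amounts(text))  # {'A': 50000, 'B': 15000, 'C': 7854, 'D': 10000}
```

With this pattern, 'E has 123456 rupees' yields an empty dict instead of {'E': 12345}.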

Related

Change a dataframe column value based on the current value?

I have a pandas dataframe with several columns, and one of them contains string values. I need to change these strings to acceptable values based on their current values. The dataframe is relatively large (40,000 x 32).
I've written a small function that takes the string to be changed as a parameter and then looks up what it should be changed to.
import pandas as pd
df = pd.DataFrame({
    'A': ['Script','Scrpt','MyScript','Sunday','Monday','qwerty'],
    'B': ['Song','Blues','Rock','Classic','Whatever','Something']})
def lut(txt):
    my_lut = {'Script' : ['Script','Scrpt','MyScript'],
              'Weekday' : ['Sunday','Monday','Tuesday']}
    for key, value in my_lut.items():
        if txt in value:
            return key  # first canonical name whose list contains txt
    return 'Unknown'  # no list matched
The desired output should be:
         A          B
0   Script       Song
1   Script      Blues
2   Script       Rock
3  Weekday    Classic
4  Weekday   Whatever
5  Unknown  Something
I can't figure out how to apply this to the dataframe.
I've struggled with this for some time now, so any input would be appreciated.
Check this out:
import pandas as pd
df = pd.DataFrame({
    'A': ['Script','Scrpt','MyScript','Sunday','sdfsd','qwerty'],
    'B': ['Song','Blues','Rock','Classic','Whatever','Something']})
dic = {'Weekday': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'], 'Script': ['Script','Scrpt','MyScript']}
for k, v in dic.items():
    for item in v:
        df.loc[df.A == item, 'A'] = k
# anything that is not already one of the canonical keys becomes "Unknown"
df.loc[~df.A.isin(list(dic)), 'A'] = "Unknown"
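Output:
         A          B
0   Script       Song
1   Script      Blues
2   Script       Rock
3  Weekday    Classic
4  Unknown   Whatever
5  Unknown  Something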
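As for applying the question's own lut function, df['A'] = df['A'].apply(lut) does exactly that, because Series.apply calls the function once per cell. The sketch below is an alternative, assuming each raw spelling belongs to exactly one canonical value: it inverts the lookup table once (reverse_lut is an illustrative name) and lets Series.map look every cell up in a dict, which avoids a Python-level function call per row. Cells with no entry come back as NaN and are filled with 'Unknown'.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['Script', 'Scrpt', 'MyScript', 'Sunday', 'Monday', 'qwerty'],
    'B': ['Song', 'Blues', 'Rock', 'Classic', 'Whatever', 'Something']})

my_lut = {'Script': ['Script', 'Scrpt', 'MyScript'],
          'Weekday': ['Sunday', 'Monday', 'Tuesday']}

# Invert the table once: each raw spelling -> its canonical value.
reverse_lut = {raw: canon for canon, raws in my_lut.items() for raw in raws}

# map() replaces each cell with its dict entry; unmatched cells become NaN,
# which fillna() then turns into 'Unknown'.
df['A'] = df['A'].map(reverse_lut).fillna('Unknown')
print(df)
```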

Task: Find the unique elements in an array, count their occurrences, and find the numbers that occur fewer than 10 times in an array of 5000 elements

I tried a few solutions:
1.)
import numpy as np
uniqueValues, indexList, occurCount = np.unique(desired_array,
                                                return_index=True, return_counts=True)
print(uniqueValues, indexList, occurCount)
However, indexList only gives the first occurrence of each number. For example, if the number 33 occurred at indices 20, 56 and 3000, indexList would only show 20. Since 33 occurs fewer than 10 times (3 times), I need all of its locations.
2.) I decided to use a dictionary to find all the index locations, but this is not working:
d = {}
for i in range(5000):
    if not d.get(i):  # note: this looks up the index i, not the value desired_array[i]
        d[desired_array[i]] = [i]
    else:
        indices = d[desired_array[i]]
        indices.append(i)
This job screams for collections.Counter:
from collections import Counter
desired_array = [1, 2, 3, 1, 3, 5, 3]
result = Counter(desired_array)
print(result)
This will print out the unique elements and the count of occurrences:
Counter({3: 3, 1: 2, 2: 1, 5: 1})
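Counter alone does not answer the "fewer than 10 times" part, but its items() can be filtered directly. A small sketch: the question's threshold is 10, and a limit of 3 is used here only because the sample array above is tiny.

```python
from collections import Counter

desired_array = [1, 2, 3, 1, 3, 5, 3]
counts = Counter(desired_array)

# Keep only the values that occur fewer than `limit` times.
limit = 3  # illustrative; the question uses 10
rare = sorted(value for value, n in counts.items() if n < limit)
print(rare)  # [1, 2, 5]
```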
You can replace
for i in range(1250):
    var = desired_array[i]
    if not d.get(var):
        d[var] = []
    # print(var)
    s = d[var]
    s.append(i)
with
for i in range(1250):
    var = desired_array[i]
    d.setdefault(var, []).append(i)
According to the documentation, dict.setdefault(key[, default]):
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
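Putting the setdefault idea together with the question's goal, here is a sketch that collects every index of every value and then keeps only the values seen fewer than limit times. It uses enumerate so the whole array is covered rather than a hard-coded range; the function name rare_positions and the sample data are hypothetical.

```python
def rare_positions(values, limit=10):
    """Map each value occurring fewer than `limit` times to all of its indices."""
    positions = {}
    for i, value in enumerate(values):
        # setdefault inserts an empty list the first time a value is seen
        positions.setdefault(value, []).append(i)
    return {value: idx for value, idx in positions.items() if len(idx) < limit}


print(rare_positions([33, 7, 7, 33, 7, 33, 8], limit=3))  # {8: [6]}
```

The list lengths double as the occurrence counts, so no separate counting pass is needed.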
To write a CSV file, it's best to use the standard csv module's writer:
import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
If you want to write the key/value pairs of your dict to the CSV file, you need something like:
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for k, v in d.items():
        writer.writerow((k, v))
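With writerow((k, v)), each list of indices lands in a single cell as its string form, e.g. "[0, 3, 5]". If one column per index is preferred, the list can be unpacked into the row instead. A sketch, assuming the same dict-of-index-lists shape (the sample data is made up), with io.StringIO standing in for the opened file:

```python
import csv
import io

d = {33: [0, 3, 5], 8: [6]}  # hypothetical sample data

buffer = io.StringIO()  # stands in for open('some.csv', 'w', newline='')
writer = csv.writer(buffer)
for key, indices in d.items():
    # One row per key: the key first, then each index in its own column.
    writer.writerow([key, *indices])

print(buffer.getvalue())
```

csv.writer ends each row with '\r\n' by default, which is why the file should be opened with newline=''.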

sort values of lists inside dictionary based on length of characters

d = {'A': ['A11117',
           '33465'
           '17160144',
           'A11-33465',
           '3040',
           'A11-33465 W1',
           'nor'], 'B': ['maD', 'vern', 'first', 'A2lRights']}
I have a dictionary d and I would like to sort the values based on their number of characters. For instance, for key A the value 'A11-33465 W1' would come first because it contains 12 characters, followed by 'A11-33465' because it contains 9 characters, etc. I would like this output:
d = {'A': ['A11-33465 W1',
           'A11-33465',
           '17160144',
           'A11117',
           '33465',
           '3040',
           'nor'],
     'B': ['A2lRights',
           'first',
           'vern',
           'maD']}
(I understand that dictionaries cannot be sorted, but the answers to the examples below, which didn't work for me, do produce a sorted dictionary.)
I have tried the following:
python sorting dictionary by length of values
print(' '.join(sorted(d, key=lambda k: len(d[k]), reverse=True)))
Sort a dictionary by length of the value
sorted_items = sorted(d.items(), key = lambda item : len(item[1]))
newd = dict(sorted_items[-2:])
How do I sort a dictionary by value?
import operator
sorted_x = sorted(d.items(), key=operator.itemgetter(1))
But none of them gives me what I am looking for.
How do I get my desired output?
You are not sorting the dict; you are sorting the lists inside it. The simplest approach is a loop that sorts each list in place:
for k, lst in d.items():
    lst.sort(key=len, reverse=True)
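This will turn d into the following (the missing comma after '33465' in the question makes Python concatenate it with '17160144' into one string):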
{'A': ['3346517160144', 'A11-33465 W1', 'A11-33465', 'A11117', '3040', 'nor'],
'B': ['A2lRights', 'first', 'vern', 'maD']}
If you want to keep the original data intact, use a comprehension like:
sorted_d = {k: sorted(lst, key=len, reverse=True) for k, lst in d.items()}
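Python's sort is stable, so strings of equal length keep their original relative order. If a deterministic tie-break is wanted, the key can return a tuple. A sketch: the extra 'zzzz' entry is added purely to create a tie with 'vern' and is not part of the question's data.

```python
d = {'B': ['maD', 'vern', 'first', 'A2lRights', 'zzzz']}  # 'zzzz' is illustrative

# Longest first (negated length); equal lengths fall back to alphabetical order.
sorted_d = {k: sorted(lst, key=lambda s: (-len(s), s)) for k, lst in d.items()}
print(sorted_d)  # {'B': ['A2lRights', 'first', 'vern', 'zzzz', 'maD']}
```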

return dictionary of file names as keys and word lists with words unique to file as values

I am trying to write a function that extracts only the words unique to each key and lists them in a dictionary output like {"key1": "unique words", "key2": "unique words", ... }. I start out with a dictionary. To test it, I created a simple dictionary:
d = {1:["one", "two", "three"], 2:["two", "four",
"five"], 3:["one","four", "six"]}
My output should be:
{1:"three",
2:"five",
3:"six"}
I was thinking of splitting it into separate lists:
def return_unique(dct):
    Klist = list(dct.keys())
    Vlist = list(dct.values())
    aList = []
    for i in range(len(Vlist)):
        for j in Vlist[i]:
            if
What I'm stuck on is how to tell Python: if Vlist[i][j] is not in the rest of Vlist, then aList.append(Vlist[i][j]).
Thank you.
You can try something like this:
def return_unique(data):
    all_values = []
    for i in data.values():  # Get all values
        all_values = all_values + i
    unique_values = set([x for x in all_values if all_values.count(x) == 1])  # Values which are not duplicated
    for key, value in data.items():  # For Python 3.x ( For Python 2.x -> data.iteritems())
        for item in value:  # Comparing values of two lists
            for item1 in unique_values:
                if item == item1:
                    data[key] = item
    return data
d = {1:["one", "two", "three"], 2:["two", "four", "five"], 3:["one","four", "six"]}
print(return_unique(d))
result >> {1: 'three', 2: 'five', 3: 'six'}
Since a key may have more than one unique word associated with it, it makes sense for the values in the new dictionary to be containers (here, sets) that hold the unique words.
The set difference operator returns the difference between two sets:
>>> a = set([1, 2, 3])
>>> b = set([2, 4, 6])
>>> a - b
{1, 3}
We can use this to get the values unique to each key. Packaging these into a simple function yields:
def unique_words_dict(data):
    res = {}
    values = []
    for k in data:
        for g in data:
            if g != k:
                values += data[g]
        res[k] = set(data[k]) - set(values)
        values = []
    return res
>>> d = {1:["one", "two", "three"],
...      2:["two", "four", "five"],
...      3:["one","four", "six"]}
>>> unique_words_dict(d)
{1: {'three'}, 2: {'five'}, 3: {'six'}}
If you only have to do this once, you might be interested in the less efficient but more concise dictionary comprehension:
>>> from functools import reduce
>>> {k: set(d[k]) - set(reduce(lambda a, b: a+b, [d[g] for g in d if g!=k], [])) for k in d}
{1: {'three'}, 2: {'five'}, 3: {'six'}}
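Both versions above rebuild the combined word list once per key. An alternative sketch counts, in a single pass, how many keys each word appears under and keeps the words seen under exactly one key; converting each list to a set first means a word repeated within one list still counts once. The name unique_words_counter is hypothetical.

```python
from collections import Counter


def unique_words_counter(data):
    """Map each key to the set of words that appear under no other key."""
    # Count each word at most once per key.
    seen_in = Counter(word for words in data.values() for word in set(words))
    return {k: {w for w in words if seen_in[w] == 1} for k, words in data.items()}


d = {1: ["one", "two", "three"], 2: ["two", "four", "five"], 3: ["one", "four", "six"]}
print(unique_words_counter(d))  # {1: {'three'}, 2: {'five'}, 3: {'six'}}
```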

Reading input as dictionary in python

I'm trying to read input spread across multiple lines into a dictionary and apply simple math operations to the dictionary's values. My code reads:
d = {}
bal = 0
text = input().split(",")  # split the input on commas into the list 'text'
print(text)
for i in range(5):
    text1 = text[i].split(" ")  # split each entry on the space and store the parts in the list 'text1'
    d[text1[0]] = int(text1[1])  # use the 1st item as the key and the 2nd item as the value
print(d)
for key in d:
    if key == 'D':
        bal = bal + int(d[key])
        #print(d[key])
    elif key == 'W':
        bal = bal - int(d[key])
print(bal)
Input: W 300,W 200,D 100,D 400,D 600
Output: {'D': 600, 'W': 200}
400
Expected output: {'W':300,'W':200,'D':100,'D':400,'D':600}
600
ISSUE: The code only ever keeps two values, the last one for each key. For example, in the case above the output is
{'D': 600, 'W': 200}
400
Can someone tell me what is wrong with the for loop?
Thanks in advance
You can try a simpler version of your own approach. @Rakesh and @Sabesh made good suggestions. Dictionary keys are unique, so each new assignment to d['W'] or d['D'] overwrites the previous value for that key. You can check this in the Python interactive console by running help(dict).
See https://docs.python.org/2/library/collections.html#collections.defaultdict for a number of examples of using dictionaries efficiently.
>>> d = {}
>>> text = 'W 300,W 200,D 100,D 400,D 600'
>>>
>>> for item in text.split(","):
...     arr = item.split()
...     d.setdefault(arr[0], []).append(arr[1])
...
>>> d
{'W': ['300', '200'], 'D': ['100', '400', '600']}
>>>
>>> w = [int(n) for n in d['W']]
>>> dep = [int(n) for n in d['D']]
>>>
>>> bal = sum(dep) - sum(w)
>>> bal
600
>>>
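If only the totals per transaction type matter, the amounts can be summed while parsing, so there are no per-key lists to convert afterwards. A sketch using collections.defaultdict(int), assuming every comma-separated entry has the form "<type> <amount>"; the function name parse_transactions is hypothetical.

```python
from collections import defaultdict


def parse_transactions(text):
    """Sum the amounts per transaction type, e.g. {'W': 500, 'D': 1100}."""
    totals = defaultdict(int)  # missing keys start at 0
    for item in text.split(','):
        kind, amount = item.split()  # assumes exactly "<type> <amount>"
        totals[kind] += int(amount)
    return dict(totals)


totals = parse_transactions('W 300,W 200,D 100,D 400,D 600')
balance = totals.get('D', 0) - totals.get('W', 0)  # deposits minus withdrawals
print(totals, balance)  # {'W': 500, 'D': 1100} 600
```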
