Fuzzy Wuzzy Not Comparing Every String Against Every Other String in String_List - fuzzywuzzy

I'm hoping to use fuzzywuzzy to compare all strings in a list against each other, but it looks like not every string is being compared against every other string in the list. Here's what I've tried:
matrix = [(x,) + i for item in output for x in item for i in process.extract(x, item, scorer=fuzz.partial_ratio)]
which is equivalent to:
matrix = []
for item in output:
    for x in item:
        for i in process.extract(x, item, scorer=fuzz.partial_ratio):
            matrix.append((x,) + i)
Here is the start of output. Within each item, every string should be checked against all the other strings for similarity:
[['Java',
'JavaVersio',
'Control',
'GitTools',
'Sketch',
'IVision',
'Zepli',
'Go',
'GoAutomatedTesting',
'AutomatedTestingProjectManagement',
'AgileMethodology',
'ScrumEnglish',
'Writte',
'English',
'Spoke',
'EnglishMobile',
'ReactNative',
'Ionic',
'Android',
'Kotlin',
'ObjectiveC'],
['HTML',
'HTMLJava',
'JavaJavaScript',
'JavaScript',
'React',
'NodejsVersio',
'Control',
'GitManualQA',...
With k = 21 strings in the first item, there should be 210 comparisons (k * (k - 1) / 2). But as you can see below, the next item already starts being compared at index 105:
matrix_df = pd.DataFrame(matrix, columns=["word", "match", "score"])
matrix_df[100:150]
word match score
100 ObjectiveC ObjectiveC 100
101 ObjectiveC ReactNative 57
102 ObjectiveC AutomatedTestingProjectManagement 45
103 ObjectiveC Ionic 40
104 ObjectiveC Sketch 38
105 HTML HTML 100
106 HTML HTMLJava 90
107 HTML Control 45
108 HTML GitManualQA 45
109 HTML PostgreSQLManagementHosting 45
110 HTMLJava HTMLJava 100
111 HTMLJava HTML 90
112 HTMLJava JavaJavaScript 45
Why is this happening, and how can I fix it?
Thank you!

The function process.extract in fuzzywuzzy has the following signature:
def extract(query, choices, processor=default_processor, scorer=default_scorer, limit=5):
Here limit defaults to 5. This means the function returns at most the 5 best matches from choices, or fewer when choices has fewer than 5 elements. The first item has 21 strings and each gets 5 matches, giving exactly 21 * 5 = 105 rows. That is why the next item starts at index 105. To get the scores for all elements instead, pass limit=None:
matrix = [
(x,) + i for item in output
for x in item
for i in process.extract(x, item, scorer=fuzz.partial_ratio, limit=None)
]
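With limit=None, each string is scored against every string in its item, itself included. An item of 21 strings therefore produces 21 * 21 = 441 rows rather than 210.

If you want each unordered pair scored exactly once, a minimal sketch using itertools.combinations follows. The names pairwise_scores and similarity are illustrative. The scorer is a parameter, so you can pass fuzz.partial_ratio. The default difflib-based similarity function is only a stand-in (an assumption, not part of fuzzywuzzy) so the sketch runs without extra packages, and its scores will differ from partial_ratio's.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    # Stand-in scorer on a 0-100 scale (NOT fuzz.partial_ratio).
    return round(100 * SequenceMatcher(None, a, b).ratio())


def pairwise_scores(output, scorer=similarity):
    """Score every unordered pair of strings within each item exactly once."""
    rows = []
    for item in output:
        # combinations(item, 2) yields k * (k - 1) / 2 pairs and no self-matches
        for a, b in combinations(item, 2):
            rows.append((a, b, scorer(a, b)))
    return rows
```

In the question's setting you would call pairwise_scores(output, scorer=fuzz.partial_ratio) and build the DataFrame from the returned rows as before.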

Related

Set is reverse sorted by default for a list containing negative integers

n = int(input())
arr = list(map(int, input().split()))
s = set(arr)
print(s)
print(list(s)[len(s) - 2])
Input:
4
57 57 -57 57
Output:
{57, -57}
57
I am trying to find the second largest number in the given list. In the code above, the set appears to be sorted in reverse, which should not be the case. Why is this happening?
A set doesn't have a defined order of any kind, so indexing into list(s) gives an arbitrary element. You need to sort the values explicitly. sorted() accepts any iterable, including a set, and returns a new list, so no separate conversion to a list is needed:
print(sorted(s)[-2] if len(s) >= 2 else None)
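The same idea can be wrapped in a small reusable function. This is a sketch, and the name second_largest is illustrative. It uses heapq.nlargest, which returns the top 2 values in descending order without sorting the whole collection.

```python
import heapq


def second_largest(values):
    """Return the second-largest distinct value, or None if there are fewer than two."""
    distinct = set(values)  # drop duplicates; the set's iteration order is still undefined
    if len(distinct) < 2:
        return None
    # nlargest(2, ...) returns the two biggest values, largest first
    return heapq.nlargest(2, distinct)[1]


print(second_largest([57, 57, -57, 57]))  # -57
```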

Most efficient way to find all combinations of elements in a long list Python

Sorry, the title looks a little far-fetched. I was asked to find the combinations of 2 or more elements in a list whose sum equals a target (127 in the code below). I searched the web and tested some of the results I found, but they didn't work...
input.txt
35
20
15
25
47
40
62
55
65
95
102
117
150
182
127
219
299
277
309
576
code.py
from itertools import combinations
def combo(arr, r):
    return list(combinations(arr, r))
with open("input.txt") as f:
    nums = f.read().split("\n")
nums = [int(item) for item in nums]
r = range(2,20)
for rr in r:
    for c in combo(nums, rr):
        if sum(c) == 127:
            print(c)
The code above works, but only because the list is quite short. The input.txt I received was 100 lines long, and with that file Python raised a MemoryError. So I needed a better way. Unfortunately, I didn't find any other approach except a longer one, so I'm asking: "What is the most efficient way to find combinations of elements in a long list in Python?"
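You can save memory by not converting the output of itertools.combinations to a list, and instead iterating over the iterator it returns:
import itertools

for rr in range(2, len(nums) + 1):
    print(f"combine {rr} values")
    for c in itertools.combinations(nums, rr):
        if sum(c) == 127:
            print(c)
This fixes the MemoryError, but not the running time. A list of 100 numbers has 2**100 - 101 subsets of size 2 or more, far too many to enumerate one by one.
See also: Making all possible combinations of a list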

getting specific data in range of 1-10 from dict

I'm fairly new to programming and came across a work-related problem (nothing to do with computers) that I thought I could calculate automatically for my employees.
Update:
The code needs to scan through the dict keys and find the entry that corresponds to the input number.
For example, for input 289:
Search through the dict keys:
271: 1393.289, # 7.5505
281: 1468.817, # 7.5735
291: 1544.574, # 7.5962
301: 1620.559, # 7.6188
289 is between 281 and 291.
Take the value for 281, which is 1468.817, and use it as the base value.
289 - 281 = 8, so the remainder of 8 is multiplied by 7.5735 (index [1] of the dict entry), and the result is base + remainder * 7.5735 = 1529.405.
The multiplier for the remainder is different for each dict key.
I am not sure I fully understand what you are trying to do. If I understand correctly, you are processing a user input with a special function that yields a result 'base' in the range 0 to 320. Using this computed base, you then want to select the key of the dictionary basisIJK such that key <= base < next_key, where next_key is the next larger key. Having identified that key, you want to return basisIJK[key][0] + (base - key) * basisIJK[key][1].
If my understanding is correct, you can do the following:
basisIJK = {271: [1393.289, 7.5505], 281: [1468.817, 7.5735], 291: [1544.574, 7.5962], 301: [1620.559, 7.6188]}
def find_value(n, d):
    kys = sorted(d)  # dict keys in ascending order
    for i in range(1, len(kys)):
        # find the pair of consecutive keys that brackets n
        if kys[i-1] <= n < kys[i]:
            r = d[kys[i-1]][0] + (n - kys[i-1]) * d[kys[i-1]][1]
            return r
    return None  # n is below the first key or at/above the last key
executing
find_value(289, basisIJK)
Yields
1529.405 (up to floating-point rounding)
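With only four keys the linear scan is fine. For a large table, a binary search finds the bracketing key in O(log n) steps instead.

Here is a sketch using the standard library's bisect module. It keeps the same behaviour as find_value above, returning None below the first key or at/after the last key. The name find_value_bisect is illustrative. Sorting the keys on every call still costs O(n log n), so for many lookups you would sort once and reuse the list.

```python
from bisect import bisect_right

basisIJK = {271: [1393.289, 7.5505], 281: [1468.817, 7.5735],
            291: [1544.574, 7.5962], 301: [1620.559, 7.6188]}


def find_value_bisect(n, d):
    keys = sorted(d)
    # bisect_right returns the insertion point after any equal key,
    # so subtracting 1 gives the index of the largest key <= n
    i = bisect_right(keys, n) - 1
    if i < 0 or i >= len(keys) - 1:
        return None  # same out-of-range behaviour as the loop version
    base, rate = d[keys[i]]
    return base + (n - keys[i]) * rate


print(find_value_bisect(289, basisIJK))
```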

Python 3 strange loop outcome

import os
import random
def addsongs(code, aantal, liedjes_dir):
    songs = lijst = []
    liedjes = os.listdir(liedjes_dir)
    for l in liedjes:
        if l.startswith(code):
            songs.append(l)
    random.shuffle(songs)
    a = len(songs)
    if a < aantal:
        aantal = a
    print('aantal: ', aantal)
    for x in range(aantal):
        lijst.append(songs[x])
    print('len lijst: ', len(lijst))
    quit()
    return lijst
I have a folder containing a lot of mp3 files. The files are named like 010001.mp3, 081245.mp3, ...
The first 2 digits identify the genre of the mp3, like Pop, Jazz, Classic, ...
With this function I want, for example, 175 mp3's from the genre Piano.
The first 2 digits for Piano are 08.
The function looks in my mp3 folder for all the files that start with 08 and adds each one it finds to the songs list.
Then I shuffle the songs list, because I want 175 random Piano mp3's.
The variable aantal in this case is 175.
From the shuffled list, I want to take the first 175 mp3's and put them in a new list, "lijst".
It doesn't matter whether I use while or for: it prints that aantal = 175, but the length of lijst is 465.
output:
aantal: 175
len lijst: 465
Why do I get 465 ?
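Lists are mutable, and assignment binds names to objects rather than copying them.
songs = lijst = [] makes both names refer to one single list, so appending to lijst also appends to songs. Your loop therefore adds 175 entries to the list that already holds all the Piano songs. Define them on two separate lines instead (songs = [] and lijst = []), so they refer to two different lists.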
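Below is a minimal demonstration of the aliasing, followed by one way to avoid the problem altogether. pick_songs is a hypothetical rewrite, not the asker's function. It takes a list of file names (for example the result of os.listdir(liedjes_dir)) instead of a directory. It uses random.sample, which returns a new list of distinct elements, so neither the shuffle-then-copy step nor a shared list is needed.

```python
import random

# Aliasing: both names refer to ONE list object
songs = lijst = []
songs.append("080001.mp3")
print(lijst)           # ['080001.mp3'], the append is visible through both names
print(songs is lijst)  # True


def pick_songs(filenames, code, aantal):
    """Return up to `aantal` randomly chosen file names that start with `code`."""
    matches = [name for name in filenames if name.startswith(code)]
    # random.sample builds a NEW list and never picks the same element twice
    return random.sample(matches, min(aantal, len(matches)))
```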

Error when using xlswrite in Octave

I am trying to write a cell array to an Excel spreadsheet in Octave using the xlswrite from the io package in Octave (3.8.0, io 2.0.2 loaded, using Windows 7 64 bit).
The cell array looks like this:
>> pump_backlash(1:3,:)
ans =
{
[1,1] = Machine #
[2,1] = Machine_01
[3,1] = Machine_02
[1,2] = Station #
[2,2] = 1
[3,2] = 1
[1,3] = Pump channel #
[2,3] = 1
[3,3] = 2
[1,4] = Backlash
[2,4] =
57 65 62
[3,4] =
58 49 50
}
Except it's got many more rows. The first row consists of "headings" (strings), and then after that the first column is a string relating to the machine ID, the second and third columns are integers (scalars), and the fourth column of the cell array are 1x3 vectors of integers (although cells in the 4th column are sometimes empty if the test/measurement failed for whatever reason).
I try to write to Excel using the following command:
>> xlswrite('Pump_cal_results.xlsx',pump_backlash)
and the error message I get is as follows:
Creating file Pump_cal_results.xlsx
error: cellfun: all values must be scalars when UniformOutput = true
error: called from:
error: C:\Octave\Octave-3.8.0\share\octave\packages\io-2.0.2\private\spsh_prstype.m at line 62, column 6
error: C:\Octave\Octave-3.8.0\share\octave\packages\io-2.0.2\private\__COM_oct2spsh__.m at line 108, column 10
error: C:\Octave\Octave-3.8.0\share\octave\packages\io-2.0.2\oct2xls.m at line 189, column 18
error: C:\Octave\Octave-3.8.0\share\octave\packages\io-2.0.2\xlswrite.m at line 178, column 20
If I follow the error trail and go to line 62 of \private\spsh_prstype.m, I have:
ptr = cellfun ("isnan", obj); ## Find NaNs & set to BLANK
So it's obviously got something to do with that function call to cellfun, but I am not sure where to go from there. There are quite a few other function calls to cellfun in spsh_prstype.m.
The closest I have found by searching on the internet is this question, but there is no solution offered.
Any help/suggestions welcome.
Not having received any answers, I'll answer my own question :-)
I haven't worked out the root cause of the problem or how to fix it, but I have found a workaround. The problem seems to lie in the fact that the elements in the 4th column are vectors rather than scalars. The failing call, cellfun ("isnan", obj), runs with UniformOutput = true, which requires every cell to produce a single scalar. A cell holding a 1x3 vector produces a 1x3 result instead, which triggers the error. So, to write to Excel this way, every element of the cell array apparently has to be a scalar (or a string); you can't mix scalars and vectors.
So my workaround was to re-arrange the cell array so that it now looks like:
>> pump_backlash(1:3,:)
ans =
{
[1,1] = Machine #
[2,1] = Machine_01
[3,1] = Machine_02
[1,2] = Station #
[2,2] = 1
[3,2] = 1
[1,3] = Pump channel #
[2,3] = 1
[3,3] = 2
[1,4] = Backlash #1
[2,4] = 57
[3,4] = 58
[1,5] = Backlash #2
[2,5] = 65
[3,5] = 49
[1,6] = Backlash #3
[2,6] = 62
[3,6] = 50
}
i.e. the cell array now has 6 columns instead of 4, and there's no more vectors, only scalars.
The call to xlswrite then works OK:
xlswrite('Pump_cal_results.xlsx',pump_backlash,'Backlash','','com');
