How to find most similar to an array in gensim

I know the most_similar method works when you pass it a word that is already in the vocabulary, but how do you run the reverse search from a numpy array, for example one computed from word vectors?
from gensim.models import KeyedVectors
modelw2v = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)
differenceArr = modelw2v["King"] - modelw2v["Queen"]
# This line does not work
modelw2v.most_similar(differenceArr)

The most_similar() method can take vectors as the origin of a search, but you should pass them explicitly as members of a list given to the method's positive parameter, so that its logic for handling simpler origins (like a string or a list of strings) isn't confused.
Specifically, this should work with your other code:
modelw2v.most_similar(positive=[differenceArr])
More generally, you can supply lists of vectors (or word-keys for looking up vectors) to both the positive and negative parameters of this method, and the method will combine them (according to the exact logic you can see in the source code). So for example the prominent word2vec example...
wv('king') - wv('man') + wv('woman') = ?
...can be computed with the most_similar() method directly, without doing the vector arithmetic yourself:
sims = modelw2v.most_similar(positive=['king', 'woman'], negative=['man'])
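To make the combining step more concrete, here is a rough pure-Python sketch of the idea: normalise each input, add the positives, subtract the negatives, then rank every word by cosine similarity. This is an approximation written for illustration, not gensim's actual code; most_similar_sketch, unit and the tiny toy vectors are all made up for this example.

```python
import math


def unit(vector):
    """Scale a vector to length 1 so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def most_similar_sketch(vectors, positive=(), negative=(), topn=3):
    """Hypothetical stand-in for most_similar(); entries may be words or raw vectors."""
    def lookup(item):
        return vectors[item] if isinstance(item, str) else item

    # Positives count with weight +1, negatives with weight -1.
    weighted = [(unit(lookup(p)), 1.0) for p in positive]
    weighted += [(unit(lookup(n)), -1.0) for n in negative]
    dim = len(weighted[0][0])
    query = unit([sum(w * v[i] for v, w in weighted) / len(weighted) for i in range(dim)])

    # Words given as input are left out of the results; raw vectors exclude nothing.
    given_words = {item for item in (*positive, *negative) if isinstance(item, str)}
    scores = [
        (word, sum(q * x for q, x in zip(query, unit(vec))))
        for word, vec in vectors.items()
        if word not in given_words
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy 3-dimensional vectors, invented purely for this sketch.
toy = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

difference = [k - q for k, q in zip(toy["king"], toy["queen"])]
by_vector = most_similar_sketch(toy, positive=[difference])
by_words = most_similar_sketch(toy, positive=["king", "woman"], negative=["man"])
print(by_vector[0][0], by_words[0][0])
```

Note how the raw difference vector is wrapped in a list, exactly as in the positive=[differenceArr] call above.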

Related

gcloud translate submitting lists

The codelab example for using gcloud translate via python only translates one string:
sample_text = "Hello world!"
target_language_code = "tr"
response = client.translate_text(
    contents=[sample_text],
    target_language_code=target_language_code,
    parent=parent,
)
for translation in response.translations:
    print(translation.translated_text)
But since it puts sample_text in a list and iterates over the response, I take it one can submit a longer list. Is this true and can I count on the items in the response corresponding to the order of items in contents? This must be the case but I can't find a clear answer in the docs.

The contents parameter of translate_text is a Sequence[str], but the total content must be less than 30k codepoints.
For anything longer than 30k, use batch_translate_text.
APIs Explorer documents the request and response types for the translateText method. It lets you call the underlying REST API method and generates a form for you in which contents is an array of strings (as expected).
The TranslateTextResponse describes translations as having the same length as contents.
There's no other obvious way to map entries in contents to translations, so these must be in the same order, translations[foo] being the translation of contents[foo].
You can prove this to yourself by:
- making the call with several strings whose translations you already know
- including one string that is not a word in the source language (e.g. notknowninenglish for English) and checking where it appears in the result.
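The check described above could be wrapped up roughly like this. It is only a sketch: translate_in_order is a helper name I made up, and EchoClient is a hypothetical stand-in so the snippet runs without credentials. With the real client from the codelab you would pass that client and your own parent value instead.

```python
from types import SimpleNamespace


def translate_in_order(client, parent, texts, target_language_code):
    """Pair each input string with the translation at the same position."""
    response = client.translate_text(
        contents=list(texts),
        target_language_code=target_language_code,
        parent=parent,
    )
    translations = response.translations
    # Guard the assumption that the response mirrors the request one-to-one.
    if len(translations) != len(texts):
        raise ValueError("response length does not match request length")
    return [(src, t.translated_text) for src, t in zip(texts, translations)]


class EchoClient:
    """Hypothetical stand-in for the real client; it only tags and echoes the input."""

    def translate_text(self, contents, target_language_code, parent):
        return SimpleNamespace(
            translations=[
                SimpleNamespace(translated_text=f"[{target_language_code}] {text}")
                for text in contents
            ]
        )


# "projects/example-project" is a placeholder, not a real project.
pairs = translate_in_order(
    EchoClient(),
    "projects/example-project",
    ["Hello world!", "notknowninenglish"],
    "tr",
)
for source, translated in pairs:
    print(source, "->", translated)
```

Against the real API, the untranslatable marker string should come back unchanged at the position where you put it, which is what confirms the ordering.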

Contribution analyses - tagged.database

I need to get the individual contributions of the processes and emissions I entered into my database, similar to this problem: Brightway2 - Get LCA scores of immediate exchanges
It works for a single method, but I was wondering how to get these results for several methods, as with the ordinary calculations, so that they can then be saved as CSV. Is there a way to create a loop for this?
Thank you so much!
Miriam

There is a function called multi_traverse_tagged_databases in bw2analyzer which should do what you need. It was part of a pull request so it's not in the docs.
I've copied in the docstring at the bottom, which should give you some pointers. It's basically the same as the traverse_tagged_databases function used in the question you've linked to, but for multiple methods. You'd use it like this:
results, graph = multi_traverse_tagged_databases(functional_unit, list_of_methods, label='name')
You should be able to use pandas to export the dictionary you get in results to a csv file.
def multi_traverse_tagged_databases(
    functional_unit, methods, label="tag", default_tag="other", secondary_tags=[]
):
    """Traverse a functional unit throughout its foreground database(s), and
    group impacts (for multiple methods) by tag label.

    Input arguments:
        * ``functional_unit``: A functional unit dictionary, e.g. ``{("foo", "bar"): 42}``.
        * ``methods``: A list of method names, e.g. ``[("foo", "bar"), ("baz", "qux"), ...]``
        * ``label``: The label of the tag classifier. Default is ``"tag"``
        * ``default_tag``: The tag classifier to use if none was given. Default is ``"other"``
        * ``secondary_tags``: List of tuples in the format (secondary_label, secondary_default_tag). Default is empty list.

    Returns:
        Aggregated tags dictionary from ``aggregate_tagged_graph``, and tagged supply chain graph from ``recurse_tagged_database``.
    """
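For the CSV export mentioned above, something along these lines might work. It is a sketch under a loud assumption: that results maps each tag to a list of scores, one per method, in the same order as list_of_methods. Check the actual structure you get back before relying on it. I've used the standard-library csv module here rather than pandas so the snippet has no dependencies; results_to_csv is a made-up helper and the numbers are placeholder values, not real LCA scores.

```python
import csv
import io


def results_to_csv(results, methods, out):
    """Write one row per tag and one column per method (assumed result layout)."""
    writer = csv.writer(out)
    # Method names are tuples, so join their parts to get readable column headers.
    writer.writerow(["tag"] + [" | ".join(method) for method in methods])
    for tag, scores in sorted(results.items()):
        writer.writerow([tag] + list(scores))


# Placeholder inputs that mimic the ASSUMED structure: {tag: [score per method]}.
methods = [("foo", "bar"), ("baz", "qux")]
results = {"transport": [1.5, 0.2], "other": [0.3, 0.1]}

buffer = io.StringIO()
results_to_csv(results, methods, buffer)
print(buffer.getvalue())
```

To write a file instead of an in-memory buffer, pass a handle opened with open("results.csv", "w", newline="").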

squeak(smalltalk) how to use method `findSubstring: in: startingAt: matchTable:`?

What should I pass for the matchTable: keyword?
The implementation has no examples or detailed explanation, so
I don't understand which object receives the message if I put the string in the in: keyword.

The matchTable: keyword provides a way to identify characters so that they become equivalent in comparisons. The argument is usually a ByteArray of 256 entries, containing at position i the code point of the ith character to be considered when comparing.
The main use of the table is to implement case-insensitive searches, where, e.g., A=a. Thus, instead of comparing the characters at hand during the search, the method compares the elements found in the matchTable at their respective code points. So, instead of
(string1 at: i) = (string2 at: j)
the test becomes something along the lines of
cp1 := string1 basicAt: i.
cp2 := string2 basicAt: j.
(table at: cp1) = (table at: cp2).
In other words, the matchTable: argument is used to map actual characters to the ones that actually matter for the comparisons.
Note that the same technique can be applied for case-sensitive/insensitive sorting.
Finally, bear in mind that this is a rather low-level method that non-system programmers would rarely need. You should instead use the higher-level methods for finding substrings, such as findString:startingAt:caseSensitive:, where the argument of the last keyword is a Boolean.
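For readers who find the table idea easier to follow in another language, here is a small Python sketch of the same technique. It is not Squeak's implementation: the function names are invented, indices are 0-based, and it returns -1 when nothing is found (the Smalltalk method is 1-based).

```python
def case_insensitive_table():
    """Build a 256-entry table that maps A-Z onto a-z and leaves the rest alone."""
    table = bytearray(range(256))
    for code_point in range(ord("A"), ord("Z") + 1):
        table[code_point] = code_point + 32  # 'A' (65) -> 'a' (97), and so on
    return table


def find_substring(key, body, start, table):
    """Return the first index >= start where key matches body through the table."""
    key_bytes = key.encode("latin-1")
    body_bytes = body.encode("latin-1")
    if not key_bytes:
        return -1
    for i in range(start, len(body_bytes) - len(key_bytes) + 1):
        # Compare the table entries, not the characters themselves.
        if all(
            table[body_bytes[i + j]] == table[key_bytes[j]]
            for j in range(len(key_bytes))
        ):
            return i
    return -1


identity_table = bytearray(range(256))  # every character maps to itself
insensitive_hit = find_substring("WORLD", "Hello world", 0, case_insensitive_table())
sensitive_hit = find_substring("WORLD", "Hello world", 0, identity_table)
print(insensitive_hit, sensitive_hit)
```

With the identity table the search is case-sensitive; swapping in a different table changes which characters count as equal without touching the search loop.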

MATLAB selecting items considering the end of their name

I have to extract the onset times for an fMRI experiment. I have a nested output called "ResOut", which contains different matrices. One of these is called "cond", and I need the 4th element of it [1,2,3,4]. But I need its onset time only when the items in the "pict" matrix (inside the ResOut file) have a name that ends with "*v.JPG".
Here's the part of the code that I wrote (but it's not working):
for i=1:length(ResOut);
if ResOut(i).cond(4)==1 && ResOut(i).pict== endsWith(*"v.JPG")
What's wrong? Can you help me fix it?
Thank you in advance,
Adriano

It's generally helpful to start with unfamiliar functions by reading their documentation to understand what inputs they are expecting. Per the documentation for endsWith, it expects two inputs: the input text and the pattern to match. In your example, you are only passing it one (incorrectly formatted) string input, so it's going to error out.
To fix this, call the function properly. For example:
filepath = ["./Some Path/mazeltov.jpg"; "~/Some Path/myfile.jpg"];
test = endsWith(filepath, 'v.jpg')
Returns:
test =
2×1 logical array
1
0
Or, more specifically to your code snippet:
endsWith(ResOut(i).pict, 'v.JPG')
Note that there is an optional name-value argument, 'IgnoreCase', which takes a logical true/false to control whether the matching ignores case, e.g. endsWith(ResOut(i).pict, 'v.JPG', 'IgnoreCase', true).

Checking if values in List is part of String

I have a string like this:
val a = "some random test message"
I have a list like this:
val keys = List("hi","random","test")
Now, I want to check whether the string a contains any of the values from keys. How can we do this using the built-in library functions of Scala?
(I know I could split a into a List, check it against the keys list and find the solution that way. But I'm looking for a simpler way using standard library functions.)

Something like this?
keys.exists(a.contains(_))
Or even more idiomatically
keys.exists(a.contains)

The simple case is to test substring containment (as remarked in rarry's answer), e.g.
keys.exists(a.contains(_))
You didn't say whether you actually want to find whole word matches instead. Since rarry's answer assumed you didn't, here's an alternative that assumes you do.
val a = "some random test message"
val words = a.split(" ")
val keys = Set("hi","random","test") // could be a List (see below)
words.exists(keys contains _)
Bear in mind that keeping the keys in a list is only efficient when there are few of them. With a list, the contains method typically scans linearly until it finds a match or reaches the end.
For larger numbers of items, a set is not only preferable but also a truer representation of the information. Sets are typically optimised via hash codes and therefore need much less linear searching, or none at all.
