I have a string in dictionary format like:
{'fossils': [{Synset('dodo.n.01'): {'of', 'is', 'out', 'someone', 'fashion',
'whose', 'style'},Synset('fossil.n.02'): {'age', 'that', 'an', 'has', 'past',
'from', 'excavated', 'plant', '(',')', 'and', 'animal', 'in', 'remains',
'geological', 'soil', 'existed', 'impression', 'of', 'or', 'been', 'the', 'a'},
Synset('fossil.a.01'): {'of', 'a', 'fossil', 'characteristic'}}],
'disturbing': [{Synset('disturb.v.01'): {'me', 'This', 'upset', 'thought','book',
'deeply','move', 'troubling', 'A'}, Synset('agitate.v.06'): {'of', 'or',
'arrangement', 'the', 'position', 'change'}, Synset('touch.v.11'): {'my', '!',
'touch', 'Do', 'tamper', 'with', 'CDs', "n't"}, Synset('interrupt.v.02'): {'of',
'or', 'peace', 'tranquility', 'destroy', 'the', 'I', 'me', 'interrupt', 'Do',
'reading', 'when', "'m", "n't"}}]}
I want to convert this into a dictionary. The format of the dictionary is
{key: list of dictionaries as value}
Please help me to sort this out
Thanks
you have "Objects" (called Synset). usually, you can do this sort of convert with json.loads(str) to get the value of the dictionary.
because you have the objects, you need to fix this manually (pre-process the string prior to the json.loads) and later post-process to have the objects back.
edit: moreover, you have multiple types of parantheses which might humper things when loading the string.
edit2: There is of course another option (if your Synset class is known and you trust it's source):
import Synset # for swift conversion
dict_val = eval(dict_like_str)
I'll also add these lines to my original code.
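A minimal, self-contained sketch of the eval route, assuming a lightweight stand-in Synset class is acceptable (if you only need the synset names back, you don't need NLTK at all; the class below is a hypothetical substitute, not NLTK's real Synset):

```python
# Stand-in Synset class (assumption: only the synset name matters here).
class Synset:
    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return f"Synset('{self._name}')"

    def __eq__(self, other):
        return isinstance(other, Synset) and self._name == other._name

    def __hash__(self):
        # hashable, so Synset instances can serve as dict keys
        return hash(self._name)


# A short excerpt of the dictionary-like string from the question.
dict_like_str = (
    "{'fossils': [{Synset('fossil.a.01'): "
    "{'of', 'a', 'fossil', 'characteristic'}}]}"
)

# Pass Synset in explicitly so eval can resolve the name.
result = eval(dict_like_str, {'Synset': Synset})
print(result['fossils'][0][Synset('fossil.a.01')])
```

Only do this if you trust where the string comes from; eval will execute arbitrary Python.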
In a NER task we want to classify sentence tokens using different tagging schemes (BIO, for example). But we can't rejoin subtokens when the tokenizer splits the sentence more finely than our words.
I would like to classify the sentence 'weight 40.5 px' with custom tokenization (by space, in this example).
But after tokenization
tokenizer.convert_ids_to_tokens(tokenizer(['weight', '40.5', 'px'], is_split_into_words=True)['input_ids'])
I got
['[CLS]', 'weight', '40', '.', '5', 'p', '##x', '[SEP]']
where '40.5' was split into separate tokens '40', '.', '5'. This is a problem for me, because I want to classify 3 tokens ('weight', '40.5', 'px'), but they don't merge back automatically, since '40', '.', '5' do not look like '40', '##.', '##5'.
What can I do to solve this problem?
You can get the relation between the raw text and the tokenized tokens through "offset_mapping".
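A sketch of how that mapping can be used to merge subtokens back into words. The tokens and offsets below are hard-coded to mirror the example above; in practice they would come from a fast tokenizer called with return_offsets_mapping=True, where (with pre-split input) offsets restart at 0 for each new word:

```python
# Hypothetical tokens/offsets mirroring the example in the question.
tokens = ['[CLS]', 'weight', '40', '.', '5', 'p', '##x', '[SEP]']
offsets = [(0, 0), (0, 6), (0, 2), (2, 3), (3, 4), (0, 1), (1, 2), (0, 0)]


def merge_subtokens(tokens, offsets):
    """Group subword tokens back into the original words using offsets."""
    words = []
    for tok, (start, end) in zip(tokens, offsets):
        if start == end:
            continue  # special tokens like [CLS]/[SEP] have offset (0, 0)
        piece = tok[2:] if tok.startswith('##') else tok
        if start == 0:
            words.append(piece)       # offset restarts -> a new word begins
        else:
            words[-1] += piece        # continuation of the current word
    return words


print(merge_subtokens(tokens, offsets))
# ['weight', '40.5', 'px']
```

With the merged word boundaries you can then assign one label per original word, e.g. by labeling only the first subtoken of each word.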
I have a list of lists, in which each inner-list is a tokenized text, so its length is the number of words in the text.
corpus = [['this', 'is', 'text', 'one'], ['this', 'is', 'text', 'two']]
Now, I want to create a set that contains all unique tokens from the corpus. For the above example, the desired output would be:
{'this', 'is', 'text', 'one', 'two'}
Currently, I have:
all_texts_list = list(chain(*corpus))
vocabulary = set(all_texts_list)
But this seems like a memory-inefficient way of doing it.
Is there a more efficient way to obtain this set?
I found this link. However, there they want to find the set of unique lists and not the set of unique elements from the list.
You can use a simple for loop with the set's update operation:
vocabulary = set()
for tokens in corpus:
    vocabulary.update(tokens)
Output:
{'this', 'one', 'text', 'two', 'is'}
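For what it's worth, the same result can be written as a one-liner; and if you want to keep itertools, chain.from_iterable feeds set() lazily, so the intermediate list from the question is never materialized:

```python
from itertools import chain

corpus = [['this', 'is', 'text', 'one'], ['this', 'is', 'text', 'two']]

# Set comprehension over all inner lists.
vocabulary = {token for tokens in corpus for token in tokens}

# Or, without materializing an intermediate list:
vocabulary = set(chain.from_iterable(corpus))

print(vocabulary == {'this', 'is', 'text', 'one', 'two'})
# True
```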
I have free text like '2packetLays 1 liter milk 2loafsbrownbread2kgbasmatirice' and want to split it into the format below, to separate out closely matched food items:
['2', 'packet', 'Lays', '1', 'liter', 'milk', '2', 'loafs', 'brownbread', '2', 'kg', 'basmatirice']
I was able to split the above text into words like this:
['2', 'packet', 'Lays', '1', 'liter', 'milk', '2', 'loafs', 'brown' ,'bread', '2', 'kg', 'basmati', 'rice']
But I want 'brownbread' as one word, whereas I was getting it as two different words:
['brown', 'bread']
and I want 'basmatirice' as one word as well.
I am new to Python and I am stuck finding a solution to one problem.
I have a list like ['hello', 'world', 'and', 'practice', 'makes', 'perfect', 'again'] which I want to sort without using any predefined function.
I thought about it a lot but was not able to solve it properly.
Is there a short and elegant way to sort such a list of strings without using predefined functions?
Which algorithm is best suited for sorting a list of strings?
Thanks.
This sounds like you're learning about sorting algorithms. One of the simplest sorting methods is bubblesort. Basically, it just makes passes through the list, looking at each neighboring pair of values. If they're not in the right order, we swap them. We keep making passes through the list until there are no more swaps to make, at which point we're done. This is not the most efficient sort, but it is very simple to code and understand:
values = ['hello', 'world', 'and', 'practice', 'makes', 'perfect', 'again']
def bubblesort(values):
    '''Sort a list of values using bubblesort.'''
    sorted = False
    while not sorted:
        sorted = True
        # take a pass through every pair of neighboring values in the list
        for index in range(0, len(values) - 1):
            if values[index] > values[index + 1]:
                # if the left value is greater than the right value, swap them
                values[index], values[index + 1] = values[index + 1], values[index]
                # also, this means the list was NOT fully sorted during this pass
                sorted = False
print(f'Original: {values}')
bubblesort(values)
print(f'Sorted: {values}')
## OUTPUT ##
# Original: ['hello', 'world', 'and', 'practice', 'makes', 'perfect', 'again']
# Sorted: ['again', 'and', 'hello', 'makes', 'perfect', 'practice', 'world']
There are lots more sorting algorithms to learn about, and they each have different strengths and weaknesses: some are faster than others, some take up more memory, etc. It's fascinating stuff, and worth the time to learn more about computer science topics. But if you're a developer working on a project, unless you have very specific needs, you should probably just use the built-in Python sorting facilities and move on:
values = ['hello', 'world', 'and', 'practice', 'makes', 'perfect', 'again']
print(f'Original: {values}')
values.sort()
print(f'Sorted: {values}')
## OUTPUT ##
# Original: ['hello', 'world', 'and', 'practice', 'makes', 'perfect', 'again']
# Sorted: ['again', 'and', 'hello', 'makes', 'perfect', 'practice', 'world']
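One small thing to watch out for with the built-ins: list.sort() sorts in place and returns None, while sorted() returns a new list and leaves the original untouched:

```python
values = ['hello', 'world', 'and']

new_values = sorted(values)   # new list; original is untouched
print(new_values)             # ['and', 'hello', 'world']
print(values)                 # ['hello', 'world', 'and']

values.sort()                 # sorts in place and returns None
print(values)                 # ['and', 'hello', 'world']
```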
I have a list 'decrypted_characters' that looks like this:
['mathematician', 'to', 'present', 'a', 'proof', 'of', 'the', 'sensitivity']
I want it to read like a sentence, so I tried
decrypted_word = ''.join(decrypted_characters)
and when I print it, it looks like
mathematiciantopresentaproofofthesensitivity
How do I add spaces between the elements so that it looks like a sentence? I want it to look like
mathematician to present a proof of the sensitivity
Thanks!
Use join with a space as the separator:
In [4]: ' '.join(['mathematician', 'to', 'present', 'a', 'proof', 'of', 'the', 'sensitivity'])
Out[4]: 'mathematician to present a proof of the sensitivity'
decrypted_characters = ['mathematician', 'to', 'present', 'a', 'proof', 'of', 'the', 'sensitivity']
decrypted_word = ' '.join(decrypted_characters)
print(decrypted_word)
Try this out.
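One caveat worth knowing: str.join raises a TypeError if the list contains anything other than strings, so if your decrypted values might include non-strings, convert them first:

```python
# map(str, ...) makes every element a string before joining.
mixed = ['mathematician', 42, 'proofs']
print(' '.join(map(str, mixed)))  # mathematician 42 proofs
```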