How to remove a string element from a list in Python? .remove is not working for me - python-3.x

I am trying to remove strings from a list after they have been chosen, to avoid getting the same word again, but when I use .remove or .pop the word is not removed. Why is this, and how can I fix it?
I also tried making a copy of the word in case it got removed before the function returned it. Would removing the word from the list affect the copy once it has been chosen?
Thanks for any help. I am new to programming, as you can probably tell!
import random

def choose_a_word_easy():  # function for choosing a random easy word.
    words = ['ant', 'bee', 'cat', 'dog', 'egg', 'hat', 'golf', 'jelly', 'king', 'bird', 'hot', 'cold', 'fish', 'log',
             'dad', 'mum', 'goal', 'help', 'file', 'neat', 'car', 'moon', 'eye', 'tree', 'rice', 'ice', 'speed', 'rat',
             'water', 'rain', 'snow', 'spoon', 'light', 'gold', 'zoo', 'oil', 'goat', 'yoga', 'judo', 'japan', 'hello']
    pick = random.choice(words)  # randomly choose any word from the list.
    # p1 = pick
    words.remove(pick)
    return pick

By declaring your list inside the function choose_a_word_easy, a new list is created on every call. You want the same list to be reused on every call. Do so by creating the list outside the function's scope and passing it as an argument.
import random

words = ['ant', 'bee', 'cat', 'dog', ...]

def pick_and_remove(lst):
    pick = random.choice(lst)
    lst.remove(pick)
    return pick

pick = pick_and_remove(words)
print(pick)   # e.g. 'bee' (the pick is random)
print(words)  # e.g. ['ant', 'cat', 'dog', ...]
Note that your function can be made slightly more efficient by picking a random index and popping it, which avoids the search that remove performs.
def pick_and_remove(lst):
    i = random.randrange(len(lst))
    return lst.pop(i)
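Another way to get each word only once is to shuffle a copy of the list up front and then pop from the end. The following is only a sketch of that idea: make_word_picker and pick_word are hypothetical names, not part of the original question or answer.

```python
import random

def make_word_picker(words):
    """Return a function that hands out each word once, in random order."""
    pool = list(words)    # copy, so the caller's list is left untouched
    random.shuffle(pool)  # randomise the order once, up front

    def pick():
        # pop() from the end is cheap; it raises IndexError once the pool is empty
        return pool.pop()

    return pick

pick_word = make_word_picker(['ant', 'bee', 'cat', 'dog'])
drawn = [pick_word() for _ in range(4)]
print(sorted(drawn))  # ['ant', 'bee', 'cat', 'dog']
```

Because the pool lives inside the closure, it survives between calls, which is the property the original function was missing.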

Related

How to separate the values inside a list of lists based on character length?

The data looks like this. I need to split the numeric strings based on their length.
The length should be 10; the rest of the piece is not important.
Can anyone please help?
import re

list_val = [['01234567890000','123456789','xyz'],['123456789','1234567890111','abcdefghijkl']]
new_list = [[] for i in range(len(list_val))]
for i in range(len(list_val)):
    for txt in list_val[i]:
        if len(txt) > 10:
            new_list[i].append(re.split(r'(\d{10})', txt))
        else:
            new_list[i].append(txt)
The output is:
[[['', '0123456789', '0000'], '123456789', 'xyz'],['123456789', ['', '1234567890', '111'], ['abcdefghijkl']]]
I need to get rid of the nested lists and the unnecessary leftover parts.
Desired output:
[['0123456789','123456789', 'xyz'],['123456789','1234567890', 'abcdefghijkl']]
It seems you want to truncate the numerical strings, not split them. Just use slices for that:
new_list = [[s[:10] if s.isdigit() else s for s in sub] for sub in list_val]
# [['0123456789', '123456789', 'xyz'], ['123456789', '1234567890', 'abcdefghijkl']]
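To see why the re.split attempt produced nested lists with empty strings, it may help to compare it with slicing on a single value. This is a small illustration using one string from the question; the variable names s and m are just for the example.

```python
import re

s = '01234567890000'

# A capturing group makes re.split keep the matched text as its own item,
# and the empty string in front of the match is kept as well.
print(re.split(r'(\d{10})', s))  # ['', '0123456789', '0000']

# Slicing simply keeps the first 10 characters.
print(s[:10])                    # 0123456789

# re.match anchors at the start and returns only the matched digits.
m = re.match(r'\d{10}', s)
print(m.group() if m else s)     # 0123456789
```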

How to get the count of a word in a database using pandas

For the same dataset, the following two pieces of code give different answers when executed. According to the given answer, the second one is correct, but what is the mistake in the first?
Code 1
import pandas as pd

df = pd.read_csv("amazon_baby.csv", index_col="name")
sw = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
for i in sw:
    df[i] = df["review"].str.count(i)
    y = df[i].sum(axis=0)
    print(i, y)
Code 2
from collections import Counter
import pandas as pd

df = pd.read_csv("amazon_baby.csv", index_col="name")
sw = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
df['word_count'] = df['review'].apply(lambda x: Counter(str(x).split()))

def great_count(x):
    if 'great' in x:
        return x.get('great')
    else:
        return 0

df['great3'] = df['word_count'].apply(great_count)
print(sum(df['great3']))
These two pieces of code do quite different things.
The first takes each word in the sw list and counts how many times it occurs as a substring of each review. This means that for the string "this is great, this is the greatest" the count for "great" is 2, because "greatest" also matches. I suppose this is the error.
The second splits the text into separate words: ['this', 'is', 'great,', 'this', 'is', 'the', 'greatest'], then builds the counts: Counter({'this': 2, 'is': 2, 'great,': 1, 'the': 1, 'greatest': 1}) and sums the column.
But there is no word "great" in the Counter, because of the comma attached to it. So this is also wrong.
A better way is to strip the punctuation first. For example, for a review string t (with the string module imported):
sum(1 for i in ''.join(ch for ch in t if ch not in string.punctuation).split() if i == 'great')
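The difference between the approaches can be checked on the example sentence itself. The sketch below uses only the standard library rather than pandas; the variable names are illustrative, and the regex word-boundary variant is an additional option not mentioned in the answer.

```python
import re
import string
from collections import Counter

text = "this is great, this is the greatest"

# Substring counting: "greatest" is counted as well.
substring_hits = text.count("great")

# Splitting on whitespace: the token is "great," so "great" is never seen.
split_hits = Counter(text.split())["great"]

# Strip punctuation first, then count whole words.
no_punct = text.translate(str.maketrans("", "", string.punctuation))
clean_hits = no_punct.split().count("great")

# Alternative: a regex with word boundaries on the raw text.
regex_hits = len(re.findall(r"\bgreat\b", text))

print(substring_hits, split_hits, clean_hits, regex_hits)  # 2 0 1 1
```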

How to sort list of strings without using any pre-defined function?

I am new to Python and I am stuck on one problem.
I have a list like ['hello', 'world', 'and', 'practice', 'makes', 'perfect', 'again'] which I want to sort without using any predefined function.
I have thought about it a lot but have not been able to solve it properly.
Is there a short and elegant way to sort such a list of strings without using predefined functions?
Which algorithm is best suited to sorting a list of strings?
Thanks.
This sounds like you're learning about sorting algorithms. One of the simplest sorting methods is bubblesort. Basically, it's just making passes through the list and looking at each neighboring pair of values. If they're not in the right order, we swap them. Then we keep making passes through the list until there are no more swaps to make, then we're done. This is not the most efficient sort, but it is very simple to code and understand:
values = ['hello', 'world', 'and', 'practice', 'makes', 'perfect', 'again']

def bubblesort(values):
    '''Sort a list of values in place using bubblesort.'''
    is_sorted = False  # not named "sorted", to avoid shadowing the built-in
    while not is_sorted:
        is_sorted = True
        # take a pass through every pair of values in the list
        for index in range(0, len(values)-1):
            if values[index] > values[index+1]:
                # if the left value is greater than the right value, swap them
                values[index], values[index+1] = values[index+1], values[index]
                # also, this means the list was NOT fully sorted during this pass
                is_sorted = False

print(f'Original: {values}')
bubblesort(values)
print(f'Sorted: {values}')
## OUTPUT ##
# Original: ['hello', 'world', 'and', 'practice', 'makes', 'perfect', 'again']
# Sorted: ['again', 'and', 'hello', 'makes', 'perfect', 'practice', 'world']
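For comparison, here is a minimal sketch of another simple algorithm, insertion sort. It is not from the original answer; insertion_sort is a hypothetical helper, and unlike the bubblesort above it returns a new list instead of sorting in place.

```python
def insertion_sort(values):
    """Return a new list with the items of `values` in ascending order."""
    result = []
    for value in values:
        # walk backwards through the already-sorted result to find the slot
        position = len(result)
        while position > 0 and result[position - 1] > value:
            position -= 1
        result.insert(position, value)
    return result

words = ['hello', 'world', 'and', 'practice', 'makes', 'perfect', 'again']
print(insertion_sort(words))
# ['again', 'and', 'hello', 'makes', 'perfect', 'practice', 'world']
```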
There are lots more sorting algorithms to learn about, and they each have different strengths and weaknesses - some are faster than others, some take up more memory, etc. It's fascinating stuff and worth it to learn more about Computer Science topics. But if you're a developer working on a project, unless you have very specific needs, you should probably just use the built-in Python sorting algorithms and move on:
values = ['hello', 'world', 'and', 'practice', 'makes', 'perfect', 'again']
print(f'Original: {values}')
values.sort()
print(f'Sorted: {values}')
## OUTPUT ##
# Original: ['hello', 'world', 'and', 'practice', 'makes', 'perfect', 'again']
# Sorted: ['again', 'and', 'hello', 'makes', 'perfect', 'practice', 'world']

How to replace the items from given dictionary?

I am a newbie in Python. I was playing with dictionaries and wanted to know the solution to the following problem:
list_ = [['any', 'window', 'says', 'window'], ['dog', 'great'], ['after', 'end', 'explains', 'income', '.', '?']]
dictionary = [('dog', 'cat'), ('window', 'any')]

def replace_matched_items(word_list, dictionary):
    int_word = []
    int_wordf = []
    for lst in word_list:
        for ind, item in enumerate(lst):
            for key, value in dictionary:
                if item in key:
                    lst[ind] = key
                else:
                    lst[ind] = "NA"
        int_word.append(lst)
    int_wordf.append(int_word)
    return int_wordf

list_ = replace_matched_items(list_, dictionary)
print(list_)
Output generated is:
[[['NA', 'window', 'NA', 'window'], ['NA', 'NA'], ['NA', 'NA', 'NA', 'NA', 'NA', 'NA']]]
The expected output is:
[[['NA', 'window', 'NA', 'window'], ['dog', 'NA'], ['NA', 'NA', 'NA', 'NA', 'NA', 'NA']]]
I am using python 3
Thanks in advance
Here is a quick introduction to data structures in Python, to clarify your question.
A list is similar to an array: its items are accessed by index, and it is mutable, meaning items within the list can be changed. Lists are written with square brackets [].
For example:
my_array = [4, 8, 16, 32]
print(my_array[0]) # returns 4
print(my_array[3]) # returns 32
my_array[2] = 0
print(my_array) # returns [4, 8, 0, 32]
A tuple is similar to a list; the main difference is that tuples are immutable, meaning items within the tuple cannot be changed. Items can still be accessed by index. Tuples are written with parentheses ().
For example:
my_tuple = ('this', 'is', 'a', 'tuple', 'of', 'strings')
print(my_tuple[0]) # returns 'this'
my_tuple[1] = 'word' # raises a TypeError, as items within tuples cannot be changed.
A dictionary stores data as key-value pairs. Like lists, dictionaries are mutable, but each value is reached through its own unique key rather than a position. To access a value, you pass its key to the dictionary. Dictionaries are written with curly braces {}.
For example:
my_dictionary = {'John':13, 'Bob':31, 'Kelly':24, 'Ryan':17}
print(my_dictionary['Kelly']) # Returns 24
my_dictionary['Kelly'] = 56 # Assigns 56 to Kelly
print(my_dictionary['Kelly']) # Returns 56
Each entry takes the form key: value, and successive key-value pairs are separated by commas.
I would highly suggest reading the official Python documentation on data structures.
To answer the question
In your code, the "dictionary" is actually a list of tuples, with each tuple holding one key-value pair.
Your actual output differs from the expected one because the inner loop keeps iterating through the entire dictionary after a match has been found. Once 'dog' has matched the key 'dog', it is compared against 'window', fails, and is overwritten with "NA". This can be fixed by adding a break statement inside the if-block once a key has been found.
The break statement exits the inner for loop as soon as a key has been found, and the code then continues with the next list item.
Your function will end up looking like:
def replace_matched_items(word_list, dictionary):
    int_word = []
    int_wordf = []
    for lst in word_list:
        for ind, item in enumerate(lst):
            for key, value in dictionary:
                if item in key:
                    lst[ind] = key
                    break
                else:
                    lst[ind] = "NA"
        int_word.append(lst)
    int_wordf.append(int_word)
    return int_wordf
Suggestion: use a dictionary
Using a dictionary for your key-value pairs gives you a direct way to check whether a key exists in it.
If you have a list of words and you'd like to check which dictionary keys appear in that list:
this_list = ['any', 'window', 'says', 'window', 'dog',
             'great', 'after', 'end', 'explains', 'income', '.', '?']
dictionary = {'dog':'cat', 'window':'any'}
matched_list = []
for key in dictionary:
    if key in this_list:
        matched_list.append(key)  # adds keys that are matched
    else:
        pass  # do something if the key is not in the list
print(matched_list)
# Prints ['dog', 'window']
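Applied to the original problem, the dictionary approach could look roughly like the sketch below. keep_known_words is a hypothetical helper, not the asker's function. Note two assumptions that differ from the original code: it builds new lists instead of modifying the input, and it matches whole keys rather than substrings (the original `item in key` is a substring test). It also returns the lists without the extra outer nesting level shown in the question.

```python
def keep_known_words(word_lists, mapping, placeholder="NA"):
    """Return new lists where words that are not keys of `mapping` become `placeholder`."""
    return [
        [word if word in mapping else placeholder for word in words]
        for words in word_lists
    ]

list_ = [['any', 'window', 'says', 'window'], ['dog', 'great'],
         ['after', 'end', 'explains', 'income', '.', '?']]
dictionary = {'dog': 'cat', 'window': 'any'}

print(keep_known_words(list_, dictionary))
# [['NA', 'window', 'NA', 'window'], ['dog', 'NA'], ['NA', 'NA', 'NA', 'NA', 'NA', 'NA']]
```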

Difference between the total number of words (length of a list) and vocabulary of a list or file in NLP?

How to compute the total number of words and vocabulary of a corpus stored as a list in python? What is the major difference between these two terms?
Suppose, I am using the following list. The total number of words or the length of the list can be computed by len(L1). However, I am interested to know how to calculate the vocabulary of the below mentioned list.
L1 = ['newnes', 'imprint', 'elsevier', 'elsevier', 'corporate', 'drive', 'suite',
'burlington', 'usa', 'linacre', 'jordan', 'hill', 'oxford', 'uk',
'elsevier', 'inc', 'right', 'reserved', 'exception', 'newness', 'uk', 'military',
'organization', 'summary', 'task', 'definition', 'system', 'definition',
'system', 'engineering', 'military', 'project', 'military', 'project',
'definition', 'input', 'output', 'operation', 'requirement', 'development',
'overview', 'spacecraft', 'development', 'architecture', 'design']
Is this what you're looking for?
from nltk.stem import WordNetLemmatizer  # needs the WordNet data: nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
list_of_tokens = ['cat', 'dog', 'cats', 'children', 'dog']
unique_tokens = set(list_of_tokens)
# {'cat', 'cats', 'children', 'dog'}
tokens_lemmatized = [lemmatizer.lemmatize(token) for token in unique_tokens]
# e.g. ['child', 'cat', 'cat', 'dog'] (the order follows the set's iteration order)
unique_tokens_lemmatized = set(tokens_lemmatized)
# {'cat', 'child', 'dog'}
print('Input tokens:', len(list_of_tokens), 'Lemmatized tokens:', len(unique_tokens_lemmatized))
# Input tokens: 5 Lemmatized tokens: 3
If your question is regarding how to get the number of unique words in a list, that can be achieved using sets. (From what I remember from NLP, the vocabulary of a corpus should mean the collection of unique words in that corpus.)
Convert your list to a set using the built-in set(), then call len() on the result. In your case, you would get the number of unique words in the list L1 like so:
len(set(L1)) #number of unique words in L1
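As a small illustration of the difference between the two numbers, the sketch below uses a short sample of tokens taken from L1. The variable names are only for the example, and the Counter is an optional extra that shows how often each vocabulary item occurs.

```python
from collections import Counter

tokens = ['elsevier', 'elsevier', 'uk', 'military', 'project',
          'military', 'project', 'uk']

total_words = len(tokens)     # every token counts, repeats included
vocabulary = set(tokens)      # each distinct token counts once
frequencies = Counter(tokens)

print(total_words)              # 8
print(len(vocabulary))          # 4
print(frequencies['elsevier'])  # 2
```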
Edit: You now mentioned that the vocabulary is the set of lemmatized words. In this case, you would do the same thing except import a lemmatizer from NLTK or whatever NLP library you're using, run your list or whatever into the lemmatizer, and convert the output into a set and proceed with the above.
