Difference between defining a variable as a=[] and a='' in Python - python-3.x

I have the two questions below.
1) I am trying to build a word cloud, and to do that I define a variable a=[], but it throws an error. If I define it as a='' instead, it works. What is the difference between them?
2) I am using the two for loops below, but they produce different output, whereas I expect them to produce the same. What is the difference between them?
a)
allwords = []
for i in data['Url']:
    allwords += ' '.join(i)
b)
all_words = ' '.join([text for text in data['Url']])

About item A
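There is an error in your code. When i is a string, ' '.join(i) puts a space between every character of i, and list += string extends the list with each of those characters one by one. To add each whole string as a single element, use the append method, then join the list once at the end:
allwords = []
for i in data['Url']:
    allwords.append(i)
all_words = ' '.join(allwords)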
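Use += to concatenate strings, or the append method to add an item to a list. This is also why a='' seems to work: with a string, += concatenates, so allwords stays one string, whereas with a list, += extends it with the individual characters of each string.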
About item B
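This works because it is a single assignment: ' '.join receives the whole strings from data['Url'] and joins them with one space between each, producing a single string.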
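A small sketch makes the difference visible. Here urls is a hypothetical two-item stand-in for data['Url'], assuming that column holds plain strings:

```python
# Hypothetical stand-in for data['Url']: a column of plain strings
urls = ['ab', 'cd']

# Loop (a) with a list: ' '.join on a string spaces out its characters,
# and list += str extends the list one character at a time
as_list = []
for i in urls:
    as_list += ' '.join(i)

# Loop (a) with a string: += concatenates the spaced-out strings
as_str = ''
for i in urls:
    as_str += ' '.join(i)

# Version (b): joins whole items with one space between them
all_words = ' '.join([text for text in urls])

# Fixed loop: append whole strings, then join once at the end
allwords = []
for i in urls:
    allwords.append(i)
fixed = ' '.join(allwords)

print(as_list)    # ['a', ' ', 'b', 'c', ' ', 'd']
print(as_str)     # a bc d
print(all_words)  # ab cd
print(fixed)      # ab cd
```

Only version (b) and the fixed loop keep each item intact.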

Related

Using the split function to split recursively in Python

I have written a function that takes a keyword and splits a name on that keyword. Since split returns a list, I now want to check each element of the returned list and split it again on a different keyword (if that keyword is present). This time, however, I don't want a nested sublist to be returned; the new pieces should be added to the same flat list.
Code below:
def get_comb_drugs(keyword, name):
    if keyword in name:
        name = name.split(keyword)
    return name
print(get_comb_drugs(", polymer with", "Acetaldehyde, polymer with ammonia and formaldehyde"))
The output I get is:
['Acetaldehyde', ' ammonia and formaldehyde']
However, I want to split ' ammonia and formaldehyde' again on the " and " keyword, and the exact output I want is:
['Acetaldehyde', ' ammonia', 'formaldehyde']
How can I achieve this?
You can use re.split instead with an alternation pattern:
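import re
separators = [', polymer with', ' and ']
# re.escape protects separators that contain regex metacharacters
pattern = '|'.join(map(re.escape, separators))
print(re.split(pattern, 'Acetaldehyde, polymer with ammonia and formaldehyde'))
This prints: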
['Acetaldehyde', ' ammonia', 'formaldehyde']
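If you would rather stick with str.split, as the question asks, a loop can split every piece on each keyword in turn and flatten the results into one list. This is a minimal sketch; split_on_all is a hypothetical helper name:

```python
def split_on_all(name, keywords):
    """Split name on each keyword in turn, keeping a single flat list."""
    parts = [name]
    for keyword in keywords:
        # split every current piece on this keyword and flatten the pieces;
        # a piece without the keyword comes back unchanged as a one-item list
        parts = [piece for part in parts for piece in part.split(keyword)]
    return parts

result = split_on_all("Acetaldehyde, polymer with ammonia and formaldehyde",
                      [", polymer with", " and "])
print(result)  # ['Acetaldehyde', ' ammonia', 'formaldehyde']
```

Unlike the single re.split call, the keywords are applied one after another, which only matters if one keyword can appear inside another.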

Remove & add split-list using dictionary python [duplicate]

I have the code below. I'm trying to remove a pair of strings from the lists predict_strings and test_strings when one of them is found in the other. The issue is that I have to split each of them and check whether a "portion" of one string is inside the other. If there is, I count a match and delete both strings from their lists so they are no longer iterated over.
ValueError: list.remove(x): x not in list
I get the above error, though, and I assume it is because I can't delete the string from test_strings while it is being iterated over. Is there a way around this?
for test_string in test_strings[:]:
    for predict_string in predict_strings[:]:
        split_string = predict_string.split('/')
        for string in split_string:
            if (split_string in test_string):
                no_matches = no_matches + 1
                # Found match so remove both
                test_strings.remove(test_string)
                predict_strings.remove(predict_string)
Example input:
test_strings = ['hello/there', 'what/is/up', 'yo/do/di/doodle', 'ding/dong/darn']
predict_strings =['hello/there/mister', 'interesting/what/that/is']
So I want hello/there and hello/there/mister to match, and both to be removed from their lists before the next comparison.
After one iteration I expect it to be:
test_strings == ['what/is/up', 'yo/do/di/doodle', 'ding/dong/darn']
predict_strings == ['interesting/what/that/is']
After the second iteration I expect it to be:
test_strings == ['yo/do/di/doodle', 'ding/dong/darn']
predict_strings == []
Modifying a list while you're iterating over it is error-prone, and even though you iterate over copies here, nothing stops the same element from being removed twice. Make a set to keep track of your matches, then remove those elements at the end.
Also, your line for string in split_string: isn't really doing anything. You're not using the variable string. Either remove that loop, or change your code so that you're using string.
You can use augmented assignment to increase the value of no_matches.
no_matches = 0
found_in_test = set()
found_in_predict = set()
for test_string in test_strings:
    test_set = set(test_string.split("/"))
    for predict_string in predict_strings:
        split_strings = set(predict_string.split("/"))
        if not split_strings.isdisjoint(test_set):
            no_matches += 1
            found_in_test.add(test_string)
            found_in_predict.add(predict_string)
for element in found_in_test:
    test_strings.remove(element)
for element in found_in_predict:
    predict_strings.remove(element)
From your code, it seems likely that the same test_string matches more than once. The first match removes test_string; the next one tries to remove it again and fails, since it's already gone!
You can try breaking out of the inner for loop if it finds a match, or use any instead.
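import itertools

no_matches = 0
for test_string, predict_string in itertools.product(test_strings[:], predict_strings[:]):
    # skip pairs whose strings were already removed by an earlier match
    if test_string not in test_strings or predict_string not in predict_strings:
        continue
    if any(s in test_string for s in predict_string.split('/')):
        no_matches += 1  # counts matches, so the name no_matches is counter-intuitive
        test_strings.remove(test_string)
        predict_strings.remove(predict_string)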
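Putting the advice together, here is a sketch that reproduces the iteration behaviour described in the question: each predict_string is compared against the test strings that are still unmatched, and a matched pair is removed from both lists so neither can match again. It assumes a "portion" means a whole '/'-separated segment (as in the set-based answer), and remove_matching_pairs is a hypothetical name. It works on copies, so no loop iterates over a list that is being modified.

```python
def remove_matching_pairs(test_strings, predict_strings):
    """Return (remaining_test, remaining_predict, match_count) without mutating the inputs."""
    remaining_test = list(test_strings)
    remaining_predict = list(predict_strings)
    match_count = 0
    for predict_string in predict_strings:  # iterate the original, remove from the copy
        predict_parts = set(predict_string.split('/'))
        # first still-unmatched test string sharing at least one segment, or None
        match = next((t for t in remaining_test
                      if not predict_parts.isdisjoint(set(t.split('/')))), None)
        if match is not None:
            remaining_test.remove(match)
            remaining_predict.remove(predict_string)
            match_count += 1
    return remaining_test, remaining_predict, match_count

test_strings = ['hello/there', 'what/is/up', 'yo/do/di/doodle', 'ding/dong/darn']
predict_strings = ['hello/there/mister', 'interesting/what/that/is']
remaining_test, remaining_predict, match_count = remove_matching_pairs(test_strings, predict_strings)
print(remaining_test)     # ['yo/do/di/doodle', 'ding/dong/darn']
print(remaining_predict)  # []
print(match_count)        # 2
```

Returning new lists instead of mutating the inputs keeps the original data available if you need to inspect what was matched.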

Python3 print selected values of dict

In this simple code that reads a TSV file with many columns:
InColnames = ['Chr','Pos','Ref','Alt']
tsvin = csv.DictReader(fin, delimiter='\t')
for row in tsvin:
    print(', '.join(row[InColnames]))
How can I make the print work ?
The following will do:
for row in tsvin:
    print(', '.join(row[col] for col in InColnames))
You cannot pass a list of keys to a dict's item lookup and get a list of values back; a list is not hashable, so row[InColnames] raises TypeError: unhashable type: 'list'. You have to iterate over the keys and retrieve each one's value individually. The code above does this with a generator expression.
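The standard library's operator.itemgetter performs the same per-key lookup: given several keys, it returns a tuple of the values in key order. A self-contained sketch; the in-memory TSV text and its extra Qual column are made up for illustration and stand in for the real file fin:

```python
import csv
import io
from operator import itemgetter

# Made-up TSV content standing in for the real file; Qual is an extra column we skip
tsv_text = "Chr\tPos\tRef\tAlt\tQual\n1\t12345\tA\tG\t50\n2\t678\tC\tT\t30\n"
InColnames = ['Chr', 'Pos', 'Ref', 'Alt']
get_selected = itemgetter(*InColnames)  # returns a tuple when given 2+ keys

lines = []
with io.StringIO(tsv_text, newline='') as fin:
    for row in csv.DictReader(fin, delimiter='\t'):
        lines.append(', '.join(get_selected(row)))

print('\n'.join(lines))
# 1, 12345, A, G
# 2, 678, C, T
```

One caveat: with a single column name, itemgetter returns the bare value rather than a 1-tuple, so ', '.join would join that string's characters. The generator-expression version does not have this edge case.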

Updating dictionary - Python

total=0
line=input()
line = line.upper()
names = {}
(tag,text) = parseLine(line) #initialize
while tag !="</PLAY>": #test
    if tag =='<SPEAKER>':
        if text not in names:
            names.update({text})
I get this far and then draw a blank. When I run it, I get:
ValueError: dictionary update sequence element #0 has length 8; 2 is required
This is what I'm trying to figure out. The assignment steps, with my progress on each:
Make an empty dictionary (its keys will be the names of speakers and its values will be how many times s/he spoke).
I did this.
Within the if statement that checks whether a tag is <SPEAKER>: if the speaker is not in the dictionary, add him to the dictionary with a value of 1.
I'm pretty sure I did this right.
If he is already in the dictionary, increment his value.
I'm not sure how to do this.
You are close, the big issue is on this line:
names.update({text})
You are trying to make a dictionary entry from a string using {text}, but {text} is a set literal containing one string, not a dictionary. dict.update iterates over it and expects each element to be a (key, value) pair of length 2; the string is treated as a sequence of its characters, and here it is 8 characters long instead of two.
To add a new entry do this instead:
names.update({text:1})
This will set the initial value.
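A quick experiment shows what dict.update does with {text}: the braces build a set holding one string, and update unpacks each element as a (key, value) pair. A 2-character string is therefore silently split into key and value, while a longer one fails. 'ab' and 'ROMEO' are made-up sample values:

```python
names = {}

# {'ab'} is a set, not a dict; its one element 'ab' unpacks into key 'a', value 'b'
names.update({'ab'})
print(names)  # {'a': 'b'}

# a 5-character string cannot unpack into a (key, value) pair
try:
    names.update({'ROMEO'})
except ValueError as err:
    message = str(err)
    print(message)  # dictionary update sequence element #0 has length 5; 2 is required

# a dict literal with key: value is what was intended
names = {}
names.update({'ROMEO': 1})
print(names)  # {'ROMEO': 1}
```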
Now, it seems like this is homework, but you've put in a bit of effort already, so while I won't answer the question I'll give you some broad pointers.
The next step is checking whether a key already exists in the dictionary. Python dictionaries have a get method that retrieves a value by its key. For example:
>>> names = {'romeo': 1}
>>> print(names.get('romeo'))
1
But it returns None if the key doesn't exist:
>>> names = {'romeo': 1}
>>> print(names.get('juliet'))
None
get also takes an optional second argument, a default value to return when the key is missing:
>>> names = {'romeo': 2}
>>> print(names.get('juliet', 1))
1
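Using get with a default of 0 covers both cases, a new speaker and a returning one, in a single line. This is only the counting step, not the parsing loop, and the speaker list is a made-up sample. For comparison, collections.Counter from the standard library does the same counting:

```python
from collections import Counter

speakers = ['ROMEO', 'JULIET', 'ROMEO']  # made-up sample of speaker names

names = {}
for text in speakers:
    # get returns 0 for an unseen speaker, so this both adds and increments
    names[text] = names.get(text, 0) + 1
print(names)  # {'ROMEO': 2, 'JULIET': 1}

# Counter performs the same counting
print(Counter(speakers) == names)  # True
```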
Also note that your loop as it stands will never end, as you only set tag once:
(tag,text) = parseLine(line) #initialize
while tag !="</PLAY>": #test
    # you need to set tag in here
    # and have an escape clause if you run out of file
The rest is left as an exercise for the reader...

How to use struct.pack for a list of strings

I want to write a list of strings to a binary file. Suppose I have a list of strings, mylist. Assume each item of the list has a '\t' at the end, except the last one, which has a '\n' at the end (to help me recover the data later). Example: ['test\t', 'test1\t', 'test2\t', 'testl\n']
For a NumPy ndarray, I found the following script that worked (I got it from here: numpy to r converter):
import struct
binfile = open('myfile.bin','wb')
for i in range(mynpdata.shape[1]):
    binfile.write(struct.pack('%id' % mynpdata.shape[0], *mynpdata[:,i]))
binfile.close()
Does binfile.write automatically parse all the data if the variable has * in front of it (as in *mynpdata[:,i] above)? Would this work the same way with a list of integers (e.g. *myIntList)?
How can I do the same with a list of strings?
I tried it on a single string using this code, which I found online:
oneString = 'test'
oneStringByte = bytes(oneString,'utf-8')
struct.pack('I%ds' % (len(oneString),), len(oneString), oneString)
But I couldn't understand why the % in 'I%ds' above is given (len(oneString),) instead of len(oneString) as in the ndarray example, and also why both len(oneString) and oneString are passed.
Can someone help me write a list of strings (if necessary, to the same binary file where I wrote out the ndarray)?
There's no need for struct. Simply join the strings and encode them using either a specified or an assumed text encoding in order to turn them into bytes.
''.join(mylist).encode('utf-8')
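If you do want a struct-based layout, for example to read the strings back individually from the same binary file, one common scheme is a length prefix per string. This sketch, with hypothetical helper names pack_strings and unpack_strings, also touches the side questions. The * in struct.pack(fmt, *seq) unpacks the sequence into separate positional arguments for struct.pack (binfile.write only ever receives the resulting bytes), so it works the same way for a list of integers given a matching format. 'I%ds' % n and 'I%ds' % (n,) produce the same format string, e.g. 'I4s' for n == 4, because a single value may be given bare or as a 1-tuple. That format has two fields, an unsigned int I for the length and an n-byte s for the data, so pack needs two values, and in Python 3 the s field requires bytes, so each string must be encoded first.

```python
import struct

def pack_strings(strings, encoding='utf-8'):
    """Pack each string as a 4-byte little-endian length followed by its encoded bytes."""
    chunks = []
    for s in strings:
        data = s.encode(encoding)  # the 's' format code needs bytes, not str
        # '<I%ds' % len(data) builds e.g. '<I5s': a length prefix, then the bytes
        chunks.append(struct.pack('<I%ds' % len(data), len(data), data))
    return b''.join(chunks)

def unpack_strings(blob, encoding='utf-8'):
    """Read back the length-prefixed strings written by pack_strings."""
    strings = []
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack_from('<I', blob, offset)
        offset += 4  # '<I' is always 4 bytes in standard-size mode
        (data,) = struct.unpack_from('<%ds' % length, blob, offset)
        offset += length
        strings.append(data.decode(encoding))
    return strings

mylist = ['test\t', 'test1\t', 'test2\t', 'testl\n']
blob = pack_strings(mylist)
print(unpack_strings(blob) == mylist)  # True
```

The '<' prefix fixes the byte order and size of the length field so the file reads back the same on any machine; binfile.write(pack_strings(mylist)) would append the result to an already open binary file.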
