How to add single quote to values in list? - python-3.x

Here is my list
x = ['India,America,Australia,Japan']
How to convert above list into
x = ['India','America','Australia','Japan']
I tried it using strip and split method but it doesn't work.

You can join the list into a single string and then split it on commas:
test = ['India,America,Australia,Japan']
result = "".join(test).split(",")
print(result)
Output is this:
['India', 'America', 'Australia', 'Japan']
Alternatively, you can use the re (regular expression) module:
import re
x = "".join(['India,America,Australia,Japan'])
xText = re.compile(r"\w+")
mo = xText.findall(x)
print(mo)
The findall method returns every run of word characters (\w+) as a list; commas are not word characters, so they are left out. Note that a name containing a space would be split into separate items.
Output is this:
['India', 'America', 'Australia', 'Japan']
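Both answers above assume a one-element list and names without spaces. As a minimal sketch (the helper name split_items and the sample data are made up for illustration), the following splits every element on commas and strips surrounding whitespace, so it also copes with several elements and with multi-word names:

```python
def split_items(values, sep=","):
    """Split every string in `values` on `sep` and flatten the result."""
    # strip() removes stray spaces left around each item after splitting
    return [item.strip() for chunk in values for item in chunk.split(sep)]


x = ['India,America, New Zealand', 'Japan']
print(split_items(x))  # ['India', 'America', 'New Zealand', 'Japan']
```

Splitting on the separator rather than matching \w+ keeps 'New Zealand' as one item.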

Related

How to insert variable length list into string

I have what I think is a basic question in Python:
I have a list that can be variable in length and I need to insert it into a string for later use.
Formatting is simple, I just need a comma between each name up to nameN and parenthesis surrounding the names.
List = ['name1', 'name2' .... 'nameN']
string = "Their Names are <(name1 ... nameN)> and they like candy."
Example:
List = ['tom', 'jerry', 'katie']
print(string)
Their Names are (tom, jerry, katie) and they like candy.
Any ideas on this? Thanks for the help!
# Create a comma-separated string with names
the_names = ', '.join(List) # 'tom, jerry, katie'
# Interpolate it into the "main" string
string = f"Their Names are ({the_names}) and they like candy."
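One caveat with the join call above: str.join raises a TypeError if the list contains anything other than strings. A small sketch, assuming the list may hold non-string items (the function name format_names is hypothetical), converts each item with str first:

```python
def format_names(names):
    """Return the sentence with the names joined by ', ' inside parentheses."""
    joined = ", ".join(map(str, names))  # map(str, ...) tolerates non-string items
    return f"Their Names are ({joined}) and they like candy."


print(format_names(['tom', 'jerry', 'katie']))
# Their Names are (tom, jerry, katie) and they like candy.
```

Because the list is joined before interpolation, the same code works for any list length.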
There are numerous ways to achieve that.
You could use print + format + join, similar to the example from @ForceBru.
Using str.format keeps the code compatible with both Python 2 and Python 3 (f-strings require Python 3.6 or later).
names_list = ['tom', 'jerry', 'katie']
# Convert the list into a string with .join (in this case we are separating with commas)
names_string = ', '.join(names_list)
# names_string == "tom, jerry, katie"
# Now add one string inside the other:
string = "Their Names are ({}) and they like candy.".format(names_string)
print(string)
>> Their Names are (tom, jerry, katie) and they like candy.

Create a string from a list using list comprehension

I am trying to create a string separated by comma from the below given list
['D:\\abc\\pqr\\123\\aaa.xlsx', 'D:\\abc\\pqr\\123\\bbb.xlsx', 'D:\\abc\\pqr\\123\\ccc.xlsx']
New string should contain only the filename like below which is separated by comma
'aaa.xlsx,bbb.xlsx,ccc.xlsx'
I have achieved this using the below code
n = []
for p in input_list:
    l = p.split('\\')
    l = l[len(l)-1]
    n.append(l)
a = ','.join(n)
print(a)
But instead of using multiple lines of code, I would like to achieve this in a single line using a list comprehension or a regular expression.
Thanks in advance...
Simply use a list comprehension:
main_list = ['D:\\abc\\pqr\\123\\aaa.xlsx', 'D:\\abc\\pqr\\123\\bbb.xlsx', 'D:\\abc\\pqr\\123\\ccc.xlsx']
print([x.split("\\")[-1] for x in main_list])
OUTPUT:
['aaa.xlsx', 'bbb.xlsx', 'ccc.xlsx']
If you want the result as a string, simply do:
print(",".join([x.split("\\")[-1] for x in main_list]))
OUTPUT:
aaa.xlsx,bbb.xlsx,ccc.xlsx
Another way to do the same is:
print(",".join(map(lambda x : x.split("\\")[-1],main_list)))
OUTPUT:
aaa.xlsx,bbb.xlsx,ccc.xlsx
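Note that os.path.basename is OS-dependent and may cause problems in cross-platform scripts: on POSIX systems the backslash is not treated as a path separator, so the full string is returned unchanged.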
Using os.path.basename with str.join
Ex:
import os
data = ['D:\\abc\\pqr\\123\\aaa.xlsx', 'D:\\abc\\pqr\\123\\bbb.xlsx', 'D:\\abc\\pqr\\123\\ccc.xlsx']
print(",".join(os.path.basename(i) for i in data))
Output:
aaa.xlsx,bbb.xlsx,ccc.xlsx
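If the script has to run on non-Windows systems as well, one option is pathlib.PureWindowsPath from the standard library, which applies Windows path rules regardless of the host OS. This is a sketch of an alternative, not what the answers above use:

```python
from pathlib import PureWindowsPath

data = ['D:\\abc\\pqr\\123\\aaa.xlsx', 'D:\\abc\\pqr\\123\\bbb.xlsx', 'D:\\abc\\pqr\\123\\ccc.xlsx']

# PureWindowsPath always applies Windows path rules, whatever OS runs the script
names = ",".join(PureWindowsPath(p).name for p in data)
print(names)  # aaa.xlsx,bbb.xlsx,ccc.xlsx
```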

Extract characters within certain symbols

I have extracted text from an HTML file, and have the whole thing in a string.
I am looking for a method to loop through the string, and extract only values that are within square brackets and put strings in a list.
I have looked into several questions, among them this one: Extract character before and after "/"
But I am having a hard time modifying it. Can someone help?
Solved!
Thank you for all your input; I will definitely look more into regex. I managed to do what I wanted in a fairly manual way (it may not be beautiful):
import html2text
# remove all HTML code and append the text to one string
# (html_file is the iterable of HTML strings, defined elsewhere)
html_string = ''
for i in html_file:
    html_string += str(html2text.html2text(i))
# True while the current character is inside [ ]
add = False
clean_string = ''
# extract only values within [ and ], based on add = True/False
for i in html_string:
    if i == '[':
        add = True
    if i == ']':
        add = False
        clean_string += str(i)
    if add:
        clean_string += str(i)
# drop the outermost brackets, then split the string into a list without square brackets
clean_string_list = clean_string.strip('[]').split('][')
The HTML file that I want as plain text (and later as a dataframe) is my personal Facebook data, which I have downloaded.
Try this regex: given a string, it places all text inside [ ] into a list.
import re
print(re.findall(r'\[(\w+)\]','spam[eggs][hello]'))
>>> ['eggs', 'hello']
Also this is a great reference for building your own regex.
https://regex101.com
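Note that \w+ only matches letters, digits and underscores, so a group such as [hello world] or [a-b] would be skipped. A possible variation, shown here as a sketch with made-up sample text, uses a negated character class to accept anything except square brackets:

```python
import re

# [^\[\]]* matches any run of characters that are not square brackets
BRACKETED = re.compile(r'\[([^\[\]]*)\]')

text = 'spam[eggs and ham][hello-world][]'
print(BRACKETED.findall(text))  # ['eggs and ham', 'hello-world', '']
```

Using * instead of + also returns empty groups; switch to + if those should be ignored.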
EDIT: If you have nested square brackets here is a function that will handle that case.
import re
test = 'spam[eg[nested]gs][hello]'
def square_bracket_text(test_text, found):
    """Find text enclosed in square brackets within a string"""
    matches = re.findall(r'\[(\w+)\]', test_text)
    if matches:
        found.extend(matches)
        # remove the groups found so far, then search what is left
        for word in found:
            test_text = test_text.replace('[' + word + ']', '')
        square_bracket_text(test_text, found)
    return found
match = []
print(square_bracket_text(test, match))
>>>['nested', 'hello', 'eggs']
hope it helps!
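For comparison, nested brackets can also be handled in a single pass with a stack of open-bracket positions. This is a sketch of a different technique, not the method of the answer above, and the function name bracket_contents is hypothetical. Unlike the recursive version, it returns an outer group with its nested text (brackets included) left intact:

```python
def bracket_contents(text):
    """Return the text of every [...] group, in the order the groups close."""
    found = []
    stack = []  # indexes of '[' characters that are still open
    for pos, char in enumerate(text):
        if char == '[':
            stack.append(pos)
        elif char == ']' and stack:
            start = stack.pop()  # the most recent unmatched '['
            found.append(text[start + 1:pos])
    return found


print(bracket_contents('spam[eg[nested]gs][hello]'))
# ['nested', 'eg[nested]gs', 'hello']
```

An unmatched ']' is ignored, and an unmatched '[' simply never produces a group.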
You can also use re.finditer() for this, see below example.
Suppose the brackets contain only word characters; the regular expression is then \[\w+\].
If you wish, check it at https://rextester.com/XEMOU85362.
import re
s = "<h1>Hello [Programmer], you are [Excellent]</h1>"
g = re.finditer(r"\[\w+\]", s)  # raw string avoids invalid-escape warnings
l = list() # or, l = []
for m in g:
    text = m.group(0)
    l.append(text[1: -1])  # strip the surrounding brackets
print(l) # ['Programmer', 'Excellent']

Replacing spaces in lists

I'm creating a Google searcher in Python. Is there any way that I can replace a space in a list with a "+" for my URL? This is my code so far:
q=input("Question=")
qlist=list(q)
#print(qlist)
Can I replace any spaces in my list with a plus, and then turn that back into a string?
Just want to add another line of thought here. Try the urllib library for encoding URL query strings.
Here's an example:
import urllib.parse
## Create an empty dictionary to hold values (for questions and answers).
data = dict()
## Sample input
question = 'This is my question'
### Data key can be 'Question'
data['Question='] = question
### We'll pass that dictionary through the urlencode method
url_values = urllib.parse.urlencode(data)
### And print results
print(url_values)
#-------------------------------------------------------------------------------------------------------
#-------------------------------------------------------------------------------------------------------
# Alternatively, you can set up the dictionary a little better if you only have a couple of key-value pairs
## Input
question = 'This is my question'
# Our dictionary; we can set the input value as the value of the Question key
data = {
    'Question=': question
}
print(urllib.parse.urlencode(data))
Output:
'Question%3D=This+is+my+question'
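The %3D in that output is the "=" from the dictionary key being percent-encoded; urlencode inserts the separating "=" itself, so the key normally should not contain one. A minimal sketch follows, in which the base URL and the q parameter name are assumptions for illustration and not taken from the question:

```python
from urllib.parse import quote_plus, urlencode

question = 'This is my question'

# quote_plus encodes a single value: spaces become '+', other unsafe characters become %XX
print(quote_plus(question))  # This+is+my+question

# urlencode builds a whole query string and adds the '=' between key and value
# (the base URL and the 'q' parameter name are assumptions for illustration)
url = 'https://www.google.com/search?' + urlencode({'q': question})
print(url)  # https://www.google.com/search?q=This+is+my+question
```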
You can split the string on spaces and join the pieces with "+" to create one long string.
qlist = my_string.split(" ")  # my_string is the input string (q in the question)
result = "+".join(qlist)
print("Output string: {}".format(result))
Look at the join and split operations in Python.
q = 'dog cat'
list_info = q.split()  # ['dog', 'cat']
https://docs.python.org/3/library/stdtypes.html#str.split
q = ['dog', 'cat']
s_info = ''.join(q)  # 'dogcat'; use '+'.join(q) to get 'dog+cat'
https://docs.python.org/3/library/stdtypes.html#str.join
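The choice between split() and split(" ") matters when the input contains repeated spaces. A short sketch with made-up input shows the difference:

```python
q = 'dog  cat bird'  # note the double space

# split() with no argument splits on any run of whitespace
print('+'.join(q.split()))     # dog+cat+bird

# split(' ') and replace keep one '+' per space, so the double space shows up
print('+'.join(q.split(' ')))  # dog++cat+bird
print(q.replace(' ', '+'))     # dog++cat+bird
```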

I get this error: "expected string or buffer"

file = open("C:\\Users\\file.txt")
text = file.read()
def ie_preprocess(text):
    sent_tokenizer = PunktSentenceTokenizer(text)
    sents = sent_tokenizer.tokenize(text)
    print(sents)
    word_tokenizer = WordPunctTokenizer()
    words = nltk.word_tokenize(sents)
    print(words)
    tagges = nltk.pos_tag(words)
    print(tagges)
ie_preprocess(text)
nltk.word_tokenize() takes in text which is expected to be a string, but you are passing in sents which is a list of sentences.
Instead, you want:
words = nltk.word_tokenize(text)
If you would like to tokenize each sentence into a list of words and get this back as a list of lists, you could use
words = [nltk.word_tokenize(sentence) for sentence in sents]
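The same kind of TypeError can be reproduced without NLTK, since regular-expression functions reject a list in the same way. The sketch below uses a simple regex-based stand-in (simple_word_tokenize is hypothetical and not part of NLTK) purely to show the shape of the problem and of the list-of-lists fix; the exact error text depends on the Python version:

```python
import re


def simple_word_tokenize(sentence):
    """Hypothetical stand-in for nltk.word_tokenize: words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sents = ["Hello there.", "How are you?"]

try:
    simple_word_tokenize(sents)  # passing the whole list fails, as in the question
except TypeError as exc:
    print("TypeError:", exc)

# Tokenizing one sentence at a time gives a list of lists
words = [simple_word_tokenize(sentence) for sentence in sents]
print(words)  # [['Hello', 'there', '.'], ['How', 'are', 'you', '?']]
```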

Resources