Mapping sentences of variable length to a particular key - python-3.x

I'm new to Python as well as to this forum. My question is below.
The input file format is shown in the attached image (File Format).
I'm able to split the text in the Text2 column and write each piece to a separate row with the code below:
import csv
import pandas as pd

sentence = []
myfile = open('Output.csv', 'w')
wr = csv.writer(myfile, lineterminator='\n')
df = pd.read_excel("Input.xlsx")
for txt in df['Text2']:
    sentence.append(txt.split('.'))
for phrase in sentence:
    for words in phrase:
        wr.writerow([words])
I need help mapping the sentences, which are of variable length, to the key. Also, how can I achieve the specific format shown in the attached image?
Also, writerow starts writing in the first column of each row; how can I make it start at column three?
Any help on this is much appreciated!!

Try this:
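import csv
import pandas as pd

df = pd.read_excel("Input.xlsx")  # as in the question; the loop below expects three columns: key, Text1, Text2
myfile = open('Output.csv', 'w')
wr = csv.writer(myfile, lineterminator='\n')
entries = {}
for k, txt1, txt2 in df.values:
    # split Text2 on '.' and drop empty pieces
    sentences = [s.strip() for s in txt2.split('.') if len(s.strip()) > 0]
    # sentences = [s.strip() + '.' for s in txt2.split('.') if len(s.strip()) > 0]
    entries[k] = [txt1, sentences]
for k in entries.keys():
    txt1, txt2 = entries[k]
    # the first sentence goes on the same row as the key and Text1
    wr.writerow([k, txt1, txt2[0]])
    # the remaining sentences start in column three (first two columns left empty)
    for s in txt2[1:]:
        wr.writerow(['', '', s])
myfile.close()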
Use the alternative sentences = ... line (the one commented out in the code above) if you want each sentence in the CSV file to end with a dot. From your example image it is not clear what should happen to the dot (sometimes it appears in the output and sometimes it does not).
Also, if desired, the code can be simplified further by combining the two loops into one:
myfile = open('Output.csv', 'w')
wr = csv.writer(myfile, lineterminator='\n')
for k, txt1, txt2 in df.values:
    sentences = [s.strip() for s in txt2.split('.') if len(s.strip()) > 0]
    wr.writerow([k, txt1, sentences[0]])
    for s in sentences[1:]:
        wr.writerow([None, '', s])
myfile.close()
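For reference, here is a minimal sketch of the same one-sentence-per-row layout using only the standard csv module, so it can be tried without the Excel file. The function name rows_for_csv and the sample records are hypothetical stand-ins for the (key, Text1, Text2) rows of Input.xlsx; like the answer above, it assumes sentences are separated by '.'.

```python
import csv
import io


def rows_for_csv(records):
    """Yield one CSV row per sentence; key and Text1 appear only on the first row."""
    for key, text1, text2 in records:
        sentences = [s.strip() for s in text2.split('.') if s.strip()]
        if not sentences:
            # keep one row for the key even when Text2 has no sentences
            sentences = ['']
        for i, sentence in enumerate(sentences):
            if i == 0:
                yield [key, text1, sentence]
            else:
                # later sentences start in column three
                yield ['', '', sentence]


# Hypothetical sample data standing in for the rows of Input.xlsx
records = [
    ("K1", "first", "One. Two. Three."),
    ("K2", "second", "Alpha beta."),
]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator='\n')
writer.writerows(rows_for_csv(records))
print(buf.getvalue())
```

With the real data you could pass df.values as records and write to a file opened with open('Output.csv', 'w', newline=''), as the csv module documentation recommends. Emitting a row for a key with no sentences is a design choice here; the answer's txt2[0] would raise an IndexError on an empty sentence list.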

Related

Appending line from text file to another text file with python

I am working on some file operations with Python.
I have two text files. The first contains many lines of bigram word-embedding results, such as apple_pie 0.3434 0.6767 0.2312. The second contains many lines of unigram word-embedding results; for apple_pie it has apple 0.2334 0.3412 0.123 pie 0.976 0.75654 0.2312
I want to append the apple and pie unigram results to the apple_pie bigram result, so that the result becomes:
apple_pie 0.3434 0.6767 0.2312 0.2334 0.3412 0.123 0.976 0.75654 0.2312 on one line. Does anybody know how to do this? Thanks...
bigram = open("bigram.txt", 'r')
unigram = open("unigram.txt", 'r')
combine = open("combine.txt", 'w')
bigram_lines = bigram.readlines()
unigram_lines = unigram.readlines()
iteration = 0
# assumes line i of bigram.txt belongs with line i of unigram.txt
while iteration < len(bigram_lines):
    num_list_bigram = []
    text_list_bigram = []
    # split() with no argument also drops the trailing newline
    for item in bigram_lines[iteration].split():
        if "." in item:  # tokens containing a dot are treated as numbers
            num_list_bigram.append(item)
        else:
            text_list_bigram.append(item)
    num_list_unigram = []
    text_list_unigram = []
    for item in unigram_lines[iteration].split():
        if "." in item:
            num_list_unigram.append(item)
        else:
            text_list_unigram.append(item)
    iteration += 1
    # bigram word, then bigram numbers, then the unigram numbers
    com_list = text_list_bigram + num_list_bigram + num_list_unigram
    combine.write(" ".join(com_list))
    combine.write("\n")
bigram.close()
unigram.close()
combine.close()
Hopefully this will work for you
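Here is a slightly more compact sketch of the same idea, under the same assumption that line i of bigram.txt pairs with line i of unigram.txt. combine_line and combine_files are hypothetical helper names, and the sketch swaps the "." check for a float() test to decide which tokens are numbers.

```python
def is_number(token):
    """True if the token parses as a float (note: words like 'nan' or 'inf' also parse)."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def combine_line(bigram_line, unigram_line):
    """Keep the whole bigram line and append only the numbers from the unigram line."""
    bigram_tokens = bigram_line.split()
    unigram_numbers = [t for t in unigram_line.split() if is_number(t)]
    return " ".join(bigram_tokens + unigram_numbers)


def combine_files(bigram_path, unigram_path, out_path):
    """Pair the two files line by line and write the combined lines."""
    with open(bigram_path) as bigram, open(unigram_path) as unigram, \
            open(out_path, "w") as out:
        for b_line, u_line in zip(bigram, unigram):
            out.write(combine_line(b_line, u_line) + "\n")


print(combine_line("apple_pie 0.3434 0.6767 0.2312",
                   "apple 0.2334 0.3412 0.123 pie 0.976 0.75654 0.2312"))
```

Because zip stops at the shorter file, extra lines in either file are silently ignored; check the line counts first if that matters.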

How do I print out results on a separate line after converting them from a set to a string?

I am currently trying to compare two text files to see whether they have any words in common.
The text files are as follows:
ENGLISH.TXT
circle
table
year
competition
FRENCH.TXT
bien
competition
merci
air
table
My current code gets them to print, and I've removed all the unnecessary curly brackets and so on, but I can't get them to print on separate lines.
List = open("english.txt").readlines()
List2 = open("french.txt").readlines()
anb = set(List) & set(List2)
anb = str(anb)
anb = (str(anb)[1:-1])
anb = anb.replace("'","")
anb = anb.replace(",","")
anb = anb.replace('\\n',"")
print(anb)
I expect each result to be printed on its own line.
Currently Happening:
Competition Table
Expected:
Competition
Table
Thanks in advance!
- Xphoon
Hi, I'd suggest trying two things as good practice:
1) Use "with" for opening files:
with open('english.txt', 'r') as englishfile, open('french.txt', 'r') as frenchfile:
    ## your Python operations on the files go here
The post "File read using open() vs with open()" explains very well why to use the "with" statement :)
2) If you're using Python 3.6 or later, try f-strings:
print(f"Hello\nWorld!")
This prints Hello and World! on separate lines, because \n is a newline.
Additionally, if you want to print variables with f-strings, do it like this:
print(f"{variable[index]}\n{variable2[index2]}")
Here is one solution including converting between sets and lists:
with open('english.txt', 'r') as englishfile, open('french.txt', 'r') as frenchfile:
    english_words = englishfile.readlines()
    english_words = [word.strip('\n') for word in english_words]
    french_words = frenchfile.readlines()
    french_words = [word.strip('\n') for word in french_words]
anb = set(english_words) & set(french_words)
anb_list = [item for item in anb]
for item in anb_list:
    print(item)
Here is another solution by keeping the words in lists:
with open('english.txt', 'r') as englishfile, open('french.txt', 'r') as frenchfile:
    english_words = englishfile.readlines()
    english_words = [word.strip('\n') for word in english_words]
    french_words = frenchfile.readlines()
    french_words = [word.strip('\n') for word in french_words]
for english_word in english_words:
    for french_word in french_words:
        if english_word == french_word:
            print(english_word)
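Both answers print one word per line because each word gets its own print() call. As a sketch, joining the sorted intersection with "\n" does the same in a single call; common_words is a hypothetical helper, and sorting is added only because sets have no guaranteed order.

```python
def common_words(lines_a, lines_b):
    """Return the words that appear in both inputs, sorted alphabetically."""
    # strip() removes the trailing newline (and any '\r' or spaces); blank lines are skipped
    words_a = {line.strip() for line in lines_a if line.strip()}
    words_b = {line.strip() for line in lines_b if line.strip()}
    return sorted(words_a & words_b)


# Sample lines matching ENGLISH.TXT and FRENCH.TXT from the question
english = ["circle\n", "table\n", "year\n", "competition\n"]
french = ["bien\n", "competition\n", "merci\n", "air\n", "table\n"]

common = common_words(english, french)
print("\n".join(common))  # one word per line
```

With the real files you can pass the open file objects directly, since iterating over a file yields its lines, e.g. common_words(englishfile, frenchfile) inside the with block.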

Expected str instance, int found. How do I change an int to str to make this code work?

I'm trying to write code that analyses a sentence that contains multiple words and no punctuation. I need it to identify the individual words in the entered sentence and store them in a list. My example sentence is 'ask not what your country can do for you ask what you can do for your country'. I then need the original position of each word to be written to a text file. This is my current code, with parts taken from other questions I've found, but I just can't get it to work:
myFile = open("cat2numbers.txt", "wt")
list = [] # An empty list
sentence = "" # Sentence is equal to the sentence that will be entered
print("Writing to the file: ", myFile) # Telling the user what file they will be writing to
sentence = input("Please enter a sentence without punctuation ") # Asking the user to enter a sentence
sentence = sentence.lower() # Turns everything entered into lower case
words = sentence.split() # Splitting the sentence into single words
positions = [words.index(word) + 1 for word in words]
for i in range(1,9):
    s = repr(i)
    print("The positions are being written to the file")
    d = ', '.join(positions)
    myFile.write(positions) # write the places to myFile
    myFile.write("\n")
    myFile.close() # closes myFile
print("The positions are now in the file")
The error I've been getting is TypeError: sequence item 0: expected str instance, int found. Could someone please help me? It would be much appreciated.
The error comes from .join: str.join only accepts strings, but positions is a list of ints.
So the simple fix would be:
d = ", ".join(map(str, positions))
which applies the str function to every element of the positions list, turning them into strings before joining.
That won't solve all your problems, though. You have put the writing inside a for loop, and you .close the file at the end of each iteration. On the next iteration you'll get an error for attempting to write to a file that has been closed. Also, myFile.write(positions) passes the list itself, but write expects a string; write the joined string d instead.
There are other issues: list = [] is unnecessary, and the name list should be avoided because it shadows the built-in list type; initializing sentence = "" is unnecessary too, since input() assigns it anyway. Additionally, if you want to ask for 8 sentences (which is what range(1, 9) loops over), start the loop before reading and processing each sentence.
All in all, try something like this:
with open("cat2numbers.txt", "wt") as f:
    print("Writing to the file: ", f.name) # Telling the user what file they will be writing to
    for i in range(8):
        sentence = input("Please enter a sentence without punctuation ").lower() # Asking the user to enter a sentence
        words = sentence.split() # Splitting the sentence into single words
        positions = [words.index(word) + 1 for word in words]
        f.write(", ".join(map(str, positions))) # write the positions to the file
        f.write("\n")
print("The positions are now in the file")
This uses the with statement, which closes the file for you automatically when the block ends.
As I see it, in the for loop you write to the file, then close it, and then WRITE TO THE CLOSED FILE again. Couldn't this be the problem?
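The fix can also be pulled into small functions so it can be tested without input(). This is a minimal sketch: positions_line and write_positions are hypothetical names, and it swaps words.index (which rescans the list for every word) for a dict of first positions, which gives the same numbers.

```python
def positions_line(sentence):
    """Return the 1-based first-occurrence position of each word, joined as '1, 2, ...'."""
    first_pos = {}
    positions = []
    for i, word in enumerate(sentence.lower().split(), start=1):
        first_pos.setdefault(word, i)  # keep only the first position seen for each word
        positions.append(first_pos[word])
    # str.join only accepts strings, so convert the ints first
    return ", ".join(map(str, positions))


def write_positions(sentences, path):
    """Write one line of positions per sentence to the file at path."""
    with open(path, "wt") as f:
        for sentence in sentences:
            f.write(positions_line(sentence) + "\n")


print(positions_line("ask not what your country can do for you ask what you can do for your country"))
```

In the interactive version, you would collect the eight input() results into a list and pass it to write_positions together with "cat2numbers.txt".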

This code's output needs to be sent to a separate text file; why doesn't it work?

I have got this piece of code to find the position of the first occurrence of each word and replace each word with that position.
I have tried this:
sentence = "ask not what you can do for your country ask what your country can do for you"
listsentence = sentence.split(" ")
d = {}
i = 0
values = []
for i, word in enumerate(sentence.split(" ")):
    if not word in d:
        d[word] = (i + 1)
    values += [d[word]]
print(values)
example = open('example.txt', 'wt')
example.write(str(values))
example.close()
How do I write this output to a separate text file, such as one I can open in Notepad?
Actually, your code works: example.txt is created each time you run this program. You can check that the file exists in the directory you run the script from (the current working directory).
If you want to open it in Notepad (on Windows) right after closing it in your script, add:
import os
os.system("notepad example.txt")
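If the file seems to be missing, a common reason is that a relative path like 'example.txt' is resolved against the current working directory, which is not necessarily the folder the script lives in (for example when it is run from an IDE). This sketch reuses the question's sentence and numbering logic, writes the file with a with block, and prints its absolute path so you can find it; nothing here opens Notepad.

```python
import os

sentence = "ask not what you can do for your country ask what your country can do for you"

# Same first-occurrence numbering as the question's loop
d = {}
values = []
for i, word in enumerate(sentence.split(" ")):
    if word not in d:
        d[word] = i + 1
    values.append(d[word])

out_path = "example.txt"  # relative path: resolved against the current working directory
with open(out_path, "w") as example:
    example.write(str(values))

# Show exactly where the file ended up, then read it back to confirm the contents
print("Current working directory:", os.getcwd())
print("File written to:", os.path.abspath(out_path))
with open(out_path) as example:
    print("Contents:", example.read())
```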

How to delete/discard certain words from strings in a text file?

I have a text file with these contents:
GULBERG (source)
MODEL TOWN (destination)
I want to extract GULBERG and MODEL TOWN (or any other names that appear here) and discard (source) and (destination) from the file using MATLAB. Further, I'm saving these names to the variables STRING{1} and STRING{2} for later use.
But the problem I'm facing is that my code extracts only "GULBERG" and "MODEL" from the file, not the second word, i.e. "TOWN".
My output so far:
GULBERG
MODEL
How can I fix this so that I also get the word TOWN in the output?
Here's my code:
fid = fopen('myfile.txt');
thisline = fgets(fid);
a=char(40); %character code for parenthesis (
b=char(41); %character code for parenthesis )
STRING=cell(2,1);
ii=1;
while ischar(thisline)
    STRING{ii}=sscanf(thisline,['%s' a b 'source' 'destination']);
    ii=ii+1;
    thisline = fgets(fid);
end
fclose(fid);
% STRING{1} contains first name
% STRING{2} contains second name
Assuming that the identifiers (source) and (destination) always appear at the end of the lines, after the town names that are to be extracted, see if this works for you:
%%// input_filepath and output_filepath are filepaths
%%// of input and output text files
str1 = importdata(input_filepath)
split1 = regexp(str1,'\s','Split')
%%// Mark the rows that contain (source) or (destination)
ind1 = ~cellfun(@isempty,(regexp(str1,'(source)'))) | ...
    ~cellfun(@isempty,(regexp(str1,'(destination)')));
str1 = strrep(str1,' (source)','')
str1 = strrep(str1,' (destination)','')
STRING = str1(ind1,:)
%%// Save as a text file
fid = fopen(output_filepath,'w');
for k = 1:size(STRING,1)
    fprintf(fid,'%s\n',STRING{k,:});
end
fclose(fid);
While I was waiting for an answer, I did more digging myself and found the solution to my problem. It looks like using strrep() to replace the unwanted words with '' solved it.
I'm sharing this so anyone with a similar problem might find this helpful!
Here's what I did:
fid = fopen('myfile.txt');
thisline = fgets(fid);
a=char(40);
b=char(41);
STRING=cell(2,1);
index=1;
while ischar(thisline)
    STRING{index} = strrep(thisline,'(source)','');
    index=index+1;
    thisline = fgets(fid);
end
STRING{2} = strrep(STRING{2},'(destination)','');
fclose(fid);
