Comparing strings in python 2.7 - string

This is my code:
for films in filmlist:
with codecs.open('peliculas.txt', encoding='utf8', mode='r') as lfile:
filmsDone = lfile.read()
filmsDoneList = filmsDone.split(',')
if films not in filmsDoneList:
with codecs.open('peliculas.txt', encoding='utf8', mode='a+') as lfile:
lfile.write(films.strip() + ',')
It will never recognize the last item of the list.
I have printed filmsDoneList and the last item in PyCharm looks like this: u'X Men.Primera Generacion'. I have printed films and they looks like this: X Men.Primera Generacion'
So I have no idea where is the problem. Thanks in advance.

#Rafa, for you to better understand what I meant in the comments, I had to write an entire answer in order for me to attach codes and screenshots.
Let's say the peliculas.txt file has the following format:
You can import such file in python according the following 3 commands:
fileIN=open('peliculas.txt','r')
filmsDoneList=fileIN.readlines()
fileIN.close()
So you basically open the file, import each line thanks to readlines() and then close the file because its contents are available in filmsDoneList. The latter has the following contents (in PyCharm):
Obviously this list is quite long and does not fit in my screen, but you get the point.
You can now get rid of that annoying newline tag '\r\n' by means of the following loop:
for id in range(len(filmsDoneList)):
filmsDoneList[id]=filmsDoneList[id].strip()
and now filmsDoneList has the form:
much better now, innit?
Now, let's say you want to add the following films:
newFilms=['The Exorcist','Back to the Future','Aliens','Back to the Future']
To make your code more robust, I have added Back to the Future twice. Basically you can get rid of duplicates in newFilms by means of the set() function. This will convert newFilms in a set with duplicates removed, but we will convert it back to a list thanks to this command:
newFilms=list(set(newFilms))
and now newFilms has the form:
Now that everything has been sorted, it's time to check if items in newFilms already are in filmsDoneList which, recall, is the contents of peliculas.txt.
Reopen peliculas.txt as follows:
fileOUT=open('peliculas.txt','a')
the 'a' tag means "append", so basically everything you write will be added to the file without removing anything from it.
And the main loop goes:
for film in newFilms:
if film in filmsDoneList:
pass
else:
fileOUT.write(film+'\n')
the pass means "do nothing". The write commands also appends the newline tag to the movie title: this will keep the previous format of 1 title per line. At the end of this loop you might as well close fileOUT.
The resulting peliculas.txt is
and, as you can see, Back to the Future was in newFilms but wasn't appended to the end of this file because already was in it. As instead, The Exorcist and Aliens have been appended to this file, at the bottom.
If your file has titles separated by commas, this approach is still valid. However you must add
filmsDoneList=filmsDoneList[0].split(',')
after the first for loop. Also in the write function (in the last for loop) you might want to replace the newline value with a comma.
This approach is cleaner, I reckon will also fix the problem you've been having and avoids continuous open/close files in a loop. Hope this helps!

Related

I want to join two strings, but separate the two strings by a space

I want to process files quickly through a program called star, but I have many files and want to pre-format the input from my files to save time. The format required is-
sample1read1.fq, sample2read1.fq \space\ sample1read2.fq, sample2read2.fq
[EDIT]: My files look like this:
trimmed_Sample_RX.fq where X can either be 1 or 2.
Star wants me to load all of the R1's together, separated by a comma, then a space and then all of my R2's together separated by a comma. To tackle this problem I have attempted to use the join command in python:
def identifier(x):
return(x[-10:])
read1= list(sorted(fnmatch.filter(os.listdir(PATH),'*_R1.fq'), key= identifier))
read1.append(' ')
read2= list(sorted(fnmatch.filter(os.listdir(PATH),'*_R2.fq'), key= identifier))
first_half= ','.join(read1)
second_half= ','.join(read2)
star_input= first_half + second_half
print(star_input)= 'trimmed_sample1R1.fq,trimmed_sample2R1.fq, , trimmed_sample1R2.fq,trimmed_sample2R2.fq'
I attempt to add a space to the end of my file list read1. Then I turn everything into a string and attempt to join the two strings together, but that space I added into my first half pops up in the concatenation as a comma
'trimmed_sample1R1.fq,trimmed_sample2R1.fq, , trimmed_sample1R2.f,trimmed_sample2R2.fq'
If I remove the step where I append a blank space and then concatenate the two strings I get the following
'trimmed_sample1R1.fq,trimmed_sample2R1.fqtrimmed_sample1R2.fq,trimmed_sample2R2.fq'
So now the comma is gone, but I also lose the space.
Thanks.
I think I found a workaround, but if there is a better way to do this please point it out to me. Basically I kept my code the same, but I change the last step. Now the flow looks like this:
first_half=','.join(read1)
second_half=','.join(read2)
star_input= '{} {}'.format(first_half, second_half)
print(star_input)
'trimmed_sample1R1.fq,trimmed_sample2R1 trimmed_sample1R2.fq,trimmedsample2R2.fq'

Python-docx: Editing a pre-existing Run

import docx
doc = docx.Document('CLT.docx')
test = doc.paragraphs[12].runs[2].text
print(test)
doc.save(input('Name of docx file? Make sure to add file extension '))
I've trying to figure out some way to add/edit text to a pre-existing run using python-docx. I've tried test.clear() just to see if I can remove it, but that doesn't seem to work. Additionally, I tried test.add_run('test') and that didn't work either. I know how to add a new run but it will only add it at the end of the paragraph which doesn't help me much. Currently, 'print' will output the text i'd like to alter within the document, "TERMOFINTERNSHIP". Is there something i'm missing?
The text of a run can be edited in its entirety. So to replace "ac" with "abc" you just do something like this:
>>> run.text
"ac"
>>> run.text = "abc"
>>> run.text
"abc"
You cannot simply insert characters at some location; you need to extract the text, edit that str value using Python str methods, and replace it entirely. In a way of thinking, the "editing" is done outside python-docx and you're simply using python-docx for the "before" and "after" versions.
But note that while this is quite true, it's not likely to benefit you much in the general case because runs break at seemingly random locations in a line. So there is no guarantee your search string will occur within a single run. You will need an algorithm that locates all the runs containing any part of the search string, and then allocate your edits accordingly across those runs.
An empty run is valid, so run.text == "" may be a help when there are extra bits in the middle somewhere. Also note that runs can be formatted differently, so if part of your search string is bold and part not, for example, your results may be different than you might want.

need guidance with basic function creation in MATLAB

I have to write a MATLAB function with the following description:
function counts = letterStatistics(filename, allowedChar, N)
This function is supposed to open a text file specified by filename and read its entire contents. The contents will be parsed such that any character that isn’t in allowedChar is removed. Finally it will return a count of all N-symbol combinations in the parsed text. This function should be stored in a file name “letterStatistics.m” and I made a list of some commands and things of how the function should be organized according to my professors' lecture notes:
Begin the function by setting the default value of N to 1 in case:
a. The user specifies a 0 or negative value of N.
b. The user doesn’t pass the argument N into the function, i.e., counts = letterStatistics(filename, allowedChar)
Using the fopen function, open the file filename for reading in text mode.
Using the function fscanf, read in all the contents of the opened file into a string variable.
I know there exists a MATLAB function to turn all letters in a string to lower case. Since my analysis will disregard case, I have to use this function on the string of text.
Parse this string variable as follows (use logical indexing or regular expressions – do not use for loops):
a. We want to remove all newline characters without this occurring:
e.g.
In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since.
In my younger and more vulnerableyears my father gave me some advicethat I’ve been turning over in my mindever since.
Replace all newline characters (special character \n) with a single space: ' '.
b. We will treat hyphenated words as two separate words, hence do the same for hyphens '-'.
c. Remove any character that is not in allowedChar. Hint: use regexprep with an empty string '' as an argument for replace.
d. Any sequence of two or more blank spaces should be replaced by a single blank space.
Use the provided permsRep function, to create a matrix of all possible N-symbol combinations of the symbols in allowedChar.
Using the strfind function, count all the N-symbol combinations in the parsed text into an array counts. Do not loop through each character in your parsed text as you would in a C program.
Close the opened file using fclose.
HERE IS MY QUESTION: so as you can see i have made this list of what the function is, what it should do, and using which commands (fclose etc.). the trouble is that I'm aware that closing the file involves use of 'fclose' but other than that I'm not sure how to execute #8. Same goes for the whole function creation. I have a vague idea of how to create a function using what commands but I'm unable to produce the actual code.. how should I begin? Any guidance/hints would seriously be appreciated because I'm having programmers' block and am unable to start!
I think that you are new to matlab, so the documentation may be complicated. The root of the problem is the basic understanding of file I/O (input/output) I guess. So the thing is that when you open the file using fopen, matlab returns a pointer to that file, which is generally called a file ID. When you call fclose you want matlab to understand that you want to close that file. So what you have to do is to use fclose with the correct file ID.
fid = open('test.txt');
fprintf(fid,'This is a test.\n');
fclose(fid);
fid = 0; % Optional, this will make it clear that the file is not open,
% but it is not necessary since matlab will send a not open message anyway
Regarding the function creation the syntax is something like this:
function out = myFcn(x,y)
z = x*y;
fprintf('z=%.0f\n',z); % Print value of z in the command window
out = z>0;
This is a function that checks if two numbers are positive and returns true they are. If not it returns false. This may not be the best way to do this test, but it works as example I guess.
Please comment if this is not what you want to know.

Same for loop, giving out two different results using .write()

this is my first time asking a question so let me know if I am doing something wrong (post wise)
I am trying to create a function that writes into a .txt but i seem to get two very different results between calling it from within a module, and writing the same loop in the shell directly. The code is as follows:
def function(para1, para2): #para1 is a string that i am searching for within para2. para2 is a list of strings
with open("str" + para1 +".txt", 'a'. encoding = 'utf-8') as file:
#opens a file with certain naming convention
n = 0
for word in para2:
if word == para1:
file.write(para2[n-1]+'\n')
print(para2[n-1]) #intentionally included as part of debugging
n+=1
function("targetstr". targettext)
#target str is the phrase I am looking for, targettext is the tokenized text I am
#looking through. this is in the form of a list of strings, that is the output of
#another function, and has already been 'declared' as a variable
when I define this function in the shell, I get the correct words appearing. However, when i call this same function through a module(in the shell), nothing appears in the shell, and the text file shows a bunch of numbers (eg: 's93161), and no new lines.
I have even gone to the extent of including a print statement right after declaration of the function in the module, and commented everything but the print statement, and yet nothing appears in the shell when I call it. However, the numbers still appear in the text file.
I am guessing that there is a problem with how I have defined the parameters or how i cam inputting the parameters when I call the function.
As a reference, here is the desired output:
‘She
Ashley
there
Kitty
Coates
‘Let
let
that
PS: Sorry if this is not very clear as I have very limited knowledge on speaking python
I have found the solution to issue. Turns out that I need to close the shell and restart everything before the compiler recognizes the changes made to the function in the module. Thanks to those who took a look at the issue, and those who tried to help.

Reading all but last few lines of a data file

I can easily skip the header of a data file using getline, but then when I parse through the data file and get to the footer of the file, I end up stuck in a loop because the program is trying to parse columns of data that no longer exist. Is there an easy way to stop reading when there is no longer data in the line? It looks like there is a blank line followed by some footer information, but I cannot guarantee that all of my data files will look like that (i.e. I need something pretty generic).
Looking at your existing code (edit your question and put it there, not in a comment), I see you have nested loops. But what you really want is one loop with two reasons to exit.
while ((q < 16) && (liness >> temp)) { ... }
Read the line into a string, parse if only if you see \n at the end.

Resources