This is my first time asking a question, so let me know if I am doing something wrong (post-wise).
I am trying to create a function that writes to a .txt file, but I get two very different results depending on whether I call it from a module or write the same loop directly in the shell. The code is as follows:
def function(para1, para2):
    # para1 is the string I am searching for within para2; para2 is a list of strings
    # open (or create) a file with a certain naming convention, in append mode
    with open("str" + para1 + ".txt", 'a', encoding='utf-8') as file:
        n = 0
        for word in para2:
            if word == para1:
                file.write(para2[n-1] + '\n')
                print(para2[n-1])  # intentionally included as part of debugging
            n += 1

function("targetstr", targettext)
# "targetstr" is the phrase I am looking for; targettext is the tokenized text I am
# looking through. It is a list of strings produced by another function, and it has
# already been assigned to a variable.
When I define this function directly in the shell, the correct words appear. However, when I call the same function from a module (imported into the shell), nothing is printed in the shell, and the text file contains a bunch of numbers (e.g. 's93161) with no newlines.
I have even gone as far as adding a print statement right after the function definition line in the module and commenting out everything except that print statement, yet nothing appears in the shell when I call it. However, the numbers still appear in the text file.
I am guessing that there is a problem with how I have defined the parameters or how I am passing the arguments when I call the function.
As a reference, here is the desired output:
‘She
Ashley
there
Kitty
Coates
‘Let
let
that
PS: Sorry if this is not very clear; I have very limited knowledge of Python.
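One detail in the loop above: when the first element of para2 matches, n is 0 and para2[n-1] is para2[-1], the last word of the list, not a preceding word. A minimal sketch that avoids this by pairing each token with the one before it is shown below; the function names and the sample target "said" are hypothetical, since the real target string is not given.

```python
def words_before(target, tokens):
    """Return every token that immediately precedes an occurrence of target."""
    # zip(tokens, tokens[1:]) yields (previous, current) pairs, so a match at
    # index 0, which has no predecessor, is skipped instead of wrapping around
    # to tokens[-1] as tokens[n-1] does when n == 0.
    return [prev for prev, word in zip(tokens, tokens[1:]) if word == target]


def write_words_before(target, tokens):
    """Append the preceding words to "str<target>.txt", one per line."""
    found = words_before(target, tokens)
    with open("str" + target + ".txt", "a", encoding="utf-8") as file:
        for word in found:
            file.write(word + "\n")
    return found


# Hypothetical tokens; "said" stands in for the real target string.
tokens = ["said", "‘She", "said", "Ashley", "said"]
print(words_before("said", tokens))  # ['‘She', 'Ashley']
```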
I have found the solution to the issue. It turns out that I needed to close the shell and restart everything before Python would pick up the changes I made to the function in the module. Thanks to those who took a look at the issue and to those who tried to help.
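Restarting works because Python runs a module's code only the first time it is imported in a session; later `import` statements reuse the cached copy in `sys.modules`, so edits to the file are ignored. `importlib.reload()` re-executes the edited file without restarting the shell. The sketch below demonstrates this with a throwaway module in a temporary directory (the module name `reload_demo` is hypothetical). One caveat: names bound earlier with `from module import function` still point to the old function object after a reload, so re-run that import or call it as `module.function`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Don't write .pyc files, so reload() always recompiles from the edited source.
sys.dont_write_bytecode = True

# Create a throwaway module on disk ("reload_demo" is a hypothetical name).
tmpdir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmpdir))
source = tmpdir / "reload_demo.py"
source.write_text("def greet():\n    return 'version one'\n", encoding="utf-8")

importlib.invalidate_caches()
reload_demo = importlib.import_module("reload_demo")
print(reload_demo.greet())  # version one

# Simulate editing the module in your editor while the shell stays open.
source.write_text("def greet():\n    return 'edited version two'\n", encoding="utf-8")

# Importing again just returns the cached module object from sys.modules...
assert importlib.import_module("reload_demo") is reload_demo
# ...but reload() re-executes the edited source in that same module object.
reload_demo = importlib.reload(reload_demo)
print(reload_demo.greet())  # edited version two
```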
I'm writing a Python script to scrape another website, and I have run into a problem that I have yet to figure out how to solve.
Say I want to replace a particular string with something else:
word_replace_1 = 'dv'
namelist = soup.title.string.replace(word_replace_1,'11dv')
The script works fine when the titles are dv234, dv123, etc.
The output is 11dv234, 11dv123.
However, if the titles are dv234 mixed with dvab123, the script also turns dvab123 into 11dvab123, even though I never set dvab to be replaced. What should I do here?
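For the first part, one option is a regular expression that replaces dv only when a digit follows it. The sketch below uses Python's `re.sub` with a lookahead; the helper name `prefix_dv_codes` is hypothetical, and in the scraper it would be called as `prefix_dv_codes(soup.title.string)`.

```python
import re


def prefix_dv_codes(title):
    # \b: "dv" must start a word.
    # (?=\d): lookahead that requires a digit next without consuming it,
    # so "dvab123" does not match and is left untouched.
    return re.sub(r"\bdv(?=\d)", "11dv", title)


for title in ["dv234", "dv123", "dvab123"]:
    print(prefix_dv_codes(title))
# 11dv234
# 11dv123
# dvab123
```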
Also, if the title is a combination of letters, numbers and Korean characters, say DAV123ㄱㄴㄷ,
how should I make it output only DAV123, and add - between the letters and the numbers?
The question "Python - making a function that would add "-" between letters" gives me the idea of adding - between all characters, but is there a way to add - only between a letter and a number?
The only way I can think of at the moment is to build a table of replacements, for example something like this:
word_replace_3 = 'a1'
word_replace_4 = 'a2'
.......
and then chain them like this:
namelist3 = soup.title.string.replace(word_replace_3,'a-1').replace(word_replace_4,'a-2')
This is just slow and not efficient. What would be the best method to resolve this?
Thanks.
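Instead of a table of every letter/digit pair, two regular expressions can cover the second part. The first regex keeps only ASCII letters and digits, which drops the Korean characters. The second inserts - at every zero-width position between a letter and a digit, in either order. This is a sketch that assumes titles use ASCII letters and digits; the helper name `clean_title` is hypothetical. If you only want DAV123 without the hyphen, drop the second `re.sub`.

```python
import re


def clean_title(title):
    # Keep only ASCII letters and digits (Korean and any other characters are dropped).
    ascii_only = re.sub(r"[^A-Za-z0-9]", "", title)
    # Insert "-" at each empty position where a letter meets a digit (either order);
    # the lookbehind/lookahead pair matches a position, so no characters are consumed.
    return re.sub(r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", "-", ascii_only)


print(clean_title("DAV123ㄱㄴㄷ"))  # DAV-123
print(clean_title("a1b2"))         # a-1-b-2
```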
import docx
doc = docx.Document('CLT.docx')
test = doc.paragraphs[12].runs[2].text
print(test)
doc.save(input('Name of docx file? Make sure to add file extension '))
I've been trying to figure out a way to add or edit text in a pre-existing run using python-docx. I tried test.clear() just to see if I could remove the text, but that doesn't seem to work. I also tried test.add_run('test'), and that didn't work either. I know how to add a new run, but it only gets added at the end of the paragraph, which doesn't help me much. Currently, print outputs the text I'd like to alter within the document, "TERMOFINTERNSHIP". Is there something I'm missing?
The text of a run can be edited in its entirety. So to replace "ac" with "abc" you just do something like this:
>>> run.text
'ac'
>>> run.text = "abc"
>>> run.text
'abc'
You cannot simply insert characters at some location; you need to extract the text, edit that str value using Python str methods, and then replace it entirely. In a sense, the "editing" is done outside python-docx, and you're simply using python-docx for the "before" and "after" versions.
But note that while this is quite true, it's not likely to benefit you much in the general case because runs break at seemingly random locations in a line. So there is no guarantee your search string will occur within a single run. You will need an algorithm that locates all the runs containing any part of the search string, and then allocate your edits accordingly across those runs.
An empty run is valid, so setting run.text = "" may help when there are extra bits in the middle somewhere. Also note that runs can be formatted differently, so if part of your search string is bold and part is not, for example, the result may not be what you want.
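The algorithm described above (locate every run that contains part of the search string, then spread the edit across them) can be sketched in plain Python on a list of run texts. The sketch writes the replacement into the run where the match starts and removes the matched characters from any later runs it spills into, leaving text outside the match in its original run. The function name `replace_across_runs` is hypothetical, only the first occurrence is replaced, and the three-way split of "TERMOFINTERNSHIP" is an illustrative assumption.

```python
def replace_across_runs(run_texts, old, new):
    """Replace the first occurrence of `old` in the joined run texts.

    run_texts is a list of strings, one per run. Returns a new list of the
    same length; text outside the match stays in its original run.
    """
    if not old:
        return list(run_texts)
    start = "".join(run_texts).find(old)
    if start == -1:
        return list(run_texts)
    end = start + len(old)

    result = []
    pos = 0  # offset of the current run within the joined text
    for text in run_texts:
        run_start, run_end = pos, pos + len(text)
        pos = run_end
        if run_end <= start or run_start >= end:
            result.append(text)  # this run does not overlap the match
            continue
        before = text[:max(0, start - run_start)]  # part of the run before the match
        after = text[end - run_start:]             # part of the run after the match
        inserted = new if run_start <= start < run_end else ""
        result.append(before + inserted + after)
    return result


# Hypothetical example: a search string split over three runs.
runs = ["TERMOF", "INTERN", "SHIP"]
print(replace_across_runs(runs, "TERMOFINTERNSHIP", "TERM OF INTERNSHIP"))
# ['TERM OF INTERNSHIP', '', '']
```

With python-docx you would build the input with `[r.text for r in paragraph.runs]` and then write each result back with `run.text = ...` for the matching run. The emptied runs remain as valid empty runs.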
This is my code:
import codecs

for films in filmlist:
    with codecs.open('peliculas.txt', encoding='utf8', mode='r') as lfile:
        filmsDone = lfile.read()
    filmsDoneList = filmsDone.split(',')
    if films not in filmsDoneList:
        with codecs.open('peliculas.txt', encoding='utf8', mode='a+') as lfile:
            lfile.write(films.strip() + ',')
It will never recognize the last item of the list.
I have printed filmsDoneList, and in PyCharm its last item looks like this: u'X Men.Primera Generacion'. I have printed films, and it looks like this: X Men.Primera Generacion'
So I have no idea where the problem is. Thanks in advance.
@Rafa, to help you better understand what I meant in the comments, I had to write an entire answer so that I could attach code and screenshots.
Let's say the peliculas.txt file has one title per line.
You can read such a file into Python with the following three commands:
fileIN=open('peliculas.txt','r')
filmsDoneList=fileIN.readlines()
fileIN.close()
So you basically open the file, read every line with readlines(), and then close the file, since its contents are now available in filmsDoneList. Each element of that list is one title, still followed by its line ending.
You can now get rid of that annoying newline '\r\n' with the following loop:
for id in range(len(filmsDoneList)):
    filmsDoneList[id] = filmsDoneList[id].strip()
and now each element of filmsDoneList is just the bare title.
Much better now, innit?
Now, let's say you want to add the following films:
newFilms=['The Exorcist','Back to the Future','Aliens','Back to the Future']
To make the example more robust, I have added Back to the Future twice. You can get rid of duplicates in newFilms with the set() function, which converts newFilms into a set with duplicates removed; we then convert it back to a list with this command:
newFilms=list(set(newFilms))
and now newFilms contains each title only once.
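Note that a set does not keep the original order of the titles. If the order matters, `dict.fromkeys` removes duplicates while preserving insertion order (dict ordering is guaranteed from Python 3.7 on). This is a small alternative sketch, not part of the original answer.

```python
newFilms = ['The Exorcist', 'Back to the Future', 'Aliens', 'Back to the Future']

# dict keys are unique and keep insertion order, so this drops the
# second 'Back to the Future' but leaves the order otherwise unchanged.
uniqueFilms = list(dict.fromkeys(newFilms))
print(uniqueFilms)  # ['The Exorcist', 'Back to the Future', 'Aliens']
```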
Now that everything has been sorted out, it's time to check whether items in newFilms are already in filmsDoneList which, recall, holds the contents of peliculas.txt.
Reopen peliculas.txt as follows:
fileOUT=open('peliculas.txt','a')
The 'a' mode means "append", so everything you write is added to the end of the file without removing anything from it.
And the main loop goes:
for film in newFilms:
    if film in filmsDoneList:
        pass
    else:
        fileOUT.write(film + '\n')
The pass means "do nothing". The write call also appends the newline character to the movie title, which keeps the existing format of one title per line. At the end of this loop, remember to close fileOUT.
In the resulting peliculas.txt, Back to the Future, which was in newFilms, was not appended because it was already in the file, whereas The Exorcist and Aliens have been appended at the bottom.
If your file has titles separated by commas, this approach is still valid. However, you must add
filmsDoneList=filmsDoneList[0].split(',')
after the first for loop. Also, in the write call (in the last for loop) you might want to replace the newline with a comma.
This approach is cleaner, avoids repeatedly opening and closing files in a loop and, I reckon, will also fix the problem you've been having. Hope this helps!
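The same idea can be packaged as one function using `with` blocks and a set for fast membership tests. One likely cause of the original bug is that the question's code writes `films.strip()` but compares the unstripped `films`. This sketch therefore strips both the existing lines and the new titles before comparing. It assumes one title per line in UTF-8, and the function name `append_new_films` is hypothetical.

```python
import tempfile
from pathlib import Path


def append_new_films(path, new_films):
    """Append each title in new_films that is not already in the file.

    Assumes one title per line. Returns the titles actually appended,
    in their original order.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Strip line endings and stray whitespace so comparisons are exact.
    done = {line.strip() for line in text.splitlines() if line.strip()}

    added = []
    with path.open("a", encoding="utf-8") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # don't glue the first new title onto the last old one
        for film in new_films:
            title = film.strip()  # compare exactly what will be written
            if title and title not in done:
                f.write(title + "\n")
                done.add(title)  # also skips duplicates within new_films
                added.append(title)
    return added


# Usage with a throwaway file in a temporary directory.
demo_path = Path(tempfile.mkdtemp()) / "peliculas.txt"
demo_path.write_text("Back to the Future\nX Men.Primera Generacion\n", encoding="utf-8")
newFilms = ['The Exorcist', 'Back to the Future', 'Aliens', 'Back to the Future']
print(append_new_films(demo_path, newFilms))  # ['The Exorcist', 'Aliens']
```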
I have to write a MATLAB function with the following description:
function counts = letterStatistics(filename, allowedChar, N)
This function is supposed to open a text file specified by filename and read its entire contents. The contents will be parsed so that any character that isn't in allowedChar is removed. Finally, it will return a count of all N-symbol combinations in the parsed text. This function should be stored in a file named “letterStatistics.m”. Based on my professor's lecture notes, I made a list of the steps and commands showing how the function should be organized:
1. Begin the function by setting the default value of N to 1 in case:
   a. The user specifies a 0 or negative value of N.
   b. The user doesn't pass the argument N into the function, i.e., counts = letterStatistics(filename, allowedChar)
2. Using the fopen function, open the file filename for reading in text mode.
3. Using the function fscanf, read all the contents of the opened file into a string variable.
4. Since my analysis will disregard case, convert the string of text to lower case (I know there is a MATLAB function that does this).
5. Parse this string variable as follows (use logical indexing or regular expressions; do not use for loops):
   a. Replace all newline characters (special character \n) with a single space: ' '. Simply removing them would make words run together, e.g.
      In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since.
      would become
      In my younger and more vulnerableyears my father gave me some advicethat I’ve been turning over in my mindever since.
   b. We will treat hyphenated words as two separate words, so do the same for hyphens '-'.
   c. Remove any character that is not in allowedChar. Hint: use regexprep with an empty string '' as the replacement argument.
   d. Replace any sequence of two or more blank spaces with a single blank space.
6. Use the provided permsRep function to create a matrix of all possible N-symbol combinations of the symbols in allowedChar.
7. Using the strfind function, count all the N-symbol combinations in the parsed text into an array counts. Do not loop through each character in your parsed text as you would in a C program.
8. Close the opened file using fclose.
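The assignment is in MATLAB (regexprep, strfind, permsRep). Purely to illustrate the order of steps 1 and 4 to 7, here is a rough Python analogue, not a MATLAB solution. It makes several assumptions: the hypothetical `letter_statistics` takes the already-read text instead of a file name, `itertools.product` stands in for permsRep (assumed to enumerate combinations with repetition), and a sliding-window `Counter` stands in for counting overlapping matches with strfind.

```python
import re
from collections import Counter
from itertools import product


def letter_statistics(text, allowed_chars, n=1):
    """Rough Python analogue of steps 1 and 4-7 (hypothetical name).

    Returns a dict mapping every n-symbol combination of allowed_chars
    to the number of times it occurs in the parsed text.
    """
    if n is None or n <= 0:
        n = 1                                    # step 1: default N
    text = text.lower()                          # step 4: ignore case
    text = re.sub(r"[\n-]", " ", text)           # steps 5a/5b: newlines and hyphens -> space
    not_allowed = "[^" + re.escape(allowed_chars) + "]"
    text = re.sub(not_allowed, "", text)         # step 5c: drop disallowed characters
    text = re.sub(r" {2,}", " ", text)           # step 5d: collapse runs of spaces
    # Step 7: count every (overlapping) window of n characters in one pass.
    observed = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    # Step 6: all combinations with repetition, standing in for permsRep.
    combos = ("".join(p) for p in product(allowed_chars, repeat=n))
    return {combo: observed[combo] for combo in combos}


counts = letter_statistics("Well-known\nwords", "abcdefghijklmnopqrstuvwxyz ", 1)
print(counts["w"], counts[" "])  # 3 2
```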
Here is my question: as you can see, I have made this list of what the function is, what it should do, and which commands to use (fclose etc.). The trouble is that, although I know closing the file involves fclose, I'm not sure how to carry out #8 beyond that. The same goes for creating the function as a whole: I have a vague idea of which commands to use, but I can't produce the actual code. How should I begin? Any guidance or hints would be much appreciated, because I have programmer's block and can't get started!
I think you are new to MATLAB, so the documentation may seem complicated. I guess the root of the problem is a basic understanding of file I/O (input/output). When you open a file using fopen, MATLAB returns an integer that identifies that file, generally called a file ID. When you call fclose, you need to tell MATLAB which file to close, so you pass fclose that same file ID.
fid = fopen('test.txt', 'w');  % open (or create) test.txt for writing
fprintf(fid, 'This is a test.\n');
fclose(fid);
fid = -1; % Optional: makes it clear that the file is no longer open (fopen also returns -1 on failure),
          % but not necessary, since MATLAB raises an invalid file identifier error if you use a closed ID anyway
Regarding the function creation, the syntax is something like this:
function out = myFcn(x,y)
    z = x*y;
    fprintf('z=%.0f\n',z); % Print the value of z in the command window
    out = z>0;
This function returns true if the product of the two numbers is positive and false otherwise. That is the case when both numbers are positive, but also when both are negative, so it is not a real positivity test; it is only meant as an example of the syntax.
Please comment if this is not what you want to know.
I'm having some trouble automating a MATLAB script that should prompt the user for the variable they are interested in, as well as the date range they want. I then want the script to concatenate their answers into a file name that follows the naming convention of the file they will ultimately load.
variable=input('please input variable of interest');
% temp
start=input('please state the start date in the form yymmdd: ');
%130418
enddate=input('please state the end date in the form yymmdd: ');
%140418
file=sprintf('%s_dailydata_%d_%d.csv',variable,start,enddate);
%so I thought 'file' would look like: temp_dailydata_130418_140418.csv
vardata=load(file);
The two numbers representing the dates are not causing any issues, but the fact that 'variable' is a string is. I know that if I put apostrophes before and after temp when I'm prompted, it works, but I have to assume that the end user won't know to do this. I tried putting curly braces around 'please input your variable..', but that didn't help either. Obviously this approach assumes that a file for the requested dates exists.
Can anyone offer any advice? Perhaps the sprintf function is not the best option here?
Don't use 'end' as a variable name: it is a reserved keyword, and using it could create conflicts with any function or logic block you're defining.
If you know the input is going to be text, use the 's' option. From the documentation for input():
str = input(prompt,'s')
Returns the entered text as a MATLAB string, without evaluating expressions.
As for knowing whether or not the file exists, you'd have to add some error handling for that: either wrap your load() call in a try/catch block, or use uigetfile() to let the user pick the file.