import docx
doc = docx.Document('CLT.docx')
test = doc.paragraphs[12].runs[2].text
print(test)
doc.save(input('Name of docx file? Make sure to add file extension '))
I've been trying to figure out some way to add/edit text in a pre-existing run using python-docx. I've tried test.clear() just to see if I can remove it, but that doesn't seem to work. Additionally, I tried test.add_run('test') and that didn't work either. I know how to add a new run, but it will only add it at the end of the paragraph, which doesn't help me much. Currently, 'print' will output the text I'd like to alter within the document, "TERMOFINTERNSHIP". Is there something I'm missing?
The text of a run can be edited in its entirety. So to replace "ac" with "abc" you just do something like this:
>>> run.text
"ac"
>>> run.text = "abc"
>>> run.text
"abc"
You cannot simply insert characters at some location; you need to extract the text, edit that str value using Python str methods, and replace it entirely. In a way of thinking, the "editing" is done outside python-docx and you're simply using python-docx for the "before" and "after" versions.
But note that while this is quite true, it's not likely to benefit you much in the general case because runs break at seemingly random locations in a line. So there is no guarantee your search string will occur within a single run. You will need an algorithm that locates all the runs containing any part of the search string, and then allocate your edits accordingly across those runs.
An empty run is valid, so run.text == "" may be a help when there are extra bits in the middle somewhere. Also note that runs can be formatted differently, so if part of your search string is bold and part not, for example, your results may be different than you might want.
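The run-spanning replacement described above can be sketched on plain strings. This is a minimal illustration, not python-docx code: `replace_across_runs` is a hypothetical helper, and it assumes the run texts have already been copied into a list of `str` values. It puts the replacement into the first run that contains part of the match and empties the matched parts of the later runs, so each run keeps its own formatting.

```python
def replace_across_runs(run_texts, search, replacement):
    """Replace the first occurrence of `search` in the joined run texts.

    Returns a new list with the same number of entries as `run_texts`.
    (Hypothetical helper for illustration; not part of python-docx.)
    """
    if not search:
        return list(run_texts)
    full = "".join(run_texts)
    start = full.find(search)
    if start == -1:
        return list(run_texts)  # nothing to replace
    end = start + len(search)

    result = []
    pos = 0           # offset of the current run inside the joined text
    replaced = False  # has the replacement text been emitted yet?
    for text in run_texts:
        run_start, run_end = pos, pos + len(text)
        pos = run_end
        if run_end <= start or run_start >= end:
            result.append(text)  # this run does not overlap the match
            continue
        keep_before = text[:max(start - run_start, 0)]
        keep_after = text[end - run_start:]
        middle = "" if replaced else replacement
        replaced = True
        result.append(keep_before + middle + keep_after)
    return result


runs = ["Hello TER", "MOF", "INTERNSHIP now"]
print(replace_across_runs(runs, "TERMOFINTERNSHIP", "X"))
```

With python-docx, the input list would come from `[run.text for run in paragraph.runs]`, and each result would be assigned back with `run.text = new_text`, pairing runs and results in order.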
I want to use the print command below in many places in my script, but I need to keep replacing "Survived" with some other string.
print(df.Survived.value_counts())
Can I automate the process by formatting the variable name the same way as a string? So if I want to replace "Survived" with "different", can I use something like:
var = 'different'
text = 'df.{}.value_counts()'.format(var)
print(text)
Unfortunately this prints out "df.different.value_counts()" as a string, while I need to print the value of df.different.value_counts()
I'm pretty sure a lot of IDEs have an option called refactoring, which lets you change the same piece of code or string everywhere it occurs.
In VSCode, for example, you can select part of the code, right-click, and choose "Change All Occurrences". This replaces that exact text wherever it appears in the file.
But if you want to do what you proposed, then eval('df.{}.value_counts()'.format(var)) is an option. However, eval() executes arbitrary code, so it is unsafe whenever var can come from untrusted input. Note that ast.literal_eval() is not a substitute here: it only evaluates Python literals (strings, numbers, tuples, lists, dicts and so on) and raises ValueError for an attribute access or function call like this one.
A safer approach is to look the attribute up by name with getattr(). locals() does not help either, because it is keyed by plain variable names, not by dotted expressions:
text = getattr(df, var).value_counts()
print(text)
Found the way: print(df[var].value_counts())
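Looking a column up by a name held in a variable can be sketched without pandas. In this illustration the `df` object is a hypothetical stand-in built from `types.SimpleNamespace` (not a real DataFrame), and `value_counts` is a hypothetical helper based on `collections.Counter`. With a real DataFrame, `df[var]` or `getattr(df, var)` plays the same role.

```python
from collections import Counter
from types import SimpleNamespace

# Hypothetical stand-in for a DataFrame: each attribute is one column.
df = SimpleNamespace(Survived=[0, 1, 1, 0, 1], Pclass=[3, 1, 3, 3, 2])


def value_counts(frame, column_name):
    """Count the values of the column whose name is given as a string."""
    column = getattr(frame, column_name)  # lookup by name, no eval() needed
    return Counter(column)


for var in ("Survived", "Pclass"):
    print(var, value_counts(df, var))
```

The column name stays ordinary data here, so a misspelled name raises AttributeError instead of executing anything.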
I'm writing a Python script to scrape another website, and I have run into a problem that I have not yet figured out how to solve.
Say I have set this particular string to be replaced with something else.
word_replace_1 = 'dv'
namelist = soup.title.string.replace(word_replace_1,'11dv')
The script works fine, when the titles are dv234,dv123 etc.
The output will be 11dv234, 11dv123.
However, if the titles are dv234 mixed with dvab123, the script also turns dvab123 into 11dvab123, even though I did not ask for dvab to be replaced with anything. What should I do here?
Also, if the title is a combination of letters, numbers and Korean characters, say DAV123ㄱㄴㄷ,
how exactly should I make it output only DAV123, with a - added between the letters and the numbers?
Python - making a function that would add "-" between letters
This gives me the idea to add - between all characters, but is there a way to add - only between a letter and a number?
The only way I can think of at the moment is creating a table of replacements, for example something like this
word_replace_3 = 'a1'
word_replace_4 = 'a2'
.......
and then print them out as
namelist3 = soup.title.string.replace(word_replace_3,'a-1').replace(word_replace_4,'a-2')
This is just slow and not efficient. What would be the best method to resolve this?
Thanks.
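One possible way to handle both cases is with regular expressions instead of a table of str.replace() calls. This is a minimal sketch under assumptions: the function names `prefix_dv_codes` and `extract_code` are hypothetical, a "dv" code is assumed to be "dv" followed directly by a digit, and the part to keep is assumed to be a run of ASCII letters followed by digits.

```python
import re


def prefix_dv_codes(title):
    """Prefix 'dv' with '11' only when a digit follows it directly."""
    # \b keeps us at the start of a word; (?=\d) requires a digit next,
    # so 'dv234' matches but 'dvab123' does not.
    return re.sub(r"\bdv(?=\d)", "11dv", title)


def extract_code(title):
    """Return 'LETTERS-DIGITS' from the first such run, or None if absent."""
    match = re.search(r"([A-Za-z]+)(\d+)", title)
    if match is None:
        return None
    return f"{match.group(1)}-{match.group(2)}"


print(prefix_dv_codes("dv234 dvab123"))
print(extract_code("DAV123ㄱㄴㄷ"))
```

Because `[A-Za-z]` and `\d+` stop at the first character that is neither an ASCII letter nor a digit, the Korean characters are dropped without listing them.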
This is my code:
for films in filmlist:
    with codecs.open('peliculas.txt', encoding='utf8', mode='r') as lfile:
        filmsDone = lfile.read()
        filmsDoneList = filmsDone.split(',')
    if films not in filmsDoneList:
        with codecs.open('peliculas.txt', encoding='utf8', mode='a+') as lfile:
            lfile.write(films.strip() + ',')
It will never recognize the last item of the list.
I have printed filmsDoneList and the last item in PyCharm looks like this: u'X Men.Primera Generacion'. I have printed films and they look like this: X Men.Primera Generacion'
So I have no idea where is the problem. Thanks in advance.
@Rafa, to better explain what I meant in the comments, I had to write a full answer so that I could attach code and screenshots.
Let's say the peliculas.txt file has the following format:
You can import such file in python according the following 3 commands:
fileIN=open('peliculas.txt','r')
filmsDoneList=fileIN.readlines()
fileIN.close()
So you basically open the file, import each line thanks to readlines() and then close the file because its contents are available in filmsDoneList. The latter has the following contents (in PyCharm):
Obviously this list is quite long and does not fit in my screen, but you get the point.
You can now get rid of that annoying newline tag '\r\n' by means of the following loop:
for i in range(len(filmsDoneList)):
    filmsDoneList[i] = filmsDoneList[i].strip()
and now filmsDoneList has the form:
much better now, innit?
Now, let's say you want to add the following films:
newFilms=['The Exorcist','Back to the Future','Aliens','Back to the Future']
To show that the code copes with duplicates, I have added Back to the Future twice. You can get rid of duplicates in newFilms by means of the set() function. This converts newFilms into a set with duplicates removed (note that a set does not preserve the original order), and we convert it back to a list with this command:
newFilms=list(set(newFilms))
and now newFilms has the form:
Now that everything has been sorted, it's time to check if items in newFilms already are in filmsDoneList which, recall, is the contents of peliculas.txt.
Reopen peliculas.txt as follows:
fileOUT=open('peliculas.txt','a')
the 'a' tag means "append", so basically everything you write will be added to the file without removing anything from it.
And the main loop goes:
for film in newFilms:
    if film in filmsDoneList:
        pass
    else:
        fileOUT.write(film+'\n')
The pass means "do nothing". The write command also appends the newline character to the movie title: this keeps the previous format of one title per line. At the end of this loop, close fileOUT so the new titles are flushed to disk.
The resulting peliculas.txt is
and, as you can see, Back to the Future was in newFilms but wasn't appended to the end of this file because it was already in it. The Exorcist and Aliens, on the other hand, have been appended at the bottom of the file.
If your file has titles separated by commas, this approach is still valid. However you must add
filmsDoneList=filmsDoneList[0].split(',')
after the first for loop. Also in the write function (in the last for loop) you might want to replace the newline value with a comma.
This approach is cleaner, I reckon it will also fix the problem you've been having, and it avoids repeatedly opening and closing the file inside a loop. Hope this helps!
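The steps above can be put together for the comma-separated variant as follows. This is a minimal sketch, not the original poster's code: `append_new_films` is a hypothetical name, and it assumes titles never contain commas. It strips each title before comparing, since a likely cause of the original problem is that the loop compared the unstripped title against stripped titles stored in the file.

```python
def append_new_films(path, new_films):
    """Append titles missing from a comma-separated file; return those added."""
    try:
        with open(path, encoding="utf8") as infile:
            # Read once and keep the known titles in a set for fast lookup.
            done = {t.strip() for t in infile.read().split(",") if t.strip()}
    except FileNotFoundError:
        done = set()  # first run: nothing recorded yet

    added = []
    with open(path, "a", encoding="utf8") as outfile:
        for film in new_films:
            title = film.strip()  # compare the same form that gets written
            if title and title not in done:
                outfile.write(title + ",")
                done.add(title)  # also guards against duplicates in new_films
                added.append(title)
    return added
```

Calling `append_new_films('peliculas.txt', newFilms)` would then open the file only twice in total, however many titles are checked.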
I've got a small problem, and before I spend even more time trying to solve it, I'd like to know if what I want to do is even possible (and maybe get some input on how to do it).
My problem:
I want to take some text and then split it into different strings at every whitespace (for example "Hello my name is whatever" into "Hello" "my" "name" "is" "whatever").
Then I want to give every string its own variable so that I get something like a = "Hello", b = "my" and so on. Then I want to compare the strings with other strings (the idea is to get addresses from applications without having to search through them, so I thought I could copy a telephone book to define names and so on) and assign matching input to variables like FirstName, LastName and Street.
Then, and here comes the "I'd like to know if it's possible" part, I want to put it into our database. This means I want it to copy the string into a text field and then go to the next field via Tab. I've done something like this before with AutoIT, but I've got no idea how to tell AutoIT what's inside the strings, so I guess it must be done through the program itself.
I've got a little bit of experience with C++, Python and batch files, so it would be nice if anyone could tell me if this can even be done using those languages (and I fear C++ can do it and I'm just too stupid to do so).
Thanks in advance.
Splitting a string is very simple, there is usually a built in method called .split() which will help you, the method varies from language to language.
When you've done a split, it will be assigned to an array, you can then use an index to get the variables, for example you'd have:
var str = "Hello, my name is Bob";
var split = str.split(" ");
print split[0]; // is "Hello,"
print split[1]; // is "my" etc
You can also use JSON to return data so you could have an output like
print split["LastName"];
What you're asking for is definitely possible.
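Since Python is one of the languages mentioned in the question, the splitting and labelling steps might look like this there. This is a minimal sketch: `split_words` and `label_fields` are hypothetical helper names, and the field names are placeholders taken from the question, assumed to arrive in the same order as the words.

```python
def split_words(text):
    """Split on any run of whitespace and return the words as a list."""
    return text.split()


def label_fields(words, field_names):
    """Pair each word with a field name instead of using one variable per word."""
    return dict(zip(field_names, words))


words = split_words("Hello my name is whatever")
print(words[0], words[1])

# Placeholder example data, not taken from a real application.
record = label_fields(split_words("Anna Schmidt Hauptstrasse"),
                      ["FirstName", "LastName", "Street"])
print(record["LastName"])
```

A dictionary keyed by field name avoids creating separate variables a, b, c for every word, and `zip` simply ignores extra words when there are fewer field names.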
Some links that could be useful:
Split a string in C++?
https://code.google.com/p/cpp-json/
This is my first time asking a question, so let me know if I am doing something wrong (post-wise).
I am trying to create a function that writes to a .txt file, but I seem to get two very different results between calling it from within a module and writing the same loop in the shell directly. The code is as follows:
def function(para1, para2):
    # para1 is the string I am searching for within para2; para2 is a list of strings
    # opens a file with a certain naming convention
    with open("str" + para1 + ".txt", 'a', encoding='utf-8') as file:
        n = 0
        for word in para2:
            if word == para1:
                file.write(para2[n-1] + '\n')
                print(para2[n-1])  # intentionally included as part of debugging
            n += 1
function("targetstr", targettext)
#target str is the phrase I am looking for, targettext is the tokenized text I am
#looking through. this is in the form of a list of strings, that is the output of
#another function, and has already been 'declared' as a variable
When I define this function in the shell, I get the correct words appearing. However, when I call this same function through a module (in the shell), nothing appears in the shell, and the text file shows a bunch of numbers (e.g. 's93161) and no new lines.
I have even gone to the extent of including a print statement right after the declaration of the function in the module, and commenting out everything but the print statement, and yet nothing appears in the shell when I call it. However, the numbers still appear in the text file.
I am guessing that there is a problem with how I have defined the parameters or how I am passing the arguments when I call the function.
As a reference, here is the desired output:
‘She
Ashley
there
Kitty
Coates
‘Let
let
that
PS: Sorry if this is not very clear, as I have very limited knowledge of Python terminology.
I have found the solution to the issue. It turns out that I need to close the shell and restart everything before the interpreter picks up the changes made to the function in the module. Thanks to those who took a look at the issue, and those who tried to help.
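Restarting the shell works because an imported module is cached for the lifetime of the session. As an alternative, the standard library's importlib.reload() re-executes an already imported module. The sketch below demonstrates this with a throwaway module. The names `scratch_mod` and `reload_after_edit` are hypothetical, and the example assumes the module is imported as a whole (names bound with `from module import function` are not updated by a reload).

```python
import importlib
import pathlib
import sys
import tempfile


def reload_after_edit():
    """Import a throwaway module, edit it on disk, then reload it."""
    with tempfile.TemporaryDirectory() as tmp:
        module_path = pathlib.Path(tmp) / "scratch_mod.py"  # hypothetical module
        module_path.write_text("def greet():\n    return 'old'\n")
        sys.path.insert(0, tmp)
        try:
            import scratch_mod
            before = scratch_mod.greet()

            # Edit the file on disk; the session still holds the old code.
            module_path.write_text("def greet():\n    return 'new version'\n")
            still_old = scratch_mod.greet()

            importlib.reload(scratch_mod)  # re-execute the edited source
            after = scratch_mod.greet()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("scratch_mod", None)
    return before, still_old, after


print(reload_after_edit())
```

In an interactive session the equivalent would be `import importlib; importlib.reload(mymodule)` after saving the file, where `mymodule` stands for whatever module holds the function.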