I am trying to download an online text document and write it to a text file. The problem is that only about 70 characters print per line, which matches how the document looks online. I would like the lines to run all the way to the margin of the text file before wrapping. Alternatively, I would like to be able to set the number of characters per line to, say, 100 or 120 instead of 70. So far I have not been able to do either.
Please see the following photo:
Below is my code:
import requests
from bs4 import BeautifulSoup
page = requests.get("http://classics.mit.edu/Aristotle/rhetoric.mb.txt")
soup = BeautifulSoup(page.text,"html.parser")
with open("Art_of_Rhetoric.txt", "w") as file_object:
    #soup.text.replace("\r\n","")
    file_object.write(soup.text)
I've tried a number of things, including:
Putting the text into a list, line by line, and then printing the lines. That just reproduced the same lines.
As you can see in the commented-out line, I tried to find the line breaks within the text and remove them. That also didn't work.
I even found this in the Python documentation, but it only seems to let you print up to 70 characters per line anyway.
I'm pretty sure there is a simple solution to this; I just don't know what it is.
I would like the output in my text file to look like "Paragraph 1" in the photo above. Currently, it comes out like the paragraph labelled "Paragraph 2".
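One possible approach, as a minimal sketch: treat blank lines as paragraph separators, join the hard-wrapped lines inside each paragraph, and optionally re-wrap at a chosen width with the standard-library textwrap module (whose default width is 70). The function name reflow and the sample string are made up for illustration, and the sketch assumes the source separates paragraphs with blank lines. Note that str.replace returns a new string rather than changing the original, and that replacing "\r\n" with an empty string would glue the last word of one line to the first word of the next.

```python
import re
import textwrap


def reflow(text, width=None):
    """Join hard-wrapped lines into paragraphs; optionally re-wrap at `width`."""
    # Normalise Windows line endings so blank-line detection is reliable.
    text = text.replace("\r\n", "\n")
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    result = []
    for paragraph in paragraphs:
        # Collapse every run of whitespace (including line breaks) to one space.
        joined = " ".join(paragraph.split())
        if width is not None:
            joined = textwrap.fill(joined, width=width)
        result.append(joined)
    return "\n\n".join(result)


# Hypothetical sample standing in for soup.text.
sample = "The first line of a paragraph\r\nthat was wrapped early.\r\n\r\nA second\r\nparagraph."
print(reflow(sample))             # one long line per paragraph
print(reflow(sample, width=100))  # re-wrapped at 100 characters
```

In the script above, this would mean writing reflow(soup.text) or reflow(soup.text, width=120) to the file instead of soup.text.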
Suppose I have a text file:
AAAAABDCBBCDA
AAAAACDABBCDA
AAAAADAABBCDA
AAAAABBCBBCDA
AAAAADCBBBCDA
AAAAAABCBBCDA
Because these lines are short, I can see at a glance which positions (columns) differ. Columns 1-5 and 9-13 are identical in every line, while columns 6-8 differ.
Which command can I use to locate the identical parts? (That is, which command would report positions 1 to 5 and 9 to 13?)
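If a short script is acceptable instead of a single command, the check can be sketched as follows: for each column, collect the characters found in that column across all lines, and report the runs of columns where only one distinct character occurs. The function name constant_column_ranges is made up for illustration, and the lines are given inline rather than read from a file.

```python
def constant_column_ranges(lines):
    """Return 1-based (start, end) ranges of columns identical in every line."""
    # Only compare columns that exist in every line.
    width = min(len(line) for line in lines)
    same = [len({line[i] for line in lines}) == 1 for i in range(width)]
    ranges = []
    start = None
    for i, is_same in enumerate(same):
        if is_same and start is None:
            start = i                      # a run of identical columns begins
        elif not is_same and start is not None:
            ranges.append((start + 1, i))  # the run ended at column i (1-based)
            start = None
    if start is not None:
        ranges.append((start + 1, width))
    return ranges


lines = [
    "AAAAABDCBBCDA",
    "AAAAACDABBCDA",
    "AAAAADAABBCDA",
    "AAAAABBCBBCDA",
    "AAAAADCBBBCDA",
    "AAAAAABCBBCDA",
]
print(constant_column_ranges(lines))  # [(1, 5), (9, 13)]
```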
I'm currently using Google Colab to take advantage of its free GPU. I was trying to modify code that I copied and pasted from machinelearningmaster.com. However, whenever I try to add a new line of code, for example print("some words"), I get an indentation error.
I have tried adding tabs or spaces before the print call, but I still get the error. For example:
space, space, print("some words")
tab, tab, print("some words")
I have also checked the Colab editor settings; the indentation width is currently set to two spaces.
The first three lines are part of the original code; the print statement is my addition. I copied and pasted this directly from the Colab editor. In Colab all four lines are aligned. As you can see here, only the first three lines are aligned. I don't know what's going on.
img_path = images_dir + filename
ann_path = annotations_dir + image_id + '.xml'
count=count+1
print("this is count: ", count)
I expected this to print the value of count. Instead, I get this error message:
IndentationError: unindent does not match any outer indentation level
Okay, after much searching and frustration, I have an idea of what went wrong and, even better, a way to fix it.
It appears that the Google Colaboratory (Colab) editor has no setting for choosing tabs ("\t") versus spaces (space-bar entries). From the settings tab on the cell you can set the tab width from 2 to 4, but these are interpreted as 2 to 4 space-bar entries. Usually this isn't a problem. However, if you're like me and want to test code from the web, or just copy and paste from your own editor, problems can arise.
Here's how I fixed it. Before pasting the copied code into Colab, first paste it into Notepad++. Go to View > Show Symbols > Show All Characters and click on it; you should now be able to see all the characters in the code. Find a tab (it looks like an arrow pointing to the right, -->), right-click it and copy it. Open Search > Find and switch to the Replace tab. Depending on your version of Notepad++, the tab you copied will be entered automatically and the replacement will already be set to four spaces. Hit "Replace All". This replaces all tabs with the equivalent spaces. Copy the code from Notepad++ back into Colab. Now there will be no more conflicts.
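The same tab-to-space replacement can also be sketched in Python itself, for example in a scratch Colab cell, using the built-in str.expandtabs. The function name tabs_to_spaces and the pasted sample are made up for illustration, and the tab size of 4 is an assumption that should match the indentation of the rest of the code. Note that expandtabs pads to the next tab stop and also converts tabs inside string literals, so this is only a rough equivalent of the editor procedure.

```python
def tabs_to_spaces(source, tab_size=4):
    """Replace every tab with spaces, padding to the next multiple of tab_size."""
    # str.expandtabs resets its column counter at each newline,
    # so it can be applied to the whole source at once.
    return source.expandtabs(tab_size)


# Hypothetical pasted snippet that is indented with tabs.
pasted = "for i in range(3):\n\tcount = i\n\tprint(count)\n"
cleaned = tabs_to_spaces(pasted)
print(repr(cleaned))
```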
I think a simple find-and-replace tool will work just fine. I also came across this error recently in Colab and followed @Rice Man's solution. The only difference was that I used LibreOffice Writer instead of Notepad++. I also found this tool to be helpful. I am not proficient in Colab, but this solution worked for me.
Here is another quick fix that worked for me.
I was trying to run a Python script in Colab and got this error even though the line appeared to be at the correct indentation in the script.
I checked with the !cat filename.py command and found that the actual indentation was different from how it looked in the editor (hence the error).
Moving that line (the one Colab showed as unindented) back to the start of the line and then re-indenting it with spaces fixed the error.
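A rough way to locate such lines without eyeballing the !cat output is to print the leading whitespace of each line with repr, so that tabs show up as \t. This is only a sketch: the function name indentation_report and the sample script are made up for illustration, and it flags only lines whose indentation contains a tab.

```python
def indentation_report(source):
    """Return (line_number, leading_whitespace) for lines indented with any tab."""
    report = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Everything before the first non-space, non-tab character.
        leading = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in leading:
            report.append((number, leading))
    return report


# Hypothetical script whose third line is indented with a tab instead of spaces.
script = "if True:\n    count = 1\n\tprint(count)\n"
for number, leading in indentation_report(script):
    print(number, repr(leading))
```

For a real file, the source string could be obtained with open("filename.py").read().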
I used this website to fix the error.
Copy your code to the site, then click the beautify button at the top left. This will remove the indentation errors.
If you want to know where the indentation error is coming from, use @Prachi's answer.
I am trying to find out whether there is a way to underline text posted to Slack. I am using a webhook to post messages to Slack.
You can approximate it with Unicode’s COMBINING LOW LINE character: http://www.fileformat.info/info/unicode/char/0332/index.htm . Before posting, split your string along grapheme boundaries and insert a COMBINING LOW LINE after each. This sort of works, but with Slack’s default font the underline sometimes splits visually between characters. It’s enough though to give an impression, which might be what you want if, for example, you’re trying to give an example of the position of a link within a piece of text.
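A minimal sketch of that idea in Python, under one stated simplification: the standard library has no grapheme segmentation, so this version only approximates grapheme boundaries by keeping existing combining marks attached to their base character, and it will not handle things like emoji sequences correctly. The function name underline is made up for illustration.

```python
import unicodedata

COMBINING_LOW_LINE = "\u0332"


def underline(text):
    """Insert U+0332 after each character and its existing combining marks."""
    out = []
    for index, char in enumerate(text):
        out.append(char)
        next_char = text[index + 1] if index + 1 < len(text) else ""
        # Wait until the current character's own combining marks have been
        # emitted before adding the low line.
        if not (next_char and unicodedata.combining(next_char)):
            out.append(COMBINING_LOW_LINE)
    return "".join(out)


# The result would go into the "text" field of the webhook's JSON payload.
payload = {"text": "Click " + underline("here") + " to continue"}
print(payload["text"])
```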
I don't think this can be done. See https://api.slack.com/docs/formatting for the available message formatting options.
I am using reStructuredText to create a report which includes tom log file outputs.
What I have is a number of sections with numbered lists of literals.
This looks like this:
#. ``some log file output``
#. ``more output``
The problem is that when I convert this to a PDF using rst2pdf, the literals can sometimes be quite long and flow off the page.
What I would love is a way to mark a section of text as a code literal that can flow onto the next line just like regular text.
I want this because if I don't mark the log file output as being a literal, there is sometimes some crud within the log file output which rst is interpreting as inline markup or other rst related commands.
Any other suggestions as to how this can be best done?
I know that I could ensure that the source rst file only has lines of a certain width but this would make the source file look horrible and make it unwieldy to edit.
I have tried the following two things, neither of which helped:
I found a rst2pdf option:
--fit-literal-mode=MODE
What to do when a literal is too wide.
One of error,overflow,shrink,truncate.
Default="shrink"
After some researching, I found mention of a wrapping option for literals.
I got rst2pdf to dump out the default stylesheet using:
rst2pdf --print-stylesheet
which I then saved and modified so that the wordWrap option under literal was changed to CJK.
An (old) instrument of mine is generating ASCII data files with text descriptions at the top of the file, before the data. But the number of lines of descriptive text varies from run to run. How can I get Fortran77 to determine this automatically?
Here is an example data file, below the line.
Line of explanatory text.
Notice the possible blank lines.
More text.
The number of lines is NOT the same every time.
1.0, 2.0
2.0, 4.0
3.0, 6.0
4.0, 8.0
[I found the answer myself. Posting here to help others. It is quite annoying having to wait 8 hours to answer my own question, but I understand why the rule exists. Stupid posers!]
A crude but effective solution, if your text never starts with a number (which is my case):
Assume the input file is named Data.dat.
      integer NumTextLines
      real X
      open(8,file='Data.dat')
      NumTextLines=-1
   50 NumTextLines=NumTextLines+1
      read(8,*,err=50) X
      close(8)
      open(8,file='Data.dat')
Every time the program tries to read a word from a text line into the real variable X, the read statement raises an error and control goes back to the statement labelled 50, which increments NumTextLines. Once the read succeeds, the first data line has been reached and NumTextLines is not incremented any more. Close the file and re-open it to start over from the beginning. But now you know NumTextLines, so you can read the text one line at a time and either save it or skip it.
{Above method works on most of my files, but not all. Another way is to read each line into a character*500 variable (say, A), then test the ASCII value of the first element of the character array. But that gets complicated.}
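The logic of that second approach (read each line as text and test its first character) can be sketched in Python to show the idea; it is not Fortran 77 and would need translating. The function name count_header_lines is made up for illustration, and the sketch assumes, as above, that descriptive text never starts with a digit, a sign, or a decimal point. The blank line in the sample is an assumption about where blank lines might fall.

```python
def count_header_lines(lines):
    """Count the leading lines that do not start like a number."""
    count = 0
    for line in lines:
        stripped = line.strip()
        # Blank lines and lines whose first character cannot begin a number
        # are treated as descriptive text.
        if stripped and (stripped[0].isdigit() or stripped[0] in "+-."):
            break
        count += 1
    return count


sample = [
    "Line of explanatory text.",
    "",
    "Notice the possible blank lines.",
    "More text.",
    "The number of lines is NOT the same every time.",
    "1.0, 2.0",
    "2.0, 4.0",
]
print(count_header_lines(sample))  # 5
```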