How to search part of a text - python-3.x

I need to capture parts of a text that sit inside a larger text. The parts I want always begin with the same start word and end with the same end word. How can I do that?
I am trying to write a program that searches for part of a text; the text contains an initial key and a final key.
The text format is this:
my random text this my random text this my random text this my random
text this my random text this MY_START_WORD_KEY my text this my text
this my text this my text this my text this MY_END_WORD_KEY my random text this my random text this my random text this my random text this my random text this MY_START_WORD_KEY my text this my text
this my text this my text this my text this MY_END_WORD_KEY my random text this my random text this my random text this my random text this my random text this
I created this code:
txt = "my_text.txt"
with open(txt, encoding="utf8") as text:
    all_text = text.read()
start = 'START MY KEY WORD '
end = 'END MY KEY WORD'
result = []
temp = all_text.split(start)
for part in temp:
    if end in part:
        result.append(part.split(end)[0])
But that way the initial and final words are lost from the result.
I need everything from the initial keyword through the final keyword.

You can try something like this:
import re
txt = '''my random text this my random text this my random text this my random text this my random text this START_KEY my text this my text this my text this my text this my text this END_KEY my text this my text this my text this my text this my text this my text this my text this
more random START_KEY text END_KEY.'''
START_KEY='START_KEY'
END_KEY='END_KEY'
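# Non-greedy .*? stops at the nearest END_KEY; re.DOTALL lets a block span several lines.
matches = re.findall(re.escape(START_KEY) + r"\s.*?\s" + re.escape(END_KEY), txt, flags=re.DOTALL)
The result will be: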
matches = ['START_KEY my text this my text this my text this my text this my text this END_KEY',
'START_KEY text END_KEY']
Regular expressions are very useful for tasks like this.
You can find more in the documentation for Python's 're' module.

Assuming you only see one occurrence of the start and end word in the text:
text = ...
start="start"
end="end"
result = text.split(start)[1].split(end)[0].strip()
You can split and select the middle portion.

Try this:
keys = ['start','end']
text = "Everything from keys[0] through keys[1] is supposed to be returned as a substring. start extracting here and stop at the end. This part should not appear in the extracted substring."
start = text.index(keys[0])
end = text.index(keys[1], start) + len(keys[1])  # look for the end key only after the start key
print(text[:start], '\n', text[start:end], '\n', text[end:])
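The index-based idea above handles one pair of keys. A minimal sketch of how it might be extended to every pair in the text, keeping both keys in each result, is shown below. The function name `extract_between` and the sample keys `BEGIN` and `FINISH` are made up for illustration.

```python
def extract_between(text, start_key, end_key):
    """Return every span from start_key through end_key, keys included."""
    spans = []
    pos = 0
    while True:
        start = text.find(start_key, pos)
        if start == -1:
            break  # no further start key
        end = text.find(end_key, start + len(start_key))
        if end == -1:
            break  # unmatched start key: stop rather than guess
        stop = end + len(end_key)
        spans.append(text[start:stop])
        pos = stop  # continue searching after this block
    return spans


# Hypothetical sample data, not taken from the question.
sample = "noise BEGIN first block FINISH noise BEGIN second FINISH tail"
blocks = extract_between(sample, "BEGIN", "FINISH")
print(blocks)
```

Because it relies only on `str.find`, this version needs no escaping of special characters in the keys.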

To keep the START and END words, use a regexp like:
/(START.*END)/gm
To remove the delimiting words, use the following regexp:
/START(.*)END/gm
I hope it helps.
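The two patterns above are written in slash-delimited notation. A rough Python equivalent, as a sketch, could look like the following. It assumes the non-greedy `.*?` is wanted so that each match stops at the nearest END, and it uses `re.DOTALL` so a block may span several lines; the sample string is invented for illustration.

```python
import re

# Hypothetical sample text with two START ... END blocks, one spanning two lines.
text = "noise START alpha\nbeta END noise START gamma END tail"

# Keep the delimiting words: the whole match is returned.
with_keys = re.findall(r"START.*?END", text, flags=re.DOTALL)

# Drop the delimiting words: only the capture group is returned.
without_keys = [m.strip() for m in re.findall(r"START(.*?)END", text, flags=re.DOTALL)]

print(with_keys)
print(without_keys)
```

Here `re.findall` plays the role of the `g` flag by returning all matches.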

Related

Format text of mark_text in Altair

I'm trying to create a chart somewhat along the lines of the Multi-Line Tooltip example, but I'd like to format the string that is being printed to have some text added at the end. I'm trying to modify this part:
# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'y:Q', alt.value(' '))
)
Specifically, rather than 'y:Q' I want something along the lines of 'y:Q' + " suffix". I've tried doing something like this:
# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'y:Q', alt.value(' '), format=".2f inches")
)
Alternatively, I've tried:
# Draw text labels near the points, and highlight based on selection
y_fld = 'y'
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, f"{y_fld:.2f} inches", alt.value(' '))
)
I think I see why those don't work, but I can't figure out how to intercept the value of y and pass it through a format string. Thanks!
I think the easiest way to do this is to calculate a new field using transform_calculate to compute the label that you want.
Using the example from the documentation, I would change the text chart like this:
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'label:N', alt.value(' '))
).transform_calculate(label='datum.y + " inches"')
If you want more control, you could change the dataset with pandas beforehand. Be sure to set the type to Nominal (and not Quantitative); otherwise you would get NaNs in the tooltips.
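As a sketch of the "change the dataset beforehand" idea, the label can be computed as an ordinary string before the data reaches the chart. The example below uses plain Python records instead of pandas to stay dependency-free; the helper `add_labels`, the field names, and the sample values are all assumptions made for illustration.

```python
def add_labels(records, field="y", spec=".2f", suffix=" inches"):
    """Return copies of the records with a preformatted 'label' string added."""
    return [
        {**row, "label": f"{row[field]:{spec}}{suffix}"}
        for row in records
    ]


# Hypothetical data; the real chart would use its own dataset.
records = [{"x": 1, "y": 2.5}, {"x": 2, "y": 3.14159}]
labelled = add_labels(records)
print(labelled)
```

The new field would then be encoded as 'label:N', as in the answer above, so that it is treated as text rather than as a number.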

Can I retrieve the text color (or background color) of the character at the text cursor from QTextEdit?

I have a QTextEdit window with words and letters displayed in several colors. I want to be able to retrieve the color of each part of the text when processing the contents of the window. My attempt so far has been to save the entire contents as an html file and then parse through that to extract only the text with the color information. This is very cumbersome and difficult. I would much prefer to process the text using the QTextCursor if I could retrieve the color of the text at the cursor position. I have searched for the appropriate function but have not found one.
Is there a function to retrieve the color (or the format) at the QTextCursor position?
Or alternatively is there a way to retrieve each contiguous section of words and/or characters that have the same color (or format) with the format information?
Well I have found a way to do what I wanted. Here is the relevant code:
QTextCursor tc = qte->textCursor();
tc.movePosition(QTextCursor::Start, QTextCursor::MoveAnchor);
while (tc.movePosition(QTextCursor::NextCharacter, QTextCursor::MoveAnchor))
{
    QTextCharFormat tcf = tc.charFormat();
    int bg = tcf.background().color().rgb();
    int fg = tcf.foreground().color().rgb();
    printf("bg=%x fg=%x\n", bg, fg);
}
Any comments or improvements are welcome.
[Corrected above]: I originally had
QColor bg = tcf.background().color().rgb();
QColor fg = tcf.foreground().color().rgb();
but with .rgb() on the end, it converts QColor to int.
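The loop above reports one format per character. To get each contiguous section that shares a color, which was the second part of the question, the per-character results can be grouped afterwards. The following is a Qt-free sketch in Python of that grouping step only; the function `color_runs` and the sample (character, color) pairs are hypothetical stand-ins for what the cursor loop would collect.

```python
from itertools import groupby


def color_runs(chars):
    """Group (character, color) pairs into (text, color) runs of equal color."""
    runs = []
    for color, group in groupby(chars, key=lambda pair: pair[1]):
        runs.append(("".join(ch for ch, _ in group), color))
    return runs


# Hypothetical per-character results, in document order.
chars = [
    ("r", 0xFF0000), ("e", 0xFF0000), ("d", 0xFF0000),
    (" ", 0x000000),
    ("b", 0x0000FF), ("l", 0x0000FF),
]
runs = color_runs(chars)
print(runs)
```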

Highlight predetermined locations in textbox

I have been using python 3.4 and tkinter to create an application to parse logs and format data and display results in a text widget. I would like to highlight text that is located at a known position on each line in the text window. I have seen similar highlighting questions regarding highlighting text in text widgets on this site and it has been very helpful.
My problem is that I don't need to search for the string or characters to highlight. I have the locations that I want to highlight and it could be any character in that location including white space. For example: I would like to highlight positions 0, 20, 40 on each line (eg: index 1.0, 1.20, 1.40, 2.0, 2.20, etc).
Because large files are written to the textbox, I have to do this for the entire scrollable text window, so I need to keep track of the textbox line numbers.
When referring to a location in a text widget, you can append modifiers to indicate relative positioning. For example, given the position "1.0", the next position can be identified as "1.0+1c" (or "1.0+1char"). So, to highlight a single character at a given offset, make the start of the range the offset, and the end of the range one character greater.
Here's a quick hack that takes one or more "positions" and highlights that position on each line:
def highlight(text, tag, *positions):
    last_line = int(text.index("end-1c").split(".")[0])
    for linenumber in range(1, last_line+1):
        line = text.get("%s.0" % linenumber, "%s.0 lineend-1c" % linenumber)
        line_length = len(line)
        for pos in positions:
            if pos <= line_length:
                start = "%s.%s" % (linenumber, pos)
                end = start + "+1c"
                text.tag_add(tag, start, end)
usage:
text = Text(...)
text.tag_configure("highlight", ...)
...
highlight(text, "highlight", 0, 20, 40)
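To see which index strings the approach above produces without opening a window, the range computation can be separated from the widget. This is only a sketch: `highlight_ranges` is a hypothetical helper that works on a list of line strings and assumes a position is highlighted only if the line actually has a character there.

```python
def highlight_ranges(lines, positions):
    """Return (start, end) text-widget index pairs, one character wide each."""
    ranges = []
    for lineno, line in enumerate(lines, start=1):  # widget lines start at 1
        for pos in positions:
            if pos < len(line):  # skip positions past the end of the line
                start = f"{lineno}.{pos}"
                ranges.append((start, start + "+1c"))
    return ranges


# Hypothetical content: a short line, an empty line, and a long line.
ranges = highlight_ranges(["abc", "", "abcdefghijklmnopqrstuvwxyz"], [0, 20])
print(ranges)
```

Each pair could then be passed to `text.tag_add(tag, start, end)` as in the function above.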

How to conditionally move lines up?

I have a text file with many lines.
test =
more text more text more text more text
more text more text more text more text
... etc....
more text more text more text more text
more text more text more text more text
1 text
test2 =
more text more text more text more text
more text more text more text more text
3 more text
etc
What I want to do is move up each line that starts with a number
and attach it to the end of the first line found (going backwards) that ends with '=\s'
expected output:
test = 1 text
more text more text more text more text
more text more text more text more text
... etc....
more text more text more text more text
more text more text more text more text
test2 = 3 more text
more text more text more text more text
more text more text more text more text
I have no idea how to do this.
Can someone help me?
Using :global, :norm, :move and the possibility to use a search as target for Ex commands:
:g/^\d/m?.*=$|norm kJ
Breakdown:
:g/pattern/command " executes command for every line matching pattern
^\d " pattern for "lines that start with a number"
m?.*=$ " move matched line to right below the first
" line ending with = upward
| " separator between Ex commands
norm " execute normal mode command
kJ " go to line above and join
A macro may help...
/^\d<cr>:.m?=<cr>kJ
short explanation:
/^\d " find line beginning with number
:.m?= " move current line under the previous line with (=)
kJ "move cursor back to the line with (=), and join the next
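Outside the editor, the same transformation can be sketched as a small script. The version below is an illustration in Python, not a translation of the commands above: the function name `attach_numbered_lines` is made up, and it assumes each line ending in `=` receives at most one numbered line.

```python
import re


def attach_numbered_lines(lines):
    """Append each line starting with a digit to the closest earlier line ending in '='."""
    out = []
    header_idx = None  # position in `out` of the latest line ending with '='
    for line in lines:
        if re.match(r"\d", line) and header_idx is not None:
            out[header_idx] = out[header_idx].rstrip() + " " + line
            header_idx = None  # assumption: one numbered line per header
        else:
            if line.rstrip().endswith("="):
                header_idx = len(out)
            out.append(line)
    return out


# Shortened version of the sample from the question.
lines = ["test =", "more text", "more text", "1 text",
         "test2 =", "more text", "3 more text"]
result = attach_numbered_lines(lines)
print("\n".join(result))
```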

How can I compile a list of unique image filenames in a set of html files?

I have ~3,600 html files with a ton of image tags in them. I'd like to be able to capture all the src attribute values used in these files and aggregate them into a text file where I can then remove duplicates and see how many unique image filenames there are overall.
I use BBEdit and I can easily use regex and multi-file search to find all the image references (18,673), but I don't want to replace them with anything -- instead, I want to capture them from the BBEdit search results 'Notes' and push them into another file.
Is this something that can be AppleScripted? Or are there other means to the same end that would be appropriate?
You've got a tall task there because there are many parts of this you have to solve. To give you a start, here's some advice on reading one html file and putting all the src images in an applescript list. You have to do much more than that, but this is a beginning.
First you can read a html file into applescript as regular text. Something like this will get the text of one html file...
set theFile to choose file
set htmlText to read theFile
Once you have the text into applescript you could use text item delimiters to grab the src images. Here's an example. It should work no matter how complex the html code...
set htmlText to "<img src=\"smiley.gif\" alt=\"Smiley face\" height=\"42\" width=\"42\" />
<img src=\"smiley.gif\" alt=\"Smiley face\" height=\"42\" width=\"42\" />
<img src=\"smiley.gif\" alt=\"Smiley face\" height=\"42\" width=\"42\" />"
set text item delimiters to "src=\""
set a to text items of htmlText
if (count of a) is less than 2 then return
set imageList to {}
set text item delimiters to "\""
repeat with i from 2 to count of a
    set thisImage to first text item of (item i of a)
    set end of imageList to thisImage
end repeat
set text item delimiters to ""
return imageList
I hope that helps!
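As one of the "other means to the same end" the question asks about, the collection and de-duplication could also be sketched in Python with the standard library's HTML parser. The class `ImgSrcCollector` and the function `unique_image_sources` are hypothetical names, and the sample documents are invented; reading the real files from disk is left out.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self, sources):
        super().__init__()
        self.sources = sources  # shared set, so duplicates collapse

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.add(value)


def unique_image_sources(html_documents):
    """Return the sorted unique img src values across all documents."""
    sources = set()
    for html in html_documents:
        parser = ImgSrcCollector(sources)  # fresh parser per document
        parser.feed(html)
        parser.close()
    return sorted(sources)


# Hypothetical documents standing in for the contents of the html files.
docs = [
    '<img src="smiley.gif" alt="Smiley face" />\n<img src="smiley.gif">',
    '<p><img src="logo.png"></p>',
]
unique = unique_image_sources(docs)
print(len(unique), unique)
```

For files on disk, each document's text could be read first (for example with `pathlib.Path.read_text`) and passed in the same way.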
