What functionality does the "g?" command provide in vim - vim

As a beginner programmer I've been practicing vim on vimgolf recently and saw that the command "g?" was used effectively to switch many lines of 'Ivz' to become 'Vim'. I understand that this shifts each alphabetical letter 13 places along the alphabet, but I do not understand how this would prove useful except in unique circumstances like these.

The g? command (type :help g? for brief documentation) implements the rot13 algorithm, which rotates each letter 13 places forward or backward in the alphabet.
I'm not sure how commonly it's used today, but on Usenet it used to be a common way to encode spoilers. For example, if I'm writing a post that gives away the ending of something that not everyone has seen, I might use rot13 to weakly encrypt part of the article. It's enough to make it impossible to read accidentally (unless you've had a lot of practice), but easy to read if you're using a newsreader that has a built-in rot13 function -- as most of them do.
For example:
Pretend this is a spoiler. Filter with rot13 to read it.
would become
Cergraq guvf vf n fcbvyre. Svygre jvgu ebg13 gb ernq vg.
If I don't want to read the spoiler, I can ignore it. If I do want to read it, I can decrypt it easily enough.
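Outside Vim, the same transformation can be reproduced to check the example above. This is a minimal sketch using Python's standard "rot_13" text codec; the helper name rot13 is only for illustration:

```python
import codecs


def rot13(text):
    """Rotate every ASCII letter 13 places; digits and punctuation are untouched."""
    return codecs.encode(text, "rot_13")


plain = "Pretend this is a spoiler. Filter with rot13 to read it."
hidden = rot13(plain)
print(hidden)
# Applying rot13 a second time restores the original text.
print(rot13(hidden) == plain)
```

Because 13 is half of 26, encoding and decoding are the same operation, which is why a single command such as g? is enough in both directions.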

I have been using Vim for 4 years and learned about that command very early on but, even though I knew perfectly well what ROT13 was, I never found a use for g?.
Until a couple of weeks ago when I needed to add a bunch of <li> with unique IDs to a <ul> in an HTML prototype…
The starting point:
<ul>
<li id="lorem">foo</li>
<li id="ipsum">foo</li>
</ul>
After duplicating the two <li>:
<ul>
<li id="lorem">foo</li>
<li id="ipsum">foo</li>
<li id="lorem">foo</li>
<li id="ipsum">foo</li>
</ul>
After g?i" on the two new <li>'s ids:
<ul>
<li id="lorem">foo</li>
<li id="ipsum">foo</li>
<li id="yberz">foo</li>
<li id="vcfhz">foo</li>
</ul>
There! I found a practical use for g? in actual "programming"! Celebration!!!

It can prove useful when you want to quickly hide from onlookers some text that you typed in a visible vim buffer.
For example, a password or token that you put in your code (but only do this temporarily, when you must).
Perhaps you want to invite a team-mate to look at some of your code, or you work in a place where people tend to walk behind your back all the time, so you can just rot13 the string and it is useless to them (at least at a glance).
It probably works best against non-technical passersby or for short periods of exposure.
Keep in mind that it does not rotate numbers, and for security purposes it would be even better if it could take a rotation size.
It can also be useful when you solve a CTF that has a rot13 challenge...
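To illustrate the two limitations mentioned above (digits are left alone, and the rotation is fixed at 13), here is a small sketch of a letter rotation with a configurable shift. The function name rotate and its shift parameter are hypothetical; this is not something g? itself offers:

```python
import string


def rotate(text, shift=13):
    """Caesar-shift ASCII letters by `shift` places; all other characters are kept."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    k = shift % 26
    table = str.maketrans(
        lower + upper,
        lower[k:] + lower[:k] + upper[k:] + upper[:k],
    )
    return text.translate(table)


print(rotate("token123"))          # letters change, the digits stay readable
print(rotate("abc", 1))            # a shift other than 13
print(rotate(rotate("abc", 1), -1))  # undo with the opposite shift
```

Any fixed rotation is still trivial to reverse, so this only guards against a casual glance, as the answer says.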


Can't seem to find an XPath with a value that contains double quotes

I've read through the forums, and apparently almost nobody has trouble finding XPaths for values that contain double quotes. Most posts I found deal with values that contain both single and double quotes, so I decided to ask this question. I apologize if it has already been answered elsewhere.
Anyway, the element I wanted to find goes more or less like this:
<a class="product" title="REALIZE "WHY NO.T"" width="454" height="423" alt="" id="">
</a>
I tried changing the XPath several times without success, using Selenium WebDriver:
'//a[@title="REALIZE "WHY NO.T""]'
'//a[@title="REALIZE \"WHY NO.T\""]'
"//a[@title="REALIZE \"WHY NO.T\""]"
These are a few of the ones I tried; there were more, but I didn't save them all.
I feel like I might be missing something terribly basic, but I've been looking for the answer for hours without success.
//*[@title='REALIZE "WHY NO.T"']
You have to wrap the attribute value in single quotes. In Python, with the single quotes escaped, the code is:
driver.find_element_by_xpath('//*[@title=\'REALIZE "WHY NO.T"\']')
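The quoting rule can be tried without a browser. The sketch below uses the standard library's xml.etree.ElementTree instead of Selenium, and it assumes a well-formed stand-in document (the DOC string is invented for illustration, with the attribute delimited by single quotes so the double quotes can be part of its value):

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page markup.
DOC = """<root>
  <a class="product" title='REALIZE "WHY NO.T"'>match</a>
  <a class="product" title="OTHER">no match</a>
</root>"""

root = ET.fromstring(DOC)

# Single quotes delimit the XPath string literal, so the double quotes
# inside the value need no escaping at the XPath level. The backslashes
# below are only Python string escapes.
xpath = ".//a[@title='REALIZE \"WHY NO.T\"']"
matches = root.findall(xpath)
print([a.text for a in matches])
```

ElementTree supports only a subset of XPath, but the string-literal quoting shown here is the same idea as in the Selenium call above.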
You say " the element I wanted to find goes more or less like this:"
"More or less"? What on earth does that mean? Is it more like this, or less like this? How can we help you if we don't know exactly what it's like?
And then you say:
<a class="product" title="REALIZE "WHY NO.T"" width="454" height="423" alt="" id="">
</a>
But that's not well-formed XML. How is the parser supposed to work out where the title attribute ends? The XML parser should throw it out at that point.
OK, you're probably using an HTML parser rather than an XML parser, and HTML parsers try to make sense of any old garbage you throw at them. But I've no idea what an HTML parser will do with this input. HTML parsers are smart, but they're not smart enough to work out which of these quotes are part of the attribute value and which of them mark its beginning and end. It's probably turned it into something quite different from what you were expecting, and that's why your XPath expression doesn't work.
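The claim about XML well-formedness can be checked with any XML parser. A minimal sketch using Python's standard xml.etree.ElementTree follows; the helper name is_well_formed is hypothetical, and the second string shows how such a value would normally be written, with the inner quotes as &quot; entities:

```python
import xml.etree.ElementTree as ET

# The markup exactly as quoted in the question.
MARKUP = ('<a class="product" title="REALIZE "WHY NO.T"" '
          'width="454" height="423" alt="" id=""></a>')

# The same attribute value with the inner double quotes escaped.
ESCAPED = '<a class="product" title="REALIZE &quot;WHY NO.T&quot;"></a>'


def is_well_formed(xml_text):
    """Return True if an XML parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(MARKUP))
print(is_well_formed(ESCAPED))
print(ET.fromstring(ESCAPED).get("title"))
```

This says nothing about how a browser's HTML parser recovers from the first string, which is the open question raised above.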
I would recommend
right-clicking the intended element > click inspect element > look over element in console > copy xpath > paste it and analyze how it outputs.
From there I would then compare it to your current solution and maybe tweak a thing or two.

want to extract review but getting some problem

The script I'm using to extract the reviews for one of the books is:
URL:
www.goodreads.com/book/show/2657.To_Kill_a_Mockingbird
from selenium import webdriver
import time

driver = webdriver.Chrome()
time.sleep(3)
driver.get('https://www.goodreads.com/book/show/2657.To_Kill_a_Mockingbird')
time.sleep(5)
reviews = driver.find_elements_by_css_selector("div.reviewText")
for r in reviews:
    spanText = r.find_element_by_css_selector("span.readable:nth-child(2)").text
    print("Span text:", spanText)
The problem is that I can't extract the whole text from div.reviewText > span. That span contains two nested spans: the first holds only a short excerpt (you have to click the ...more link to see the rest), and the second holds the full text. I want the text from the second span. Can someone help me, please?
HTML (or you can visit the site using the link given above):
<div class="reviewText stacked">
<span id="reviewTextContainer35272288" class="readable">
<span id="freeTextContainer13558188749606170457">If I could give this no stars, I would. This is possibly one of my least favorite books in the world, one that I would happily take off of shelves and stow in dark corners where no one would ever have to read it again.
<br>
<br>I think that To Kill A Mockingbird has such a prominent place in (American) culture because it is a naive, idealistic piece of writing in which naivete and idealism are ultimately rewarded. It's a saccharine, rose-tinted eulogy for the nineteen thirties from an orator who comes not
</span>
<span id="freeText13558188749606170457" style="display:none">If I could give this no stars, I would. This is possibly one of my least favorite books in the world, one that I would happily take off of shelves and stow in dark corners where no one would ever have to read it again.
<br>
<br>I think that To Kill A Mockingbird has such a prominent place in (American) culture because it is a naive, idealistic piece of writing in which naivete and idealism are ultimately rewarded. It's a saccharine, rose-tinted eulogy for the nineteen thirties from an orator who comes not to bury, but to praise. Written in the late fifties, TKAM is free of the social changes and conventions that people at the time were (and are, to some extent) still grating at. The primary dividing line in TKAM is not one of race, but is rather one of good people versus bad people -- something that, of course, Atticus and the children can discern effortlessly.
<br>
<br>The characters are one dimensional. Calpurnia is the Negro who knows her place and loves the children; Atticus is a good father, wise and patient; Tom Robinson is the innocent wronged; Boo is the kind eccentric; Jem is the little boy who grows up; Scout is the precocious, knowledgable child. They have no identity outside of these roles. The children have no guile, no shrewdness--there is none of the delightfully subversive slyness that real children have, the sneakiness that will ultimately allow them to grow up. Jem and Scout will be children forever, existing in a world of black and white in which lacking knowledge allows people to see the truth in all of its simple, nuanceless glory.
<br>
<br>I think that's why people find it soothing: TKAM privileges, celebrates, even, the child's point of view. Other YA classics--Huckleberry Finn; Catcher in the Rye; A Wrinkle in Time; The Day No Pigs Would Die; Are You There, God? It's Me, Margaret; Bridge to Terabithia--feature protagonists who are, if not actively fighting to become adults, at least fighting to find themselves as people. There is an active struggle throughout each of those books to make sense of the world, to define the world as something larger than oneself, as something that the protagonist can somehow be a part of. To Kill A Mockingbird has no struggle to become part of the world--in it, the children *are* the world, and everything else is just only relevant in as much as it affects them. There's no struggle to make sense of things, because to them, it already makes sense; there's no struggle to be a part of something, because they're already a part of everything. There's no sense of maturation--their world changes, but it leaves them, in many ways, unchanged, and because of that, it fails as a story for me. The whole point of a coming of age story--which is what TKAM is generally billed as--is that the characters come of age, or at least mature in some fashion, and it just doesn't happen.
<br>
<br>All thematic issues aside, I think that the writing is very, er, uneven, shall we say? Overwhelmingly episodic, not terribly consistent, and largely as dimensionless as the characters.
<br>
</span>
<a data-text-id="13558188749606170457" href="#" onclick="swapContent($(this));; return false;">...more</a>
</span>
</div>
Use get_attribute() to extract the hidden content; the sleep calls are unnecessary.
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.goodreads.com/book/show/2657.To_Kill_a_Mockingbird')
# Select the second (hidden) span inside each review container.
reviews = driver.find_elements_by_css_selector("span.readable span:nth-child(2)")
for r in reviews:
    # .text is empty for hidden elements; textContent is not.
    spanText = r.get_attribute('textContent')
    print("Span text:", spanText)
The second span is hidden, so you cannot get its content with the text property.
You need to try
spanText = r.find_elements_by_css_selector("span.readable > span")[-1].get_attribute('textContent')
to get the content of the hidden element.
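The point of both answers is that the full review is already present in the markup and is only hidden by style="display:none". The sketch below shows this offline with the standard library's html.parser rather than Selenium. The SAMPLE markup is a shortened, invented stand-in for the page, and the class HiddenSpanText is hypothetical:

```python
from html.parser import HTMLParser


class HiddenSpanText(HTMLParser):
    """Collect the text found inside <span> elements hidden with display:none."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while we are inside a hidden span
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        style = (dict(attrs).get("style") or "").replace(" ", "")
        # Count nested spans too, so the matching end tag is found correctly.
        if self.depth or "display:none" in style:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


# Shortened stand-in for one review block from the question.
SAMPLE = """<div class="reviewText stacked">
<span class="readable">
<span id="freeTextContainer1">Short preview</span>
<span id="freeText1" style="display:none">Short preview plus the full review text</span>
<a href="#">...more</a>
</span>
</div>"""


def hidden_text(html):
    parser = HiddenSpanText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


print(hidden_text(SAMPLE))
```

In Selenium the equivalent step is reading textContent, as shown in the answers, because the text property only returns what is visibly rendered.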

allow twig syntax in htmlpurifier and/or don't tidy

I am running some templates that my visitors write in Twig through HTML Purifier, but it keeps trying to fix the HTML code.
I have this as an example:
<ul>
{% for update in jobupdates %}
<li>
{{ update.comment|nl2br }}
</li>
{% endfor %}
</ul>
and it will turn that into:
<ul><li>
{% for update in jobupdates %}
</li><li>
{{ update.comment|nl2br }}
{% endfor %}
</li></ul>
which totally breaks it all.
I have tried setting the option 'HTML.TidyLevel' to none, but it still does it.
Is there a way to stop HTML Purifier from trying to fix the HTML code, or to make it ignore Twig syntax?
Background
Is HTML Purifier the right tool for the job you want to do?
Your problem essentially boils down to that HTML Purifier is designed to sanitise HTML, whereas you are feeding it Twig, a templating mark-up language. It contains some HTML, but that's not the same thing as being HTML (much like HTML can contain plain text, but is not the same thing as plain text).
Why is this happening?
The reason it's doing what you're observing is that much of HTML Purifier's strength in the sanitising department comes from being strict about the structure of the HTML that is fed to it. That way, exploits that depend on implementation details in browsers which lie outside the standard (such as, in this case, how to handle text in an unordered list (<ul>) that is not inside a list item (<li>)) are also taken care of, reducing the attack surface.
In this particular case, the chance that anything would break by allowing this constellation is so small as to be negligible, but one can imagine other constellations where it does matter (e.g. imagine someone writing <img>some payload here</img> - that makes no sense in HTML, and I know of no exploit in the wild right now that looks anything like this, but one could imagine a browser trying to get clever with it).
Either way, it's an integral part of HTML Purifier and you can't simply turn it off, as all of the sanitation rules HTML Purifier has essentially exist on top of having well-formed HTML, for the mentioned reason.
Solutions
A: Question the use-case
Depending on what your ultimate use case for sanitation is, the solution may be as simple as putting the purification step after your Twig template has been turned into HTML, but before the result is displayed on the page. This has the added benefit of purifying e.g. the comments that are injected into your template.
That said, this may have no relation to what you're actually hoping to achieve.
B: Use a different tool
If all you want to do is tidy the HTML in your templates rather than sanitise it, you may want to look into a different tool. I have no experience with tools that just tidy HTML and they may have the same shortfalls (even just wanting to produce valid HTML is going to have that effect - but perhaps there are tools out there which only indent the tags and fix up obvious tag errors like removing a stray </img> somewhere).
If you want to sanitise your HTML, you can try a different tool as well. Take a look at http://htmlpurifier.org/comparison for some ideas.
C: Alter HTML Purifier's HTML definition
You can fork HTML Purifier and make changes to its understanding of HTML. This is really only feasible if the example in your post does not have many cousins, i.e. if there are not many completely different constellations where the insistence on well-formed HTML gets in the way. In the example you mentioned, this likely requires digging into the guts of HTMLPurifier_HTMLModule_List and HTMLPurifier_ChildDef_List, specifically into the else-block in validateChildren() from the latter class, but I have no proof-of-concept on hand right now.
Keep in mind what you'd be doing here is essentially turning the HTML definition that HTML Purifier works with into a rudimentary Twig definition. Not only is that potentially a lot of work (depending on how much you want to teach it), it's probably not actually what you want to do.
Conclusion
I'd recommend asking yourself a few questions and taking action based on the answers (the information in brackets exists as a guide to those actions, the thoughts there are not exhaustive):
Is it essential for you to have clean templates or clean output? (If templates, HTML Purifier can't help you, as it's not made for Twig; if output, HTML Purifier can help you.)
Do you want to prevent XSS attacks? (If you do, HTML Purifier can help you, but only after Twig has done its thing and constructed HTML for it to analyse.)
Do you want to catch invalid HTML declarations? (If you do, again HTML Purifier can help you, but also only after Twig has done its thing.)
Do you want to catch invalid Twig declarations? (If you do, HTML Purifier cannot help you - it might make sense to look for a Twig-specific validation tool.)
There are other questions you can ask yourself, but I hope those provide a useful starting point.

What text processing tool is recommended for parsing screenplays?

I have some plain-text kinda-structured screenplays, formatted like the example at the end of this post. I would like to parse each into some format where:
It will be easy to pull up just stage directions that deal with a specific place.
It will be easy to pull up just dialogue belonging to a particular character.
The most obvious approach I can think of is using sed or perl or php to put div tags around each block, with classes representing character, location, and whether it's stage directions or dialogue. Then, open it up as a web-page and use jQuery to pull out whatever I'm interested in. But this sounds like a roundabout way to do it and maybe it only seems like a good idea because these are tools I'm accustomed to. But I'm sure this is a recurring problem that's been solved before, so can anybody recommend a more efficient workflow that can be used on a Linux box? Thanks.
Here is some sample input:
SOMEWHERE CORPORATION - OPTIONAL COMMENT
A guy named BOB is sitting at his computer.
BOB
Mmmm. Stackoverflow. I like.
Footsteps are heard approaching.
ALICE
Where's that report you said you'd have for me?
Closeup of clock ticking.
BOB (looking up)
Huh? What?
ALICE
Some more dialogue.
Some more stage directions.
Here is what sample output might look like:
<div class='scene somewhere_corporation'>
<div class='comment'>OPTIONAL COMMENT</div>
<div class='direction'>A guy named BOB is sitting at his computer.</div>
<div class='dialogue bob'>Mmmm. Stackoverflow. I like.</div>
<div class='direction'>Footsteps are heard approaching.</div>
<div class='dialogue alice'>Where's that report you said you'd have for me?</div>
<div class='direction'>Closeup of clock ticking.</div>
<div class='comment bob'>looking up</div>
<div class='dialogue bob'>Huh? What?</div>
<div class='dialogue alice'>Some more dialogue.</div>
<div class='direction'>Some more stage directions.</div>
</div>
I'm using DOM as an example, but again, only because that's something I understand. I'm open to whatever is considered a best practice for this type of text-processing task if, as I suspect, roll-your-own regexps and jQuery is not the best practice. Thanks.
You could use Celtx to import plain text scripts and export them to HTML (and RDF/XML for the metadata) (see this related thread and this blog post, which describes the file structure).
Other screenplay editors like Trelby might offer this feature, too.
There is also Fountain, a plain text markup language for screenwriting. They offer libraries which you might (I did not check if they offer something for importing and converting) use for your cause:
Fountain is free and open-source, with libraries that make it easy to add support in your apps.
Even if those projects can’t be used for your cause, you could at least reuse their format for your output.
If your input is not too noisy, i.e. if you can trust some regularities such as the indentation being larger for dialogue than for comments, I would use a simple context-free grammar. There are good implementations in all languages and you'll find a lot of information on SO.
If your input varies a lot, then take the machine learning route, but you'll need to have a big number of inputs with human-validated output for training, which might be a hassle.
In any case, I would never, ever use regular expressions for problems like that.
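For input as regular as the sample in the question, a small line-by-line state machine (no regular expressions) is one way to get at the two queries the asker wants. This is only a sketch under stated assumptions: the first line is the scene heading, an all-caps line (optionally followed by a parenthesised note) is a character cue, a cue is followed by exactly one line of dialogue, and everything else is a stage direction. The names SAMPLE, is_cue, parse and dialogue_of are hypothetical:

```python
SAMPLE = """SOMEWHERE CORPORATION - OPTIONAL COMMENT
A guy named BOB is sitting at his computer.
BOB
Mmmm. Stackoverflow. I like.
Footsteps are heard approaching.
ALICE
Where's that report you said you'd have for me?
Closeup of clock ticking.
BOB (looking up)
Huh? What?
ALICE
Some more dialogue.
Some more stage directions."""


def is_cue(line):
    """Assumed rule: an all-caps name, optionally followed by '(note)'."""
    name = line.split("(", 1)[0].strip()
    return bool(name) and name == name.upper() and any(c.isalpha() for c in name)


def parse(text):
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    place, _, comment = lines[0].partition(" - ")
    records = [{"type": "scene", "place": place, "comment": comment}]
    speaker = None  # set while the next line is expected to be dialogue
    for line in lines[1:]:
        if is_cue(line):
            name, _, note = line.partition("(")
            speaker = name.strip()
            if note:
                records.append({"type": "comment", "who": speaker,
                                "text": note.rstrip(")").strip()})
        elif speaker is not None:
            records.append({"type": "dialogue", "who": speaker, "text": line})
            speaker = None
        else:
            records.append({"type": "direction", "text": line})
    return records


def dialogue_of(records, who):
    return [r["text"] for r in records
            if r["type"] == "dialogue" and r["who"] == who]


records = parse(SAMPLE)
print(dialogue_of(records, "BOB"))
print([r["text"] for r in records if r["type"] == "direction"])
```

The rules break down as soon as a line of dialogue is written entirely in capitals or spans several lines, which is where a real grammar or an existing format such as Fountain becomes the better choice.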

How to embed blocks of data in ExpressionEngine without using many channels

I have used Drupal and think I'm doing it wrong with EE. I want to create many blocks of embedded, user-created entries in some of the templates, but I don't want to have to create a channel for each one. In Drupal I could create a block specific to the client's needs, but I'm stumped on how to do this in EE.
For example, I have three different content areas on the home page, top/middle and bottom. Client doesn't want to roll out blog entries, they want specific content put in each one. The only way I see is I'd need to create three different channels and embed as such for top, changing channel to middle and bottom for each block. Is there a better way?
{exp:channel:entries channel="top" disable="categories|member_data|pagination" limit="1"
sort="desc" dynamic="no" }
Would I use category group and categories to do this? Meaning, I would create top, middle and bottom categories to call out those entries in my "home" channel?
For less than 1 hour of billable work, you'll get hundreds if not thousands of hours of effort packaged up for you to run with. Someone always pays for code, why not you this time? :)
The solution you have found does work - but I've found that ultimately it does not offer the flexibility needed by many clients.
I've used the following solution for many sites and clients have been pleased with it.
1) Define your block data as channels. For example, I often have Sidebar Ad, Sidebar Scripts, and Sidebar Text channels.
2) Use a Playa field type (or another relational field type) to create relationships from a parent entry (a page) to these sub-content types.
This normally looks something like this on the backend:
3) You can now use the parent entry to display the sub content. You'll of course need to pull all this data into your templates with something like the following:
<div id="right-side">
  {exp:playa:children}
    {if channel_short_name == 'sidebar_javascript'}
      {cf_sidebar_js}
    {/if}
    {if channel_short_name == 'sidebar_videos'}
      {exp:channel_videos:videos entry_id="{entry_id}" embed_width="300" embed_height="238"}
        <h4>{title}</h4>
        {video:embed_code}
        <p class="caption">{video:title}</p>
      {/exp:channel_videos:videos}
    {/if}
    {if channel_short_name == 'sidebar_ads'}
      {exp:adman:show group="{cf_sidebar_adman_block}" order="RANDOM" limit="{cf_sidebar_adman_block_number_of}"}
        <a href="{ad_url}" target="_blank">
          <img src="{ad_image}" alt="{ad_alt}" />
        </a>
      {/exp:adman:show}
    {/if}
  {/exp:playa:children}
</div>
We generally make a channel called something like "general content" with a single field that can have any kind of native formatting (none or xhtml would mostly be used) and then use it for one-off bits that don't fit into other channels. It's hard for the client to find these entries in the CP for editing, so we make front-end "edit" links that open the correct entry in the CP and are visible only to member groups with content editing permissions.
This will only get hairy if you really need multiple customized fields for this use.
I have never used Low Variables, but I am under the impression that it could be useful here.
While I agree with the posters talking about the value of add-ons, this is a particular need that I have never had any problem solving natively. Besides the issue of the cost of add-ons (which IMO is worthwhile) you also add complexity to your installation the more software you add to it, making it more time consuming to troubleshoot bugs and to upgrade EE.
