Prevent asciidoctor from doing ellipsis replacement in inline literal - asciidoctor

I'm trying to render the following chunk with asciidoctor (PDF, EPUB and MOBI output formats):
* `versionMin...versionMax`
* `versionMin...<versionMax`
I want this to appear as a bullet list where the text is monospaced. The problem I have at the moment is that I can't seem to prevent asciidoctor from replacing the ... with ellipses (they end up vertically centred). I've tried various incantations of pass[], +...+, etc. but can't seem to find a combination that works. Anyone able to show how to prevent the ellipsis replacement without interfering with any of the other processing?

Here are three solutions that worked for me (for your first bullet):
* `versionMin\...versionMax`
* `versionMin&period;..versionMax`
* `versionMin&#46;..versionMax`
My first solution is discussed in https://asciidoctor.org/docs/user-manual/#preventing-substitutions.
The second solution uses a named entity and the third uses a numeric entity. Info about HTML character entities is at https://wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Character_entity_references_in_HTML.


NLP Challenge: Automatically removing bibliography/references?

I recently came across the following problem: when applying a topic model to a set of parsed PDF files, I discovered that the content of the reference sections is also fed into the model, i.e. words from the references appear in the tokenized list of words.
Is there any known "best practice" for solving this problem?
I thought about a search strategy where the Python code automatically removes all content after the last mention of "references" or "bibliography". If I went by the first, or a random, mention of "references" or "bibliography" within the full text, the parser might cut off part of the actual content.
The input PDFs all come from different journals and thus have different page structures.
The syntax is what makes a bibliography entry distinct from a regular sentence.
Test for the pattern that matches the reference style (or styles) you are trying to remove.
For example: date, unquoted string, string, page numbers in a certain format.
I'd spend some time searching for a tool that already recognizes bibliographies before doing this, as the pattern will be unique to each style (MLA etc.).
A couple of additional features to consider for detecting the start of the reference section:
* Check whether the mention of "references" or "bibliography" is in the last pages as opposed to earlier pages.
* Run entity recognition on some length of words (~50?) after the word. If a high number of those ~50 tokens are entities, that indicates journal names, author names, etc.
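As a rough illustration of the "last mention, and only near the end" heuristic discussed above, here is a minimal Python sketch. The function name `strip_references`, the heading pattern, and the `min_position` threshold of 0.5 are all assumptions made for this example, not an established best practice; real journals will need more heading variants.

```python
import re

# Heading on its own line, optionally numbered, e.g. "References" or "7. Bibliography".
# The pattern is an assumption for illustration; extend it for other heading styles.
HEADING = re.compile(
    r"^\s*(?:\d+\.?\s*)?(?:references|bibliography)\s*:?\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_references(text: str, min_position: float = 0.5) -> str:
    """Drop everything from the last reference heading onward.

    min_position is an assumed threshold: the heading is only trusted if it
    starts at or after that fraction of the full text.
    """
    matches = list(HEADING.finditer(text))
    if not matches:
        return text
    last = matches[-1]
    if last.start() < min_position * len(text):
        return text  # too early to be the bibliography; leave the text alone
    return text[: last.start()].rstrip()
```

Because only a heading standing alone on a line counts, an in-sentence mention such as "see references in section 2" is ignored. The entity-recognition check could be added as a second confirmation step before cutting.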

How to refer to an equation in a different page with Mathjax

I have several HTML pages with a lot of equations rendered by MathJax. I would like to refer to several equations that are not necessarily on the same page.
The problem is that "\eqref" only works if the equation is on the same page.
One solution would be to use "\tag{}" for all equations and use simple HTML links.
For example,
\label{myequation} \tag{4.1}
and the link would be
" 4.1 "
The problem with that solution is that I need to write a manual tag for every equation.
Does anyone have a solution for referring to equations on different pages while producing the tags automatically?
See the documentation for automatic equation numbering. This allows you to have your equations numbered automatically rather than requiring \tag. Note that any numbered or tagged equation automatically gets an ID that can be used in a link, so there is no requirement to use \label (though it helps if you are adding and removing equations during editing). If an equation is numbered 5, then the id is mjx-eqn-5. There are functions that you can override in order to change the format of the equation numbers. See the equationNumbers section of the TeX configuration parameters documentation.
Edit: Now that I understand your request, I can tell you that to do what you ask for would require MathJax to know the label-to-tag mapping for the other pages in your site (not just the page currently being viewed). MathJax doesn't have that data available to it. The only way to do it would be to create data files for the other pages that includes that data so that MathJax could load it and have it available. While technically possible, it would be a bit cumbersome to do that, especially since JavaScript (normally) can't write files for you.
On the other hand, the only place the label-to-tag mapping is needed is for the text that acts as the link itself (that is, if eq:sum refers to equation 4.1, then you would want "4.1" to be the text that links to that equation). Note that the link itself will be to myotherpage#eq:sum, so the only thing that needs the number is the link text. If you were willing to use something like "[sum]" as the link text for eq:sum (e.g., "In equation [sum] we see..."), then you would not need the label-to-tag mapping, and \eqref could be modified to handle that. But if you really want the equation number as the link, you would need a lot more infrastructure to make that happen.

How to keep headline and text in the same column using ColumnText?

I'm using iText to create a PDF file. While testing, a situation occurred where a headline was printed at the bottom of a column but the text belonging to it ended up in the next column.
The easy solution, paragraph.setKeepTogether( true );, results in too much white space (see additional info below). Here is an image showing the current situation on the left and what I'd like to get on the right:
current and wanted situation (reputation...)
One of my biggest problems is, columnText.getYLine() returns the lowest position in the "origin" column even if the text flows over to the next.
I looked through the examples on the iText site and all results on stackoverflow for "itext column" plus some blogs but did not find any solution to my problem, other than printing each article in a new column. A concise description of the problem in a few words would help me search myself as English isn't my first language.
additional info
This is part of the report generation in a telemedicine project. A page template is filled by a web front end. There are several post processing steps e.g. attaching images of ECG output. These need a high pixel density to be readable, which is why I use iText (afaik pdfbox scales without taking the density into consideration). Some time ago the physicians noticed they had to insert more text than there's space on one page. So they wanted the overflow in an appendix. I could use another lib. The importance is on high readability on paper and a licence like (l)gpl/apache/... The white space results in more pages hence lessens the overview and wastes paper.
The setKeepTogether() method isn't supposed to work in combination with ColumnText. As documented in my book (the one shown in my avatar), the ColumnText class can be used in simulation mode to fine-tune the layout.
The idea is that you define the location of your content in a trial-and-error process. First you add content and you invoke go(true); This will consume the content in your column, but will not add any content to your document. You can use this to discover how many lines have been written (getLinesWritten()), to check if all the content could be rendered (hasMoreText()), and so on.
If you find that all the text fits in the available space, add the content to the column once more, reset the Y-position to its original value, and add the column for real with go().
If there's content left in the column, you have to make a decision. Either create a new column, add the content anew, change the position of the column and invoke go(); or, if the content can be broken into two parts, change the Y-position and invoke go(), then change the position of the column and invoke go() once more to render the remainder of the content in a different column.
You'll find different ColumnText examples in chapter 3.

Any way in Expression Engine to simulate Wordpress' shortcode functionality?

I'm relatively new to Expression Engine, and as I'm learning it I am seeing some stuff missing that WordPress has had for a while. A big one for me is shortcodes, since I will use these to allow CMS users to place more complex content in place with their other content.
I'm not seeing any real equivalent to this in EE, apart from a forthcoming plugin that's in private beta.
As an initial test I'm attempting to fake shortcodes by using delimited strings (e.g. #foo#) in the content field, then using a regex to pull those out and pass them to a function that can retrieve the content out of EE's database.
This brings me to a second question, which is that in looking at EE's API docs, there doesn't appear to be a simple means of retrieving the channel entries programmatically (thinking of something akin to WP's built-in get_posts function).
So my questions are:
a) Can this be done?
b) If so, is my method of approaching it reasonable? Or is there something stupidly obvious I'm missing in my approach?
To reiterate, my main objective here is to have some means of allowing people managing content to drop a code in place in their content that will be replaced with channel content.
Thanks for any advice or help you can give me.
Here's a simple example of the functionality you're looking for.
1) Start by installing Low Replace.
2) Create two Global Variables called gv_hello and gv_goodbye with the values "Hello" and "Goodbye" respectively.
3) Put this text into the body of an entry:
[say_hello]
Nice to see you.
[say_goodbye]
4) Put this into your template, wrapping the Low Replace tag around your body field.
{exp:low_replace
find="[say_hello]|[say_goodbye]"
replace="{gv_hello}|{gv_goodbye}"
multiple="yes"
}
{body}
{/exp:low_replace}
5) It should output this into your browser:
Hello
Nice to see you.
Goodbye
Obviously, this is a really simple example. You can put full blown HTML into your global variable. For example, we've used that to render a complex, interactive graphic that isn't editable but can be easily dropped into a page by any editor.
Unfortunately, due to parse order issues, EE tags won't work inside Global Variables. If you need EE tags in your short code output, you'll need to use Low Variables addon instead of Global Variables.
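For readers who want to see the find-and-replace idea outside of EE, here is a minimal Python sketch of the same mechanism. It is not Low Replace and not EE template code; the names `SHORTCODE`, `expand_shortcodes`, and `snippets` are made up for illustration, and the dictionary stands in for the Global Variables.

```python
import re

# Matches bracketed shortcodes such as [say_hello]; the name is captured.
SHORTCODE = re.compile(r"\[([a-z_][a-z0-9_]*)\]")


def expand_shortcodes(body: str, snippets: dict[str, str]) -> str:
    """Replace each known [name] with its snippet; leave unknown ones as-is."""

    def substitute(match: re.Match) -> str:
        return snippets.get(match.group(1), match.group(0))

    return SHORTCODE.sub(substitute, body)


# Hypothetical stand-ins for gv_hello and gv_goodbye.
snippets = {"say_hello": "Hello", "say_goodbye": "Goodbye"}
body = "[say_hello]\nNice to see you.\n[say_goodbye]"
print(expand_shortcodes(body, snippets))
```

Unknown codes are left untouched on purpose, so a typo by an editor stays visible on the page instead of silently disappearing.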
Continued from the comment:
Do you have examples of the kind of shortcodes you want to support/include? I have doubts about whether controlling the page layout from a text field or WYSIWYG field is the way to go.
If you want editors to be able to adjust layout or show/hide extra parts on the page, giving them access to some extra fields in the channel is (imo) much more manageable and future-proof. For instance, some select fields, a relationship (or Playa) field, or a matrix, to let them choose which parts to include/exclude on a page, or which entry from another channel to pull content from.
As said in the comment: I totally understand if you want to replace some #foo# tags with images or data from another field (see other answers: nsm-transplant, low_replace). But giving an editor access to shortcodes and picking them out is like writing a template engine to generate EE template code for the EE template engine.
Using some custom fields to let editors pick and choose parts to embed is, I think, much more manageable.
That being said, you could make a plugin to parse the shortcodes from a textarea's content, and then write a lot of code to fetch data from the other modules you want to support. For channel entries you could build on the Channel Data library by objectiveHTML: https://github.com/objectivehtml/Channel-Data
I hear you, I too miss shortcodes from WP -- though the reason they work so easily there is the ubiquity of the_content(). With the great flexibility of EE comes fewer blanket solutions.
I'd suggest looking at NSM Transplant. It should fit the bill for you.
There is also a plugin called Shortcode, which you can find on Devot-ee.
A quote from the page:
Shortcode aims to allow for more dynamic use of content by authors and editors, allowing for injection of reusable bits of content or even whole pieces of functionality into any field in EE

DataView component in SPD attaches needless strings to field values

In SharePoint Designer I use some lists as sources and then link them together with an operation GetListItems (I fetch items from multiple lists on different site collections for rollup/aggregation):
alt text http://img151.imageshack.us/img151/1807/ss20090428101310.png
Now something is fine as I managed to get the result: alt text http://img410.imageshack.us/img410/4835/ss20090428101013.png
But the strings that are attached to the field results (6;#, 2;#) are... disturbing.
How can I get rid of those attached strings? They are not attached to all fields, but to some (important ones):
alt text http://img168.imageshack.us/img168/1647/ss20090428100732.png
Ahh, well, that's usually how it goes: you keep searching for an answer, then ask for help and find it yourself.
I used the XSL substring function to strip away those first characters. Messy if I want to add links to that table, but it works.
alt text http://img2.imageshack.us/img2/3117/ss20090428102714.png
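The answer above strips the prefix with the XSL substring function. The same idea, written in Python rather than XSL purely as an illustration, could look like the sketch below. It assumes the unwanted part is always digits followed by ";#" at the start of the value, as in the "6;#" and "2;#" examples; the function name and the sample value "Marketing" are hypothetical.

```python
import re

# Assumed shape of the unwanted prefix: digits followed by ";#", e.g. "6;#".
LOOKUP_PREFIX = re.compile(r"^\d+;#")


def strip_lookup_prefix(value: str) -> str:
    """Remove a leading "<digits>;#" marker if present; otherwise return value unchanged."""
    return LOOKUP_PREFIX.sub("", value, count=1)


print(strip_lookup_prefix("6;#Marketing"))  # hypothetical field value
```

Matching on the pattern instead of cutting a fixed number of characters keeps values with multi-digit prefixes (such as "12;#") intact.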
By the way, the main question, how to roll up content from multiple site collections, has been quite a journey for me for several days already. If anyone is in the same situation, I recommend these (because that's where I found my answer):
* How-To Rollup two lists in two site collections on a page
* Or a better way to use for a single site collection: SharePoint Customisation Tricks: Use The SPDataSource, Luke! (Good links inside that article).
* Something I didn't touch, because I didn't need such an advanced method, but maybe someone does: Populating data sources in code
