Markdown to HTML conversion - wmd

I'm still in the middle of coding my final year project at university, and I have come across an issue where I need to convert either from HTML to Markdown or vice versa. I have no experience whatsoever of Perl, Python, etc., so I need an easy-to-implement solution; I only have about 6 weeks left to complete this. I'm writing the data from a WMD text box to SQL Server, and I can upload it as either Markdown or HTML. But if that data needs editing, it cannot be in HTML, as this would be too confusing for the end user, who is assumed to have little or no computing "know how".
What should I do?

Karmastan's answer is probably the best here. Keeping the raw Markdown in the database is a really good solution as it allows users to maintain the content in a form with which they're familiar.
However, if you have a bunch of HTML which is already converted, you might want to look at something like Markdownify: The HTML to Markdown converter for PHP.
Edit: based on what you've said below, there are a few things you should keep in mind:
Make sure that the following is set in wmd.js:
wmd_options = {"output": "Markdown"};
This ensures that you're storing Markdown in the database.
Source: How do you store the markdown using WMD in ASP.NET?
When outputting the Markdown to the web, you need to transform it to HTML. To do this, you'll need a library which does Markdown -> HTML conversion. Here are two examples:
Announcing Markdown.NET
Revised Markdown.NET Library
I'm not a .NET developer, so I can't really help with how these libraries should be used, but hopefully the documentation will make that clear.

If you look at the web site for Markdown, you'll find a Perl script that converts Markdown-syntax documents to HTML. Keep Markdown text in your database and invoke the script whenever you need to display the text. No Perl knowledge required!
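The "store Markdown, convert on display" approach from the answers above can be sketched as follows. This is only an illustration in Python with an in-memory SQLite table, not the asker's ASP.NET / SQL Server setup. The table name `posts`, the column `body_markdown`, and the function `toy_markdown_to_html` are all made up for the example, and the converter is a toy that handles only paragraphs, `**bold**` and `*italic*`. In a real project that function would be replaced by a proper Markdown library such as the ones linked above.

```python
import html
import re
import sqlite3


def toy_markdown_to_html(text: str) -> str:
    """Toy stand-in for a real Markdown library (assumption for this sketch).

    Handles only blank-line-separated paragraphs, **bold** and *italic*.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Escape raw HTML first so user input cannot inject tags.
        rendered = html.escape(block, quote=False)
        rendered = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", rendered)
        rendered = re.sub(r"\*(.+?)\*", r"<em>\1</em>", rendered)
        paragraphs.append(f"<p>{rendered}</p>")
    return "\n".join(paragraphs)


# Hypothetical schema: the database only ever holds the raw Markdown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body_markdown TEXT)")
original = "Hello **world**.\n\nEdit *me* later."
conn.execute("INSERT INTO posts (body_markdown) VALUES (?)", (original,))


def load_markdown(post_id: int) -> str:
    """What the editing form shows: the Markdown exactly as the user typed it."""
    row = conn.execute(
        "SELECT body_markdown FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return row[0]


def render_post(post_id: int) -> str:
    """What the public page shows: HTML generated at display time."""
    return toy_markdown_to_html(load_markdown(post_id))
```

The point of the split is that `load_markdown` feeds the edit box and `render_post` feeds the page, so the user never has to look at HTML.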

Related

How to make advanced CMS-like blog system from scratch?

When you create a news or blog section with a CMS, it's really easy to make a feed of posts with content previews. Also, when you follow a link to a particular post, you can see that it consists of various HTML tags and CSS styling and not just plain text, because it was written in a rich text editor. So just getting text from the db is not enough.
My question is how to achieve the same result when making a website from scratch. It doesn't matter what language is used for the back end; I'm just interested in the idea of how to do it. But if you could provide code examples (in any language), it would be greatly appreciated.
Ok I've figured it out. Posting the answer for anybody who has a similar question in the future.
The idea is that you store the text with its HTML tags in your database, and when you retrieve it you put it into your desired div in an unescaped state. The catch is that almost all view (template) engines escape HTML tags by default, so to output the text unescaped you have to use a built-in function specific to that view engine.
To get the article with HTML tags into the db, you can just write raw HTML into the input field, or you can attach a rich text editor to the input field. The rich text editor will generate the HTML for you.
I've researched it and found out that that's exactly how CMSs work.
So there you have it. If you want to add something feel free to do it
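To make the escaping point concrete, here is a small Python sketch. It does not use a real view engine: `string.Template` does no escaping of its own, so the default behaviour of an auto-escaping engine is imitated with `html.escape`. The variable names and the `post` CSS class are made up for the example.

```python
import html
from string import Template

# What the rich text editor saved to the database (example data).
stored_body = "<p>Hello <strong>world</strong></p>"

page = Template('<div class="post">$body</div>')

# Imitates an auto-escaping engine: the tags reach the browser as literal text.
escaped_page = page.substitute(body=html.escape(stored_body))

# Imitates the engine's "raw" / "unescaped" helper: the tags are rendered.
raw_page = page.substitute(body=stored_body)

print(escaped_page)
print(raw_page)
```

Unescaped output is only safe when the stored HTML comes from a trusted author or has been sanitized, since anything in it is sent to the browser as markup.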

Google Docs: Table of content page numbers

We are currently building an application on the Google Cloud Platform which generates reports as Google Docs. For them, it is really important to have a table of contents ... with page numbers. I know this has been a feature request for a few years, and there are add-ons (Paragraph Styles +, which didn't work for us) that provide this, but we are considering building it ourselves. If anybody has a suggestion on how we could start with this, it would be a great help!
thanks,
Best bet is to file a feature request on the product forums.
Currently the only way to do that level of manipulation of a doc to provide a custom TOC is to use Apps Script. It provides enough access to the document structure to build and insert a basic table of contents, but I'm not sure there's enough to do paging correctly (unless you force a page break on every page...). There's no method to answer the question of "what page is this element on?"
Hacks like writing to a DOCX and converting don't work because TOCs are recognized for what they are and show up without page numbers.
Of course you could write a DOCX or PDF with the TOC as you'd like and upload as a blob rather than as a Google Doc. They can still be viewed in Drive and such.

Any way in Expression Engine to simulate Wordpress' shortcode functionality?

I'm relatively new to Expression Engine, and as I'm learning it I am seeing some stuff missing that WordPress has had for a while. A big one for me is shortcodes, since I will use these to allow CMS users to place more complex content in place with their other content.
I'm not seeing any real equivalent to this in EE, apart from a forthcoming plugin that's in private beta.
As an initial test I'm attempting to fake shortcodes by using delimited strings (e.g. #foo#) in the content field, then using a regex to pull those out and pass them to a function that can retrieve the content out of EE's database.
This brings me to a second question, which is that in looking at EE's API docs, there doesn't appear to be a simple means of retrieving the channel entries programmatically (thinking of something akin to WP's built-in get_posts function).
So my questions are:
a) Can this be done?
b) If so, is my method of approaching it reasonable? Or is there something stupidly obvious I'm missing in my approach?
To reiterate, my main objective here is to have some means of allowing people managing content to drop a code in place in their content that will be replaced with channel content.
Thanks for any advice or help you can give me.
Here's a simple example of the functionality you're looking for.
1) Start by installing Low Replace.
2) Create two Global Variables called gv_hello and gv_goodbye with the values "Hello" and "Goodbye" respectively.
3) Put this text into the body of an entry:
[say_hello]
Nice to see you.
[say_goodbye]
4) Put this into your template, wrapping the Low Replace tag around your body field.
{exp:low_replace
find="[say_hello]|[say_goodbye]"
replace="{gv_hello}|{gv_goodbye}"
multiple="yes"
}
{body}
{/exp:low_replace}
5) It should output this into your browser:
Hello
Nice to see you.
Goodbye
Obviously, this is a really simple example. You can put full blown HTML into your global variable. For example, we've used that to render a complex, interactive graphic that isn't editable but can be easily dropped into a page by any editor.
Unfortunately, due to parse order issues, EE tags won't work inside Global Variables. If you need EE tags in your short code output, you'll need to use Low Variables addon instead of Global Variables.
Continued from the comment:
Do you have examples of the kind of shortcodes you want to support/include? I have doubts about whether controlling the page layout from a text field or WYSIWYG field is the way to go.
If you want editors to be able to adjust layout or show/hide extra parts on the page, giving them access to some extra fields in the channel is (imo) much more manageable and future-proof. For instance, some select fields, a relationship (or Playa) field, or a Matrix, to let them choose which parts to include/exclude on a page, or which entry from another channel to pull content from.
As said in the comment: I totally understand if you want to replace some #foo# tags with images or data from another field (see other answers: nsm-transplant, low_replace). But giving an editor access to shortcodes and picking them out is like writing a template engine to generate EE template code for the EE template engine.
Using some custom fields to let editors pick and choose parts to embed is, I think, much more manageable.
That being said, you could make a plugin to parse the shortcodes from a textarea's content, and then write a fair amount of code to fetch data from the other modules you want to support. For channel entries you could build on the Channel Data library by objectiveHTML: https://github.com/objectivehtml/Channel-Data
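The parsing half of such a plugin, using the #foo# delimiters from the question, might look roughly like this. It is a language-neutral sketch written in Python, not EE plugin code. The `SNIPPETS` dictionary is a hypothetical stand-in for whatever lookup would actually fetch channel content, and the token pattern (lowercase letters, digits, underscores) is an assumption.

```python
import re

# Hypothetical stand-in for a database / channel lookup.
SNIPPETS = {
    "promo": "<aside>Spring sale</aside>",
    "contact": '<a href="/contact">Contact us</a>',
}

# Assumed token syntax: #name#, where name is lowercase letters, digits or "_".
SHORTCODE = re.compile(r"#([a-z0-9_]+)#")


def expand_shortcodes(content, fetch=SNIPPETS.get):
    """Replace each #name# token with fetch(name); leave unknown tokens as-is."""

    def substitute(match):
        replacement = fetch(match.group(1))
        return match.group(0) if replacement is None else replacement

    return SHORTCODE.sub(substitute, content)


print(expand_shortcodes("Intro #promo# and #missing#."))
```

Leaving unknown tokens untouched means a typo in a shortcode stays visible to the editor instead of silently vanishing from the page.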
I hear you, I too miss shortcodes from WP -- though the reason they work so easily there is the ubiquity of the_content(). With the great flexibility of EE comes fewer blanket solutions.
I'd suggest looking at NSM Transplant. It should fit the bill for you.
There is also a plugin called Shortcode, which you can find here at
Devot-ee
A quote from the page:
Shortcode aims to allow for more dynamic use of content by authors and
editors, allowing for injection of reusable bits of content or even
whole pieces of functionality into any field in EE

How to parse html in a client-side script?

What's the best way to create scripts for a browser?
I need to parse some html pages on different domains
I am on windows and use firefox most of all.
If it's just about retrieving the pages to do whatever you want with it, the built-in urllib module in python will do that for you.
It sounds like you want to retrieve webpages and parse them to extract meaningful data? I would suggest something like TagSoup (for Java), which fires off nice SAX events that you can use directly or feed to an XML module of your choice (raw DOM, JDOM, dom4j, XOM, etc...). The TagSoup page also lists a number of references for other languages, such as Beautiful Soup for Python, Rubyful Soup for Ruby and others.
From there, I would suggest using something like XPath to retrieve the bits of data that you want. Another option would be XSLT to transform the HTML into some unified format that you can more easily manipulate.
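As a rough illustration of the "retrieve, then parse, then pull out the bits you want" flow, here is a Python sketch that uses only the standard library. It swaps in `html.parser.HTMLParser` in place of the TagSoup / XPath tools named above, and it parses a hard-coded example page rather than fetching one. The class name `LinkCollector` is made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, tolerating sloppy real-world HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Example input; in practice this could come from
# urllib.request.urlopen(url).read().decode(...).
page = (
    '<html><body><p>See <a href="https://example.com/a">A</a> '
    'and <A HREF="/b">B</a><br><a name="x">no link</a></body></html>'
)

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)
```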
I'd recommend Synthetics Web. Here is a working example at jsFiddle.
jsFiddle
http://jsfiddle.net/dwayne05/YkLVw/
Synthetics Web
http://www.syntheticsweb.com/

Creating PDF Invoices - Are there any templating solutions? [closed]

Our company is looking to integrate invoices into a new system we are developing.
We require a solution to create a layout of the invoice and then convert to pdf.
We have considered just laying out the invoice in html/css then converting to pdf.
We have also considered using SVG->PDF conversion.
Both of these solutions integrate well into our existing templating language used for our web application.
Historically we have been a Microsoft based business and used Crystal Reports for such a task but we are looking for an open source Linux solution for this project.
Does anyone have any suggestions for an approach or technology we could use for such a task?
Try this... create a blank invoice with Word (or whatever you want) and save it as a PDF.
Then use a PDF library to modify the PDF (insert the text at particular coordinates). We do this in the Microsoft world and it is extremely easy.
The biggest benefit is that we can use our own tools to create and modify the template. If we want to add some static text, we just crank open Word, make the change and save it to a PDF file (that is being used as a template).
For Microsoft, we use iTextSharp which is actually a C# port of the original Java version of iText
Additionally...
You can use Adobe Acrobat to insert fields in the PDF (address, phone, invoice number, line item 1, line item 2, etc...) and then use iText/iTextSharp to populate these fields at run time.
This is, in more detail, what we do... and it is extremely easy.
The normal way is to install (La)TeX (probably already on the linux box) and run pdflatex to get the pdfs. You can also use Apache FOP, if you prefer xslt and xsl-fo.
If the number of invoices to create is low you might want to use open-office (directly or as a toolkit).
If you want high-precision positioning and low-level access, a low-level pdf library (I don't know if iTextSharp works with mono) might be what you want.
I would try out LaTeX first, because it allows you to get results with the least effort.
I've previously produced invoices by templating a PostScript file, and then using Ghostscript's ps2pdf to convert those into PDFs.
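A minimal sketch of that PostScript templating idea, in Python. The template text, field names and sample values are invented for the example; the answer above does not say how its template looked. The only non-obvious part is escaping: parentheses delimit PostScript strings and the backslash is the escape character, so both have to be escaped in substituted values.

```python
from string import Template


def ps_escape(value: str) -> str:
    """Escape backslashes and parentheses so a value is safe inside ( ... )."""
    return value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


# Hypothetical one-page invoice template; $number, $customer and $total
# are the placeholders filled in per invoice.
INVOICE_PS = Template("""%!PS
/Helvetica findfont 12 scalefont setfont
72 720 moveto (Invoice $number) show
72 700 moveto (Bill to: $customer) show
72 680 moveto (Total due: $total) show
showpage
""")


def render_invoice(number: str, customer: str, total: str) -> str:
    """Return PostScript source for one invoice."""
    return INVOICE_PS.substitute(
        number=ps_escape(number),
        customer=ps_escape(customer),
        total=ps_escape(total),
    )


ps_source = render_invoice("2024-001", "ACME (UK) Ltd", "EUR 120.00")
print(ps_source)
```

Writing `ps_source` to a file such as `invoice.ps` and running `ps2pdf invoice.ps invoice.pdf` would then produce the PDF, assuming Ghostscript is installed.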
We use Reportlab with Python. If you look around there are a load of ready-made forms/invoices/etc.
There are several OSS reporting engines (Jasper Reports, Pentaho and BIRT to name three) that you could use in much the same way as you have historically used Crystal Reports. One of the other posters mentions ReportLab, which is an option if you're using Python or can embed a Python runtime in your application.
Probably the most flexible solution is to create XML files with the invoice data and then use XSLT to transform them into PDF, HTML, whatever...
It depends on your environment. If you have access to Java, you might look at iText (http://www.lowagie.com/iText/), a library that allows you to generate PDF files on the fly.
There are two steps, if i understood correctly:
1) Creation of PDF template with placeholders to populate data programmatically
2) Populating the PDF template programmatically during run time
For #1, OpenOffice allows creation of PDF templates, which can then be populated programmatically. It's good enough for creating simple invoices that probably don't involve datagrid/table kind of stuff.
For #2, you already have the answers here - iText, iTextSharp.
Hope this helps!
I love wkhtmltopdf http://code.google.com/p/wkhtmltopdf/
Not sure what your goal is here, but there is an opensource php-library called fpdf, which also has an extension for taking a pre-made pdf as layout and then populate it with more content, generating a new PDF with that info.
However, I would go for a solution that you can integrate nicely into the platform you're building, and I wouldn't go for an HTML->PDF solution, since in that kind of environment you won't have any clue about what fits on a piece of paper, meaning you won't know when you should split the content across two separate templates.
You might also try using XSL:FO. XSL:FO is a documented standard for describing page layout: http://www.w3.org/TR/xsl/#fo-section.
I've had success on two projects creating documents by creating an XML schema that defines the content of the "PDF". I then use the XSD tool (from Microsoft) to generate a class representing this document. I then map my data into that structure, serialize the populated class to XML, and pass it, along with an XSL stylesheet that defines how that data should be mapped into FO, to an FO formatter. For formatters, I have used Alt-Soft's Xml2Pdf with success. There are a few others out there. There are some tools available to help create the XSL to FO stylesheet (e.g. stylusstudio and XmlSpy), but I recommend learning the FO constructs as the tools seem to produce bloated stylesheets. FO is comparable to HTML (where a P tag is a BLOCK tag in FO), but can be tricky. The nice thing about FO is that some formatters support conversion to other formats, such as Word, HTML, etc.
Other options:
iTextSharp (C# port of iText). Just started reading about this. Open source and free. I don't think there is any "templating" supported with this, but I could be wrong about that.
SQL Server Reporting Services. Assuming your invoice data is in, or can be put in, a format that can be read by reporting services (SQL Server, Web Service, etc), define the layout in SSRS and then publish to reporting server. Use SSRS Web Services or query parameter execution to execute the report and have it output as PDF.
This html-2-pdf site may be a helpful starting point: http://maarten.lippmann.us/?p=101
A site a friend of mine built uses a script to churn HTML pages into printable PDFs, too - http://philambdaupsilon.org. Not sure on the exact details of it, but he is an SO user, and I'll send this question to him, too.
Unfortunately, the best system on the market (at present) is to pass the HTML & CSS to a ColdFusion server and have it return the rendered PDF. So if money isn't a big concern, this is the quickest solution to deploy and renders the best results.
I've tried very hard to get FPDF, TCPDF, the R&OS pdf class, and even CodeIgniter's recommendation to work, but none gave stable output for anything beyond the most basic/bland HTML files.
Honestly, if the ColdFusion solution isn't viable, I'd use html2ps, and then ps2pdf to convert your files into a PDF.
(This is all assuming that you don't want to take the time to design each PDF using the native PDF-creator code in PHP. This is what systems like SugarCRM use. Though it's very functional with stable results, the actual creation of each PDF-generator file is a most painful process.)
We have used Jasper Reports before. It's not what you'd call user-friendly, but it will talk directly to your database.
html2pdf works very well. You can use this to generate both HTML and PDF reports from the same source.
I'm fiddling with Black Sheep Invoices right now, which is great at first, but now I'm having trouble actually getting it to render the PDFs. Lots of installation difficulties; it's probably a lot easier on your own server, but I'm on a shared host with it. The HTML output and data management portions are well done though, which is something you won't get out of just creating a PostScript template. I was hoping to find a reference to a library that has an active development team though (Black Sheep is not being updated at this time).
If you want browser perfect HTML converted to PDF then try commandlineprint
You'll need to install firefox on a linux distro, disable all firefox alerts and then run it through a virtual display. Check this thread for more details.
It's infuriating to get running well but does give you the best results for HTML to PDF conversion I've seen.
OK, a search of Google Code projects turned up Simple Invoices, which is awesome and well maintained.
I use TROFF for my invoices because of its extremely simple textual encoding. The logic is a few lines of Perl. Keeping it simple.
For a Ruby solution, try Prawn: http://prawn.majesticseacreature.com/
I use open office on the server and then generate the XML for the document (just unzip the document and hack away)
You can use the Dhek template editor to define areas/placeholders on an existing PDF, without altering the existing document, and then populate them to generate the final doc (e.g. with user values from a form): https://github.com/applicius/dhek .

Resources