Create dummy content - Contentful

Using a free Contentful account, I've created a new Space and a content model named Post. Each Post field can have one of a variety of data types, and the ones I'm using are:
title - Text (short)
subtitle - Text (short)
author - Text (short)
slug - Text (short)
image - Media
content - Rich text
Now, when it comes to creating content, I can add Posts manually by filling out the information in the entry form and clicking the green "Publish" button.
While this is fine when I'm creating one post, creating 50 posts that way would take far too long (even duplicating Posts is slow, because the duplicates become Drafts that still need to be edited slightly and published). How can this be automated?

Contentful DevRel here. 👋
To create a lot of entries and new data we provide the Content Management API (CMA). This API's purpose is to perform WRITE operations in your Contentful space.
One way to create hundreds of new entries is to use this WRITE API and write a custom script that creates all of them.
For example, in Node.js that could look like this:
// Needs the Content Management SDK: npm install contentful-management
const contentful = require('contentful-management')
const client = contentful.createClient({ accessToken: '<content_management_token>' })
// Call this in a loop 👇
client.getSpace('<space_id>')
  .then((space) => space.createEntryWithId('<content_type_id>', '<entry_id>', {
    fields: {
      title: {
        'en-US': 'Entry title'
      }
    }
  }))
  .then((entry) => console.log(entry))
  .catch(console.error)
If you don't want to define the ID yourself, you can use createEntry instead. (I just noticed the docs are missing that one and will fix it 🙈.)
Another way to approach this would be to not use the "vanilla" CMA. We provide tooling to, for example, import/export all the data in your Contentful spaces. These ecosystem tools sit on top of the CMA and abstract the API calls away.
The import/export tooling is available both as npm packages and as CLI tooling. These tools can be useful if you don't want to write a script in your programming language of choice.
What you could do is export a space to a JSON file, adjust that file so it contains the entries you want to create, and then use the import command to create the entries in bulk from it.
contentful space export
# adjust the content file
contentful space import --content-file <file>
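The "adjust the content file" step can be scripted as well. Here's a rough sketch (in Python, purely for illustration) that clones one exported Post entry 50 times - the locale, field names and ID scheme are placeholders you'd match to what the export actually produced:
# Rough sketch: clone the structure of one exported entry 50 times.
# The locale, field names and ID scheme below are placeholders - match them
# to what "contentful space export" actually produced for your space.
import json
with open("contentful-export.json") as f:
    data = json.load(f)
template = data["entries"][0]  # an existing Post entry to use as a template
new_entries = []
for i in range(50):
    entry = json.loads(json.dumps(template))  # cheap deep copy
    entry["sys"]["id"] = f"generated-post-{i}"  # the import tool needs unique IDs
    entry["fields"]["title"] = {"en-US": f"Generated post {i}"}
    entry["fields"]["slug"] = {"en-US": f"generated-post-{i}"}
    new_entries.append(entry)
data["entries"] = new_entries
with open("bulk-entries.json", "w") as f:
    json.dump(data, f, indent=2)
You'd then pass bulk-entries.json as the --content-file to the import command above.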
I hope that helps, let me know how it goes. :)

Related

Azure Media Services: Provide custom file names for the asset files

I'm encoding a video file using the built-in adaptive streaming transform. Once the file is successfully processed, an asset container is created containing the output files.
Is it possible to provide custom file names at the time a job is created? It seems that the default behavior is to take a certain number of characters from the original file name and prepend them to the generated file names. If possible, I'd like to configure this behavior.
P.S. I'm using the .NET SDK.
You can create a custom transform to output file names differently. On https://learn.microsoft.com/en-us/rest/api/media/transforms/createorupdate#definitions, search for the Mp4Format section. In it you can specify filenamePattern with macros such as {Bitrate} and {Codec}.
See https://learn.microsoft.com/en-us/azure/media-services/latest/custom-preset-cli-howto for an example custom transform and the process by which to create it in Media Services.
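If you prefer to set this up from code, here's a rough sketch of such a custom transform. It uses the azure-mgmt-media Python SDK rather than the .NET SDK you mentioned (the resource names and the single video/audio layer are placeholder assumptions), but the StandardEncoderPreset/Mp4Format/filenamePattern shape is the same:
# Sketch only - resource names and the single video/audio layer are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.media import AzureMediaServices
from azure.mgmt.media.models import (
    Transform, TransformOutput, StandardEncoderPreset,
    H264Video, H264Layer, AacAudio, Mp4Format,
)
client = AzureMediaServices(DefaultAzureCredential(), "<subscription_id>")
preset = StandardEncoderPreset(
    codecs=[
        H264Video(layers=[H264Layer(bitrate=1_000_000)]),
        AacAudio(bitrate=128_000),
    ],
    formats=[
        # {Basename}, {Label}, {Bitrate}, {Codec}, {Extension} are the documented macros.
        Mp4Format(filename_pattern="{Basename}-{Bitrate}{Extension}"),
    ],
)
client.transforms.create_or_update(
    "<resource_group>", "<media_account>", "custom-filename-transform",
    Transform(outputs=[TransformOutput(preset=preset)]),
)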
I use the macros in my jobs and they work OK. I have a process that takes 3 videos (an intro section, the actual content, and the outro section) and encodes them as one single video. The issue I have with the macros is that they use the file name of the first video in the inputs, so the output ends up named after the intro video, which has a generic name. They need to provide a way to give us a little more control.
I suppose I could copy/rename the intro video to the desired name before I encode and it would pick that up, but that seems like overkill.
The macros are good, but I think they could use some enhancements.

Extracting badly formatted text from a PDF

I'm trying to extract some entries from a PDF, but the bad formatting is making it inconvenient to simply parse through like a normal document. There isn't any consistent positioning for the text, so each entry is a unique scramble with no consistent pattern I can find. I only want the entry name and the info on the right, not the field name or description.
I've tried experimenting with headers and layout info using the PyPDF2 module, but there doesn't seem to be any metadata in the PDF besides basic author info.
My idea was to use the Google Cloud Vision API to transcribe the text, but that raises the issue of working out where each piece of text sits on the page.
Does anyone know of a better methodology for this, or if not, simply how to handle the positioning with the Cloud Vision API?
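For what it's worth, Cloud Vision's document text detection does return a bounding box for every block, paragraph and word, which is what any positioning logic would have to work from. A minimal sketch, assuming each PDF page has already been rendered to an image (the synchronous API takes images; PDFs go through the async, GCS-based API):
from google.cloud import vision
client = vision.ImageAnnotatorClient()
with open("page-1.png", "rb") as f:  # one rendered PDF page
    image = vision.Image(content=f.read())
response = client.document_text_detection(image=image)
# Every word comes back with a bounding polygon, so entries can be grouped
# and sorted by coordinates instead of relying on the reading order.
for page in response.full_text_annotation.pages:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                text = "".join(s.text for s in word.symbols)
                x, y = word.bounding_box.vertices[0].x, word.bounding_box.vertices[0].y
                print(text, x, y)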

Extract Keywords from Office Documents with Sharepoint Flow

I am trying to implement a document management system using SharePoint. One major issue is that colleagues cannot find documents in the current setup (a local file server). They have asked for a system that scans uploaded documents, automatically looks for keywords in them, and then populates a "Meta" column.
I have had some success with OCR on image files, but so far no success getting keywords out of Office documents (doc, xls, etc.).
Is there a way to set up a Flow to do this task for me?
Any help is much appreciated.
I tried "Get file metadata" and Azure "Text analysis", but it seems to take the raw data of the files (XML, I assume) and returns that the document is too large to analyse.
There is something vague about this requirement - how is a keyword defined in a document?
Therefore, the first obvious solution would be to assign keywords to each file upon uploading it. You could create a process for this with Flow - tasks, reminders and so on.
Automating this with OCR means you need an OCR service that works with MS Flow, and there you have only one choice - ElasticOCR. Then, in your flow:
- feed the document content to the ElasticOCR action
- keep in mind that OCR is not 100% accurate
- analyze the generated text content according to your keyword definition
- finally write the meta back to the library in the corresponding columns.
Having worked on a similar requirement, we asked uploaders to publish their documents with a short abstract (a column on the content type). The assumption is that the abstract contains the keywords; it is stored in a multi-line column, making it searchable site-wide.
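Coming back to the "analyze the generated text content" step: the Azure Text Analysis idea from the question should work once it is given extracted plain text rather than the raw file body. A rough sketch of the key-phrase call using the azure-ai-textanalytics package (the endpoint and key are placeholders):
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient
# Placeholders - use your own Cognitive Services endpoint and key.
client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<key>"),
)
# extracted_text stands for the plain text pulled out of the document
# (by OCR or a format-aware extractor), not the raw docx/xlsx bytes.
extracted_text = "Quarterly maintenance report for the turbine hall ..."
result = client.extract_key_phrases([extracted_text])[0]
if not result.is_error:
    print(result.key_phrases)  # candidate values for the "Meta" column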

How to export content items from Orchard CMS to CSV?

I have a requirement to export every record from a content type in Orchard CMS to a CSV file that can be opened in Microsoft Excel.
I was surprised to see that this feature is not available out of the box or as a module from the gallery. The built-in export functionality in Orchard produces a custom XML format, which is great for moving content between Orchard sites but doesn't help me get the content in front of users who want to see it in Excel.
A very simple link in the back end of Orchard that allows me to download a CSV file for a particular content type would suffice.
Does anyone know how easiest to achieve this in Orchard?
As far as I know there is nothing like that in Orchard, but it's easy to do - you could build it and share it as a module. The "only" thing you have to do is serialize the ContentItems and their ContentParts/Fields to that format. If you look at the source, you will see plenty of examples of serializing content items (e.g. to JSON/XML).
With a quick search I found this CSV serializer: http://simplecsv.codeplex.com

Export text (MediaWiki markup) from MediaWiki installation

I want to export the MediaWiki markup for a number of articles (but not all articles) from a local MediaWiki installation. I want just the current article markup, not the history or anything else, with an individual text file for each article. I want to perform this export programmatically, ideally on the MediaWiki server rather than remotely.
For example, if I am interested in the Apple, Banana and Cupcake articles I want to be able to:
article_list = ["Apple", "Banana", "Cupcake"]
for a in article_list:
    get_article(a, a + ".txt")
My intention is to:
extract required articles
store MediaWiki markup in individual text files
parse and process in a separate program
Is this already possible with MediaWiki? It doesn't look like it. It also doesn't look like Pywikipediabot has such a script.
A fallback would be to be able to do this manually (using the Export special page) and easily parse the output into text files. Are there existing tools to do this? Is there a description of the MediaWiki XML dump format? (I couldn't find one.)
On the server side, you can just export from the database. Remotely, Pywikipediabot has a script called get.py which fetches the wikicode of a given article. It is also pretty simple to do manually, something like this (writing this from memory, errors might occur):
import wikipedia as pywikibot  # the old "compat" Pywikipediabot module
site = pywikibot.getSite()  # assumes you have a user-config.py with default site/user
article_list = ["Apple", "Banana", "Cupcake"]
for title in article_list:
    page = pywikibot.Page(site, title)
    text = page.get()  # handling of not-found etc. exceptions omitted
    with open(title + ".txt", "wt") as f:
        f.write(text)
Since MediaWiki's language is not well-defined, the only reliable way to parse/process it is through MediaWiki itself; there is no support for that in Pywikipediabot, and the few tools which try to do it fail with complex templates.
It looks like getText.php is a builtin server-side maintenance script for exporting the wikitext of a specific article. (Easier than querying the database.)
I found it via the "Publishing from MediaWiki" page, which covers all angles of exporting from MediaWiki.
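A minimal sketch of that server-side approach, assuming a standard MediaWiki install directory and that maintenance/getText.php prints the current wikitext of the named page to stdout:
import subprocess
article_list = ["Apple", "Banana", "Cupcake"]
for title in article_list:
    # getText.php lives in the maintenance/ directory of the MediaWiki install
    wikitext = subprocess.run(
        ["php", "maintenance/getText.php", title],
        capture_output=True, text=True, check=True,
    ).stdout
    with open(title + ".txt", "w") as f:
        f.write(wikitext)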
