How to parse markdown to json using remark - node.js

The remark site has a link to an AST explorer for the output of remark - https://astexplorer.net/#/gist/0a92bbf654aca4fdfb3f139254cf0bad/ffe102014c188434c027e43661dbe6ec30042ee2
What I cannot find is how to do the parsing to AST - all the examples convert to html.
I have this code
import {unified} from 'unified'
import remarkParse from 'remark-parse'
import remarkGfm from 'remark-gfm' // git flavoured markdown
const content = `
# My header
This is my content
- abc
- def
`;
unified()
.use(remarkParse)
.use(remarkGfm)
.process('# Hi\n\n*Hello*, world!')
.then((file) => {
console.log(String(file))
})
but am getting a couple of errors here that I do not know how to get around
[remark-gfm] Warning: please upgrade to remark 13 to use this plugin
file:///markdown/node_modules/unified/lib/index.js:520
throw new TypeError('Cannot `' + name + '` without `Compiler`')
^
TypeError: Cannot `process` without `Compiler`

You almost have it. Below I've simplified your code, removing unused and unnecessary parts.
import {unified} from 'unified'
import remarkParse from 'remark-parse'
let myResult = unified()
.use(remarkParse)
.parse('# Hi\n\n*Hello*, world!');
console.log(JSON.stringify(myResult, null, " "));
According to ChristianMurphy's GitHub Q/A:
unified.process() will try to take text, turn it into an AST, and back into text.
For your stated purpose, "...parsing to AST [JSON]", you don't need or want the "full-cycle process" that unified.process() is failing to complete, TypeError: Cannot 'process' without 'Compiler'. You merely want to parse the input and emit the syntax tree (AST) as JSON. The error here is because process(), after parsing your input (markdown string) and turning it into a syntax tree (AST), is then trying to "compile" it to some output format. However, you haven't supplied a compiler. But, as I understand your post, you don't need or want to compile the syntax tree into another output (language). So, change from process() to parse() and emit the resulting syntax tree as JSON.

Related

How to make the translations work with the Python 3 "format" built-in method in Odoo?

Since Python v3, format is the primary API method to make variable substitutions and value formatting. However, Odoo is still using the Python 2 approach with the %s wildcard.
message = _('Scheduled meeting with %s') % invitee.name
# VS
message = 'Scheduled meeting with {}'.format(invitee.name) # this is not translated
I have seen some parts of the Odoo code where they have used some workaround, isolating strings.
exc = MissingError(
_("Record does not exist or has been deleted.")
+ '\n\n({} {}, {} {})'.format(_('Records:'), (self - existing).ids[:6], _('User:'), self._uid)
)
But, does anybody know if there is a more convenient way to use the format method and make it work with translations?
_ return a string, so you can call format on it directly.
_("{} Sequence").format(sec.code)
or like this:
_("{code} Sequence").format(code=invitee.code)
when you export the translation in PO file you should see this for the second example:
# for FR translation you should not translate the special format expression
msgid "{code} Sequence"
msgstr "{code} Séquence"
and this for the first example:
msgid "{} Sequence"
msgstr "{} Séquence"
If you don't see this then Odoo must be checking that _() must not be followed by . so you can work around this by doing this for example by surrounding the expression by parentheses :
# I think + "" is not needed
( _("{} Séquence") + "").format(sec.code)
Because In python "this {}".format("expression") is the same as this ("this {}").format("expression")

Obtain the desired value from the output of a method Python

i use a method in telethon python3 library:
"client(GetMessagesRequest(peers,[pinnedMsgId]))"
this return :
ChannelMessages(pts=41065, count=0, messages=[Message(out=False, mentioned=False,
media_unread=False, silent=False, post=False, id=20465, from_id=111104071,
to_id=PeerChannel(channel_id=1111111111), fwd_from=None, via_bot_id=None,
reply_to_msg_id=None, date=datetime.utcfromtimestamp(1517325331),
message=' test message test', media=None, reply_markup=None,
entities=[], views=None, edit_date=None, post_author=None, grouped_id=None)],
chats=[Channel(creator=..............
i only need text of message ------> test message test
how can get that alone?
the telethon team say:
"This is not related to the library. You just need more Python knowledge so ask somewhere else"
thanks
Assuming you have saved the return value in some variable, say, result = client(...), you can access members of any instance through the dot operator:
result = client(...)
message = result.messages[0]
The [0] is a way to access the first element of a list (see the documentation for __getitem__). Since you want the text...:
text = message.message
Should do the trick.

MafftCommandline and io.StringIO

I've been trying to use the Mafft alignment tool from Bio.Align.Applications. Currently, I've had success writing my sequence information out to temporary text files that are then read by MafftCommandline(). However, I'd like to avoid redundant steps as much as possible, so I've been trying to write to a memory file instead using io.StringIO(). This is where I've been having problems. I can't get MafftCommandline() to read internal files made by io.StringIO(). I've confirmed that the internal files are compatible with functions such as AlignIO.read(). The following is my test code:
from Bio.Align.Applications import MafftCommandline
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import io
from Bio import AlignIO
sequences1 = ["AGGGGC",
"AGGGC",
"AGGGGGC",
"AGGAGC",
"AGGGGG"]
longest_length = max(len(s) for s in sequences1)
padded_sequences = [s.ljust(longest_length, '-') for s in sequences1] #padded sequences used to test compatibilty with AlignIO
ioSeq = ''
for items in padded_sequences:
ioSeq += '>unknown\n'
ioSeq += items + '\n'
newC = io.StringIO(ioSeq)
cLoc = str(newC).strip()
cLocEdit = cLoc[:len(cLoc)] #create string to remove < and >
test1Handle = AlignIO.read(newC, "fasta")
#test1HandleString = AlignIO.read(cLocEdit, "fasta") #fails to interpret cLocEdit string
records = (SeqRecord(Seq(s)) for s in padded_sequences)
SeqIO.write(records, "msa_example.fasta", "fasta")
test1Handle1 = AlignIO.read("msa_example.fasta", "fasta") #alignIO same for both #demonstrates working AlignIO
in_file = '.../msa_example.fasta'
mafft_exe = '/usr/local/bin/mafft'
mafft_cline = MafftCommandline(mafft_exe, input=in_file) #have to change file path
mafft_cline1 = MafftCommandline(mafft_exe, input=cLocEdit) #fails to read string (same as AlignIO)
mafft_cline2 = MafftCommandline(mafft_exe, input=newC)
stdout, stderr = mafft_cline()
print(stdout) #corresponds to MafftCommandline with input file
stdout1, stderr1 = mafft_cline1()
print(stdout1) #corresponds to MafftCommandline with internal file
I get the following error messages:
ApplicationError: Non-zero return code 2 from '/usr/local/bin/mafft <_io.StringIO object at 0x10f439798>', message "/bin/sh: -c: line 0: syntax error near unexpected token `newline'"
I believe this results due to the arrows ('<' and '>') present in the file path.
ApplicationError: Non-zero return code 1 from '/usr/local/bin/mafft "_io.StringIO object at 0x10f439af8"', message '/usr/local/bin/mafft: Cannot open _io.StringIO object at 0x10f439af8.'
Attempting to remove the arrows by converting the file path to a string and indexing resulted in the above error.
Ultimately my goal is to reduce computation time. I hope to accomplish this by calling internal memory instead of writing out to a separate text file. Any advice or feedback regarding my goal is much appreciated. Thanks in advance.
I can't get MafftCommandline() to read internal files made by
io.StringIO().
This is not surprising for a couple of reasons:
As you're aware, Biopython doesn't implement Mafft, it simply
provides a convenient interface to setup a call to mafft in
/usr/local/bin. The mafft executable runs as a separate process
that does not have access to your Python program's internal memory,
including your StringIO file.
The mafft program only works with an input file, it doesn't even
allow stdin as a data source. (Though it does allow stdout as a
data sink.) So ultimately, there must be a file in the file system
for mafft to open. Thus the need for your temporary file.
Perhaps tempfile.NamedTemporaryFile() or tempfile.mkstemp() might be a reasonable compromise.

In node.js, how do I validate UTF8 data in a buffer? [duplicate]

I just discovered that Node (tested: v0.8.23, current git: v0.11.3-pre) ignores any decoding errors in its Buffer handling, silently replacing any non-utf8 characters with '\ufffd' (the Unicode REPLACEMENT CHARACTER) instead of throwing an exception about the non-utf8 input. As a consequence, fs.readFile, process.stdin.setEncoding and friends mask a large class of bad input errors for you.
Example which doesn't fail but really ought to:
> notValidUTF8 = new Buffer([ 128 ], 'binary')
<Buffer 80>
> decodedAsUTF8 = notValidUTF8.toString('utf8') // no exception thrown here!
'�'
> decodedAsUTF8 === '\ufffd'
true
'\ufffd' is a perfectly valid character that can occur in legal utf8 (as the sequence ef bf bd), so it is non-trivial to monkey-patch in error handling based on this showing up in the result.
Digging a little deeper, it looks like this stems from node just deferring to v8's strings and that those in turn have the above behaviour, v8 not having any external world full of foreign-encoded data.
Are there node modules or otherwise that let me catch utf-8 decode errors, preferrably with context about where the error was discovered in the input string or buffer?
I hope you solved the problem in those years, I had a similar one and eventually solved with this ugly trick:
function isValidUTF8(buf){
return Buffer.compare(new Buffer(buf.toString(),'utf8') , buf) === 0;
}
which converts the buffer back and forth and check it stays the same.
The 'utf8' encoding can be omitted.
Then we have:
> isValidUTF8(new Buffer('this is valid, 指事字 eè we hope','utf8'))
true
> isValidUTF8(new Buffer([128]))
false
> isValidUTF8(new Buffer('\ufffd'))
true
where the '\ufffd' character is correctly considered as valid utf8.
UPDATE: now this works in JXcore, too
From node 8.3 on, you can use util.TextDecoder to solve this cleanly:
const util = require('util')
const td = new util.TextDecoder('utf8', {fatal:true})
td.decode(Buffer.from('foo')) // works!
td.decode(Buffer.from([ 128 ], 'binary')) // throws TypeError
This will also work in some browsers by using TextDecoder in the global namespace.
As Josh C. said above: "npmjs.org/package/encoding"
From the npm website: "encoding is a simple wrapper around node-iconv and iconv-lite to convert strings from one encoding to another."
Download:
$ npm install encoding
Example Usage
var result = encoding.convert(new Buffer([ 128 ], 'binary'), "utf8");
console.log(result); //<Buffer 80>
Visit the site: npm - encoding

Python 3 C-API IO and File Execution

I am having some serious trouble getting a Python 2 based C++ engine to work in Python3. I know the whole IO stack has changed, but everything I seem to try just ends up in failure. Below is the pre-code (Python2) and post code (Python3). I am hoping someone can help me figure out what I'm doing wrong.I am also using boost::python to control the references.
The program is supposed to load a Python Object into memory via a map and then upon using the run function it then finds the file loaded in memory and runs it. I based my code off an example from the delta3d python manager, where they load in a file and run it immediately. I have not seen anything equivalent in Python3.
Python2 Code Begins here:
// what this does is first calls the Python C-API to load the file, then pass the returned
// PyObject* into handle, which takes reference and sets it as a boost::python::object.
// this takes care of all future referencing and dereferencing.
try{
bp::object file_object(bp::handle<>(PyFile_FromString(fullPath(filename), "r" )));
loaded_files_.insert(std::make_pair(std::string(fullPath(filename)), file_object));
}
catch(...)
{
getExceptionFromPy();
}
Next I load the file from the std::map and attempt to execute it:
bp::object loaded_file = getLoadedFile(filename);
try
{
PyRun_SimpleFile( PyFile_AsFile( loaded_file.ptr()), fullPath(filename) );
}
catch(...)
{
getExceptionFromPy();
}
Python3 Code Begins here: This is what I have so far based off some suggestions here... SO Question
Load:
PyObject *ioMod, *opened_file, *fd_obj;
ioMod = PyImport_ImportModule("io");
opened_file = PyObject_CallMethod(ioMod, "open", "ss", fullPath(filename), "r");
bp::handle<> h_open(opened_file);
bp::object file_obj(h_open);
loaded_files_.insert(std::make_pair(std::string(fullPath(filename)), file_obj));
Run:
bp::object loaded_file = getLoadedFile(filename);
int fd = PyObject_AsFileDescriptor(loaded_file.ptr());
PyObject* fileObj = PyFile_FromFd(fd,fullPath(filename),"r",-1,"", "\n","", 0);
FILE* f_open = _fdopen(fd,"r");
PyRun_SimpleFile( f_open, fullPath(filename) );
Lastly, the general state of the program at this point is the file gets loaded in as TextIOWrapper and in the Run: section the fd that is returned is always 3 and for some reason _fdopen can never open the FILE which means I can't do something like PyRun_SimpleFile. The error itself is a debug ASSERTION on _fdopen. Is there a better way to do all this I really appreciate any help.
If you want to see the full program of the Python2 version it's on Github
So this question was pretty hard to understand and I'm sorry, but I found out my old code wasn't quite working as I expected. Here's what I wanted the code to do. Load the python file into memory, store it into a map and then at a later date execute that code in memory. I accomplished this a bit differently than I expected, but it makes a lot of sense now.
Open the file using ifstream, see the code below
Convert the char into a boost::python::str
Execute the boost::python::str with boost::python::exec
Profit ???
Step 1)
vector<char> input;
ifstream file(fullPath(filename), ios::in);
if (!file.is_open())
{
// set our error message here
setCantFindFileError();
input.push_back('\0');
return input;
}
file >> std::noskipws;
copy(istream_iterator<char>(file), istream_iterator<char>(), back_inserter(input));
input.push_back('\n');
input.push_back('\0');
Step 2)
bp::str file_str(string(&input[0]));
loaded_files_.insert(std::make_pair(std::string(fullPath(filename)), file_str));
Step 3)
bp::str loaded_file = getLoadedFile(filename);
// Retrieve the main module
bp::object main = bp::import("__main__");
// Retrieve the main module's namespace
bp::object global(main.attr("__dict__"));
bp::exec(loaded_file, global, global);
Full Code is located on github:

Resources