Parsing GHC Core in ghc-7.10 - haskell

I am trying to parse some GHC Core to extract name information and other bits needed.
I am currently using the GHC API, given that I haven't found other useful packages that help with it.
I've looked through some packages like ghc-core, ghc-core-html and extcore, but they seem slightly outdated, and I haven't managed to use extcore with ghc-7.10.3.
I have also tried to find up-to-date documentation on Core, without luck. The best post I've come across is this one, but the discussion is slightly outdated (e.g. compiling the example from these slides gives a different Core dump with the latest GHC).
The question
Having said all this, do you know of any recent package that can help in parsing Core? Is there any new documentation regarding Core manipulation?
Thanks!

The external Core feature was removed because it was buggy and a hassle to maintain, and if people were using it, they didn't speak up. So there is no longer any textual representation of Core intended for machine consumption; only the internal (AST) representation is available. Of course, I'm sure you'd be welcome to revive the external representation if you want to maintain it.
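For reference, here is a minimal sketch of the GHC-API route you're already on: load a module, desugar it, and print its Core bindings. This is only a sketch, assuming the ghc and ghc-paths packages and the GHC-7.10-era API (module names and signatures move around between GHC releases); "Example.hs" and "Example" are placeholder names.

import GHC
import GHC.Paths (libdir)              -- from the ghc-paths package
import HscTypes (ModGuts (..))         -- mg_binds :: ModGuts -> CoreProgram
import Outputable (ppr, showSDoc)
import Control.Monad.IO.Class (liftIO)

main :: IO ()
main = runGhc (Just libdir) $ do
  dflags <- getSessionDynFlags
  _ <- setSessionDynFlags dflags
  target <- guessTarget "Example.hs" Nothing   -- placeholder target
  setTargets [target]
  _ <- load LoadAllTargets
  modSum    <- getModSummary (mkModuleName "Example")
  desugared <- desugarModule =<< typecheckModule =<< parseModule modSum
  -- The ModGuts of the desugared module carries the Core program ([CoreBind]);
  -- traverse it to extract names and whatever else you need.
  let binds = mg_binds (coreModule desugared)
  liftIO $ mapM_ (putStrLn . showSDoc dflags . ppr) binds

From the resulting [CoreBind] you can pattern-match on CoreSyn's Expr constructors (Var, App, Lam, Let, Case, ...) to pull out names and other information.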

Related

Type hints for lxml?

I'm new to Python and come from a statically typed language background. I want type hints for https://lxml.de just for ease of development (mypy flagging issues and suggesting methods would be nice!)
To my knowledge, this is a Python 2.0-era module and doesn't have types. Currently I've used https://mypy.readthedocs.io/en/stable/stubgen.html to create stub type definitions and am filling in the "Any" types I'm using with more information, but it's really hacky. Are there any safer ways to get type hints?
There is an official stubs package for lxml now called lxml-stubs:
$ pip install lxml-stubs
Note, however, that the stubs are still in development and are not 100% complete yet (although very much usable from my experience). These stubs were once part of typeshed, then curated by Jelle Zijlstra after removal and now are developed as part of the lxml project.
If you want the development version of the stubs, install via
$ pip install git+https://github.com/lxml/lxml-stubs.git
(the project's readme installation command is missing the git+ prefix in the URL's scheme and won't work).
Recently I have done much more gap filling based on lxml-stubs, with some good progress.
Anyone still interested is welcome to check out types-lxml. For most people, I think lxml.objectify is the only piece still missing from the stubs; it is planned immediately after the current release.
I’ve used stubgen to create stub type definitions and am filling in "Any" types
This is actually the correct approach when the package isn't lxml; creating a template with mypy's stubgen is the starting point for many stub files. But lxml is mostly written in Cython, for which stubgen does not yet have perfect support. Besides, as the OP noted, this is a Python 2.0-era module, and its author uses function arguments in a quite polymorphic way. There are lots of unique challenges in annotating lxml, as lxml is essentially a Python interface over libxml2 and libxslt at its core.
As an example, supporting both unicode and bytes input complicates matters too; this is the same difficulty found when annotating the xml.etree modules bundled with Python, but on a much greater scale.
I would not call this "hacky"; rather, it is gradual typing.
You can take a closer look at the lxml-stubs repository. From its About section:
This repository contains external type annotations (see PEP 484) for the lxml package. Such type annotations are normally included in typeshed, but lxml's annotations were frequently problematic and have therefore been deleted from typeshed. In particular, the stubs are incomplete and it has been difficult to provide complete stubs.
Perhaps it will be useful to you.

How to Decompile Bytenode "jsc" files?

I've just seen this library, ByteNode. It's like Java bytecode, but for Node.js.
This library compiles your JavaScript code into V8 bytecode, which protects your source code. I'm wondering: is there any way to decompile ByteNode output, meaning it's not secure enough? I'm asking because I would like to protect my source code using this library.
TL;DR It'll raise the bar to someone copying the code and trying to pass it off as their own. It won't prevent a dedicated person from doing so. But the primary way to protect your work isn't technical, it's legal.
This library compiles your JavaScript code into V8 bytecode, which protects your source code...
Well, we don't know it's V8 bytecode, but it's "compiled" in some sense. All we know is that it creates a "code cache" via the built-in vm.Script.prototype.createCachedData API, which is officially just a cache used to speed up recompiling the code a second time, third time, etc. In theory, you're supposed to also provide the original source code as a string to the vm.Script constructor. But if you go digging into Node.js's vm.Script and V8 far enough it seems to be the actual code in some compiled form (whether actual V8 bytecode or not), and the code string you give it when running is ignored. (The ByteNode library provides a dummy string when running the code from the code cache, so clearly the actual code isn't [always?] needed.)
I'm wondering: is there any way to decompile ByteNode output, meaning it's not secure enough?
Naturally, otherwise it would be useless because Node.js wouldn't be able to run it. I didn't find a tool to do it that already exists, but since V8 is open source, it would presumably be possible to find the necessary information to write a decompiler for it that outputs valid JavaScript source code which someone could then try to understand.
Experimenting with it, local variable names appear to be lost, although function names don't. Comments appear to get lost (this may not be as obvious as it seems, given that Function.prototype.toString is required to either return the original source text or a synthetic version [details]).
So if you run the code through a minifier (particularly one that renames functions), then run it through ByteNode (or just do it with vm.Script yourself, ByteNode is a fairly thin wrapper), it will be feasible for someone to decompile it into something resembling source code, but that source code will be very hard to understand. This is very similar to shipping Java class files, which can be readily disassembled and decompiled (the JDK even ships a disassembler, javap), except that the format of Java class files is well-documented and doesn't change from one dot release to the next (though it can change from one major release to another; newer releases always support the older format), whereas the format of this data is not documented (though it's an open source project) and is subject to change from one dot release to the next.
Certain changes, such as changing the copyright message, are probably fairly easy to make to said source code. More meaningful changes will be harder.
Note that the code cache appears to have a checksum or other similar integrity mechanism, since directly editing the .jsc file to swap one letter for another in a literal string makes the code cache fail to load. So someone tampering with it (for instance, to change a copyright notice) would either need to go the decompilation/recompilation route, or dive into the V8 source to find out how to correct the integrity check.
Fundamentally, the way to protect your work is to ensure that you've put all the relevant notices in the relevant places such that the fact copying it is a violation of copyright is clear, then pursue your legal recourse should you find out about someone passing it off as their own.
is there any way
You could get a hundred answers here saying "I don't know a way", but that still won't guarantee that there isn't one.
not secure enough
Secure enough for what? What's your deployment scenario? What kind of scenario/attack are you trying to defend against?
FWIW, I don't know of an existing tool that "decompiles" V8 bytecode (i.e. produces JavaScript source code with the same behavior). That said, considering that the bytecode is a fairly straightforward translation of the source code, I'm sure it wouldn't be very hard to write such a tool, if someone had a reason to spend some time on it. After all, V8's JS-to-bytecode compiler is open source, so one would only have to look at those sources and implement the reverse direction. So I would assume that shipping as bytecode provides about as much "protection" as shipping as uglified JavaScript, i.e. none that I would trust.
Before you make any decisions, please also keep in mind that bytecode is considered an internal implementation detail of V8; in particular it is not versioned and can change at any time, so it has to be created by exactly the same V8 version that consumes it. If you want to update your Node.js you'll have to recreate all the bytecode, and there is no checking or warning in place that will point out when you forgot to do that.
The Node.js source already contains code for decompiling binary bytecode.
You can get a text representation of your V8 bytecode, which you would then need to analyze.
But that text is very long and misses some important information, such as the constant pool, so you need to modify the Node.js source.
Please check https://github.com/3DGISKing/pkg10.17.0
I have attached an exported XML file there.
If you study V8, it would be possible to analyze that output and recover the source code from it.
To keep it short and sweet: you can try the Ghidra Node.js package, which is based on the Ghidra reverse-engineering framework open-sourced by the NSA in 2019. Ghidra is capable of disassembling and decompiling V8 bytecode. The inner workings of the disassembly are quite complex; this answer is short but sufficient.

What are the tradeoffs between the Haskell SQLite packages?

There are many Haskell SQLite bindings, which implies to me that there are many different tradeoffs in building/using a SQLite binding. I've tried to read through the documentation of many of these packages, but it became a blur after a while, and I was unable to really identify the primary tradeoffs of choosing one over another.
A search on Hackage finds:
direct-sqlite
HDBC-sqlite3
hdbi-sqlite
hsql-sqlite3
hsSqlite3
persistent-sqlite
simplest-sqlite
sql-simple-sqlite
sqlite
sqlite-simple
sqlite-simple-typed
bindings-sqlite3
Never mind some "meta" SQLite packages: haskelldb-hdbc-sqlite3, haskelldb-hsql-sqlite3, language-sqlite, opaleye-sqlite.
Hoping that someone has been able to do this successfully and can help me understand how to choose.
I looked at the packages mentioned. Some of them are dependencies of other packages; for example, opaleye-sqlite and sqlite-simple both depend on direct-sqlite.
Therefore, let's first look at the packages that provide the actual driver. Most of them are outdated. There seem to be three that still have recent updates:
simplest-sqlite (https://hackage.haskell.org/package/simplest-sqlite, https://github.com/YoshikuniJujo/test_haskell/tree/master/features/ffi/sqlite3/simplest-sqlite): I wouldn't use it, because the repository says "It's just my private Haskell learning/testing repository."
persistent-sqlite (https://hackage.haskell.org/package/persistent-sqlite): this one is based on direct-sqlite (it seems part of direct-sqlite has been forked into it).
The last one is the direct-sqlite package itself. I used this website to find which packages depend on direct-sqlite, leaving out packages whose purpose isn't working with SQLite (such as bake, a continuous integration system) and packages that haven't seen updates in a long time.
That leaves us with the following packages, which provide extra functionality on top of direct-sqlite. This list includes further levels of reverse lookup, to see which other packages make use of the ones listed below.
persistent-sqlite as mentioned before
esqueleto
eventful
groundhog-sqlite
opaleye-sqlite
selda-sqlite
sqlite-simple
beam-sqlite
I've had very good experiences with the ...-simple family of libraries. They are very full-featured and sit at a nice medium level of abstraction, where you get a large amount of flexibility over how you interact with the database.
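For a concrete feel of that level of abstraction, here's a minimal sqlite-simple sketch; the database file, table and data are made up for illustration.

{-# LANGUAGE OverloadedStrings #-}
import Database.SQLite.Simple
import Data.Text (Text)

main :: IO ()
main = do
  conn <- open "example.db"   -- hypothetical database file
  execute_ conn "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
  execute conn "INSERT INTO users (name) VALUES (?)" (Only ("Alice" :: Text))
  rows <- query_ conn "SELECT id, name FROM users" :: IO [(Int, Text)]
  mapM_ print rows
  close conn

Queries are plain SQL with ? placeholders, and rows come back as tuples or your own FromRow instances, which is where that flexibility comes from.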
I'm the author of opaleye-sqlite. It is a somewhat experimental version of Opaleye for SQLite. The Postgres version of Opaleye is very solid and used in production in several places, but I only know of one person who has used opaleye-sqlite in production.

Dead code and/or how to generate a cross reference from Haskell source

I've got some unused functionality in my codebase, but it's hard to identify. The code has evolved over the last year as I've explored its problem space and possible solutions. What I need to do is find that unused code so I can get rid of it. I'm happy if it deals with the problem on an exported-name basis; GHC has warnings that deal with non-exported unused code. Any tools specific to this task would be of interest.
However, I'm curious about a comprehensive cross referencing tool. I can find the unused code with such a tool. Years ago when I was working in C and assembler, I found that a good xref was a pretty handy tool, useful for many different purposes.
I'm getting nowhere with googling. Apparently in Haskell the dominant meaning of cross-reference is within literate programming. Though maybe something there would be useful.
I don’t know of such a tool, so in the past I have done a bit of a hack instead.
If you have a comprehensive test suite, you can run it with GHC’s code coverage tracing enabled. Compile with -fhpc and use hpc markup to generate annotated source. This gives you the union of unused code and untested code, both of which you would probably like to address anyway.
SourceGraph can give you a bunch of information which you may also find useful.
There is now a tool for this very purpose: https://hackage.haskell.org/package/weeder
It's been around since 2017, and while it has limitations, it definitely helps with large codebases.

Is there a typical config or property file format and library in Haskell?

I need a set of key-value pairs for configuration read in from a file. I tried using show on a Data.Map and it doesn't look at all like what I want. It seems this is something many others might have already done so I'm wondering if there is a standard way to do it and what library to use.
Go to Hackage.
Click on "Packages".
Search for "config".
Notice ConfigFile(TH), EEConfig, and tconfig.
Read the Haddock documentation.
Select a couple and implement your task.
Blog about your findings so the rest of us can learn from your newfound expertise (thanks!).
EDIT:
I've recently used configurator, which was easy enough. I suggest you try that one! (A minimal sketch follows below.)
(Yes, yes. If I took my own advice I would have made a blog for you all)
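For what it's worth, here's a minimal configurator sketch under assumed names (the file app.cfg and the database.* keys are made up):

{-# LANGUAGE OverloadedStrings #-}
import Data.Configurator
import Data.Text (Text)

-- app.cfg (hypothetical) might contain:
--   database { host = "localhost"
--              port = 5432 }
main :: IO ()
main = do
  cfg  <- load [Required "app.cfg"]
  host <- require cfg "database.host" :: IO Text
  port <- lookupDefault 5432 cfg "database.port" :: IO Int
  print (host, port)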
The configuration category on Hackage should list all relevant libraries:
http://hackage.haskell.org/packages/#cat:Configuration
I have researched the topic myself now, and my conclusion is:
configurator is very good, but it's currently only for user-edited configurations. The application only reads the configuration and cannot modify it. So it's more for server-side applications.
tconfig has a simple API and looked like what I wanted, if maybe a bit raw, until I realized it's unmaintained: some commits that are really important for using the library are applied on GitHub, but the Hackage package was not updated.
Other solutions didn't look like they'd work for me, or I didn't like the API, but every application (and taste) is different.
I think using JSON, for instance, is not a good solution, because at least with Aeson, when you add new settings in a new release, old JSON files from the previous version, which lack the new member, won't load (see the sketch below). Also, I find that solution a bit verbose.
The conclusion of my research is that I wrote my own library, app-settings, which aims to be key-value, read-write, and with as succinct and type-safe an API as possible. You'll also find it in the Hackage links for the Configuration category that I gave.
So to summarize, I think configurator is the standard for read-only configurations (and it's very powerful too, you can split the configuration file with imports for instance). For read-write there are many small libraries, some unmaintained, and no real standard I think.
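To illustrate the Aeson point above, a minimal sketch with hypothetical record and field names: a plain (.:) parse rejects old files that lack a newly added key, while (.:?) with a default keeps them loading.

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson

-- Hypothetical settings record; newSetting is a field added in a later release.
data Settings = Settings
  { userName   :: String
  , newSetting :: Int
  } deriving Show

instance FromJSON Settings where
  parseJSON = withObject "Settings" $ \o ->
    Settings <$> o .:  "userName"
             -- a plain (.:) here would reject old files missing the key;
             -- (.:?) with a default keeps them loading
             <*> o .:? "newSetting" .!= 0

main :: IO ()
main = print (eitherDecode "{\"userName\":\"bob\"}" :: Either String Settings)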
UPDATE 2018: be sure to look at dhall.
I'd also suggest just using Text.JSON or one of the yaml libraries available (I prefer JSON myself, but...).
The configfile package looks like what you want.
