Can't parse COBOL source code with Antlr4 - antlr4
I'm learning how to use ANTLR4 to parse COBOL source code. Currently, I'm following the steps exactly as demonstrated by Enam Biswas in his YouTube video.
I've downloaded antlr-4.7.1-complete.jar and placed it in C:\Javalib. I've also added that path to my Windows environment variables and created the antlr.bat and grun.bat files.
For the grammar files, I'm using Cobol85.g4 and Cobol85Preprocessor.g4, which were taken from Ulrich Wolffgang's GitHub repository. I'm using the HellowWorld.cbl sample source file to see how the parsing works.
After running the antlr.bat, I executed the command below:
C:\Users\ffa\Desktop\COBOL>grun Cobol85Preprocessor startRule HellowWorld.cbl
As a result, I received the error message shown below:
Warning: TestRig moved to org.antlr.v4.gui.TestRig; calling automatically
Can't load Cobol85.g4 as lexer or parser
As I'm not sure why I can't get it parsed as shown in the video, I also tried the commands below:
C:\Users\ffa\Desktop\COBOL>grun Cobol85 startRule HellowWorld.cbl
and
C:\Users\ffa\Desktop\COBOL>grun Cobol85* startRule HellowWorld.cbl
In the end, I still got the same error message. I searched Google and found a suggestion to download antlr-runtime-4.7.1.jar, so I downloaded that file and placed it in the same directory, C:\Javalib.
When I executed the commands above again, I received a different message:
Error: Could not find or load main class org.antlr.v4.runtime.misc.TestRig
Could anyone please help me parse the COBOL source code with ANTLR4? It would also be good if someone could explain the difference between Cobol85.g4 and Cobol85Preprocessor.g4.
From your console, go into a new directory and do the following:
1. Download the ANTLR jar:
wget http://www.antlr.org/download/antlr-4.7.1-complete.jar
(or just download it if wget is not available on your console)
2. Download the COBOL grammar:
wget https://raw.githubusercontent.com/antlr/grammars-v4/master/cobol85/Cobol85.g4
3. Download a COBOL source file:
wget https://raw.githubusercontent.com/uwol/cobol85parser/master/src/test/resources/io/proleap/cobol/ast/HelloWorld.cbl
4. Generate all .java lexer and parser classes from the COBOL grammar:
java -jar antlr-4.7.1-complete.jar Cobol85.g4
5. Compile all .java source files:
javac -cp antlr-4.7.1-complete.jar *.java
6. Feed the COBOL source file to the generated lexer/parser
... and instruct the parser to start with the startRule rule:
java -cp .;antlr-4.7.1-complete.jar org.antlr.v4.gui.TestRig Cobol85 startRule -gui < HelloWorld.cbl
(*nix users, do java -cp .:antlr-4.7.1-complete.jar org.antlr.v4.gui.TestRig Cobol85 startRule -gui < HelloWorld.cbl)
If the < does not work on Windows, just do this:
java -cp .;antlr-4.7.1-complete.jar org.antlr.v4.gui.TestRig Cobol85 startRule -gui
The prompt will now be silent. It is waiting for you to type in some source to be parsed. When you're done typing in some COBOL code, terminate with CTRL+Z (*nix users do CTRL+D).
That's it.
Now there are some errors printed to your console, meaning the COBOL parser
cannot properly parse the source file. Whether that is because the source
first has to be run through the preprocessor,
or because the input is invalid, I don't know.
Disclaimer: I am the author of these COBOL ANTLR4 grammar files.
The parser generated from grammar Cobol85.g4 has to be provided with COBOL source code, which has been preprocessed with a COBOL preprocessor. Cobol85Preprocessor.g4 is at the core of this preprocessor and enables parsing of statements such as COPY REPLACE, EXEC SQL etc.
Cobol85Preprocessor.g4 is meant to be augmented with quite extensive additional logic, which is not included in the grammar files and enables normalization of line formats, line breaks, comment lines, comment entries, EXEC SQL, EXEC CICS and so on.
The ProLeap COBOL parser written by me implements all of this in Java based on the files Cobol.g4 and Cobol85Preprocessor.g4.
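To give a rough idea of what "normalization of line formats" and "comment lines" can mean for fixed-format COBOL, here is a small sketch. It is not the ProLeap implementation; normalize_fixed_format is a hypothetical helper that only assumes the classic column layout (columns 1-6 sequence area, column 7 indicator, columns 8-72 code, columns 73-80 identification area) and ignores continuation lines, COPY/REPLACE and EXEC blocks entirely.

```python
def normalize_fixed_format(source):
    """Very rough normalization of fixed-format COBOL lines (illustrative only)."""
    result = []
    for line in source.splitlines():
        # Column 7 (index 6) is the indicator area; '*' and '/' mark comment lines.
        indicator = line[6] if len(line) > 6 else " "
        if indicator in "*/":
            continue
        # Keep only columns 8-72, dropping the sequence and identification areas.
        code = line[7:72].rstrip()
        if code:
            result.append(code)
    return "\n".join(result)
```

A real preprocessor has to do considerably more than this before the Cobol85.g4 parser can accept the text, which is the point of the extra logic described above.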
Related
Custom extension for haskell file
Is it possible to customize the extension that haskell files can have? That is, to tell GHC that a file with extension .yy.xxx should be accepted as a valid haskell file, and that a file with extension .yy.lxx should be accepted as literate haskell?
GHC has a -x option to override the meaning of file suffixes, see the user guide: -x ⟨suffix⟩ Causes all files following this option on the command line to be processed as if they had the suffix ⟨suffix⟩. For example, to compile a Haskell module in the file M.my-hs, use ghc -c -x hs M.my-hs. I've used this to compile .md files as .lhs (instead of storing the files directly as .lhs, which may prevent other tooling from telling the format to render from).
programmatically access IME
Is there a way to access a Japanese or Chinese IME either from the command line or from Python? I have Linux/OS X/Win8 boxes, so whichever system exposes the most easily accessible API is fine. I'm experimenting with building a Japanese kana-kanji conversion algorithm and would like to establish a baseline using existing tools. I also have some collections of kana I would like to process. Preferably I would like something along the lines of
$ ime JP "きしゃのきしゃがきしゃできしゃした"
貴社の記者が汽車で帰社した
I've looked at anthy, mozc and dbus on Linux but can't find any way to interact with them via the terminal or scripting (such as Python).
Anthy provides a CLI tool
Personally, I prefer Google's IME / mozc for better results, but perhaps this helps.
The source for anthy (sourceforge, file anthy-9100h.tar.gz) includes a simple CLI program for testing. Download the source file, extract it, and run
./configure && make
Enter the directory test, which contains the binary anthy. By default, it reads from test.txt and uses EUC_JP encoding.
Simple test:
Input file test.txt
*にほんごにゅうりょく
*もももすももももものうち。
Run (using iconv to convert to UTF-8):
./anthy --all | iconv -f EUC-JP -t UTF-8
Output:
1:(にほんごにゅうりょく)
|にほんご|にゅうりょく
にほんご(日本語:(1,1000,N,72089)2500,001 ,にほんご:(N,0,-)2 ,ニホンゴ:(N,0,-)1 ,):
にゅうりょく(入力:(1,1000,N,62394)2500,001 ,にゅうりょく:(N,0,-)2 ,ニュウリョク:(N,0,-)1 ,):
2:(もももすももももものうち。)
|ももも|すももも|もものうち|。
ももも(桃も:(,1000,Ny,72089)225,279 ,ももも:(N,1000,Ny,72089)220,773 ,モモも:(,1000,Ny,72089)205,004 ,腿も:(,1000,Ny,72089)204,722 ,股も:(,1000,Ny,72089)146,431 ,モモモ:(N,0,-)1 ,):
すももも(すももも:(N,1000,Ny,72089)202,751 ,スモモも:(,1000,Ny,72089)168,959 ,李も:(,1000,Ny,72089)168,677 ,スモモモ:(N,0,-)1 ,):
もものうち(桃のうち:(,1000,N,655)2,047 ,もものうち:(N,1000,N,655)2,006 ,モモのうち:(,1000,N,655)1,863 ,腿のうち:(,1000,N,655)1,861 ,股のうち:(,1000,N,655)1,331 ,モモノウチ:(N,0,-)1 ,):
。(。:(1N,100,N,70203)57,040 ,.:(1,100,N,70203)52,653 ,.:(1,100,N,70203)3,840 ,):
You can uncomment some printf statements in the source files test/main.c and src-main/context.c to make the output more readable/parsable, e.g.:
1 にほんごにゅうりょく
にほんご 日本語
にゅうりょく 入力
2 もももすももももものうち。
ももも 桃も
すももも すももも
もものうち 桃のうち
。 。
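Since the question asks for scripting access, the same pipeline can be driven from Python instead of piping through iconv. This is only a sketch under the assumptions of the answer above: the test binary is ./anthy, it accepts --all, and it writes EUC-JP to standard output. The function names are made up for illustration.

```python
import subprocess


def decode_anthy_output(raw):
    """Decode anthy's EUC-JP bytes into a Python str (the job iconv does above)."""
    return raw.decode("euc_jp")


def run_anthy(binary="./anthy"):
    """Run the test binary with --all, as in the answer, and return decoded text."""
    completed = subprocess.run([binary, "--all"], stdout=subprocess.PIPE, check=True)
    return decode_anthy_output(completed.stdout)
```

run_anthy() has to be called from the test directory, because the binary reads test.txt from its working directory.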
emacs syntax highlighting for jags / bugs
Are there packages to color-highlight JAGS and BUGS model files? I have ESS installed, but it doesn't seem to recognize .bug files or JAGS/BUGS syntax out of the box.
Syntax highlighting
I'm using ESS 5.14 (from ELPA), and syntax highlighting and smart underscore work fine for me with GNU Emacs 24.1.1. If you want to highlight a given file, you can try M-x ess-jags-mode, or add an auto-mode-alist entry to highlight JAGS files every time, e.g.
(add-to-list 'auto-mode-alist '("\\.jag\\'" . jags-mode))
However, that is not really needed, since you can simply (require 'ess-jags-d) in your .emacs. There's a corresponding mode for BUGS files. This file was already included in earlier releases (at least 5.13), and it comes with the corresponding auto-mode-alist entry (for the "\\.[jJ][aA][gG]\\'" extension). (Please note that there seems to be a subtle issue with using both JAGS and BUGS, but I can't say more because I only use JAGS.)
Running command file
If you want to stick with Emacs for running JAGS (i.e., instead of rjags or other R interfaces to JAGS/BUGS), there's only one command to know. As described in the ESS manual, when working on a command file, C-c C-c should create a .jmd file, and pressing C-c C-c again should submit this command file to an Emacs *shell* (in a new buffer) and call jags in batch mode. Internally, this command is bound to a 'Next Action' instruction (ess-*-next-action). For example, using the mice data that comes with the JAGS sample files, you should get a mice.jmd that looks like this:
model in "mice.jag"
data in "mice.jdt"
compile, nchains(1)
parameters in "mice.in1", chain(1)
initialize
update 10000
update 10000
# parameters to "mice.to1", chain(1)
coda *, stem("mice")
system rm -f mice.ind
system ln -s miceindex.txt mice.ind
system rm -f mice1.out
system ln -s micechain1.txt mice1.out
exit
Local Variables:
ess-jags-chains:1
ess-jags-command:"jags"
End:
Be careful with default filenames! Here, data are assumed to be in file mice.jdt and initial values for parameters in mice.in1. You can change this in the Emacs buffer if you want, as well as modify the number of chains to use.
Compiling user-written source code files for beginners?
I'm not a complete noob to Linux (I'm using Fedora 16), but I've always had difficulty compiling programs from the command line and I would really like to learn how to do it the right way. I've had experience with Python, Ruby, Perl, PHP, Lua, bash and other languages. Recently I've been getting into Fortran code, and here's the problem: every time I run the f77 command with an option and filename, I get one of the following errors:
[Eddie_Nygma@localhost ~]$ f77 -S #classicpayroll.f#
f77: no input files
[Eddie_Nygma@localhost ~]$ f77 -o #classicpayroll.f#
f77: argument to `-o' missing
I really need to get this compiled and running for my CS class. Could somebody please help me out? Could it be some sort of syntax error, and how do I correct it?
I used Fortran long ago. In Fortran, the first character of a file name should be a letter (A-Z or a-z); a digit or # is not allowed.
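One more thing worth checking, independent of any Fortran naming rule: in most shells an unquoted word that begins with # starts a comment, so #classicpayroll.f# may never reach f77 at all. That would be consistent with both "no input files" and "argument to `-o' missing". The sketch below uses Python's shlex module with comments=True as a stand-in for the shell's tokenizer; it is only an approximation of real shell parsing, and shell_words is a made-up helper name.

```python
import shlex


def shell_words(command_line):
    """Split a command line roughly the way a POSIX shell would, honoring # comments."""
    return shlex.split(command_line, comments=True)


# The leading '#' swallows the file name, so f77 only sees the option.
print(shell_words("f77 -S #classicpayroll.f#"))
# Quoting the name keeps it as a real argument.
print(shell_words("f77 -S '#classicpayroll.f#'"))
```

Names of the form #name# are also what Emacs uses for auto-save files, so the file that was meant to be compiled is probably classicpayroll.f itself.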
Windres syntax error
I am working in a MinGW environment (downloaded with their installer on 12/12/2011). I am attempting to compile a resource (.rc) file using Windres. The specific command I use is
Windres -O coff About1.rc -o About1.res
Windres generates at least 100 lines of warning messages reading "warning: null characters ignored". Following this, Windres emits "About1.rc:1:syntax error".
As a matter of fact, there are no null characters in the About1.rc file. In addition, the first line of the file is an include statement: #include "dlgresource.h". I played around and removed this statement, and it turns out that it doesn't matter what I put there: I get the same flurry of messages and the syntax error notification.
To make things more confusing, this same .rc file compiles without any problem using MSFT's rc.exe. The resulting .res file links smoothly with the program .obj file and runs perfectly.
I have no idea what is going on. Any ideas?
Thanks, Mark Allyn
Your .rc file is probably encoded as UTF-16. That's what's required in general by Microsoft's rc.exe, in order to be able to deal with international characters, but GNU windres.exe can only deal with ANSI encoding.
One workaround is to convert the file to ANSI on the spot (possibly losing e.g. Russian or Greek characters):
> chcp 1252
Active code page: 1252
> type my.rc | windres --output-format=COFF -o my.res
> _
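If the file really is UTF-16, that would also explain the "null characters ignored" warnings, since plain ASCII text stored as UTF-16 has a zero byte in every other position. As an alternative to the chcp/type trick, the conversion can be done once and saved to disk. The following is a minimal sketch, assuming the file carries a UTF-16 byte order mark and that code page 1252 is an acceptable target; both function names are made up for illustration.

```python
def utf16_to_ansi(raw, ansi_codec="cp1252"):
    """Re-encode UTF-16 bytes as single-byte 'ANSI' bytes (lossy for other scripts)."""
    # The "utf-16" codec reads the byte order mark and strips it.
    text = raw.decode("utf-16")
    # errors="replace" turns characters outside the code page into '?'.
    return text.encode(ansi_codec, errors="replace")


def convert_rc_file(src_path, dst_path, ansi_codec="cp1252"):
    """Write an ANSI copy of a UTF-16 resource script, leaving the original alone."""
    with open(src_path, "rb") as src:
        data = utf16_to_ansi(src.read(), ansi_codec)
    with open(dst_path, "wb") as dst:
        dst.write(data)
```

For example, convert_rc_file("About1.rc", "About1_ansi.rc") writes a copy that can then be passed to windres; the output name here is arbitrary.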
You probably used VS or a similar tool to generate the file. Some parts of the character encoding are invisible to you and result in null characters and the like. Generate a new .res file with the same content; don't copy/paste the content, type it in yourself.
Try: windres About1.rc -o About1.o and then just use the resulting .o file instead of the originally intended .res file.
I had the same trouble as you today. I know a lot of time has passed since your question, but I'm writing this in the hope that it can be useful for someone.
First, I obtained an object file (.o) compiled using Cygwin, by running:
windres -o resource.o resource.rc
By doing that, you don't need the .res file, only the .o one, and you can then link this object with all the others when you compile your program using the GNU tools, for instance:
g++ Header_files CPP_files flags ... -o program.exe resource.o -lm