How to use cTAKES from the command line? - nlp

I wonder how to use Apache cTAKES from the command line.
E.g. :
I have a file note.txt that contains some text like "Patient had
elevated blood sugar but tests confirm no diabetes. Patient's father had
adult onset diabetes."
I want to use the provided analysis engine
\apache-ctakes-3.2.2-bin\apache-ctakes-3.2.2\desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextUMLSProcessor.xml
How can I get the analyse engine's output (viz. the annotations) using the
command line (i.e. without using graphical user interfaces such as UIMA CAS
Visual Debugger or the Collection Processing Engine)? I'd prefer to
use the provided JAR files rather than having to compile the code.
The question is fairly simple but I couldn't find the information in
cTAKES's README or
on Confluence.

Please try the following steps to use cTAKES CPE from the command line (the key class is "org.apache.uima.examples.cpe.SimpleRunCPE"):
Change directory to $CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/
Copy test_plaintext.xml to another file (e.g., "test_plaintext_test.xml").
Edit "test_plaintext_test.xml" to set input directory; find "nameValuePair" with name = "InputDirectory", and set the value string to the input directory. The following example set the input directory as "$CTAKES_HOME/note_input":
<nameValuePair>
<name>InputDirectory</name>
<value>
<string>note_input</string>
</value>
</nameValuePair>
Similarly, edit "test_plaintext_test.xml" to set the output directory ("$CTAKES_HOME/result_output" in the following example):
<nameValuePair>
<name>OutputDirectory</name>
<value>
<string>result_output</string>
</value>
</nameValuePair>
Save "test_plaintext_test.xml" and change directory to $CTAKES_HOME/bin.
Copy runctakesCPE.sh to another file (e.g., "runctakesCPE_CLI.sh").
Edit "runctakesCPE_CLI.sh"; replace the last line ("java ...") to the following line ("USER" and "PW" should be replaced by your UMLS Username and Password, and the memory setting Xms and Xms may be adjusted based on the size of memory on your machine):
java -Dctakes.umlsuser=USER -Dctakes.umlspw=PW -cp $CTAKES_HOME/lib/*:$CTAKES_HOME/desc/:$CTAKES_HOME/resources/ -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms2g -Xmx3g org.apache.uima.examples.cpe.SimpleRunCPE $CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test_plaintext_test.xml
Save "runctakesCPE_CLI.sh", and then create the input directory ("$CTAKES_HOME/note_input") and the output directory ("$CTAKES_HOME/result_output").
Put your note.txt to the input directory (e.g., "$CTAKES_HOME/note_input/note.txt"), and then run "runctakesCPE_CLI.sh".
cTAKES CPE will start running under command line mode, and the resulting file will be generated in the output directory (e.g., "$CTAKES_HOME/result_output/note.txt.xml").
I actually used your note.txt to run the steps above and here are the first several lines of the generated note.txt.xml:
<?xml version="1.0" encoding="UTF-8"?><CAS version="2">
<uima.cas.Sofa _indexed="0" _id="3" sofaNum="1" sofaID="_InitialView" mimeType="text" sofaString="Patient had elevated blood sugar but tests confirm no diabetes. Patient's father had adult onset diabetes.
"/>
<org.apache.ctakes.typesystem.type.structured.DocumentID _indexed="1" _id="1" documentID="note.txt"/>
<uima.tcas.DocumentAnnotation _indexed="1" _id="10" _ref_sofa="3" begin="0" end="107" language="x-unspecified"/>
<org.apache.ctakes.typesystem.type.textspan.Segment _indexed="1" _id="15" _ref_sofa="3" begin="0" end="107" id="SIMPLE_SEGMENT"/>
<org.apache.ctakes.typesystem.type.textspan.Sentence _indexed="1" _id="21" _ref_sofa="3" begin="0" end="63" sentenceNumber="0"/>
Hope this helps :-)

java -Dctakes.umlsuser=USER -Dctakes.umlspw=PW -cp $CTAKES_HOME/lib/*;$CTAKES_HOME/desc/;$CTAKES_HOME/resources‌​/ -
Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms2g -Xmx3g to_replace $CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_p‌​rocessing_engine/tes‌​t_plaintext_test.xml
replace "to_replace" with either
org.apache.ctakes.ytex.tools.RunCPE or
org.apache.ctakes.core.cpe.CmdLineCpeRunner

Related

How to buil an app like google pdf viewer? [duplicate]

So the idea is to make an encryption software which will work only on .txt files and apply some encryption functions on it and generate a new file. To avoid the hassle of user having to drag-and-drop the file, I have decided to make an option similar to my anti-virus here.
I want to learn how to make these for various OS, irrespective of the architecture :)
What are these menus called? I mean the proper name so next time I can refer to them in a more articulate way
How to make these?
My initial understanding:
What I think it will do is: pass the file as an argument to the main() method and then leave the rest of the processing to me :)
Probably not exactly the answer you were hoping for, but it seems that this is a rather complicated matter. Anyway, I'll share what I know about it and it will hopefully prove enough to (at least) get you started.
Unfortunately, the easiest way to create a context menu using Java is editing the Registry. I'll try to summarize the milestones of the overall requirements and steps to achieve our objective.
<UPDATE>
See at the end of the post for links to sample code and a working demo.
</UPDATE>
What needs to be done
We need to edit the Registry adding an additional entry (for our java-app) in the context menus of the file-types we are interested in (e.g. .txt, .doc, .docx).
We need to determine which entries in Registry to edit, because our targeted file-extensions might be associated with another 'Class' (I couldn't test it on XP, but on Windows 7/8 this seems to be the case). E.g. instead of editing ...\Classes\.txt we might need to edit ...\Classes\txtfile, which the .txt Class is associated with.
We need to specify the path to the installed jre (unless we can be sure that the directory containing javaw.exe is in the PATH variable).
We need to insert the proper keys, values and data under the proper Registry nodes.
We need a java-app packaged as a .JAR file, with a main method expecting a String array containing one value that corresponds to the path of the file we need to process (well, that's the easy part - just stating the obvious).
All this is easier said than done (or is it the other way around ?), so let's see what it takes to get each one done.
First of all, there are some assumption we'll be making for the rest of this post (for the sake of simplicity/clarity/brevity and the like).
Assumptions
We assume that the target file-category is .TXT files - the same steps could be applied for every file-category.
If we want the changes (i.e. context-menus) to affect all users, we need to edit Registry keys under HKCR\ (e.g. HKCR\txtfile), which requires administrative priviledges.
For the sake of simplicity, we assume that only current user's settings need to be changed, thus we will have to edit keys under HKCU\Software\Classes (e.g. HKCU\Software\Classes\txtfile), which does not require administrative priviledges.
If one chooses to go for system-wide changes, the following modifications are necessary:
In all REG ADD/DELETE commands, replace HKCU\Software\Classes\... with HKCR\... (do not replace it in REG QUERY commands).
Have your application run with administrative priviledges. Two options here (that I am aware of):
Elevate your running instance's priviledges (can be more complicated with latest windows versions, due to UAC). There are plenty of resources online and here in SO; this one seems promising (but I haven't tested it myself).
Ask the user to explicitely run your app "As administrator" (using right-click -> "Run as administrator" etc).
We assume that only simple context-menu entries are needed (as opposed to a context-submenu with more entries).
After some (rather shallow) research, I have come to believe that adding a submenu in older versions of Windows (XP, Vista), would require more complex stuff (ContextMenuHandlers etc). Adding a submenu in Windows 7 or newer is considerably more easy. I described the process in the relevant part of this answer (working demo provided ;)).
That said, let's move on to...
Getting things done
You can achieve editing the Registry by issuing commands of the form REG Operation [Parameter List], with operations involving ADD, DELETE, QUERY (more on that later).
In order to execute the necessary commands, we can use a ProcessBuilder instance. E.g.
String[] cmd = {"REG", "QUERY", "HKCR\\.txt", "/ve"};
new ProcessBuilder(cmd).start();
// Executes: REG QUERY HKCR\.txt /ve
Of course, we will probably want to capture and further process the command's return value, which can be done via the respective Process' getInputStream() method. But that falls into scope "implementation details"...
"Normally" we would have to edit the .txt file-class, unless it is associated with another file-class. We can test this, using the following command:
// This checks the "Default" value of key 'HKCR\.txt'
REG QUERY HKCR\.txt /ve
// Possible output:
(Default) REG_SZ txtfile
All we need, is parse the above output and find out, if the default value is empty or contains a class name. In this example we can see the associated class is txtfile, so we need to edit node HKCU\Software\Classes\txtfile.
Specifying the jre path (more precisely the path to javaw.exe) falls outside the scope of this answer, but there should be plenty of ways to do it (I don't know of one I would 100% trust though).
I'll just list a few off the top of my head:
Looking for environment-variable 'JAVA_HOME' (System.getenv("java.home");).
Looking in the Registry for a value like HKLM\Software\JavaSoft\Java Runtime Environment\<CurrentVersion>\JavaHome.
Looking in predifined locations (e.g. C:\Program Files[ (x86)]\Java\).
Prompting the user to point it out in a JFileChooser (not very good for the non-experienced user).
Using a program like Launch4J to wrap your .JAR into a .EXE (which eliminates the need of determining the path to 'javaw.exe' yourself).
Latest versions of Java (1.7+ ?) put a copy of javaw.exe (and other utilities) on the path, so it might be worth checking that as well.
3. So, after collecting all necessary data, comes the main part: Inserting the required values into Registry. After compliting this step, our HKCU\Software\Classes\txtfile-node should look like this:
HKCU
|_____Software
|_____Classes
|_____txtfile
|_____Shell
|_____MyCoolContextMenu: [Default] -> [Display name for my menu-entry]
|_____Command: [Default] -> [<MY_COMMAND>]*
*: in this context, a '%1' denotes the file that was right-clicked.
Based on how you addressed step (1.2), the command could look like this:
"C:\Path\To\javaw.exe" -jar "C:\Path\To\YourApp.jar" "%1"
Note that javaw.exe is usually in ...\jre\bin\ (but not always only there - recently I've been finding it in C:\Windows\System32\ as well).
Still being in step (1.3), the commands we need to execute, in order to achieve the above structure, look as follows:
REG ADD HKCU\Software\Classes\txtfile\Shell\MyCoolContextMenu /ve /t REG_SZ /d "Click for pure coolness" /f
REG ADD HKCU\Software\Classes\txtfile\Shell\MyCoolContextMenu\Command /ve /t REG_SZ /d "\"C:\Path\To\javaw.exe\" -jar \"C:\Path\To\Demo.jar\" \"%%1\" /f"
// Short explanation:
REG ADD <Path\To\Key> /ve /t REG_SZ /d "<MY_COMMAND>" /f
\_____/ \___________/ \_/ \_______/ \_______________/ \_/
__________|_______ | | |___ | |
|Edit the Registry | | _______|________ | _______|_______ |
|adding a key/value| | |Create a no-name| | |Set the data | |
-------------------- | |(default) value | | |for this value.| |
| ------------------ | |Here: a command| |
_______________|______________ | |to be executed.| |
|Edit this key | | ----------------- |
|(creates the key plus | ____|_________ _________|_____
| any missing parent key-nodes)| |of type REG_SZ| |No confirmation|
-------------------------------- |(string) | -----------------
----------------
Implementation Considerations:
It is probably a good idea to check if our target class (e.g. txtfile), does already have a context-menu entry named "MyCoolContextMenu", or else we might be overriding an existing entry (which will not make our user very happy).
Since the data part of the value (the part that comes after /d and before /f) needs to be enclosed in "", keep in mind that you can escape " inside the string as \".
You also need to escape the %1 so that it is stored in the Registry value as-is (escape it like: %%1).
It is a good idea to provide your user with an option to "un-register" your context-menu entry.
The un-registering can be achieved by means of the command:
REG DELETE HKCU\Software\Classes\txtfile\Shell\MyCoolContextMenu /f
Omitting the /f at the end of the commands may prompt the "user" (in this case your app) for confirmation, in which case you need to use the Process' getOutputStream() method to output "Yes" in order for the operation to be completed.
We can avoid that unnecessary interaction, using the force flag (/f).
Almost, there !
Finding ourselves at step (2), we should by now have the following:
A context-menu entry registered for our files in category txtfile (note that it is not restricted to .TXT files, but applies to all files pertained by the system as "txtfiles").
Upon clicking that entry, our java-app should be run and its main() method passed a String array containing the path to the right-clicked .TXT file.
From there, our app can take over and do its magic :)
That's (almost) all, folks !
Sorry, for the long post. I hope it turns out to be of use to someone.
I'll try to add some demo-code soon (no promises though ;)).
UPDATE
The demo is ready !
I created a tiny demo-project.
Here is the source code.
Here is a ready-to-go JARred App.

How to get non standard tags in rpm query

I wanted to add things such as Size, BuildHost, BuildDate etc in rpm query but adding this thing in spec file results in unknown tag?? How can I do this so that these things are reflected when i give the rpm query command?
These tags are determined when the package is built; they cannot be forced to specific values.
For example BuildHost is hardcoded in rpmbuild and cannot be changed. There is RFE https://bugzilla.redhat.com/show_bug.cgi?id=1309367 to allow it modify from command line. But right now you cannot change it by any tag in spec file nor by passing some option on command line to rpmbuild.
I assume it will be very similar to other values you specified.
RPM5 permits arbitrary unique tag names to be added to header metadata.
The tag names are configured in a colon separated list in a macro. Then the new tags can be used in spec files and can be extracted using --queryformat.
All arbitrary tags are string (or string array) valued.

Difference in bdf format between Patran and Virtual Lab

Here is my problem, i received a body-in-white in the bdf format which, by reading the file, seems to have been created through Patran 2013 and i need to make some changes to the body-in-white. I use LMS Virtual Lab in order to make my modifications but unfortunately with Virtual Lab i can only import the bdf file and then after modifications i need to export the work into a new bdf file. One remarkable difference is the size of the bdf file shortening from 128MB to 92MB.
My next step is to compute the normal modal analysis through MD Nastran which works fine with the original bdf file giving me an OP2 result file. However with the modified bdf file it runs, gives an exit(0) (which means no abnormal exit) as does the original file and then when i try to look at the resulting OP2 file (in Nastran NX) it tells me it can't find any results (whereas in the original it does).
Any thoughts ?
Maybe you already found the answer but I would do the following:
First step would be to check f06 for Fatal Errors. If no errors found, you can check if Eigenvalues are listet in f06. If no clue by checking f06, I would check what LMS is doing with the model:
1.) input original file in LMS and output it as BDF
-> check if output-file contains same entities as input-file (number of grids,number of rbe2/3, number of quads,trias,hex, number of forces,spc)
2.) if 1.) shows you that input and output are the same, this means that LMS output-settings are ok
3.) if 1.) shows that there are some small differences, but size of the file is the same it means that some LMS settings are not suitable for your output
-> check LMS output Settings
-> start modal analysis
I hope it helps

Stanford NER CharacterOffsetBegin

I am using Stanford CoreNLP for NER for a list of short documents.
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP
-annotators tokenize,ssplit,pos,lemma,ner -ssplit.eolonly -pos.model edu/stanford/nlp/models/pos-tagger/english-caseless-left3words-distsim.tagger
-ner.model edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz
-file .../input -outputDirectory .../stanford_ner
The problem is the CharacterOffsetBegin and CharacterOffsetEnd I get from each token are continuous number from the previous documents. Therefore for example the very first token of document_2 has a CharacterOffsetBegin of 240 rather than 0. Is there any option I can use in the command line? Any help would be greatly appreciated, thanks!
Yes--if you split your input into separate files. There's a -filelist option for batch jobs. In your case, each line of the file list has a path to a document file. For example, if you have all of your separate doc files in a directory .../input, then input.txt contains something like:
.../input/doc_1.txt
.../input/doc_2.txt
.../input/doc_3.txt
Though it might be a good idea to put the full paths there if possible. Then, you'd execute CoreNLP as such:
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP
-annotators tokenize,ssplit,pos,lemma,ner -ssplit.eolonly -pos.model edu/stanford/nlp/models/pos-tagger/english-caseless-left3words-distsim.tagger
-ner.model edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz
-filelist .../input.txt -outputDirectory .../stanford_ner
If you write some script to split input up into multiple documents, it would probably be a good idea to generate input.txt concurrently.
This will restart the token counter for each document you process.

How to concatenate SVG files lengthwise from linux command line?

I have a series of square SVG files that I would like to arrange lengthwise into one super long SVG file.
I attempted to use imagemagick to combine them. Based on this page:
http://linux.about.com/library/cmd/blcmdl1_ImageMagick.htm
and this
http://www.imagemagick.org/Usage/compose/
I tried this command
composite 'file1.svg' 'file2.svg' +adjoin 'outputfile.svg'
However, I received the following error message:
composite: unrecognized option '+adjoin' # error/composite.c/CompositeImageCommand/565.
I tried several other imagemagick commands (convert, display), but had no success. How can I combine these files on the command line? Is there an inkscape command that does this?
There's currently no convenient way to do this with only the command line and no custom scripting.
Closest pre-written thing I could find currently (4-16-2012) is https://github.com/astraw/svg_stack, which lets you write commands of the form:
svg_stack.py --direction=h --margin=100 red_ball.svg blue_triangle.svg > shapes.svg
to concatenate.
It should be pretty easy if you're willing to use a scripting language. For each file, just add a prefix to all id tags; so in file 1, id="circle" becomes id="file1_circle", and in file 2, id="circle" becomes id="file2_circle".
In most cases you would get away with a trivial search and replace (find id=" and replace it with id="fileX_) although it is possible to have cases where this won't work (specifically if that find string appears in an item of text, for example).
If you want to do this 'the proper way', you'll need an XML parser (such as XMLReader in PHP).

Resources