OpenSeadragon: how to configure a plugin?

[openseadragon-canvas-overlay] requires PaperJS
(anonymous) @ openseadragon-paperjs-overlay.js:19
(anonymous) @ openseadragon-paperjs-overlay.js:115
I have already added the scripts in angular.cli.

It's resolved: the order of script inclusion matters. PaperJS has to be loaded before the overlay plugin that requires it.


How can I add a RabbitMQ queue to ELK?

I want to display RabbitMQ queues in ELK. To do this I started Kibana, Elasticsearch and Logstash; all the details are below.
The result of running Kibana
samira@elk:/var/www/apps/kibana-8.4.2-linux-x86_64(1)/kibana-8.4.2/bin$ ./kibana
[2022-10-20T12:37:57.842+03:30][INFO ][node] Kibana process configured with roles: [background_tasks, ui]
[2022-10-20T12:38:08.711+03:30][INFO ][http.server.Preboot] http server running at http://localhost:5601
[2022-10-20T12:38:08.740+03:30][INFO ][plugins-system.preboot] Setting up [1] plugins: [interactiveSetup]
[2022-10-20T12:38:08.771+03:30][WARN ][config.deprecation] The default mechanism for Reporting privileges will work differently in future versions, which will affect the behavior of this cluster. Set "xpack.reporting.roles.enabled" to "false" to adopt the future behavior before upgrading.
[2022-10-20T12:38:08.935+03:30][INFO ][plugins-system.standard] Setting up [121] plugins: [translations,monitoringCollection,licensing,globalSearch,globalSearchProviders,features,mapsEms,licenseApiGuard,usageCollection,taskManager,telemetryCollectionManager,telemetryCollectionXpack,kibanaUsageCollection,share,embeddable,uiActionsEnhanced,screenshotMode,banners,newsfeed,fieldFormats,expressions,dataViews,charts,esUiShared,customIntegrations,home,searchprofiler,painlessLab,grokdebugger,management,advancedSettings,spaces,security,lists,encryptedSavedObjects,cloud,snapshotRestore,screenshotting,telemetry,licenseManagement,eventLog,actions,console,bfetch,data,watcher,reporting,fileUpload,ingestPipelines,alerting,unifiedSearch,savedObjects,graph,savedObjectsTagging,savedObjectsManagement,presentationUtil,expressionShape,expressionRevealImage,expressionRepeatImage,expressionMetric,expressionImage,controls,eventAnnotation,dataViewFieldEditor,triggersActionsUi,transform,stackAlerts,ruleRegistry,discover,fleet,indexManagement,remoteClusters,crossClusterReplication,indexLifecycleManagement,cloudSecurityPosture,discoverEnhanced,aiops,visualizations,canvas,visTypeXy,visTypeVislib,visTypeVega,visTypeTimeseries,rollup,visTypeTimelion,visTypeTagcloud,visTypeTable,visTypeMetric,visTypeHeatmap,visTypeMarkdown,dashboard,dashboardEnhanced,expressionXY,expressionTagcloud,expressionPartitionVis,visTypePie,expressionMetricVis,expressionLegacyMetricVis,expressionHeatmap,expressionGauge,lens,osquery,maps,dataVisualizer,ml,cases,timelines,sessionView,kubernetesSecurity,securitySolution,visTypeGauge,sharedUX,observability,synthetics,infra,upgradeAssistant,monitoring,logstash,enterpriseSearch,apm,dataViewManagement]
[2022-10-20T12:38:08.948+03:30][INFO ][plugins.taskManager] TaskManager is identified by the Kibana UUID: 114adb80-0285-4b29-b403-64ea1e454f19
[2022-10-20T12:38:09.009+03:30][WARN ][plugins.security.config] Session cookies will be transmitted over insecure connections. This is not recommended.
[2022-10-20T12:38:09.028+03:30][WARN ][plugins.security.config] Session cookies will be transmitted over insecure connections. This is not recommended.
[2022-10-20T12:38:09.034+03:30][INFO ][plugins.encryptedSavedObjects] Hashed 'xpack.encryptedSavedObjects.encryptionKey' for this instance: B6ABZzCc0sMI2CQc1eJYyeLXhC0I61v8xdNjUusVvp0=
[2022-10-20T12:38:09.159+03:30][INFO ][plugins.ruleRegistry] Installing common resources shared between all indices
[2022-10-20T12:38:09.189+03:30][INFO ][plugins.cloudSecurityPosture] Registered task successfully [Task: cloud_security_posture-stats_task]
[2022-10-20T12:38:09.678+03:30][INFO ][plugins.screenshotting.config] Chromium sandbox provides an additional layer of protection, and is supported for Linux Ubuntu 20.04 OS. Automatically enabling Chromium sandbox.
[2022-10-20T12:38:09.707+03:30][ERROR][elasticsearch-service] Unable to retrieve version information from Elasticsearch nodes. connect ECONNREFUSED
[2022-10-20T12:38:10.162+03:30][INFO ][plugins.screenshotting.chromium] Browser executable: /var/www/apps/kibana-8.4.2-linux-x86_64(1)/kibana-8.4.2/x-pack/plugins/screenshotting/chromium/headless_shell-linux_x64/headless_shell
[2022-10-20T12:39:34.886+03:30][INFO ][savedobjects-service] Waiting until all Elasticsearch nodes are compatible with Kibana before starting saved objects migrations...
[2022-10-20T12:39:34.887+03:30][INFO ][savedobjects-service] Starting saved objects migrations
[2022-10-20T12:39:34.935+03:30][INFO ][savedobjects-service] [.kibana] INIT -> OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT. took: 28ms.
[2022-10-20T12:39:34.937+03:30][INFO ][savedobjects-service] [.kibana_task_manager] INIT -> OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT. took: 27ms.
[2022-10-20T12:39:34.948+03:30][ERROR][savedobjects-service] [.kibana_task_manager] Action failed with 'search_phase_execution_exception: '. Retrying attempt 1 in 2 seconds.
[2022-10-20T12:39:34.949+03:30][INFO ][savedobjects-service] [.kibana_task_manager] OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT -> OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT. took: 11ms.
[2022-10-20T12:39:34.950+03:30][ERROR][savedobjects-service] [.kibana] Action failed with 'search_phase_execution_exception: '. Retrying attempt 1 in 2 seconds.
[2022-10-20T12:39:34.950+03:30][INFO ][savedobjects-service] [.kibana] OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT -> OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT. took: 15ms.
[2022-10-20T12:39:36.961+03:30][INFO ][savedobjects-service] [.kibana] OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT -> OUTDATED_DOCUMENTS_SEARCH_READ. took: 2011ms.
The result of running Elasticsearch
samira@elk:/var/www/apps/elasticsearch-8.4.2/bin$ ./elasticsearch
warning: ignoring JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64; using bundled JDK
[2022-10-20T12:39:26,417][INFO ][o.e.n.Node ] [elk.kifarunix-demo.com] version[8.4.2], pid[3594], build[tar/89f8c6d8429db93b816403ee75e5c270b43a940a/2022-09-14T16:26:04.382547801Z], OS[Linux/5.15.0-52-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/]
[2022-10-20T12:39:26,422][INFO ][o.e.n.Node ] [elk.kifarunix-demo.com] JVM home [/var/www/apps/elasticsearch-8.4.2/jdk], using bundled JDK [true]
[2022-10-20T12:39:26,422][INFO ][o.e.n.Node ] [elk.kifarunix-demo.com] JVM arguments [-Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -Djava.security.manager=allow, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j2.formatMsgNoLookups=true, -Djava.locale.providers=SPI,COMPAT, --add-opens=java.base/java.io=ALL-UNNAMED, -XX:+UseG1GC, -Djava.io.tmpdir=/tmp/elasticsearch-11974649730192173872, -XX:+HeapDumpOnOutOfMemoryError, -XX:+ExitOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Xms2g, -Xmx2g, -XX:MaxDirectMemorySize=1073741824, -XX:G1HeapRegionSize=4m, -XX:InitiatingHeapOccupancyPercent=30, -XX:G1ReservePercent=15, -Des.distribution.type=tar, --module-path=/var/www/apps/elasticsearch-8.4.2/lib, --add-modules=jdk.net, -Djdk.module.main=org.elasticsearch.server]
[2022-10-20T12:39:27,685][INFO ][c.a.c.i.j.JacksonVersion ] [elk.kifarunix-demo.com] Package versions: jackson-annotations=2.13.2, jackson-core=2.13.2, jackson-databind=, jackson-dataformat-xml=2.13.2, jackson-datatype-jsr310=2.13.2, azure-core=1.27.0, Troubleshooting version conflicts: https://aka.ms/azsdk/java/dependency/troubleshoot
[2022-10-20T12:39:28,669][INFO ][o.e.p.PluginsService ] [elk.kifarunix-demo.com] loaded module [aggs-matrix-stats]
[2022-10-20T12:39:28,670][INFO ][o.e.p.PluginsService ] [elk.kifarunix-demo.com] loaded module [analysis-common]
[2022-10-20T12:39:28,670][INFO ][o.e.p.PluginsService ] [elk.kifarunix-demo.com] loaded module [constant-keyword]
[2022-10-20T12:39:28,670][INFO ][o.e.p.PluginsService ] [elk.kifarunix-demo.com] loaded module [data-streams]
The result of running Logstash
$ bin/logstash -f sdamiii.conf
This is my sdamiii.conf
input {
  rabbitmq {
    host => "localhost"
    port => 5672
    heartbeat => 30
    durable => true
    queue => "system_logs"
    user => "guest"
    password => "guest"
    vhost => "/"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  rabbitmq {
    exchange => "system_logs"
    host => "localhost"
    exchange_type => "fanout"
    key => "logstash"
    persistent => false
  }
}
This is kibana.yml
elasticsearch.username: "kibana_system"
elasticsearch.password: "pass"
xpack.encryptedSavedObjects.encryptionKey: 5b6d5d7b20e971b2e562cc7c8ca181ae
xpack.reporting.encryptionKey: ba46a278d1dcb511339ea01f2a9d2651
xpack.security.encryptionKey: 11f8ec40b5bf442404f5c5a53b38ad13
This is elasticsearch.yml
network.host: localhost
xpack.security.enabled: false
xpack.security.enrollment.enabled: false
# Enable encryption for HTTP API client connections, such as Kibana, Logstash, and Agents
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12
# Enable encryption and mutual authentication between cluster nodes
xpack.security.transport.ssl:
  enabled: false
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
# Create a new cluster with the current node only
# Additional nodes can still join the cluster later
cluster.initial_master_nodes: ["elk.kifarunix-demo.com"]
# Allow HTTP API connections from anywhere
# Connections are encrypted and require user authentication
http.host: localhost
# ------------ X-Pack Settings (not applicable for OSS build)--------------
# X-Pack Monitoring
# https://www.elastic.co/guide/en/logstash/current/monitoring-logstash.html
#xpack.monitoring.enabled: false
#xpack.monitoring.elasticsearch.username: logstash_system
#xpack.monitoring.elasticsearch.password: password
#xpack.monitoring.elasticsearch.proxy: ["http://proxy:port"]
#xpack.monitoring.elasticsearch.hosts: ["https://es1:9200", "https://es2:9200"]
# an alternative to hosts + username/password settings is to use cloud_id/cloud_auth
#xpack.monitoring.elasticsearch.cloud_id: monitoring_cluster_id:xxxxxxxxxx
#xpack.monitoring.elasticsearch.cloud_auth: logstash_system:password
# another authentication alternative is to use an Elasticsearch API key
#xpack.monitoring.elasticsearch.api_key: "id:api_key"
#xpack.monitoring.elasticsearch.ssl.certificate_authority: "/path/to/ca.crt"
#xpack.monitoring.elasticsearch.ssl.ca_trusted_fingerprint: xxxxxxxxxx
#xpack.monitoring.elasticsearch.ssl.truststore.path: path/to/file
#xpack.monitoring.elasticsearch.ssl.truststore.password: password
#xpack.monitoring.elasticsearch.ssl.keystore.path: /path/to/file
#xpack.monitoring.elasticsearch.ssl.keystore.password: password
#xpack.monitoring.elasticsearch.ssl.verification_mode: certificate
#xpack.monitoring.elasticsearch.sniffing: false
#xpack.monitoring.collection.interval: 10s
#xpack.monitoring.collection.pipeline.details.enabled: true
# X-Pack Management
# https://www.elastic.co/guide/en/logstash/current/logstash-centralized-pipeline-management.html
#xpack.management.enabled: false
#xpack.management.pipeline.id: ["main", "apache_logs"]
#xpack.management.elasticsearch.username: logstash_admin_user
#xpack.management.elasticsearch.password: password
#xpack.management.elasticsearch.proxy: ["http://proxy:port"]
#xpack.management.elasticsearch.hosts: ["https://es1:9200", "https://es2:9200"]
# an alternative to hosts + username/password settings is to use cloud_id/cloud_auth
#xpack.management.elasticsearch.cloud_id: management_cluster_id:xxxxxxxxxx
#xpack.management.elasticsearch.cloud_auth: logstash_admin_user:password
# another authentication alternative is to use an Elasticsearch API key
#xpack.management.elasticsearch.api_key: "id:api_key"
#xpack.management.elasticsearch.ssl.ca_trusted_fingerprint: xxxxxxxxxx
#xpack.management.elasticsearch.ssl.certificate_authority: "/path/to/ca.crt"
#xpack.management.elasticsearch.ssl.truststore.path: /path/to/file
#xpack.management.elasticsearch.ssl.truststore.password: password
#xpack.management.elasticsearch.ssl.keystore.path: /path/to/file
#xpack.management.elasticsearch.ssl.keystore.password: password
#xpack.management.elasticsearch.ssl.verification_mode: certificate
#xpack.management.elasticsearch.sniffing: false
#xpack.management.logstash.poll_interval: 5s
# X-Pack GeoIP plugin
# https://www.elastic.co/guide/en/logstash/current/plugins-filters-geoip.html#plugins-filters-geoip-manage_update
#xpack.geoip.download.endpoint: "https://geoip.elastic.co/v1/database"
Now I don't know how to save the queue data that Logstash reads from RabbitMQ in Elasticsearch and view it as a chart in Kibana.

No NgModule metadata found for 'Fy', Angular 6

This question is similar to
"No NgModule metadata found for 'AppModule'" after Upgrade to Angular 5.1.0 and AngularCli 1.6.0
and to many of the other "No NgModule metadata found for 'AppModule'" questions,
but it seems to be more specific. I only get the following error when I publish to a web server. The program runs fine when published to my local machine and when built in development mode.
Instead of no metadata found for 'AppModule', it reports no metadata found for 'Cm' or 'Fy'.
Other people were saying that might be an issue with Webpack, but I couldn't find any solution. I've tried updating my npm packages, deleting them all, clearing the cache, and reinstalling.
GET https://dev.celinainsurance.com/styles.52c5d139a2a0d528a6bd.css 404 ()
3 GET https://dev.celinainsurance.com/polyfills.16c7192dd87d7bd6ba49.js 404 ()
main.d7aa191a672411cff0f3.js:1 Uncaught Error: No NgModule metadata found for 'Fy'.
at e.resolve (main.d7aa191a672411cff0f3.js:1)
at e.getNgModuleMetadata (main.d7aa191a672411cff0f3.js:1)
at e._loadModules (main.d7aa191a672411cff0f3.js:1)
at e._compileModuleAndComponents (main.d7aa191a672411cff0f3.js:1)
at e.compileModuleAsync (main.d7aa191a672411cff0f3.js:1)
at e.compileModuleAsync (main.d7aa191a672411cff0f3.js:1)
at e.bootstrapModule (main.d7aa191a672411cff0f3.js:1)
at Object.zUnb (main.d7aa191a672411cff0f3.js:1)
at p (runtime.a66f828dca56eeb90e02.js:1)
at Object.4 (main.d7aa191a672411cff0f3.js:1)
e.resolve @ main.d7aa191a672411cff0f3.js:1
e.getNgModuleMetadata @ main.d7aa191a672411cff0f3.js:1
e._loadModules @ main.d7aa191a672411cff0f3.js:1
e._compileModuleAndComponents @ main.d7aa191a672411cff0f3.js:1
e.compileModuleAsync @ main.d7aa191a672411cff0f3.js:1
e.compileModuleAsync @ main.d7aa191a672411cff0f3.js:1
e.bootstrapModule @ main.d7aa191a672411cff0f3.js:1
zUnb @ main.d7aa191a672411cff0f3.js:1
p @ runtime.a66f828dca56eeb90e02.js:1
4 @ main.d7aa191a672411cff0f3.js:1
p @ runtime.a66f828dca56eeb90e02.js:1
n @ runtime.a66f828dca56eeb90e02.js:1
e @ runtime.a66f828dca56eeb90e02.js:1
(anonymous) @ main.d7aa191a672411cff0f3.js:1
Any help would be greatly appreciated. Let me know if you need to see more code. Thanks!
I believe I found my error. I decided to create a new Angular project and to publish it repeatedly as I added npm packages, to see when the error would occur. When I looked up a tutorial on how to migrate my new project from Angular 5 to 6, I followed the steps found here:
When I compared my old project to the working new one, I noticed two major differences. One was that when I had migrated my old project to Angular 6, I had missed this step:
"Once this is done, remove rxjs-compat. This package is required to get backwards compatibility with RxJS previous to version 6. But, it is no longer required now, so let’s remove it using the following command: "
npm uninstall rxjs-compat
-- TalkingDotNet
I believe that was the major cause of the error. The other major difference I found was a segment of unnecessary code which probably didn't cause the problem, but I'm going to mention it anyway just in case.
I had accidentally duplicated this line of code in my main.ts file.
.catch(err => console.log(err));
The lines of code looked slightly different, so I hadn't noticed the duplication.

How to set the path for module command?

This is my .modulespath file. The last two lines are the paths where my modules are mounted from another hard drive. Even though I added these lines, the module avail command does not list any of the modules in those folders. Any help would be greatly appreciated.
# #(#)$Id: 38aa24cc33a5f54a93781d63005a084f74418022 $
# Module version 3.2.10
# init/.modulespath. Generated from .modulespath.in by configure.
# Modulepath initial setup
# ========================
# This file defines the initial setup for the module files search path.
# Comments may be added anywhere, which begin on # and continue until the
# end of the line
# Each line containing a single path will be added to the MODULEPATH
# environment variable. You may add as many as you want - just
# limited by the maximum variable size of your shell.
#/usr/share/modules/versions # location of version files
#/usr/Modules/$MODULE_VERSION/modulefiles # Module pkg modulefiles (if versioning)
#/usr/Modules/modulefiles # Module pkg modulefiles (if no versioning)
#/usr/share/modules/modulefiles # General module files
#/usr/Modules/3.2.10/your_contribs # Edit for your requirements
I have even tried using module use /opt/apps/modulefiles/Core
user@user-N501VW:~$ module use /opt/apps/modulefiles/Core
user@user-N501VW:~$ $MODULEPATH
bash: /opt/apps/modulefiles/Core:/etc/environment-modules/modules:/usr/share/modules/versions:/usr/Modules/$MODULE_VERSION/modulefiles:/usr/share/modules/modulefiles: No such file or directory
akhila@akhila-N501VW:~$ module avail
------------------------------------------------------------- /usr/share/modules/versions --------------------------------------------------------------
------------------------------------------------------------ /usr/share/modules/modulefiles ------------------------------------------------------------
dot module-git module-info modules null use.own
Even though your modulepaths are correctly set in the MODULEPATH environment variable, module avail does not return any modulefiles for these directories. This means that module has not found any file in these directories that it recognizes as a modulefile.
So I suggest that you:
check that your mount point is correctly mounted
verify that the files in the directories are modulefiles compatible with the module command you use (with the module version you use, the content of a modulefile should start with the #%Module magic cookie)
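The second check can be automated. Below is a minimal sketch, assuming only what is stated above: that a modulefile begins with the #%Module magic cookie. The function names is_modulefile and scan_modulepath are made up for this illustration and are not part of the Modules package.

```python
from pathlib import Path

# Magic cookie that a modulefile is expected to start with (per the answer above).
MAGIC_COOKIE = b"#%Module"


def is_modulefile(path):
    """Return True if the file at `path` starts with the #%Module magic cookie."""
    try:
        with open(path, "rb") as fh:
            return fh.read(len(MAGIC_COOKIE)) == MAGIC_COOKIE
    except OSError:
        # Unreadable or missing file (e.g. the mount point is not mounted).
        return False


def scan_modulepath(directory):
    """Map every regular file under `directory` to True/False (looks like a modulefile)."""
    root = Path(directory)
    if not root.is_dir():
        return {}
    return {
        str(p.relative_to(root)): is_modulefile(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_modulefiles.py /opt/apps/modulefiles/Core
    for directory in sys.argv[1:]:
        results = scan_modulepath(directory)
        if not results:
            print(directory, "-> no files found (is it mounted?)")
        for name, ok in results.items():
            print(directory, name, "OK" if ok else "missing #%Module cookie")
```

An empty result for a directory points at the mount, while files reported without the cookie point at the modulefile format.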

ipython3 - Almost every time I tab complete in ipython3 it runs %rehashx, is there a workaround?

I've tried googling around but haven't found much, if anything; the following also doesn't help at all...
A typical use case is:
In [31]: from sqlalch<TAB>
Caching the list of root modules, please wait!
(This will only be done once - type '%rehashx' to reset cache!)
Caching the list of root modules, please wait!
(This will only be done once - type '%rehashx' to reset cache!)
Caching the list of root modules, please wait!
(This will only be done once - type '%rehashx' to reset cache!)
Running %rehashx by itself doesn't help either. I also pip installed pyreadline.
Any ideas what is going wrong? Where does %rehashx store info?
Listing the keys of get_ipython().db['rootmodules_cache'] gives the following:
for key in get_ipython().db['rootmodules_cache'].keys(): print(key)
# /usr/local/bin
# /usr/lib/python3/dist-packages
# /usr/lib/python3.5
# /usr/local/lib/python3.5/dist-packages <- should be in here
# /usr/lib/python3.5/lib-dynload
# /usr/lib/python35.zip
# /usr/local/lib/python3.5/dist-packages/IPython/extensions
# /usr/lib/python3.5/plat-x86_64-linux-gnu
# /home/user/.ipython
import sqlalchemy
# /usr/local/lib/python3.5/dist-packages/sqlalchemy/__init__.py
However, sqlalchemy is not in the list:
d = get_ipython().db['rootmodules_cache']
'sqlalchemy' in d['/usr/local/lib/python3.5/dist-packages']
# False
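To see whether any cached directory lists a given module, the single lookup above can be generalized. This is only a sketch: it assumes the cache behaves like a mapping from directory path to a collection of module names, which is what the membership test above suggests. paths_listing is a hypothetical helper name, and the sample dictionary is made-up data standing in for the real cache.

```python
def paths_listing(cache, module_name):
    """Return the cached directories whose module list contains `module_name`."""
    return [path for path, names in cache.items() if module_name in names]


# Made-up stand-in for get_ipython().db['rootmodules_cache'].
sample_cache = {
    "/usr/lib/python3.5": ["os", "sys"],
    "/usr/local/lib/python3.5/dist-packages": ["IPython"],
}

print(paths_listing(sample_cache, "IPython"))
print(paths_listing(sample_cache, "sqlalchemy"))

# Inside IPython one could call, for example:
# paths_listing(get_ipython().db['rootmodules_cache'], 'sqlalchemy')
```

An empty list for a module that imports fine suggests the cache is stale.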
This command solved it for me, run inside IPython (started from the home directory, since the path is relative):
!rm .ipython/profile_default/db/*
I hope it helps you too.
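The same cleanup can be done from Python without relying on the current working directory. This is a minimal sketch that mirrors the rm command above: it deletes the non-hidden regular files directly inside the given directory and leaves subdirectories alone. clear_db_files is a hypothetical helper name, and the default profile location shown in the comment is an assumption taken from the path in the command.

```python
from pathlib import Path


def clear_db_files(db_dir):
    """Delete non-hidden regular files directly inside `db_dir`; return how many were removed."""
    db_dir = Path(db_dir)
    removed = 0
    if not db_dir.is_dir():
        return removed
    for entry in db_dir.iterdir():
        # Like `rm dir/*`: skip dotfiles and do not descend into subdirectories.
        if entry.is_file() and not entry.name.startswith("."):
            entry.unlink()
            removed += 1
    return removed


# Example call (assumes the default profile lives under the home directory):
# clear_db_files(Path.home() / ".ipython" / "profile_default" / "db")
```

IPython rebuilds the module cache the next time tab completion needs it.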

How can I use the MIST library to de-identify a text?

I wonder how I can use the MIST library to de-identify a text, e.g., transforming
Patient ID: P89474
Mary Phillips is a 45-year-old woman with a history of diabetes.
She arrived at New Hope Medical Center on August 5 complaining
of abdominal pain. Dr. Gertrude Philippoussis diagnosed her
with appendicitis and admitted her at 10 PM.
into
Patient ID: [ID]
[NAME] is a [AGE]-year-old woman with a history of diabetes.
She arrived at [HOSPITAL] on [DATE] complaining
of abdominal pain. Dr. [PHYSICIAN] diagnosed her
with appendicitis and admitted her at 10 PM.
I've wandered through the documentation but no luck so far.
This answer was tested on Windows 7 SP1 x64 Ultimate with Anaconda Python 2.7.11 x64, and MIST 2.0.4. MIST 2.0.4 does not work with Python 3.x (according to the manual, I haven't tested it myself).
MIST (MITRE Identification Scrubber Toolkit) [1] is a customization of MAT (MITRE Annotation Toolkit), which is a tool to tag documents automatically or with humans (for the latter it provides a GUI via webserver). The automatic tagger is based on Carafe (ConditionAl RAndom Fields) [2], which is an OCaml implementation of conditional random fields (CRF).
MIST does not come with any trained model, and it has only ~10 short, non-medical documents annotated with typical NER classes (like organization and person).
De-id (de-identification) is the process of tagging PHI (Protected Health Information) in a document, and replacing it with fake data. Let's ignore PHI replacement for now, and focus on tagging. In order to tag a document (e.g., a patient note), MAT follows a typical machine learning scheme: the CRF needs to be trained on a labeled dataset (= a set of labeled documents), then we use it to tag unlabeled documents.
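To make the two stages concrete, here is a small sketch of the replacement stage only, written independently of MIST. It assumes the tagger has already produced character spans with labels. The function names find_span and redact and the (start, end, label) tuple layout are assumptions made for this illustration, not MIST's API or output format.

```python
def find_span(text, needle, label):
    """Locate `needle` in `text` and return a (start, end, label) tuple."""
    start = text.index(needle)
    return (start, start + len(needle), label)


def redact(text, spans):
    """Replace every (start, end, label) span of `text` with "[LABEL]"."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported")
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append("[" + label + "]")   # placeholder instead of the PHI
        cursor = end
    pieces.append(text[cursor:])           # remainder after the last span
    return "".join(pieces)


note = "Mary Phillips is a 45-year-old woman with a history of diabetes."
spans = [
    find_span(note, "Mary Phillips", "NAME"),
    find_span(note, "45", "AGE"),
]
print(redact(note, spans))
```

In a real pipeline the spans would come from the trained tagger rather than from a string search.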
The main technical concept in MAT is tasks. A task is a set of activities, called workflows, which can be broken down into steps. Named-entity recognition (NER) is one task. De-id is another task (mostly, NER geared toward medical texts): in other words, MIST is just one task of MAT (actually 3: core, HIPAA, and AMIA. Core is a parent task, while HIPAA and AMIA are two different tagsets). Steps are for example tokenization, tagging, or cleaning. Workflows are just lists of steps that one may follow.
With this in mind, here is the code for Microsoft Windows:
rem Instructions for Windows 7 SP1 x64 Ultimate
rem Installing MIST: set MAT_PKG_HOME depending on where you downloaded it
SET MAT_PKG_HOME=C:\Users\Francky\Downloads\MIST_2_0_4\MIST_2_0_4\src\MAT
SET TMP=C:\Users\Francky\Downloads\MIST_2_0_4\MIST_2_0_4\temp
cd C:\Users\Francky\Downloads\MIST_2_0_4\MIST_2_0_4
python install.py
rem MAT is now installed. We'll show how to use it for NER.
rem We will be taking snippets from some of the 8 tutorials.
rem A lot of the tutorial content is about the annotation GUI,
rem which we don't care about here.
rem Tuto 1: install task
bin\MATManagePluginDirs.cmd install %CD%\sample\ne
rem Tuto 2: build model (i.e., train it on labeled dataset)
bin\MATModelBuilder.cmd --task "Named Entity" --model_file %TMP%\ne_model ^
--input_files "%CD%\sample\ne\resources\data\json\*.json"
rem Tuto 2: Add trained model as the default model
bin\MATModelBuilder.cmd --task "Named Entity" --save_as_default_model ^
--input_files "%CD%\sample\ne\resources\data\json\*.json"
rem Tuto 5: use CLI -> prepare the document
bin\MATEngine.cmd --task "Named Entity" --workflow Demo --steps "zone,tokenize" ^
--input_file %CD%\sample\ne\resources\data\raw\voa2.txt --input_file_type raw ^
--output_file %CD%\voa2_txt.json --output_file_type mat-json
rem Tuto 5: use CLI -> tag the document
bin\MATEngine.cmd --task "Named Entity" --workflow Demo --steps "tag" ^
--input_file %CD%\voa2_txt.json --input_file_type mat-json ^
--output_file %CD%\voa2_txt.json --output_file_type mat-json
NER is now done.
Here are the same instructions for Ubuntu 14.04.4 LTS x64:
# Instructions for Ubuntu 14.04.4 LTS x64
# Installing MIST: set MAT_PKG_HOME depending on where you downloaded it
export MAT_PKG_HOME=/home/ubuntu/mist/MIST_2_0_4/MIST_2_0_4/src/MAT
export TMP=/home/ubuntu/mist/MIST_2_0_4/MIST_2_0_4/temp
mkdir -p "$TMP"
cd /home/ubuntu/mist/MIST_2_0_4/MIST_2_0_4/
python install.py
# MAT is now installed. We'll show how to use it for NER.
# We will be taking snippets from some of the 8 tutorials.
# A lot of the tutorial content is about the annotation GUI,
# which we don't care about here.
# Tuto 1: install task
bin/MATManagePluginDirs install $PWD/sample/ne
# Tuto 2: build model (i.e., train it on labeled dataset)
bin/MATModelBuilder --task "Named Entity" --model_file $TMP/ne_model \
--input_files "$PWD/sample/ne/resources/data/json/*.json"
# Tuto 2: Add trained model as the default model
bin/MATModelBuilder --task "Named Entity" --save_as_default_model \
--input_files "$PWD/sample/ne/resources/data/json/*.json"
# Tuto 5: use CLI -> prepare the document
bin/MATEngine --task "Named Entity" --workflow Demo --steps "zone,tokenize" \
--input_file $PWD/sample/ne/resources/data/raw/voa2.txt --input_file_type raw \
--output_file $PWD/voa2_txt.json --output_file_type mat-json
# Tuto 5: use CLI -> tag the document
bin/MATEngine --task "Named Entity" --workflow Demo --steps "tag" \
--input_file $PWD/voa2_txt.json --input_file_type mat-json \
--output_file $PWD/voa2_txt.json --output_file_type mat-json
To run de-id, there is no need to install the de-id tasks, as they are pre-installed. There are 2 de-id tasks (\MIST_2_0_4\src\tasks\HIPAA\task.xml and \MIST_2_0_4\src\tasks\AMIA\task.xml). They come with neither a trained model nor a labeled dataset, so you may want to get some data; see "Physician notes with annotated PHI".
For Microsoft Windows (tested with Windows 7 SP1 x64 Ultimate):
To train the model (you can replace HIPAA Deidentification with AMIA Deidentification depending on the tagset you wish to use):
bin\MATModelBuilder.cmd --task "HIPAA Deidentification" ^
--save_as_default_model --nthreads=3 --max_iterations=15 ^
--lexicon_dir="%CD%\sample\mist\gazetteers" ^
--input_files "%CD%\sample\mist\i2b2-60-00-40\train\*.json"
To run the trained model on one file:
bin\MATEngine --task "HIPAA Deidentification" --workflow Demo ^
--input_file .\note.txt --input_file_type raw ^
--output_file .\note.json --output_file_type mat-json ^
--tagger_local ^
--steps "clean,zone,tag"
To run the trained model on one directory:
bin\MATEngine --task "HIPAA Deidentification" --workflow Demo ^
--input_dir "%CD%\sample\test" --input_file_type raw ^
--output_dir "%CD%\sample\test" --output_file_type mat-json ^
--tagger_local ^
--steps "clean,zone,tag"
As usual, one can specify the input file format to be JSON:
bin\MATEngine --task "HIPAA Deidentification" --workflow Demo ^
--input_dir "%CD%\sample\mist\i2b2-60-00-40\test" --input_file_type mat-json ^
--output_dir "%CD%\sample\mist\i2b2-60-00-40\test_out" --output_file_type mat-json ^
--tagger_local --steps "tag"
For Ubuntu 14.04.4 LTS x64:
To train the model (you can replace HIPAA Deidentification with AMIA Deidentification depending on the tagset you wish to use):
bin/MATModelBuilder --task "HIPAA Deidentification" \
--save_as_default_model --nthreads=20 --max_iterations=15 \
--lexicon_dir="$PWD/sample/mist/gazetteers" \
--input_files "$PWD/sample/mist/i2b2-60-00-40/train/*.json"
To run the trained model on one file:
bin/MATEngine --task "HIPAA Deidentification" --workflow Demo \
--input_file ./note.txt --input_file_type raw \
--output_file ./note.json --output_file_type mat-json \
--tagger_local \
--steps "clean,zone,tag"
To run the trained model on one directory:
bin/MATEngine --task "HIPAA Deidentification" --workflow Demo \
--input_dir "$PWD/sample/test" --input_file_type raw \
--output_dir "$PWD/sample/test" --output_file_type mat-json \
--tagger_local \
--steps "clean,zone,tag"
As usual, one can specify the input file format to be JSON:
bin/MATEngine --task "HIPAA Deidentification" --workflow Demo \
--input_dir "$PWD/sample/mist/i2b2-60-00-40/test" --input_file_type mat-json \
--output_dir "$PWD/sample/mist/i2b2-60-00-40/test_out" --output_file_type mat-json \
--tagger_local --steps "tag"
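When many notes have to be processed one by one, the single-file command above can be assembled from a script. The sketch below only builds the argument list, using the flags shown in the commands above. build_matengine_cmd is a hypothetical helper name, and the default paths are assumptions to adapt to your installation. It is a separate wrapper that can run under any recent Python 3 (shlex.join needs 3.8 or later), independently of the Python 2.7 interpreter MIST itself requires.

```python
import shlex


def build_matengine_cmd(input_file, output_file,
                        task="HIPAA Deidentification",
                        workflow="Demo",
                        steps=("clean", "zone", "tag"),
                        engine="bin/MATEngine"):
    """Build the argument list for one MATEngine run on a raw text file."""
    return [
        engine,
        "--task", task,
        "--workflow", workflow,
        "--input_file", input_file, "--input_file_type", "raw",
        "--output_file", output_file, "--output_file_type", "mat-json",
        "--tagger_local",
        "--steps", ",".join(steps),
    ]


cmd = build_matengine_cmd("./note.txt", "./note.json")
print(shlex.join(cmd))

# To actually run it from the MIST directory (requires a trained default model):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with task names that contain spaces.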
Typical error messages:
raise PluginError, "Carafe not configured properly for this task and workflow: " + str(e) (when trying to tag a document): it often means that no model was specified. You need to either define a default model, or use --tagger_model /path/to/model/.
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded (when training a model): it's easy to go over the heap_size limit (the default is 2GB). You can increase the heap size with the --heap_size parameter. Example (Linux):
bin/MATModelBuilder --task "HIPAA Deidentification" \
--save_as_default_model --nthreads=20 --max_iterations=15 \
--lexicon_dir="$PWD/sample/mist/gazetteers" \
--heap_size=60G \
--input_files "$PWD/sample/mist/mimic-140-20-40/train/*.json"
[1] John Aberdeen, Samuel Bayer, Reyyan Yeniterzi, Ben Wellner, Cheryl Clark, David Hanauer, Bradley Malin, Lynette Hirschman, The MITRE identification scrubber toolkit: design, training, and assessment, Int. J. Med. Informatics 79 (12) (2010) 849–859, http://dx.doi.org/10.1016/j.ijmedinf.2010.09.007.
[2] B. Wellner, Sequence Models and Ranking Methods for Discourse Parsing [Ph.D. Dissertation], Brandeis University, Waltham, MA, 2009, http://www.cs.brandeis.edu/~wellner/pubs/wellner_dissertation.pdf
Usage: MATEngine [core options] [input/output/task options] [other options]
-h, --help show this help message and exit
Core options:
additional directory to load a task from. Optional and
a file of settings to use which overwrites existing
settings. The file should be a Python config file in
the style of the template in
etc/MAT_settings.config.in. Optional.
--task=task name of the task to use. Obligatory if the system
knows of more than one task. Known tasks are: AMIA
Deidentification, Named Entity, HIPAA
Deidentification, Enhanced Named Entity
--version Print version number and exit
--debug Enable debug output.
Set the subprocess debug level to the value provided,
overriding the global setting. 0 disables, 1 shows
some subprocess activity, 2 shows all subprocess
Enable subprocess statistics (memory/time), if the
capability is available and it isn't globally enabled.
--tmpdir_root=dir Override the default system location for temporary
files. If the directory doesn't exist, it will be
created. Use this feature to control where temporary
files are created, for added security, or in
conjunction with --preserve_tempfiles, as a debugging
Preserve the temporary files created, as a debugging
--verbose_config If specified, print to stderr the source of each MAT
configuration variable the first time it's accessed.
