Syntaxnet Turkish Language Data Set Non Existent Map Files


I am new to SyntaxNet and tried to use the pre-trained Turkish model by following the instructions here.
Point-1: Although I set the MODEL_DIRECTORY environment variable, tokenize.sh did not find the path and fails with the error below:
root@4562a2ee0202:/opt/tensorflow/models/syntaxnet# echo "Eray eve geldi." | syntaxnet/models/parsey_universal/tokenize.sh
F syntaxnet/term_frequency_map.cc:62] Check failed: ::tensorflow::Status::OK() == (tensorflow::Env::Default()->NewRandomAccessFile(filename, &file)) (OK vs. **Not found: label-map**)
Point-2: So I changed tokenize.sh, overriding MODEL_DIR=$1 with the path to my Turkish model so that I could continue:
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
CONTEXT=syntaxnet/models/parsey_universal/context.pbtxt
INPUT_FORMAT=stdin-untoken
MODEL_DIR=$1
MODEL_DIR=syntaxnet/models/etiya-smart-tr
Point-3: After that, when I run it as instructed, it fails with the error below:
root@4562a2ee0202:/opt/tensorflow/models/syntaxnet# echo "Eray eve geldi" | syntaxnet/models/parsey_universal/tokenize.sh
I syntaxnet/term_frequency_map.cc:101] Loaded 29 terms from syntaxnet/models/etiya-smart-tr/label-map.
I syntaxnet/embedding_feature_extractor.cc:35] Features: input.char input(-1).char input(1).char; input.digit input(-1).digit input(1).digit; input.punctuation-amount input(-1).punctuation-amount input(1).punctuation-amount
I syntaxnet/embedding_feature_extractor.cc:36] Embedding names: chars;digits;puncts
I syntaxnet/embedding_feature_extractor.cc:37] Embedding dims: 16;16;16
F syntaxnet/term_frequency_map.cc:62] Check failed: ::tensorflow::Status::OK() == (tensorflow::Env::Default()->NewRandomAccessFile(filename, &file)) (OK vs. **Not found: syntaxnet/models/etiya-smart-tr/char-map**)
I downloaded the Turkish package by following the link pattern download.tensorflow.org/models/parsey_universal/.zip
and the mapping files for my language are listed below:
-rw-r----- 1 root root 50646 Sep 22 07:24 char-ngram-map
-rw-r----- 1 root root 329 Sep 22 07:24 label-map
-rw-r----- 1 root root 133477 Sep 22 07:24 morph-label-set
-rw-r----- 1 root root 5553526 Sep 22 07:24 morpher-params
-rw-r----- 1 root root 1810 Sep 22 07:24 morphology-map
-rw-r----- 1 root root 10921546 Sep 22 07:24 parser-params
-rw-r----- 1 root root 39990 Sep 22 07:24 prefix-table
-rw-r----- 1 root root 28958 Sep 22 07:24 suffix-table
-rw-r----- 1 root root 561 Sep 22 07:24 tag-map
-rw-r----- 1 root root 5234212 Sep 22 07:24 tagger-params
-rw-r----- 1 root root 172869 Sep 22 07:24 word-map
QUESTION-1:
I am aware that there is no char-map file in the directory, which is why I get the error shown in Point-3 above. Does anyone know how the Turkish evaluation was run, given that a result such as 93.363% for part-of-speech tagging was published?
QUESTION-2:
How can I find the char-map file for the Turkish language?
QUESTION-3:
If there is no char-map file, must I train the model myself by following the steps described under SyntaxNet's Obtain Data & Training?
QUESTION-4:
Is there a way to generate the word-map, char-map, etc. files? Can the well-known word2vec approach be used to generate map files that the SyntaxNet tokenizers can process?

See the issue at https://github.com/tensorflow/models/issues/830; it contains a solution that is, at the moment, temporary.

Related

Groovy copy file, modify values and save it

I need to use Groovy to copy a "template" file, replace some placeholder text with current values, and save the result as a new file.
bay_equipment.each{
    hname="Network-${it['hostname']}-${it['function']}"
    nodeName="${it['hostname']}"
    nodeIP="${it['ip_address']}"
    fname="${it['hostname']}-${it['function']}-Network.cfg"
    def src= new File("../config-files/nagios-switch-bayg20.cfg")
    def dst= new File("../dest-files/$fname")
    dst << src.text
    // everything works up to here
    // now trying to open the dst file, make modifications, and then save it
    // The string <%=#deviceType%>-<%=#deviceGroup%>-<%=#nodeName%> needs to be replaced with the hname variable, and likewise <%=#nodeIP%> with nodeIP
    dst = (dst =~ /<%=#deviceType%>-<%=#deviceGroup%>-<%=#nodeName%>/).replaceFirst("$hname")
    dst = (dst =~ /<%=#nodeIP%>/).replaceFirst("$nodeIP")
    dst.write(dst)
}
I'm getting an error when I run it:
Caught: groovy.lang.MissingMethodException: No signature of method: java.lang.String.write() is applicable for argument types: (String) values: [../dest-files/u100en00700-BAYG20-Network.cfg]
Possible solutions: wait(), wait(long), with(groovy.lang.Closure), trim(), size(), toSet()
groovy.lang.MissingMethodException: No signature of method: java.lang.String.write() is applicable for argument types: (String) values: [../dest-files/u100en00700-BAYG20-Network.cfg]
Possible solutions: wait(), wait(long), with(groovy.lang.Closure), trim(), size(), toSet()
at nsr_nagios$_run_closure4.doCall(nsr_nagios:53)
at nsr_nagios.run(nsr_nagios:38)

For an example of how you can work with this, the following code:
def equipment = [
    [hostname: 'foo', function: "bar", ip_address: '1.2.3.4'],
    [hostname: 'alice', function: "raven", ip_address: '5.6.7.8']
]
equipment.each {
    def binding = [
        hname: "Network-${it['hostname']}-${it['function']}",
        nodeName: it.hostname,
        nodeIP: it.ip_address,
        fname: "${it['hostname']}-${it['function']}-Network.cfg"
    ]
    def srcData = new File("src-file.txt").text
    def template = new groovy.text.StreamingTemplateEngine().createTemplate(srcData)
    new File(binding.fname).text = template.make(binding)
}
given the following source file:
<%= hname %>-<%= nodeName %>-<%= nodeIP %>
will create two files, both templated. Example execution sequence:
─➤ ls -la
total 24
drwxrwxr-x 2 mbjarland mbjarland 4096 Mar 4 17:55 .
drwxr-xr-x 112 mbjarland mbjarland 12288 Mar 4 17:47 ..
-rw-rw-r-- 1 mbjarland mbjarland 634 Mar 4 17:55 solution.groovy
-rw-rw-r-- 1 mbjarland mbjarland 42 Mar 4 17:55 src-file.txt
─➤ cat src-file.txt
<%= hname %>-<%= nodeName %>-<%= nodeIP %>
─➤ groovy solution.groovy
─➤ ls -la
total 32
drwxrwxr-x 2 mbjarland mbjarland 4096 Mar 4 17:56 .
drwxr-xr-x 112 mbjarland mbjarland 12288 Mar 4 17:47 ..
-rw-rw-r-- 1 mbjarland mbjarland 33 Mar 4 17:56 alice-raven-Network.cfg
-rw-rw-r-- 1 mbjarland mbjarland 27 Mar 4 17:56 foo-bar-Network.cfg
-rw-rw-r-- 1 mbjarland mbjarland 634 Mar 4 17:55 solution.groovy
-rw-rw-r-- 1 mbjarland mbjarland 42 Mar 4 17:55 src-file.txt
─➤ cat foo-bar-Network.cfg
Network-foo-bar-foo-1.2.3.4
─➤ cat alice-raven-Network.cfg
Network-alice-raven-alice-5.6.7.8
─➤
where the template engine is a built-in Groovy class for doing just this kind of thing.

transforms.route.topic.expression and groovy expression

I'm trying to use the Debezium transforms.route.topic.expression option.
Here are the entries in the connector configuration:
"transforms": "dropPrefix,unwrapi,route",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "rocketlawyer.dbo.(.*)",
"transforms.dropPrefix.replacement": "$1",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.route.type": "io.debezium.transforms.ContentBasedRouter",
"transforms.route.language": "jsr223.groovy",
"transforms.route.topic.expression": "value.snapshotequals('true') ? ${topic} : cdc.$1"
When I apply the config, I get the following error:
iguenkin_rocketlawyer_com@dbadm-r208:~/confluent$ curl -d @mssql_trg_cf.json -H "Content-Type: application/json" -X PUT http://localhost:8083/connectors/mssql_trg/config | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1326 100 242 100 1084 30250 132k --:--:-- --:--:-- --:--:-- 161k
{
"error_code": 500,
"message": "Unexpected character ('\"' (code 34)): was expecting comma to separate Object entries\n at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); line: 1, column: 832]"
}
column: 832
"transforms.route.topic.expression": "value.snapshotequals('true') ? ${topic} : cdc.$1"
Column 830 is the blank space in the string " : cdc".
Here are my questions:
How can I be sure that jsr223.groovy is installed and accessible to the transformations?
As I understand it, it is part of the Debezium connector, so it is not listed as a separate plugin.
Here is the list of jars in the directory where Debezium is installed:
-rw-r--r-- 1 iguenkin_rocketlawyer_com 1234193955 20272 Dec 9 20:44 debezium-api-1.3.1.Final.jar
-rw-r--r-- 1 iguenkin_rocketlawyer_com 1234193955 91369 Dec 9 20:44 debezium-connector-sqlserver-1.3.1.Final.jar
-rw-r--r-- 1 iguenkin_rocketlawyer_com 1234193955 844500 Dec 9 20:44 debezium-core-1.3.1.Final.jar
-rw-r--r-- 1 iguenkin_rocketlawyer_com 1234193955 19090 Dec 15 22:01 debezium-scripting-1.3.1.Final.jar
-rw-r--r-- 1 iguenkin_rocketlawyer_com 1234193955 16839 Dec 16 01:04 groovy-jsr223-3.0.7-indy.jar
If Groovy is installed, what is wrong with the expression? I used the following documentation to set up the connector: https://debezium.io/documentation/reference/configuration/content-based-routing.html
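
The message quoted above comes from the JSON parser reading the request body, so it may be worth checking the payload file itself before looking at the Groovy setup. A minimal sketch, assuming the payload is the mssql_trg_cf.json file passed to curl; the helper name find_json_error and the sample string are made up for illustration:

```python
import json


def find_json_error(text):
    """Return None if text parses as JSON, else (line, column, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno, err.msg
    return None


# Hypothetical payload with an unescaped double quote inside a string value.
sample = '{"a": "x", "b": "say "hi""}'
print(find_json_error(sample))

# To check the real file instead (path taken from the curl command):
# with open("mssql_trg_cf.json") as fh:
#     print(find_json_error(fh.read()))
```

If the helper reports a position, the character just before it is usually the one to inspect, for example a missing comma or an unescaped quote.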

Python PATH .is_file() evaluates symlink as a file

In my Python3 program, I take a bunch of paths and do things based on what they are. When I evaluate the following symlinks (snippet):
lrwxrwxrwx 1 513 513 5 Aug 19 10:56 console -> ttyS0
lrwxrwxrwx 1 513 513 11 Aug 19 10:56 core -> /proc/kcore
lrwxrwxrwx 1 513 513 13 Aug 19 10:56 fd -> /proc/self/fd
the results are:
symlink console -> ttyS0
file core -> /proc/kcore
symlink console -> ttyS0
It evaluates core as if it were a file rather than a symlink. What is the best way to have it reported as a symlink instead of a file? Code below:
#!/usr/bin/python3
import sys
import os
from pathlib import Path
def filetype(filein):
    print(filein)
    if Path(filein).is_file():
        return "file"
    if Path(filein).is_symlink():
        return "symlink"
    else:
        return "doesn't match anything"
if __name__ == "__main__":
    file = sys.argv[1]
    print(str(file))
    print(filetype(file))

The result of is_file is intended to answer the question "if I open this name, will I open a file". For a symlink, the answer is "yes" if the target is a file, hence the return value.
If you want to know if the name is a symlink, ask is_symlink.
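A minimal sketch of that ordering, assuming the goal is to report a symlink as such regardless of its target. The function keeps the name from the question, but the directory branch and the temporary files are added here only for the demonstration:

```python
import tempfile
from pathlib import Path


def filetype(name):
    """Classify a path, testing for a symlink before following it."""
    p = Path(name)
    if p.is_symlink():   # looks at the name itself, does not follow the link
        return "symlink"
    if p.is_file():      # follows links, so it is tested second
        return "file"
    if p.is_dir():
        return "directory"
    return "doesn't match anything"


# Demonstration in a temporary directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "target.txt"
    target.write_text("data")
    link = Path(tmp) / "link"
    link.symlink_to(target)
    print(filetype(target))  # file
    print(filetype(link))    # symlink
```

Because is_symlink() is checked first, a link to a regular file such as core -> /proc/kcore is reported as a symlink rather than a file.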

How to use alsa sound and/or snd_pcm_open in docker?

I am running an Ubuntu 12.04 Docker container on an Ubuntu 16.04 host. Some test code I have exercises 'snd_pcm_open'/'snd_pcm_close' operations with the SND_PCM_STREAM_PLAYBACK and SND_PCM_STREAM_CAPTURE stream types.
I do not need any actual sound/audio capabilities but just getting the 'snd_pcm_open' return 0 with a valid handle, then 'snd_pcm_close' to return 0 on the same handle would be good enough for my purposes. I do not want to modify the code as it's already got some not-so-nice platform dependent switches and I am not the maintainer.
I am using the simple code below, compiled with 'g++ alsa_test.cpp -lasound':
#include <stdio.h>
#include <alsa/asoundlib.h>
int main() {
    snd_pcm_t* handle;
    snd_pcm_stream_t stream_type[]= {SND_PCM_STREAM_PLAYBACK, SND_PCM_STREAM_CAPTURE};
    printf("\nstarting\n");
    for (unsigned char i = 0; i < sizeof(stream_type) / sizeof(stream_type[0]); ++i) {
        printf(">>>>>>>>\n\n");
        int deviceResult = snd_pcm_open(&handle, "default" , stream_type[i], 0);
        printf("\n%d open: %d\n", stream_type[i], deviceResult);
        if (deviceResult >= 0) {
            printf("attempting to close %d\n", stream_type[i]);
            snd_pcm_drain(handle);
            deviceResult = snd_pcm_close(handle);
            printf("%d close: %d\n\n", stream_type[i], deviceResult);
        }
        printf("<<<<<<<<\n\n");
    }
    return 0;
}
It works just fine on the host but despite all the different things I tried, 'snd_pcm_open' returns '-2' for both stream types in the container.
I tried installing 'libasound2-dev', but 'modinfo soundcore' is empty and '/dev/snd' does not exist.
I also tried running the container with the options below, even though it feels like massive overkill for such a simple purpose:
--privileged --cap-add=ALL -v /dev:/dev -v /lib/modules:/lib/modules
With these extra parameters, the following commands generate the same output on both the host and the container.
root@31142791f82d:/export# modinfo soundcore
filename: /lib/modules/4.4.0-59-generic/kernel/sound/soundcore.ko
alias: char-major-14-*
license: GPL
author: Alan Cox
description: Core sound module
srcversion: C941364F5CD0B525693B243
depends:
intree: Y
vermagic: 4.4.0-59-generic SMP mod_unload modversions
parm: preclaim_oss:int
root@31142791f82d:/export# ls -l /dev/snd/
total 0
drwxr-xr-x 2 root root 100 Feb 2 21:10 by-path
crw-rw----+ 1 root audio 116, 2 Feb 2 07:42 controlC0
crw-rw----+ 1 root audio 116, 7 Feb 2 07:42 controlC1
crw-rw----+ 1 root audio 116, 12 Feb 2 21:10 controlC2
crw-rw----+ 1 root audio 116, 6 Feb 2 07:42 hwC0D0
crw-rw----+ 1 root audio 116, 11 Feb 2 07:42 hwC1D0
crw-rw----+ 1 root audio 116, 3 Feb 2 07:42 pcmC0D3p
crw-rw----+ 1 root audio 116, 4 Feb 2 07:42 pcmC0D7p
crw-rw----+ 1 root audio 116, 5 Feb 2 07:42 pcmC0D8p
crw-rw----+ 1 root audio 116, 9 Feb 2 10:44 pcmC1D0c
crw-rw----+ 1 root audio 116, 8 Feb 2 07:42 pcmC1D0p
crw-rw----+ 1 root audio 116, 10 Feb 2 21:30 pcmC1D1p
crw-rw----+ 1 root audio 116, 14 Feb 2 21:10 pcmC2D0c
crw-rw----+ 1 root audio 116, 13 Feb 2 21:10 pcmC2D0p
crw-rw----+ 1 root audio 116, 1 Feb 2 07:42 seq
crw-rw----+ 1 root audio 116, 33 Feb 2 07:42 timer
The container only has the 'root' user by the way, so, access rights shouldn't be an issue either.
What would be the easiest and least hacky way to get this working? I'd rather get rid of the privileged mode and the /dev and /lib/modules mappings; however, these containers are not accessed from the outside world and are only created and destroyed for short-lived tasks, so safety isn't a massive concern.
Thanks in advance.

If you don't actually need the device to work correctly, use the null device instead of default.
To make the null plugin the default one, put this into the container's /etc/asound.conf, or into the user's ~/.asoundrc:
pcm.!default = null;

Convert string to class object in Python

I have a list of strings. Each string in the list has the same format. I would like to convert each string into a class object (if that is the best option), so I can do some analysis on the resulting list of objects.
As an example,
I have the following list
ls_list = ['-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 bar1',
'-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 bar2',
'-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 foo1',
'-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 foo2']
I would like to convert each of the above strings into an object of a class that has nine members (perm, etc.).

You don't want to do that. Use os.listdir() and os.stat() to get the information you want.

You do it something like this:
import os
data = {}
for file in os.listdir('.'):
    data[file] = os.stat(file)
This gives you the information for all the files in the current directory, as objects, as you requested. These can then be inspected, and you can use other functions to figure out the username of the userid you get, etc.
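As a rough illustration of those "other functions", here is a sketch that turns each stat result into ls-like fields. It assumes a POSIX system (the pwd module is not available on Windows), and the describe helper and its dictionary keys are made up for this example:

```python
import os
import pwd   # POSIX only
import stat


def describe(directory="."):
    """Map each entry name in directory to a dict of ls-like attributes."""
    info = {}
    for name in os.listdir(directory):
        # lstat() describes the entry itself and does not follow symlinks
        st = os.lstat(os.path.join(directory, name))
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:              # the uid has no passwd entry
            owner = str(st.st_uid)
        info[name] = {
            "perm": stat.filemode(st.st_mode),   # e.g. '-rw-r--r--'
            "links": st.st_nlink,
            "owner": owner,
            "size": st.st_size,
            "mtime": st.st_mtime,
        }
    return info


for name, attrs in sorted(describe().items()):
    print(attrs["perm"], attrs["owner"], attrs["size"], name)
```

Joining the directory and the entry name before calling lstat() keeps the helper usable for directories other than the current one.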

Here's one way of doing what you ask, but as others have noted, don't use this for parsing the output of ls. Also, it assumes that there are only 8 areas of whitespace in your input string separating your data. If any of the data substrings also contain whitespace, this code will fail:
class FileAttribs(object):
    ORDERED_ATTRIB_NAMES = ["permissions", "links", "owner",
                            "groups", "size", "month", "day", "time", "name"]
    def __init__(self, lsString):
        # Pair each attribute name with the matching whitespace-separated field
        for (attrib, s) in zip(self.ORDERED_ATTRIB_NAMES, lsString.split()):
            setattr(self, attrib, s)
if __name__ == '__main__':
    ls_list = ['-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 bar1',
               '-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 bar2',
               '-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 foo1',
               '-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 foo2']
    files = []
    for s in ls_list:
        files.append(FileAttribs(s))
    # do stuff with files, e.g.
    for f in files:
        print(f.permissions, f.name)
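The whitespace caveat can be softened for the last field. The sketch below is a variation, not the code above: it assumes the same nine-column layout, limits the split to eight separators so that a file name containing spaces stays in one piece, and uses a NamedTuple with made-up names (LsEntry, parse_ls_line). It still breaks if an owner or group name contains whitespace.

```python
from typing import NamedTuple


class LsEntry(NamedTuple):
    permissions: str
    links: int
    owner: str
    group: str
    size: int
    month: str
    day: str
    time: str
    name: str


def parse_ls_line(line):
    """Split on the first 8 runs of whitespace; the remainder is the name."""
    perm, links, owner, group, size, month, day, time, name = line.split(None, 8)
    return LsEntry(perm, int(links), owner, group, int(size),
                   month, day, time, name)


entry = parse_ls_line('-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 my file.txt')
print(entry.name)   # my file.txt
```

A line with fewer than nine fields raises ValueError during unpacking, which makes malformed input visible instead of silently leaving attributes unset.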
