icu tokenizer in node.js 12 - node.js

I have an ICU tokenizer for Python 3. This Python code uses BreakIterator and Locale from the icu (PyICU) library:
from icu import Locale, BreakIterator
def wordSegmenter(txt, iter):
    tokens = []
    iter.setText(txt)
    start = iter.first()
    try:
        while True:
            end = next(iter)
            tokens.append(txt[start:end])
            start = end
    except StopIteration:
        pass
    return tokens
text = u'退屈であくびばっかしていた毎日'
# wordBreakIterator is a helper that is not shown in this snippet
tokens = wordSegmenter(text, wordBreakIterator("ja"))
['退屈', 'で', 'あくび', 'ばっか', 'し', 'てい', 'た', '毎日']
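The slicing step in wordSegmenter does not depend on ICU itself: it pairs each boundary offset with the next one. A minimal sketch of just that step, assuming the boundary offsets are already known. The offsets below are hand-derived from the token list shown above, not produced by ICU, and tokens_from_boundaries is a hypothetical name used only for illustration:

```python
def tokens_from_boundaries(text, boundaries):
    """Slice text between each pair of consecutive boundary offsets."""
    return [text[start:end] for start, end in zip(boundaries, boundaries[1:])]


text = '退屈であくびばっかしていた毎日'
# Assumption: offsets hand-derived from the expected tokens, not real ICU output.
boundaries = [0, 2, 3, 6, 9, 10, 12, 13, 15]
tokens = tokens_from_boundaries(text, boundaries)
print(tokens)
```

In the real tokenizer the offsets would come from iter.first() and the successive next(iter) calls.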
I now have to port the ICU tokenizer to Node.js via node-gyp bindings. When building the native library I get the error
../src/wordsplit.cc:82:65: error: too few arguments to function call, single argument 'context' was
not specified
Nan::New<FunctionTemplate>(SplitWords)->GetFunction());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
/Users/loretoparisi/Library/Caches/node-gyp/12.13.1/include/node/v8.h:5995:3: note: 'GetFunction'
This seems to be related to Nan support in Node 12.x. How do I correctly port this to Node 12 and get rid of the V8 deprecations?

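Well, as the error message tells you, FunctionTemplate::GetFunction now needs a Context. I don't know much about Nan, but I'd suggest trying Nan::GetCurrentContext(). Note that the overload taking a context returns a MaybeLocal<Function>, so the result also has to be unwrapped, for example with .ToLocalChecked().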
That said, if you run Node 12 with --harmony-intl-segmenter, you get access to V8's supposedly complete (but not yet shipping by default) implementation of the JavaScript Intl.Segmenter proposal (https://github.com/tc39/proposal-intl-segmenter), which is also based on ICU. That might save you a bunch of work, and it's on the way towards becoming an official part of JavaScript, so it will soon be widely available.

pgfopts: arguments with spaces don't play well with babel

If I define a new class like this
\NeedsTeXFormat{LaTeX2e}
\ProvidesClass{myPlanning}[2022/07/16 my Planning class]
\LoadClass[french]{article}
\RequirePackage{pgfopts}
\pgfkeys{
/myOrg/.cd,
lang/.initial = english , lang/.store in = \myOrg@lang,
title/.initial = title , title/.store in = \myOrg@title,
}
\ProcessPgfOptions{/myOrg}
\RequirePackage[\myOrg@lang]{babel}
and I try to compile this document
\documentclass[lang=french,title={truc bidul}]{myPlanning}
\begin{document}
some text here
\end{document}
I get the following error:
This is pdfTeX, Version 3.141592653-2.6-1.40.22 (TeX Live 2022/dev/Debian) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
(./Test.tex
LaTeX2e <2021-11-15> patch level 1
L3 programming layer <2022-01-21>
(/home/hylkema/texmf/tex/latex/local/Org/myPlanning.cls
Document Class: myPlanning 2022/07/16 my Planning class
(/usr/share/texlive/texmf-dist/tex/latex/base/article.cls
Document Class: article 2021/10/04 v1.4n Standard LaTeX document class
(/usr/share/texlive/texmf-dist/tex/latex/base/size10.clo))
(/usr/share/texlive/texmf-dist/tex/latex/pgfopts/pgfopts.sty
(/usr/share/texlive/texmf-dist/tex/latex/pgf/utilities/pgfkeys.sty
(/usr/share/texlive/texmf-dist/tex/generic/pgf/utilities/pgfkeys.code.tex
(/usr/share/texlive/texmf-dist/tex/generic/pgf/utilities/pgfkeysfiltered.code.t
ex))))) (/usr/share/texlive/texmf-dist/tex/generic/babel/babel.sty
(/usr/share/texlive/texmf-dist/tex/generic/babel/txtbabel.def)
(/usr/share/texlive/texmf-dist/tex/generic/babel-french/french.ldf)
! LaTeX Error: Missing \begin{document}.
See the LaTeX manual or LaTeX Companion for explanation.
Type H <return> for immediate help.
...
l.4261 \ifin@\edef\bbl@tempc{\bbl@tempb}\fi}
?
However, if I compile this (No spaces in the title argument):
\documentclass[lang=french,title=truc]{myPlanning}
\begin{document}
some text here
\end{document}
It compiles fine with no errors.
What's more, the first document, with spaces in the title argument, compiles fine if I remove the \RequirePackage[\myOrg@lang]{babel} line from the class definition.
Is this a known problem, and is there a solution?
Thanks for your help,
Jouke
Update: The underlying issue was fixed in upstream babel on 2022-11-24 and should be resolved in the next release.
The issue is not the key=value parser but a code block in babel that tries to detect whether the wrong language is loaded when it receives options both from \documentclass and from the package call itself. That being said, pgfopts isn't up to date with current LaTeX: there were substantial changes to the option system about a year ago that pgfopts has not yet followed. To my knowledge, the only compatible solutions for key=value options are the built-in mechanism and expkv-opt.
But as I said, neither will solve your issue. Using main=\myOrg@lang in the options of babel will, however, because with the main option the problematic babel code is not used. So if you change your class to the following, the only (admittedly strange) warning regarding your options will be
LaTeX Warning: Unused global option(s):
[french].
(but that stems from your strange usage of \LoadClass, in which article will add french to the unused options list though it isn't in the global options list, hence babel will not pick it up and remove it from said list).
\NeedsTeXFormat{LaTeX2e}
\ProvidesClass{myPlanning}[2022/07/16 my Planning class]
\LoadClassWithOptions{article}
\RequirePackage{pgfopts}
\pgfkeys{
/myOrg/.cd
,lang/.initial = english
,lang/.store in = \myOrg@lang
,title/.initial = title
,title/.store in = \myOrg@title,
}
\ProcessPgfOptions{/myOrg}
\RequirePackage[main=\myOrg@lang]{babel}

Problem stripping characters from a pexpect query

I'm testing a Python script to retrieve the software version from Cisco gear. The script works fine connecting to the remote device and obtaining the information required. The problem starts when I try to filter some unwanted information from the pexpect output. The code of the function in charge of getting the version information is as follows:
def get_version_info(session):
    session.sendline('show version | include Version')
    result = session.expect(['>', pexpect.TIMEOUT])
    # Extract the 'version' part of the output
    version_output_lines = session.before.splitlines()
    version = version_output_lines[4].strip("'")
    print("--- got version: ", version)
    return version
The required information from the raw output after splitting lines (line 5 in code) was placed in the following list:
[b' ', b'', b'HOSTNAME#show version | include Version', b'Cisco IOS XE Software, Version 16.09.02', b'Cisco IOS Software [Fuji], ASR1000 Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 16.9.2, RELEASE SOFTWARE (fc4)', b'licensed under the GNU General Public License ("GPL") Version 2.0. The', b'software code licensed under GPL Version 2.0 is free software that comes', b'GPL code under the terms of GPL Version 2.0. For more details, see the', b'HOSTNAME#']
All I need from the above output is "Version 16.9.2", the rest needs to be stripped out.
I tried to isolate element 4 of the list (version_output_lines[4]) and use the strip function, but it keeps sending an error:
TypeError: a bytes-like object is required, not 'str'
So I'd like to know if you have a better idea on how to strip all the unwanted information from the list and only leave the Version info as mentioned before.
Thanks.
The .strip() method exists for both str and bytes. When applied to bytes it needs a bytes argument rather than the string "'" you supplied, so use .strip(b"'").
You can also have pexpect decode the bytes into strings by passing, for example, encoding='utf8' as a parameter in the spawn() call.
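Neither fix above isolates "Version 16.9.2" on its own. A minimal sketch of one way to do that, assuming the lines are the bytes objects shown in the question (abbreviated here) and that the wanted line always starts with "Cisco IOS Software". extract_version is a hypothetical helper, not part of pexpect:

```python
import re

# Sample lines copied (abbreviated) from the question's splitlines() output.
lines = [
    b' ', b'', b'HOSTNAME#show version | include Version',
    b'Cisco IOS XE Software, Version 16.09.02',
    b'Cisco IOS Software [Fuji], ASR1000 Software '
    b'(X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 16.9.2, RELEASE SOFTWARE (fc4)',
    b'HOSTNAME#',
]


def extract_version(output_lines):
    """Return 'Version x.y.z' from the 'Cisco IOS Software' line, or None."""
    for raw in output_lines:
        # Decode first so the str-based regex and startswith work.
        line = raw.decode("utf-8", errors="replace")
        if line.startswith("Cisco IOS Software"):
            match = re.search(r"Version [0-9A-Za-z.()]+", line)
            if match:
                return match.group(0)
    return None


version = extract_version(lines)
print(version)
```

Matching on the line's content rather than on a fixed index such as [4] avoids breaking when the device prints a different number of leading lines.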

How to run program using angr after loading with the elfcore backend?

I am attempting to write a python script using the angr binary analysis library (http://angr.io/). I have written code that successfully loads a core dump of the process I want to play with by using the ElfCore back end (http://angr.io/api-doc/cle.html#cle.backends.elf.elfcore.ELFCore) passed to the project constructor, doing something like the following:
ap = angr.Project("corefile", main_opts={'backend': 'elfcore'})
What I am wondering is, how do I now "run" the program forward from the state (registers and memory) which was defined by the core dump? For example, when I attempted to create a SimState using the above project:
ss = angr.sim_state.SimState(project=ap)
ss.regs.rip
I got back that rip was uninitialized, even though it was certainly initialized in the core dump (at the point when the core dump was generated).
Thanks in advance for any help!
Alright! I figured this out. Being a total angr n00b® this may not be the best way of doing this, but since nobody offered a better way this is what I came up with.
First...
ap = angr.Project("corefile", main_opts={'backend': 'elfcore'}, rebase_granularity=0x1000)
ss = angr.factory.AngrObjectFactory(ap).blank_state()
The rebase_granularity argument was needed because my core file had the stack mapped high in the address range, and angr refuses to map things above your main binary (my core file in this case).
From inspecting the angr source (and playing at a Python terminal) I found out that at this point, the above state will have memory all mapped out the way the core file defined it to be, but the registers are not defined appropriately yet. Therefore I needed to proceed to:
import cle
# Get the elfcore_object
elfcore_object = None
for o in ap.loader.all_objects:
    if isinstance(o, cle.backends.elf.elfcore.ELFCore):
        elfcore_object = o
        break
if elfcore_object is None:
    raise RuntimeError("no ELFCore object found among the loaded objects")
# Set the reg values from the elfcore_object to the sim state, realizing that not all
# of the registers will be supported (particularly some segment registers)
for regval in elfcore_object.initial_register_values():
    try:
        setattr(ss.regs, regval[0], regval[1])
    except Exception:
        print("warning: could not set register", regval[0])
# get a simgr
simgr = ap.factory.simgr(ss)
Now, I was able to run forward from here using the state defined by the core dump as my starting point...
for ins in ap.factory.block(simgr.active[0].addr).capstone.insns:
    print(ins)
simgr.step()
# ...repeat

Take mean of feature set using openSMILE audio feature extractor

My problem is taking the mean of all features across the different frames of one sample .wav file. I am trying cFunctionals in the "chroma_fft.conf" file, which belongs to the latest openEAR framework. To explain, here are the essential lines I added to "chroma_fft.conf":
[componentInstances:cComponentManager]
instance[functL1].type = cFunctionals
[functL1:cFunctional]
reader.dmLevel = chroma
writer.dmLevel = func
frameMode = full
frameSize=0
frameStep=0
functionalsEnabled = Means
Means.amean = 1
[csvSink:cCsvSink]
reader.dmLevel = func
..NOT-IMPORTANT......
..NOT-IMPORTANT......
However, when I run it from the command prompt on Windows, I get the error:
"(ERROR) [1] in configManager : base instance of field 'functL1.reader.dmInstance' not found in configmanager!"
Very similar code runs successfully in "emo_large.conf", but this code fails. If anybody knows how to use the openSMILE audio feature extractor, please advise why this error occurs and how to use "cFunctionals" properly to take the mean, variance, moments, etc. of large feature sets.
Thanks!
In this case you have a typo in
[functL1:cFunctional]
which should be
[functL1:cFunctionals]
I admit the error message
"(ERROR) [1] in configManager : base instance of field 'functL1.reader.dmInstance' not found in configmanager!"
is not intutive, but it refers to the fact that openSMILE expects a configuration section functL1 of type cFunctionals in the config to read the mandatory (sub-)field functL1.reader.dmInstance, which it then cannot find, because the section (due to the typo) is not defined.
Cheers,
Florian
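
This kind of typo can be caught before running openSMILE by comparing each declared instance type with the type in the matching section header. A minimal sketch, assuming the config follows the "instance[NAME].type = TYPE" and "[NAME:TYPE]" layout shown in the question. find_type_mismatches is a hypothetical helper, not an openSMILE tool:

```python
import re


def find_type_mismatches(config_text):
    """Return (instance, declared_type, section_type) for every instance whose
    section header is missing (section_type is None) or names another type."""
    declared = dict(
        re.findall(r"instance\[(\w+)\]\.type\s*=\s*(\w+)", config_text)
    )
    sections = dict(
        re.findall(r"^\[(\w+):(\w+)\]", config_text, flags=re.MULTILINE)
    )
    problems = []
    for name, declared_type in declared.items():
        section_type = sections.get(name)
        if section_type != declared_type:
            problems.append((name, declared_type, section_type))
    return problems


# Fragment copied from the question, including the typo.
sample = """\
[componentInstances:cComponentManager]
instance[functL1].type = cFunctionals
[functL1:cFunctional]
reader.dmLevel = chroma
"""
problems = find_type_mismatches(sample)
print(problems)
```

An empty list means every declared instance has a section header of the same type.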

Watir::Wait.until not working with frames

I'm automating an internal tool that is rife with frames, using Watir. I am able to manipulate all the various elements so I know I am identifying the frames correctly, but any time I attempt to use a Wait statement for any of these elements it fails. Tracking back through the error message, it always hits the activesupport gem in core_ext/time/calculations and it looks like it can't get the duration value, it gets set to false, and then the operation fails because it is expecting a Float. Is this a bug?
Using Ruby 1.8.7 and Watir 1.6.7
My code is:
require 'rubygems'
require 'watir/testcase'
require 'main_setup'
require 'win32ole'
require 'common'
class Smoketest < Watir::TestCase
  include CommonCode
  def test_AddEdit_Endpoint
    Watir::Wait.until { @b.link(:id,"lbShowEndpointForm").exists? }
  end
end
Error is the following:
test_basic_smoke(Smoketest):
TypeError: can't convert false into Float
C:/Ruby187/lib/ruby/gems/1.8/gems/activesupport-2.3.9/lib/active_support/core_ext/time/calculations.rb:278:in `plus_without_duration'
C:/Ruby187/lib/ruby/gems/1.8/gems/activesupport-2.3.9/lib/active_support/core_ext/time/calculations.rb:278:in `+'
C:/Ruby187/lib/ruby/gems/1.8/gems/commonwatir-1.6.7/lib/watir/wait.rb:15:in `until'
C:/qa/trunk/CCAdmin/Automation/CCAdmin/lib/smoketest.rb:27:in `test_basic_smoke'
So, which line is the C:/qa/trunk/CCAdmin/Automation/CCAdmin/lib/smoketest.rb:27?
I thought the correct usage for the command was wait_until, unless it has changed since Watir 1.6.5: http://wtr.rubyforge.org/rdoc/1.6.5/classes/Watir/Waiter.html
