Amend with multiple indices per substitution in J

In J, how do you idiomatically amend an array when you have:
substitution0 multipleIndices0
...
substitutionN multipleIndicesN
(not to be confused with:
substitution0 multipartIndex0
...
substitutionN multipartIndexN
)
For example, my attempt at the classic fizzbuzz problem looks like this:
i=.3 :'<#I.(,*./)(=<.)3 5%~"(0 1)y'
d=.1+i.20
'fizz';'buzz';'fizzbuzz' (i d)};/d
|length error
| 'fizz';'buzz';'fizzbuzz' (i d)};/d
I have created the verb m}, where m is i d: three boxes containing different-sized lists of one-dimensional indices. I think } instead expects m to be boxes that each contain a single list representing one index, with at most as many dimensions as the rank of the right argument.
How is this generally solved?

'fizz';'buzz';'fizzbuzz' (i d)};/d
This has a few problems:
First, the x argument of } is just 'fizzbuzz', not a list of boxes, because } is applied before the ;s on its left. You mean:
('fizz';'buzz';'fizzbuzz') (i d)};/d
Second, the boxed numbers in the m argument of } are not interpreted as you expect. Each box is read as one multi-dimensional index, not as a list of indices:
_ _ (1 1;2 2) } i.3 3
0 1 2
3 _ 5
6 7 _
_ _ (1 1;2 2) } ,i.3 3 NB. your length error
|length error
| _ _ (1 1;2 2)},i.3 3
If you raze the m param you get the right kind of indices, but still don't have enough members in the x list to go around:
_ _ (;1 3;5 7) } i.9
|length error
| _ _ (;1 3;5 7)}i.9
_ _ _ _ (;1 3;5 7) } i.9
0 _ 2 _ 4 _ 6 _ 8
These work:
NB. raze m and extend x (fixed)
i=.3 :'<#I.(,*./)(=<.)3 5%~"(0 1)y'
d=.1+i.20
((;# each i d)#'fizz';'buzz';'fizzbuzz') (;i d)};/d
+-+-+----+-+----+----+-+-+--------+----+--+----+--+--+----+--+--+--------+--+----+
|1|2|fizz|4|fizz|buzz|7|8|fizzbuzz|buzz|11|fizz|13|14|buzz|16|17|fizzbuzz|19|fizz|
+-+-+----+-+----+----+-+-+--------+----+--+----+--+--+----+--+--+--------+--+----+
NB. repeatedly mutate d
i=.3 :'<#I.(,*./)(=<.)3 5%~"(0 1)y'
d=.;/1+i.20
('fizz';'buzz';'fizzbuzz'),.(i ;d)
+--------+--------------+
|fizz |2 5 8 11 14 17|
+--------+--------------+
|buzz |4 9 14 19 |
+--------+--------------+
|fizzbuzz|14 |
+--------+--------------+
0$(3 : 'd =: ({.y) (1{::y) }d')"1 ('fizz';'buzz';'fizzbuzz'),.(i ;d)
d
+-+-+----+-+----+----+-+-+----+----+--+----+--+--+--------+--+--+----+--+----+
|1|2|fizz|4|buzz|fizz|7|8|fizz|buzz|11|fizz|13|14|fizzbuzz|16|17|fizz|19|buzz|
+-+-+----+-+----+----+-+-+----+----+--+----+--+--+--------+--+--+----+--+----+
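For readers less familiar with J, the first working approach (raze m, extend x) can be sketched in Python. This is only an illustration under assumptions: the helper name amend_groups and the modulo-based index groups are hypothetical stand-ins rather than a translation of the i verb, and in this sketch later writes overwrite earlier ones.

```python
from itertools import chain


def amend_groups(values, substitutions, index_groups):
    """Return a copy of values with substitutions[k] written at every index in index_groups[k].

    Hypothetical helper: mirrors "raze the indices, repeat each substitution
    once per index in its group", then performs one flat amend.
    """
    # Flatten the index groups into one list (the "raze m" step).
    flat_indices = list(chain.from_iterable(index_groups))
    # Repeat each substitution once per index in its group (the "extend x" step).
    flat_subs = [sub for sub, group in zip(substitutions, index_groups) for _ in group]
    result = list(values)
    # Later writes win, so the last group takes precedence on shared indices.
    for idx, sub in zip(flat_indices, flat_subs):
        result[idx] = sub
    return result


d = list(range(1, 21))
# Assumed index groups: positions of multiples of 3, 5 and 15.
groups = [
    [i for i, n in enumerate(d) if n % 3 == 0],
    [i for i, n in enumerate(d) if n % 5 == 0],
    [i for i, n in enumerate(d) if n % 15 == 0],
]
out = amend_groups(d, ["fizz", "buzz", "fizzbuzz"], groups)
print(out)
```

Because the three groups have different lengths, the substitutions cannot be paired with the groups directly; flattening both sides to the same length is what removes the length mismatch.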


Stata test if string contains same character

I want to automatically test if the string contains only one type of character, with the result in a true/false variable "check"
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end
My attempt:
gen check = .
//loop through dataset
local db =_N
forval x = 1/`db'{
dis as error "obs `x'"
//get first character in string
local f = substr(contactno, 1, 1) in `x'
//loop through each character in string
capture drop check_*
forvalues i = 1/11 {
quietly gen check_`i'=.
local j = substr(contactno, `i', 1) in `x'
//Tag characters that match
if "`j'" == "`f'" {
local y = 1
replace check_`i'= 1 in `x'
}
else {
local y= 0
replace check_`i'= 0 in `x'
}
}
}
Expected result: the first two observations should be true and the third false.
You can achieve this in one line of code as follows:
Take the first character of contactno.
Find all instances of this character in contactno and replace with an empty string (i.e., "").
Test whether the resulting string is empty.
gen check = missing(subinstr(contactno,substr(contactno,1,1),"",.))
+---------------------+
| contactno check |
|---------------------|
1. | aaaaaaaaaaa 1 |
2. | bbbbbbbbbbb 1 |
3. | aaaaaaaaaab 0 |
+---------------------+
So we are leveraging the fact that if any character differs from the first character, the string cannot consist of only one (type of) character.
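The same three steps can be sketched outside Stata to make the logic easier to check. This is a minimal Python illustration, not part of the Stata solution; the function name single_character_type is hypothetical.

```python
def single_character_type(s):
    """Return True when s consists of one repeated character.

    Hypothetical helper mirroring the three steps: take the first
    character, delete every occurrence of it, and test for emptiness.
    """
    first = s[:1]                   # first character ("" for an empty string)
    remainder = s.replace(first, "")  # remove all occurrences of it
    return remainder == ""          # nothing left means only one character type


for contactno in ["aaaaaaaaaaa", "bbbbbbbbbbb", "aaaaaaaaaab"]:
    print(contactno, single_character_type(contactno))
```

Note that in this sketch an empty string also counts as true, since nothing remains after the replacement.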
Here's another way to do it.
clear
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end
gen long id = _n
save original_data, replace
expand 11
bysort id : gen character = substr(contactno, _n, 1)
bysort id (character) : gen byte OK = character[1] == character[_N]
drop character
bysort id : keep if _n == 1
merge 1:1 id using original_data
list
+-------------------------------------+
| contactno id OK _merge |
|-------------------------------------|
1. | aaaaaaaaaaa 1 1 Matched (3) |
2. | bbbbbbbbbbb 2 1 Matched (3) |
3. | aaaaaaaaaab 3 0 Matched (3) |
+-------------------------------------+

Exception: Invalid_argument "String.sub / Bytes.sub"

I wrote a tail-recursive scanner for basic arithmetic expressions in OCaml.
Syntax
Exp ::= n | Exp Op Exp | (Exp)
Op ::= + | - | * | /
type token =
| Tkn_NUM of int
| Tkn_OP of string
| Tkn_LPAR
| Tkn_RPAR
| Tkn_END
exception ParseError of string * string
let tail_tokenize s =
let rec tokenize_rec s pos lt =
if pos < 0 then lt
else
let c = String.sub s pos 1 in
match c with
| " " -> tokenize_rec s (pos-1) lt
| "(" -> tokenize_rec s (pos-1) (Tkn_LPAR::lt)
| ")" -> tokenize_rec s (pos-1) (Tkn_RPAR::lt)
| "+" | "-" | "*" | "/" -> tokenize_rec s (pos-1) ((Tkn_OP c)::lt)
| "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ->
(match lt with
| (Tkn_NUM n)::lt' ->
(let lta = Tkn_NUM(int_of_string (c^(string_of_int n)))::lt' in
tokenize_rec s (pos-1) lta)
| _ -> tokenize_rec s (pos-1) (Tkn_NUM (int_of_string c)::lt)
)
|_ -> raise (ParseError ("Tokenizer","unknown symbol: "^c))
in
tokenize_rec s (String.length s) [Tkn_END]
During execution I get
tail_tokenize "3+4";;
Exception: Invalid_argument "String.sub / Bytes.sub".
Your example case is this:
tail_tokenize "3+4"
The first call will look like this:
tokenize_rec "3+4" 3 [Tkn_END]
Since 3 is not less than 0, the first call inside tokenize_rec will look like this:
String.sub "3+4" 3 1
If you try this yourself you'll see that it's invalid:
# String.sub "3+4" 3 1;;
Exception: Invalid_argument "String.sub / Bytes.sub".
It seems a little strange to work through the string backwards, but to do this you need to start at String.length s - 1.
From the error message it's clear that String.sub is the problem. Its arguments are s, pos and 1 with the last being a constant and the two others coming straight from the function arguments. It might be a good idea to run this in isolation with the arguments substituted for the actual values:
let s = "3+4" in
String.sub s (String.length s) 1
Doing so we again get the same error, and hopefully it's now clear why: you're asking for a substring of length 1 starting at index String.length s, one position past the last character, so it would run past the end of the string, which of course it can't.
Logically, you might then subtract 1 from pos in the String.sub call, so that it takes the substring of length 1 that starts just before pos. But again you get the same error. That is because your terminating condition is pos < 0, which means that at pos = 0 you'll try to run String.sub s (0 - 1) 1. Therefore you need to adjust the terminating condition too. But once you've done that you should be good!

How to check for differences between two spaCy Doc objects?

I have two lists of the same strings each, except for slight variations in the strings of the second list, i.e. no capitalization, spelling errors, etc.
I want to check whether or not spaCy does anything differently between the two strings. This means that even if the strings aren't equivalent, I want to know if there are differences in the tagging and parsing.
I tried the following:
import spacy
import en_core_web_sm
nlp = en_core_web_sm.load()
doc = nlp("foo")
doc2 = nlp("foo")
print(doc == doc2)
This prints False so == is not the way to go.
Ideally, I would want my code to find where potential differences are, but checking if anything at all is different would be a very helpful first step.
EDIT:
== was changed to work in newer spaCy versions. However, it only compares at the text level. Comparing dependency parses is an entirely different story, and it has not been answered for spaCy yet, apart from this thread now of course.
Token-Level Comparison
If you want to know whether the annotation is different, you'll have to go through the documents token by token to compare POS tags, dependency labels, etc. Assuming the tokenization is the same for both versions of the text, you can compare:
import spacy
nlp = spacy.load('en')
doc1 = nlp("What's wrong with my NLP?")
doc2 = nlp("What's wring wit my nlp?")
for token1, token2 in zip(doc1, doc2):
    print(token1.pos_, token2.pos_, token1.pos_ == token2.pos_)
Output:
NOUN NOUN True
VERB VERB True
ADJ VERB False
ADP NOUN False
ADJ ADJ True
NOUN NOUN True
PUNCT PUNCT True
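To report only the positions where the annotation differs, and for more than one attribute at a time, the loop above could be wrapped in a small helper. This is a sketch under assumptions: the function name annotation_diffs is hypothetical, it relies on both documents having the same tokenization, and the demo uses plain stand-in objects so that it runs without a spaCy model. With real Doc objects the same call should work, since spaCy tokens expose pos_, tag_ and dep_.

```python
from types import SimpleNamespace


def annotation_diffs(doc1, doc2, attrs=("pos_", "tag_", "dep_")):
    """Yield (index, attribute, value1, value2) wherever two token sequences disagree."""
    if len(doc1) != len(doc2):
        # Pairing tokens by position is only meaningful with identical tokenization.
        raise ValueError("tokenization differs; tokens cannot be paired by position")
    for i, (t1, t2) in enumerate(zip(doc1, doc2)):
        for attr in attrs:
            v1, v2 = getattr(t1, attr), getattr(t2, attr)
            if v1 != v2:
                yield (i, attr, v1, v2)


# Stand-in tokens (made-up annotations, for illustration only).
doc_a = [
    SimpleNamespace(text="wrong", pos_="ADJ", tag_="JJ", dep_="acomp"),
    SimpleNamespace(text="with", pos_="ADP", tag_="IN", dep_="prep"),
]
doc_b = [
    SimpleNamespace(text="wring", pos_="VERB", tag_="VBG", dep_="csubj"),
    SimpleNamespace(text="with", pos_="ADP", tag_="IN", dep_="prep"),
]

diffs = list(annotation_diffs(doc_a, doc_b))
for index, attr, v1, v2 in diffs:
    print(index, attr, v1, v2)
```

An empty result then answers the "is anything at all different" question, while the yielded tuples say where the differences are.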
Visualization for Parse Comparison
If you want to visually inspect the differences, you might be looking for something like What's Wrong With My NLP?. If the tokenization is the same for both versions of the document, then I think you can do something like this to compare the parses:
First, you'd need to export your annotation into a supported format (some version of CoNLL for dependency parses), which is something textacy can do. (See: https://www.pydoc.io/pypi/textacy-0.4.0/autoapi/export/index.html#export.export.doc_to_conll)
from textacy import export
export.doc_to_conll(nlp("What's wrong with my NLP?"))
Output:
# sent_id 1
1 What what NOUN WP _ 2 nsubj _ SpaceAfter=No
2 's be VERB VBZ _ 0 root _ _
3 wrong wrong ADJ JJ _ 2 acomp _ _
4 with with ADP IN _ 3 prep _ _
5 my -PRON- ADJ PRP$ _ 6 poss _ _
6 NLP nlp NOUN NN _ 4 pobj _ SpaceAfter=No
7 ? ? PUNCT . _ 2 punct _ SpaceAfter=No
Then you need to decide how to modify things so you can see both versions of the token in the analysis. I'd suggest concatenating the tokens where there are variations, say:
1 What what NOUN WP _ 2 nsubj _ SpaceAfter=No
2 's be VERB VBZ _ 0 root _ _
3 wrong_wring wrong ADJ JJ _ 2 acomp _ _
4 with_wit with ADP IN _ 3 prep _ _
5 my -PRON- ADJ PRP$ _ 6 poss _ _
6 NLP_nlp nlp NOUN NN _ 4 pobj _ SpaceAfter=No
7 ? ? PUNCT . _ 2 punct _ SpaceAfter=No
vs. the annotation for What's wring wit my nlp?:
1 What what NOUN WP _ 3 nsubj _ SpaceAfter=No
2 's be VERB VBZ _ 3 aux _ _
3 wrong_wring wr VERB VBG _ 4 csubj _ _
4 with_wit wit NOUN NN _ 0 root _ _
5 my -PRON- ADJ PRP$ _ 6 poss _ _
6 NLP_nlp nlp NOUN NN _ 4 dobj _ SpaceAfter=No
7 ? ? PUNCT . _ 4 punct _ SpaceAfter=No
Then you need to convert both files to an older version of CoNLL supported by whatswrong. (The main issue is just removing the commented lines starting with #.) One existing option is the UD tools CoNLL-U to CoNLL-X converter: https://github.com/UniversalDependencies/tools/blob/master/conllu_to_conllx.pl, and then you have:
1 What what NOUN NOUN_WP _ 2 nsubj _ _
2 's be VERB VERB_VBZ _ 0 root _ _
3 wrong_wring wrong ADJ ADJ_JJ _ 2 acomp _ _
4 with_wit with ADP ADP_IN _ 3 prep _ _
5 my -PRON- ADJ ADJ_PRP$ _ 6 poss _ _
6 NLP_nlp nlp NOUN NOUN_NN _ 4 pobj _ _
7 ? ? PUNCT PUNCT_. _ 2 punct _ _
You can load these files (one as gold and one as guess) and compare them using whatswrong. Choose the format CoNLL 2006 (CoNLL 2006 is the same as CoNLL-X).
This Python port of whatswrong is a little unstable, but basically seems to work: https://github.com/ppke-nlpg/whats-wrong-python
Both of them seem to assume that we have gold POS tags, though, so that comparison isn't shown automatically. You could also concatenate the POS columns to be able to see both (just like with the tokens) since you really need the POS tags to understand why the parses are different.
For both the token pairs and the POS pairs, I think it would be easy to modify either the original implementation or the python port to show both alternatives separately in additional rows so you don't have to do the hacky concatenation.
Try using the similarity() function of spaCy, which is available on both Doc and Token objects.
For example:
import spacy
nlp = spacy.load('en_core_web_md') # make sure to use larger model!
tokens = nlp(u'dog cat banana')
for token1 in tokens:
    for token2 in tokens:
        print(token1.text, token2.text, token1.similarity(token2))
This prints the similarity score for every pair of tokens.
Reference: https://spacy.io

How to fix indentation problem with haskell if statement

I have the following Haskell code:
f :: Int -> Int
f x =
  let var1 = there in
  case (there) of
    12 -> 0
    otherwise | (there - 1) >= 4 -> 2
              | (there + 1) <= 2 -> 3
  where there = 6
The function alone is garbage, ignore what exactly it does.
I want to replace the guards with if
f x =
  let var1 = there in
  case (there) of
    12 -> 0
    otherwise -> if (there - 1) >= 4 then 2
                 else if (there + 1) <= 2 then 3
  where there = 6
I tried moving the if to the next line, the then to the next line, lining them up, unlining them, but nothing seems to work.
I get a parsing error and I don't know how to fix it:
parse error (possibly incorrect indentation or mismatched brackets)
|
40 | where there = 6
| ^
You have a few misunderstandings in here. Let's step through them starting from your original code:
f x =
A function definition, but the function never uses the parameter x. Strictly speaking this is a warning and not an error, but many code bases build with -Werror, so consider omitting the parameter or using _ to indicate that you are explicitly ignoring it.
let var1 = there in
This is unnecessary. Again, you are not using var1 (the code below uses there), so why have it?
case (there) of
Sure. Or just case there of; there is no need for excessive parens cluttering up the code.
12 -> 0
Here 12 is a pattern match, and it's fine.
otherwise ->
Here you used the variable name otherwise as a pattern, which will unconditionally match the value there. This is another warning: otherwise is a global value equal to True, so it can be used in guards, such as foo x | x < 1 = expr1 | otherwise = expr2. Your use is not like that; using otherwise as a pattern shadows the global value. Instead consider the catch-all pattern with underscore:
    _ -> if (there - 1) >= 4
           then 2
           else if (there + 1) <= 2
                  then 3
  where there = 6
Ok... but what if there were equal to 3? 3-1 is not greater than or equal to 4, and 3+1 is not less than or equal to 2. You always need an else with your if expression. There is no if {} in Haskell; instead there is if ... then ... else ..., much like the ternary operator in C, as explained in the Haskell wiki.

Creating a string from the permutations of an ArrayBuffer[String] elements in Scala

I have
val a: String = "E"
val y: ArrayBuffer[String] = ArrayBuffer("I", "G", "S")
I am trying to make a string, such that:
"(E <=> (I | G | S)) & (~I | ~G) & (~I | ~S) & (~G | ~S)"
Currently, for the first part of string (first clause) (E <=> (I | G | S)), I have this which is functional:
s"($a <=> (${y.mkString(" | ")}))" // & (~${y.mkString(" | ~")})"
For the second part, which is built from the permutations of the elements in y, i.e., (~I | ~G) & (~I | ~S) & (~G | ~S), how can I fix the part in the comment to create it?
I am trying to use y.permutations to create another string and then concatenate it with this one, but can it be generated here, within the same string, in some way?
Thanks.
It seems from your example that what you need is combinations, not permutations.
So to have a term for every pair of elements from y, you can find all combinations of length 2 using the combinations method. Then you can wrap each pair in parentheses in the necessary format, and finally build the whole second part with mkString:
y.combinations(2).map { case Seq(a, b) => s"(~$a | ~$b)" }.mkString(" & ")
You can integrate this expression into the string interpolation:
s"($a <=> (${y.mkString(" | ")})) & ${
y.combinations(2).map { case Seq(a, b) => s"(~$a | ~$b)" }.mkString(" & ")}"
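The difference between combinations and permutations is easy to check outside Scala as well. The following Python sketch builds the same target string with itertools.combinations; the function name build_formula is hypothetical, and it is only meant to illustrate why unordered pairs are what the example needs.

```python
from itertools import combinations


def build_formula(a, ys):
    """Build "(a <=> (y1 | y2 | ...)) & (~yi | ~yj) & ..." over all unordered pairs of ys."""
    # First clause: a is equivalent to the disjunction of all elements.
    first = f"({a} <=> ({' | '.join(ys)}))"
    # One clause per unordered pair; permutations would also emit (~G | ~I) and so on.
    pairs = [f"(~{p} | ~{q})" for p, q in combinations(ys, 2)]
    return " & ".join([first] + pairs)


formula = build_formula("E", ["I", "G", "S"])
print(formula)
```

Joining the first clause and the pair clauses in a single list also avoids a dangling " & " when ys has fewer than two elements, which the Scala one-liner above would produce in that case.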
