Stata test if string contains same character


I want to automatically test whether a string contains only one type of character, with the result stored in a true/false variable "check".
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end
My attempt:
gen check = .
//loop through dataset
local db =_N
forval x = 1/`db'{
dis as error "obs `x'"
//get first character in string
local f = substr(contactno, 1, 1) in `x'
//loop through each character in string
capture drop check_*
forvalues i = 1/11 {
quietly gen check_`i'=.
local j = substr(contactno, `i', 1) in `x'
//Tag characters that match
if "`j'" == "`f'" {
local y = 1
replace check_`i'= 1 in `x'
}
else {
local y= 0
replace check_`i'= 0 in `x'
}
}
Expected results: the first two observations should be true and the third false.

You can achieve this in one line of code as follows:
Take the first character of contactno.
Find all instances of this character in contactno and replace with an empty string (i.e., "").
Test whether the resulting string is empty.
gen check = missing(subinstr(contactno,substr(contactno,1,1),"",.))
     +---------------------+
     |   contactno   check |
     |---------------------|
  1. | aaaaaaaaaaa       1 |
  2. | bbbbbbbbbbb       1 |
  3. | aaaaaaaaaab       0 |
     +---------------------+
So we are leveraging the fact that a string contains only one (type of) character exactly when every character equals its first character; if any character differs from the first, something is left over after the replacement.
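For readers who do not use Stata, the same three steps can be sketched in Python. This is only an illustration of the idea, not part of the original answer; the function name has_single_character_type is made up here, and treating an empty string as uniform is an assumption.

```python
def has_single_character_type(text: str) -> bool:
    """Return True if every character in text equals its first character."""
    if not text:
        # Assumption: an empty string counts as trivially uniform.
        return True
    # Remove every occurrence of the first character, then test for emptiness.
    return text.replace(text[0], "") == ""


for contactno in ["aaaaaaaaaaa", "bbbbbbbbbbb", "aaaaaaaaaab"]:
    print(contactno, has_single_character_type(contactno))
```

As in the Stata one-liner, no explicit loop over characters is needed; the replacement does all the work.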

Here's another way to do it.
clear
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end
gen long id = _n
save original_data, replace
expand 11
bysort id : gen character = substr(contactno, _n, 1)
bysort id (character) : gen byte OK = character[1] == character[_N]
drop character
bysort id : keep if _n == 1
merge 1:1 id using original_data
list
     +-------------------------------------+
     |   contactno   id   OK        _merge |
     |-------------------------------------|
  1. | aaaaaaaaaaa    1    1   Matched (3) |
  2. | bbbbbbbbbbb    2    1   Matched (3) |
  3. | aaaaaaaaaab    3    0   Matched (3) |
     +-------------------------------------+
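The core of this second approach is to sort the characters of each string and compare the first sorted character with the last one: if they are equal, everything in between must be equal too. A minimal Python sketch of that comparison, with a hypothetical function name and an assumed convention for empty strings, could look like this:

```python
def first_equals_last_after_sort(text: str) -> bool:
    """Sort the characters and compare the smallest with the largest."""
    chars = sorted(text)
    if not chars:
        # Assumption: an empty string counts as trivially uniform.
        return True
    return chars[0] == chars[-1]


for contactno in ["aaaaaaaaaaa", "bbbbbbbbbbb", "aaaaaaaaaab"]:
    print(contactno, first_equals_last_after_sort(contactno))
```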

Related

Exception: Invalid_argument "String.sub / Bytes.sub"

I wrote a tail recursive scanner for basic arithmetic expressions in OCaml
Syntax
Exp ::= n | Exp Op Exp | (Exp)
Op ::= + | - | * | /
type token =
| Tkn_NUM of int
| Tkn_OP of string
| Tkn_LPAR
| Tkn_RPAR
| Tkn_END
exception ParseError of string * string
let tail_tokenize s =
let rec tokenize_rec s pos lt =
if pos < 0 then lt
else
let c = String.sub s pos 1 in
match c with
| " " -> tokenize_rec s (pos-1) lt
| "(" -> tokenize_rec s (pos-1) (Tkn_LPAR::lt)
| ")" -> tokenize_rec s (pos-1) (Tkn_RPAR::lt)
| "+" | "-" | "*" | "/" -> tokenize_rec s (pos-1) ((Tkn_OP c)::lt)
| "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ->
(match lt with
| (Tkn_NUM n)::lt' ->
(let lta = Tkn_NUM(int_of_string (c^(string_of_int n)))::lt' in
tokenize_rec s (pos-1) lta)
| _ -> tokenize_rec s (pos-1) (Tkn_NUM (int_of_string c)::lt)
)
|_ -> raise (ParseError ("Tokenizer","unknown symbol: "^c))
in
tokenize_rec s (String.length s) [Tkn_END]
During execution I get
tail_tokenize "3+4";;
Exception: Invalid_argument "String.sub / Bytes.sub".

Your example case is this:
tail_tokenize "3+4"
The first call will look like this:
tokenize_rec "3+4" 3 [Tkn_END]
Since 3 is not less than 0, the first call inside tokenize_rec will look like this:
String.sub "3+4" 3 1
If you try this yourself you'll see that it's invalid:
# String.sub "3+4" 3 1;;
Exception: Invalid_argument "String.sub / Bytes.sub".
It seems a little strange to work through the string backwards, but to do this you need to start at String.length s - 1.

From the error message it's clear that String.sub is the problem. Its arguments are s, pos and 1 with the last being a constant and the two others coming straight from the function arguments. It might be a good idea to run this in isolation with the arguments substituted for the actual values:
let s = "3+4" in
String.sub s (String.length s) 1
Doing so we again get the same error, and hopefully it's now clear why: you're trying to get a substring of length 1 starting at index String.length s, which is one past the last character, so it would have to read past the end of the string, which of course it can't.
Logically, you might then try to subtract 1 from pos, i.e. String.sub s (pos - 1) 1, so that it takes a substring of length 1 starting at the last character. But again you get the same error. That is because your terminating condition is pos < 0, which means that when pos reaches 0 you'll try to run String.sub s (0 - 1) 1. Therefore you need to adjust the terminating condition too. But once you've done that you should be good!
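The index arithmetic behind both answers can be checked with a small Python sketch. Python slicing does not raise on out-of-range indices, so the helper sub below is a hypothetical stand-in that mimics the bounds check of OCaml's String.sub; the names and the use of ValueError are assumptions made for illustration.

```python
def sub(s: str, pos: int, length: int) -> str:
    """Mimic the bounds check of OCaml's String.sub (illustrative stand-in)."""
    if pos < 0 or length < 0 or pos + length > len(s):
        raise ValueError("String.sub / Bytes.sub")
    return s[pos:pos + length]


def chars_right_to_left(s: str) -> list:
    """Walk the string backwards, starting at the last valid index."""
    out = []
    pos = len(s) - 1          # start at length - 1, not length
    while pos >= 0:           # stop once pos drops below 0
        out.append(sub(s, pos, 1))
        pos -= 1
    return out


print(chars_right_to_left("3+4"))
try:
    sub("3+4", 3, 1)          # same arguments as the failing first call
except ValueError as err:
    print("error:", err)
```

Starting at len(s) - 1 and stopping when pos < 0 keeps every call inside the valid range 0 to len(s) - 1.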

How to search for a word in a string in FreeBASIC

I am trying to create an internal function in my FreeBASIC program where I want to check for the word "echo" in the string variable "line0"; if "echo" is part of the string, I want it to echo the input (except "echo").

BASIC's Instr function can search through a string and find out if it contains a certain substring. It can do so starting at the first position of the string or any other position if we mention the value in the parameter list. The result that Instr returns is the character position of the find or else zero to denote that the substring was not found.
It's the optional start position that makes all the difference when writing an algorithm that has to find all occurrences of a certain substring.
Once Start = 1 Position = Instr(Start, MyString, MySubString) has found the first substring, we can move the start position to just past the find and start over again. We keep doing so until the Instr function returns zero, which tells us there are no more occurrences of the substring.
Echo asked: What is this echo that echoes in my ear?
^   ^                    ^   ^     ^   ^           ^
1   |                    |   |     |   |           |
+4  5                    |   |     |   |           |
                         26  |     |   |           |
                         +4  30    |   |           |
                                   36  |           |
                                   +4  40          |
                                                   0 "Not found"
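The search loop traced in the diagram can be sketched in Python as follows. This is an illustration only: str.find plays the role of Instr, positions are converted to 1-based to match the diagram, and the function name find_all and its ignore_case parameter are assumptions. Upper-casing for the case-insensitive search assumes plain ASCII text.

```python
def find_all(text: str, word: str, ignore_case: bool = True) -> list:
    """Return the 1-based positions of every occurrence of word in text."""
    haystack = text.upper() if ignore_case else text
    needle = word.upper() if ignore_case else word
    positions = []
    start = 0
    while needle:                          # an empty needle finds nothing
        found = haystack.find(needle, start)
        if found == -1:                    # str.find returns -1, Instr returns 0
            break
        positions.append(found + 1)        # convert to 1-based
        start = found + len(needle)        # continue just past the find
    return positions


s = "Echo asked: What is this echo that echoes in my ear?"
print(find_all(s, "echo"))
print(find_all(s, "echo", ignore_case=False))
```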
A function that prints its result directly
This SkipText function expects from 1 to 3 parameters. The second and third parameters are optional because of the mention of a default value in the parameter list.
param1 is the string to search (in)
param2 is the string to search for
param3 can limit the number of removals
Declare Function SkipText (a As String, b As String = "", c As Integer = 1) As Integer
Dim As String s
s = "Echo asked: What is this echo that echoes in my ear?"
Print "The function outputs directly"
Print " Unmodified: ";
SkipText(s)
Print " Modified*1: ";
SkipText(s, "echo", 1)
Print " Modified*2: ";
SkipText(s, "echo", 2)
Print " Modified*3: ";
SkipText(s, "echo", 3)
GetKey ' So you can inspect the output
Function SkipText (a As String, b As String, c As Integer) As Integer
Dim As Integer h, i, j, k
h = IIf(c < 1, 1, c) ' Guard against bogus input
i = 1
j = 1 + Len(a) - Len(b)
Do While i <= j
k = InStr(i, a, b) ' Case-sensitive
If k = 0 Then Exit Do
Print Mid(a, i, k-i);
i = k + Len(b)
h -= 1
If h = 0 Then Exit Do
Loop
Print Mid(a, i)
Return 0
End Function
A function that returns a string that the caller can then print
The next SkipText function expects from 1 to 3 parameters. The second and third parameters are optional because of the mention of a default value in the parameter list.
param1 is the string to search (in)
param2 is the string to search for
param3 can limit the number of removals
If you tried the above code snippet, you will have seen that the first "Echo", the one that starts with a capital, was not removed. This happens because FreeBasic's Instr always works 'case-sensitive'. The simple solution to remove in a 'case-insensitive' way is to use the UCase function like in:
Position = Instr(Start, UCase(MyString), UCase(MySubString))
Declare Function SkipText (a As String, b As String = "", c As Integer = 1) As String
Dim As String s
s = "Echo asked: What is this echo that echoes in my ear?"
Print "The function returns a (reduced) string"
Print " Unmodified: "; SkipText(s)
Print " Modified*1: "; SkipText(s, "echo", 1)
Print " Modified*2: "; SkipText(s, "echo", 2)
Print " Modified*3: "; SkipText(s, "echo", 3)
GetKey ' So you can inspect the output
Function SkipText (a As String, b As String, c As Integer) As String
Dim As String t = ""
Dim As Integer h, i, j, k
h = IIf(c < 1, 1, c) ' Guard against bogus input
i = 1
j = 1 + Len(a) - Len(b)
Do While i <= j
k = InStr(i, UCase(a), UCase(b)) ' Case-insensitive
If k = 0 Then Exit Do
t = t + Mid(a, i, k-i)
i = k + Len(b)
h -= 1
If h = 0 Then Exit Do
Loop
Return t + Mid(a, i)
End Function
Because these are tiny code snippets, I wasted no time choosing sensible identifiers. In longer programs you should always pick meaningful names for any identifiers.
FreeBasic comes with a nice, comprehensive manual. If anything isn't clear, first consult the manual, then maybe ask a question on this forum.
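For comparison, here is a rough Python rendering of the string-returning, case-insensitive SkipText. It is a sketch under assumptions rather than a translation guaranteed to match FreeBASIC in every edge case: the name skip_text and the parameter names are invented here, and upper-casing assumes ASCII text.

```python
def skip_text(text: str, word: str = "", limit: int = 1) -> str:
    """Remove up to `limit` occurrences of `word` from `text`, ignoring case."""
    remaining = max(limit, 1)              # guard against bogus input
    upper_text, upper_word = text.upper(), word.upper()
    pieces = []
    start = 0
    while upper_word and remaining > 0:
        found = upper_text.find(upper_word, start)
        if found == -1:
            break
        pieces.append(text[start:found])   # keep the text before the find
        start = found + len(word)          # skip over the find itself
        remaining -= 1
    pieces.append(text[start:])            # keep whatever is left
    return "".join(pieces)


s = "Echo asked: What is this echo that echoes in my ear?"
for n in (1, 2, 3):
    print(f"Modified*{n}: {skip_text(s, 'echo', n)}")
```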

Did you do any kind of research? Sorry, but I assume you did not.
The answer is that there is already a function for this task built into the BASIC language.
The function you are searching for is "INSTR". Please read the available documentation for FreeBasic. If you then decide to write your own INSTR function (because you need a feature that the built-in function does not provide), try to do your coding, and if you get stuck, we'll try to help.
Your described task will therefore involve the following functions:
INSTR ' check if the string is here
LEN ' to know the length of your search string
MID ' to create the 'reduced' output (you may need to use it twice)

Justify text, how to divide the spaces between the words?

I'm creating a new function to justify text. I know there are a few plugins but they don't do what I want to do so I decided to create a function myself.
before:
text_text_text_text_____
after:
text__text___text___text
What I did first is to find the number of words and the number of non space characters (for every line in my text):
let @e = '' | redir @e | silent exe i.'s/'.cols.'\(\S\+\)/&/gne' | redir END
if matchstr(@e, 'match') != '' | let nrwords = matchstr(@e, '\d\+\ze') | else | continue | endif
let @e = '' | redir @e | silent exe i.'s/'.cols.'\S/&/gne' | redir END
if matchstr(@e, 'match') != '' | let nonspaces = matchstr(@e, '\d\+\ze') | else | let nonspaces = 0 | endif
Then to find the spaces:
let spaces = textwidth_I_want - nonspaces
I have to divide the spaces between the words:
let SpacesBetweenWords = spaces/(str2float(nrwords)-1)
However, the result is often a float, e.g.:
spaces = 34
nrwords-1 = 10
SpacesBetweenWords = 3.4
What I want to do is to divide the spaces between the words, like this:
Spaces between words:
4 3 3 4 3 3 4 3 3 4
and put them in a list 'SpaceList'
and then insert them between the words
for m in range(1,len(SpaceList))
exe i 's/^'.cols.'\(\S\+\zs\s\+\ze\S\+\)\{'.m.'}/'.repeat(' ', SpaceList[m-1]).'/'
endfor
(cols = my block selection OR entire line)
It is easy to create a list with all integers, e.g. in my example
3 3 3 3 3 3 3 3 3 3 (Spaces = 30)
but there are still 4 spaces to divide between the words.
My problem is "how can I divide these spaces between the number of words"?

The result is 3.4, which means you cannot make all the gaps between words the same length. Either adjust the textwidth_I_want value so that the result is an integer, or set every gap to 3 and calculate how many spaces are still left over (10*(3.4-3) = 4). You can then add these 4 spaces, one each, to 4 of the gaps.
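One way to decide which gaps receive the leftover spaces is cumulative integer division, which spreads the wider gaps roughly evenly across the line. The Python sketch below illustrates that idea; it is not the method used in the question or the answers, and the names distribute_spaces and justify are invented for the example.

```python
def distribute_spaces(spaces: int, gaps: int) -> list:
    """Split `spaces` into `gaps` integer widths that differ by at most 1."""
    if gaps < 1:
        raise ValueError("need at least one gap")
    # Gap i gets the difference between two consecutive cumulative totals.
    return [((i + 1) * spaces) // gaps - (i * spaces) // gaps
            for i in range(gaps)]


def justify(line: str, width: int) -> str:
    """Pad the gaps between words so the line is `width` characters long."""
    words = line.split()
    if len(words) < 2:
        return line.strip()                # nothing to distribute
    nonspaces = sum(len(w) for w in words)
    # Assumption: keep at least one space per gap if the line is too long.
    spaces = max(width - nonspaces, len(words) - 1)
    widths = distribute_spaces(spaces, len(words) - 1)
    return "".join(w + " " * n for w, n in zip(words, widths + [0]))


print(distribute_spaces(34, 10))
print(justify("text text text text     ", 24).replace(" ", "_"))
```

For 34 spaces and 10 gaps this yields six gaps of width 3 and four of width 4, in a fixed order rather than a random one.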

Found the solution using python
let r = float2nr(SpacesBetweenWords)
let places = nrwords-1
"the extra spaces to put between words
let extraspaces2divide = float2nr(spaces-r*places)
"with the help of python, randomly pick a series of (unique) numbers, one for every space to divide, between 1 and the number of words minus 1
exe "py import vim,random; Listvalue = random.sample(xrange(1,".(places+1).",1),".extraspaces2divide.");vim.command(\"let randomlist = '%s'\"% Listvalue)"
"the result of python is a string like this [1, 3, 6, 9]: remove the brackets and spaces and split the result to a list
let randomlist = substitute(randomlist, '[\[\] ]', '', 'g')
"this list is the series of random numbers (= the places between words) where to add an extra space
let list = sort(split(randomlist, ','))
for s in range(1,nrwords-1)
if (index(list, ''.s.'') != -1)
let r2 = r+1
else
let r2 = r
endif
call add(SpaceList, r2)
endfor

String Matching with newline character in Haskell

Here I am trying to find the index of '-' followed by '}' in a String.
For an input like substrIndex "abcd -} sad" it gives me an output of 10,
which is giving me the entire string length.
Also, if I do something like substrIndex "abcd\n -} sad" it gives me 6.
Why is that so with \n? What am I doing wrong? Please correct me, I'm a noob.
substrIndex :: String -> Int
substrIndex ""=0
substrIndex (s:"") = 0
substrIndex (s:t:str)
| s== '-' && t == '}' = 0
| otherwise = 2+(substrIndex str)

Your program has a bug. You are checking the characters two at a time. But what if the - and } fall into different pairs, for example in S-}?
It will first check whether S and - are equal to - and } respectively.
Since they don't match, it will move on with } alone.
So you just need to change the logic a little bit, like this:
substrIndex (s:t:str)
| s == '-' && t == '}' = 0
| otherwise = 1 + (substrIndex (t:str))
Now, if the current pair doesn't match -}, then just skip the first character and proceed with the second character, substrIndex (t:str). So, if S- doesn't match, your program will proceed with -}. Since we dropped only one character we add only 1, instead of 2.
This can be shortened and written clearly, as suggested by user2407038, like this
substrIndex :: String -> Int
substrIndex [] = 0
substrIndex ('-':'}':_) = 0
substrIndex (_:xs) = 1 + substrIndex xs
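The difference between stepping two characters at a time and sliding one character at a time can be reproduced with a short Python sketch. Both functions below are illustrative stand-ins with invented names, not translations checked against a Haskell compiler. In "abcd -} sad" the '-' sits at index 5, so a scan over pairs starting at even indices never sees '-' and '}' together, whereas the extra \n character shifts '-' to index 6.

```python
def index_by_pairs(s: str) -> int:
    """Mimic the original logic: test pairs at indices 0, 2, 4, ..."""
    i = 0
    while i + 1 < len(s):
        if s[i] == "-" and s[i + 1] == "}":
            return i
        i += 2                       # skips both characters of the pair
    return i                         # no match: number of characters consumed


def index_sliding(s: str) -> int:
    """Mimic the fixed logic: advance one character at a time."""
    for i in range(len(s) - 1):
        if s[i] == "-" and s[i + 1] == "}":
            return i
    return len(s)                    # no match: length of the string


for text in ["abcd -} sad", "abcd\n -} sad"]:
    print(repr(text), index_by_pairs(text), index_sliding(text))
```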

Compare one string to all elements of an array

How do I get a result of 1 when '#' exists in the string, and 0 otherwise? Right now I get 0 0, although the second string contains the character '#'.
A = {'#'};
B = {'http://www.mathworks.com/help/matlab/ref/strcmpi.html',
'http://www.mathworks.com/help/matlab/ref/strcmpi#dfvfv.html'};
match = strcmpi(A,B)
Output:
match =
0
0
Desire Output
match =
0
1
Edit2:
Why do I get the wrong results when I use the same concept as above? I want to check whether the file stored in 'data14' contains 'javascript' and 'disableclick' at the same time. But the results are all '1'.
for i = 1:4
A14 = {'javascript'};
B14 = {'disableclick'};
data14 = importdata(strcat('f14data/f14_data', int2str(i)));
feature14_data=any(cellfun(@(n) isempty(n), strfind(data14, A14{1}))) & any(cellfun(@(n) isempty(n), strfind(data14, B14{1})))
feature14(i)=feature14_data
end

This can be used to get desired output:
cellfun(@(n) ~isempty(n), strfind(B, A{1}))

You could use ismember iteratively:
cellfun(@(x)ismember('#',x), B)
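Both answers replace a whole-string comparison with a per-element containment test: strcmpi returns 1 only when two strings are equal as a whole (ignoring case), so it cannot detect a single character inside a longer URL. The same distinction, sketched in Python purely for illustration with made-up variable names:

```python
needle = "#"
urls = [
    "http://www.mathworks.com/help/matlab/ref/strcmpi.html",
    "http://www.mathworks.com/help/matlab/ref/strcmpi#dfvfv.html",
]

# Whole-string comparison (what a case-insensitive equality test does).
equal_flags = [int(url.lower() == needle.lower()) for url in urls]

# Containment test: is the needle found anywhere inside each element?
match = [int(needle in url) for url in urls]

print(equal_flags)
print(match)
```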
