redefine length.character in R - string

Since length is a generic method, why can't I do
length.character <- nchar
? It seems that strings are treated special in R. Is there a reason for that? Would you discourage defining functions like head.character and tail.character?

If you look at the help page for InternalMethods (mentioned in the details portion of the help page for length) it states that
For efficiency, internal dispatch only occurs on objects, that is those for which ‘is.object’ returns true.
Vectors are not objects in the same sense as other objects are, so method dispatch is not done on any basic vector (not just character). If you really want to use this type of dispatch, you need an object with a class set, e.g.:
> tmp <- state.name
> class(tmp) <- 'mynewclass'
> length.mynewclass <- nchar
> length(tmp)
[1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8
[26] 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12 9 5 4 7 8 10 13 9 7
>

My 2c:
Strings are not treated specially in R. If length did the same thing as nchar, then you would get unexpected results if you tried to compute length(c("foo", "bazz")). Or to put it another way, would you expect the length of a numeric vector to return the number of digits in each element of the vector or the length of the vector itself?
Also, creating this method might have side effects on other functions that expect the normal string behavior.

Now I have found a reason not to define head.character: it changes the way head works. For example:
head.character <- function(s,n) if(n<0) substr(s,1,nchar(s)+n) else substr(s,1,n)
test <- c("abc", "bcd", "cde")
head("abc", 2) # works fine
head(test,2)
Without the definition of head.character, the last line would return c("abc", "bcd"). Now, with head.character defined, the function is applied to each element of the vector and returns c("ab", "bc", "cd").
But I have a strhead and a strtail function now.. :-)
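The difference between the two meanings of "head" can be sketched outside R as well. The Python below is only an illustration: the names strhead and strtail are borrowed from the remark above, their exact semantics are an assumption (negative n drops characters, by analogy with head.character and R's tail), and this is not the poster's actual R code.

```python
def strhead(s, n):
    """First n characters of s; for negative n, all but the last |n| characters."""
    return s[:n]


def strtail(s, n):
    """Last n characters of s; for negative n, all but the first |n| characters.

    Assumed semantics, chosen by analogy with R's tail().
    """
    if n >= 0:
        return s[max(len(s) - n, 0):]
    return s[-n:]


test = ["abc", "bcd", "cde"]

# "head" of the vector: the first two elements
vector_head = test[:2]

# "head" of each string: the first two characters of every element
string_heads = [strhead(s, 2) for s in test]
```

Keeping the per-string helpers under their own names leaves the vector meaning of head untouched.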

Related

How to split data and assign it into designated variables?

I have data in Stata on how respondents feel about the current situation. There are seven types of feeling. The data is stored in the following format (note that the data type is a string, and one person can give more than one answer):
feeling
4,7
1,3,4
2,5,6,7
1,2,3,4,5,6,7
Since the data is a string, I tried to separate it by
split feeling, parse (,)
and I got the result
feeling1  feeling2  feeling3  feeling4  feeling5  feeling6  feeling7
4         7
1         3         4
2         5         6         7
1         2         3         4         5         6         7
However, this is not the result I want. Each number should go into the variable with the matching suffix. For instance:
feeling1  feeling2  feeling3  feeling4  feeling5  feeling6  feeling7
                              4                             7
1                   3         4
          2                             5         6         7
1         2         3         4         5         6         7
I am not sure if there is a built-in command or function for this kind of problem. I am thinking about using forval to loop through every value in each variable and move it into the correct variable.
A loop over the distinct values would be enough here. I give your example in a form explained in the Stata tag wiki as more helpful and then give code to get the variables you want as numeric variables.
* Example generated by -dataex-. For more info, type help dataex
clear
input str13 feeling
"4,7"
"1,3,4"
"2,5,6,7"
"1,2,3,4,5,6,7"
end
forval j = 1/7 {
    gen wanted`j' = `j' if strpos(feeling, "`j'")
    gen better`j' = strpos(feeling, "`j'") > 0
}
l feeling wanted1-better3
     +---------------------------------------------------------------------------+
     |       feeling   wanted1   better1   wanted2   better2   wanted3   better3 |
     |---------------------------------------------------------------------------|
  1. |           4,7         .         0         .         0         .         0 |
  2. |         1,3,4         1         1         .         0         3         1 |
  3. |       2,5,6,7         .         0         2         1         .         0 |
  4. | 1,2,3,4,5,6,7         1         1         2         1         3         1 |
     +---------------------------------------------------------------------------+
If you wanted a string result that would be yielded by
gen wanted`j' = "`j'" if strpos(feeling, "`j'")
Had the number of feelings been 10 or more, you would have needed more careful code because, for example, a search for "1" would also find it within "10".
Indicator (some say dummy) variables with distinct values 1 or 0 are immensely more valuable for most analysis of this kind of data.
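The caveat about 10 or more categories goes away if the string is split into whole tokens before matching. The following is a minimal Python sketch of that idea, not Stata code; the function name indicators and its n_categories parameter are made up for illustration.

```python
def indicators(feeling, n_categories=7):
    """Return a 0/1 list with one entry per category 1..n_categories.

    Matching is done on whole comma-separated tokens, so "1" does not
    match inside "10".
    """
    chosen = {int(token) for token in feeling.split(",") if token.strip()}
    return [1 if j in chosen else 0 for j in range(1, n_categories + 1)]


data = ["4,7", "1,3,4", "2,5,6,7", "1,2,3,4,5,6,7"]
table = [indicators(row) for row in data]
```

Each row of table then holds the seven indicator values for one respondent.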
Note Stata-related sources such as
this FAQ
this paper
and this paper.

Sort range Linux

I have some questions about sorting in bash. I am working with Ubuntu 14.04.
The first question: why, if I have a file some.txt with this content:
b 8
b 9
a 8
a 9
and I type this:
sort -n -k 2 some.txt
the result will be:
a 8
b 8
a 9
b 9
which means that the file is sorted first by the second field and after that by the first field? I thought that it would stay stable, i.e.
b 8
a 8
...
...
Maybe if two rows compare equal, a lexicographical sort is applied?
The second question: why doesn't the following work:
sort -n -k 1,2 try.txt
The file try.txt is like this:
8 2
8 11
8 0
8 5
9 2
9 0
The third question is not actually about sorting, but it comes up when I try to do this:
sort blank.txt > blank.txt
After this, the blank.txt file is empty. Why is that?
1. Apparently GNU sort is not stable by default: add the -s option
Finally, as a last resort when all keys compare equal, sort compares entire lines as if no ordering options other than --reverse (-r) were specified. The --stable (-s) option disables this last-resort comparison so that lines in which all fields compare equal are left in their original relative order.
(https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html)
2. There's no way to answer your question if you don't show the text file
3. Redirections are handled by the shell before handing off control to the program. The > redirection will truncate the file if it exists. After that, you are giving an empty file to sort.
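The stability point can be imitated in a few lines of Python, whose sorted() is stable. This is only a sketch of the two behaviours, not of how GNU sort is implemented; the helper name numeric_key is made up, and the whole-line tie-break here uses plain code-point order, which may differ from a locale-aware sort.

```python
lines = ["b 8", "b 9", "a 8", "a 9"]


def numeric_key(line):
    """Numeric value of the second whitespace-separated field."""
    return int(line.split()[1])


# Stable sort on the key only: ties keep their input order (like sort -s -n -k 2)
stable = sorted(lines, key=numeric_key)

# Key first, then the whole line as a tie-break (like the last-resort comparison)
last_resort = sorted(lines, key=lambda line: (numeric_key(line), line))
```

The second result matches the output shown in the question, the first is what the asker expected.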
For #2, you don't actually explain what's not working. Expanding your sample data, this happens:
$ cat try.txt
8 2
8 11
9 2
9 0
11 11
11 2
$ sort -n -k 1,2 try.txt
8 11
8 2
9 0
9 2
11 11
11 2
I assume you want to know why the 2nd column is not sorted numerically. Let's go back to the sort manual:
‘-n’
‘--numeric-sort’
‘--sort=numeric’
Sort numerically. The number begins each line and consists of ...
Looks like -n only reads the number at the start of the key, so with -k 1,2 only the first column is sorted numerically. After some trial and error, I found this combination that sorts each column numerically:
$ sort -k1,1n -k2,2n try.txt
8 2
8 11
9 0
9 2
11 2
11 11
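The idea behind -k1,1n -k2,2n, one numeric key per column, can be written out as a tuple key in Python. This is a sketch for comparison only; the helper name two_numeric_keys is made up.

```python
rows = ["8 2", "8 11", "9 2", "9 0", "11 11", "11 2"]


def two_numeric_keys(line):
    """Compare by the first field as a number, then by the second as a number."""
    first, second = line.split()
    return (int(first), int(second))


ordered = sorted(rows, key=two_numeric_keys)
```

The resulting order is the same as in the sort output above.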

DES: (Using sbox 2) to show that Two output bits from each S-box affect middle bits of the next round and the other two affect the end bits

Data Encryption Standard (DES) algorithm: using S-box 2, show that two output bits from each S-box affect middle bits of the next round and the other two affect the end bits.
The permutation table P is defined in the following table.
16 7 20 21 29 12 28 17 [END BITS]
1 15 23 26 5 18 31 10 [MIDDLE BITS]
2 8 24 14 32 27 3 9 [MIDDLE BITS]
19 13 30 6 22 11 4 25 [END BITS]
From the table above you can see that bits 7 and 6 refer to the end bits and 5 and 8 refer to the middle bits.
However, I am not sure if this is correct, because if we consider the E table, bits 5 and 6 are end bits and bits 7 and 8 affect middle bits. Which is correct?
I don't fully understand the question, but your first statement about bits 7, 6, 5 and 8 is true. Remember, though, that because of the "cascade effect" all the changes made by the P-table go to the "right side" of the equation, and at the same time they interact with the left side in the next round!
To fully understand the process check out this link: http://www.cronos.est.pr/DES.php
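One way to check the two readings is to trace each output bit of S-box 2 through P and then through the expansion E of the next round. The sketch below does this in Python. The E table is the standard DES expansion table and is not part of the question. Taking "middle bits" to mean input bits 3 and 4 of an S-box's six input bits, and "end bits" to mean the other four, is an assumption about terminology, and the function name trace_sbox_outputs is made up.

```python
# Permutation P as given in the question (output position i takes input bit P[i-1])
P = [16, 7, 20, 21, 29, 12, 28, 17,
     1, 15, 23, 26, 5, 18, 31, 10,
     2, 8, 24, 14, 32, 27, 3, 9,
     19, 13, 30, 6, 22, 11, 4, 25]

# Standard DES expansion table E (48 entries, six per S-box of the next round)
E = [32, 1, 2, 3, 4, 5,
     4, 5, 6, 7, 8, 9,
     8, 9, 10, 11, 12, 13,
     12, 13, 14, 15, 16, 17,
     16, 17, 18, 19, 20, 21,
     20, 21, 22, 23, 24, 25,
     24, 25, 26, 27, 28, 29,
     28, 29, 30, 31, 32, 1]


def trace_sbox_outputs(sbox):
    """Map each output bit of the given S-box to (position after P, targets, kind).

    targets is a list of (next-round S-box, input bit 1..6) pairs.
    kind is "middle" if every target is input bit 3 or 4, otherwise "end".
    """
    result = {}
    for bit in range(4 * (sbox - 1) + 1, 4 * sbox + 1):
        position = P.index(bit) + 1
        targets = [(index // 6 + 1, index % 6 + 1)
                   for index, source in enumerate(E) if source == position]
        kind = "middle" if all(pos in (3, 4) for _, pos in targets) else "end"
        result[bit] = (position, targets, kind)
    return result


for bit, (position, targets, kind) in trace_sbox_outputs(2).items():
    print(bit, position, targets, kind)
```

Under these assumptions, bits 7 and 8 of the S-box output land on middle input bits of the next round, while bits 5 and 6 land on end bits, each shared by two neighbouring S-boxes.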

How to find all Common substrings which is 3 chars or longer

Is there an efficient algorithm to search for and dump all common substrings (of length 3 or longer) between 2 strings?
Example input:
Length   : 0    5    10   15   20   25   30
String 1 : ABC-DEF-GHI-JKL-ABC-ABC-STU-MWX-Y
String 2 : ABC-JKL-MNO-ABC-DEF-PQR-DEF-ZWX-Y
Example output:
In string  1           2
---------------------------
ABC-DEF    0           12
ABC-DE     0           12
BC-DEF     1           13
:
-ABC-      15,19       11
-JKL-      11          3
-DEF-      3           15
-JKL       11          3
JKL-       12          4
-DEF       3           15,23
DEF-       4           16
WX-Y       29          29
ABC-       0,16,20     0,12
-ABC       15,19       11
DEF-       4           16,24
DEF        4           16,24
ABC        0,16,20     0,12
JKL        12          4
WX-        29          29
X-Y        30          30
-AB        15,19       11
BC-        1,17,21     1,13
-DE        3           15,23
EF-        5           17,25
-JK        11          3
KL-        13          5
:
In the example, "-D" and "-M" are also common substrings but are not required, because their length is only 2. (There might be some missing outputs in the example because there are so many of them...)
You can find all common substrings using a data structure called a Generalized suffix tree
Libstree contains some example code for finding the longest common substring. That example code can be modified to obtain all common substrings.

linux/shell script

I have written a program which generates a parameter index for 2 variables, say a and b, in 5 steps each. I have to do the same for 23 variables, and I don't want to write 23 nested for-loops. How can I turn this into a single for-loop that handles all 23 variables? I hope it can be done with an array, but I don't know how to implement it.
Could you please help me?
Program:
int z, p
float a, b
float a0, an, s, a1, b0, bn, b1
str var
s=5; a0=1; an=10; b0=8; bn=13 // s= steps, a0, b0= initial value, an,bn=final value
z=0
a1=(an-a0)/s
b1=(bn-b0)/s
for (a=(a1+a0);a<=an;a=a+a1)
    for (b=(b1+b0);b<=bn;b=b+b1)
        echo {z} {a} {b} -format "%25s" >> /home/genesis/genesis-2.3/genesis/Scripts/kinetikit/dhanu19.txt
        z=z+1
    end
end
output : dhanu19.txt
0 2.8 9
1 2.8 10
2 2.8 11
3 2.8 12
4 2.8 13
5 4.6 9
6 4.6 10
7 4.6 11
8 4.6 12
9 4.6 13
10 6.4 9
11 6.4 10
12 6.4 11
13 6.4 12
14 6.4 13
15 8.2 9
16 8.2 10
17 8.2 11
18 8.2 12
19 8.2 13
20 10 9
21 10 10
22 10 11
23 10 12
24 10 13
Have you considered writing either a script or a program to write the script for you? Generating shell-scripts, then running them can sometimes be a powerful solution to problems.
Which shell are you referring to? Declaring arrays has some syntactical differences between zsh, bash and so on.
Let's assume you write the 23 for-loops.
If you have 5 steps for each loop, you will end up with 5^23 parameter combinations!
Suppose each combination outputs 1 byte: you still need to store something like 10^16 bytes, or ten thousand terabytes.
I think you should reconsider your problem, or reformulate your question.
Edit:
This is not a forum (and even in forums you can edit your post).
Please edit your question instead of posting a new answer; I think it is interesting.
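For reference, the "one loop for any number of variables" idea can be sketched with a Cartesian product over a list of ranges. The example below is in Python rather than the script language used in the question, and the names axis_values and parameter_grid are made up. It generates rows lazily, which matters given the size estimate above.

```python
from itertools import product


def axis_values(start, stop, steps):
    """Values start + k * (stop - start) / steps for k = 1..steps."""
    step = (stop - start) / steps
    return [start + step * k for k in range(1, steps + 1)]


def parameter_grid(ranges, steps):
    """Yield (index, v1, v2, ...) for every combination, last variable fastest."""
    axes = [axis_values(low, high, steps) for low, high in ranges]
    for index, combo in enumerate(product(*axes)):
        yield (index, *combo)


# The two-variable case from the question: a in (1, 10], b in (8, 13], 5 steps each
rows = list(parameter_grid([(1, 10), (8, 13)], steps=5))
```

Adding a variable means adding one (low, high) pair to the list, not another loop. With 23 pairs the generator would still produce 5^23 rows, so it should be consumed lazily or sampled and never collected into a list.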

Resources