Levels - need exact format - object

How do I get such a format?
example
[1] 1 1 1 2 2 2
Levels: 1 2
where:
mode(example)
[1] "numeric"
levels(example)
[1] "1" "2"
mode(levels(example))
[1] "character"
I tried to create a numeric object, assign the values, and set the levels with as.character, but I get something different, and I need exactly this format. Still learning R basics...
Thanks!

So I managed to get the desired result as follows:
> class3<-as.numeric(c("1","2","3"))
> class3.f<-factor(class3)
> class3.f
[1] 1 2 3
Levels: 1 2 3
If you have any better advice, feel free to post. Cheers

You can also do:
example <- c(1,1,1,2,2,2)
levels(as.factor(example))
[1] "1" "2"
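For intuition about why mode(example) is "numeric" while levels(example) is character: a factor stores integer codes plus a character vector of level labels. The Python sketch below is a simplified model of that layout, not R's implementation; make_factor is a hypothetical helper, it assumes integer inputs, and it ignores NA handling, explicit levels= and ordered factors.

```python
def make_factor(values):
    """Simplified model of R's factor(): the sorted distinct values become
    character levels, and each element is stored as a 1-based integer code."""
    levels = sorted(set(values))
    # Map each distinct value to its 1-based position among the levels
    position = {value: i + 1 for i, value in enumerate(levels)}
    codes = [position[value] for value in values]
    return codes, [str(level) for level in levels]


codes, levels = make_factor([1, 1, 1, 2, 2, 2])
print(codes)   # [1, 1, 1, 2, 2, 2]
print(levels)  # ['1', '2']
```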

Related

How to generate a DataFrame whose length equals the product of all column lengths?

I am looking for a quick way to generate a long dataframe. For example, the input is:
Column "color": [1,2,3] (length: 3)
Column "weekday": [0,1] (length: 2)
The expected output is:
color weekday
1 0
2 0
3 0
1 1
2 1
3 1
This output dataframe has length 2*3 = 6.
Is there a quick way to generate such a dataframe from the input series? There may be many columns. Thanks.
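A minimal Python sketch using only the standard library, assuming each input series is a plain list keyed by column name (cartesian_rows is a hypothetical helper). itertools.product already generates every combination; the reversals only make the first column vary fastest, as in the expected output above. pandas also provides pd.MultiIndex.from_product for this, though its row order varies the last column fastest.

```python
from itertools import product


def cartesian_rows(columns):
    """Return one dict per combination of column values,
    with the first column varying fastest."""
    names = list(columns)
    # product() varies its *last* argument fastest, so feed the columns in
    # reverse order and flip each resulting tuple back afterwards.
    combos = product(*(columns[name] for name in reversed(names)))
    return [dict(zip(names, reversed(combo))) for combo in combos]


rows = cartesian_rows({"color": [1, 2, 3], "weekday": [0, 1]})
for row in rows:
    print(row["color"], row["weekday"])
```

With many columns the call is the same; the number of rows is the product of the column lengths.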

Find missing instances of a sequence

How can I find in Stata the missing instances of a sequence?
input seq
1
2
4
5
6
7
9
10
end
E.g. 3 and 8 are missing in the sequence 1 to 10.
How can they be found?
My attempt
list seq if !inrange(seq, 1,10)
However, this does not work.
Stata uses missing to mean values present in the data with a missing value code.
Here the problem is to identify values that might have been (should have been?) in the dataset, but are, to use a different word, absent.
Here are two approaches to your problem:
clear
input seq
1
2
4
5
6
7
9
10
end
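* Approach 1: difference between the expected and the observed lists
numlist "1/10"
local expected `r(numlist)'
levelsof seq, local(observed)
local absent : list expected - observed
di "`absent'"
* Approach 2: count each candidate value and keep those with zero count
local ABSENT
forval j = 1/10 {
    quietly count if seq == `j'
    if r(N) == 0 local ABSENT `ABSENT' `j'
}
di "`ABSENT'"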
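The first Stata approach is a set difference: expected values minus observed values. The same idea as a short Python sketch, assuming the observed values are in a list and the expected range is the integers lo to hi inclusive (absent_values is a hypothetical helper):

```python
def absent_values(observed, lo, hi):
    """Return the integers in lo..hi (inclusive) that never appear in observed."""
    present = set(observed)
    return [value for value in range(lo, hi + 1) if value not in present]


print(absent_values([1, 2, 4, 5, 6, 7, 9, 10], 1, 10))  # [3, 8]
```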

Understanding solution to online test

The question is in the following link:
http://www.spoj.com/problems/AEROLITE/
Input:
1 1 1 1
0 0 6 3
1 1 1 2
[and 7 more test cases]
Output:
6
57
8
[and 7 more test cases]
How does the output come from the input?
Consider the outputs corresponding to the following letters:
a. 1 1 1 1 = 6
b. 0 0 6 3 = 57
c. 1 1 1 2 = 8
Restating the definitions from the problem in a more tactical way, the 4 inputs correspond to the following:
The number of "{}" pairs
The number of "[]" pairs
The number of "()" pairs
The depth D: the deepest nesting in the expression must be exactly D
The output is a single number: how many bracket expressions can be built from exactly these pairs with nesting depth exactly D, under the rules that "()" cannot contain "{}" or "[]", and "[]" cannot contain "{}".
The walkthrough below shows how the outputs arise, but it does not break the problem into sub-problems. Hopefully it will at least help you connect the numbers and start finding the sub-problems yourself.
Taking those examples explicitly, start with "a" for 1 1 1 1 = 6:
The inputs say: use exactly 1 pair each of "{}", "[]" and "()", with depth 1. At depth 1 no pair can be nested, so the pairs sit side by side and we are counting the orderings of 3 distinct items: 3! = 6.
Actual: {}[](), {}()[], []{}(), [](){}, (){}[], ()[]{}
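The 3! = 6 count for "a" can be checked by brute force. A small Python sketch that lists every ordering of the three pairs:

```python
from itertools import permutations

# At depth 1 nothing is nested, so each ordering of the three pairs
# is a distinct expression.
arrangements = ["".join(p) for p in permutations(["{}", "[]", "()"])]
print(len(arrangements))  # 6
print(arrangements)
```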
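Then go to "c" for 1 1 1 2 = 8
This uses the same pairs as "a", but the depth must now be exactly 2 instead of 1.
So the 6 flat arrangements from "a" no longer count; every match must nest one pair inside another, but nothing may go three levels deep:
** One pair nested inside another, with the third pair beside it: {[]}(), (){[]}, {()}[], []{()}, [()]{}, {}[()] (6 cases)
** "{}" holding the other two pairs side by side: {[]()}, {()[]} (2 cases)
** {[()]} also obeys the containment rules, but its depth is 3, so it is excluded
6 + 2 = 8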
Finally, consider "b": 6 pairs of "()" with depth exactly 3.
Only arrangements whose deepest nesting is exactly 3 count, so arrangements of depth 1 and 2 must be excluded as well as deeper ones.
Because only parentheses appear here, every arrangement is a balanced string of 6 "()" pairs, and there are Catalan number C(6) = 132 of them (for more on this: https://en.wikipedia.org/wiki/Catalan_number). Of these 132, 43 nest deeper than 3 levels and 32 never go deeper than 2 levels; removing both groups leaves 132 - 43 - 32 = 57 matches.
Alternatively, and much more tediously, you can enumerate the arrangements level by level and keep only those of depth 3:
** d = 1: just ()()()()()() (1 arrangement, excluded)
** d = 2: examples like (())()()()(), ()(())()()(), ()()(())()(), ()()()(())(), ()()()()(()), and so on (31 arrangements, excluded)
** d = 3: examples like ((()))()()(), ()((()))()(), ()()((()))(), ()()()((())), and so on (57 arrangements, kept)
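A Python sketch of one way to compute these counts, assuming the depth must be exactly D as the sample outputs imply. It counts expressions of depth at most d by splitting off the first top-level pair (its contents have depth at most d - 1, the rest depth at most d), then subtracts the count for depth at most D - 1. The function names are hypothetical, the sketch assumes D >= 1, and it ignores any output modulus or time limit the judge may impose.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def at_most(a, b, c, d):
    """Count expressions using exactly a '{}', b '[]' and c '()' pairs with
    depth <= d, where '()' holds only '()' and '[]' never holds '{}'."""
    if a == b == c == 0:
        return 1  # the empty expression
    if d == 0:
        return 0  # a non-empty expression needs depth >= 1
    total = 0
    # First top-level pair is '{}': its contents may use any pair type.
    if a > 0:
        for i in range(a):
            for j in range(b + 1):
                for k in range(c + 1):
                    total += at_most(i, j, k, d - 1) * at_most(a - 1 - i, b - j, c - k, d)
    # First top-level pair is '[]': its contents may use '[]' and '()' only.
    if b > 0:
        for j in range(b):
            for k in range(c + 1):
                total += at_most(0, j, k, d - 1) * at_most(a, b - 1 - j, c - k, d)
    # First top-level pair is '()': its contents may use '()' only.
    if c > 0:
        for k in range(c):
            total += at_most(0, 0, k, d - 1) * at_most(a, b, c - 1 - k, d)
    return total


def exact_depth(a, b, c, d):
    """Expressions whose depth is exactly d (assumes d >= 1)."""
    return at_most(a, b, c, d) - at_most(a, b, c, d - 1)


for case in [(1, 1, 1, 1), (0, 0, 6, 3), (1, 1, 1, 2)]:
    print(exact_depth(*case))  # 6, then 57, then 8
```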

How to convert different levels of strings into numeric responses in R?

I have read some similar questions, and mine is very close to ones that have already been solved, but a slight difference is causing problems for me.
I have a data frame column with five different string levels: "10-20%" "100+%" "21-40%" "41-70%" "71-100%". I have tried both as.numeric and as.integer. Both convert the strings to numbers, but they follow the alphabetical (default) order of the levels: "10-20%", "100+%", "21-40%", "41-70%", "71-100%" become 1, 2, 3, 4, 5.
What I want instead is the numeric order: "10-20%" is 1, "21-40%" is 2, "41-70%" is 3, "71-100%" is 4 and "100+%" is 5.
Do I have to reorder the levels manually to achieve this?
Appendix:
levels(dataset$PercentGrowth)
[1] "" "10-20%" "100+%" "21-40%" "41-70%" "71-100%"
head(as.integer(dataset$PercentGrowth))
[1] 1 4 3 1 3 4
head(as.numeric(dataset$PercentGrowth))
[1] 1 4 3 1 3 4
head((dataset$PercentGrowth))
[1] 21-40% 100+% 100+% 21-40%
Levels: 10-20% 100+% 21-40% 41-70% 71-100%
Create a factor from your strings and assign the levels in the right order:
x = c("10-20%", "21-40%", "41-70%", "71-100%", "100+%")
as.integer(factor(x,levels=x))
[1] 1 2 3 4 5
as.numeric(factor(df$string.var,
levels = c("10-20%", "21-40%", "41-70%", "71-100%", "100+%"))
?factor
Sample data would help.
Edited to add levels.
You may try:
x <- c("10-20%", "100+%" ,"21-40%" ,"41-70%", "21-40%", "71-100%", "10-20%")
library(gtools)
match(x,unique(mixedsort(x)))
#[1] 1 5 2 3 2 4 1
##
as.numeric(factor(x, levels=unique(mixedsort(x))))
#[1] 1 5 2 3 2 4 1
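For comparison, the mixedsort() idea in Python: rank the distinct labels by the first integer in each string, then map every value to its rank. This is a rough analogue under that assumption, not a port of gtools::mixedsort; ordered_codes is a hypothetical helper, and labels with no digits (such as the empty level "") are put last.

```python
import re


def ordered_codes(values):
    """Map each string to its 1-based rank among the distinct values,
    ranking by the first integer that appears in the string."""
    def leading_number(s):
        match = re.search(r"\d+", s)
        return int(match.group()) if match else float("inf")  # no digits: last

    levels = sorted(set(values), key=leading_number)
    code = {level: i + 1 for i, level in enumerate(levels)}
    return [code[v] for v in values]


x = ["10-20%", "100+%", "21-40%", "41-70%", "21-40%", "71-100%", "10-20%"]
print(ordered_codes(x))  # [1, 5, 2, 3, 2, 4, 1]
```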
Suppose your vector is the following (not a general solution):
x1 <- c("less than one year", "one year", "more than one year","one year", "less than one year")
Using gsub2() from the question "R: replace characters using gsub, how to create a function?":
# Apply each pattern/replacement pair to x in turn
gsub2 <- function(pattern, replacement, x, ...) {
  for (i in seq_along(pattern))
    x <- gsub(pattern[i], replacement[i], x, ...)
  x
}
x1[mixedorder(gsub2(c("less","^one","more"), c(0,1,2), x1))]
[1] "less than one year" "less than one year" "one year"
[4] "one year" "more than one year"
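The same ordering can be sketched in Python without rewriting the strings: rank each value by the first pattern that matches it. order_by_patterns is a hypothetical helper, and, like the R version, this is not a general solution.

```python
import re


def order_by_patterns(values, patterns):
    """Sort values by the index of the first regex in patterns that matches;
    unmatched values go last. sorted() is stable, so ties keep input order."""
    def rank(s):
        for i, pattern in enumerate(patterns):
            if re.search(pattern, s):
                return i
        return len(patterns)

    return sorted(values, key=rank)


x1 = ["less than one year", "one year", "more than one year", "one year", "less than one year"]
print(order_by_patterns(x1, ["less", "^one", "more"]))
```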

How to count occurrence of unknown strings in column?

I have another question. Thanks for everyone's help and patience with an R newbie!
How can I count how many times a string occurs in a column? Example:
MYdata <- data.frame(fruits = c("apples", "pears", "unknown_f", "unknown_f", "unknown_f"),
veggies = c("beans", "carrots", "carrots", "unknown_v", "unknown_v"),
sales = rnorm(5, 10000, 2500))
The problem is that my real data set contains several thousand rows and several hundred unknown fruits and unknown veggies. I played around with table() and levels without much success, so I guess it's more complicated than that. Ideally I'd get an output table listing the name of each unique fruit/veggie and how many times it occurs in its column. Any hint in the right direction would be much appreciated.
Thanks,
Marcus
If I understand your question, the function table() should work just fine. Here is how:
table(MYdata$fruits)
apples pears unknown_f
1 1 3
table(MYdata$veggies)
beans carrots unknown_v
1 2 2
Or use table inside lapply:
lapply(MYdata[1:2], table)
$fruits
apples pears unknown_f
1 1 3
$veggies
beans carrots unknown_v
1 2 2
The following gives you a data frame of counts which you might find easier to use or may suit your purposes better:
# One table per non-numeric column, flattened into item/count rows
tabs=lapply(MYdata[-3], table)
out=data.frame(item=names(unlist(tabs)),count=unlist(tabs),
stringsAsFactors=FALSE)
rownames(out)=c()
print(out)
item count
1 fruits.apples 1
2 fruits.pears 1
3 fruits.unknown_f 3
4 veggies.beans 1
5 veggies.carrots 2
6 veggies.unknown_v 2
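Maybe something like the following (it returns counts only when fruits is a factor; for a character column use summary(as.factor(MYdata$fruits))):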
summary(MYdata$fruits)
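For readers working outside R, the same per-column tally can be sketched in Python with collections.Counter, assuming each column is a plain list (mydata is a hypothetical stand-in for MYdata, without the sales column):

```python
from collections import Counter

mydata = {
    "fruits": ["apples", "pears", "unknown_f", "unknown_f", "unknown_f"],
    "veggies": ["beans", "carrots", "carrots", "unknown_v", "unknown_v"],
}

# One Counter per column, similar to lapply(MYdata[1:2], table)
counts = {col: Counter(values) for col, values in mydata.items()}

# Flatten into (item, count) rows named "column.value", sorted within each column
rows = [(f"{col}.{item}", n) for col, c in counts.items() for item, n in sorted(c.items())]
for item, n in rows:
    print(item, n)
```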
