How to count occurrences of unknown strings in a column?

I have another question. Thanks for everyone's help and patience with an R newbie!
How can I count how many times a string occurs in a column? Example:
MYdata <- data.frame(fruits = c("apples", "pears", "unknown_f", "unknown_f", "unknown_f"),
veggies = c("beans", "carrots", "carrots", "unknown_v", "unknown_v"),
sales = rnorm(5, 10000, 2500))
The problem is that my real data set contains several thousand rows and several hundred unknown fruits and unknown veggies. I played around with "table()" and "levels" but without much success, so I guess it's more complicated than that. Ideally I would get an output table listing the name of each unique fruit/veggie and how many times it occurs in its column. Any hint in the right direction would be much appreciated.
Thanks,
Marcus

If I understand your question, the function table() should work just fine. Here is how:
table(MYdata$fruits)
   apples     pears unknown_f
        1         1         3
table(MYdata$veggies)
    beans   carrots unknown_v
        1         2         2
Or use table inside lapply:
lapply(MYdata[1:2], table)
$fruits
   apples     pears unknown_f
        1         1         3
$veggies
    beans   carrots unknown_v
        1         2         2

The following gives you a data frame of counts which you might find easier to use or may suit your purposes better:
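# Tabulate every column except the third (sales)
tabs <- lapply(MYdata[-3], table)
# Flatten the list of tables into one named vector of counts
counts <- unlist(tabs)
out <- data.frame(item = names(counts), count = counts,
                  stringsAsFactors = FALSE)
# Drop the row names inherited from the named vector
rownames(out) <- NULL
print(out)
               item count
1     fruits.apples     1
2      fruits.pears     1
3  fruits.unknown_f     3
4     veggies.beans     1
5   veggies.carrots     2
6 veggies.unknown_v     2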
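For readers who want to cross-check the counts outside R, the same tabulation can be sketched in Python with the standard library. This is only an illustrative sketch and not part of the original answers: the dictionary `my_data` and all variable names are assumptions that mirror the two text columns of `MYdata`.

```python
from collections import Counter

# Assumed stand-in for the two text columns of MYdata.
my_data = {
    "fruits": ["apples", "pears", "unknown_f", "unknown_f", "unknown_f"],
    "veggies": ["beans", "carrots", "carrots", "unknown_v", "unknown_v"],
}

# One Counter per column, analogous to lapply(MYdata[1:2], table).
tabs = {column: Counter(values) for column, values in my_data.items()}

# Flatten to (item, count) rows, analogous to the data frame of counts.
out = [
    (f"{column}.{item}", count)
    for column, counts in tabs.items()
    for item, count in sorted(counts.items())
]

for item, count in out:
    print(item, count)
```

Sorting the items alphabetically within each column imitates the level ordering that table() shows for this example.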

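Maybe something like the following, which returns per-level counts when the column is a factor (the default for strings in data.frame() before R 4.0.0; for a character column it only reports length, class, and mode):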
summary(MYdata$fruits)

Related

substitute a value in a cell with its class, Excel

I have two Excel databases. The first, covering only cities and their land, has two main columns:
database1
city land
1 1
2 1
3 2
4 2
The other database shows observations and the city where they happened.
database2
- city
observation1 4
observation2 3
observation3 1
observation4 1
I'd like to match each city with its corresponding land, to get something like this:
- city land
observation1 4 2
observation2 3 2
observation3 1 1
observation4 1 1
How do you think I can achieve that?
Solved with XLOOKUP (CERCA.X in Italian). Here is how:
landdatabase2 =XLOOKUP(citydatabase2;citydatabase1;landdatabase1)
wonderful! thanks everyone
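The lookup that XLOOKUP performs here can be pictured as a dictionary lookup. The sketch below is in Python rather than Excel and only illustrates the logic; the names `city_to_land`, `observations`, and `result` are assumptions, and the data is copied from the example tables in the question.

```python
# database1 from the question: city -> land (assumed variable name).
city_to_land = {1: 1, 2: 1, 3: 2, 4: 2}

# database2 from the question: observation -> city (assumed variable name).
observations = {
    "observation1": 4,
    "observation2": 3,
    "observation3": 1,
    "observation4": 1,
}

# For every observation, look up the land of its city.
# dict.get returns None when a city is missing from database1.
result = {
    name: (city, city_to_land.get(city))
    for name, city in observations.items()
}

for name, (city, land) in result.items():
    print(name, city, land)
```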

How to swap characters around inside a column in EXCEL?

Specifically, I know ahead of time that I only need to swap the characters in positions 1 and 2 with those in positions 4 and 5.
Two examples:
HEART
New output:
RTAHE
12734
New output:
34712
There is probably more than a handful of ways to do this. If you're interested in a formula, here is one way to go about it:
=RIGHT(A3,2)&MID(A3,3,LEN(A3)-4)&LEFT(A3,2)
Seems to be working on some test data I threw together.
A bit more robust (it also handles strings that are not exactly five characters long), as suggested by @Rafalon:
=MID(A3,4,2)&MID(A3,3,1)&LEFT(A3,2)&MID(A3,6,LEN(A3))
It produces the following results:
Input      Output
1          1
12         12
123        312
1234       4312
12345      45312
123456     453126
1234567    4531267
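To see why the second formula behaves this way on short and long inputs, here is a small Python sketch of the same slicing. It is not Excel code; the function name `swap_blocks` is an assumption, and each slice corresponds to one MID or LEFT term of the formula.

```python
def swap_blocks(text: str) -> str:
    """Mirror =MID(A3,4,2)&MID(A3,3,1)&LEFT(A3,2)&MID(A3,6,LEN(A3))."""
    return (
        text[3:5]    # MID(A3,4,2): characters 4 and 5
        + text[2:3]  # MID(A3,3,1): character 3
        + text[0:2]  # LEFT(A3,2): characters 1 and 2
        + text[5:]   # MID(A3,6,LEN(A3)): everything from character 6 on
    )


for value in ["1", "12", "123", "1234", "12345", "123456", "1234567"]:
    print(value, swap_blocks(value))
```

Like MID in Excel, Python slices that run past the end of the string simply return what is available, which is why short inputs do not raise errors.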

Dynamically sort list based off associated values with tie-breaker values

I'm trying to sort students based on how frequently they participate. I have an automatically generated table that totals how often each student has participated in the last few days.
I want it to do two things that I can't figure out:
1. Ignore students that are at 0, removing them from the resulting rankings.
2. Rank by the first number, which is the most important, but fall back to the next value in the event of a tie.
Short example of table:
Andy - 1 1 2 3
Brad - 0 1 2 3
Cade - 1 2 3 4
Dane - 1 1 1 2
Desired result:
Cade - 1
Andy - 1
Dane - 1
The tie-breaker isn't that important and I figure I can have conditional formatting to remove children at 0, but I still can't seem to figure it out.
The closest formulas I have found in my searching are:
=INDEX($A$10:$A$9,MATCH(ROWS($C$1:C1),$C$1:$C$9,0))
This one doesn't work because it returns #N/A for pretty much all students who are tied.
=IFERROR(INDEX($C$1:$C$9,MATCH(SMALL(NOT($C$1:$C$9="")*IF(ISNUMBER($C$1:$C$9),COUNTIF($C$1:$C$9,"<="&$C$1:$C$9),COUNTIF($C$1:$C$9,"<="&$C$1:$C$9)+SUM(--ISNUMBER($C$1:$C$9))),ROWS($C$1:C1)+SUM(--ISBLANK($C$1:$C$9))),NOT($C$1:$C$9="")*IF(ISNUMBER($C$1:$C$9),COUNTIF($C$1:$C$9,"<="&$C$1:$C$9),COUNTIF($C$1:$C$9,"<="&$C$1:$C$9)+SUM(--ISNUMBER($C$1:$C$9))),0)),"")
I had this formula, which can handle ties, but it needs to be offset and I don't know how to do that since it is an array formula. Also, both of these formulas reverse the ranking, putting the lowest values at the top. If anyone could assist me I would greatly appreciate it. I'm doing this so that I can give all students a chance to participate equally.
Use a helper column. In that column put the following formula:
=IF(B1=0,"n/a",SUMPRODUCT(B1:E1/10^(COLUMN(B1:E1)-MIN(COLUMN(B1:E1)))))
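This returns a single number per student in which each later column counts as a successively smaller decimal place (Andy's 1 1 2 3 becomes 1.123), so a tie on the first value is broken by the following ones. The next formula expects this helper in column F.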
Then in your output column use:
=IFERROR(INDEX(A:A,MATCH(LARGE(F:F,ROW(1:1)),F:F,0)),"")
Then a simple VLOOKUP to return the first number:
=IF(I1<>"",VLOOKUP(I1,A:B,2,FALSE),"")
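The helper-column idea can be checked with a short Python sketch. This is an illustration under stated assumptions, not the spreadsheet itself: the names `rows`, `helper_key`, `ranked`, and `first_values` are hypothetical, the data is the short example from the question, and every value is assumed to be below 10 so that one column never spills into the next decimal place.

```python
# Example table from the question: name -> participation values (columns B:E).
rows = {
    "Andy": [1, 1, 2, 3],
    "Brad": [0, 1, 2, 3],
    "Cade": [1, 2, 3, 4],
    "Dane": [1, 1, 1, 2],
}


def helper_key(values):
    """First value + second/10 + third/100 + ..., like the SUMPRODUCT helper."""
    return sum(v / 10 ** i for i, v in enumerate(values))


# Drop students whose first value is 0, then sort by the helper key, largest first.
ranked = sorted(
    (name for name, values in rows.items() if values[0] != 0),
    key=lambda name: helper_key(rows[name]),
    reverse=True,
)

# Pair each ranked student with the first number, like the final VLOOKUP.
first_values = [(name, rows[name][0]) for name in ranked]

for name, first in first_values:
    print(name, "-", first)
```

If values of 10 or more can occur, the decimal trick no longer separates the columns cleanly; in Python one could compare the value lists directly instead, and in the spreadsheet a larger base than 10 would be needed.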

Count occurrences of strings just once per row in Google Sheets

I have strings of spreadsheet data that need counting by 'type' but not instance.
A B C D
1 Lin 1 2 1
2 Tom 1 4 2
3 Sue 3 1 4
The correct sum of students assigned to teacher 1 is 3, not 4. That teacher 1 meets Lin in lessons B and D is irrelevant to the count.
I borrowed a formula which works in Excel but not in Google Sheets where I and others need to keep and manipulate the data.
F5=SUMPRODUCT(SIGN(COUNTIF(OFFSET(B$2:D$2, ROW($2:$4)-1, 0), E5)))
A B C D E
2 Lin 1 2 1
3 Tom 1 4 2
4 Sue 3 1 4
5 1 [exact string being searched for, ie a teacher name]
I don't know what is not being understood by Google Sheets in that formula. Does anyone know the correct expression to use, or a more efficient way to get the accurate count I need, without duplicates within rows inflating the count?
So this is the mmult way, which works by finding the row totals of students assigned to teacher 1 etc., then seeing how many of the totals are greater than 0.
=ArrayFormula(sum(--(mmult(n(B2:D4=E5),transpose(column(B2:D4)))>0)))
or
=ArrayFormula(sum(sign(mmult(n(B2:D4=E5),transpose(column(B2:D4))))))
Also works in Excel if entered as an array formula without the ArrayFormula wrapper.
A specific Google Sheets one can be quite short
=ArrayFormula(COUNTUNIQUE((B2:D4=E5)*row(B2:D4)))-1
counting the unique rows containing a match.
Note - I am subtracting 1 in the last formula above because I am assuming there is at least one zero (non-match) which should be ignored. This would fail in the extreme case where all students in all classes are assigned to the same teacher so you have a matrix (e.g.) of all 1's. This would be more theoretically correct:
=ArrayFormula(COUNTUNIQUE(if(B2:D4=E5,row(B2:D4),"")))
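The counting rule behind all of these formulas is "count a row once if the teacher appears anywhere in it". A minimal Python sketch of that rule, using the example data from the question, looks like this; the names `grid` and `students_for` are assumptions made for illustration.

```python
# Example data from the question: student -> teacher per lesson (columns B:D).
grid = {
    "Lin": [1, 2, 1],
    "Tom": [1, 4, 2],
    "Sue": [3, 1, 4],
}


def students_for(teacher, grid):
    """Count the rows in which the teacher appears at least once."""
    return sum(1 for lessons in grid.values() if teacher in lessons)


for teacher in [1, 2, 3, 4]:
    print(teacher, students_for(teacher, grid))
```

Because each row contributes at most 1, Lin meeting teacher 1 in two lessons does not inflate the count, and a grid in which every cell matches is handled without a special case.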

COUNTING IN VBA

I have a number of clients in an excel spreadsheet (by client name), each associated with a particular item. For example
12345 1
12345 2
12345 2
23451 1
23451 3
55667 1
55667 2
89001 3
99999 1
99999 2
I need to count the number of distinctly different items for each client - in the above example, client 12345 has bought 3 items (output is 3); client 23451 has bought 2 items (output 2); client 89001 has bought one item (output 1). I'm sure it's a COUNT feature which looks to the previous column A and breaks/restarts the count if the client number changes, but I'm having a devil of a time doing it. Any help would be deeply appreciated.
Have you considered the SUM function instead? The COUNT function counts the number of cells that contain numbers, whereas SUM adds up the values in those cells.
Check out this link -> Here
Yes, or a better suggestion from David in the comments, the SumIf function.
Thanks David!
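The example in the question can be read two ways: client 12345 has three rows but only two distinct item values (1 and 2). The Python sketch below computes both counts so the two readings can be compared. It is not VBA and not taken from the answers above; all names in it are assumptions, and the data is copied from the question.

```python
from collections import defaultdict

# (client, item) pairs copied from the question.
rows = [
    (12345, 1), (12345, 2), (12345, 2),
    (23451, 1), (23451, 3),
    (55667, 1), (55667, 2),
    (89001, 3),
    (99999, 1), (99999, 2),
]

rows_per_client = defaultdict(int)   # every row counts
distinct_items = defaultdict(set)    # each item value counts once per client

for client, item in rows:
    rows_per_client[client] += 1
    distinct_items[client].add(item)

distinct_per_client = {client: len(items) for client, items in distinct_items.items()}

for client in rows_per_client:
    print(client, rows_per_client[client], distinct_per_client[client])
```

The stated expected output of 3 for client 12345 matches the row count, while "distinctly different items" would give 2, so it is worth confirming which one is wanted before choosing a formula.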
