As I remember, the number of combinations equals n!.
But, for example, I have the string "abc". I want to get all the combinations with different letter case: aBc, ABc, etc.
So, abc is 3 chars. 3! = 1 * 2 * 3 = 6.
But if I do this work manually, I get 8 variations:
1 abc
2 Abc
3 aBc
4 abC
5 ABc
6 aBC
7 AbC
8 ABC
So it looks like the answer is 2^3 = 8, but what is the 2? 3 is the number of letters in the string. What is 2? The number of case variants per letter?
If I understand correctly, you want to know, for a fixed string, how many possible combinations there are when writing that string in mixed capitalization. You are not interested in real permutations of the source string, i.e. you don't want to take into account that for abc there is also acb, cab, cba etc. Yes?
If so, for 1 letter we have
a A
for two letters
ab Ab aB AB
and for three letters
abc Abc aBc abC ABc aBC AbC ABC
and so on. If that's the case, then the solution is quite simple if you choose the right underlying model. As you may have noticed, the outcome is the same regardless of which characters we choose - so why not choose all a's:
a A
aa aA Aa AA
aaa aaA aAa aAA Aaa AaA AAa AAA
The pattern is that for each character we have two choices available, either uppercase or lowercase, either set or not set... either 1 or 0 - simply replace a with 0 and A with 1 to get:
0 1
00 01 10 11
000 001 010 011 100 101 110 111
That is actually binary counting! So for n letters the number of possible combinations will be equal to 2^n.
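The binary-counting argument can be checked by brute force. Here is a minimal Python sketch; the helper name case_variants is made up for illustration, and it assumes every character in the string is a cased letter:

```python
from itertools import product

def case_variants(s):
    """Return every lower/upper-case spelling of s (hypothetical helper)."""
    # Each character contributes two choices: its lowercase or uppercase form.
    choices = [(ch.lower(), ch.upper()) for ch in s]
    # product() walks through all 2 ** len(s) ways of picking one choice per character.
    return ["".join(combo) for combo in product(*choices)]

variants = case_variants("abc")
print(len(variants))  # 8, i.e. 2 ** 3
print(variants)
```

Characters without a case (digits, punctuation) would produce duplicate spellings, so the 2^n count only holds when all n characters are letters.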
Ah, I think I see what you're saying. If I understand you correctly, you are looking to find all possible ways of capitalizing the letters in a string such that all of the letters aren't of the same case - that is, given abc, you'd produce
abC aBc aBC Abc AbC ABc
But not
abc ABC
Because all letters in those versions have the same case.
If this is what you'd like, the number of ways you can do this in a nonempty string of length n is given by 2^n - 2. Intuitively, the rationale behind this is as follows. Given a string of n letters, there are 2^n different ways to capitalize all of the letters in that string, since for each character independently of the rest, that letter can be in one of two states (upper case or lower case). If you consider all of those combinations, there are exactly two that you want to disallow - the version where all letters are upper-case, and the version where all letters are lower-case.
In your question, you mentioned that the number of combinations of an n-element sequence is n!. This is not quite right. There are n! permutations of an n-element sequence (assuming each element is distinct). For example, there are 3! = 6 permutations of the sequence abc:
abc acb bac bca cab cba
The fact that there are six ways to capitalize a three-letter sequence without giving all letters the same capitalization and that there are six permutations of abc is a complete coincidence. If you look at more terms of the series, you can see that they only match at two locations (2 and 3):
n =                                   1  2  3   4    5    6
Permutations (n!)                     1  2  6  24  120  720
Mixed-case capitalizations (2^n - 2)  0  2  6  14   30   62
If you allow for more cases than just upper and lower (say, k different versions), then you can generalize this to get the value k^n - k, since there are k^n different combinations, of which k will have the same capitalization throughout.
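As a sanity check on the counting formula, here is a small brute-force sketch in Python. The function name mixed_count is hypothetical, and the k "versions" of a letter are modelled simply as the integers 0 to k-1:

```python
from itertools import product

def mixed_count(n, k=2):
    """Count length-n assignments of k styles that do not use one style throughout."""
    total = 0
    for combo in product(range(k), repeat=n):
        if len(set(combo)) > 1:  # at least two different styles appear
            total += 1
    return total

counts = [mixed_count(n) for n in range(1, 7)]
print(counts)             # [0, 2, 6, 14, 30, 62]
print(mixed_count(3, 3))  # 24, i.e. 3 ** 3 - 3
```

The first line of output reproduces the mixed-case row of the table, and the second illustrates the generalization for k = 3.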
Hope this helps!
Given a dataset as follows:
id vector_name
0 1 01,02,03,04
1 2 001,002,003
2 3 01,02,03
3 4 A, B, C
4 5 s01, s02, s02
5 6 E2702-2703,E2702-2703
6 7 03,05,06
7 8 05-08,09,10-12, 05-08
How could I write a regex to flag the rows in column vector_name that are not composed of two-digit values? The correct format is 01, 02, 03, etc. Any other row should return invalid vector name for the check column.
The expected result will be like this:
id vector_name
0 1 01,02,03,04
1 2 invalid vector name
2 3 01,02,03
3 4 invalid vector name
4 5 invalid vector name
5 6 invalid vector name
6 7 03,05,06
7 8 05-08,09,10-12, 05-08
The pattern I used is (\d+)(,\s*\d+)*, but it considers 001,002,003 valid.
How could I do that? Thanks.
You can use
^\d{2}(?:-\d{2})?(?:,\s*\d{2}(?:-\d{2})?)*\Z
See the regex demo. Details:
^ - start of string
\d{2} - two digits
(?:-\d{2})? - an optional sequence of - and two digits
(?:,\s*\d{2}(?:-\d{2})?)* - zero or more repetitions of
, - a comma
\s* - 0 or more whitespaces
\d{2}(?:-\d{2})? - two digits and an optional sequence of - and two digits
\Z - the very end of the string.
Python Pandas test:
import pandas as pd
df = pd.DataFrame({
'id':[1,2,3,4,5,6,7,8],
'vector_name':
[
'01,02,03,04',
'1002003',
'01,02,03',
'A, B, C',
's01, s02, s02',
'E2702-2703,E2702-2703',
'03,05,06',
'05-08,09,10-12, 05-08'
]
})
pattern = r'^\d{2}(?:-\d{2})?(?:,\s*\d{2}(?:-\d{2})?)*\Z'
df.loc[~df['vector_name'].str.contains(pattern), "check"] = "invalid vector name"
>>> df
id vector_name check
0 1 01,02,03,04 NaN
1 2 1002003 invalid vector name
2 3 01,02,03 NaN
3 4 A, B, C invalid vector name
4 5 s01, s02, s02 invalid vector name
5 6 E2702-2703,E2702-2703 invalid vector name
6 7 03,05,06 NaN
7 8 05-08,09,10-12, 05-08 NaN
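If you want the invalid names replaced in place, as in the expected output of the question, the same pattern can be applied row by row. This is only a sketch using the standard re module, with no pandas; the helper name check is made up for illustration:

```python
import re

# Same pattern as in the pandas test above.
PATTERN = re.compile(r'^\d{2}(?:-\d{2})?(?:,\s*\d{2}(?:-\d{2})?)*\Z')

def check(vector_name):
    """Return the name unchanged if it is valid, else the marker text (hypothetical helper)."""
    return vector_name if PATTERN.search(vector_name) else "invalid vector name"

samples = ['01,02,03,04', '001,002,003', 'A, B, C', '05-08,09,10-12, 05-08']
for s in samples:
    print(repr(s), '->', check(s))
```

With pandas, a function like this could be passed to Series.apply on the vector_name column.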
As far as I know, it is impossible to perform array operations on numbers in J; e.g.
NB. In J, this won't work:
m =: 234
+/ m
9
*/ m
24
Since I can't do it directly, is there a way to split a number into a list and back again, like this?:
splitFunction 234
2 3 4
+/ (splitFunction 234)
9
|. (splitFunction 234)
4 3 2
concatenateFunction (4 3 2)
432
If it is not possible, is there a way to turn a number into a string, and back again? (since J treats strings as character arrays) e.g.
|. (toString 234)
432
Well, there is a little bit to unpack here in what your expectations would be. Let's start with
m=:234 NB. m is the integer 234
+/ m NB. +/ sums across the items - only item is 234
234
*/ m NB. */ product across the items - only item is 234
234
So there seems to be confusion between the digits of the integer 234, which would be 2 3 4, and the fact that 234 is an atom: it has only one item, whose value is 234.
Moving on from that, you can deconstruct your integer using 10 & #. ^: _1, which is the inverse (^:_1) of Base (#.) with a left argument of 10, so that the number is broken up in base 10. J's way of inverting a verb is to apply the Power conjunction (^:) with a right argument of negative one (_1).
splitFunction =: 10 & #.^:_1
concatenateFunction =: 10 & #.
splitFunction 234
2 3 4
+/ splitFunction 234
9
*/ splitFunction 234
24
|. splitFunction 234
4 3 2
concatenateFunction 2 3 4
234
concatenateFunction splitFunction 234
234
I think that this will do what you want, but you may want to spend a bit more time thinking about what you would have expected +/ 234 to do and whether this would be useful behaviour.
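For readers coming from other languages, the idea behind Base and its inverse can be sketched in Python. This is only an analogy, not how J implements it; the names split_digits and join_digits are made up, and the sketch assumes non-negative integers:

```python
def split_digits(n, base=10):
    """Digits of a non-negative integer, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, d = divmod(n, base)  # peel off the least significant digit
        digits.append(d)
    return digits[::-1]

def join_digits(digits, base=10):
    """Evaluate a digit list in the given base (the reverse of split_digits)."""
    value = 0
    for d in digits:
        value = value * base + d
    return value

print(split_digits(234))                     # [2, 3, 4]
print(sum(split_digits(234)))                # 9
print(join_digits(split_digits(234)[::-1]))  # 432
```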
In Excel I am trying to allocate classes to pupils based on their ranking in school. The set of data I have looks like this:
S/N Name LevelPosition
1 Andrea 10
2 Bryan 25
3 Catty 5
4 Debbie 26
5 Ellie 30
6 Freddie 28
I would like to have a formula that could sort the pupils based on the LevelPosition and allocate the class in order of this sequence - A,B,C,C,B,A. Hence the result would be:
S/N Name LevelPosition AllocatedClass
3 Catty 5 A
1 Andrea 10 B
2 Bryan 25 C
4 Debbie 26 C
6 Freddie 28 B
5 Ellie 30 A
This was the sort of thing I had in mind.
Column D is just a ranking from bottom to top:-
=RANK(C2,C$2:C$7,1)
Column E is the ranking adjusted for any ties:-
=D2+COUNTIF(D$1:D1,D2)
Column F is based on the #pnuts formula:-
=CHOOSE(MOD(E2-1,6)+1,"A","B","C","C","B","A")
=CHOOSE(MOD(E2-1,6)+1,"A","B","C","C","B","A")
I've put some ties in to show what would happen. The last two students' allocations are reversed because the second to last has the higher mark.
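The logic of the three columns (rank, break ties by order of appearance, then cycle through A,B,C,C,B,A) can also be written out as a short Python sketch. It is not a replacement for the worksheet formulas, and the function name allocate is made up for illustration:

```python
def allocate(pupils, pattern="ABCCBA"):
    """pupils: list of (name, level_position). Returns (name, position, class) tuples."""
    # sorted() is stable, so tied positions keep their original order,
    # which mirrors the COUNTIF tie adjustment.
    ranked = sorted(pupils, key=lambda p: p[1])
    # Cycle through the pattern, like CHOOSE(MOD(rank-1,6)+1,...).
    return [(name, pos, pattern[i % len(pattern)])
            for i, (name, pos) in enumerate(ranked)]

pupils = [("Andrea", 10), ("Bryan", 25), ("Catty", 5),
          ("Debbie", 26), ("Ellie", 30), ("Freddie", 28)]
for row in allocate(pupils):
    print(row)
```

On the sample data from the question this reproduces the expected allocation: Catty A, Andrea B, Bryan C, Debbie C, Freddie B, Ellie A.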
If I have columns and rows like so
strings numbers check Total
abc 23 abc
abc 12 abb
abb 9 aba
aba 9
aba 12
I need to get the total with a formula.
So the total for abc would be 23 + 12,
abb would be 9,
and aba would be 9 + 12.
How would I get this with a formula?
Use this formula:
=SUMIF(A:A,"=abc",B:B)
A more general approach would be:
=SUMIF(A:A,D2,B:B)
Here A is your strings column, B is your numbers column, and D is your check column. Enter this formula in the second row, because the first row holds the headers; you can then drag it down and the references will update.
The result will look like this:
strings numbers check Total
abc 23 abc 35
abc 12 abb 9
abb 9 aba 21
aba 9 0
aba 12 0
Use this in any cell outside columns A, B and D, because placing it in one of those columns would cause a circular reference error.
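For comparison, what SUMIF computes for each check value can be sketched in Python. This is illustrative only; the function name sum_if is made up and the rows are the sample data from the question:

```python
from collections import defaultdict

def sum_if(rows, checks):
    """rows: (string, number) pairs. Returns the total of numbers for each check value."""
    totals = defaultdict(int)
    for key, number in rows:
        totals[key] += number  # accumulate per distinct string
    # A check value that never occurs gets 0, as SUMIF returns for no matches.
    return {c: totals[c] for c in checks}

rows = [("abc", 23), ("abc", 12), ("abb", 9), ("aba", 9), ("aba", 12)]
print(sum_if(rows, ["abc", "abb", "aba"]))  # {'abc': 35, 'abb': 9, 'aba': 21}
```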
Try using SUMIF(). You can find more information about it in Excel's help.
I have a table like the following
0 1 2 3
4 5 6 7
8 9 10 11
and I want to make the following structure.
┌──────┬──┐
│0 1 2│ 3│
│4 5 6│ 7│
│8 9 10│11│
└──────┴──┘
Could anyone please help me?
And in J there is always another way!
]a=. i. 3 4
0 1 2 3
4 5 6 7
8 9 10 11
('' ;1 0 0 1) <;.1 a
┌──────┬──┐
│0 1 2│ 3│
│4 5 6│ 7│
│8 9 10│11│
└──────┴──┘
This uses the dyadic cut conjunction (;.), which has the general form x u ;. n y.
y is the argument that we would like to partition, and x specifies where the partitions are to be put. n is positive if we would like the frets (the partition positions) included in the result, and a value of 1 means that we work from left to right. u is the verb that we would like to apply to each partition.
One tricky point:
x is ('';1 0 0 1) because we want the entire first dimension of the array (rows) after which the 1's indicate the partition start. In this case we take all the rows and make the first partition the first 3 columns, and the final 1 makes the last partition its own column.
There is much going on in this solution, and that allows it to be used in many different ways, depending on the needs of the programmer.
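To make the role of the fret mask 1 0 0 1 more concrete, here is a rough Python analogy of the column partitioning. It only imitates this one use of cut; the function name cut_columns is made up, and the boxes are modelled as nested lists:

```python
def cut_columns(table, frets):
    """Split every row at the columns where frets is 1; return a list of sub-tables."""
    # Each 1 in the mask starts a new partition; columns before the first 1 are dropped.
    starts = [i for i, f in enumerate(frets) if f]
    bounds = starts + [len(frets)]
    return [[row[lo:hi] for row in table]
            for lo, hi in zip(bounds, bounds[1:])]

a = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]
left, right = cut_columns(a, [1, 0, 0, 1])
print(left)   # [[0, 1, 2], [4, 5, 6], [8, 9, 10]]
print(right)  # [[3], [7], [11]]
```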
The title of your question ("slicing table into two parts and box it afterwards") suggests that the example you sketch may not reflect what you want to learn.
My impression is that you think of your resulting noun as a two-axis table boxed into two sections. The main problem with that interpretation is that boxes divide their contents very thoroughly. It takes special effort to make the numbers in your second box look like they've been trimmed from the structure in the first box. Such effort is rarely worthwhile.
If it is natural to need to take the 3 7 11 and remove it as a unit from the structure in which it occurs, there is an advantage to making it a row of the table, rather than a column. A 2-axis table is always a list of 1-axis lists. If your problem is a matter of segregating items, this orientation of the atoms makes it simpler to do.
Putting this into practice, here we deal with rows instead of columns:
aa=: |:i.3 4
aa
0 4 8
1 5 9
2 6 10
3 7 11
(}: ; {:) aa
+------+------+
|0 4 8|3 7 11|
|1 5 9| |
|2 6 10| |
+------+------+
The program, in parentheses, can be read literally as "curtail link tail". This is the sort of program I'd expect from the title of your question.
Part of effective J programming is orienting the data (nouns) so that they are more readily manipulated by the programs (verbs).
Here is one way:
]a=: i. 3 4
0 1 2 3
4 5 6 7
8 9 10 11
3 ({."1 ; }."1) a
┌──────┬──┐
│0 1 2│ 3│
│4 5 6│ 7│
│8 9 10│11│
└──────┴──┘
In other words, "take the first 3 items in each row of a and Link (;) with the result of dropping the first 3 items in each row of a".
Other methods and/or structures may be more appropriate depending on the exact use case.