How many combinations can I get if the random word uses - combinatorics

How many combinations can I get if the random word uses:
10 characters long
Numeric digits (0-9)
Uppercase letters (A-Z)
Lowercase letters (a-z)
and each string should be unique?

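You could create a program to go through all the combinations. Here is an example in Python:
import itertools
import string
chars = string.digits + string.ascii_letters  # all 62 allowed characters
# product(..., repeat=10) allows repeated characters; there are 62**10 results, so this never finishes in practice
for combo in itertools.product(chars, repeat=10):
    print(''.join(combo))
From a mathematical perspective, it's 62^10: each of the 10 positions can independently hold any of the 10 + 26 + 26 = 62 characters, giving 62^10 = 839,299,365,868,340,224 distinct strings.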
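As a sanity check of the 62^10 figure, here is a minimal sketch that computes the count directly and then confirms the k^n formula by brute force on a much smaller case (a hypothetical 3-letter alphabet and length 4, chosen only so that enumeration with itertools.product is feasible).

```python
import itertools
import string

# 10 digits + 26 uppercase + 26 lowercase letters = 62 symbols
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
LENGTH = 10

# Each of the 10 positions independently takes any of the 62 symbols.
total = len(ALPHABET) ** LENGTH
print(f"{total:,}")  # 839,299,365,868,340,224

# Brute-force check of the k**n formula on a small, hypothetical case:
# a 3-symbol alphabet and strings of length 4.
small_alphabet = "abc"
small_length = 4
enumerated = sum(1 for _ in itertools.product(small_alphabet, repeat=small_length))
print(enumerated, len(small_alphabet) ** small_length)  # 81 81
```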

Related

Filter list based on string pattern of hyphen-separated values

I have a list of strings with varying formats. Some of them start with a hyphen-separated date and time followed by a 16-character mixture of digits and letters. I would like to keep only the strings that match this format. I've provided input and output examples below. I'm not a regex expert; could someone please suggest a slick way to do this with Python?
Input:
example_list=['2022-05-05-16-59-25-5840ZQ37F231D95W',
'wereD/22fdas/',
'mnkljlj/124kljf/oaahreljah',
'2022-09-11-16-59-25-5840XY37F231D95Z']
Output:
['2022-05-05-16-59-25-5840ZQ37F231D95W',
'2022-09-11-16-59-25-5840XY37F231D95Z']
Update:
using the suggestion below with re.match and a list comprehension worked fine, thanks!
import re
# raw string so that \d reaches the regex engine unchanged
[x for x in example_list if re.match(r"^\d{4}(-\d\d){5}-[A-Z\d]{16}$", x)]
Try this:
^\d{4}(-\d\d){5}-[A-Z\d]{16}$
Regex breakdown:
^ start of input
\d{4} 4 digits
(-\d\d){5} five groups of a dash followed by 2 digits
- a dash
[A-Z\d]{16} 16 characters, each an uppercase letter or a digit
$ end of input
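Put together, the filter can be written as a small self-contained script. This is a sketch: it uses re.fullmatch so no ^ or $ anchors are needed, and it adds the re.ASCII flag (my choice, not part of the answer) so that \d matches only 0-9 rather than any Unicode digit.

```python
import re

# Four-digit year, five "-NN" groups (month, day, hour, minute, second),
# then a dash and 16 uppercase letters or digits.
PATTERN = re.compile(r"\d{4}(?:-\d\d){5}-[A-Z\d]{16}", re.ASCII)

example_list = [
    '2022-05-05-16-59-25-5840ZQ37F231D95W',
    'wereD/22fdas/',
    'mnkljlj/124kljf/oaahreljah',
    '2022-09-11-16-59-25-5840XY37F231D95Z',
]

# fullmatch requires the whole string to match, so no ^ or $ is needed.
filtered = [s for s in example_list if PATTERN.fullmatch(s)]
print(filtered)
```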

What is meant by the length of a palindrome in automata theory? Even palindromes? Odd palindromes?

Palindrome is a language in automata theory, but I am unable to understand the following paragraph. I have calculated many things and tried my best to work it out, but I couldn't.
Length of palindrome:
As we know, a string is of length n and the number of symbols in the alphabet is 2, which shows that there are as many palindromes of length 2n as there are strings of length n, i.e. the required number of palindromes is 2^n.
A palindrome is a word that reads the same from the left as from the right. So the first half completely determines the letters in the second half. This is why the number of palindromes of length $2n$ is equal to the number of words of length $n$: the latter are all the possible first halves of words of length $2n$.
Over an alphabet of two letters you have two choices for each of the $n$ positions, so there are $2^n$ distinct words of length $n$.
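A brute-force count over the two-letter alphabet {a, b} (chosen here for illustration) confirms the $2^n$ figure for even lengths. The same sketch also covers the odd case from the title: a palindrome of length $2n+1$ is fixed by its first $n$ letters plus a free middle letter, so there are $2^{n+1}$ of them.

```python
import itertools

def count_palindromes(length, alphabet="ab"):
    """Count words of the given length over `alphabet` that read the same reversed."""
    return sum(
        1
        for word in itertools.product(alphabet, repeat=length)
        if word == word[::-1]
    )

for n in range(1, 6):
    even = count_palindromes(2 * n)      # first half of length n fixes the rest
    odd = count_palindromes(2 * n + 1)   # first half plus a free middle letter
    print(n, even, odd)
    assert even == 2 ** n
    assert odd == 2 ** (n + 1)
```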

Generate partial strings which have predefined minimum lengths (Matlab)

I have an initial string Init={ABCDEFGH}. How can I generate 100 partial strings (randomly) from the Init string that satisfy these conditions:
A predefined minimum length.
The order of elements in each partial string should be from 'A' to 'Z'.
No repeated characters within a partial string.
The expected output should be as follows: 100 partial strings, where the minimum length of each partial string is 5.
Output = {'BCEGH';'ACEFG';'ABCDEF';'BCFGH';'BCDEG';....;'ABEFH';'ABCEGH'}
numel(Output) = 100
To do this, I started by generating random numbers for the length of each partial string. Then I generated random numbers corresponding to each letter in each string. Then I transferred those numbers into their corresponding letters. The comments should explain the rest.
n=100 %// how many samples to take
C='ABCDEFGH' %// take samples from these letters
maxL=numel(C) %// the longest string
minL=5 %// the shortest string
len=randi([minL maxL],[n 1]) %// generate length of each partial string
arrayfun(@(l) C(sort(randsample(maxL,l))),len,'UniformOutput',false) %// pick l distinct letters, sorted to keep A..Z order
Without the sort, n=4 gives, for example (note that these letters are not in alphabetical order, which is why the sort is needed):
ans =
'CFHABEDG'
'CFHABE'
'FAHBE'
'DGHFABE'
I'm not sure this is truly uniform: it picks every length with equal probability, but there are not equally many strings of each length. Since the letters must stay in alphabetical order, a string of length k is just a choice of k of the maxL letters, so there are nchoosek(maxL,k) of them, and len should be weighted accordingly:
w=zeros(1,maxL-minL+1);
for i=1:(maxL-minL+1)
    w(i)=nchoosek(maxL,minL-1+i); %// number of sorted strings of length minL-1+i
end
len=minL-1+randsample(1:(maxL-minL+1),n,true,w./sum(w))
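For comparison, here is a minimal Python sketch of the same idea (not the answerer's code; the function name partial_strings and the rng parameter are hypothetical). Lengths are drawn with weight math.comb(8, k), then k distinct positions are sampled and sorted, so every valid partial string is equally likely.

```python
import math
import random

def partial_strings(letters="ABCDEFGH", n=100, min_len=5, rng=random):
    """Draw n random partial strings of `letters`, each keeping the original
    letter order, with no repeats and length >= min_len.

    Lengths are weighted by math.comb(len(letters), k), the number of distinct
    order-preserving subsets of size k, so every valid string is equally likely.
    """
    max_len = len(letters)
    lengths = list(range(min_len, max_len + 1))
    weights = [math.comb(max_len, k) for k in lengths]
    result = []
    for k in rng.choices(lengths, weights=weights, k=n):
        # Choose k distinct positions, then sort them to keep A..Z order.
        positions = sorted(rng.sample(range(max_len), k))
        result.append("".join(letters[i] for i in positions))
    return result

output = partial_strings(rng=random.Random(0))
print(len(output), output[:5])
```

Note that with 8 letters and a minimum length of 5 there are only 56 + 28 + 8 + 1 = 93 valid partial strings, so 100 samples will necessarily contain duplicates.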

Lexicographically larger strings

I'm trying to understand the concept of lexicographically larger or smaller strings. My book gives some examples of strings that are lexicographically larger or smaller than each other, along with an intermediary string that lies between the two in that order.
string 1: a
string 2: c
intermediary string: b
string 1: aaa
string 2: zzz
intermediary string: yyy
string 1: abcdefg
string 2: abcdefh
intermediary string: (none)
I'm not sure what the requirement is for a string to be lexicographically in between the two strings. Is it that every letter of the intermediary string has to have a larger ASCII value than the corresponding letter of the first string and a smaller ASCII value than the corresponding letter of the second string?
For example, "bcdefg" is the intermediary string between "abcdef" and "cdefgh". Can "stuvx" be the intermediary between "stuvw" and "stuvy"?
Lexicographical ordering simply means dictionary ordering. I say "simply" but there may actually be all sorts of wonderful edge cases such as how you treat apostrophes, what you do with diphthongs, whether you "fold" accented letters into the unaccented ones, such as transforming {À,Á,Â,Ã,Ä} -> A. All these rules on how you collate letters will affect the ordering of words as well.
English is fairly easy if you restrict yourself to the twenty-six actual letters of the alphabet. You can consider a word to be "lesser" than another word if, at the first character position where the two differ, the character from the first word comes before that of the second. If one word is a prefix of the other, so that there is no differing position, the shorter word comes first.
And, in fact, there is a solution to the third example, provided it doesn't have to be the same length as the others:
string 1: abcdefg
string 2: abcdefh
intermediary string: abcdefga
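Python's `<` on `str` compares exactly this way (by Unicode code point, which matches ASCII for these letters; it does not apply locale collation rules). The sketch below, using a hypothetical helper is_between, checks the examples above and shows that the "every letter" condition from the question is not required: only the first differing position matters.

```python
def is_between(low, mid, high):
    """True if `mid` sorts strictly between `low` and `high` in
    lexicographic (code-point) order, as Python's < on str does."""
    return low < mid < high

print(is_between("a", "b", "c"))                    # True
print(is_between("aaa", "yyy", "zzz"))              # True
print(is_between("abcdef", "bcdefg", "cdefgh"))     # True
print(is_between("stuvw", "stuvx", "stuvy"))        # True
print(is_between("abcdefg", "abcdefga", "abcdefh")) # True: a prefix sorts first
print(is_between("abc", "abz", "acb"))              # True, although 'z' > 'b' in the last position
```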

Space-efficient way to encode numbers as sortable strings

Starting with a list of integers the task is to convert each integer into a string such that the resulting list of strings will be in numeric order when sorted lexicographically.
This is needed so that a particular system that is only capable of sorting strings will produce an output that is in numeric order.
Example:
Given the integers
1, 23, 3
we could convert them to strings like this:
"01", "23", "03"
so that when sorted they become:
"01", "03", "23"
which is correct. A wrong result would be:
"1", "23", "3"
because that list is sorted in "string order", not in numeric order.
I'm looking for something more efficient than the simple zero-padding scheme. In order to cover all possible 32-bit integers, we'd need to pad to 10 digits, which is inefficient.
For integers, prefix each number with the length. To make it more readable, use 'a' for length 1, and 'b' for length 2. Example:
non-encoded encoded
1 "a1"
3 "a3"
23 "b23"
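A minimal sketch of the length-prefix scheme for non-negative integers, assuming one lowercase letter per possible digit count ('a' for 1 digit up to 'j' for 10, which covers unsigned 32-bit values). The function name is hypothetical. Because str() never produces leading zeros, a longer number always has a larger prefix letter, and numbers of equal length then compare digit by digit.

```python
def encode_length_prefixed(n: int) -> str:
    """Prefix a non-negative integer with a letter encoding its digit count:
    'a' for 1 digit, 'b' for 2, ... 'j' for 10 (enough for 32-bit unsigned)."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = str(n)  # str() never adds leading zeros
    return chr(ord("a") + len(digits) - 1) + digits

values = [1, 23, 3, 4294967295, 100]
encoded = sorted(encode_length_prefixed(v) for v in values)
print(encoded)  # ['a1', 'a3', 'b23', 'c100', 'j4294967295']
```

Decoding is just `int(s[1:])`.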
This scheme is a bit simpler than prefixing each digit, but only works with numbers, not numbers mixed with text. It can be made to work for negative numbers as well, and even BigDecimal numbers, using some tricks. I wrote an implementation in Apache Jackrabbit 2.x, to make BigDecimal indexable (sortable) as text. For that, I used a format that only uses the characters '0' to '9' and consists of:
one character for: signum(value) + 2
one character for: signum(exponent) + 2
one character for: length(exponent) - 1
multiple characters for: exponent
multiple characters for: value (-1 if inverted)
Only the signum is encoded if the value is zero. The exponent is not encoded if zero. Negative values are "inverted" character by character (0 => 9, 1 => 8, and so on). The same applies to the exponent.
Examples:
non-encoded encoded
0 "2"
2 "322" (signum 1; exponent 0; value 2)
120 "330212" (signum 1; exponent signum 1, length 1, value 2; value 12)
-1 "179" (signum -1, rest inverted; exponent 0; value 1 (-1, inverted))
Values between BigDecimal(BigInteger.ONE, Integer.MIN_VALUE) and BigDecimal(BigInteger.ONE, Integer.MAX_VALUE) are supported.
TL;DR
Encode digits according to their order of magnitude (OM) and other characters so they sort as desired, relative to numbers: jj-a123 would be encoded zjzjz-zaC1B2A3
Longer explanation
This would depend somewhat upon the sorting algorithm that will finally be used to sort and how one would want any given punctuation characters to be sorted in relation to letters and numbers, but if it's "ascii-betical" or similar, you could encode each digit of a number to represent its order of magnitude (OM) in the number, while encoding other characters such that they would sort according to your desired sort order.
For simplicity, I would suggest beginning with encoding every non-numeric character with a "high" value (e.g. lower case z or even ~ if final value is ASCII), so that it sorts after encoded digits. Then cache each digit encountered until another non-numeric is encountered, then encode each cached digit with a value representing its OM. If the number 12945 was encountered in between non-numerics, you would output an E to encode an OM of 5, then the digit that is that order of magnitude, 1, followed by the next OM of 4 (D) and its associated digit, 2. Continue until all numeric digits have been flushed, then continue with non-numerics.
Non-numerics would be treated individually and ranked relative to the OM of digits. If it is desired for them to sort "above" numbers (perhaps the space character or certain others deemed special) they would be encoded by prepending a low-value character (like the space character, if final value will be treated and sorted as ASCII). When/if another numeric is encountered, begin caching and encode according to OM once all consecutive numerics are cached.
Alternatively, processing the string in reverse order would remove the need to cache digits, leaving only an "is it a digit?" test and an "is the last character a digit?" test. If the first is not true, then use (one of?) the "non-digit" OM character(s). If the first test is true, then use the lowest-OM "digit" character (A in my examples). If both tests are true, then increment your OM character (A -> B or E -> F) before use.
Certain levels of additional filtering - or even translation - could be applied. If one wanted to allow accurate sorting based upon Roman numerals, one could encode them as decimal (or even hexadecimal) numbers with an appropriate OM.
Treating decimal points (periods or commas, depending on convention) as actual decimal separators, distinct from other punctuation, would probably be beyond the true utility of this encoding scheme, as alphanumeric fields seldom use a period or comma as a decimal separator. If it is desired to use them that way, the algorithm would simply detect a decimal separator (a period or comma as appropriate, between digits) and encode the numeric portion after that separator only as normal text. Fractional portions are actually sorted correctly during a normal ASCII-based sort, because more digits represent greater precision, not greater magnitude.
Examples
non-encoded encoded
----------- -------
12345 E1D2C3B4A5
a100 zaC1B0A0
a20 zaB2A0
a2000 zaD2C0B0A0
x100.5 zxC1B0A0z.A5
x100.23 zxC1B0A0z.B2A3
1, 23, 3 A1z,z B2A3z,z A3
1, 2, 3 A1z,z A2z,z A3
1,2,3 A1z,A2z,A3
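A minimal sketch of the basic forward-scanning variant described above, assuming ASCII input, 'z' as the single "non-digit" prefix, and 'A'-'Z' as OM characters (so this sketch rejects digit runs longer than 26). It does not implement the reverse-order variant, case or diacritic categories, or the special handling of hyphens for negative numbers. The name om_encode is hypothetical.

```python
import string

def om_encode(text: str) -> str:
    """Encode text so plain string sorting orders embedded numbers by value.

    Each digit in a run of n ASCII digits is preceded by a letter giving its
    order of magnitude: 'A' for the last digit, 'B' for the one before it, etc.
    Every other character is preceded by 'z', so it sorts after any digit.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] in string.digits:
            # Find the end of this run of consecutive digits
            j = i
            while j < len(text) and text[j] in string.digits:
                j += 1
            run = text[i:j]
            if len(run) > 26:
                raise ValueError("this sketch only supports digit runs of up to 26 (A-Z)")
            for pos, digit in enumerate(run):
                om = len(run) - pos  # 1 for the last digit in the run
                out.append(chr(ord("A") + om - 1) + digit)
            i = j
        else:
            out.append("z" + text[i])
            i += 1
    return "".join(out)

print(om_encode("jj-a123"))  # zjzjz-zaC1B2A3
print(sorted(["a100", "a20", "a2000"], key=om_encode))  # ['a20', 'a100', 'a2000']
```

In this sketch leading zeros count toward the order of magnitude, so "007" sorts after "8"; stripping them first would be one way to avoid that.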
Potential advantages
Going somewhat beyond simple numeric sorting, this encoding method offers flexibility in the final effective sort order: you are essentially encoding a category for each character. Digits get a category based upon their position within the larger string of digits known as a number, while other characters are simply told to sort in their normal way (e.g. ASCII), but after numbers. Any exceptions that should sort before numbers or in other orders would go in one or more additional categories. ASCII can effectively be re-encoded to sort in a non-ASCII way:
You could encode lowercase letters to sort before or along with uppercase letters. To switch the lower and upper cases, you encode lowercase letters with a y and uppercase letters with a z. For a pseudo-case-insensitive sort, categorizing both A and a with the same encoding character would sort both of them before B and b, though A would nonetheless always sort before a.
If you want Extended ASCII characters (e.g. with diacritics) to sort along with their ASCII cousins, you encode À, Á, Â, Ã, Ä, Å, and Æ along with A by using an a as the OM character, encode B, C, and Ç with a b, and E, È, É, Ê, and Ë with a c, etc. The same intra-category sort order caveat still applies, and some decisions need to be made on characters like capital Eth, and to a certain extent others like Thorn, and Sharp S (Ð, Þ, and ß respectively) as to whether they will sort based on similarities in appearance or pronunciation, or instead more properly perhaps, alphabetical order.
Small advantage of being basically human-readable, with effort
Caveats
Though this allows many 'categories' of characters to be defined, be sure to remember that each order of magnitude for digits is its own category - you need to know that the data will not contain numbers that are greater in OM than approximately 250, depending upon how many other categories you wish to define (ASCII 0 is reserved for storing strings, and there needs to be at least one other character to indicate "not a digit" - at least for alphanumeric data - making the maximum perhaps 254 orders of magnitude), but that should be plenty for any situation I can imagine. I'm not sure what other issues quantum computing will bring about, but there's probably a quantum solution to it, whatever it is.
Finally, if the hyphen is encoded as a non-numeric character, and all non-numerics are encoded with a higher OM than digits, negative numbers would be encoded as greater than any positive number. The hyphen should be encoded as a lower-than-digit-OM (perhaps only when preceding a digit) if negative numbers need to be sorted correctly according to magnitude.
Since the ASCII code of 'A' is greater than that of '9', you could encode the integers as fixed-width hexadecimal strings.
The integers
1, 23, 3
can be encoded as
00000001, 00000017, 00000003
and 32-bit integers can always be encoded as 8-character strings (assuming unsigned integers).
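A minimal sketch of the fixed-width hex encoding, assuming unsigned 32-bit input (the function name encode_hex32 is hypothetical). Uppercase is used here, but lowercase would sort equally well as long as it is used consistently, since both 'A' and 'a' have higher codes than '9'. The fixed width of 8 is what makes string order match numeric order.

```python
def encode_hex32(n: int) -> str:
    """Fixed-width, 8-character uppercase hex for an unsigned 32-bit integer."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit integer")
    return f"{n:08X}"

values = [1, 23, 3]
print([encode_hex32(v) for v in values])        # ['00000001', '00000017', '00000003']
print(sorted(encode_hex32(v) for v in values))  # ['00000001', '00000003', '00000017']
print(int("00000017", 16))                      # 23: decoding is just int(s, 16)
```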
