How to propery set the paratemter in CLDR so that icu collation case upper and numeric ordering both working - collation

Table 2. Comparison Levels
Level Description Examples
L1 Base characters role < roles < rule
L2 Accents role < rôle < roles
L3 Case/Variants role < Role < rôle
L4 Punctuation role < “role” < Role
Ln Identical role < ro□le < “role”
CREATE COLLATION combined (provider = icu, locale = 'en-u-kr-latn-digit-kf-upper-kn-true');
`select test_kr from icu order by test_kr collate combined;`
return
+---------+
| test_kr |
+---------+
| a 7 |
| A 11 |
| A 19 |
| A 70 |
| a 70 |
| a 117 |
| Π1 |
| 1 a |
| 8 p |
+---------+
I seems get it. kn attribute is on level1, the kf attribute case upper or lower is third level. So 'a' and 'A' will be grouped compared together, since level1 have more weight than level3.
So is there anyway to create an collation, namely properly set the language tag parameter settings so that the desired result is:
+---------+
| test_kr |
+---------+
| A 11 |
| A 19 |
| A 70 |
| a 7 |
| a 70 |
| a 117 |
| Π1 |
| 1 a |
| 8 p |
+---------+

Related

How does efficient_apriori package code itemsets?

Doing association rules mining using efficient_apriori package in python.Found a very useful answer on how to convert the output to a dataframe
However, struggling with the Itemset output and hoping someone can help me parse that correctly. Guessing LHS is an index value, but struggling with the decimal values in RHS. Does anyone know how the encoding is done? I have tried the same with SKU descriptions, and get the same output.
Input dataframe looks like this:
| SKU | Count | Percent |
|----------------------------------------------------------------------|-------|-------------|
| "('000000009100000749',)" | 110 | 0.029633621 |
| "('000000009100000749', '000000009100000776')" | 1 | 0.000269397 |
| "('000000009100000749', '000000009100000776', '000000009100002260')" | 1 | 0.000269397 |
| "('000000009100000749', '000000009100000777', '000000009100002260')" | 1 | 0.000269397 |
| "('000000009100000749', '000000009100000777', '000000009100002530')" | 1 | 0.000269397 |
Output looks like this:
| | lhs | rhs | count_full | count_lhs | count_rhs | num_transactions | confidence | support |
|---|-----------------------------|-----------------------------|------------|-----------|-----------|------------------|------------|-------------|
| 0 | "(1,)" | "(0.00026939655172413793,)" | 168 | 168 | 168 | 297 | 1 | 0.565656566 |
| 1 | "(0.00026939655172413793,)" | "(1,)" | 168 | 168 | 168 | 297 | 1 | 0.565656566 |
| 2 | "(2,)" | "(0.0005387931034482759,)" | 36 | 36 | 36 | 297 | 1 | 0.121212121 |
| 3 | "(0.0005387931034482759,)" | "(2,)" | 36 | 36 | 36 | 297 | 1 | 0.121212121 |
| 4 | "(3,)" | "(0.0008081896551724138,)" | 21 | 21 | 21 | 297 | 1 | 0.070707071 |
Could someone help me understand what's outputting in the LHS and RHS columns, and how to join that back to the 'SKU'? Ideally would like the output to have the 'SKU' instead of whatever is showing up.
I have looked at the documentation and it is quite sparse.

Split column on condition in dataframe

The data frame I am working on has a column named "Phone" and I want to split in on / or , in a way such that I get the data frame as shown below in separate columns. For example, the first row is 0674-2537100/101 and I want to split it on "/" into two columns having values as 0674-2537100 and 0674-2537101.
Input:
+-------------------------------+
| Phone |
+-------------------------------+
| 0674-2537100/101 |
| 0674-2725627 |
| 0671 – 2647509 |
| 2392229 |
| 2586198/2583361 |
| 0663-2542855/2405168 |
| 0674 – 2563832/0674-2590796 |
| 0671-6520579/3200479 |
+-------------------------------+
Output:
+-----------------------------------+
| Phone | Phone1 |
+-----------------------------------+
| 0674-2537100 | 0674-2537101 |
| 0674-2725627 | |
| 0671 – 2647509 | |
| 2392229 | |
| 2586198 | 2583361 |
| 0663-2542855 | 0663-2405168 |
| 0674 – 2563832 | 0674-2590796 |
| 0671-6520579 | 0671-3200479 |
+-----------------------------------+
Here I came up with a solution where I can take out the length of strings on both sides of the separator(/). Take out their difference. Copy the substring from the first column from character position [:difference-1] to the second column.
So far my progress is,
df['Phone'] = df['Phone'].str.replace(' ', '')
df['Phone'] = df['Phone'].str.replace('–', '-')
df[['Phone','Phone1']] = df['Phone'].str.split("/",expand=True)
df["Phone1"].fillna(value=np.nan, inplace=True)
m2 = (df["Phone1"].str.len() < 12) & (df["Phone"].str.len() > 7)
m3 = df["Phone"].str.len() - df["Phonenew"].str.len()
df.loc[m2, "Phone1"] = df["Phone"].str[:m3-1] + df["Phonenew"]
It gives an error and the column has only nan values after I run this. PLease help me out here.
Considering you're only going to have 2 '/' in the 'Phone' column. Here's what you can do:
'''
This fucntion takes in rows of a dataframe as an input and returns row with appropriate values.
'''
def split_phone_number(row):
split_str=row['Phone'].split('/')
# Considering that you're only going to have 2 or lesser values, update
# the passed row's columns with appropriate values.
if len(split_str)>1:
row['Phone']=split_str[0]
row['Phone1']=split_str[1]
else:
row['Phone']=split_str[0]
row['Phone1']=''
# Return the updated row.
return row
# Making a dummy dataframe.
d={'Phone':['0674-2537100/101','0674-257349','0671-257349','257349','257349/100','101/100','5688343/438934']}
dataFrame= pd.DataFrame(data=d)
# Considering you're only going to have one extra column. adding that column to dataframe.
dataFrame=dataFrame.assign(Phone1=['' for i in range(dataFrame.shape[0])])
# applying the split_phone_number function to dataframe.
dataFrame=dataFrame.apply(split_phone_number,axis=1)
# Prinitng dataframe.
print(dataFrame)
Input:
+---------------------+
| Phone |
+---------------------+
| 0 0674-2537100/101 |
| 1 0674-257349 |
| 2 0671-257349 |
| 3 257349 |
| 4 257349/100 |
| 5 101/100 |
| 6 5688343/438934 |
+---------------------+
Output:
+----------------------------+
| Phone Phone1 |
+----------------------------+
| 0 0674-2537100 101 |
| 1 0674-257349 |
| 2 0671-257349 |
| 3 257349 |
| 4 257349 100 |
| 5 101 100 |
| 6 5688343 438934 |
+----------------------------+
For further reading:
dataframe.apply()
Hope this helps. Cheers!

Projecting each day of the week in KQL

Within an Azure dashboard I'm wanting to create a tile which shows exceptions over the last 7 days, however the KQL below will obviously only return a data point where there has been an exception on a particular day.
How do I get it to return zero on a day when there were no exceptions?
exceptions
| where (* has 'insights')
and timestamp >= ago(7d)
| summarize Count=count()
by Date2=bin(timestamp, 1d)
| project Date=(format_datetime(Date2 , 'dd-MM-yyyy')), Count
| sort by Date
Use make_series instead of summarize.
Here's an example:
range x from 0 to 99 step 1
| where x !between(60 .. 79)
| make-series count() on x step 10
| mv-expand count_, x
| project x, count_
Output:
| x | count_ |
|----|--------|
| 0 | 10 |
| 10 | 10 |
| 20 | 10 |
| 30 | 10 |
| 40 | 10 |
| 50 | 10 |
| 60 | 0 |
| 70 | 0 |
| 80 | 10 |
| 90 | 10 |

Filter filter criteria and then apply in countif statement in Excel

I have a table of filter criteria like this:
+----------+----------+------+------+------+
| Category | SpecName | Spec | Pass | Fail |
+----------+----------+------+------+------+
| A | S1 | 3 | | |
| A | S2 | 4 | | |
| B | S1 | 5 | | |
| C | S1 | 2 | | |
+----------+----------+------+------+------+
I have a table I want to apply the filter criteria to like this:
+----------+----+----+
| Category | S1 | S2 |
+----------+----+----+
| A | 5 | 3 |
| B | 4 | |
| A | 5 | 5 |
| C | 2 | |
| A | 2 | 6 |
+----------+----+----+
I want to fill the Pass and Fail columns in the filter criteria table with a count of items in second table with values >= the corresponding spec, like so.
+----------+----------+------+------+------+
| Category | SpecName | Spec | Pass | Fail |
+----------+----------+------+------+------+
| A | S1 | 3 | 1 | 2 |
| A | S2 | 4 | 1 | 2 |
| B | S1 | 5 | 0 | 1 |
| C | S1 | 2 | 1 | 0 |
+----------+----------+------+------+------+
Here are steps for how I might do it in a scripting language:
Filter first table to get all spec filter criteria for the Category on that row, as follows for the first row.
+----------+----------+------+
| Category | SpecName | Spec |
+----------+----------+------+
| A | S1 | 3 |
| A | S2 | 4 |
+----------+----------+------+
Copy table 2 to a variable iTable
+----------+----+----+
| Category | S1 | S2 |
+----------+----+----+
| A | 5 | 3 |
| B | 4 | |
| A | 5 | 5 |
| C | 2 | |
| A | 2 | 6 |
+----------+----+----+
For each spec name:
Find column in iTable with spec name
Filter spec name column in iTable by spec
After all filters applied, we would have:
+----------+----+----+
| Category | S1 | S2 |
+----------+----+----+
| A | 5 | 5 |
+----------+----+----+
Then just count the rows in iTable and assign to the cell in Pass column of the criteria table
Is this possible with Excel formulas?
If not, does anyone know how to do it with VBA?
Looking at an alternative layout for you spec criteria. Expand you columns to suit your need.
With each spec criteria being its own column life gets really easy. You just need to adjust your formula to match the number of criteria you have.
Based on the table at the end for layout, place the following formula in D3 and copy down as required.
=SUMPRODUCT(($G$2:$G$6=A3)*($H$2:$H$6>=B3)*($I$2:$I$6>=C3))
That will give you a count of passing all criteria. Its also a function that performs array like calcs. It could be repeated in the next column but in order to reduce dependency on array calculation and potentially speed things up depending on the amount of data to check, place the following in the top of the fail column and copy down as required:
=COUNTIF($G$2:$G$6,A3)-D3
Basically it subtracts the passes from the total count. This assumes you can only have PASS and FAIL as options.

Excel: Give scores based on range, where max = 1 and min = 10

I have following problem:
I want to give scores to a range of numbers from 1-10 for example:
| | A | B |
|---|------|----|
| 1 | 1209 | 1 |
| 2 | 401 | 7 |
| 3 | 123 | 9 |
| 4 | 49 | 10 |
| 5 | 30 | 10 |
(Not sure if B is 100% correct but roughly)
I got the B values with
=ABS(CEILING(A1;MAX($A$1:$A$32)/10)*10/MAX($A$1:$A$32)-11)
It seems to work but if I for example take numbers like
| | A | B |
|---|------|----|
| 1 | 100 | 1 |
| 2 | 90 | 2 |
| 3 | 80 | 3 |
| 4 | 70 | 4 |
| 5 | 50 | 6 |
But I want 50 to be 10.
I would like to have it scalable so I can do it with a 1-10 or 1-100 or 5-27 or whatever scale and with however many numbers in the list and whatever numbers to score from.
Thanks!
Use this formula:
=$E$1 + ROUND((MIN($A:$A)-A1)/((MAX($A:$A)-MIN($A:$A))/($E$1-$E$2)),0)
It is scalable. You put the max and min in E1 and E2.

Resources