Optical differences between characters within a string of equal length - excel

I have a data set with strings of different lengths. They are concatenated into a separate column and padded to equal length via LEN(), TRIM() and REPT().
The formulas I used can be seen in the last row for each column (B:E).
Although the final strings all have the same length, the strings in the "Name with equal length" column do not look identical in width, i.e. they are not optically of the "same" length.
As I want to use this column to build new file names via VBA, I would explicitly like file names that look "optically smooth". (I hope you get what I mean.)
How can I achieve this? Do I have to calculate the pixel widths of the individual (case-sensitive) letters? If so, how can I do this?
Text         Place  Length of String  Needed Spaces                    Name with equal length                   Length of Name
SaMPLE_TEXT  P 1    12                2                                SaMPLE_TEXT--P 1_.pdf                    22
SaMPLE_TexT  P 2    13                1                                SaMPLE_TexT-P 2_.pdf                     22
SaMPLE_text  P 3    13                1                                SaMPLE_text-P 3_.pdf                     22
sample_TEXT  P 4    12                2                                sample_TEXT--P 4_.pdf                    22
SaMPLE_TEXT  P 5    12                2                                SaMPLE_TEXT--P 5_.pdf                    22
                    =LEN(TRIM(B1))    =MAX($D$1:$D$6)-LEN(TRIM(B2))+1  =TRIM(A2)&REPT("-";D2)&TRIM(B2)&"_.pdf"  =LEN(E2)
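For illustration, the padding idea behind the formulas can be sketched outside Excel. This is a minimal Python sketch, not the asker's exact worksheet logic: the function name pad_names and the sample rows are hypothetical, and str.strip() only approximates TRIM() (TRIM also collapses repeated inner spaces). Note that an equal character count only gives equal visual width in a monospaced font; in a proportional font the glyphs differ in width.

```python
def pad_names(rows, filler="-", suffix="_.pdf"):
    """Join (text, place) pairs with filler so every result has the same character count."""
    cleaned = [(text.strip(), place.strip()) for text, place in rows]
    # The longest combined length decides how much filler the shorter rows need;
    # the "+ 1" guarantees at least one filler character in every name.
    longest = max(len(text) + len(place) for text, place in cleaned)
    names = []
    for text, place in cleaned:
        gap = longest - len(text) - len(place) + 1
        names.append(f"{text}{filler * gap}{place}{suffix}")
    return names


# Hypothetical sample rows (not the asker's data)
rows = [("SaMPLE_TEXT", "P 1"), ("sample_text", "P 10"), ("Ab", "P 2")]
names = pad_names(rows)
```

Every entry in names has the same len(), which is all the worksheet formulas can guarantee as well.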

Related

excel concatenate with "²" or squared symbol

I would like to use a CONCATENATE formula in Excel that also includes the superscript number 2 (squared symbol), but this does not work with the following input. The formula is roughly like this:
=CONCATENATE("13";"²")
I want to get an output like this:
13²
I tried applying superscript formatting to the font, but it didn't work when the text is inside the concatenation.
I have to use CONCATENATE because I want to automate this.
Is this possible in Excel 2019, or would it require VBA?
Try
=CONCATENATE("13";CHAR(178))
Using superscript digits via a UDF
The superscript digits 0 to 9 have code points in different Unicode blocks. The following user-defined function (UDF) returns any digit from 0 to 9 as a superscript character, so you don't need to look up and memorize each code value again and again.
The hexadecimal and decimal codes are noted in the comments of the function below:
Function sup(ByVal digit As Long)
    Dim n As Long
    Select Case digit
        Case 0: n = &H2070          ' dec 8304
        Case 1: n = &HB9            ' dec 185
        Case 2: n = &HB2            ' dec 178
        Case 3: n = &HB3            ' dec 179
        Case Else                   ' <super> 4 .. 9
            n = &H2074 + digit - 4  ' dec 8308 .. 8313
    End Select
    sup = ChrW(n)
End Function
Some examples (~~> output x³)
="x" & sup(3)
=CONCAT("x",sup(3))
=CONCAT("x",sup(Sheet1!A2)) ' assuming a digit value of 3 in Sheet1!A2
or within VBA assignments like e.g.
Sub ExampleCall()
    Dim i As Long
    For i = 0 To 9
        Sheet1.Range("A2").Offset(i).Value = "x" & sup(i)
    Next
End Sub
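The same code-point mapping can be sketched in Python for comparison. This is an illustrative port rather than part of the answer; unlike the VBA function, it adds a range check (an assumption on my side) so that values outside 0..9 raise an error instead of returning an unrelated character.

```python
def sup(digit: int) -> str:
    """Return the Unicode superscript character for a single digit 0..9."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be between 0 and 9")
    # Superscript 1, 2 and 3 live in the Latin-1 Supplement block.
    special = {1: 0x00B9, 2: 0x00B2, 3: 0x00B3}
    if digit in special:
        return chr(special[digit])
    # 0 and 4..9 sit at U+2070 and U+2074..U+2079 (Superscripts and Subscripts).
    return chr(0x2070 + digit)


cube = "x" + sup(3)
```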
I also found a way to make this work without using CONCATENATE. It just requires a table that maps digits to superscript characters.
Step 1: Characters Table
Somewhere else in the workbook, build a simple table that maps "Digit" numbers to "Superscript" characters.
In the first column, "Digit", type something like:
-9
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
3
4
5
6
7
8
9
In the second column, "Superscript", go to the Excel ribbon and choose Insert > Symbol. In the tab "Symbol", set the combo box "Font" to "(normal text)" and "Subset" to "Superscripts and Subscripts". Now insert each character into its corresponding cell, and be sure to use "Superscript Minus" for negative numbers. Superscripts 1, 2 and 3 are located in the subset "Latin-1 Supplement".
-9 ⁻⁹
-8 ⁻⁸
-7 ⁻⁷
-6 ⁻⁶
-5 ⁻⁵
-4 ⁻⁴
-3 ⁻³
-2 ⁻²
-1 ⁻¹
0 ⁰
1 ¹
2 ²
3 ³
4 ⁴
5 ⁵
6 ⁶
7 ⁷
8 ⁸
9 ⁹
And so on. Please notice that these are not "formatted" numbers, but special symbols with no numeric value.
Step 2: VLOOKUP Function
Put the number you want displayed normally in a cell, for example A1:
Cell A1: 13
Then put the number you want displayed as a superscript in the next cell:
Cell B1: 3
Then use this formula to create the final output:
Cell C1: =A1&VLOOKUP(B1,Sheet2!$A$1:$B$20,2,FALSE)
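The lookup-table idea can also be sketched in Python, where a translation table plays the role of the two-column sheet. The name with_exponent is hypothetical. One difference from the VLOOKUP version: the sheet only holds the entries -9 to 9, whereas translating character by character also covers multi-digit exponents.

```python
# Lookup table: plain characters -> superscript characters
SUPERSCRIPT = str.maketrans("-0123456789", "⁻⁰¹²³⁴⁵⁶⁷⁸⁹")


def with_exponent(base, exponent: int) -> str:
    """Append the exponent to base, rendered with superscript characters."""
    return f"{base}{str(exponent).translate(SUPERSCRIPT)}"


result = with_exponent(13, 3)
```

As with the worksheet table, the result is plain text with no numeric value.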

regular expression using pandas string match

Input data:
   name                       Age  Zodiac  Grade  City            pahun
0  /extract                   30   Aries   A      Aura            a_b_c
1  /abc/236466/touchbar.html  20   Leo     AB     Somerville      c_d_e
2  Brenda4                    25   Virgo   B      Hendersonville  f_g
3  /abc/256476/mouse.html     18   Libra   AA     Gannon          h_i_j
I am trying to extract rows based on a regex applied to the name column. The regex should match numbers that are exactly 6 digits long.
For example:
/abc/236466/touchbar.html - 236466
Here is the code I have used
df=df[df['name'].str.match(r'\d{6}') == True]
The above line is not matching at all.
Expected:
   name                       Age  Zodiac  Grade  City        pahun
0  /abc/236466/touchbar.html  20   Leo     AB     Somerville  c_d_e
1  /abc/256476/mouse.html     18   Libra   AA     Gannon      h_i_j
Can anyone tell me what I am doing wrong?
str.match only searches for a match at the start of the string.
Use str.contains with a regex like
df=df[df['name'].str.contains(r'/\d{6}/')]
to find entries containing / + 6 digits + /.
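The difference can be reproduced with the standard re module alone. As far as the pandas documentation describes it, str.match is based on re.match and str.contains on re.search, so this minimal sketch uses those two functions directly on the sample names.

```python
import re

names = ["/extract", "/abc/236466/touchbar.html", "Brenda4", "/abc/256476/mouse.html"]

# re.match only succeeds when the pattern matches at position 0 of the string.
anchored = [n for n in names if re.match(r"\d{6}", n)]

# re.search scans the whole string for the first occurrence of the pattern.
anywhere = [n for n in names if re.search(r"/\d{6}/", n)]
```

Here anchored stays empty because no name starts with six digits, while anywhere keeps the two paths.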
Or, to make sure you just match 6 digit chunks and not 7+ digit chunks:
df=df[df['name'].str.contains(r'(?<!\d)\d{6}(?!\d)')]
where
(?<!\d) - makes sure there is no digit on the left
\d{6} - any six digits
(?!\d) - no digit on the right is allowed.
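A small sketch with the standard re module shows what the two lookarounds change. The path containing seven digits is a made-up example, and six_digit_chunks is a hypothetical helper name.

```python
import re

# Exactly six digits, with no digit directly before or after the run
SIX_DIGITS = re.compile(r"(?<!\d)\d{6}(?!\d)")


def six_digit_chunks(text):
    """Return every run of exactly six digits found in text."""
    return SIX_DIGITS.findall(text)


exact = six_digit_chunks("/abc/236466/touchbar.html")
too_long = six_digit_chunks("/abc/1234567/x.html")  # hypothetical 7-digit path
# Without the lookarounds, the first six of the seven digits would still match.
loose = re.search(r"\d{6}", "/abc/1234567/x.html")
```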
You are almost there, use str.contains instead:
df[df['name'].str.contains(r'\d{6,}')]

Compare row with all other previous string in one column and change value of another column in Python

I have a csv file named namelist.csv, it includes:
Index String Size Name
1 AAA123000DDD 10 One
2 AAA123DDDQQQ 20 One
3 AAA123000DDD 25 One
4 AAA123D 20 One
5 ABA 15 One
6 FFFrrrSSSBBB 60 Two
7 FFFrrrSSSBBB 30 Two
8 FFFrrrSS 50 Two
9 AAA12 70 Two
I want to compare the rows in column String within each Name group: if the string in a row matches, or is a substring of, the string in any row above it, then those previous rows should be removed and their Size values added to the Size of the current (substring) row.
Example: take the 3rd row, AAA123000DDD. I compare it with the 1st and 2nd rows and see that it matches the 1st row, so the 1st row is removed and its Size is added to the Size of the 3rd row.
The table will then look like this:
Index String Size Name
2 AAA123DDDQQQ 20 One
3 AAA123000DDD 35 One
4 AAA123D 20 One
...
the final result will be:
Index String Size Name
3 AAA123000DDD 35 One
4 AAA123D 40 One
5 ABA 15 One
8 FFFrrrSS 140 Two
9 AAA12 70 Two
I thought of using pandas groupby to group by the Name column, but I don't know how to apply the comparison on the String column and the sum on the Size column.
I am new to Python, so I would very much appreciate any help.
Assuming each String value belongs to a single Name, here's how you would do the aggregation. I kept Name so that it also shows in the final DataFrame.
df_group = df.groupby(['String', 'Name'])['Size'].sum().reset_index()
Edit:
To match the substrings (assuming, as the example above suggests, that a substring will not match multiple longer strings), you can build a mapping from each substring to its full string and then group by the full-string column as before:
all_strings = sorted(set(df['String']))
substring_dict = dict()
for row in df.itertuples():
    for item in all_strings:
        # Every string contains itself, so keep the longest string that contains it
        if row.String in item and len(item) > len(substring_dict.get(row.String, '')):
            substring_dict[row.String] = item

def match_substring(x):
    return substring_dict[x]

df['full_strings'] = df.String.apply(match_substring)
df_group = df.groupby(['full_strings', 'Name'])['Size'].sum().reset_index()
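For comparison, the rule as worded in the question (a later row absorbs every earlier row of the same Name whose String contains it) can be sketched in plain Python without pandas. This is my reading of the question, not the answer's method, and the function name collapse is hypothetical.

```python
def collapse(rows):
    """rows: list of (index, string, size, name) tuples in file order."""
    kept = {}  # name -> rows of that group that are still alive
    for index, string, size, name in rows:
        alive = kept.setdefault(name, [])
        # Earlier rows whose string equals or contains the current string
        absorbed = [r for r in alive if string in r[1]]
        total = size + sum(r[2] for r in absorbed)
        alive[:] = [r for r in alive if string not in r[1]]
        alive.append([index, string, total])
    return [(i, s, z, name) for name, group in kept.items() for i, s, z in group]


rows = [
    (1, "AAA123000DDD", 10, "One"), (2, "AAA123DDDQQQ", 20, "One"),
    (3, "AAA123000DDD", 25, "One"), (4, "AAA123D", 20, "One"),
    (5, "ABA", 15, "One"), (6, "FFFrrrSSSBBB", 60, "Two"),
    (7, "FFFrrrSSSBBB", 30, "Two"), (8, "FFFrrrSS", 50, "Two"),
    (9, "AAA12", 70, "Two"),
]
result = collapse(rows)
```

On the sample data this reproduces the final table from the question (rows 3, 4, 5, 8 and 9).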

How to match two sets of data by dates which do not synchronise and include missing values in Excel

Please forgive any errors or shortcomings in this question, it's my first on stackoverflow.
I have two sets of data in Excel of differing lengths and frequency, and would like to be able to place a value of 0 for where they don't synchronise, and match the rest.
For example, dataset 1 could be:
Date Set1
01-01-2010 10
01-03-2010 4
01-04-2010 8
01-05-2010 5
01-06-2010 10
01-09-2010 12
01-10-2010 9
01-11-2010 4
And dataset 2 could be:
Date Set2
01-03-2010 102
01-06-2010 104
01-10-2010 102
I'm looking for an output table that displays the values alongside each other for dates matching, 0 otherwise, like so:
Date Set1 Set2
01-01-2010 10 0
01-03-2010 4 102
01-04-2010 8 0
01-05-2010 5 0
01-06-2010 10 104
01-09-2010 12 0
01-10-2010 9 102
01-11-2010 4 0
I can't seem to be able to crack this with my limited knowledge and the lack of synchronisation in the data. Any help would be much appreciated, thanks.
You can do this using a VLOOKUP nested in an IFERROR statement.
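The two formulas used (and dragged down to the last unique date row) are shown below. They assume Set1 in columns A:B, Set2 in columns D:E and the list of unique dates in column G:
H3 = IFERROR(VLOOKUP(G3,A:B,2,0),0)
I3 = IFERROR(VLOOKUP(G3,D:E,2,0),0)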
This will not work if you have duplicate dates in the same data set with varying values since VLOOKUP will always return the first matched value (reading top down).
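The lookup-with-default pattern behind IFERROR(VLOOKUP(...),0) can be sketched in Python with dictionaries. This is only an illustration outside Excel; the dates are kept as plain text because the question does not say whether they are day-first or month-first.

```python
set1 = {
    "01-01-2010": 10, "01-03-2010": 4, "01-04-2010": 8, "01-05-2010": 5,
    "01-06-2010": 10, "01-09-2010": 12, "01-10-2010": 9, "01-11-2010": 4,
}
set2 = {"01-03-2010": 102, "01-06-2010": 104, "01-10-2010": 102}

# Unique dates from both sets, in first-seen order
dates = list(dict.fromkeys([*set1, *set2]))

# dict.get(key, 0) returns 0 when the date is missing, like IFERROR(..., 0)
merged = [(d, set1.get(d, 0), set2.get(d, 0)) for d in dates]
```

A dictionary holds one value per key, so duplicate dates within one data set would need separate handling here as well.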
Place Set1 in A1:B9 (header in row 1). Add a column of zeros next to it in column C, so A2:A9 is dates, B2:B9 is values and C2:C9 is zeros.
Place Set2 (without the header) in A10:B12; move the Set2 data to column C and put zeros in column B, so A10:A12 is dates, B10:B12 is zeros, C10:C12 is values.
Sort the range A2:C12 by Date (column A).
Easier to show with a screenshot but newbies are not allowed to post images.

Find duplicates and count numbers at the same time

I have rows of data that contains numbers from 1 to 15, however these numbers can be in any sequence. For example here:
3 2 1 12 13 5 6 7 9 15 10 8 4 15 11
I know from a visual count these numbers above are all correct; as there are no duplicates, and all the numbers have values from 1 to 15. An example of a row of data I found to be wrong:
3 2 1 12 12 5 6 7 9 15 10 8 4 15
You can see this line has duplicated numbers i.e. 12, and number 11 is missing, so this row only has 14 elements in all.
However, I have many rows of data and it is impossible to visually check each row. I need to ensure in each row: there are 15 elements; there are no duplicates, and that the row contains values from 1 to 15 and find which rows are faulty to check these against the original paper data.
Is there a command or function that I can use in Excel to make this process easier?
You could find a set of conditions, each of which is true for rows that contain exactly those 15 numbers in any order and then test several of them. For example, if the row is in A5:O5:
=AND(COUNT(A5:O5)=15,SUM(A5:O5)=120,MIN(A5:O5)=1,MAX(A5:O5)=15,
AVERAGE(A5:O5)=8,ROUND(STDEV(A5:O5),3)=4.472)
This will show TRUE for a row that contains the integers 1 to 15 in any order, and is very unlikely (it could very well be impossible - I haven't checked) to show TRUE for a row that contains any different set of integers.
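The claim can be checked with a short Python sketch that mirrors the six conditions (statistics.stdev is the sample standard deviation, like STDEV). Both rows are my own examples, not the asker's data. The second row swaps 2, 6, 7 for 3, 4, 8, which keeps the sum and the sum of squares unchanged, so it seems the aggregate test can be passed by a row with duplicates; an exact comparison avoids that.

```python
from statistics import mean, stdev


def passes_aggregate_checks(row):
    """The same six conditions as the AND(...) formula."""
    return (len(row) == 15 and sum(row) == 120 and min(row) == 1
            and max(row) == 15 and mean(row) == 8
            and round(stdev(row), 3) == 4.472)


def is_exact_permutation(row):
    """True only if the row holds each of 1..15 exactly once."""
    return sorted(row) == list(range(1, 16))


good = [3, 2, 1, 12, 13, 5, 6, 7, 9, 14, 10, 8, 4, 15, 11]
# Hypothetical row with duplicates: 2, 6, 7 replaced by 3, 4, 8
tricky = [1, 3, 3, 4, 4, 5, 8, 8, 9, 10, 11, 12, 13, 14, 15]
```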
I'm pretty sure that if 15 positive integers less than 16 add up to 120 without all being different, there must be duplication, so:
Check there are 15 numbers
Check their total is 120
Check the maximum is 15
Check not negative (nor zero):
=IF(OR(COUNT(A5:O5)<>15,SUM(A5:O5)<>120,MAX(A5:O5)>15,MIN(A5:O5)<1),"Error","Plausible")
then check for duplication with Conditional Formatting using a rule such as:
=COUNTIF($A5:$O5,A5)>1
and a distinctive format. Filter to select "Plausible"; anything that then shows the distinctive format is non-compliant.
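Outside Excel, the same two-stage idea (plausibility first, then duplicates) can be sketched in Python so that each faulty row also reports what is wrong with it. The function name diagnose and the sample row are hypothetical.

```python
from collections import Counter


def diagnose(row, n=15):
    """Report count, duplicated values, missing values and out-of-range values."""
    counts = Counter(row)
    return {
        "count": len(row),
        "duplicates": sorted(v for v, c in counts.items() if c > 1),
        "missing": [v for v in range(1, n + 1) if v not in counts],
        "out_of_range": sorted(v for v in counts if not 1 <= v <= n),
    }


# Hypothetical faulty row: 12 appears twice and 11 is missing
report = diagnose([3, 2, 1, 12, 12, 5, 6, 7, 9, 15, 10, 8, 4, 13, 14])
```

A row is compliant when count equals 15 and the three lists are empty.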
