Check field positions in Excel

How do I check field positions in Excel? I have to check the record length of an ASCII file and its field positions. I have checked the length, but I am not sure how to check the positions of the fields.
For example, I have:
Account Number  Len  Institution  Len  Cost Center  Len
830226579       9    268          3    8924         4
830168953       9    268          3    8904         4
830255130       9    268          3    8904         4
830065638       9    268          3    8924         4
830065620       9    268          3    8924         4
Thank you.

You can set the field boundaries with the Text Import Wizard: choose the Fixed width option and place the column breaks at the positions you expect.
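If the file can also be checked outside Excel, the field positions can be verified with a short script. The following is a minimal sketch, assuming the three fields from the question are packed back to back with no separators. The `LAYOUT` table, its 0-based start offsets and the `split_fixed_width` helper are hypothetical names made up for this illustration.

```python
# Hypothetical layout: (field name, 0-based start, length).
# Assumes the fields are packed back to back with no separators.
LAYOUT = [
    ("account_number", 0, 9),
    ("institution", 9, 3),
    ("cost_center", 12, 4),
]


def split_fixed_width(line, layout=LAYOUT):
    """Slice one record into named fields, checking the record length first."""
    expected_len = max(start + length for _, start, length in layout)
    if len(line) != expected_len:
        raise ValueError(f"expected {expected_len} characters, got {len(line)}")
    return {name: line[start:start + length] for name, start, length in layout}


# First sample row from the question, written as one 16-character record.
record = split_fixed_width("8302265792688924")
print(record)
```

A record with the wrong length raises `ValueError`, so a shifted or truncated field shows up immediately instead of producing silently misaligned values.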

Related

Python - group by + transform + substring

I'm trying to extract values from a string in a pandas data frame, using two other columns as indices.
My data looks like this:
        Address  second_dot  third_dot
0  1.273.1735.0           5         10
1    1.263.48.0           5          8
2  1.273.1341.0           5         10
3  1.273.1527.0           5         10
4  1.273.1379.0           5         10
5  1.273.1094.0           5         10
6   1.273.845.0           5          9
7  1.273.1393.0           5         10
8   1.275.988.0           5          9
9   1.273.973.0           5          9
In the columns second_dot and third_dot I've stored the positions of the '.' characters within the 'Address' column. What I would like to do is extract, from each row, all characters between the second and third dot.
The result should be like this:
Result
273
263
273
273
273
273
273
273
275
273
I've already managed to do it by using apply on axis 1 with a custom function, but it takes too long (I've got millions of records in my data frame). Since the addresses are repeated across rows, I'm trying to group the calculation by address, hoping to speed it up.
This is my latest attempt, but it does not work:
df.groupby(['Address']).transform(lambda x :
x['Address'].str[x['first_dot']:x['second_dot']])
I get the error: KeyError: ('Address', 'occurred at index MachineIdentifier').
MachineIdentifier is the first column of my df (a normal column, not the index).
Thanks a lot in advance
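Judging by the expected output, the wanted value is the second dot-separated field of each address. A minimal sketch under that assumption follows: it ignores the stored dot positions and uses the vectorized `str.split` accessor, then computes the value once per distinct address and maps it back, which is the idea behind the group-by attempt. The sample frame and the `Result` / `Result_mapped` column names are illustrative only.

```python
import pandas as pd

# Small sample in the shape of the question's data (one repeated address).
df = pd.DataFrame(
    {"Address": ["1.273.1735.0", "1.263.48.0", "1.273.845.0",
                 "1.275.988.0", "1.273.1735.0"]}
)

# Vectorized: split on "." and keep the second field of every row.
df["Result"] = df["Address"].str.split(".").str[1]

# Exploit repetition: parse each distinct address once, then map it back.
unique_addresses = pd.Series(df["Address"].unique())
lookup = dict(zip(unique_addresses, unique_addresses.str.split(".").str[1]))
df["Result_mapped"] = df["Address"].map(lookup)

print(df)
```

Whether the mapping variant is faster than the plain vectorized split depends on how often addresses repeat, so it is worth timing both on the real data.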

generate normalized discrete values for feature engineering

I have a dataframe with one column that stores discrete values, shown as follows. I would like to create another column storing the normalized values. For instance, for 4050, the corresponding entry would be 4. Is there an efficient way to do that instead of writing my own function? Does sklearn have any functions for generating normalized values?
Based on your comment:
there are around 20 different values, and the range is from 1000 to 9999, so I would like to use every 1000 as a category
This isn't really normalization in the strict sense of the word. However, to do that, you can easily use floor division (//):
df['new_column'] = df['values']//1000
For example:
>>> df
   values
0    2021
1    8093
2    9870
3    4508
4    2645
5    1441
6    8888
7    8921
8    7292
9    8571
>>> df['new_column'] = df['values'] // 1000
>>> df
   values  new_column
0    2021           2
1    8093           8
2    9870           9
3    4508           4
4    2645           2
5    1441           1
6    8888           8
7    8921           8
8    7292           7
9    8571           8

Variable string formatting in python 3

The input is a number, e.g. 9, and I want to print the decimal, octal, hex and binary values from 1 to 9, like:
1 1 1 1
2 2 2 10
3 3 3 11
4 4 4 100
5 5 5 101
6 6 6 110
7 7 7 111
8 10 8 1000
9 11 9 1001
How can I achieve this in Python 3 using syntax like:
dm, oc, hx, bn = len(str(9)), len(bin(9)[2:]), ...
print("{:dm%d} {:oc%s}" % (i, oct(i[2:]))
I mean, if the number is 999, I want decimal 10 to be printed right-aligned like ' 10', and since the binary equivalent of 999 is 1111100111, I want binary 10 padded to that width, like '      1010'.
You can use str.format() and its mini-language to do the whole thing for you:
for i in range(1, 10):
    print("{v} {v:>6o} {v:>6x} {v:>6b}".format(v=i))
Which will print:
1      1      1      1
2      2      2     10
3      3      3     11
4      4      4    100
5      5      5    101
6      6      6    110
7      7      7    111
8     10      8   1000
9     11      9   1001
UPDATE: To define field 'widths' in a variable you can use a format-within-format structure:
w = 5  # field width, i.e. offset to the right for all octal/hex/binary values
for i in range(1, 10):
    print("{v} {v:>{w}o} {v:>{w}x} {v:>{w}b}".format(v=i, w=w))
Or define a different width variable for each field type if you want them non-uniformly spaced.
By the way, since you've tagged your question with python-3.x: if you're using Python 3.6 or newer, you can use literal string interpolation (f-strings) to simplify it even more:
w = 5  # field width, i.e. offset to the right for all octal/hex/binary values
for v in range(1, 10):
    print(f"{v} {v:>{w}o} {v:>{w}x} {v:>{w}b}")
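The question asks for widths derived from the largest number rather than a fixed value. A minimal sketch of that idea, assuming each column should be exactly as wide as `n` rendered in that base; the `number_table` helper is a hypothetical name used only for this illustration.

```python
def number_table(n):
    """Return one line per value 1..n with decimal, octal, hex and binary columns."""
    # Each column is as wide as n itself rendered in that base.
    widths = {code: len(format(n, code)) for code in "doxb"}
    lines = []
    for v in range(1, n + 1):
        # Build a spec such as ">4b": right-align, width 4, binary.
        lines.append(
            " ".join(format(v, f">{widths[code]}{code}") for code in "doxb")
        )
    return lines


for line in number_table(9):
    print(line)
```

Because the format spec is built as an ordinary string, the same loop works for any `n` without hard-coding the widths.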

Excel Rank Multiple Columns

I'm facing an issue with ranking in Excel, particularly with regard to tie-breaking. I tried several options, but I guess they don't fit my issue. It's quite simple really; I'll explain:
The Data:
1 2 3 4 5 6 7 8 9 10
87 83 74 95 69 90 73 0 74 85
121 121 96 121 121 121 121 83 121 121
As you can see, it's easy for me to rank the first line (I'm working in columns instead of rows for the data). The RANK function gives the following result:
3 5 6 1 9 2 8 10 6 4
Which is correct.
The problem arises in the second line. There are ties because all of them reach the maximum of 121:
1 1 9 1 1 1 1 10 1 1
What I would like to do is use the first row as a tie-breaker. Even if there is a tie, the first line (which was originally text but is now a sequence from 1 to 10) could serve as a secondary criterion to order the rank, giving the following ranking line:
1 2 9 3 4 5 6 10 7 9
Could one achieve this result?
Thank you very much in advance.
You need a helper row to break the tie. You can add a fraction of the first row to the second row to create a new row, and use the new row to rank:
A4 = A3+(A2/(MAX($A$2:$J$2)+1))
Dividing by MAX + 1 ensures the fraction is less than 1, which is enough to break ties in this case.
A6 = RANK(A4,$A$4:$J$4)
You can hide the helper row if you don't want to show it.
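The arithmetic behind the helper row can be checked outside Excel. The sketch below mirrors the two formulas from the answer, assuming row 2 holds the tie-breaker values and row 3 the values being ranked, and that RANK is used in its default descending order. The variable names are illustrative, not part of the original answer.

```python
tiebreak = [87, 83, 74, 95, 69, 90, 73, 0, 74, 85]            # row 2 in the sheet
primary = [121, 121, 96, 121, 121, 121, 121, 83, 121, 121]    # row 3 in the sheet

# Helper row: primary value plus a fraction that is always below 1,
# so it can reorder ties but never overtake a larger primary value.
scale = max(tiebreak) + 1
helper = [p + t / scale for p, t in zip(primary, tiebreak)]

# Descending rank like Excel's RANK: 1 + number of strictly larger values.
ranks = [1 + sum(other > h for other in helper) for h in helper]
print(ranks)
```

Note that this breaks ties by the size of the row 2 values; to break ties by position instead, the tie-breaker list would need to hold values that decrease from left to right.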

excel formula for same data but highest number

I need a formula for data like this:
A    B  C
234  5
234  4
234  2
255  6
255  3
266  2
266  1
I want to mark column C with 1 for the rows that have the highest number in B among the rows sharing the same number in A. Thank you.
I think this is what you are looking for:
find max min value based on criteria
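The underlying logic is a maximum per group followed by a comparison. A minimal sketch of that logic in plain Python, using the data from the question; it is not an Excel formula, and marking the remaining rows with 0 is an assumption made here for clarity.

```python
# (A, B) pairs from the question.
rows = [(234, 5), (234, 4), (234, 2), (255, 6), (255, 3), (266, 2), (266, 1)]

# Highest B seen for each distinct A.
best = {}
for a, b in rows:
    best[a] = max(b, best.get(a, b))

# Column C: 1 where B equals the maximum of its A group, otherwise 0.
marks = [1 if b == best[a] else 0 for a, b in rows]
print(marks)
```

If two rows in the same group share the highest B, both are marked.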
