How do I convert types in J?
For example, how do I convert an array of strings like "4" "78" "0" "_1" to an array of numbers like 4 78 0 _1?
Note that what you call a string is, in fact, a byte list, so a string array is just a byte array of higher dimension.
There is a J primitive that interprets a byte array as numbers: Numbers (dyadic ".). It is dyadic because you must also supply a default value as the left argument, which is used when a field cannot be interpreted as a number or when some padding has to be done.
The usage is very simple: __".'2 -3 4e 5.6 _ .7' gives 2 _3 __ 5.6 _ 0.7 (see the documentation). As per the note, this generalizes to higher dimension arrays:
__".'2 -3 4e 5.6 _ .7',:'1 7 9 2 4 1'
2 _3 __ 5.6 _ 0.7
1 7 9 2 4 1
Given a dataset as follows:
id vector_name
0 1 01,02,03,04
1 2 001,002,003
2 3 01,02,03
3 4 A, B, C
4 5 s01, s02, s02
5 6 E2702-2703,E2702-2703
6 7 03,05,06
7 8 05-08,09,10-12, 05-08
How could I write a regex to flag the rows in column vector_name that are not composed of two-digit values? The correct format is 01, 02, 03, etc. Otherwise, it should return invalid vector name for a check column.
The expected result will be like this:
id vector_name
0 1 01,02,03,04
1 2 invalid vector name
2 3 01,02,03
3 4 invalid vector name
4 5 invalid vector name
5 6 invalid vector name
6 7 03,05,06
7 8 05-08,09,10-12, 05-08
The pattern I used is (\d+)(,\s*\d+)*, but it considers 001,002,003 valid.
How could I do that? Thanks.
You can use
^\d{2}(?:-\d{2})?(?:,\s*\d{2}(?:-\d{2})?)*\Z
See the regex demo. Details
^ - start of string
\d{2} - two digits
(?:-\d{2})? - an optional sequence of - and two digits
(?:,\s*\d{2}(?:-\d{2})?)* - zero or more repetitions of:
    , - a comma
    \s* - zero or more whitespace characters
    \d{2}(?:-\d{2})? - two digits and an optional sequence of - and two digits
\Z - the very end of the string.
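The pattern can also be tried without pandas. The following is a minimal sketch using only the standard re module; the helper name is_valid_vector_name and the sample list are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as above: NN or NN-NN items separated by commas,
# with optional whitespace after each comma.
PATTERN = re.compile(r"^\d{2}(?:-\d{2})?(?:,\s*\d{2}(?:-\d{2})?)*\Z")


def is_valid_vector_name(name: str) -> bool:
    """Return True when the whole string matches the two-digit format."""
    return PATTERN.match(name) is not None


samples = ["01,02,03,04", "001,002,003", "A, B, C", "05-08,09,10-12, 05-08"]
for s in samples:
    print(f"{s!r}: {is_valid_vector_name(s)}")
# '01,02,03,04': True
# '001,002,003': False
# 'A, B, C': False
# '05-08,09,10-12, 05-08': True
```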
Python Pandas test:
import pandas as pd
df = pd.DataFrame({
'id':[1,2,3,4,5,6,7,8],
'vector_name':
[
'01,02,03,04',
'1002003',
'01,02,03',
'A, B, C',
's01, s02, s02',
'E2702-2703,E2702-2703',
'03,05,06',
'05-08,09,10-12, 05-08'
]
})
pattern = r'^\d{2}(?:-\d{2})?(?:,\s*\d{2}(?:-\d{2})?)*\Z'
df.loc[~df['vector_name'].str.contains(pattern), "check"] = "invalid vector name"
>>> df
id vector_name check
0 1 01,02,03,04 NaN
1 2 1002003 invalid vector name
2 3 01,02,03 NaN
3 4 A, B, C invalid vector name
4 5 s01, s02, s02 invalid vector name
5 6 E2702-2703,E2702-2703 invalid vector name
6 7 03,05,06 NaN
7 8 05-08,09,10-12, 05-08 NaN
I'm trying to convert a column in my dataframe. The column is named 'Weight' and its values are in str format,
e.g. "175lbs".
I'm using a mapper to convert all these values to float.
I tried a lambda function with the mapper:
def WeightConverter(w):
    return float(w[:len(w)-3])
df1['Weight'] = df1.Weight.map(lambda x : WeightConverter(x))
df1['Weight'].head()
However, type(df1['Weight'][0]) returns str. Expected result: 175.0 of type float
Try using regex to extract your numbers first, then you can do any further operations.
import numpy as np
import pandas as pd
df = pd.DataFrame({'Weight' : np.random.randint(0,250,size=500)})
df['Weight'] = df['Weight'].astype(str) + 'lbs'
print(df.head(10))
Weight
0 224lbs
1 11lbs
2 218lbs
3 132lbs
4 55lbs
5 87lbs
6 62lbs
7 4lbs
8 38lbs
9 218lbs
Then use a pattern such as (\d+) with str.extract; the pattern below also accepts an optional decimal part, (?:\.\d+)?:
df['Weight_Float'] = df['Weight'].str.extract(r'(\d+(?:\.\d+)?)').astype(float)
print(df.head(10))
Weight Weight_Float
0 119lbs 119.0
1 7lbs 7.0
2 241lbs 241.0
3 85lbs 85.0
4 144lbs 144.0
5 219lbs 219.0
6 160lbs 160.0
7 173lbs 173.0
8 166lbs 166.0
9 35lbs 35.0
Explanation:
( start a capture group
\d a shorthand character class, which matches any digit; for ASCII
text it is the same as [0-9]
+ one or more of the expression
) end a capture group
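To see what the capture group does on a single value, here is a small sketch with the standard re module instead of pandas. The function name weight_to_float is hypothetical, and returning None for text without a number is an assumption about how missing values should be handled.

```python
import re

# Digits, optionally followed by a decimal part, as in the str.extract call.
NUMBER = re.compile(r"(\d+(?:\.\d+)?)")


def weight_to_float(text):
    """Return the first number found in text as a float, or None."""
    match = NUMBER.search(text)
    return float(match.group(1)) if match else None


print(weight_to_float("175lbs"))   # 175.0
print(weight_to_float("62.5lbs"))  # 62.5
print(weight_to_float("n/a"))      # None
```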
As far as I know, it is impossible to perform array operations on numbers in J; e.g.
NB. In J, this won't work:
m =: 234
+/ m
9
*/ m
24
Since I can't do it directly, is there a way to split a number into a list and back again, like this?:
splitFunction 234
2 3 4
+/ (splitFunction 234)
9
|. (splitFunction 234)
4 3 2
concatenateFunction (4 3 2)
432
If it is not possible, is there a way to turn a number into a string, and back again? (since J treats strings as character arrays) e.g.
|. (toString 234)
432
Well, there is a little bit to unpack here in what your expectations would be. Let's start with
m=:234 NB. m is the integer 234
+/ m NB. +/ sums across the items - only item is 234
234
*/ m NB. */ product across the items - only item is 234
234
So there seems to be confusion between the digits of the integer 234, which would be 2 3 4, and the fact that 234 is an atom: it has only one item, whose value is 234.
Moving on from that, you can deconstruct your integer using 10 & #. ^: _1, which is the inverse (^:_1) of Base (#.) with a left argument of 10, so the breakup is done in base 10. J's way of inverting a verb is to apply the Power conjunction (^:) with a right argument of negative one (_1).
splitFunction =: 10 & #.^:_1
concatenateFunction =: 10 & #.
splitFunction 234
2 3 4
+/ splitFunction 234
9
*/ splitFunction 234
24
|. splitFunction 234
4 3 2
concatenateFunction 2 3 4
234
concatenateFunction splitFunction 234
234
I think that this will do what you want, but you may want to spend a bit more time thinking about what you would have expected +/ 234 to do and whether this would be useful behaviour.
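For readers who think more easily in Python, the base-10 split and join can be sketched roughly as follows. This is only an analogy for non-negative integers, not a description of how J implements Base or its inverse; the names split_digits and join_digits are made up for illustration.

```python
def split_digits(n, base=10):
    """Split a non-negative integer into its digits, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, d = divmod(n, base)  # peel off the least significant digit
        digits.append(d)
    return digits[::-1]


def join_digits(digits, base=10):
    """Combine a list of digits back into a single integer."""
    total = 0
    for d in digits:
        total = total * base + d
    return total


print(split_digits(234))                     # [2, 3, 4]
print(sum(split_digits(234)))                # 9
print(join_digits(split_digits(234)[::-1]))  # 432
```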
Is it possible to align the spaces and characters of two strings perfectly?
I have two functions, resulting in two strings.
One just adds a " " between a list of digits:
digits = 34567
new_digits = 3 4 5 6 7
The second function takes the string and prints out the index of the string, such that:
digits = 34567
index_of_digits = 1 2 3 4 5
Now the issue that I am having is when the length of the string is greater than 10, the alignment is off:
I am supposed to get something like this:
Please advise.
If your digits are in a list, you can use format to space them uniformly:
L = [3,4,2,5,6,3,6,2,5,1,4,1]
print(''.join([format(n,'3') for n in range(1,len(L)+1)]))
print(''.join([format(n,'3') for n in L]))
Or with f-string formatting (Python 3.6+):
L = [3,4,2,5,6,3,6,2,5,1,4,1]
print(''.join([f'{n+1:3}' for n in range(len(L))]))
print(''.join([f'{n:3}' for n in L]))
Output:
  1  2  3  4  5  6  7  8  9 10 11 12
  3  4  2  5  6  3  6  2  5  1  4  1
Ref: join, format, range, list comprehensions
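The width of 3 above is hard-coded. As a hedged variation, the width could be derived from the widest value or index so that the rows stay aligned for longer lists; the helper name aligned_rows is an assumption for this sketch.

```python
def aligned_rows(values):
    """Return (index_row, value_row) padded to a common column width."""
    # Widest thing to print is either a value or the largest index.
    widest = max(len(str(v)) for v in list(values) + [len(values)])
    width = widest + 1  # one extra column as a separator
    index_row = ''.join(f'{i:{width}}' for i in range(1, len(values) + 1))
    value_row = ''.join(f'{v:{width}}' for v in values)
    return index_row, value_row


L = [3, 4, 2, 5, 6, 3, 6, 2, 5, 1, 4, 1]
index_row, value_row = aligned_rows(L)
print(index_row)
print(value_row)
```

Both rows use the same width, so each index sits directly above its digit regardless of how many items the list has.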
The rather verbose fork I came up with for removing the item at a given index is
({. , (>:@[ }. ]))
E.g.,
3 ({. , (>:@[ }. ])) 0 1 2 3 4 5
0 1 2 4 5
Works great, but is there a more idiomatic way? What is the usual way to do this in J?
Yes, the J-way is to use a 3-level boxing:
(<<<5) { i.10
0 1 2 3 4 6 7 8 9
(<<<1 3) { i.10
0 2 4 5 6 7 8 9
It's a small note in the dictionary for {:
Note that the result in the very last dyadic example, that is, (<<<_1){m , is all except the last item.
and a bit more in Learning J: Chapter 6 - Indexing: 6.2.5 Excluding Things.
Another approach is to use the monadic and dyadic forms of # (Tally and Copy). This idiom of using Copy to remove an item is something that I use frequently.
The hook (~: i.@#) uses Tally (monadic #), Integers (monadic i.) and Not-Equal (dyadic ~:) to generate the Boolean filter list:
2 (~: i.@#) 'abcde'
1 1 0 1 1
which Copy (dyadic #) uses to omit the appropriate item.
2 ((~: i.@#) # ]) 0 1 2 3 4 5
0 1 3 4 5
2 ((~: i.@#) # ]) 'abcde'
abde