data =: '"MARY","PATRICIA","LINDA","BARBARA","ELIZABETH"'
];._1 data
MARY
,
PATRICIA
,
LINDA
,
BARBARA
,
ELIZABETH
(one more blank line here)
So I just want to skip every second line:
]`(0&#);._1 data
MARY
PATRICIA
LINDA
BARBARA
ELIZABETH
(one more blank line here)
Doesn't work as I thought.
I run into this situation quite often, for example when using code like this:
]`(Do nothing. Just skip. Just SKIP!!!)@.(some condition)
You could do it this way: remove the last item (the trailing blank row), append a ,',' (it is important that this is a list and not an atom), and then use Cut ;. again, this time with the fret taken from the end value.
( ];._2) @: ((,','),~ }:) @: (];._1) data
MARY
PATRICIA
LINDA
BARBARA
ELIZABETH
The blank lines are not really blanks but an artifact of the shape.
$(];._2) @: ((,','),~ }:) @: (];._1) data
5 1 9
Insert Append ,/ cleans this up:
,/ @: (];._2) @: ((,','),~ }:) @: (];._1) data
MARY
PATRICIA
LINDA
BARBARA
ELIZABETH
$,/ @: (];._2) @: ((,','),~ }:) @: (];._1) data
5 9
This doesn't really solve your "do nothing" issue with skipping, but there is some irony in wanting to do nothing and wanting to SKIP at the same time! Skipping would be doing something, wouldn't it? That suggests another approach, such as Copy #, may be better than Agenda @. here, but one would have to know the specific case.
If you want to skip every second element of a list, you can use (_2 {.\ ])
In other words: break the list up into pairs and get the first element of each pair.
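For readers less familiar with J, the pairing idea can be sketched in Python. This only illustrates the description above and is not a translation of how J evaluates the phrase; the helper name first_of_each_pair and the pieces list (standing in for the ten pieces produced by cutting data at each double quote) are assumptions made for the example.

```python
def first_of_each_pair(items):
    """Split items into consecutive pairs and keep the first item of each."""
    pairs = [items[i:i + 2] for i in range(0, len(items), 2)]
    # A trailing unpaired item forms a short final "pair" and is kept.
    return [pair[0] for pair in pairs]


# Hypothetical stand-in for the result of cutting `data` at each quote.
pieces = ["MARY", ",", "PATRICIA", ",", "LINDA", ",",
          "BARBARA", ",", "ELIZABETH", ""]

names = first_of_each_pair(pieces)
print(names)
```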
A couple of options to filter out every 2nd element of a list:
mylist=: ];._1 data
(#~ 2 -.@| i.@#) mylist NB. retain even indexed items
MARY
PATRICIA
LINDA
BARBARA
ELIZABETH
(_2 {.\ ]) mylist NB. get first item of pairs
MARY
PATRICIA
LINDA
BARBARA
ELIZABETH
A couple of options to parse this data:
> '","' splitstring }. }: data
MARY
PATRICIA
LINDA
BARBARA
ELIZABETH
require 'csv'
> (',' ; '"') fixdsv data,LF
MARY
PATRICIA
LINDA
BARBARA
ELIZABETH
Related
I have an Excel sheet with a record containing a travel request, and I need to expand it so I can see all the combinations I need to book.
The original record entry looks like this:
ID   Family  Father  Mother  Children   Destinations
KT1  Smith   John    Joan    John,Mary  London,New York
and I need the final result to look like this:
ID   Family  Father  Mother  Children  Destinations
KT1  Smith   John    Joan    John      London
KT1  Smith   John    Joan    Mary      London
KT1  Smith   John    Joan    John      New York
KT1  Smith   John    Joan    Mary      New York
(There may be multiple entries under Children and Destinations, and possibly under other fields that would need the same treatment.)
I am really unsure how to do this and would love some advice.
Use Power Query.
If you're not familiar with it, a quick search on its usage and where to find it will produce plenty of results, for example:
https://www.howtoexcel.org/power-query/the-complete-guide-to-power-query/
https://support.microsoft.com/en-us/office/create-load-or-edit-a-query-in-excel-power-query-ca69e0f0-3db1-4493-900c-6279bef08df4
From there, you can transform your data.
That will achieve your outcome with little to no fuss.
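If the data ever leaves Excel, the expansion being asked for is a Cartesian product of the multi-valued fields. A minimal Python sketch, assuming comma-separated cells; the function name expand and the dictionary layout are made up for illustration and are not part of Power Query:

```python
from itertools import product


def expand(record, multi_fields, sep=","):
    """Return one row per combination of the values in multi_fields."""
    # Split each multi-valued cell into its individual values.
    choices = [[v.strip() for v in record[f].split(sep)] for f in multi_fields]
    rows = []
    for combo in product(*choices):
        row = dict(record)                    # copy the single-valued fields
        row.update(zip(multi_fields, combo))  # overwrite the multi-valued ones
        rows.append(row)
    return rows


record = {
    "ID": "KT1", "Family": "Smith", "Father": "John", "Mother": "Joan",
    "Children": "John,Mary", "Destinations": "London,New York",
}

# Listing Destinations first makes it vary slowest, as in the desired result.
rows = expand(record, ["Destinations", "Children"])
for row in rows:
    print(*row.values(), sep="\t")
```

Any further multi-valued field can be added to the list passed as multi_fields; the number of rows is the product of the value counts.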
I have a pandas dataframe with a name column as below
name
Dr. Maso Guilani
Paul Dupey
Mrs. Sarah Kant
Cathay Pane
Canine Paul
I want to remove prefixes such as "Dr." and "Mrs." from the "name" column.
I tried the following:
df['name']=df.name.replace({"Mrs.": ""},regex=True).replace({"Dr.": ""},regex=True)
But I want to generalize this, as I am not sure how many prefixes like "Dr." and "Mrs." occur in the huge dataset. Basically, I want to remove every prefix that ends with a dot. Thanks.
Expected output:
name
Maso Guilani
Paul Dupey
Sarah Kant
Cathay Pane
Canine Paul
With the samples shown, you can use the pandas str.replace function. The regex lazily matches everything from the start of the value up to the first dot followed by one or more whitespace characters, and that match is replaced with an empty string in the name column. Pass regex=True explicitly, because pandas 2.0 and later treat the pattern as a literal string by default.
df['name'] = df['name'].str.replace(r'^.*?\.\s+', '', regex=True)
Output will be as follows.
Maso Guilani
Paul Dupey
Sarah Kant
Cathay Pane
Canine Paul
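To see what the pattern does without needing a DataFrame, here is a small sketch using Python's standard re module with the same regex. The stricter variant and the helper name strip_prefix are assumptions added for illustration; they are not part of the answer above. The stricter pattern may be worth considering if names can contain a middle initial, since the lazy pattern removes everything up to the first dot.

```python
import re

# Pattern from the answer above: lazily match from the start of the string
# up to the first dot that is followed by whitespace.
PREFIX = re.compile(r'^.*?\.\s+')

# Stricter variant (an assumption, not from the answer): only a single
# word directly followed by a dot counts as a title.
STRICT_PREFIX = re.compile(r'^\w+\.\s+')


def strip_prefix(name, pattern=PREFIX):
    """Remove a leading title such as 'Dr. ' if the pattern matches."""
    return pattern.sub('', name)


names = ["Dr. Maso Guilani", "Paul Dupey", "Mrs. Sarah Kant",
         "Cathay Pane", "Canine Paul"]
cleaned = [strip_prefix(n) for n in names]
print(cleaned)
```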
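Another way of doing this is with the split() and apply() methods. Split on the first dot only (n=1), and strip the space left in front of the remaining name:
df['name'] = df['name'].str.split('.', n=1).apply(lambda x: x[1].strip() if len(x) > 1 else x[0])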
Output of df:
0 Maso Guilani
1 Paul Dupey
2 Sarah Kant
3 Cathay Pane
4 Canine Paul
I want to apply the following transformations to the values:
The 'Name' column should show only the titles (for example: Miss, Mr).
The 'Cabin' column should contain only the first letter (for example: 'C' instead of the whole 'C54').
Finally, please help me with a general solution for similar problems. Thank you. (This was in a Jupyter notebook and I didn't know how to present the code properly.)
categoric.head()
output:
   Name                                               Cabin
0  Braund, Mr. Owen Harris                            A23
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  C85
2  Heikkinen, Miss. Laina                             C54
3  Futrelle, Mrs. Jacques Heath (Lily May Peel)       C123
4  Allen, Mr. William Henry                           B231
pandas has an entire set of methods related to String Handling for Series.
For the cabins, slice off the first letter:
categoric.Cabin.str[0]
#0 A
#1 C
#2 C
#3 C
#4 B
To get the titles, you can use .str.extract with a capturing group that lists the different values separated by a vertical bar. Since . has a special meaning in patterns, you need to escape it by preceding it with \ (a raw string keeps the backslashes intact):
categoric.Name.str.extract(r'(Mr\.|Mrs\.|Miss\.)')
# 0
#0 Mr.
#1 Mrs.
#2 Miss.
#3 Mrs.
#4 Mr.
categoric.Name= categoric.Name.apply(lambda x: x.split(', ')[1].split('.')[0])
categoric.Cabin = categoric.Cabin.str.slice(0, 1)
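On the request for a more general solution: instead of listing every title, one option is to capture whatever sits between the comma and the first dot after it. A sketch using Python's re module, assuming every name follows the "Surname, Title. Given names" layout shown above; extract_title and cabin_letter are hypothetical helper names:

```python
import re

# Capture the text between the first comma and the dot that follows it.
TITLE = re.compile(r',\s*([^.,]+)\.')


def extract_title(name):
    """Return the title without its dot, or None if the layout differs."""
    match = TITLE.search(name)
    return match.group(1) if match else None


def cabin_letter(cabin):
    """Return the first character of the cabin code ('' for an empty code)."""
    return cabin[:1]


sample = [
    ("Braund, Mr. Owen Harris", "A23"),
    ("Cumings, Mrs. John Bradley (Florence Briggs Th...", "C85"),
    ("Heikkinen, Miss. Laina", "C54"),
]
result = [(extract_title(name), cabin_letter(cabin)) for name, cabin in sample]
print(result)
```

The same pattern should also work when passed to categoric.Name.str.extract, which returns the captured group as a column.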
I've had some luck modifying formulas I've found on this site to separate names in a spreadsheet but I need some help. Can anyone suggest the best way to achieve my goal?
I have a "Tenant" column where each row contains from 1 to 5 names, separated by commas and "&".
My data is pretty consistent, so there is no need for error routines. It looks like these examples, with the maximum number of names being 5:
John Doe, Mary Smith, Rachel Reyes & Ben Thompson
or
John Doe & Mary Smith
or
John Doe, Mary Smith & Rachel Reyes
What I really want to do is separate each name into its own column and then put each first name into another column. I would have a total of 5 columns for full names and 5 more for first names, for up to 5 names, if that makes sense.
So for this data: John Doe, Mary Smith, Rachel Reyes & Ben Thompson
Column:
|A|B|C|D|E|F|G|H|I|J|
John Doe|John|Mary Smith|Mary|Rachel Reyes|Rachel|Ben Thompson|Ben|
Any help is appreciated.
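This is not a spreadsheet formula, but the transformation described above can be sketched in Python to make the target layout concrete. It assumes names are separated only by commas and "&" and that the first word of each name is the first name; split_tenants is a hypothetical helper, and the interleaved column order follows the example row in the question.

```python
import re


def split_tenants(cell, max_names=5):
    """Return [full name, first name, ...] padded to 2 * max_names columns."""
    # Split on commas and ampersands, swallowing the surrounding spaces.
    names = [n for n in re.split(r'\s*[,&]\s*', cell.strip()) if n]
    row = []
    for name in names[:max_names]:
        row.extend([name, name.split()[0]])
    # Pad with empty cells so every row has the same number of columns.
    row.extend([''] * (2 * max_names - len(row)))
    return row


row = split_tenants("John Doe, Mary Smith, Rachel Reyes & Ben Thompson")
print(row)
```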
I have 3 columns
a b c
jon ben 2
ben jon 2
roy jack 1
jack roy 1
I'm trying to retrieve all unique combinations, i.e. ben and jon is the same as jon and ben, so the pair should appear only once. Expected output:
a b c
jon ben 2
roy jack 1
Any ideas for a function that could do this? The order of the output does not matter. I've tried concatenating and then removing duplicates, but obviously this only considers the string order.
I created a fourth column by joining all three columns together with =a1&","&b1&","&c1 and used Excel's built-in Remove Duplicates function. This doesn't work because the order of the strings differs.
In your fourth column, use the formula
=IF(A1<B1,A1&","&B1&","&C1,B1&","&A1&","&C1)
This joins A and B in alphabetical order, so you can then remove duplicates as you did before.
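The idea behind the formula is to build a key that does not depend on the order of the two names. The same idea can be sketched in Python, assuming the rows are available as (a, b, c) tuples; unique_pairs is a hypothetical helper name:

```python
def unique_pairs(rows):
    """Keep the first row seen for each unordered (a, b) pair and value c."""
    seen = set()
    kept = []
    for a, b, c in rows:
        # Order the two names so that (jon, ben) and (ben, jon) share a key.
        key = (min(a, b), max(a, b), c)
        if key not in seen:
            seen.add(key)
            kept.append((a, b, c))
    return kept


rows = [("jon", "ben", 2), ("ben", "jon", 2),
        ("roy", "jack", 1), ("jack", "roy", 1)]
deduplicated = unique_pairs(rows)
print(deduplicated)
```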