Replace the Nth occurrence of a character on each line in Vim

I have the following data and I want to remove the 3rd occurrence of the | symbol on each line.
ABC | DEF | GHI | XYZ | 123
ABC | DEF | GHI | XYZ | 123
ABC | DEF | GHI | XYZ | 123
Final output should be:
ABC | DEF | GHI XYZ | 123
ABC | DEF | GHI XYZ | 123
ABC | DEF | GHI XYZ | 123

You can run the following (type a single space after the final r; that space is the replacement character):
:%norm 3f|r
This means:
:%norm on every line, run the following normal commands
3f| move cursor to the 3rd occurrence of |
r replace it with a space
You could of course do:
:%norm 3f|x
To delete the | completely.
Another way would be to use visual block mode (see :help visual-block),
although this only works if all the | characters are lined up (i.e. in the
same column).
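To check the effect of 3f|x outside Vim, here is a minimal Python sketch of the same idea: locate the third | on each line and delete it. The helper name drop_nth and the sample list are illustrative assumptions, not part of the answer above.

```python
def drop_nth(line: str, char: str, n: int) -> str:
    """Return line with the n-th occurrence (1-based) of char removed."""
    pos = -1
    for _ in range(n):
        pos = line.find(char, pos + 1)
        if pos == -1:
            return line  # fewer than n occurrences: leave the line unchanged
    return line[:pos] + line[pos + 1:]


lines = ["ABC | DEF | GHI | XYZ | 123"] * 3
result = [drop_nth(line, "|", 3) for line in lines]
print("\n".join(result))
```

Deleting only the | leaves both surrounding spaces in place, so two spaces remain between GHI and XYZ, just as with 3f|x in Vim.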

Related

Removing unwanted characters in python pandas

I have a pandas dataframe column like below :
| ColumnA |
+-------------+
| ABCD(!) |
| <DEFG>(23) |
| (MNPQ. ) |
| 32.JHGF |
| "QWERT" |
Aim is to remove the special characters and produce the output as below :
| ColumnA |
+------------+
| ABCD |
| DEFG |
| MNPQ |
| JHGF |
| QWERT |
Tried using the replace method like below, but without success :
df['ColumnA'] = df['ColumnA'].str.replace(r"[^a-zA-Z\d\_]+", "", regex=True)
print(df)
So, how can I replace the special characters using replace method in pandas?
Your pattern also keeps digits (\d) and underscores (_), so it does not remove enough. Use a pattern that keeps only letters:
df['ColumnA'] = df['ColumnA'].str.replace(r"[^a-zA-Z]+", "", regex=True)
print (df)
ColumnA
0 ABCD
1 DEFG
2 MNPQ
3 JHGF
4 QWERT
The regex should be r'[^a-zA-Z]+'. It matches every run of characters other than A-Z and a-z, so replacing the matches with an empty string keeps only the letters.
import pandas as pd
# | ColumnA |
# +-------------+
# | ABCD(!) |
# | <DEFG>(23) |
# | (MNPQ. ) |
# | 32.JHGF |
# | "QWERT" |
# create a dataframe from a list
df = pd.DataFrame(['ABCD(!)', '<DEFG>(23)', '(MNPQ. )', '32.JHGF', '"QWERT"'], columns=['ColumnA'])
# | ColumnA |
# +------------+
# | ABCD |
# | DEFG |
# | MNPQ |
# | JHGF |
# | QWERT |
# keep only the characters that are from A to Z, a-z
df['ColumnB'] = df['ColumnA'].str.replace(r'[^a-zA-Z]+', '', regex=True)
print(df['ColumnB'])
Result:
0 ABCD
1 DEFG
2 MNPQ
3 JHGF
4 QWERT
Your suggested code works on my installation, except that it leaves the digits in, so you need to update your regex to r"[^a-zA-Z]+". If this doesn't work, try updating pandas:
import pandas as pd
d = {'Column A': [' ABCD(!)', '<DEFG>(23)', '(MNPQ. )', ' 32.JHGF', '"QWERT"']}
df = pd.DataFrame(d)
df['Column A'] = df['Column A'].str.replace(r"[^a-zA-Z]+", "", regex=True)
print(df)
Output
Column A
0 ABCD
1 DEFG
2 MNPQ
3 JHGF
4 QWERT
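To see the difference between the two character classes without pandas, here is a minimal sketch using Python's re module on the same sample values. The list and variable names are illustrative assumptions.

```python
import re

values = ["ABCD(!)", "<DEFG>(23)", "(MNPQ. )", "32.JHGF", '"QWERT"']

# Pattern from the question: letters, digits and underscores all survive.
keeps_digits = [re.sub(r"[^a-zA-Z\d_]+", "", v) for v in values]

# Pattern from the answers: only ASCII letters survive.
letters_only = [re.sub(r"[^a-zA-Z]+", "", v) for v in values]

print(keeps_digits)  # ['ABCD', 'DEFG23', 'MNPQ', '32JHGF', 'QWERT']
print(letters_only)  # ['ABCD', 'DEFG', 'MNPQ', 'JHGF', 'QWERT']
```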

Trim additional whitespace between the names in PySpark

How to trim the additional spaces present between the names in PySpark dataframe?
Below is my dataframe
+----------------------+----------+
|name |account_id|
+----------------------+----------+
| abc xyz pqr | 1 |
| pqm rst | 2 |
+----------------------+----------+
Output I want
+-------------+----------+
|name |account_id|
+-------------+----------+
| abc xyz pqr | 1 |
| pqm rst | 2 |
+-------------+----------+
I tried using regexp_replace, but it removes the spaces completely. Is there any other way to implement this? Thanks a lot!
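I used regexp_replace with the pattern '\s+' and a single space as the replacement, and got the desired output.
from pyspark.sql.functions import col, regexp_replace
df = df.withColumn("name", regexp_replace(col("name"), r"\s+", " "))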
Output
+-----------+----------+
| name |account_id|
+-----------+----------+
|abc xyz pqr| 1 |
| pqm rst| 2 |
+-----------+----------+
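The same replacement can be tried on plain strings. This is a minimal sketch with Python's re module, not PySpark code; the helper name squeeze_spaces and the sample strings (with made-up runs of spaces) are assumptions. The extra strip() is also an addition: it drops leading and trailing spaces, which the regexp_replace call above does not do on its own.

```python
import re


def squeeze_spaces(name: str) -> str:
    # Collapse every run of whitespace into one space, then trim both ends.
    return re.sub(r"\s+", " ", name).strip()


names = ["  abc   xyz  pqr ", " pqm     rst"]
cleaned = [squeeze_spaces(n) for n in names]
print(cleaned)  # ['abc xyz pqr', 'pqm rst']
```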

Splitting a column with str.split while retaining original column

I'm using pandas to manipulate a CSV file. My data looks like this:
Col1 | Col2 | Street
------|----------|----------------------------
abc | 11092019 | abc,def,ghi,jkl,mno,pqr
def | 11092019 | abc,def,ghi,jkl
ghi | 11092019 | abc,def,ghi,jkl,mno
jkl | 11092019 | abc,def,ghi
mno | 11092019 | abc,def,ghi,jkl
pqr | 11092019 | abc,def
I am splitting the Street column by comma and, as in the example, this can return a different number of columns for each row once split.
My searching has got me to this point, where I have the following code:
i = df.columns.get_loc('Street')
df2 = (df['Street'].str.split(',', expand=True).rename(columns=lambda x: f"Street{x+1}"))
pd.concat([df.iloc[:, :i], df2, df.iloc[:, i+1:]], axis=1)
This would yield the following result
Col1 | Col2 | Street1 | Street2 | Street3 | Street4 | Street5 | Street6
------|----------|---------|---------|---------|---------|---------|---------
abc | 11092019 | abc | def | ghi | jkl | mno | pqr
def | 11092019 | abc | def | ghi | jkl | |
ghi | 11092019 | abc | def | ghi | jkl | mno |
jkl | 11092019 | abc | def | ghi | | |
mno | 11092019 | abc | def | ghi | jkl | |
pqr | 11092019 | abc | def | | | |
This is so close to what I want, but I want to retain the original split column, i.e. the Street column in this example. I just can't figure out how to keep that original column in the output. Can someone point me in the right direction?
Thanks!
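One possible way to keep the original column, sketched under the assumption that the data is already in a DataFrame named df as in the question: widen the first slice by one column so that it includes Street itself. The sample rows are a shortened copy of the question's data, and the name out is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["abc", "def", "jkl"],
    "Col2": ["11092019", "11092019", "11092019"],
    "Street": ["abc,def,ghi,jkl,mno,pqr", "abc,def,ghi,jkl", "abc,def,ghi"],
})

i = df.columns.get_loc("Street")
df2 = df["Street"].str.split(",", expand=True).rename(columns=lambda x: f"Street{x + 1}")

# Slicing up to i + 1 (instead of i) keeps the original Street column in place.
out = pd.concat([df.iloc[:, :i + 1], df2, df.iloc[:, i + 1:]], axis=1)
print(out.columns.tolist())
```

The split columns Street1 to Street6 then follow directly after the untouched Street column.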

Merging rows macro vba

I've got a list of customers where the customers are repeated across multiple rows. I'd like to merge cells that are similar in Column A, but not touch anything else. If I could even format bold borders between customers, that'd be great.
Basically,
1 | abc | abc
1 | abc | def
1 | def  | xyz
2 | abc |
2 | abc | def
3 |        | xyz
4 | abc | qrs
4 | abc | def
5 | mni | xyz
To
1 | abc | abc
   | abc | def
   | def  | xyz
2 | abc |
   | abc | def
3 |        | xyz
4 | abc | qrs
   | abc | def
5 | mni | xyz
You don't have to merge cells. In fact, I recommend against it, because it causes more problems than it solves. What you could do is hide column "A", insert an empty column "B", put this formula in "B2", and then auto-fill it down:
=IF(A2=A1,"",A2)
Also, this solution avoids macros, which can be difficult and problematic if you are new to them.
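The logic of that formula can be sketched in a few lines of Python: show a customer number only when it differs from the row above. The list customers mirrors column A of the example, and the first row is always shown, which assumes the cell above it holds a header or some other non-matching value.

```python
customers = ["1", "1", "1", "2", "2", "3", "4", "4", "5"]

# Equivalent of =IF(A2=A1,"",A2) filled down the column.
display = [
    cur if i == 0 or cur != customers[i - 1] else ""
    for i, cur in enumerate(customers)
]
print(display)  # ['1', '', '', '2', '', '3', '4', '', '5']
```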
For that "customer separator" formatting that you're looking for, use conditional formatting.
I got that tip from this web site: Conditional Formatting

How to search for the Nth match in a line in Vim?

I am editing a wiki file and would like to add a new column between two existing columns.
| *No* | *Issue* | *File* | *Status* |
| 1 | blah | foo | open |
| 2 | blah1 | foo1 | close |
Say I want to insert a new column between the 3rd and 4th columns above. If I could search for the fourth match of the | character in a given line, I could replace that with | |. But how can one do that in Vim?
The end result would look like so:
| *No* | *Issue* | *File* | | *Status* |
| 1 | blah | foo | | open |
| 2 | blah1 | foo1 | | close |
How about recording a macro into register q by entering qq3f|a|<ESC>q in normal mode (<ESC> means pressing the Escape key)? Now you can apply this macro to each line with :%norm @q.
Additional bonus:
With this pattern you can add more complex actions, for example replicate the first column as column 3 (if cursor is at first column):
qqf yf|;;;p0q
Oh, and the answer to your question: searching for the 4th occurrence of | on a line is done with 3f| (if the cursor is at position 0 and on a | character, as in your example).
Consider the following substitution command, which inserts " |" right after the fourth | on each line:
:%s/\%(.\{-}|\)\{4}\zs/ |/
Another substitution that gives the same result:
:%s/\(|[^|]*\)\{3\}/&| /
Which means: on each line (%), find three occurrences (\{3\}) of a string that starts with | followed by any number of non-| characters ([^|]*), and replace that with itself (&) followed by | and a space.
You can call sed in vim as a filter:
:%!sed 's/|/| |/4'
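For comparison, the same "insert after the 4th |" idea can be sketched in Python with the re module. This is only an illustration of the technique, not a Vim or sed feature; the names FOURTH_PIPE and add_empty_column are assumptions.

```python
import re

# Four repetitions of "anything but |, then |" end the match at the 4th pipe.
FOURTH_PIPE = re.compile(r"^((?:[^|]*\|){4})")


def add_empty_column(line: str) -> str:
    # Keep everything up to the 4th pipe and append " |" right after it.
    # Lines with fewer than four pipes do not match and are left unchanged.
    return FOURTH_PIPE.sub(lambda m: m.group(1) + " |", line, count=1)


table = [
    "| *No* | *Issue* | *File* | *Status* |",
    "| 1 | blah | foo | open |",
    "| 2 | blah1 | foo1 | close |",
]
new_table = [add_empty_column(row) for row in table]
print("\n".join(new_table))
```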
