regarding counting the number of columns in gvim - linux

After opening a big csv file in gvim, how can I know how many columns are within this file?

The csv.vim plugin provides a lot of functionality to work with CSV data. It includes a :NrColumns command.

A quick and dirty hack would be something like:
:s/,//gn
Which would give you the number of commas on a single row. Add one and you have your number of columns (assuming no trailing comma, of course).
I say this is quick and dirty because it doesn't take into account quoted columns which can contain commas. I'm sure there might be a way to take that into account with a regex but it's probably not trivial.
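If stepping outside Vim is acceptable, a real CSV parser sidesteps the quoting problem. This is a minimal sketch in Python rather than Vim script; the function name count_columns and the sample data are made up for illustration, and it assumes the first record is representative of the whole file.

```python
import csv
import io

def count_columns(text, delimiter=","):
    """Return the number of fields in the first record of some CSV text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    first_row = next(reader, [])  # empty input yields zero columns
    return len(first_row)

# Hypothetical sample: the quoted fields contain commas of their own.
sample = 'id,"Smith, John",city\n1,"Doe, Jane",Paris\n'
print(count_columns(sample))  # 3; counting commas and adding one would give 4
```

For a file on disk, the same reader can be fed an open file object instead of the StringIO wrapper.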

Related

How to search for items with multiple "-" in excel or VBA?

I have a list of item numbers (100K) like this:
Some of the items have a format like SAG571A-244-4 (thousands of them), and these need to be filtered out so I can delete them and keep only the items that have ONE hyphen per SKU. How can I isolate the items that have two instances of "-" in their SKU? I'm open to solutions within Excel or using VBA as well.
Native text filters don't seem to be capable of this. I'm stumped.
As per John Coleman's comment, "*-*-*" can be used to isolate strings that have at least two dashes in them.
I would add that if you're entering them as a custom text filter, you should lose the double quotes (so just *-*-*) as otherwise the field seems to interpret the quotes literally.
Seems to work for me.
If you want just an excel formula to verify this and give you a result of the number of hyphens (0, 1, or 2+), here is one:
=IF(ISERROR(SEARCH("-",A1)),"0",IF(ISERROR(SEARCH("-",A1,IFERROR(SEARCH("-",A1)+1,LEN(A1)))),"1","2+"))
Replace A1 with your relevant column, then fill down. This is kind of a terrible way to do this performance-wise, but you avoid using VBA and possibly xlsm files.
The code first checks to see if there is one hyphen, then if there is it checks to see if there is another hyphen after the position the first one was found. Looking for multiple hyphens in this manner is cumbersome and I don't recommend it.
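If the list can be exported to a script, counting the hyphens directly is less cumbersome than nested searches. A rough sketch in Python (not Excel or VBA), assuming the SKUs arrive as plain strings; the function name and every sample value except SAG571A-244-4 are invented for illustration.

```python
def split_by_hyphen_count(skus):
    """Separate SKUs with exactly one hyphen from those with two or more."""
    keep = [s for s in skus if s.count("-") == 1]
    drop = [s for s in skus if s.count("-") >= 2]
    # SKUs with no hyphen at all land in neither list.
    return keep, drop

items = ["SAG571A-244", "SAG571A-244-4", "XK100-7", "B2-9-1"]
keep, drop = split_by_hyphen_count(items)
print(keep)  # ['SAG571A-244', 'XK100-7']
print(drop)  # ['SAG571A-244-4', 'B2-9-1']
```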

excel vba Delete entire row if cell contains the GREP search

I have a single column of text in Excel that is to be used for translating into foreign languages. The text is automatically generated from an InDesign file. I would like to clean it up for the translator by removing rows that simply contain a number ("20", "34.5", etc.) or a measurement ("5mm", "3.5 µm", etc.). I've found many posts (see link below) on how to remove a row containing a specific string, but none that use search patterns such as those I typically use with GREP searches: "\d+" and "\d.\d µm"
How would I do this? I am on a Mac, if that helps.
Note that I would need to delete the row if the cell only contains a number or a measurement, not if the number is contained within a phrase, sentence, or paragraph, etc.
https://stackoverflow.com/a/30569969
It may not be what you are looking for, but how about just sorting the column and removing the rows starting with numbers? It is a manual approach, but from what I understand this translation process only happens from time to time. Am I right?
I see two possible issues in your question:
How to work with regular expressions in Excel?
How to delete rows in a loop?
Let me start with the second question: when you loop over a list in order to remove items from it, you MUST start at the end and work back to the beginning. Deleting a row shifts every row below it up by one, so a forward loop skips the row that follows each deletion. (It's a beginner's trick, but a lot of people trip over it.)
About the first question: this is a very useful post about this subject, it's too large to even give a summary here.
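Both points can be sketched together. The following is Python, not VBA, and only illustrates the idea: the pattern is an assumption covering bare numbers and the two units mentioned in the question (mm and µm), and the sample rows are invented.

```python
import re

# Assumed pattern: a number, optionally with decimals, optionally followed
# by a unit. Extend the unit alternation to suit the real data.
NUMBER_OR_MEASUREMENT = re.compile(r"\d+(?:\.\d+)?\s*(?:mm|µm)?")

def delete_numeric_rows(rows):
    """Remove, in place, rows that consist only of a number or measurement."""
    # Walk from the last index to the first so that a deletion never shifts
    # the position of a row that has not been checked yet.
    for i in range(len(rows) - 1, -1, -1):
        # fullmatch requires the WHOLE cell to fit the pattern, so numbers
        # embedded in a phrase or sentence are left alone.
        if NUMBER_OR_MEASUREMENT.fullmatch(rows[i].strip()):
            del rows[i]
    return rows

rows = ["Install the filter", "20", "34.5", "5mm", "3.5 µm", "Use 2 screws"]
print(delete_numeric_rows(rows))  # ['Install the filter', 'Use 2 screws']
```

In VBA the same shape would be a For loop counting down with Step -1.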

Excel conditional formating based on the multiple cells and values

I am trying to implement various conditional formatting rules on a specific database. I looked for an answer around here but cannot find anything similar. It might not be possible, but it is worth a try.
I am performing various data cleansing and validation.
Here is the case: (small sample, working with 100k data entries in this particular file)
Ultimately what I want is a formula that will compare the characters after the last "UNDERSCORE" in the low-level Description to the characters after the last "UNDERSCORE" in the higher-level Description (highlighted). If they do not match, highlight the cell.
Asking for too much, yes, no, maybe? I am open to any other suggestions on how I can perform various data cleaning and validation!
Thank you!
If you must use the last "UNDERSCORE" character, and can't depend on the suffixes being four characters, the formula becomes quite complex. For simplicity's sake, I assumed the higher level is always missing the last five characters of the lower level. If you must go by the last "DASH" character instead, then this will be a lot longer.
Use this formula to highlight the cells, defining the two names LEVELS and DESCRS to be the two columns:
=IFNA(MID(B2,FIND("[]",SUBSTITUTE(B2,"_","[]",LEN(B2)-LEN(SUBSTITUTE(B2,"_",""))))+1,999)<>MID(INDEX(DESCRS,MATCH(LEFT(A2,LEN(A2)-5),LEVELS,0),1),FIND("[]",SUBSTITUTE(INDEX(DESCRS,MATCH(LEFT(A2,LEN(A2)-5),LEVELS,0),1),"_","[]",LEN(INDEX(DESCRS,MATCH(LEFT(A2,LEN(A2)-5),LEVELS,0),1))-LEN(SUBSTITUTE(INDEX(DESCRS,MATCH(LEFT(A2,LEN(A2)-5),LEVELS,0),1),"_",""))))+1,999),FALSE)
This uses a very nice trick with SUBSTITUTE to find the last occurrence of a character.
BTW, I would probably write a Perl program to parse the data and find errors.
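In that spirit, here is a small sketch of the core comparison, written in Python rather than Perl. It mirrors the SUBSTITUTE trick: count the separators, walk to the last one, and take what follows. The function names and the sample descriptions are hypothetical, and looking up the parent row is left out.

```python
def suffix_after_last(text, sep="_"):
    """Return the text after the last separator, or None if there is none."""
    count = text.count(sep)  # same role as LEN(x) - LEN(SUBSTITUTE(x, "_", ""))
    if count == 0:
        return None
    # Step to the count-th occurrence, i.e. the last one, much as the
    # instance number passed to SUBSTITUTE picks which "_" to replace.
    pos = -1
    for _ in range(count):
        pos = text.find(sep, pos + 1)
    return text[pos + len(sep):]

def suffix_mismatch(child_descr, parent_descr):
    """True when the two descriptions end in different suffixes."""
    return suffix_after_last(child_descr) != suffix_after_last(parent_descr)

print(suffix_after_last("PUMP_MAIN_AB12"))             # AB12
print(suffix_mismatch("PUMP_MAIN_AB12", "PUMP_AB12"))  # False
print(suffix_mismatch("PUMP_MAIN_AB12", "PUMP_ZZ99"))  # True
```

Python's built-in text.rpartition(sep) gives the same suffix in one call; the loop is only there to show how the formula gets to it.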

Deleting everything after whitespace in Excel

I have a massive list of dates that are in a few different formats. What I would like to do is get rid of anything past the first whitespace character, whether it be a space, newline, tab, etc. I've found a lot of answers detailing how to get rid of whitespace, but not much about deleting substrings based on the location of whitespace. Example below:
BEFORE             AFTER
37893              37893
37801              37801
37710              37710
37620              37620
36980              36980
06/30/2014\nUSD    06/30/2014
03/31/2014\nUSD    03/31/2014
12/31/2013\nUSD    12/31/2013
09/30/2013\nUSD    09/30/2013
06/30/2013\nUSD    06/30/2013
03/31/2013\nUSD    03/31/2013
12/31/2012\nUSD    12/31/2012
etc...
For your example data, this would suffice:
=LEFT(A1,10)
To format as dates, you could do this:
=TEXT(LEFT(A1,10),"mm/dd/yyyy")
Here is a possible formula solution.
=IFERROR(--REPLACE(A1, IFERROR(FIND(CHAR(10), A1),LEN(A1)+1),LEN(A1), ""),REPLACE(A1, IFERROR(FIND(CHAR(10), A1),LEN(A1)+1),LEN(A1), ""))
That might seem overly complex but it guards against cells that may or may not have a line feed as well as attempting to convert numbers to numbers and dates to dates while leaving text alone. You will have to format the cells to change returned values like 41820 to 6/30/2014.
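For comparison, the original request (drop everything from the first whitespace character on, whatever kind it is) is short to express in a script. A sketch in Python rather than an Excel formula; the function name is made up, and it assumes the cell values are available as strings.

```python
def before_first_whitespace(value):
    """Return the part of a string that precedes its first whitespace run."""
    # split(None, 1) splits on any whitespace (space, tab, newline, ...)
    # and also ignores leading whitespace.
    parts = value.split(None, 1)
    return parts[0] if parts else ""

print(before_first_whitespace("06/30/2014\nUSD"))  # 06/30/2014
print(before_first_whitespace("37893"))            # 37893
```

Unlike the formula above, this does not convert anything to a number or date; it only trims the text.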

VIM: How to count a pattern on a line restricting from column A to column B similar to :s/,//gn?

I have a line that looks like the following, which I am viewing in vim.
0,0,0,1.791759,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5.278115,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
It is from a feature vector file, each row is an instance and each column is the feature value for that feature number. I would like to figure out which feature number 5.27 corresponds to. I know the
s/,//gn
will count the number of commas in the line, but how do I restrict the command to count only the commas up to the column containing the number 5.27?
I have seen these two posts that seem relevant but cannot seem to piece them together: How to compute the number of times word appeared in a file or in some range and Search and replace in a range of line and column
s/,\ze.*5\.27//gn
The interesting part is the \ze which sets the end of the match. See :h /\ze for more information
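The count can be cross-checked outside Vim. This is a sketch in Python, not Vim script, with a shortened made-up line; the function name is hypothetical. It uses the first occurrence of the text 5.27, so it can disagree with the Vim command if that text appears more than once on the line or inside a longer number such as 15.27.

```python
def feature_number(line, value_prefix):
    """1-based index of the comma-separated field where value_prefix starts."""
    pos = line.find(value_prefix)
    if pos == -1:
        return None  # the value does not occur on this line
    commas_before = line[:pos].count(",")
    # N commas come before field number N + 1.
    return commas_before + 1

line = "0,0,1.791759,0,0,5.278115,0,"
print(feature_number(line, "5.27"))  # 6
```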
Select the wanted area with visual mode and do
:s/\v%V%(,)//gn
\v ("very magic") lets us escape fewer operators with \
%V limits the search to matches that start inside the visual selection
%() keeps the search together if you include alternations with |
It's not pretty but it works. See help files for /\v, \%V and \%(
There are also several versions of a plugin called vis.vim, which offers easier commands that aim to do just the above. However I haven't gotten any of them to work so I'll not comment on that further.
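Try this:
s/,\ze.\{-}5\.27//gn
Note the \ze after the comma: without it, the first match runs from the first comma all the way to 5.27, so the reported count is just 1. The dot in 5.27 is escaped so that it matches a literal period.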
