I'm trying to match the company name in this tab-delimited string.
The table below does not keep its tabs when you copy it, so I have replaced each tab with two spaces, which I assume will work fine for testing.
1025164 HERBEX IBERIA, S.L.U. KY01 4600292091
1016379 DRISCOLL´S OF EUROPE B.V. KY01 4600322589
1008809 LANDGARD NORD OBST & GEMÜSE GM KY01 4600347315
1008835 C.A.S.I. : COOPERATIVA PROVINC KY01 4600348112
1019258 SYDGRÖNT EKONOMISK FÖRENING KY02 4600343422
(The company name is the second column above, between the 7-digit number and the KY0 code.)
In real life the columns are not always in the same order since it's a user preference.
I just took a few examples but names could also include /éèáà()´, pretty much anything (sadly).
I found another question here: Concrete Javascript Regex for Accented Characters (Diacritics).
When I use the regex patterns from that thread, for example "\t([A-zÀ-ÿ0-9\s\.\,\_\-\'\&]+)\t" (I know some characters are still missing), to match between two tabs, it becomes greedy and matches the whole line.
Is there any pattern that could match any character in a company name between tabs (or two spaces as the example above)?
Instead of returning a matched part, I matched everything and replaced it with the 1st capture group. Hope it helps.
Sub Test()
    Dim str As String: str = "1025164" & vbTab & "HERBEX IBERIA, S.L.U." & vbTab & "KY01" & vbTab & "4600292091"
    With CreateObject("vbscript.regexp")
        .Global = True
        .Pattern = "(?:^|\t)(?:\d+|KY\d+|([^\t]+))(?=\t|$)"
        Debug.Print .Replace(str, "$1")
    End With
End Sub
Here is how the pattern works:
(?:^|\t) - Match either start line anchor or a tab. Unfortunately the VBA-regex object does not support lookbehinds.
(?: - Open a non-capture group to start matching all parts you don't want to capture first:
\d+ - match 1+ digits;
| - Or:
KY\d+ - Match "KY" followed by 1+ digits;
| - Or:
([^\t]+) - nest a capture group to capture 1+ non-tabs.
) - Close non-capture group.
(?=\t|$) - Positive lookahead to assert captured text is followed by either a tab or end-line anchor.
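To check the pattern outside of VBA, here is a minimal Python sketch that applies the same expression with `re.sub`. It assumes Python's `re` module treats this pattern the same way as the VBScript engine, which holds for the constructs used here. The function name `company_name` is made up for illustration.

```python
import re

# Same pattern as in the VBA answer, copied unchanged.
PATTERN = re.compile(r"(?:^|\t)(?:\d+|KY\d+|([^\t]+))(?=\t|$)")

def company_name(line: str) -> str:
    # Every tab-delimited field is matched and replaced by capture group 1.
    # Group 1 is empty for digit-only and KY fields, so only the name survives.
    return PATTERN.sub(lambda m: m.group(1) or "", line)

line = "1025164\tHERBEX IBERIA, S.L.U.\tKY01\t4600292091"
print(company_name(line))
```

Because every field is consumed, the column order does not matter; a line with more than one free-text column would return those columns concatenated.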
I would take a different approach and use the Split function. The following code assumes that tabs are the separator and that a column holds the company name if it is not numeric (only digits) and does not start with 'KY'.
Function getCompanyName(line As String) As String
    Const separator = vbTab ' Replace with "  " (two spaces) if you need that.
    Dim tokens() As String, i As Integer
    tokens = Split(line, separator)
    For i = 0 To UBound(tokens)
        ' The first column that is neither numeric nor a KY code is the name.
        If Not IsNumeric(tokens(i)) And Left(tokens(i), 2) <> "KY" Then
            getCompanyName = tokens(i)
            Exit Function
        End If
    Next
End Function
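The same split-and-filter idea can be sketched in Python for quick testing. This is only an approximation of the VBA version: `str.isdigit` accepts digits only, whereas VBA's `IsNumeric` also accepts signs, decimal separators and similar. The name `get_company_name` simply mirrors the function above.

```python
def get_company_name(line: str, separator: str = "\t") -> str:
    # Return the first column that is neither all digits nor a KY code.
    for token in line.split(separator):
        if not token.isdigit() and not token.startswith("KY"):
            return token
    return ""  # no matching column found

print(get_company_name("KY01\t4600322589\tDRISCOLL´S OF EUROPE B.V.\t1016379"))
```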
I wrote an export to CSV file in my vb.net application, and I then exported it into Outlook.
The issue I've got is that when the CSV file is being written, my code checks for a comma in the current field, but it also seems to mistake a double space, or a space followed by the Enter key being pressed (in multiline textboxes), for a comma.
For example, if the notes section of a customer has 4 lines of text and one of them ends in a space, after which the user pressed Enter to go to the next line, the program takes the next line of text and creates a new record for it, as if it had found a comma...
What is the reason for this? It means the data has to be heavily validated (i.e. checked for double spaces etc.) before it can be exported, which is far too time-consuming.
Hopefully this makes sense!
This is the code:
Dim result As Boolean = True
Try
    Dim sb As New StringBuilder()
    Dim separator As String = ","
    Dim group As String = """"
    Dim newLine As String = Environment.NewLine
    For Each column As DataColumn In dtable.Columns
        sb.Append(wrapValue(column.ColumnName, group, separator) & separator)
    Next
    sb.Append(newLine)
    For Each row As DataRow In dtable.Rows
        For Each col As DataColumn In dtable.Columns
            sb.Append(wrapValue(row(col).ToString(), group, separator) & separator)
        Next
        sb.Append(newLine)
    Next
The code for wrapValue:
Function wrapValue(value As String, group As String, separator As String) As String
    If value.Contains(separator) Then
        If value.Contains(group) Then
            value = value.Replace(group, group + group)
        End If
        value = group & value & group
    End If
    Return value
End Function
Based on the fact that it's shortening it by 430 lines, I'd suggest it's something to do with the fact you're adding a load of "" before and after the value variable.
If it's removing a value at the start, then it will be removing a " before the first column header. As to why it's importing one record as you mentioned in the comments, I'm not entirely sure, however, I would suggest the issue lies in your wrapValue code.
Can you try changing
value = group & value & group
to
value = value
and see if that changes anything?
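One more thing that may be worth checking, offered as an observation rather than a confirmed diagnosis: wrapValue as posted only quotes a field when it contains the separator. A field that contains a line break but no comma is written unquoted, and CSV readers treat an unquoted line break as the end of a record. Below is a Python sketch of the usual quoting rule (quote when the field contains the separator, the quote character, or a line break). The name `wrap_value` mirrors the question's function, and the sample values are made up.

```python
import csv
import io

def wrap_value(value: str, group: str = '"', separator: str = ",") -> str:
    # Quote when the field contains anything a CSV reader treats specially.
    if any(special in value for special in (separator, group, "\r", "\n")):
        value = group + value.replace(group, group + group) + group
    return value

notes = "first line \nsecond line"  # a line ending in a space, then Enter
record = ",".join(wrap_value(v) for v in ["Smith", notes]) + "\r\n"

# Reading it back yields a single record with the line break preserved.
rows = list(csv.reader(io.StringIO(record, newline="")))
print(rows)
```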
I have many files containing lines like these:
HUIHOJ OPKKA LK
ASOIJS AISJJ PL
AOSKSI ASIJD YA
I want to convert these lines into something like this:
HUI;HOJ ;OPKKA ;L;K
ASO;IJS ;AISJJ ;P;L
AOS;KSI ;ASIJD ;Y;A
So the first field would be 3 characters, second would be 4, third 6, fourth 1 and fifth 1 character.
I know that it's possible to do this manually with Excel, but I need an automatic conversion solution, because I have many files with exactly the same structure.
VBA has a simple command to format text, so you can do this fairly easily with just a single line of code. In the format string, each @ is a character placeholder and \; is a literal semicolon:
Function SpFmt(S As String) As String
    SpFmt = Format(S, "@@@\;@@@@\;@@@@@@\;@\;@")
End Function
If you want to use a worksheet function, you can do this with a nested replace formula on the worksheet:
=REPLACE(REPLACE(REPLACE(REPLACE(A1,4,0,";"),9,0,";"),16,0,";"),18,0,";")
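The insert positions 4, 9, 16 and 18 in the nested REPLACE follow from the field widths 3, 4, 6, 1, 1: each one is the end of a field plus one, shifted by the number of semicolons already inserted. The Python sketch below works through that arithmetic and performs the same conversion. The names `WIDTHS` and `to_delimited` are made up for illustration.

```python
from itertools import accumulate

WIDTHS = [3, 4, 6, 1, 1]  # field widths given in the question

def to_delimited(line: str, widths=WIDTHS, delimiter: str = ";") -> str:
    ends = list(accumulate(widths))   # cumulative field ends: 3, 7, 13, 14, 15
    starts = [0] + ends[:-1]          # matching field starts
    return delimiter.join(line[s:e] for s, e in zip(starts, ends))

# 1-based positions used by the nested REPLACE formula.
ends = list(accumulate(WIDTHS))
replace_positions = [end + 1 + i for i, end in enumerate(ends[:-1])]

print(replace_positions)
print(to_delimited("HUIHOJ OPKKA LK"))
```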
Use the VBA Join function after splitting (Split function) on a space, or simply replace (Replace function) each space with a space and a semicolon.
Dim str As String
str = Range("A1").Value2 'HUIHOJ OPKKA LK
Range("A1") = Join(Split(str, Chr(32)), Chr(32) & Chr(59)) 'HUIHOJ ;OPKKA ;LK
str = Range("A1").Value2 'HUIHOJ OPKKA LK
Range("A1") = Replace(str, Chr(32), Chr(32) & Chr(59)) 'HUIHOJ ;OPKKA ;LK
If you are not bound to Excel, you can use Unix tools (also available for Windows) to do this very efficiently with just one command:
cut --output-delimiter=";" -c 1-3,4-7,8-13,14,15 fixed.txt > delimited.csv
The same command in a loop:
for f in *.txt ; do
    cut --output-delimiter=";" -c 1-3,4-7,8-13,14,15 "${f}" > "${f}.csv"
done
Edit: the --output-delimiter option does not seem to work on every platform (it is a GNU cut extension).
Alternatively, you can use sed:
sed "s/^\(.\{3\}\)\(.\{4\}\)\(.\{6\}\)\(.\)\(.\)/\1;\2;\3;\4;\5/" fixed.txt > delimited.csv
I'm trying to use a formula in Excel to separate a bunch of words in a cell with a comma. If there are more than 5 words in the cell, I just want to get the first 5 words. To get the first five words in a cell and separate them by a comma I use this:
=SUBSTITUTE(LEFT(A1,FIND("^",SUBSTITUTE(A1," ","^",5))-1), " ", ", ")
This works fine. The problem is that, because of the number 5 here, I get an error if a cell contains fewer than 5 words. I tried to replace the 5 with this:
LEN(TRIM(A1))-LEN(SUBSTITUTE(A1," ",""))+1
So my function becomes this:
=SUBSTITUTE(LEFT(A1,FIND("^",SUBSTITUTE(A1," ","^",LEN(TRIM(A1))-LEN(SUBSTITUTE(A1," ",""))+1))-1), " ", ", ")
But this doesn't work, it gives me an error. Any idea how I can do this please?
Also I would like to ignore the first word if its first character is "-" (without the quotes) and just start from the second word. So in other words, I want something like this:
I love my life very much should return I, love, my, life, very
- I love my life very much should return I, love, my, life, very (the "-" is ignored)
I love my should return I, love, my
Thanks in advance for any help
Here's a somewhat different approach. Aside from the "less than 5" issue, it also deals with the "5 words with no space at the end" issue:
=LEFT(A1,FIND("^",SUBSTITUTE(A1 & "^"," ","^",5))-1)
EDIT 1: I just noticed the part about the leading "- ". My addition isn't very elegant, but it deals with it, and also TRIMS any trailing spaces:
=TRIM(LEFT(IF(LEFT(A1,2)="- ",MID(A1,3,999),A1),FIND("^",SUBSTITUTE(IF(LEFT(A1,2)="- ",MID(A1,3,999),A1) & "^"," ","^",5))-1))
EDIT 2: Oh yeah, commas:
=SUBSTITUTE(TRIM(LEFT(IF(LEFT(A1,2)="- ",MID(A1,3,999),A1),FIND("^",SUBSTITUTE(IF(LEFT(A1,2)="- ",MID(A1,3,999),A1) & "^"," ","^",5))-1))," ",",")
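The trick in these formulas is the appended "^": if the text has fewer than 5 spaces, FIND still locates the sentinel at the end instead of raising an error. Here is a Python sketch of the same logic, assuming the input contains no "^" of its own. The helper names are made up, and the output uses "," as in EDIT 2 rather than ", ".

```python
def nth_index(text: str, sub: str, n: int) -> int:
    # Position of the n-th occurrence of sub, or -1 if there are fewer.
    pos = -1
    for _ in range(n):
        pos = text.find(sub, pos + 1)
        if pos == -1:
            return -1
    return pos

def first_words(text: str, limit: int = 5) -> str:
    if text.startswith("- "):       # IF(LEFT(A1,2)="- ",MID(A1,3,999),A1)
        text = text[2:]
    cut = nth_index(text, " ", limit)
    if cut == -1:
        cut = len(text)             # where the appended "^" sentinel would sit
    trimmed = " ".join(text[:cut].split())  # like TRIM: also collapses runs of spaces
    return trimmed.replace(" ", ",")

print(first_words("- I love my life very much"))
print(first_words("I love my"))
```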
Try this:
=TRIM(LEFT(SUBSTITUTE(SUBSTITUTE(TRIM(SUBSTITUTE(A1,"-"," "))," ",","),",",REPT(" ",99),5),99))
This will work even if there is not a space after the dash or if there are extra spaces in the text. Often I find that input is not very clean.
=SUBSTITUTE(LEFT(SUBSTITUTE(TRIM(SUBSTITUTE(A1,"-","",1)),
" ","*",5),IFERROR(FIND("*",SUBSTITUTE(TRIM(SUBSTITUTE(A1,"-","",1)),
" ","*",5))-1,999))," ",",")
Edit: After commenting on István's, I made mine flawless too.
=SUBSTITUTE(LEFT(SUBSTITUTE(TRIM(SUBSTITUTE(LEFT(TRIM(A1),1),"-"," ",1)
&MID(TRIM(A1),2,999))," ","*",5),IFERROR(FIND("*",SUBSTITUTE(
TRIM(SUBSTITUTE(LEFT(TRIM(A1),1),"-","",1)&MID(TRIM(A1),2,999))," ","*",5))-1,999))," ",",")
But I think his is more elegant.
Try this:
=SUBSTITUTE(LEFT(SUBSTITUTE(SUBSTITUTE(TRIM(SUBSTITUTE(A1,"- ","",1))&" "," ",", "),", ","|",MIN(LEN(SUBSTITUTE(TRIM(SUBSTITUTE(A1,"- ","",1))&" "," ",", "))-LEN(SUBSTITUTE(SUBSTITUTE(TRIM(SUBSTITUTE(A1,"- ","",1))&" "," ",", ")," ","")),5)),FIND("|",SUBSTITUTE(SUBSTITUTE(TRIM(SUBSTITUTE(A1,"- ","",1))&" "," ",", "),", ","|",MIN(LEN(SUBSTITUTE(TRIM(SUBSTITUTE(A1,"- ","",1))&" "," ",", "))-LEN(SUBSTITUTE(SUBSTITUTE(TRIM(SUBSTITUTE(A1,"- ","",1))&" "," ",", ")," ","")),5)))-1),",,",",")
The formula works by taking the following steps:
Remove any leading dash-space
Trim any leading or trailing spaces
Insert comma-spaces in place of spaces and add a trailing comma-space
Calculate the lesser of 5 and the number of words in the string
Put in "|" in place of either the fifth comma-space or the trailing comma-space if the string is less than five words
Determine the position of the "|"
Strip off the "|" and all characters to the right of it
Remove any doubled commas due to any single embedded commas in the initial string
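The steps above can be sketched in a few lines of Python, which may make the formula easier to follow. This is an illustration of the steps, not a translation of the exact formula. The function name `first_five_words` is made up.

```python
def first_five_words(text: str, limit: int = 5) -> str:
    text = text.replace("- ", "", 1)     # remove the first dash-space
    words = text.split()                 # trims and splits on spaces
    count = min(limit, len(words))       # lesser of 5 and the word count
    joined = ", ".join(words[:count])    # keep that many words, comma-space separated
    return joined.replace(",,", ",")     # undo doubled commas from embedded commas

print(first_five_words("- I love my life very much"))
print(first_five_words("I love my"))
```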
If you are willing to consider a VBA solution, this complex expression can be replaced by a user-defined function:
Function words5(InputString As String) As String
    Dim wordArray As Variant
    'remove "-", put words into array
    wordArray = Split(Trim(Replace(InputString, "-", "", , 1)), " ")
    'drop all but the first 5 words
    ReDim Preserve wordArray(LBound(wordArray) To _
        WorksheetFunction.Min(UBound(wordArray), 5 - 1))
    'rejoin the words with ", " separator
    words5 = Replace(Join(wordArray, ", "), ",,", ",")
End Function
The advantage of this code is its maintainability compared to the worksheet formula, which is impossible to understand or safely alter without access to the original building blocks that were combined into the single expression.
The code would have to be installed in the workbook in which it is used or in either the standard Personal.xlsb workbook or an addin workbook.
To use the function, copy and paste it into a standard module, which can be inserted into a workbook via the VBA editor. You can open the editor with the Visual Basic button on the Developer tab of the ribbon.
Figured I'd throw my hat in the ring also. I think this formula should cover the bases:
=SUBSTITUTE(TRIM(LEFT(SUBSTITUTE(TRIM(SUBSTITUTE(A1&" ","- ",""))," ",REPT(" ",99)),99*5))," ",",")
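This formula relies on a padding trick: every space is stretched to 99 spaces, so the first 99*5 characters can never reach the sixth word, and TRIM then collapses the padding again. A Python sketch of the same idea follows. The function name and parameters are made up for illustration, and the caveat in the comment applies to the formula too.

```python
def first_words_padded(text: str, limit: int = 5, pad: int = 99) -> str:
    # SUBSTITUTE(A1&" ","- ","") followed by TRIM
    cleaned = " ".join((text + " ").replace("- ", "").split())
    spread = cleaned.replace(" ", " " * pad)   # each space becomes 99 spaces
    window = spread[: pad * limit]             # LEFT(..., 99*5)
    # Caveat: if the first five words total more than 99 characters,
    # the fifth word is cut short by the window.
    return " ".join(window.split()).replace(" ", ",")

print(first_words_padded("- I love my life very much"))
```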