I figured out how to copy the cell directly from a Word table to an Excel cell.
The cell in Word may contain multiple lines separated by pressing enter. So you have one line, press enter, next line and so on.
I want to copy this exactly as it looks into Excel. When I copy it, the entire string is one line in the Excel cell.
First capture is from Word and the next is an Excel cell.
Below is the code for copying into the first column. The rest is not needed. I am working in Outlook, which is why both the Excel library and the Word library are being used. The code will scrape emails with Word documents.
With wrd.Tables(1)
xlSht.Cells(j, 1).Value = WorksheetFunction.Clean(.Cell(2, 2).Range.Text)
xlSht.Cells(j, 2).Value = WorksheetFunction.Clean(.Cell(3, 2).Range.Text)
xlSht.Cells(j, 4).Value = Atmt.FileName
End With
I tried splitting the Excel cell with some logic but it is hard to detect where the enter needs to happen.
Note: The "and" will not be used in all the text. It varies so I can't use that to split the Excel cell.
To start, make sure that the "Wrap Text" option is enabled on the cell you are writing to, or it won't display line breaks properly even if they exist in the text.
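As a minimal sketch of that first point, assuming the code runs inside Excel with a worksheet active (the procedure name and the sample text are made up for illustration), wrapping can also be switched on from VBA right after writing the value:

```vba
' Hypothetical example: writes two lines into A1 and enables Wrap Text
Sub WriteMultiLineCell()
    With ActiveSheet.Range("A1")
        ' vbLf (character 10) is the in-cell line break
        .Value2 = "first line" & vbLf & "second line"
        ' Without wrapping, the break exists but is not displayed
        .WrapText = True
    End With
End Sub
```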
Now that this is out of the way, there are two different reasons why your code doesn't preserve the line breaks from the Word table. The first is that you are using the CLEAN function. The second is that there's a problem with how the data comes across from the Word table in VBA (some information is lost). Luckily, there are ways to solve both problems.
Avoid using the CLEAN function
When you use the CLEAN function, you remove all the non-printable characters from a string of text. The problem is that the "formatting" that you see in the Word table is actually caused by the presence of 2 non-printable characters (or at least one of them). Those characters are the carriage return (CR) and the line feed (LF) characters. By using the CLEAN function you are asking to remove those characters which removes the information indicating a line break.
So I tried to do the same as you without the CLEAN function and made a Word table
then I used the following code to write the content of the first cell to Excel.
Sub ReadFromWordTable()
Dim WordApp As Word.Application
Set WordApp = GetObject(, "Word.Application")
Dim WordDoc As Word.Document
Set WordDoc = WordApp.ActiveDocument
Dim xlSht As Worksheet
Set xlSht = ActiveSheet
Dim TempString As String
With WordDoc.Tables(1)
TempString = .Cell(1, 1).Range.Text
End With
xlSht.Cells(1, 1).Value2 = TempString
'StringDrillDown TempString
End Sub
and saw that the line break does not appear (we'll come back to this later) and that there are some garbage characters at the end of my cell. Word ends every table cell with an end-of-cell marker, Chr(13) followed by Chr(7), and that marker is part of .Range.Text.
Now I see why you used the CLEAN function: to make those garbage characters go away! If only there was an out-of-the-box VBA function to remove those non-printable characters without removing CR and LF from the string!
Since there isn't one, and since the garbage only appears at the end, I would suggest simply cleaning TempString with the following code, which removes non-printable characters starting from the right and stops as soon as it encounters a printable character.
Dim i As Long, NbOfCharacter As Long
NbOfCharacter = Len(TempString)
For i = Len(TempString) To 1 Step -1
If Asc(Mid(TempString, i, 1)) < 32 Then
NbOfCharacter = NbOfCharacter - 1
Else
Exit For
End If
Next
TempString = Left(TempString, NbOfCharacter)
Note here that I'm using the Asc function. It returns the Extended ASCII (a.k.a. ANSI) character code (a number from 0 to 255) that identifies a character. In our case, all non-printable characters return a value below 32, so we can easily filter them out.
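The trailing-character loop above can also be wrapped in a small reusable function. This is only a sketch: the name TrimTrailingControlChars is made up, and it assumes, like the loop above, that everything with a code below 32 at the end of the string is unwanted.

```vba
' Hypothetical helper: removes control characters (code < 32) from the END of s only.
' Control characters inside the text, such as CR between two lines, are kept.
Function TrimTrailingControlChars(ByVal s As String) As String
    Dim n As Long
    n = Len(s)
    Do While n > 0
        If Asc(Mid$(s, n, 1)) < 32 Then
            n = n - 1          ' drop this trailing control character
        Else
            Exit Do            ' first printable character from the right: stop
        End If
    Loop
    TrimTrailingControlChars = Left$(s, n)
End Function
```

It could then be called as TempString = TrimTrailingControlChars(TempString).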
Make sure the line feed character is present in the string you write to the cell
As you saw when we used the value of .Range.Text directly, the line break didn't get passed through correctly. To understand the problem, we might want to drill down on the different characters that make up our TempString variable. For that you could use a procedure like this:
Sub StringDrillDown(str As String)
Dim ws As Worksheet
With ActiveWorkbook
Set ws = .Sheets.Add(AFTER:=.Sheets(.Sheets.Count))
End With
ws.Range("A1") = "Character"
ws.Range("B1") = "Ascii Code"
Dim i As Long
For i = 1 To Len(str)
ws.Cells(i + 1, 1).Value2 = Mid$(str, i, 1)
ws.Cells(i + 1, 2).Value2 = Asc(Mid$(str, i, 1))
Next i
End Sub
Giving us this:
What we notice is that the only character we have between "and" and "some" is character number 13, which corresponds to CR (Word marks the end of a paragraph with a lone CR). So, we are missing the LF required to make it clear to Excel that we want a line break between those 2 words.
To solve this issue, we could use the following :
With WordDoc.Tables(1)
TempString = Replace(.Cell(1, 1).Range.Text, Chr(13), Chr(13) & Chr(10))
End With
This code will replace every lone CR with a CRLF (note that the character code for LF is 10).
A cautionary note: If there were already CRLF characters in the string, the line of code above would double them but that's not the case here.
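If the input might already contain CRLF pairs, one way to avoid doubling them is to collapse the pairs first and expand afterwards. A minimal sketch, with NormalizeToCrLf as a made-up name:

```vba
' Hypothetical helper: makes every line break a CRLF without doubling existing pairs
Function NormalizeToCrLf(ByVal s As String) As String
    ' Step 1: turn existing CRLF pairs into a lone CR
    s = Replace(s, vbCrLf, vbCr)
    ' Step 2: now every break is a lone CR, so expand each one to CRLF
    NormalizeToCrLf = Replace(s, vbCr, vbCrLf)
End Function
```

Usage would look like TempString = NormalizeToCrLf(.Cell(1, 1).Range.Text).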
Finally, our initial code example would now be the following:
Sub ReadFromWordTable()
Dim WordApp As Word.Application
Set WordApp = GetObject(, "Word.Application")
Dim WordDoc As Word.Document
Set WordDoc = WordApp.ActiveDocument
Dim xlSht As Worksheet
Set xlSht = ActiveSheet
Dim TempString As String
With WordDoc.Tables(1)
TempString = Replace(.Cell(1, 1).Range.Text, Chr(13), Chr(13) & Chr(10))
End With
Dim i As Long, NbOfCharacter As Long
NbOfCharacter = Len(TempString)
For i = Len(TempString) To 1 Step -1
If Asc(Mid(TempString, i, 1)) < 32 Then
NbOfCharacter = NbOfCharacter - 1
Else
Exit For
End If
Next
TempString = Left(TempString, NbOfCharacter)
xlSht.Cells(1, 1).Value2 = TempString
'StringDrillDown TempString
End Sub
Here is my code as text:
Function NurZahl(ByVal Text As String) As Long
Dim i%, tmp
Dim Val As String
For i = 1 To Len(Text)
Val = Mid(Text, i, 1)
If(OR(IsNumeric(Val),Val=","),TRUE, FALSE) Then tmp = tmp & Mid(Text, i, 1)
Next i
NurZahl = tmp
End Function
Complete Beginner here:
What is my problem with the if?
Is there a possibility to show me the exact problem in excel?
The text is only highlighted in red. If I hover over it with the mouse pointer, no error message is shown.
This is my source of my knowledge for the structure of my if: Support Microsoft - Is this the wrong type of documentation for me?
Got the solution now with your help (thanks to everyone who replied) - I wanted to extract a number with decimal from a string:
Function CleanString(strIn As String) As String
Dim objRegex
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "[^\d,]+"
CleanString = .Replace(strIn, vbNullString)
End With
End Function
The "If" line is written like an Excel formula. This is what it should look like in VBA:
If IsNumeric(Val) Or Val = "," Then tmp = tmp & Mid(Text, i, 1)
The red text is a syntax error. If you go to the Debug menu and click Compile VBA Project, you'll get the error message.
The link that you included is for functions that are typed into a cell. You need a VBA reference. Here's a link to MS's reference, but a decent book would make your life a lot easier. Just search for "Excel VBA".
https://learn.microsoft.com/en-us/office/vba/api/overview/
You can try something like this:
Function NurZahl(ByVal MyText As String) As Long
    ' Set up the variables
    Dim i As Integer
    Dim tmp As String
    Dim MyVal As String
    ' Start tmp with an empty string
    tmp = ""
    ' Loop through each character of the input MyText
    For i = 1 To Len(MyText)
        ' Read the character
        MyVal = Mid(MyText, i, 1)
        ' Check whether the character is a number or a comma
        ' and take reasonable action
        If IsNumeric(MyVal) Or MyVal = "," Then
            tmp = tmp & Mid(MyText, i, 1)
        End If
    Next i
    NurZahl = tmp
End Function
You'll have to change the code above to do what you want to do. The illustration above is to show how VBA code can be written.
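Since the goal stated in the question was a number with a decimal part, a Long return type cannot hold the result. Here is one possible variation, only a sketch: the name ExtractDecimal is made up, and it assumes the comma in the text is a decimal separator.

```vba
' Hypothetical variation: keeps digits and commas, then converts to a Double
Function ExtractDecimal(ByVal MyText As String) As Double
    Dim i As Long
    Dim ch As String
    Dim tmp As String
    For i = 1 To Len(MyText)
        ch = Mid$(MyText, i, 1)
        ' Keep the character only if it is a digit or a comma
        If ch Like "[0-9]" Or ch = "," Then tmp = tmp & ch
    Next i
    ' Val only recognizes "." as the decimal separator, whatever the locale,
    ' so the comma is swapped for a period before converting
    ExtractDecimal = Val(Replace(tmp, ",", "."))
End Function
```

In a cell this would be used as =ExtractDecimal(A1).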
In your VBA editor, when you see a red color on a line that means the editor has detected some issue with it.
If you were writing this function in Excel, you would typically use that function in a cell like this: =NurZahl(A1)
First things first: the macro is run from the Excel VBA editor, but performs most of its job on a previously opened Word document, where its goal is to find the full names of the people who are the contracting parties in the agreement being analyzed.
The issue I'm experiencing is that I need to pull a variable number of words from each consecutive paragraph. If the name is Will SMITH, then it's two words I need to pull; when it's Carrie Ann MOSS, it's three words; sometimes it can be Anna Nicole SMITH BURKE, which is four words, but when it's Anna Nicole SMITH-BURKE, it's five words (Word counts the hyphen as a word of its own), and so on.
The other idea for getting the full name is that it always ends with a comma, and this comma is always the first comma in the paragraph where the full name appears.
ATTENTION!!! The paragraphs we work with are not ListParagraphs. They are normal, ordinary ones, albeit indented and numbered. I get these contracts from people who don't care to use numbered lists :-(
So for the last time: The numbered list is not enabled on those paragraphs we work with.
This is how it looks in Word; the selected words are the names and surnames that the macro is supposed to extract from the document, excluding the comma after the last surname.
Sub FindNamesCleanDraftWithLoop()
'Variables declaration
Dim WordApp As Word.Application
Dim WordDoc As Word.Document
Dim ExcelApp As Excel.Application
Dim MySheet As Excel.Worksheet
Dim Para As Word.Paragraph
Dim Rng As Word.Range
Dim RngStart As Word.Range
Dim RngEnd As Word.Range
Dim TextToFind1 As String
Dim TextToFind2 As String
Dim firstName As String
Dim startPos As Long
Dim endPos As Long
Application.ScreenUpdating = False
'Assigning object variables
Set WordApp = GetObject(, "Word.Application")
Set ExcelApp = GetObject(, "Excel.Application")
Set WordDoc = WordApp.ActiveDocument
Set MySheet = Application.ActiveWorkbook.ActiveSheet
'Set MySheet = ExcelApp.ActiveWorkbook.ActiveSheet
Set Rng = WordApp.ActiveDocument.Content
TextToFind1 = "REGON 364061169, NIP 951-24-09-783,"
TextToFind2 = "- ad."
'InStr function returns a Variant (Long) specifying the position of the first occurrence of one string within another.
startPos = InStr(1, Rng, TextToFind1) - 1 'here we get 1421, we're looking for "TextToFind1"
endPos = InStr(1, Rng, TextToFind2) - 1 'here we get 2246, we're looking for "- ad."
'InStr returns 0 when nothing is found, so after subtracting 1 a miss shows up as -1
If startPos < 0 Or endPos < 0 Then
MsgBox ("Client's names were not found!")
Else
Rng.SetRange Start:=startPos, End:=endPos
Debug.Print Rng.Paragraphs.Count
'somewhere here I need your help to write some lines that will
'recognize how many words need to be pulled to extract the full
'name/names + surname/surnames and nothing else - we end on the first comma.
For Each Para In Rng.Paragraphs
firstName = Trim$(Para.Range.Words(3))
Debug.Print Para.Range.Words(1) & Para.Range.Words(2) & _
Para.Range.Words(3) & Para.Range.Words(4) & _
Para.Range.Words(5) & Para.Range.Words(6)
Next Para
End If
End Sub
There, in the For Each Para ... Next Para loop, I need your help to write some lines that will recognize how many words need to be pulled to extract the full name/names + surname/surnames and nothing else. We end on the first comma, which means excluding the comma after the last surname.
This might do what you want, if I have understood correctly.
If the Number and Dot are NOT part of the paragraph, then to extract the full name, you could use:
Debug.Print Left(Para.Range.Text, InStr(Para.Range.Text, ",") - 1)
If the Number and Dot ARE part of the paragraph, then:
Dim Start As Long, Length As Long
Start = InStr(Para.Range.Text, ".") + 1
Length = InStr(Para.Range.Text, ",") - Start
Debug.Print Trim(Mid(Para.Range.Text, Start, Length))
If you want to split the capitalized portion of the name from the rest, post several samples of the actual strings as TEXT as I am unable to copy/paste your screenshots into Excel.
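Both cases can be folded into one string function that works on the paragraph text. This is a sketch under assumptions: ExtractFullName is a made-up name, the manually typed numbering is assumed to be digits followed by a period, and a tab after the number is treated like a space.

```vba
' Hypothetical helper: returns the text before the first comma,
' minus an optional leading "1." style number
Function ExtractFullName(ByVal paraText As String) As String
    Dim commaPos As Long, dotPos As Long
    Dim candidate As String, prefix As String

    commaPos = InStr(paraText, ",")
    If commaPos = 0 Then Exit Function      ' no comma: return an empty string

    ' Everything before the first comma, with tabs turned into spaces
    candidate = Trim$(Replace(Left$(paraText, commaPos - 1), vbTab, " "))

    ' Strip the leading number only if the text before the first period is all digits
    dotPos = InStr(candidate, ".")
    If dotPos > 1 Then
        prefix = Left$(candidate, dotPos - 1)
        If Not (prefix Like "*[!0-9]*") Then
            candidate = Trim$(Mid$(candidate, dotPos + 1))
        End If
    End If

    ExtractFullName = candidate
End Function
```

Inside the loop it would be called as Debug.Print ExtractFullName(Para.Range.Text).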
This is actually a question of Natural Language Processing (NLP) as a scientific field. You have already solved the tokenization problem, but you are now facing part-of-speech (POS) tagging and the construction of semantic chunks. Recognizing a person's identity has been one of the core problems since the very dawn of the discipline, so there are many libraries for it, but I'm skeptical you'll find anything you can use from VBA that easily. Also, for a comprehensive and robust solution you'll need a proper lexicon of first names and last names (a suitable resource would be a morphosyntactic lexicon/dictionary, extracted only for PName and LName tags).
To solve this, you'll have to do some research on the keywords mentioned above. I have no experience with VBA, but I assume C# and .NET would be the easiest to integrate. Therefore, Stanford CoreNLP for .NET would be the starting point, I reckon: https://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html
Highly relevant lexical source for English:
https://wordnet.princeton.edu/
You can query the WordNet lexicon using WordNet.NET. I've used both, and it's a very good resource and library.
I am trying to create code in VBA that will search through a module, find specific text, then add a string BEFORE that text in the same line. For example, every time it says "yo" in the module, I want it changed to say "Add This yo".
The code below successfully finds instances where it says "yo" in the module, but it doesn't add the text where I want it to. Instead, text is added at the very top of the module (not even inside a sub). How do I get this text to be added before "yo"?
Public Sub Edit()
Dim vb As VBComponent
Dim i As Long
Dim intFoundLine As Integer
Dim strSearchPhrase As String
Set vb = ThisWorkbook.VBProject.VBComponents("Module2")
strSearchPhrase = "yo"
intLinesNr = vb.CodeModule.CountOfLines
For i = 1 To intLinesNr
If vb.CodeModule.Find(strSearchPhrase, i, 1, -1, -1) Then
intFoundLine = i
MsgBox "Found at " & intFoundLine
vb.CodeModule.AddFromString ("Add This")
End If
Next
End Sub
Replace the line with the new text:
vb.CodeModule.ReplaceLine i, "Add This" & vb.CodeModule.Lines(i, 1)
Based on Mathieu Guindon's answer, here is how I would handle all the instances of the search phrase:
Do While vb.CodeModule.Find(strSearchPhrase, i, 1, -1, -1)
vb.CodeModule.ReplaceLine i, "Add This" & vb.CodeModule.Lines(i, 1)
i = i + 1
Loop
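Note that the ReplaceLine calls above put "Add This" at the start of the line. To place it directly before the search phrase instead, the new line text can be built with Replace. A minimal sketch, where InsertBeforePhrase is a made-up helper name:

```vba
' Hypothetical helper: inserts addition directly before every occurrence of phrase
Function InsertBeforePhrase(ByVal lineText As String, ByVal phrase As String, _
                            ByVal addition As String) As String
    InsertBeforePhrase = Replace(lineText, phrase, addition & phrase)
End Function

' Possible use inside the loop from the question:
' vb.CodeModule.ReplaceLine i, _
'     InsertBeforePhrase(vb.CodeModule.Lines(i, 1), strSearchPhrase, "Add This ")
```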
Iterating all lines of a module seems a poor use of the Find method, which is capable of finding text anywhere in a module and takes ByRef arguments that, if the function returns True, will contain the exact location of the found text - that's a great use case for a user-defined Type:
Option Explicit
Private Type CodeStringLocation
StartLine As Long
EndLine As Long
StartColumn As Long
EndColumn As Long
End Type
Sub test()
Dim module As CodeModule
Set module = ThisWorkbook.VBProject.VBComponents("Module1").CodeModule
Dim foundAt As CodeStringLocation
If module.Find("test", foundAt.StartLine, foundAt.StartColumn, foundAt.EndLine, foundAt.EndColumn) Then
'L9C5-L9C9
Debug.Print "L" & foundAt.StartLine & "C" & foundAt.StartColumn & "-L" & foundAt.EndLine & "C" & foundAt.EndColumn
End If
End Sub
Now that you have the in-editor line number you want to rewrite, use CodeModule.ReplaceLine to rewrite it - for example by replacing the Debug.Print statement above with this:
Dim newLine As String
newLine = Replace(module.Lines(foundAt.StartLine, 1), "test", "Renamed")
module.ReplaceLine foundAt.StartLine, newLine
If you need to replace all occurrences of the search text in the module, simply run the search until CodeModule.Find returns False - like this:
Dim foundAt As CodeStringLocation
Do While module.Find("test", foundAt.StartLine, foundAt.StartColumn, foundAt.EndLine, foundAt.EndColumn)
Dim newLine As String
newLine = Replace(module.Lines(foundAt.StartLine, 1), "test", "Renamed")
module.ReplaceLine foundAt.StartLine, newLine
Loop
Key point being that everything but the search text is an output parameter; by hard-coding any of these arguments, you lose the reference to the value they return. If you want to limit the search to a specific scope or range of lines, the right way to do it would be to configure the foundAt values before running the search.
Dim foundAt As CodeStringLocation
foundAt.StartLine = 10
Do While module.Find("test", foundAt.StartLine, foundAt.StartColumn, ...
That way you leverage the actual bidirectional nature of the arguments, without losing the reference to the output values - and without iterating up to 10K lines of code when you don't need to.
Note that this is purely text-based search and takes absolutely zero syntactical considerations: the API will not care if the search string is found in an identifier, a comment, a string literal, or a keyword.
This seems obvious and straightforward, but it's not working. I'm working from within Excel VBA. I get the open Word application and extract text from a bookmarked spot within a table in Word. Then the trouble starts. The resulting string is 5 Chr(32) spaces. But whatever I try, I can't get rid of the spaces. Why are the spaces not being trimmed or replaced?
Dim Wd As Word.Application
Set Wd = GetObject(, "Word.Application")
Dim doc As Word.Document
'Dim r As Word.Range
'Dim p As Word.Paragraph
Dim tb As Word.Table
Set doc = Wd.ActiveDocument
Set tb = doc.Tables(1)
'tb.Select
Dim Place As String
Place = Trim(doc.Bookmarks("County").Range.Text)
'outputs length 5
Debug.Print Len(Place)
'this outputs 32 5 times so I know we have chr(32) and not something else
Dim i As Integer
For i = 1 To Len(Place)
Debug.Print Asc(Mid(Place, i, 1))
Next
'try trim
Place = Trim(Place)
Debug.Print Len(Place)
'still 5 spaces
'try replace
Dim s As String
s = VBA.Replace(Place, Chr(32), "")
Debug.Print Len(Place)
'still 5 spaces
What is happening with my code?
Probably a Unicode space. Consider U+2000 EN QUAD whitespace:
x="W" & chrw(&h2000) & "W"
?x
W W
?asc(mid(x,2,1))
32 <= normalized
?ascw(mid(x,2,1))
8192 <= true character
So examine the character with AscW and replace it using ChrW.
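Along those lines, here is a sketch of a function that checks each character with AscW, turns the Unicode space characters it knows about into a plain space, and then trims. The name NormalizeSpaces is made up, and the list of code points is an assumption about which spaces might occur (no-break space, U+2000 to U+200A, narrow no-break space, medium mathematical space, ideographic space); it is not exhaustive.

```vba
' Hypothetical helper: maps common Unicode space characters to Chr(32), then trims
Function NormalizeSpaces(ByVal s As String) As String
    Dim i As Long, code As Long
    Dim ch As String, result As String
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        code = AscW(ch)                 ' true code point, unlike Asc
        Select Case code
            Case 160, 8192 To 8202, 8239, 8287, 12288
                result = result & " "   ' replace the Unicode space with a plain one
            Case Else
                result = result & ch
        End Select
    Next i
    NormalizeSpaces = Trim$(result)
End Function
```

In the question's code this would replace the Trim call: Place = NormalizeSpaces(doc.Bookmarks("County").Range.Text).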
I tried to make a function that searches for a string in a document and finds the first character in that string that is colored red.
For example, I know that my document contains the string "bread water juice peach wine". Imagine that the bold text is colored red. I want the function to return the integer 19 (the first red character, p).
Function check(stringToCheck As String) As Integer
Dim oRng As Word.Range
Set oRng = ActiveDocument.Content
With oRng.Find
' to ensure that unwanted formats aren't included as criteria
.ClearFormatting
'You don't care what the text is
.Text = stringToCheck
'Loop for each match and set a color
While .Execute
MsgBox (oRng.Text)
For i = 1 To 40
'take the Nth char of the string an check if it's red
'the following msgBox is working
MsgBox (Mid(oRng, i, 1))
If Mid(Orng, i, 1).Font.Color = wdColorRed Then
'the following msgBox is not working which means the error is in the last line.
MsgBox ("made it")
check = i
Exit Function
End If
Next i
Wend
End With
End Function
Every time I try to call the function I get the error "run time error 424 - object required".
I added some message boxes to see where the function is interrupted and added a comment in that place.
What is the problem? How can I fix it?
First things first: Use Option Explicit at the beginning of your module. You'll quickly find that your code has compilation issues.
Do you mean to use oRng or myRange? This should be consistent.
Once you've done that...
Mid(myRange, i, 1) returns a string, not an object, so it has no Font property (hence error 424).
You may want to use If oRng.Characters(i).Font.Color = wdColorRed Then instead.
Here's your code modified that returns correctly:
Function check(stringToCheck As String) As Integer
Dim oRng As Word.Range
Set oRng = ActiveDocument.Content
Dim i As Integer
With oRng.Find
' to ensure that unwanted formats aren't included as criteria
.ClearFormatting
'You don't care what the text is
.Text = stringToCheck
'Loop for each match and set a color
While .Execute
MsgBox (oRng.Text)
For i = 1 To oRng.Characters.Count
'take the Nth character of the match and check whether it is red
MsgBox oRng.Characters(i)
If oRng.Characters(i).Font.Color = wdColorRed Then
MsgBox ("made it")
check = i
Exit Function
End If
Next i
Wend
End With
End Function