Copying all #mentions and #hashtags from column A to Columns B and C in Excel - excel

I have a really large database of tweets. Most of the tweets have multiple #hashtags and #mentions. I want all the #hashtags separated with a space in one column and all the #mentions in another column. I already know how to extract the first occurrence of a #hashtag and a #mention. But I don't know to get them all? Some of the tweets have as much as 8 #hashtags. Manually going through the tweets and copy/pasting the #hashtags and #mentions seem an impossible task for over 5,000 tweets.
Here is an example of what I want. I have Column A and I want a macro that would populate columns B and C. (I'm on Windows &, Excel 2010)
Column A
-----------
Dear #DavidStern, #spurs put a quality team on the floor and should have beat the #heat. Leave #Pop alone. #Spurs a classy organization.
Live broadcast from #Nacho_xtreme: "Papelucho Radio"http://mixlr.com nachoxtreme-radio … #mixlr #pop #dance
"Since You Left" by #EmilNow now playing on KGUP 106.5FM. Listen now on http://www.kgup1065.com  #Pop #Rock
Family Night #battleofthegenerations Dad has the #Monkeys Mom has #DonnieOsman #michaelbuble for me #Dubstep for the boys#Pop for sissy
#McKinzeepowell #m0ore21 I love that the PNW and the Midwest are on the same page!! #Pop
I want Column B to look like This:
Column B
--------
#DavidStern #Pop #Spurs
#mixlr #pop #dance
#Pop #Rock
#battleofthegenerations #Monkeys #DonnieOsman #Dubstep #Pop
#pop
And Column C to look like this:
Column C:
----------
#spurs #heat
#Nacho_xtreme
#EmilNow
#michaelbuble
#McKinzeepowell #m0ore21

Consider using regular expressions.
You can use regular expressions within VBA by adding a reference to Microsoft VBScript Regular Expressions 5.5 from Tools -> References.
Here is a good starting point, with a number of useful links.
Updated
After adding a reference to the Regular Expressions library, put the following function in a VBA module:
Public Function JoinMatches(text As String, start As String)
Dim re As New RegExp, matches As MatchCollection, match As match
re.pattern = start & "\w*"
re.Global = True
Set matches = re.Execute(text)
For Each match In matches
JoinMatches = JoinMatches & " " & match.Value
Next
JoinMatches = Mid(JoinMatches, 2)
End Function
Then, in cell B1 put the following formula (for the hashtags):
=JoinMatches(A1,"#")
In column C1 put the following formula:
=JoinMatches(A1,"#")
Now you can copy just the formulas all the way down.

you could convert text to columns using the other character #, then against for #s and then concatenate the rest of the text back together for column A, if you are not familiar with regular expressions see (#Zev-Spitz)

Related

How can I replace multiple string at once in Excel?

The function I expected
some_function(original_text, "search_text", "replacement_text")
The value of the second & third parameters will be multiple characters. For example. The result will replace the character based on the location of the character at the second & third parameters
some_function("9528", "1234567890", "abcdefghij")
1 -> a
2 -> b
3 -> c
...
8 -> h
9 -> i
0 -> j
The result of some_function will be iebh. The nested SUBSTITUTE function can archive the goal but I hope to compact the complexity.
The way you described your requirement is best written out via REDUCE(), a lambda-related helper function and recently announced to be in production:
=REDUCE("9528",SEQUENCE(10),LAMBDA(x,y,SUBSTITUTE(x,MID("1234567890",y,1),MID("abcdefghij",y,1))))
Needless to say, this would become more vivid when used with cell-references:
Formula in A3:
=REDUCE(A1,SEQUENCE(LEN(B1)),LAMBDA(x,y,SUBSTITUTE(x,MID(B1,y,1),MID(C1,y,1))))
Another, more convoluted way, could be:
=LET(A,9528,B,1234567890,C,"abcdefghij",D,MID(A,SEQUENCE(LEN(A)),1),CONCAT(IFERROR(MID(C,FIND(D,B),1),D)))
Or, as per the sceenshot above:
=LET(A,A1,B,B1,C,C1,D,MID(A,SEQUENCE(LEN(A)),1),CONCAT(IFERROR(MID(C,FIND(D,B),1),D)))
Function Multi_Replace(Original As String, Search_Text As String, Replace_With As String) As String
'intEnd represents the last character being replaced
Dim intEnd As Long: intEnd = WorksheetFunction.Min(Len(Search_Text), Len(Replace_With))
'necessary if Search text and replace text are different lengths;
Dim intChar As Long 'to track which character we're replacing
'Replace each character individually
For intChar = 1 To intEnd
Original = Replace(Original, Mid(Search_Text, intChar, 1), Mid(Replace_With, intChar, 1))
Next
Multi_Replace = Original
End Function
Maybe simpler if you do not have lambda yet: =TEXTJOIN(,,CHAR(96+MID(A1,SEQUENCE(LEN(A1)),1)))
*Note that this will not return 0 as the expected result.
Let's say you have a list of countries in column A and aim to replace all the abbreviations with the corresponding full names. you start with inputting the "Find" and "Replace" items in separate columns (D and E respectively), and then enter this formula in B2:
=XLOOKUP(A2, $D$2:$D$4, $E$2:$E$4, A2)
Translated from the Excel language into the human language, here's what the formula does:
Search for the A2 value (lookup_value) in D2:D4 (lookup_array) and return a match from E2:E4 (return_array). If not found, pull the original value from A2.
Double-click the fill handle to get the formula copied to the below cells, and the result won't keep you waiting:
Since the XLOOKUP function is only available in Excel 365, the above formula won't work in earlier versions. However, you can easily mimic this behavior with a combination of IFERROR or IFNA and VLOOKUP:
=IFNA(VLOOKUP(A2, $D$2:$E$4, 2, FALSE), A2)

Excel: How to find six different combinations of words in string?

I have been working for several days on this and have researched everything looking for this answer. I'd appreciate any help you can give.
In Excel I am searching a string of text in column A:
Bought 1 HD Sep 3 2021 325.0 Call # 2.75
I am detecting the first word (in this case "Bought") and detecting the last word before "#" symbol (in this case "Call").
I am then detecting the price following the "#" symbol (in this case "2.75"). This number will go into column B (header "Open") or column C (header "Close") depending on the combination of words found:
Sold/Put=Close
Sold/Call=Open
Bought/Put=Open
Bought/Call=Close
Sold (by itself)=Open
Sold (by itself)=Close.
Bought 1 HD Sep 3 2021 325.0 Call # 2.75
The combination found in the above string is: "Bought Call". Therefore the number at the end ("2.75"), goes into "Open" column.
Here's another example:
Sold 4 AI Sep 17 2021 50.0 Put # 1.5
The combination found in the above string is: "Sold Put". Therefore the number at the end ("1.5") goes into "Close" column.
I am currently using this formula to determine if the string contains "Sold" and "Call" and get the desired number and it does work:
=IF(AND(
ISNUMBER(SEARCH({"Sold","Call"},A10))),
TRIM(MID(A10,SEARCH("#",A10)+LEN("#"),255))," ")
But, I don't know how to search for all the other possible combinations.
The point behind this is to be able to paste the transaction from the broker and have most of the entry process automated. I'm sure many will benefit from this as I've not found anything like this.
I'd appreciate any help and if possible, an explanation of the formula so I can better learn.
Thanks!
I think you have the right idea, but would just extend the IF statement.
Something like the below might work for you:
=IF(ISNUMBER(SEARCH("Call", $A1)),
IF(ISNUMBER(SEARCH({"Bought","Sold"}, $A1)),
NUMBERVALUE(RIGHT($A1, LEN($A1)-SEARCH("#", $A1))),""),
IF(ISNUMBER(SEARCH({"!!!","!!!","Bought","Sold"}, $A1)),
NUMBERVALUE(RIGHT($A1, LEN($A1)-SEARCH("#", $A1))),""))
Just enter in column B and drag down; columns B through E should fill as needed.
For example:
Note that the search for "!!!" is just random characters, it can be anything that you don't think has a good chance of appearing in the string.
Here/screenshots refer:
(requires Office 365 compatible version Excel)
Main lookup
=LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($J$7:$J$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))
EDIT:
Other Excel versions:
=IFERROR(INDEX($J$7:$J$12,MATCH(1,IF($I$7:$I$12="",MATCH("*"&$H$7:$H$12&"*",B4,0)*ISNUMBER(MATCH("*"&$I$7:$I$12&"*",B4,0)),MATCH("*"&$H$7:$H$12&"*",B4,0)*MATCH("*"&$I$7:$I$12&"*",B4,0)),0)),)
(all that falls away is the 'Let' formula, replacing fn_1 and fn_2 with respective functions in index formula within the let making first equation somewhat longer, but otherwise identical)
Example applications
Have provided 2 examples of how one might customize to insert numeric in one of the columns (the key part to this question is really how to do lookup in first instance, from thereon it's a matter of finetuning/taking appropriate action)...
Assuming calls/buys are "long" position and strike price go in first col (here, D), and puts/sales are "short" position with strike price going in 2nd col (here, E):
Long - insert strike price col D
=IF(LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))=1,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
EDIT
Other Excel versions:
=IF(IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",MATCH("*"&$H$7:$H$12&"*",B4,0)*ISNUMBER(MATCH("*"&$I$7:$I$12&"*",B4,0)),MATCH("*"&$H$7:$H$12&"*",B4,0)*MATCH("*"&$I$7:$I$12&"*",B4,0)),0)),)=1,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
Short - insert strike price col E
=IF(LET(fn_1,MATCH("*"&$H$7:$H$12&"*",B4,0),fn_2,MATCH("*"&$I$7:$I$12&"*",B4,0),IFERROR(INDEX($K$7:$K$12,MATCH(1,IF($I$7:$I$12="",fn_1*ISNUMBER(fn_2),fn_1*fn_2),0)),))=2,MID(SUBSTITUTE(B4," ",""),SEARCH("#",SUBSTITUTE(B4," ",""))+1,LEN(SUBSTITUTE(B4," ",""))),"")
EDIT
Other Excel versions:
Follow same routine in previous Edits (remove Let, replace fn_1 & fn_2 with respective formulae...)
Note similarity in all 3 equations above: 2nd and 3rd contain 1st (effectively they just wrap a big old 'if' statement around 1st, use lookup_2 col (here, col K), and use mid/search to extract rate after the hashtag.
Assumes you don't have other hashtags in the sentence..
Customize as required.

How do I sum data based on a PART of the headers name?

Say I have columns
/670 - White | /650 - black | /680 - Red | /800 - Whitest
These have data in their rows. Basically, I want to SUM their values together if their headers contain my desired string.
For modularity's sake, I wanted to merely specify to sum /670, /650, and /680 without having to mention the rest of the header text.
So, something like =SUMIF(a1:c1; "/NUM & /NUM & /NUM"; a2:c2)
That doesn't work, and honestly I don't know what i should be looking for.
Additional stuff:
I'm trying to think of the answer myself, is it possible to mention the header text as condition for ifs? Like: if A2="/650 - Black" then proceed to sum the next header. Is this possible?
Possibility it would not involve VBA, a draggable formula would be preferable!
At this point, I may as well request a version which handles the complete header name rather than just a part of it as I believe it to be difficult for formula code alone.
Thanks for having a look!
Let me know if I need to elaborate.
EDIT: In regards to data samples, any positive number will do actually, damn shame stack overflow doesn't support table markdown. Anyway, for example then..:
+-------------+-------------+-------------+-------------+-------------+
| A | B | C | D | E |
+---+-------------+-------------+-------------+-------------+-------------+
| 1 |/650 - Black |/670 - White |/800 - White |/680 - Red |/650 - Black |
+---+-------------+-------------+-------------+-------------+-------------+
| 2 | 250 | 400 | 100 | 300 | 125 |
+---+-------------+-------------+-------------+-------------+-------------+
I should have clarified:
The number range for these headers would go from /100 - /9999 and no more than that.
EDIT:
Progress so far:
https://docs.google.com/spreadsheets/d/1GiJKFcPWzG5bDsNt93eG7WS_M5uuVk9cvkt2VGSbpxY/edit?usp=sharing
Formula:
=SUMPRODUCT((A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($H$1)=4,$H$1&"",$H$1&" ")))+(A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($I$1)=4,$I$1&"",$I$1&" ")))+(A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" "))))
Apparently, each MID function is returning false with each F9 calculation.
EDIT EDIT:
Okay! I found my issue, it's the /being read when you ALSO mentioned that it wasn't required. Man, I should stop skimming!
Final Edit:
=SUMPRODUCT((RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match5)=4,Match5&"",Match5&" ")))+(RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match6)=4,Match6&"",Match6&" ")))+(RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match7)=4,Match7&"",Match7&" ")))
The idea is that Header and RETURNSUM will become match criteria like the matches written above, that way it would be easier to punch new criterion into the search table. As of the moment, it doesn't support multiple rows/dragging.
I have knocked up a couple of formulas that will achieve what you are looking for. For ease I have made the search input require the number only as pressing / does not automatically type into the formula bar. I apologise for the length of the answer, I got a little carried away with the explanation.
I have set this up for 3 criteria located in J1, K1 and L1.
Here is the output I achieved:
Formula 1 - SUMPRODUCT():
=SUMPRODUCT((A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" ")))+(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($K$1)=4,$K$1&"",$K$1&" ")))+(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($L$1)=4,$L$1&"",$L$1&" "))))
Sumproduct(array1,[array2]) behaves as an array formula without needed to be entered as one. Array formulas break down ranges and calculate them cell by cell (in this example we are using single rows so the formula will assess columns seperately).
(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" ")))
Essentially I have broken the Sumproduct() formula into 3 identical parts - 1 for each search condition. (A4:G4*: Now, as the formula behaves like an array, we will multiply each individual cell by either 1 or 0 and add the results together.
1 is produced when the next part of the formula is true and 0 for when it is false (default numeric values for TRUE/FALSE).
(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" "))
MID(text,start_num,num_chars) is being used here to assess the 4 digits after the "/" and see whether they match with the number in the 3 cells that we are searching from (in this case the first one: J1). Again, as SUMPRODUCT() works very much like an array formula, each cell in the range will be assessed individually.
I have then used the IF(logical_test,[value_if_true],[value_if_false]) to check the length of the number that I am searching. As we are searching for a 4 digit text string, if the number is 4 digits then add nothing ("") to force it to a text string and if it is not (as it will have to be 3 digits) add 1 space to the end (" ") again forcing it to become a text string.
The formula will then perform the calculation like so:
The MID() formula produces the array: {"650 ","670 ","800 ","680 ","977 ","9999","143 "}. This combined with the first search produces {TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE} which when multiplied by A4:G4
(remember 0 for false and 1 for true) produces this array: {250,0,0,0,0,0,0} essentially pulling the desired result ready to be summed together.
Formula 2: =SUM(IF(Array)): [This formula does not work for 3 digit numbers as they will exist within the 4 digit numbers! I have included it for educational purposes only]
=SUM(IF(ISNUMBER(SEARCH($J$1,$A$1:$G$1)),A8:G8),IF(ISNUMBER(SEARCH($K$1,$A$1:$G$1)),A8:G8),IF(ISNUMBER(SEARCH($L$1,$A$1:$G$1)),A8:G8))
The formula will need to be entered as an array (once copy and pasted while still in the formula bar hit CTRL+SHIFT+ENTER)
This formula works in a similar way, SUM() will add together the array values produced where IF(ISNUMBER(SEARCH() columns match the result column.
SEARCH() will return a number when it finds the exact characters in a cell which represents it's position in number of characters. By using ISNUMBER() I am avoiding having to do the whole MID() and IF(LEN()=4,""," ") I used in the previous formula as TRUE/FALSE will be produced when a match is found regardless of it's position or cell formatting.
As previously mentioned, this poses a problem as 999 can be found within 9999 etc.
The resulting array for the first part is: {250,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE} (if you would like to see the array you can highlight that part of the formula and calculate with F9 but be sure to highlight the exact brackets for that part of the formula).
I hope I have explained this well, feel free to ask any questions about stuff that you don't understand. It is good to see people keen to learn and not just fishing for a fast answer. I would be more than happy to help and explain in more depth.
I start this solution with the names in an array, you can read the header names into an array with not too much difficulty.
Sub test()
Dim myArray(1 To 4) As String
myArray(1) = "/670 - White"
myArray(2) = "/650 - black"
myArray(3) = "/680 - Red"
myArray(4) = "/800 - Whitest"
For Each ArrayValue In myArray
'Find position of last character
endposition = InStr(1, ArrayValue, " - ", vbTextCompare)
'Grab the number section from the string, based on starting and ending positions
stringvalue = Mid(ArrayValue, 2, endposition - 2)
'Convert to number
NumberValue = CLng(stringvalue)
'Add to total
Total = Total + NumberValue
Next ArrayValue
'Print total
Debug.Print Total
End Sub
This will print the answer to the debug window.

Microsoft Excel Conditional String Output Formula

I'm trying to create a formula that takes in data formatted thusly:
Ex.
A1 Excel Data --> (Useless Data - 'example')
Then, Grab the data after the 'hyphen' -> MID(A1,FIND("-",A1)+1,1)
Finally, look at the first letter of the selected data and sort it into four Categories -> IF(MID(A1,FIND("-",A1)+1,1)="e""E","Example"
Notes:
Needs to search for both upper and lower-case lettering.
If none of the statements are 'true' it should default to category 'other'
Ideally a cool way to remove formatting such as '(', ' ', ', ' would be cool.
Function so Far:
=IF(MID(A1,FIND("-",A1)+1,1)="a""A","Amex",IF(MID(A1,FIND("-",A1)+1,1)="C""c","Citi Bank",IF(MID(A1,FIND("-",A1)+1,1)="W""w","Wells Fargo", "Other")))
This function will search for both upper and lower case, and will default to other.
=IF(ISNUMBER(SEARCH("a",MID(A1,FIND("-",A1)+1,1))),"Amex",IF(ISNUMBER(SEARCH("c",MID(A1,FIND("-",A1)+1,1))),"Citi Bank",IF(ISNUMBER(SEARCH("w",MID(A1,FIND("-",A1)+1,1))),"Wells Fargo", "Other")))
However, please strongly consider putting a list together that you can reference using an INDEX and MATCH. Consider the following in columns D and E:
D E
-------
a Amex
c Citi Bank
w Wells Fargo
Then just use this formula:
=IFERROR(INDEX(E:E,MATCH(LOWER(MID(A1,FIND("-",A1)+1,1)),D:D,0)),"Other")
If you do need to remove formatting you can look into the SUBSTITUTE formula

force formatting imported text file in excel

I've got text file containing MCQ quizzes. Currently I have to edit all the quiz questions with delimiters (e.g TABS) before importing them into excel.
I need an automated way to format the imported file into these columns:
QuestionId, ExamType, Year, Subject, Question, Answer1, Answer2, Answer3, Answer4, Answer5, CorrectAnswer, Image
without having to manually edit the text first.
Here's an example of the text I'm currently importing into excel.
1.Government as an art of governing refers to the process of A.ruling people in the society b.establishing political parties C. providing free education D. acquiring social skills, 2. An essential feature of a State is A. availability of mineral resources B. developed infrastructure. C an organized system of laws D. developed markets.
I'd like to fit example 1 and 2 into the columns above. I have zipped up what I've been doing so that you can have a look. I also included the raw quiz data so that you can have an idea what it is that i'm trying to force format
Try This:
Open up your excel document and hit Alt+F11 to bring up the VBA editor, insert a new module (if one doesn't already exist), open it up, and paste in the following code (they're custom user defined functions that we'll use in a little bit)
Function LEFTDELIMIT(ByVal text As String, ByVal delimiter As String)
Dim position As Integer
Dim leftText As String
position = InStr(1, text, delimiter, vbTextCompare) - 1
leftText = Left(text, position)
LEFTDELIMIT = leftText
End Function
Function RIGHTDELIMIT(text, delimiter)
Dim position As Integer
Dim rightText As String
position = Len(text) - Len(delimiter) - InStr(1, text, delimiter, vbTextCompare) + 1
rightText = Right(text, position)
RIGHTDELIMIT = rightText
End Function
Function NOERROR(text)
If IsError(text) Then
NOERROR = ""
Else
NOERROR = text
End If
End Function
I'm guessing at this point that all of the text for your quizzes is in A1. Go ahead and delimit that cell by commas as I specified earlier to get each question in its own column. Since we want each of those questions to occupy its own row, highlight all of row 1 and copy and then paste special into A2 and select the option to transpose the values. Now each question has it's own row. Now what we'd like to do it give each answer choice its own column. We can use the custom functions from before which allow you to get all the text to the left or right of a custom delimiter.
If the value of A2 is
1.Government as an art of governing refers to the process of A.ruling people in the society b.establishing political parties C. providing free education D. acquiring social skills
Then we'll fill out the other columns with the following code:
B2: =LEFTDELIMIT(A2,"A.")
C2: =NOERROR(PROPER(TRIM(LEFTDELIMIT(RIGHTDELIMIT(A2,"A."),"B."))))
D2: =NOERROR(PROPER(TRIM(LEFTDELIMIT(RIGHTDELIMIT(A2,"B."),"C."))))
E2: =NOERROR(PROPER(TRIM(LEFTDELIMIT(RIGHTDELIMIT(A2,"C."),"D."))))
F2: =NOERROR(PROPER(TRIM(LEFTDELIMIT(RIGHTDELIMIT(A2,"C."),"D."))))
G2: =NOERROR(PROPER(TRIM(LEFTDELIMIT(RIGHTDELIMIT(A2,"D."),"E."))))
This is making certain assumptions about how the incoming data is formatted. If this doesn't work, please show all of you work in an excel file that you can upload and distribute through any online file sharing host to get a better look at what particular errors might be tripping you up

Resources