Force formatting an imported text file in Excel

I've got a text file containing MCQ quizzes. Currently I have to edit all the quiz questions with delimiters (e.g. tabs) before importing them into Excel.
I need an automated way to format the imported file into these columns:
QuestionId, ExamType, Year, Subject, Question, Answer1, Answer2, Answer3, Answer4, Answer5, CorrectAnswer, Image
without having to manually edit the text first.
Here's an example of the text I'm currently importing into excel.
1.Government as an art of governing refers to the process of A.ruling people in the society b.establishing political parties C. providing free education D. acquiring social skills, 2. An essential feature of a State is A. availability of mineral resources B. developed infrastructure. C an organized system of laws D. developed markets.
I'd like to fit examples 1 and 2 into the columns above. I have zipped up what I've been doing so that you can have a look. I also included the raw quiz data so that you can get an idea of what I'm trying to force-format.

Try This:
Open up your excel document and hit Alt+F11 to bring up the VBA editor, insert a new module (if one doesn't already exist), open it up, and paste in the following code (they're custom user defined functions that we'll use in a little bit)
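Function LEFTDELIMIT(ByVal text As String, ByVal delimiter As String)
    ' Returns the text to the left of the first (case-insensitive) match of delimiter.
    ' If delimiter is not found, Left() fails and the cell shows #VALUE!.
    Dim position As Integer
    Dim leftText As String
    position = InStr(1, text, delimiter, vbTextCompare) - 1
    leftText = Left(text, position)
    LEFTDELIMIT = leftText
End Function

Function RIGHTDELIMIT(text, delimiter)
    ' Returns the text to the right of the first (case-insensitive) match of delimiter.
    Dim position As Integer
    Dim rightText As String
    position = Len(text) - Len(delimiter) - InStr(1, text, delimiter, vbTextCompare) + 1
    rightText = Right(text, position)
    RIGHTDELIMIT = rightText
End Function

Function NOERROR(text)
    ' Replaces an error value with an empty string; passes anything else through.
    If IsError(text) Then
        NOERROR = ""
    Else
        NOERROR = text
    End If
End Function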
I'm guessing at this point that all of the text for your quizzes is in A1. Go ahead and delimit that cell by commas, as I specified earlier, to get each question in its own column. Since we want each question to occupy its own row, highlight all of row 1, copy it, then Paste Special into A2 and select the option to transpose the values. Now each question has its own row. Next, we'd like to give each answer choice its own column. We can use the custom functions from before, which return all the text to the left or right of a custom delimiter.
If the value of A2 is
1.Government as an art of governing refers to the process of A.ruling people in the society b.establishing political parties C. providing free education D. acquiring social skills
Then we'll fill out the other columns with the following code:
B2: =LEFTDELIMIT(A2,"A.")
C2: =NOERROR(PROPER(TRIM(LEFTDELIMIT(RIGHTDELIMIT(A2,"A."),"B."))))
D2: =NOERROR(PROPER(TRIM(LEFTDELIMIT(RIGHTDELIMIT(A2,"B."),"C."))))
E2: =NOERROR(PROPER(TRIM(LEFTDELIMIT(RIGHTDELIMIT(A2,"C."),"D."))))
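F2: =NOERROR(PROPER(TRIM(LEFTDELIMIT(RIGHTDELIMIT(A2,"D."),"E."))))
G2: =NOERROR(PROPER(TRIM(RIGHTDELIMIT(A2,"E."))))
When a question has only four choices, as in the example, there is no "E." to stop at, so F2 comes back blank and G2 is not meaningful. In that case use =NOERROR(PROPER(TRIM(RIGHTDELIMIT(A2,"D.")))) in F2 to take everything after "D.".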
This makes certain assumptions about how the incoming data is formatted. If this doesn't work, please show all of your work in an Excel file uploaded to any online file sharing host, so we can get a better look at what particular errors might be tripping you up.
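To check the splitting logic outside Excel, here is a rough Python sketch of the same idea: cut the question at each answer label, matching case-insensitively as vbTextCompare does. The function name split_question and the label list are assumptions made for illustration, and unlike the worksheet formulas it takes the rest of the text when the next label is missing.

```python
def split_question(text, labels=("A.", "B.", "C.", "D.", "E.")):
    """Split one question into (stem, [answers]) at the answer labels.

    Hypothetical helper: labels are matched case-insensitively, and a
    missing label yields an empty answer.
    """
    lowered = text.lower()
    first = lowered.find(labels[0].lower())
    stem = text if first < 0 else text[:first]
    answers = []
    for i, label in enumerate(labels):
        start = lowered.find(label.lower())
        if start < 0:
            answers.append("")  # label not present in this question
            continue
        start += len(label)
        end = len(text)
        if i + 1 < len(labels):
            # Stop at the next label if there is one, else run to the end
            nxt = lowered.find(labels[i + 1].lower(), start)
            if nxt >= 0:
                end = nxt
        answers.append(text[start:end].strip())
    return stem.strip(), answers


question = ("1.Government as an art of governing refers to the process of "
            "A.ruling people in the society b.establishing political parties "
            "C. providing free education D. acquiring social skills")
stem, answers = split_question(question)
print(stem)
print(answers)
```

Like the formulas, this depends on every label being written with a dot; a choice typed as "C an organized system of laws" (no dot) would not be found.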

Related

Add a space after colored text

I'm Using Microsoft Excel 2013.
I have a lot of data that I need to separate in Excel that is in a single cell. The "Text to Columns" feature works great except for one snag.
In a single cell, I have First Name, Last Name & Email address. The last name and email addresses do not have a space between them, but the color of the names are different than the email.
Example (all caps represent colored names RGB (1, 91, 167), lowercase is the email which is just standard black text):
JOHN DOEjohndoe@acmerockets.com
So I need to put a space after DOE so that it reads:
JOHN DOE johndoe@acmerockets.com
I have about 20k rows to go through so any tips would be appreciated. I just need to get a space or something in between that last name and email so I can use the "Text to Columns" feature and split those up.
Not a complete answer, but I would do it this way:
Step 1 to get rid of the formatting:
Copy all text that you have to the notepad
Then copy-paste text from Notepad to excel as text
I think this should remove all the formatting issues
Step 2 is to use VBA to grab the emails. I assume that all your emails are lowercase, so a regular expression like this should do the trick:
([a-z0-9\-_+.]+@([a-z0-9\-_+]+\.)*[a-z0-9\-_+]+\.[a-z0-9]{2,6})
Step 3 is to exclude emails that you extracted from Step2 from your main text. Something like this via simple Excel function:
=TRIM(SUBSTITUTE(FULLTEXT,EMAIL,""))
Since you removed all the formatting in Step 1, you can apply it back when you're done.
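Steps 2 and 3 can be tried out in a few lines of Python before writing any VBA. This is only a sketch: it assumes the names are upper case, the address is lower case and uses a normal @ sign, and the pattern and the function name split_name_email are illustrative rather than a complete email matcher.

```python
import re

# Lower-case-only pattern, so the match cannot start inside the upper-case name
EMAIL = re.compile(r"[a-z0-9._+-]+@(?:[a-z0-9-]+\.)+[a-z0-9]{2,6}")


def split_name_email(full_text):
    """Return (name, email); email is '' when no address is found."""
    match = EMAIL.search(full_text)
    if match is None:
        return full_text.strip(), ""
    email = match.group(0)
    # Same idea as =TRIM(SUBSTITUTE(FULLTEXT,EMAIL,""))
    name = full_text.replace(email, "").strip()
    return name, email


print(split_name_email("JOHN DOEjohndoe@acmerockets.com"))
```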
You can knock this out pretty quickly by taking advantage of how Font returns the Color for a set of characters that do not all have the same color: it returns Null! Knowing this, you can iterate through the characters two at a time and find the first spot where it returns Null. You now know that the color shift is there and can spit out the pieces using Mid.
Code makes use of this behavior and IsNull to iterate through a fixed Range. Define the Range however you want to get the cells. By default it spits them out in the neighboring two columns with Offset.
Sub FindChangeInColor()
    Dim rng_cell As Range
    Dim i As Integer
    For Each rng_cell In Range("B2:B4")
        For i = 1 To Len(rng_cell.Text) - 1
            ' Font.Color is Null when the two characters differ in color
            If IsNull(rng_cell.Characters(i, 2).Font.Color) Then
                rng_cell.Offset(0, 1) = Mid(rng_cell, 1, i)
                rng_cell.Offset(0, 2) = Mid(rng_cell, i + 1)
                Exit For ' stop at the first color change
            End If
        Next
    Next
End Sub
The nice thing about this approach is that the actual colors involved don't matter. You also don't have to manually search for a switch, although that would have been the next step.
Also your neighboring cells will be blank if no color change was found, so it's decently robust against bad inputs.
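The boundary search itself is easy to reason about away from Excel. In this Python sketch the per-character colors are assumed to have been read into a list already (a hypothetical stand-in for what Characters(i, 2).Font.Color reports), and the split happens at the first pair of neighbours that differ.

```python
def split_at_color_change(text, colors):
    """Split text where two adjacent characters first differ in color.

    colors is a list with one entry per character (assumed input).
    Returns ('', '') when the whole text has a single color.
    """
    for i in range(len(text) - 1):
        if colors[i] != colors[i + 1]:
            return text[: i + 1], text[i + 1 :]
    return "", ""


sample = "JOHN DOEjohndoe@acmerockets.com"
blue, black = (1, 91, 167), (0, 0, 0)
sample_colors = [blue] * 8 + [black] * (len(sample) - 8)
print(split_at_color_change(sample, sample_colors))
```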
Edit adds ability to change original string if you want that instead:
Sub FindChangeInColorAndAddChar()
    Dim rng_cell As Range
    Dim i As Integer
    For Each rng_cell In Range("B2:B4")
        For i = 1 To Len(rng_cell.Text) - 1
            If IsNull(rng_cell.Characters(i, 2).Font.Color) Then
                ' Insert a separator at the color change
                rng_cell = Mid(rng_cell, 1, i) & "|" & Mid(rng_cell, i + 1)
                Exit For ' the cell text has changed, so stop scanning it
            End If
        Next
    Next
End Sub
This uses the same input as above.

Prevent Partial Duplicates in Excel

I have a worksheet with products where the people in my office can add new positions. The problem we're running into is that the products have specifications but not everybody puts them in (or inputs them wrong).
Example:
"cool product 14C"
Is there a way to set up the Data Validation option so that it warns me in case I put "very cool product 14B" or anything that contains an already existing string of characters (say, longer than 4), like "cool produKt 14C" but also "good product 15" and so on?
I know that I can prevent 100% matches using COUNTIF and spot words that start/end in the same way using LEFT/RIGHT but I need to spot partial matches within the entries as well.
Thanks a lot!
If you want to cover typos, word wraps, figure permutations etc., maybe a SOUNDEX algorithm would suit your problem. Here's an implementation for Excel ...
So if you insert this as a user defined function, and create a column =SOUNDEX(A1) for each product row, upon entry of a new product name you can filter for all product rows with same SOUNDEX value. You can further automate this by letting user enter the new name into a dialog form first, do the validation, present them a Combo Box dropdown with possible duplicates, etc. etc. etc.
edit:
small function to find parts of strings terminated by blanks in a range (in answer to your comment)
Function FindSplit(Arg As Range, LookRange As Range) As String
    Dim LookFor() As String, LookCell As Range
    Dim Idx As Long
    LookFor = Split(Arg)
    FindSplit = ""
    For Idx = 0 To UBound(LookFor)
        For Each LookCell In LookRange.Cells
            If InStr(1, LookCell, LookFor(Idx)) <> 0 Then
                If FindSplit <> "" Then FindSplit = FindSplit & ", "
                FindSplit = FindSplit & LookFor(Idx) & ":" & LookCell.Row
            End If
        Next LookCell
    Next Idx
    If FindSplit = "" Then FindSplit = "Cool entry!"
End Function
This is a bit crude ... but what it does is the following
split a single cell argument in pieces and put it into an array --> split()
process each piece --> For Idx = ...
search another range for strings that contain the piece --> For Each ...
add piece and row number of cell where it was found into a result string
You can enter/copy this as a formula next to each cell input and know immediately if you've done a cool input or not.
Value of cell D8 is [asd:3, wer:4]
Note the use of absolute addressing in the start of lookup range; this way you can copy the formula well down.
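The steps listed above translate almost line for line into Python, which can be handy for testing the matching rule on sample data. In this sketch the lookup range is modelled as a dict of row number to cell text, the function name, the sample data and the skip_row parameter are assumptions, and empty pieces (from double spaces) are skipped rather than matched.

```python
def find_split(arg, look_range, skip_row=None):
    """Report which space-separated pieces of arg occur in other cells.

    look_range maps row number -> cell text (assumed stand-in for a Range).
    skip_row lets the caller ignore the row that holds arg itself.
    """
    hits = []
    for piece in arg.split(" "):
        if not piece:
            continue  # ignore empty pieces produced by repeated spaces
        for row, value in look_range.items():
            if row == skip_row:
                continue
            if piece in value:
                hits.append(f"{piece}:{row}")
    return ", ".join(hits) if hits else "Cool entry!"


existing = {3: "asd fgh", 4: "qwer wer"}  # made-up sample data
print(find_split("asd wer", existing))
print(find_split("xyz", existing))
```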
edit 17-Mar-2015
further to comment Joanna 17-Mar-2015, if the search argument is part of the range you're scanning, e.g. =FINDSPLIT(C5; C1:C12) you want to make sure that the If Instr(...) doesn't hit if LookCell and LookFor(Idx) are really the same cell as this would create a false positive. So you would rewrite the statement to
...
...
If InStr(1, LookCell, LookFor(Idx)) <> 0 And _
Not (LookCell.Row = Arg.Row And LookCell.Column = Arg.Column) _
Then
hint
Do not use a complete column (e.g. $C:$C) as the second argument as the function tends to become very slow without further precautions

Calculate alphanumeric string to an integer in Excel

I have an issue that I've not been able to figure out even with many of the ideas presented in other posts. My data comes in Excel and here are examples of each manner that any given cell might have the data:
4days 4hrs 41mins 29seconds
23hrs 43mins 4seconds
2hrs 2mins
52mins 16seconds
The end result would be to calculate the total minutes while allowing seconds to be ignored, so that the previous values would end up as follows:
6041
52
1423
122
Would anyone have an idea how to go about that?
Thanks for the assistance!
Bit tedious (and assumes units are always plural - also produces results in different order to example) but, with formulae only, if your data is in column A, in B1 and copied down:
="="&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"days","*1440+"),"hrs","*60+"),"mins","*1+"),"seconds","*0")," ","")&0
then Copy B and Paste Special values into C and apply Text to Columns to C with Tab as the delimiter.
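To see why the trailing &0 is needed and what Excel ends up evaluating, the same substitutions can be replayed in Python. This is a sketch only: the function names are made up, and a tiny sum-of-products evaluator stands in for Excel re-evaluating the pasted text.

```python
def to_expression(text):
    """Apply the same substitutions as the worksheet formula."""
    expr = text
    for unit, factor in (("days", "*1440+"), ("hrs", "*60+"),
                         ("mins", "*1+"), ("seconds", "*0")):
        expr = expr.replace(unit, factor)
    # Remove spaces and append 0 so a trailing '+' still has an operand
    return expr.replace(" ", "") + "0"


def evaluate(expr):
    """Evaluate a string made only of integers joined by '*' and '+'."""
    total = 0
    for term in expr.split("+"):
        product = 1
        for factor in term.split("*"):
            product *= int(factor)
        total += product
    return total


for sample in ("4days 4hrs 41mins 29seconds", "23hrs 43mins 4seconds",
               "2hrs 2mins", "52mins 16seconds"):
    print(to_expression(sample), "=", evaluate(to_expression(sample)))
```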
This array formula** should also work:
=SUM(IFERROR(0+MID(REPT(" ",31)&SUBSTITUTE(A1&"dayhrminsecond"," ",REPT(" ",31)),FIND({"day","hr","min","second"},REPT(" ",31)&SUBSTITUTE(A1&"dayhrminsecond"," ",REPT(" ",31)))-31,31),0)*{1440,60,1,0})
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
The easiest option is probably VBA with a regular expression. You can then easily find each of the fields, and do the maths.
If you want to stick to "pure" Excel, then it seems the only option is to use SEARCH or FIND to find the position of each of the "days", "hrs", "mins" in the text (you may have to check if they're always plural). Then use MID with the position found above to extract the different components. See http://office.microsoft.com/en-gb/excel-help/split-text-among-columns-by-using-functions-HA010102341.aspx for similar examples.
But there's quite a bit of work to handle the cases where some components are missing, so either you'll use quite a few cells or you'll end up with a very complex formula...
Here is a User Defined Function, written in VBA, which takes your string as the argument and returns the number of minutes. Only the first characters of the time interval names are checked (e.g. d, h, m) as this seems to provide sufficient discrimination.
To enter this User Defined Function (UDF), press Alt+F11 to open the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.
To use this User Defined Function (UDF), enter a formula like
=SumMinutes(A1)
in some cell.
Option Explicit

Function SumMinutes(S As String) As Long
    Dim RE As Object, MC As Object
    Dim lMins As Long
    Dim I As Long
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Pattern = "(\d+)(?=\s*d)|(\d+)(?=\s*h)|(\d+)(?=\s*m)"
        .Global = True
        .ignorecase = True
        If .test(S) = True Then
            Set MC = .Execute(S)
            For I = 0 To MC.Count - 1
                With MC(I)
                    lMins = lMins + _
                        .submatches(0) * 1440 + _
                        .submatches(1) * 60 + _
                        .submatches(2)
                End With
            Next I
        End If
    End With
    SumMinutes = lMins
End Function
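For anyone who wants to sanity-check the pattern before pasting the UDF into a workbook, the same regular expression can be exercised in Python. This is a sketch with the same assumption as the VBA (only the first letter of each unit is checked); the function name sum_minutes is illustrative.

```python
import re

# One alternative per unit; each number is matched only if the unit follows it
PATTERN = re.compile(r"(\d+)(?=\s*d)|(\d+)(?=\s*h)|(\d+)(?=\s*m)", re.IGNORECASE)


def sum_minutes(s):
    """Total minutes in strings like '4days 4hrs 41mins 29seconds'."""
    total = 0
    for match in PATTERN.finditer(s):
        # Groups that did not take part in the match are None; count them as 0
        days, hours, mins = (int(g) if g else 0 for g in match.groups())
        total += days * 1440 + hours * 60 + mins
    return total


for sample in ("4days 4hrs 41mins 29seconds", "23hrs 43mins 4seconds",
               "2hrs 2mins", "52mins 16seconds"):
    print(sample, "->", sum_minutes(sample))
```

Seconds are ignored because a number followed by "s" matches none of the three alternatives.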

Copying all @mentions and #hashtags from column A to Columns B and C in Excel

I have a really large database of tweets. Most of the tweets have multiple #hashtags and @mentions. I want all the #hashtags separated with a space in one column and all the @mentions in another column. I already know how to extract the first occurrence of a #hashtag and an @mention, but I don't know how to get them all. Some of the tweets have as many as 8 #hashtags. Manually going through the tweets and copy/pasting the #hashtags and @mentions seems an impossible task for over 5,000 tweets.
Here is an example of what I want. I have Column A and I want a macro that would populate columns B and C. (I'm on Windows &, Excel 2010)
Column A
-----------
Dear #DavidStern, @spurs put a quality team on the floor and should have beat the @heat. Leave #Pop alone. #Spurs a classy organization.
Live broadcast from @Nacho_xtreme: "Papelucho Radio"http://mixlr.com nachoxtreme-radio … #mixlr #pop #dance
"Since You Left" by @EmilNow now playing on KGUP 106.5FM. Listen now on http://www.kgup1065.com  #Pop #Rock
Family Night #battleofthegenerations Dad has the #Monkeys Mom has #DonnieOsman @michaelbuble for me #Dubstep for the boys#Pop for sissy
@McKinzeepowell @m0ore21 I love that the PNW and the Midwest are on the same page!! #Pop
I want Column B to look like This:
Column B
--------
#DavidStern #Pop #Spurs
#mixlr #pop #dance
#Pop #Rock
#battleofthegenerations #Monkeys #DonnieOsman #Dubstep #Pop
#pop
And Column C to look like this:
Column C:
----------
@spurs @heat
@Nacho_xtreme
@EmilNow
@michaelbuble
@McKinzeepowell @m0ore21
Consider using regular expressions.
You can use regular expressions within VBA by adding a reference to Microsoft VBScript Regular Expressions 5.5 from Tools -> References.
Here is a good starting point, with a number of useful links.
Updated
After adding a reference to the Regular Expressions library, put the following function in a VBA module:
Public Function JoinMatches(text As String, start As String)
    Dim re As New RegExp, matches As MatchCollection, match As Match
    re.Pattern = start & "\w*"
    re.Global = True
    Set matches = re.Execute(text)
    For Each match In matches
        JoinMatches = JoinMatches & " " & match.Value
    Next
    JoinMatches = Mid(JoinMatches, 2)
End Function
Then, in cell B1 put the following formula (for the hashtags):
=JoinMatches(A1,"#")
In cell C1 put the following formula (for the mentions):
=JoinMatches(A1,"@")
Now you can copy just the formulas all the way down.
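The same pattern can be tried quickly in Python to see what it will and won't pick up. This is a sketch: join_matches is an illustrative name, the start character is escaped as a precaution, and the sample tweet assumes mentions are written with @.

```python
import re


def join_matches(text, start):
    """Join every token that begins with start ('#' or '@') with spaces."""
    # start followed by any run of word characters, as in the VBA pattern
    return " ".join(re.findall(re.escape(start) + r"\w*", text))


tweet = ("Dear #DavidStern, @spurs put a quality team on the floor and should "
         "have beat the @heat. Leave #Pop alone. #Spurs a classy organization.")
print(join_matches(tweet, "#"))
print(join_matches(tweet, "@"))
```

Because \w* can match nothing, a lone "#" or "@" in the text is also returned as a match, in both the VBA and this sketch.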
You could also use Text to Columns with the Other delimiter set to @, then again for #, and then concatenate the rest of the text back together for column A. If you are not familiar with regular expressions, see the answer by @Zev-Spitz.

Removing tags from formatted text in Excel cells

Walk with me for a moment.
I have built an Access application to manage data for an internal project at my company. One of the functions of this application is to query the database, output the query results to an Excel spreadsheet, and then format the spreadsheet to spec.
One of the cells of the output is a large amount of text from a Rich Text Memo field in the database. When the rich text is sent to Excel it carries with it HTML tags indicating bold or italic, so for the output I have to add the formatting and remove the tags.
Here is an example of the text I need to format (this text is in a single cell):
For each participant, record 1 effort per lesson delivered
• Time Spent = # minutes spent on lesson
<strong>OR</strong>
For each participant, record 1 effort per month
• Time Spent = total # minutes spent on lessons that month
<strong>Note:</strong> Recording 1 effort per lesson is recommended but not required
<strong>Note:</strong> Use groups function in ABC when appropriate (see <u>Working With Groups</u> in ABC document library on the ABC portal)
I have three neat little recursive functions for formatting the text; here is the bolding function:
Function BoldCharacters(rng As Range, Optional ByVal chrStart As Long)
    'This will find all the "<strong></strong>" tags and bold the text in between.
    Dim tagL As Integer
    tagL = 8
    rng.Select
    If chrStart = 0 Then chrStart = 1
    b1 = InStr(chrStart, ActiveCell.Value, "<strong>") + tagL
    If b1 = tagL Then Exit Function
    b2 = InStr(b1, ActiveCell.Value, "</strong>")
    ActiveCell.Characters(Start:=b1, Length:=b2 - b1).Font.Bold = True
    'Remove the tags
    'ActiveCell.Characters(Start:=1, Length:=1).Delete
    'ActiveCell.Characters(Start:=b2 - tagL, Length:=tagL + 1).Delete
    'Recursion to get all the bolding done in the cell
    Call BoldCharacters(ActiveCell, b2 + tagL + 1)
End Function
Now here's the issue. This formats the text nicely. But the "ActiveCell.Characters.Delete" method fails when I attempt to use it to remove the tags because the cell contains more than 255 characters. So I can't use the delete method.
And when I do this:
With xlApp.Selection
.Replace what:="<strong>", replacement:=""
The tags are all removed, but the formatting is all destroyed! So what's the point!?
I'm looking for a way of formatting my text and removing the tags. I'm considering taking the large bit of text and 'chunking' it up into a number of cells, processing the formatting and re-assembling, but that sounds difficult, prone to error, and might not even work.
Any ideas!?
Thanks!
You might want to remove the formatting before exporting the data to Excel. At the same time that you remove the formatting, store the formatting information (location, length, style) to a data structure. After you export the "plain text" data you could then iterate over your structure and apply the formatting in Excel. This could be a time consuming process depending upon how many records you plan on exporting at a given time, but it would remove the limitation imposed by Excel.
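A minimal Python sketch of that bookkeeping is below, just to show the shape of the data structure. It assumes simple, non-nested pairs of tags from a small fixed set, and the names TAG and strip_and_record are made up. Start positions are 1-based so they could be handed to Characters(Start, Length) after the plain text is in the cell.

```python
import re

# Assumed tag set; extend as needed for the rich text actually in use
TAG = re.compile(r"<(/?)(strong|u|em|b|i)>")


def strip_and_record(html):
    """Return (plain_text, spans); each span is (start, length, tag name)."""
    pieces, spans, open_at = [], [], {}
    pos = 0   # length of the plain text built so far
    last = 0  # index in html just past the previous tag
    for match in TAG.finditer(html):
        chunk = html[last:match.start()]
        pieces.append(chunk)
        pos += len(chunk)
        last = match.end()
        closing, name = match.group(1) == "/", match.group(2)
        if not closing:
            open_at[name] = pos
        elif name in open_at:
            start = open_at.pop(name)
            spans.append((start + 1, pos - start, name))  # 1-based start
    pieces.append(html[last:])
    return "".join(pieces), spans


print(strip_and_record("a <strong>OR</strong> b <u>c</u>"))
```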
If it's well formed html (ie it always has closing tags) then you could use a regular expression.
Dim data As String
data = "For each participant, record 1 effort per lesson delivered • Time Spent = # minutes spent on lesson <strong>OR</strong> For each participant, record 1 effort per month • Time Spent = total # minutes spent on lessons that month <strong>Note:</strong> Recording 1 effort per lesson is recommended but not required <strong>Note:</strong> Use groups function in ABC when appropriate (see <u>Working With Groups</u> in ABC document library on the ABC portal)"
Dim r As New RegExp
r.Pattern = "<(.|\n)*?>"
r.Global = True
Debug.Print r.Replace(data, "")
To use the RegExp object, set a reference to Microsoft VBScript Regular Expressions 5.5.
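The pattern behaves the same way in Python, which makes it easy to test against sample text first. A small sketch (the function name strip_tags is illustrative):

```python
import re

# Non-greedy: each match stops at the first '>' after a '<'
TAG_PATTERN = re.compile(r"<(.|\n)*?>")


def strip_tags(data):
    """Remove anything that looks like an HTML tag, keeping the text."""
    return TAG_PATTERN.sub("", data)


print(strip_tags("<strong>Note:</strong> see <u>Working With Groups</u>"))
```

Note that this only removes the tags; it does not apply any bold or underline formatting.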
hth
Ben
Something along these lines might be useful:
Sub DoFormat(rng As Range)
    Dim DataObj As New MSForms.DataObject
    Dim s As String, c As Range
    For Each c In rng.Cells
        ' Wrap the cell text as HTML so the paste interprets the tags
        s = "<html>" & Replace(c.Value, " ", "&nbsp;") & "</html>"
        DataObj.SetText s
        DataObj.PutInClipboard
        c.Parent.Paste Destination:=c
    Next c
End Sub
You'll need a reference to "Microsoft Forms 2.0 Object Library"
