I'm trying to write a macro in Excel to find the average and standard deviation of typos in a natural language text data set in tab-delimited format (a set of Tweets, specifically). I can find the average in Word easily enough by doing a CTRL+F for tabs to get the number of messages and looking at the total number of errors listed in SpellCheck. This doesn't help with the SD, though. As far as I can tell, purpose-built language analytics software can't search for general non-words without also counting things like disfluencies ("ugh", "ach").
I can't figure out how to include spelling and grammatical errors in the Excel macro or how to break them apart by cell.
The data set is big enough that I don't mind minor inaccuracies (they shouldn't vary systematically between conditions).
This tool could be adjusted to evaluate basic writing skills or to compare non-standard uses of English in sufficiently large writing samples. Any help is appreciated.
Since Word has the ability built-in to detect spelling and grammatical errors, you could create a cross-program script in Excel. You would just have Word do the language processing piece, and Excel do the statistical analysis. You would need to enable the Microsoft Word 15.0 Object Library from the Tools > References menu in the Excel VBE.
VBA in Word allows you to detect if there's a spelling error. See this link:
https://msdn.microsoft.com/en-us/library/office/aa171830(v=office.11).aspx
The logic behind the code would be:
1. From Excel, open a new Word document.
2. For each Tweet, copy the content of the cell and paste it into the blank Word doc.
3. Have Word scan the document for errors, returning True if any are detected and False if not.
4. In Excel, insert a 1 next to the tweet if you get a True value, and a 2 if False.
5. Clear all content from the Word doc.
6. Go to the next tweet (next cell) in Excel, and repeat steps 2-5 until each tweet has a 1 or 2 next to it.
You should be able to correlate the occurrences of language errors with other variables, such as, e.g., the Twitter handles.
Essentially, have each program do what it does best.
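The statistics half of this can be sketched outside Excel too. The snippet below is a minimal Python sketch, not the macro itself, and it assumes something the steps above do not provide: that a per-tweet error count (rather than only a True/False flag) has already been written after each tweet, separated by a tab. The function name `error_stats` and the sample lines are made up for illustration.

```python
import statistics

def error_stats(lines):
    """Return (mean, sample SD) of per-tweet error counts.

    Each line is assumed to hold the tweet text, a tab, then the count.
    """
    counts = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # The count is taken from the last tab-separated field.
        counts.append(int(line.rsplit("\t", 1)[1]))
    return statistics.mean(counts), statistics.stdev(counts)

# Hypothetical sample data, not from the original data set.
sample = [
    "omg cant beleive it\t2",
    "See you tomorrow.\t0",
    "thier going to the store\t1",
    "Nice weather today\t1",
]
mean, sd = error_stats(sample)
print(f"mean={mean:.2f} sd={sd:.2f}")
```

`statistics.stdev` gives the sample standard deviation; `statistics.pstdev` would be the choice if the tweets are treated as the whole population.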
Related
I am currently working to provide an extensive xls-based form, which is then to be filled by various users and submitted to a committee deciding about the files.
The form will need to be printed for this last step, and as some sheets are limited to a couple lines only and others could flow over to a 2nd or 3rd page depending on free text user input, I see the risk of having a lot of white space in the printouts (e.g. three quarters empty pages).
I have considered/am considering some ideas to get this done:
Moving all individual sheets to one sheet, which is then printed more conveniently - probably not a good idea because of the variety of needed column widths making it impossible to properly format anything
Possibly coding in VBA to copy the individual sheets (by print areas) to a word doc, which is then printed easily - I know basic VBA, but would definitely need guidance as to how this could be achieved
Does anyone on here have other ideas or guidance on the ones posed above?
Many thanks in advance
I would have thought this one had been asked to death, but I cannot find a solution. I am looking for a way to live link PowerPoint to Excel data, but only for a word within an otherwise manually typed sentence.
I am not asking how to live link a chart or a table, I am asking how to have a live field within otherwise static text.
E.g. In a text box, there's the sentence "Revenue increased by 10% over the period, an improvement from the 7% increase over the prior period" and have only the '10%' and the '7%' be linked to two Excel cells.
I have seen that this is possible in the following pieces of software:
https://www.youtube.com/watch?v=mUGqgsT4gHU (skip to 20sec)
https://www.presentationpoint.com/blog/dynamic-text-boxes-powerpoint/
It doesn't seem to be do-able in VBA. I'm comfortable in .NET too, but have not been able to work out how this works, so any suggestion in either is most welcome.
There are multiple suggestions to the effect of copying a cell and then using paste-special. This does not allow you to embed the number in the sentence; it only allows you to paste the cell in, which you would then have to type around. In the two links above, the value is properly embedded, and this is the type of solution I am after.
I think this is very do-able in VBA. The program on the website is of course very sophisticated, but you could very easily replicate some of the functionality.
For example, by using tokens. You could enter something like "Best beer in city: #beerBrand#". Then you would iterate through your columns and just search and replace the tokens.
If I understand the program's functionality correctly, what they do is ask the user to enter a prefix and suffix for each variable. That makes it even easier, because you have the three parts of the sentence separate and you can just always replace the variable in the middle.
Would one of the approaches work in your case?
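The token idea is language-independent, so here is a minimal sketch of it in Python rather than VBA. The `#name#` token syntax follows the example above; the function name `fill_tokens` and the dictionary standing in for the Excel cells are assumptions for illustration only.

```python
import re

def fill_tokens(template, values):
    """Replace every #name# token in template with values[name]."""
    def lookup(match):
        name = match.group(1)
        # Leave unknown tokens untouched so missing data is easy to spot.
        return str(values.get(name, match.group(0)))
    return re.sub(r"#(\w+)#", lookup, template)

# Hypothetical stand-in for two linked Excel cells.
cells = {"current": "10%", "prior": "7%"}
template = ("Revenue increased by #current# over the period, an improvement "
            "from the #prior# increase over the prior period")
print(fill_tokens(template, cells))
```

Keeping the template text separate from the filled-in text matters: once a token has been replaced in the slide itself, there is nothing left to search for on the next refresh.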
I want an Excel macro which searches for words in a PDF and gives the page number where the macro finds each word. I have 20 words that I want to search for in the PDF. I have put the keywords in column A of the Excel spreadsheet and I want to populate the page numbers in column B. Please note that I am currently using Adobe Reader XI, so please help me with code which also works in Adobe Reader XI.
This is more of a direction and not an answer.
Try searching for command line tools that will export OCR data into a text file. I've looked at them before and a few gave me the option of looking at a particular page of a PDF. All of the tools I tried require a purchase (I was trying to OCR a barcode and could not find a free tool for this), but there are some free ones out there.
But using Excel will make this project harder. I would look at using PowerShell or some other scripting language and exporting the results into a CSV file.
Hope this helps.
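Once some tool has exported the text of each page, the lookup itself is small. The following is a minimal Python sketch that assumes the page texts are already available as a list of strings, one per page; it does not read the PDF and does not use any Adobe Reader API. The function name `find_pages` and the sample pages are hypothetical.

```python
def find_pages(pages, keywords):
    """Map each keyword to the 1-based page numbers whose text contains it."""
    result = {}
    for word in keywords:
        needle = word.lower()  # case-insensitive substring match
        result[word] = [
            number
            for number, text in enumerate(pages, start=1)
            if needle in text.lower()
        ]
    return result

# Hypothetical extracted page texts and keywords.
pages = ["Invoice total and payment terms", "Warranty terms", "Contact details"]
hits = find_pages(pages, ["terms", "warranty", "refund"])

# Tab-separated output: keyword, then page numbers, ready to paste into columns A and B.
for word, numbers in hits.items():
    print(word, ",".join(map(str, numbers)) or "not found", sep="\t")
```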
I need to find and copy a word (or words) in a string. The condition is that the word is an incorrect one. Essentially, it's something like copying all the words that get a wavy red underline in a browser, MS Word, etc.
I am doing this to extract the brand names in hundreds of thousands of free-text cells. Since brand names are usually not dictionary words (for searchability and identifiability), this approach would help find the majority of them.
It doesn't have to be an excel functionality, I am open to any tool that works.
Moving them directly into Excel is tedious, as shown by the link in the previous answer. If you would like a generated list of the misspelled words, follow the instructions on this site:
http://www.techrepublic.com/blog/microsoft-office/a-word-macro-that-highlights-and-lists-misspelled-words/
The code copies the misspelled words into a new document for you, so they will be isolated from your original document. Then you can apply any formatting or data analyses if you need it.
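Since the question is open to any tool, the same "list the words the dictionary does not know" idea can be sketched in Python. This is only an illustration under stated assumptions: the tiny `dictionary` set stands in for a real word list, and `unknown_words` is a made-up helper, not part of the Word macro linked above.

```python
import re

def unknown_words(text, dictionary):
    """Return words from text that are not in dictionary, in order, without repeats."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    found = []
    for word in words:
        if word.lower() not in dictionary and word not in found:
            found.append(word)
    return found

# Hypothetical word list; a real run would load a full dictionary file.
dictionary = {"new", "shoes", "from", "and", "a", "bag"}
cell = "New shoes from Zappos and a Kipling bag"
print(unknown_words(cell, dictionary))
```

As with the spell-checker approach, genuine typos will show up alongside brand names, so the output is a candidate list to review rather than a final answer.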
First, I don't have any experience with programming. If I ever start, this would probably be my first project. I kept looking for an answer until I found this site.
I am looking outside the box because in Excel, working on data with 1 million+ rows and 20+ columns means waiting a very long time for the calculation to finish, and copying and pasting with formulas takes even longer. Imagine that I have to leave the computer running for 8+ hours with the help of a macro and F4 (repeat). All my formulas have to be pasted as numbers only once I am done with them. And even when I break the files into pieces, the file sizes are 20MB to 110MB without active formulas. Opening a file takes forever.
I wonder how to write a program with 1) a dialog box, 2) the Excel commands and formulas (sort, delimiter, concatenate), 3) the ability to create graphs, 4) tabs to view different sets of data or graphs, 5) a way to add in a set of data, 6) a limit on the number (1-100000), etc. The look would be something like uTorrent.
What compiler is suitable for this program? It's easier if you tell me which 'book' to read than for me to find out which 'book' is suitable, because even if one is, I might flip through it and go on to the next one. 'Book' may refer to a book, a method, steps, etc.
I'm not sure what you actually want. With 1M+ rows and 20+ columns, an Excel sheet doesn't seem to be the right tool for the job. So do you...
want to keep using Excel, but automate the job? Use Excel VBA like renick suggested. It's the language that Excel uses internally for macros, but you can write any kind of automated processing you'd like. Beware, however, that VBA is not exactly the best language to start a programming experience with. (That's my personal opinion, and what matters is of course whether you get the job done).
want to switch to something else? A database management system seems better suited for the amount of data you have. Microsoft Access is part of Office and might already be on your system. Getting your data into and out of the database could be a problem, but the advantage you have is that a database is built to handle colossal amounts of data and will happily munch your figures for several days without failing. You can access the data using the Structured Query Language (SQL), which is not really a programming language, but very powerful (and it most certainly has CONCATENATE, ADD etc.). Graphing is more difficult, but can also be done.
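To give a feel for the database route, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module instead of Access; that swap, the table name `sales`, and the sample rows are all assumptions for illustration. It shows sorting, concatenation, and limiting the number of rows, which were among the operations asked about. Note that in SQLite concatenation is written with `||` rather than a CONCATENATE function.

```python
import sqlite3

# In-memory database for the demo; a real data set would use a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")

# Hypothetical rows standing in for the spreadsheet data.
rows = [("North", "A", 10.0), ("North", "B", 5.5), ("South", "A", 7.0)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

query = """
    SELECT region || '-' || product AS label, SUM(amount) AS total
    FROM sales
    GROUP BY region, product
    ORDER BY total DESC
    LIMIT 2
"""
result = conn.execute(query).fetchall()
for label, total in result:
    print(label, total)
conn.close()
```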
If you know excel then Excel VBA is a VERY capable language to do all this. I would suggest you go to the VBA Dev Center here to get started.
I can't believe I'm about to say this (for most things I do it would be the wrong choice) but:
If the computations aren't that complex (just lots of them) Python might be a good bet.
If you can get the input as a CSV file then, in about 10 lines of code, you can write a loop that runs once for each line of input and hands you the values to play with.
for line in open('filename', 'r'):  # 'r' reads the file; 'w' would erase it
    values = line.strip().split(',')
    # values holds the fields from this line as strings.
    # These can be converted to numbers:
    x = float(values[0])
    n = int(values[1])
    # ... and then processed
That might not be the cleanest/best approach, but it's simple and straightforward.
p.s. For 1M+ rows, don't expect it to be blazing fast (10 sec to a min or so, depending on what you do to the data)