I have never used Visual Basic before but could do with a pointer on where to begin.
I have 750 excel spreadsheets that contains various amounts of data of different types. The columns are always the same, but the number of data rows vary per spreadsheet. I need to extract data and put it into two new spreadsheets.
Obviously to do this 750 times manually would be a nightmare. I just want to run a script that can do it for me and thus thought of Visual Basic although i've never used it before.
My specific questions are:
What type of command should i research that would allow me to copy data where the row number to start at varies (as data above varies in no of rows). There is a title before this new data - how can i get it to search for this title and then choose the row below?
Would all my spreadsheets have to be in one folder so that the script goes through them all, or can i have some kind of folder structure in that folder too?
Anyone recommend any good resources for me to get to grips with visual basic and grasp what i need to do?
thanks
Tom
So the compilation task got easier with the introduction of MS PowerQuery. If you are using MS Excel 2013, you already have this. If no, you should download it and use the extension from MS.
The following guide outlines how to Using Power Query to Combine Data from Multiple Excel Files into One Table. This means that with Power Query (PQ), MS has taken and enabled easy aggregation using a few simple button clicks. PQ is a lightweight alternative to a lot of tasks that used to require VBA.
In this example, you will use PQ to point to an entire folder (750 should be no problem) worth of commonly formatted Excel files. The only limitation is that each data file should have a similarly named tab.
I won't repeat the details of the guide for how to do it, as it is in-depth and visual. But if you run into issues, get in touch.
Related
I need to send RFQs to various vendors using their specific forms which are Excel formatted. I need specific information from my co-workers in order to properly fill out the vendor's forms. I was thinking about using MS Forms to gather the specific info I need from my co-workers, then hoping some how that data could easily/automatically be transferred to the vendor's specific Excel form based on the MS Forms responses. The responses received determine which vendor form, and in most cases multiple vendor forms, to use. Then each completed form saved as its own file.
I was looking at different MS Flows, PowerApps, and Power Automate templates that have already been created, but I'm sure one matches my needs or if any of those are the best solution. I did watch some videos about how to create your own MS Flow/PowerApp, but I wasn't sure it was going to be a viable solution. I am hoping to streamline the action of copying & pasting the data, but I'm not sure how I would go about setting up a way, if there is one. Or if there is a "dummies" how to way or if the way would be over my head.
Background knowledge/experience: I have zero experience or base knowledge of coding. I can record macros, but cannot edit the coding, I have to re-record it from scratch. I can do simple IF formula's & Pivot Tables in Excel. I tried for a minute to teach myself PowerBi, but think knowing SQL first would be better from my understanding. Haven't dived down rabbit hole of trying to teach myself SQL, if that's even possible without some base coding knowledge. I want to learn these things and how to do more, but lack of time is a factor. I piece and squeeze in micromillimeters of knowledge in when I can from Googling and YouTube. I haven't had much luck Googling this, because what I'm typing in the search bar isn't producing helpful results as far as I can tell.
Microsoft's official site has an explanation on how to use scenarios in Excel.
If you name the input cells, the scenario manager will show the name, so it's easier to remember that $C$5 is, say, the price.
My question is: is it possible to set up the scenarios in a table somewhere in Excel, and get the scenario manager to read from there? Setting multiple scenarios in the scenario manager is very fiddly, time-consuming and error-prone, especially when the inputs are linked - e.g. setting 10 scenarios where each scenario is an x% change from the previous.
Any suggestions?
PS I know all these things can be done very easily in a scripting language like Python or R, but in this very specific case the calculations are not too complex and the file needs to be shared with other people, so I must use Excel.
VBA would be a last resort because some of these people have VBA disabled by default.
Edit
To clarify, what I'd need is a way to create a table like this below, where those in blue are the inputs, and those in grey are the outputs. I have put together a banal example below, along the lines of the example in the VBA macro answer given below, but the general idea is:
define a number of scenario as the combination of multiple inputs (more than 2) ;
create a table showing, for each scenario, the inputs and some key outputs;
note the table doesn't have all the possible combinations of all the inputs, like the macro given in one of the examples - that would be too much and wouldn't be very readable.
I could put together a quick VBA script that changes the inputs in the model, reads the result and creates the table, but I was wondering if there is a better way - VBA is typically not very robust, in the sense that just changing the location of one cell can often mess things up. I usually avoid Excel for the more complex models (this would be banal in any scripting language), but this I have to do in Excel.
EDIT #2:
Trying to further clarify what I have in mind, I have put together the screenshot below. Each output is the result of many different calculations, and CANNOT be calculated as a small, simple formula - if it could, I would not have any issue, of course!
My issue is that:
- if I change an input, then all the many many calculations occurring behind the scenes change
- the outputs are read from all those calculations
- I cannot use two-way what-if tables
If even this is not clear, the only other thing I can try is to upload an Excel file, which is generally discouraged on SO.
Scenario Manager is a built in function with it's own GUI.
For this reason, the function will be limited in what it can call (only data entered in the GUI)
VBA will allow you to manipulate this data, telling it where to pull the changing values and what data to change it by
So the answer for your specific query:
Can I use Excel without VBA to perform Scenario Manager tasks not set by the GUI?
No.
But it doesn't mean fiddling with the Manger itself would be horrendous. There are ways to teach and learn with it, but also if you save a macro enabled document, users should be able to turn the macro on with the click of a button - so VBA can be an option too
I hope this helps?
We get at least 20 queries a day on an average from our clients, where in we have to open and look at data on 4 to 5 Excel sheets to answer them. questions such as what is my available balance, am i eligible for this etc. All our clients are connected to our intranet and have access to internet. I was wondering is there a way where we can develop a front end app (do not have budget for MS VB or any other) either in excel or any other to connect these 4 to 5 excel sheets to retrieve the data in response to queries (e.g. using perhaps some if and true/false queries). I am not an advanced Excel user but would be great for an advice from tech experts.
Yes, i wouldn't call it an app but consider a worksheet like a dashboard. You can have a cell for entering client name, and then use formulas to look up relevant information of the name entered. The cosmetic and arrangement of the information retrieved and published on the dashboard is up to you and of course do consider investing some time in the looks and feel if you want to enjoy using it.
Things you may consider are:
Place the files are kept and file name convention
because your dashboard will look for information in external workbooks, ensure that the files are saved in a fixed directory and have a specific file name. if the external files are updated from time to time by other folks, let them know too that they have to save it in a particular folder with a specific name format.
Properly structure the source of data
Format you data source into tables so that it is easier for use with formulas. Throw away titles if any in the data source worksheet. Use tools like "Table" under the INSERT tab. When data are properly organized, they can be easily looked up using formulas such as VLOOKUP, SUMIFS, MATCH-INDEX, and COUNTIF.
Be good with formulas
Since we have no budget for VB, then good formulas will be needed. There are plenty of help on the internet for this I think you'll have no problem in it.
Employ sanitary check measure
It is difficult to tell if our formula isn't functioning properly when we have no counter check measure. Certainly you want to give your clients accurate information. One way to check is, think of alternative ways to get the information wanted and check if it matches to the first way. Another way is to retrieve a sequence of related information to be put on the dashboard, then do simple calculation to check if the numbers add up. Use conditional formatting to highlight errors if necessary.
I think these are key consideration, there may be more, but this is what i can think of for the moment.
I'm considering replacing a (very) large body of Office-automation code with something that works with the Office XML format directly. I'm just starting out, but already I'm worried that it's too big a task.
I'll be dealing with Word, Excel and PowerPoint. So far I've only looked at Word and Excel. It looks like Word documents should be reasonably easy to manipulate, but Excel workbooks look like a nightmare. For example...
In Word, it looks like you could delete a paragraph simply by deleting the corresponding "w:p" tag. However, the supplied code snippet for deleting a row in Excel takes about 150 lines of code(!).
The reason the Excel code is so big is that deleting a row means updating the row indexes of all the subsequent rows, fixing up the "shared strings" table, etc. According to a comment at the top, the code snippet is not even complete, in that it won't deal with a workbook that has tables in it (I can live with that).
What I'm not clear on is whether that's the only restriction that the sample code has. For example, would there also be a problem if the workbook contained a Pivot Table? Or a chart that references data from the same sheet? Or some named ranges? Wouldn't you also have to update the formulae for any cells (etc.) that referenced a row whose row index had changed?
[That's not to mention the "calc chain", which (thankfully) I think you can simply delete since it's only a chache that can be re-built.]
And that's my question, woolly though it is. Just how hard do you have to work do something as simple as deleting a row properly? Is it an insurmountable task?
Also, if there are other, similar issues either with Excel or with Word or PowerPoint, I'd love to hear about them now, before I waste too much time going down a blind alley. Thanks.
Having worked with the Open XML SDK 2.0 for almost two years now I can say that doing seemingly trivial tasks can take many hours and sometimes days to figure out how to do it properly. For example, deleting an Excel row should be fairly straightforward and easy to do right? Nope because not only do you need code to delete your row, but then you have to update all the row indices, update any merged cell references, update hyperlink references, etc. Our internal delete method is close to 500 lines of code to just delete a row and I'm sure we don't have all the cases accounted for either.
The biggest complaint I have is the lack of documentation on how to do the most common tasks. The MSDN section on the Open XML SDK is very limited and whenever you need to do anything complicated you are really on your own. I've had to read the Open XML standard a lot to figure out what certain elements mean and how they should be implemented since I could find very little online.
The other challenging part is if you insert an element in a spot where it doesn't belong or put an invalid attribute on an element you will get a corrupt file when you try and open it. Most of the time you will not get any information on what caused the error and you will have to look at the Open XML standard spec to see what you did wrong.
If you need a fast turnaround time on converting that Office automation code into Open XML and what you are doing is not really basic, then I would say pass. If you have time and the patience to read up on the Word, Excel and PowerPoint XML structures and get familiar with how they relate then I say go for it. In my opinion it is really the only way to have very fine control over these office documents, but there will be a great learning curve when you start.
Oh and just for fun here is how much code is needed to add a comment to an Excel cell.
Just for completeness, here are some libraries I found for working with Excel XML:
www.extremexml.com - a layer on top of the Open XML SDK classes; focusses on injecting data into an existing spreadsheet; handles many of the cross-reference problems I identified in my question. Open source but GPL2 not LGPL. Code looks nice, and documentation is excellent. Does not appear terribly active on codeplex though.
Closed XML - another layer on top of the Open XML SDK - again open source, but with a less restrictive license (MIT). Looks nice, and looks more "active" than the above.
SpreadsheetLight - from what I can tell, a closed-source library sitting atop the Open XML SDK classes. Targeted more at those looking to create a spreadsheet from scratch rather than making changes to existing spreadsheets.
Here is another third party library dedicated to working with OpenXML:
http://www.officewriter.com
In the example cited by amurra above of deleting Excel spreadsheet rows, this is a single method call with this tool. It updates formulas and all the other references for which it seems that 500 lines of code would be required for otherwise.
The OpenXML SDK itself is a great tool for very simple things, but you still have to concern yourself with a lot of the internals of the file format and packaging structure to get things really right.
Here are some additional libraries that can manipulate with OOXML formats:
- GemBox.Spreadsheet (XLSX)
- GemBox.Document (DOCX)
Also GemBox published some articles that demonstrate how to manipulate with OOXML file format with pure .NET (without a use of any library), I think you'll find this interesting:
www.codeproject.com/Articles/15593/Read-and-write-Open-XML-files-MS-Office
(Introduction to SpreadsheetML format and an explanation on how we can read and write worksheet's cell content)
www.codeproject.com/Articles/649064/Show-Word-File-in-WPF
(Introduction to WordprocessingML format and demonstration on how we can read document's text)
My current employer (to remain nameless) has a collection of incredibly sophisticated Microsoft Excel 2003 worksheets (developed by contractors, also to remain nameless).
The employer is replacing the Excel-based solution with a SalesForce-based solution (developed by other contractors, likewise to remain unnamed). The SalesForce solution is also very complex using dozens of related objects and "Dynamic SOQL" to contain the data and formulas which previously was contained in the Excel-based solution.
The employer's problem, which has become my problem, is that the data from the Excel spreadsheets needs to be meticulously and tediously recreated in .CSV files so it can be imported into SalesForce.
While I've recently learned I can use CTRL-` to review formulas in Excel, this doesn't solve the problem that variables in Excel have cryptic names like $O$15. If I'm lucky, when I investigate $O$15, I'll find some metadata explaining if n cells up and/or some other data m cells to the left, and/or (in rare instances) there may be a comment on the cell.
Patterns within the Excel spreadsheets are very limited, rarely lasting more than 6 concurrent rows or columns and no two sheets which need to be imported have much similarity.
Documentation of all systems are very limited.
Without my revealing any confidential data, does anyone have any good ideas how I might optimize my workflow?
It's not clear exactly what you need to do: here are 3 possible scenarios, requiring increasing knowledge of Excel.
1. If all you want is to convert the Excel spreadsheets into CSV format then just save the worksheets as CSVs.
2. If you just want the data and not the formulae then it would be simple (using VBA) to output anything that isn't a formula (the cell.Formula won't start with =).
3. If you need to create a linkage excel-->csv-->existing Salesforce objects/SOQL then you will need to understand both the Excel Spreadsheets and the Salesforce objects/SOQL that have been created. This will be difficult unless you have good knowledge and experience of Excel and also understand what the salesforce App requires.
Brian, if you're still working on this, here's one way to approach the problem. I use this kind of process often for updating data between SFDC and marketing automation apps.
1) Analyze the formulae that you're re-creating in Salesforce.com to determine what base data fields you need (stuff that doesn't have to be calculated from something else.
2) Find those columns/rows in your spreadsheets and use Paste Special -> Values in a new spreadsheet to create an upload file with values instead of formulae that you need for each data area (leads, prospects, accounts, etc.)
3) If you have to associate the info with leads or contacts or accounts and you have already uploaded or created those records in Salesforce.com, be sure to export them with their ID numbers. That makes it easy to use the vlookup formula in Excel to match up fields that you need to add and then re-upload the data into Salesforce.
Like data cleaning, this can be a tedious process. But if you take it step by step it shouldn't be too hard. Good luck.