Can an Excel worksheet be used as a UDF?

I'm building a network business model in Excel. A similar model is that of Gawker Media.
In my model I have a number of properties whose audiences overlap to some extent. Each property attracts users, which in turn affords cross-promotional opportunities. In the case of Gawker, they have a series of blogs, and their audience will likely read several of the blogs in the network.
If Gawker launched a new blog, they would be able to direct traffic to it from their existing blog network.
Creating a model for a single blog is fairly simple - although the initial assumptions are harder. The next step is to model the network effect.
Excel provides a Scenario Manager that allows me to vary the key assumptions in the basic model. This is almost perfect: I can model the launch of 10 properties, each with different launch assumptions, and see the summary.
Where I need help is figuring out how I can vary the initial number of users for the launch of each property. In other words, once the network is established, it's possible to drive people to any new property launched on the network.
I don't believe the scenario manager will do what I need.
So, I'm wondering if it's possible to use the model worksheet as a UDF? The UDF would need to spit out the monthly revenue and unique users given a number of input assumptions.
I would then be able to create my own summary sheet for the 10 properties and, using the total uniques for each property, get a summary for the network. This network summary would be used to determine how many people could be driven to the launch of a new property.
In effect, the only difference from the Scenario Manager is that I need one of my input variables (initial users) to be programmatically generated as a function of the number of people in the network at the time of launch.
I'm hoping it's possible to achieve something along these lines in Excel. I could drop down and create the whole model in Java, but then it's much harder to share with business colleagues!
Thanks - Matt.

You could try a Data Table.
It only allows you to analyse the effect of varying 2 input parameters, but you can create several data tables, and each parameter can take hundreds of different values.
It's little known, but efficient, and has been available since Excel 3.0.

There is a product that I have researched but never used - search for calc4web. It takes a sheet of formulas and generates code (C++) that can be compiled into an XLL add-in. Then you can call a function that does what your sheet does. But of course then you have an XLL to distribute, and a build step every time you change your logic, which defeats much of the point of using a spreadsheet.
In my case, I wound up writing some very simple VBA code to vary my sheet "inputs" using the scenario manager, and capture my "outputs". This works if you have a batch of inputs that you can just point your macro at and step through.
EDIT:
See here for a VBA-only example of doing this:
using a sheet in an excel user defined function
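For comparison, here is a minimal sketch (in Python rather than VBA) of the calculation the question describes: each new property's initial users are derived from the size of the network at the time of launch. Everything in it is an assumption made for illustration: the Launch fields, the cross_promo_rate parameter, and the simplification that network uniques are just the sum of each property's users (audience overlap is ignored).

```python
from dataclasses import dataclass


@dataclass
class Launch:
    month: int               # month index at which the property launches
    organic_users: float     # users the property would attract on its own at launch
    growth: float            # monthly growth rate, e.g. 0.05 for 5%
    revenue_per_user: float  # monthly revenue per user


def simulate(launches, months, cross_promo_rate):
    """Return a list of (network_users, network_revenue), one entry per month.

    A property launched into an existing network starts with its organic
    users plus cross_promo_rate times the network's users in the prior month.
    """
    users = {}  # index into launches -> current users of that property
    history = []
    for month in range(months):
        network_before = sum(users.values())
        # Grow the properties that already exist.
        for i in list(users):
            users[i] *= 1 + launches[i].growth
        # Launch new properties, seeded from the network as it stood before this month.
        for i, launch in enumerate(launches):
            if launch.month == month:
                users[i] = launch.organic_users + cross_promo_rate * network_before
        revenue = sum(users[i] * launches[i].revenue_per_user for i in users)
        history.append((sum(users.values()), revenue))
    return history
```

The same loop structure is what a VBA macro would follow: compute the network total for the month, feed it into the model sheet's "initial users" input for any property launching that month, and capture the outputs.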

Related

Large Excel File - Alternate solution to receive inputs from multiple users at the same time

I manage a huge 50 MB MS Excel model which has input, calculation and output/report sheets (63 tabs in total, plus macros).
Since the model is huge, the users are unable to open the file and feed their inputs at the same time. Users would like to feed inputs and refresh the calculations to monitor their respective product report sheets and then save it.
Because of this, we have to maintain a time table for product owners to feed the file one after another. This delays the entire process.
Can anyone please suggest an alternate solution to receive the input from the users and update the file at the same time?
Kindly let me know if you need any other details.
Thanks for your help !!!
There would need to be more details to describe the environment you are working in to get a specific answer. To get to a more specific answer, there would have to be a description of what the macros are doing, how the outputs use the inputs and where the workbook is stored and used.
In all cases, it seems that you need to split the analytics from the inputs. With the description you have provided, I would think that among the myriad approaches (with some presumed constraints based on your description), you could use Power BI to split the input content from the output results. I can give a crude example that is based on a lot of assumptions:
1. Each Product Owner would have a personal input workbook.
2. They would save their own personal workbook that would contain their inputs using a common and uniform structure in a common place:
   - Put them in a common OneDrive folder (there is also a Box API, but more complex to implement)
   - Put them in a common SharePoint directory
3. Build a separate and common Power BI based analytics output § that all product owners can open and refresh that reads all of the inputs from the common directory or folder using the built-in Power Query capabilities.
   - Shape the inputs and combine them into a set of common tables that the analytics computations can use
   - The Power BI analytics can be opened simultaneously by many users who can refresh it to get the most up-to-date view that includes the content from their colleagues.
4. Each time a product owner makes an update, they must save it and refresh the analytics to see the results. Until it is saved, none of their colleagues will see the update either.
§ - depending on the analytics you are performing, this can be done in Power BI Desktop or Excel for Windows or Excel mounted in Sharepoint.
Note that step 3 can also be added to each individual workbook from the product owners, so that they could receive the analytics output from their own personal workbook. The analytics engine would use Power Query to read in the latest results from their colleagues.
But my suspicion is that what I described above would be a terrible user experience, because Power BI would be re-running the ETL on every refresh. Without knowing more about the environment, what the macros are doing, and how the analytics work, it is hard to suggest a really good solution. I suspect that your answer could be found in SharePoint or Azure Data Factory or a CI/CD tool.

Optimiser for excel spreadsheet

I'm a mechanical engineer, and I have developed a pretty cool spreadsheet that I use to size some steel members for lifting beams. The drawback is that I need to do some trial and error in the selection of the member until I get one that gets as close to the allowable limits as possible.
What I'm hoping to do is develop a function that takes the length and weight I enter, loops over a list of members and their physical properties, and automatically selects the best member size(s). Is this possible?
Yes. Depending on the complexity, a simple search through the parameters (less than, more than, etc.) might give you the answer. You can do it quite easily with the pandas library. Just load the Excel file as a pandas DataFrame (pandas.read_excel()), which will then allow you to perform the searches on that DataFrame object.
If you want to run an optimization algorithm, you should look into SciPy's optimize module to get what you're looking for based on the input data (it handles both unconstrained and constrained problems).
Of course, the question you've stated is quite general, so I only pointed the direction. More info would be better.
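As a rough illustration of the "simple search" option, here is a sketch in plain Python so that it runs without extra packages; with pandas the same filter would be applied to a DataFrame loaded from the member list. It assumes a simply supported beam with a single point load at midspan and checks bending only. The function name, the dictionary keys and the member values are all made up for the example, and a real lifting beam check would involve more than this one criterion.

```python
def lightest_adequate_member(members, span_m, load_kn, allowable_mpa):
    """Pick the lightest member whose section modulus covers the bending demand.

    Assumes a simply supported beam with a single point load at midspan,
    so the maximum moment is M = P * L / 4.
    """
    moment_knm = load_kn * span_m / 4
    # kN*m -> N*mm is a factor of 1e6; dividing by MPa (N/mm^2) gives mm^3.
    required_mm3 = moment_knm * 1e6 / allowable_mpa
    adequate = [m for m in members if m["section_modulus_mm3"] >= required_mm3]
    if not adequate:
        return None
    return min(adequate, key=lambda m: m["mass_kg_per_m"])


# Placeholder values for illustration only, not taken from any steel catalogue.
MEMBERS = [
    {"name": "A", "mass_kg_per_m": 10.0, "section_modulus_mm3": 50_000},
    {"name": "B", "mass_kg_per_m": 20.0, "section_modulus_mm3": 150_000},
    {"name": "C", "mass_kg_per_m": 30.0, "section_modulus_mm3": 400_000},
]
```

Calling lightest_adequate_member(MEMBERS, span_m=4, load_kn=20, allowable_mpa=160) would scan the list and return the lightest entry that passes, which replaces the manual trial and error.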

Automating Raw Export Data Cleansing for Client Onboarding - Format is Always Different

So, a bit of a general question. I work as a data analyst for a startup. My primary process involves taking a client's existing customer data and cleansing/normalizing it to fit into our platform, once, as part of our onboarding process. A member of our team exports the data from the system the client is transitioning from or, if they kept track of it in house, we receive the Excel log they used to track it. It is always in a different format and requires extensive cleansing (avg 1 min/record). We take what is usually one large table (.xlsx format) and, after cleansing, split it into four .csv files, which we load as four tables on our platform.
I feel I have optimized the process quite well in terms of the process steps and cleansing with excel functions (if, concat, text-to-columns, etc). I have beginner-intermediate skills in VBA and SQL and have just scratched the surface in R; what is frustrating is that I know there is the potential to automate this process but I just don't know where to start. If anyone has experience with something like this, code, a link to an article / another thread, or just some general direction would be much appreciated. Please ask for clarification where you feel it is needed. Thanks.
This will be really hard to do in Excel. If you have the time, you can try out Optimus, a data cleansing library written in Python and PySpark (you don't need to know Spark). Here is the webpage: https://hioptimus.com.
You can create data pipelines with it, and I recommend that you do that: try to generalize your processes, and ask the client for a more structured way of passing the data.
The good thing is that you don't need Big Data to run Optimus, but if you have it some day, the same code will work.
Check out the documentation for more:
http://optimus-ironmussa.readthedocs.io/en/latest/
Let me know if you have doubts!
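To give an idea of what "generalizing the process" can look like, here is a small sketch that uses only the Python standard library (it does not use Optimus). It assumes the client file has first been saved as CSV, and the alias lists, target column names and table layout are hypothetical; the real platform has four tables, and two are shown to keep it short.

```python
import csv
import io

# Hypothetical aliases: each target column lists header spellings seen in client files.
ALIASES = {
    "customer_name": {"name", "customer", "customer name", "client"},
    "email": {"email", "e-mail", "email address"},
    "phone": {"phone", "telephone", "phone number"},
}

# Hypothetical split of the cleaned columns into platform tables.
TABLES = {
    "customers": ["customer_name"],
    "contacts": ["customer_name", "email", "phone"],
}


def normalize_header(header):
    """Map a raw header to a target column name, or None if unrecognised."""
    key = header.strip().lower().replace("_", " ")
    for target, names in ALIASES.items():
        if key in names:
            return target
    return None


def clean_rows(raw_csv_text):
    """Yield dicts keyed by target column names, with whitespace trimmed."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    mapping = {h: normalize_header(h) for h in reader.fieldnames}
    for row in reader:
        # Unrecognised columns (and any overflow cells) are dropped.
        yield {mapping[h]: (v or "").strip() for h, v in row.items() if mapping.get(h)}


def split_tables(rows):
    """Return {table name: CSV text} for each table in TABLES."""
    rows = list(rows)
    out = {}
    for table, columns in TABLES.items():
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        out[table] = buf.getvalue()
    return out
```

With this shape, onboarding a new client mostly means adding their header spellings to the alias table rather than redoing the cleansing by hand.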

How to generate a custom Word-based report with different client data? (No Mail Merge)

What I want to do: Generate a report in Word based on unique data that I manually enter for different clients.
I collect at least 100 variables of data for different clients. I must write a report for each client that contains this information.
What I have tried in the past: I tried to streamline this process by using Excel to enter the data in select cells and run the Mail Merge function, which would then export the unique data into a templated Word document.
Problem: Unfortunately, this process is prone to error and has a tendency to crash my computer.
Question: Is there a way that I can successfully make this a seamless process?
Note: I do NOT have any programming knowledge whatsoever but I am here because I think a non-programming approach is simply not efficient. I am hoping I can reach a solution to this issue by teaching myself basic programming principles. Is this possible?
Yes - one way is to first add a reference to the Microsoft Word object library in the VBA editor. Then you can set up a Word document with bookmarks. Then, for each piece of data you would like to insert:
Doc.Bookmarks("Bookmarknamehere").Select
App.Selection.TypeText "ClientDataHere"
You will have to declare and set the Word application and document variables (App and Doc above) for this to work.

Integrating with 500+ applications

Our customers use 500+ applications and we would like to integrate these applications with ours. What is the best way to do that? These are time registration applications, and what most of them have in common is that they can export to CSV or similar. Some of them are actually home-brewed Excel sheets where time is registered.
The best idea so far is to create our own excel sheet, which can be used to integrate with all these applications. The integrations could be in the form of cells containing something like ='[c:\export.csv]rawdata'!$A$3 Where export.csv is the csv file exported from the time registration applications. Can you see a better way to integrate against all these applications? It should be mentioned that almost all our customers have Microsoft Office.
Edit: Answers to the excellent questions from Pontus Gagge:
How similar are the data in the different applications?
I assume that since they are all time registration applications, they will have some similarities, but I assume that some will register the total time worked for a whole month, while others will specify it for each day. If Excel is chosen, I believe that many of the differences could be ironed out using basic formulas.
What quality is the data?
The quality of the data can vary, so basic validation must be undertaken. A good approach is also to make it transparent to the customers how our application understands their input, so that they are responsible for it.
How large amounts of data are you talking about?
There will be information about the time worked for up to 50 employees.
Is the integration one-way only?
Yes
With what frequency should information be transferred?
Once per month (when they need to pay salaries).
How often do the applications themselves change, and how often does your product change?
If their application is a home-brewed Excel sheet, then I assume it will change once a year (due, for example, to someone's mistake). If it is a proper standard time registration application, then I do not believe it is updated more often than every fifth year or so, as it is a very stable concept.
Should the integration be fully automatic or can your end users trigger a data transfer?
They can surely trigger data transfer. The users are often dedicated to the process so they can be trained at doing it, which means that they could make up to, say 30, mouse clicks in order to integrate each month.
Will the customers have somebody to monitor the integrations?
As we have many customers, many of them should be able to undertake the integration themselves. We will, though, be able to assist them over the telephone. We cannot undertake the integration ourselves, because we would then be responsible for any errors due to user mistakes, etc.
Does the phrase 'integration spaghetti' mean anything to you...?
I am looking for ideas from the best chefs to cook a nice large portion of that.
You need to come up with a common data format, and a way to translate the individual data formats to the common format. There's really no way around this - any solution you come up with will have to do this in one way or the other. It's the essential complexity of what you're doing.
The bigger issue is actually variances within the source data, in terms of how things like dates are stored, missing columns, etc. Doing a generic conversion for CSV to move columns around is comparatively easy.
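As an example of handling one of those variances, here is a sketch of normalizing dates that arrive in different layouts. The list of candidate formats and their order are assumptions; the order matters, because a value like 03/04/2024 is ambiguous and the first matching format wins.

```python
from datetime import date, datetime

# Candidate formats, tried in order. The order is an assumption and matters:
# "03/04/2024" is ambiguous, and here day-first wins over month-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(text):
    """Return a date for the first matching format, or None if none match."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None
```

Values that come back as None can be collected and shown to the customer instead of being guessed at, which fits the asker's point about keeping the customer responsible for the input.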
I would also look at CSV and then use an OLEDB connection against the CSV file for importing.
If you try to make something that can interface to any data structure in the universe (and 500 is plenty close enough), it is guaranteed to be a maintenance nightmare. Instead I would approach this from multiple angles:
1. Devise an interface into which a human can enter this data already in the proper format. With 500+ clients, I'd make this a small, raw but functional browser-based site that users can use to enter this information manually. This is the fall-back. At the end of the day, a human can re-key the information into the site and solve the import issue. Ideally, everyone would use this instead of their own format. Data entry people are cheap.
2. Similar to above, but expanded, I would develop a standard application or standardize on an off-the-shelf application that can be used to replace their existing format. This might take more time than #1. The goal would be to only do one-time imports of these varying data schemas into the application and be done with them for good.
The nice thing about spreadsheets is that you can do anything anywhere. The bad thing about spreadsheets is that you can do anything anywhere. With CSV or a spreadsheet there is simply no way to enforce data integrity and thus consistency (which is the primary goal) on the data. If the source data is already in a database, then that is obviously simpler.
I would be inclined to use a database format into which each of these files needs to be converted, rather than a spreadsheet (e.g. use something like Jet (MDB)). If you have non-Windows users then that will make it harder and you might have to use a spreadsheet. The problem is that it is too easy for the user to change their source structure, break their upload and come crying to you. If a given end user has a resident expert, they can find a way of importing the data into that database format. If you are that expert, then I would, on a case-by-case basis, write something that would import into that database format. XML would be the other choice, but that will likely take more coding than an import/export into a database format.
Standardization of the apps (even having all the sources in a database format instead of a spreadsheet would help) and control over the data schema is the ultimate goal rather than permitting a gazillion formats. There really is no nice answer other than standardization. Otherwise, you are having to write a converter for every Tom-Dick-and-Harry format and again when someone changes the source format.
With a multitude of data sources, mapping each one correctly to an intermediate format is not trivial. Regular expressions are good with a finite set of known data formats. Multiple passes can help when data is ambiguous without context (for example, month and day fields when you have several days of data), and can also help defeat data entry errors. But since this data is connected to salaries, the transfer needs to be reliable.
An import configuring trick
Get the customer to make a set of training data in the application. It should have a "predefined unique date" and each subsequent data field have a number corresponding to the target data field in your application. On importing your application needs to recognise the predefined date, determine the unique translation required and effect the displaying/saving of this "mapping key", and stop the import. eg If you expect "Duration hours" in field two then get the user to enter 2 in the relevant field which might be "Attendance hours".
On subsequent runs, and with the mapping definition key, import becomes a fairly easy process of translation.
Note on terms
"predefined date" - must be historical, say founding date of your company?, might need to be in PC clock settable range.
"mapping key" - could be string of hex digits and nybble based so tractable to workout
The entered code can be extended to signify required conversions, i.e. the customer's application has durations in days and your application expects them in hours.
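The trick above could be sketched roughly like this. It is only an illustration under stated assumptions: the function names are made up, the export is assumed to arrive as a list of cell strings per row, and the predefined date is assumed to appear in the export exactly as the agreed string.

```python
def derive_mapping(training_row, sentinel_date, target_fields):
    """Build {target field name: source column index} from a training row.

    training_row: list of cell strings exported from the customer's application.
    sentinel_date: the agreed "predefined unique date" as it appears in the export.
    target_fields: {number the user typed: target field name}.
    Returns None if the row is not the training row.
    """
    if sentinel_date not in training_row:
        return None
    mapping = {"date": training_row.index(sentinel_date)}
    for index, cell in enumerate(training_row):
        cell = cell.strip()
        if cell.isdigit() and int(cell) in target_fields:
            mapping[target_fields[int(cell)]] = index
    return mapping


def translate(row, mapping):
    """Reorder a normal export row into a dict keyed by target field names."""
    return {field: row[index] for field, index in mapping.items()}
```

The mapping returned by derive_mapping plays the role of the "mapping key": it is saved once per customer and reused by translate on every later import.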
Interfacing with Windows programs (in order of increasing fragility)
Ye Olde saving as CSV file
Print to operating system printer that is setup as a text file/pdf, then scavenge the data out of that
Extract data via the application's interface control, typically ActiveX for several Windows programs, e.g. like Matlab's Spreadsheet Link
Read the native file format (xls), e.g. like Matlab's xlsread
Add an additional intermediate spreadsheet sheet that has extended cell references ie ='[filename]rawdata'!$A$3
Have a look at Teiid by JBoss: http://jboss.org/teiid
Also consider using SOA - e.g., if you're on Java, try JBoss SOA platform: http://www.jboss.com/resources/soa/?intcmp=1004
Use a simple XML format. A non-technical person can easily understand a simple XML format (and could even identify basic problems with XML documents that are not well-formed).
Maybe use a DTD (or even better an XML schema) to do very basic validation, and then supplement this with an XSL stylesheet to do more validation with better error reporting. (An XSL stylesheet simply converts from XML to something else, and so can generate readable error messages.)
The advantage of this approach is that web browsers such as Internet Explorer can apply the XSL stylesheets. A customer need only spend at most a day enhancing their applications or writing Excel macros to generate the XML data in the format that you specify.
Recent versions of Excel have support for converting spreadsheet data to XML, and can even validate against schemas.
Once the data passes the XSL validation checks, you have validated XML data.
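To show the kind of checks meant here, the following sketch does the equivalent in Python instead of a DTD or XSL stylesheet: it confirms the document is well-formed and then reports readable errors per entry. The element names (entry, employee, date, hours) are hypothetical and stand in for whatever format you specify.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("employee", "date", "hours")  # hypothetical field names


def validate_timesheet(xml_text):
    """Return a list of readable error messages; an empty list means it passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"Document is not well-formed: {exc}"]
    errors = []
    for number, entry in enumerate(root.findall("entry"), start=1):
        for field in REQUIRED:
            value = entry.findtext(field)
            if value is None or not value.strip():
                errors.append(f"Entry {number}: missing <{field}>")
        hours = entry.findtext("hours")
        if hours and hours.strip():
            try:
                if float(hours) < 0:
                    errors.append(f"Entry {number}: <hours> is negative")
            except ValueError:
                errors.append(f"Entry {number}: <hours> is not a number")
    return errors
```

The messages are written for the customer rather than for a developer, which is the same goal as using a stylesheet for error reporting.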
If you have heaps of data and heaps of money, you could look at existing data management and cleansing tools:
http://www-01.ibm.com/software/data/infosphere/datastage
http://www-01.ibm.com/software/data/infosphere/qualitystage
But even then, you'll likely need to follow kyoryu's suggestion, assuming you have 500+ data formats. The problem isn't on your side. You need them to standardize their output formats if you have no control over their apps. CSV is likely the easiest. You could even send them an Excel template to help them along.
