How to work simultaneously on multiple worksheets in Tableau? - multithreading

How is it possible to switch to another worksheet while the results or computations of the current worksheet are being fetched?
Currently we have to wait for the current computation to complete before moving to another sheet. Ideally I would like to know how I can push the current worksheet's job into the background and resume working on another worksheet in Tableau.
Note: Extracts are not an option, nor are multiple instances, as the source data has millions of records.

I don't think this is functionality Tableau offers right now, but here are a few things that might speed up development:
If possible, extract your data, rather than using a live connection. I know this isn't always an option, but it can remove a lot of overhead, particularly during development.
Optimize your extracts. Not having to recompute all of your calculated fields every time you make a query can make for some pretty notable speed boosts.
By far the best way to minimize your load times is to subset your data while you're building the worksheets. During development, it might not be necessary to load every row of your dataset. More often than not, a subset will be enough to confirm that your worksheets and calculations are working as you need them to. You could try filtering to just a month's worth of data, for example, or maybe just a handful of individuals/stores/dog breeds/Skrillex songs.
Nadir's suggestion to pause auto-updates is a great one, but if you're building more complicated views or more intricate calculations, not having real-time feedback on your work can make development a lot more challenging. However, if subsetting your data isn't an option, this might be your best route. One way to speed this process up a bit would be to toss all of the sheets you want to load onto a dashboard and resume auto updates from there. Note that this dashboard would not be a formatted production dashboard — it's simply serving as a drop point for the sheets you'd like to load all at once.

I've never found a way to achieve this. But it is worth noting that you can have multiple Tableau workbooks open at the same time and they run in separate threads.
So, if you can split your work across workbooks, then you can switch from one to the other during calculations quite easily without the calculation impeding your work. You may be able to merge workbooks later if you organise the split of your work the right way.

If you are using a live data source, you can pause auto-updates until you are ready to see the results/computations. I know this does not completely solve what you are trying to do, but it does give you a chance to go through and create the worksheets you need and then have them all load at once.

Related

Excel: better way to handle living document with multiple simultaneous authors

The department I work in has an Excel document on a SharePoint drive that multiple people edit simultaneously throughout the day. The dataset/worksheet is around 80k rows with 37 columns, and the file itself, with VBA macros, is around 20 MB.
The issue is that with the number of people in at a time (up to 6), there are often sync issues where Excel can't merge changes, and we end up with multiple instances of the file that need to be re-correlated, which wastes a fair amount of time and makes it hard to determine which edits came when.
I've tried searching Google for the sync issues, but nothing really points to how to optimize the file/system to keep this from happening.
I've argued that this isn't the best way to go about it and that we should break up the data and assign it to individuals (parse it out into separate files that get combined at the end of the day), but management wants none of that because, well, change.
Any thoughts, experience, or direct links on how to properly manage multiple simultaneous users in a SharePoint Excel workbook to ease the sync issues?
Thanks in advance!
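For what it's worth, the combine-at-end-of-day step mentioned in the question is easy to script; here is a minimal sketch with pandas (the folder, file, and key-column names are hypothetical placeholders):

```python
# End-of-day merge of per-person workbooks back into one master file.
# Assumes every copy shares the same 37 columns and a key column
# ("RecordID" is a hypothetical placeholder) for spotting clashing edits.
from pathlib import Path

import pandas as pd

frames = [pd.read_excel(p, sheet_name=0) for p in Path("daily_splits").glob("*.xlsx")]
master = pd.concat(frames, ignore_index=True)

# Flag rows edited in more than one person's copy instead of silently merging.
clashes = master[master.duplicated("RecordID", keep=False)]
if not clashes.empty:
    print(f"{len(clashes)} rows appear in multiple copies; review before saving.")

master.drop_duplicates("RecordID", keep="last").to_excel("master_log.xlsx", index=False)
```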

Is there a database specifically engineered for cross-referencing Excel like tables?

I have 500 Excel documents. I want users to keep working as if it were Excel (I'll provide an app for that), yet cross-reference data between those documents. What database can fit such needs?
So, if I understand correctly, you need to get data from ~500 Excel files while people may access and change them in real time. I can think of four approaches:
1. Live links from all the files to one workbook... it hurts to even think about the setup and maintenance... but it will be "live".
2. Power Query: group them all into one data table using Power Query, Power BI, or similar, then load that into a workbook OR save it as CSV... one-button refresh, relatively quick, no actual coding needed.
3. VBA: access all the files (or just the changed ones...) and get what you want, when you want it. Implemented expertly, a full scan takes only a few seconds on a modern PC, but it needs someone good at coding VBA.
4. Set up option 1 with VBA instead of manually, then use VBA to check for errors etc. The result will be "live" but again requires serious VBA coding...
I believe option 2 is the easiest choice, with good maintenance features, easy setup, and good speed... (in Excel, start with Data / New Query / From File / From Folder...)
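If you'd rather script the consolidation outside of Excel, a minimal Python sketch with pandas does much the same thing as options 2 and 3 (folder and file names here are hypothetical):

```python
# Pull the first sheet of every workbook in a folder into one table,
# tagging each row with its source file so values can be traced back.
from pathlib import Path

import pandas as pd

frames = []
for path in sorted(Path("workbooks").glob("*.xlsx")):
    df = pd.read_excel(path, sheet_name=0)
    df["source_file"] = path.name  # the cross-reference back to the file
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
combined.to_csv("combined.csv", index=False)  # the "one-button refresh" output
```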

data and structure cleansing of excel sheets

I have over 6,000 Excel sheets. While all the sheets describe the same thing, they are independently formatted. They all have between 9 and 13 columns, but the columns are out of order, the column names are independently misspelled, and they may or may not have a second, or third, column header.
I am currently trying, in Python, to read cells in a left-down-right-up motion to locate the same data, but there are simply too many differences in structure names, column ordering, and data definitions to lock them down one at a time. Is there a tool that I can use to read these documents and conform them to a single format, via a rapid mapping function?
Thanks much.
Wow, it's the Ultimate Data Horror Story.
I want to ask how you ever let it get this way... but I actually don't want to know; I'm already going to have nightmares about this.
It's like that Hoarding show on TV, but with data.
No, I'm afraid that if you can't even identify a pattern then there's no magic function that will be able to either.
But that doesn't mean it's a lost cause. It's just going to need some human interaction, and there are ways to minimize the pain.
What you need is a custom interface that will load the documents one by one, and will walk a human through clicking each relevant column or area, and then automatically load the next document.
There would also need to be buttons for sorting out things like obvious garbage sheets (blanks?), "unknowns" (which get put in a folder for advanced research later), and other "unpredictables" that may come up during the process.
Also, perhaps once you get into it, you'll notice a pattern you're not thinking of, like maybe *"the person who handled the files from 2002 to 2004 set them up this way"*, or, "when Budget is misspelled, it's always either Bugdet or Budteg".
In this scope, little patterns like that can make a big difference.
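Misspelling patterns like those are also where a little fuzzy matching can take work off the human's plate; a minimal sketch, assuming pandas and difflib (the canonical column names and the cutoff are made up and would need tuning):

```python
# Map each sheet's misspelled headers onto a canonical schema with fuzzy
# matching; anything unmatched goes to the "unknowns" pile for human review.
import difflib

import pandas as pd

CANONICAL = ["date", "department", "budget", "amount"]  # hypothetical schema

def map_headers(columns, cutoff=0.75):
    mapping = {}
    for col in columns:
        hits = difflib.get_close_matches(str(col).strip().lower(), CANONICAL,
                                         n=1, cutoff=cutoff)
        if hits:
            mapping[col] = hits[0]  # e.g. "Bugdet" -> "budget"
    return mapping

df = pd.read_excel("sheet_0001.xlsx")  # hypothetical file
mapping = map_headers(df.columns)
unknowns = [c for c in df.columns if c not in mapping]  # route to manual review
df = df.rename(columns=mapping)
```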
Depending on your coding skills, you may or may not need outside assistance with this. I assume this is not data that can just get thrown out, or you wouldn't be asking...
If each document took an average of 20 seconds to process, that would be about 33 hours in total. An hour a day and it's done in a month. Or someone full-time, and it's done in a week.
Do you have a budget you can throw at this? Data archaeology is an actual thing! Hell, I'll do it for you for the right price... (wouldn't break the bank, depending on how urgent it is, of course!)
Either way, this ain't going to be fun for "someone"...

Automating Raw Export Data Cleansing for Client Onboarding - Format is Always Different

So, a bit of a general question. I work as a data analyst for a startup. My primary process involves taking the existing customer data a client has and cleansing/normalizing it to fit our platform as part of our onboarding process. A member of our team exports the client's data from the system they are transitioning from or, if they kept track of it in house, we receive the Excel log they used to track it. It is always in a different format and requires extensive cleansing (avg 1 min/record). We take what is usually one large table (.xlsx format) and, after cleansing, split it into four .csv files, which we load as four tables on our platform.
I feel I have optimized the process quite well in terms of the process steps and the cleansing with Excel functions (IF, CONCAT, text-to-columns, etc.). I have beginner-to-intermediate skills in VBA and SQL and have just scratched the surface of R; what is frustrating is that I know there is potential to automate this process, but I just don't know where to start. If anyone has experience with something like this, code, a link to an article or another thread, or just some general direction, it would be much appreciated. Please ask for clarification where you feel it is needed. Thanks.
This will be really hard to do in Excel. If you have the time, you can try out Optimus, a data cleansing library written in Python and PySpark (you don't need to know Spark). Here is the webpage: https://hioptimus.com.
You can create data pipelines with it, and I recommend that you do: try to generalize your processes, and ask the client for a more structured way of passing the data.
The good thing is that you don't need Big Data to run Optimus, but if you have it some day, the same code will work.
Check out the documentation for more:
http://optimus-ironmussa.readthedocs.io/en/latest/
Let me know if you have doubts!
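Even without Optimus, plain pandas covers the one-big-table-to-four-CSVs shape of the process; a minimal skeleton (every column name and table assignment below is a hypothetical placeholder):

```python
# Skeleton of the onboarding pipeline: load the client export, apply the
# normalization currently done with IF/CONCAT/text-to-columns, then write
# the four platform tables. All names here are placeholders.
import pandas as pd

raw = pd.read_excel("client_export.xlsx")

# Example normalization steps; swap in the real cleansing rules.
raw.columns = raw.columns.str.strip().str.lower().str.replace(" ", "_")
raw = raw.dropna(how="all")

TABLES = {  # which cleansed columns feed which platform table
    "customers.csv": ["customer_id", "name", "email"],
    "accounts.csv": ["customer_id", "account_id", "status"],
    "orders.csv": ["account_id", "order_id", "order_date"],
    "items.csv": ["order_id", "sku", "quantity"],
}

for filename, cols in TABLES.items():
    raw[cols].drop_duplicates().to_csv(filename, index=False)
```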

Can an excel worksheet be used as UDF?

I'm building a network business model in Excel. A similar model is that of Gawker Media.
In my model I have a number of properties that have some overlap of audience. Each property attracts users, which in turn affords cross-promotional opportunities. In the case of Gawker, they have a series of blogs whose audience will likely read several of the blogs in their network.
If Gawker launches a new blog, they're able to direct traffic to it from their blog network.
Creating a model for a single blog is fairly simple - although the initial assumptions are harder. The next step is to model the network effect.
Excel provides a scenario manager that allows me to vary the key assumptions in the basic model. This is almost perfect: I can model the launch of 10 properties, each with different launch assumptions, and see the summary.
Where I need help is in figuring out how I can vary the initial number of users for the launch of each property. In other words, once the network is established, it's possible to drive people to any new property launched on the network.
I don't believe the scenario manager will do what I need.
So, I'm wondering if it's possible to use the model worksheet as a UDF? The UDF would need to spit out the monthly revenue and unique users given a number of input assumptions.
I would then be able to create my own summary sheet for the 10 properties and, using the total uniques for each property, get a summary for the network. This network summary would be used to determine how many people could be driven to the launch of a new property.
In effect, the only difference from the scenario manager is that I need one of my input variables (initial users) to be programmatically generated as a function of the number of people in the network at the time of launch.
I'm hoping it's possible to achieve something along these lines in Excel. I could drop down and build the whole model in Java, but then it's much harder to share with business colleagues!
Thanks - Matt.
You could try Data Table.
It only allows you to analyse the effect of varying 2 input parameters, but you can create several data tables, and each parameter can take hundreds of different values.
It's little known, but efficient, and it has been available since Excel 3.0.
There is a product that I have researched but never used: search for calc4web. It takes a sheet of formulas and generates C++ code that can be compiled into an XLL add-in. Then you can call a function that does what your sheet does. But of course then you have an XLL to distribute, and a build step every time you change your logic, which defeats much of the point of using a spreadsheet.
In my case, I wound up writing some very simple VBA code to vary my sheet "inputs" using the scenario manager, and capture my "outputs". This works if you have a batch of inputs that you can just point your macro at and step through.
EDIT:
See here for a VBA-only example of doing this:
using a sheet in an excel user defined function
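The same vary-inputs/capture-outputs loop can also be driven from outside Excel; here is a hypothetical Python sketch using xlwings (it needs a local Excel installation, and the file name, cell addresses, and scenario values are all placeholders):

```python
# Drive the model workbook from Python: write each scenario's input,
# let Excel recalculate, then read the outputs back.
import xlwings as xw

scenarios = [1_000, 5_000, 20_000]  # initial-user assumptions to test

wb = xw.Book("network_model.xlsx")  # opens in a real Excel instance
model = wb.sheets["Model"]

results = []
for initial_users in scenarios:
    model.range("B2").value = initial_users  # the "initial users" input cell
    wb.app.calculate()                       # force a full recalculation
    revenue = model.range("B10").value       # monthly revenue output cell
    uniques = model.range("B11").value       # unique users output cell
    results.append((initial_users, revenue, uniques))

for row in results:
    print(row)
```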
