The department I work in has an Excel document on a SharePoint drive that multiple people edit simultaneously throughout the day. The dataset/worksheet is around 80k rows by 37 columns, and the file itself, with VBA macros, is around 20 MB.
The issue is that with the number of people in at a time (up to 6), there are often sync issues where Excel can't merge changes, and we end up with multiple instances of the file that need to be re-correlated, which wastes a fair amount of time and makes it hard to determine which edits came when.
I've tried searching Google for these sync issues, but nothing really points to how to optimize the file/system to make this happen less often.
I've argued that this isn't the best way to go about it and that we should break up the data and assign it to individuals (parse it out into separate files that get combined at the end of the day), but management wants none of that because, well, change.
Any thoughts, experience, or direct links on how to properly manage multiple simultaneous users in a SharePoint Excel workbook to ease sync issues?
Thanks in advance!
Related
I have 500 Excel documents. I want users to keep working as if they were still in Excel (I'll provide an app for that), yet cross-reference data between those documents. What database can fit such needs?
So, if I understand correctly, you need to get data from ~500 Excel files while people may be accessing and changing them in real time! I can think of 4 approaches:
1) Live links from all files to one workbook... it hurts me even to think about the maintenance and setup... but it will be "live".
2) Power Query: group them all into one data table using Power Query, Power BI, or similar, then load them into a workbook OR save as CSV... one-button refresh, relatively quick, no actual coding needed.
3) VBA: access all files (or only the changed ones...) and get what you want, when you want it. Implemented expertly, a full scan takes only a few seconds on a modern PC, but it needs someone good at VBA.
4) Set up option 1) using VBA instead of doing it manually, then use VBA to check for errors, etc. The result will be "live" but again requires serious VBA coding...
I believe that 2) is the easiest choice, with good maintainability, easy setup, and good speed... (start in Excel: Data / New Query / From File / From Folder ...)
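If someone on the team later wants the same folder-combine outside Excel, here is a rough Python sketch of the idea (the `./reports` folder is hypothetical; it assumes the workbooks share a common header row and that pandas and openpyxl are installed):

```python
from pathlib import Path

import pandas as pd  # needs openpyxl installed for .xlsx files

# Hypothetical folder holding the ~500 workbooks, all sharing one header row.
SOURCE = Path("./reports")

frames = []
for path in sorted(SOURCE.glob("*.xlsx")):
    df = pd.read_excel(path)        # reads the first sheet by default
    df["source_file"] = path.name   # keep provenance for later auditing
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
combined.to_csv("combined.csv", index=False)  # or push into a database
```

Like the Power Query route, this is a one-command refresh; files that are open for editing at that moment may fail to read, so a skip-and-log or retry policy is worth adding.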
I have recently observed an issue with the data in a column that I use for data validation on my spreadsheet.
There is nothing wrong with the formula, nor is there anything wrong with the use of data validation.
It looks for duplicate entries, which works quite fine.
The issue is that it no longer recognizes input made from a smartphone using the Excel app.
So what I did was retype the cell's text from my PC, and it worked perfectly.
Is there a way I can continue using this technique (data validation) without having to re-enter data from a PC in order for it to process?
Certainly! Yes, that is possible.
But... with all the possibilities in today's world, is your current strategy the best one for you?
That is something I cannot answer for you.
That is something I cannot enumerate for you.
But... There is something that I can introduce to you.
Power Query
Power Query was a free add-on for Excel 2010 and 2013, and it has been baked directly into Excel for more than half a decade. So if you're using the mobile app, then you probably have a modern version of Excel with Power Query right at your fingertips.
Your first step is to determine how you want to make your data available for Excel to get. Go to the Data tab on the ribbon and review your options in the "Get External Data" group.
It doesn't matter if free data is your Creed and your most intimate moments are publicly available through your raw data feed. Or if paranoia is the reason why you constantly drive around the block scraping SSIDs before squirreling them away to SQL server for detailed analysis. Or if you're using a USB cable to transfer photos to your PC because your mom walked in on you without knocking and was so disgusted by what she saw on your desktop that you're banned from the family LAN... For life. None of that matters because Excel can connect to your data in so many ways that one of them will be perfect for you.
There is a sense of familiarity when importing your data into Power Query. It's not unlike following those timeless MS wizards; but nothing matches the uncanny sensation of being dropped into the Power Query editor. It is simultaneously the same as Excel and different from Excel, and it may be the closest you ever come to visiting a parallel universe. Many of the same tools are available, but they behave just slightly differently. And in some cases, like the Text to Columns tool, it is light years ahead of Excel, and you will find yourself cursing at MS for not using it as a replacement for the old tool.
When you're done transforming your data, you'll have a tight, clean table. But the real prize is that you have a fully automated pipeline from source to product.
I figured out that the phone user included extra spaces when inputting the data.
So I used the TRIM() function, which takes care of extra spaces between, before, or after each word, and that did the job.
So the major error was that there were additional spaces in the tested data that were not being recognized.
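Outside Excel, the same diagnosis is easy to confirm by comparing raw values against their trimmed forms. A tiny Python sketch (the sample entries are made up for illustration; split/join collapses whitespace much like TRIM() does):

```python
# Made-up sample: entries that differ only by stray whitespace.
entries = ["Widget A", "Widget A ", " Widget B", "Widget B"]

seen = {}
for raw in entries:
    key = " ".join(raw.split())  # trim + collapse spaces, like Excel's TRIM()
    seen.setdefault(key, []).append(repr(raw))

for key, variants in seen.items():
    if len(variants) > 1:
        print(f"{key} appears as: {', '.join(variants)}")
```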
I have over 6,000 Excel sheets. While all the sheets describe the same thing, they are independently formatted. They all have between 9 and 13 columns, but the columns are out of order, the column names are independently misspelled, and they may or may not have a second or third column header.
I am currently trying, in Python, to read cells in a left-down-right-up motion to attempt to locate the same data, but there are simply too many differences in structure names, column ordering, and data definitions to lock them down one at a time. Is there a tool that I can use to read these documents and conform them to a single format via a rapid mapping function?
Thanks much.
Wow, it's the Ultimate Data Horror Story.
I want to ask how you ever let it get this way... but I actually don't want to know; I'm already going to have nightmares about this.
It's like that Hoarding show on TV, but with data.
No, I'm afraid that if you can't even identify a pattern then there's no magic function that will be able to either.
But that doesn't mean it's a lost cause. It's just going to need some human interaction, and there are ways to minimize the pain.
What you need is a custom interface that will load the documents one by one, and will walk a human through clicking each relevant column or area, and then automatically load the next document.
There would also need to be buttons for sorting out things like obvious garbage sheets (blanks?), "unknowns" (which get put in a folder for advanced research later), and other "unpredictables" that may come up during the process.
Also, perhaps once you get into it, you'll notice a pattern you're not thinking of, like maybe *"the person who handled the files from 2002 to 2004 set them up this way"*, or, "when Budget is misspelled, it's always either Bugdet or Budteg".
In this scope, little patterns like that can make a big difference.
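If whoever builds the interface wants to pre-fill the human's choices, even a crude fuzzy match on header names will absorb a lot of those little patterns. A Python sketch (the canonical schema and the misspellings are invented for illustration):

```python
import difflib

# Invented canonical schema that every sheet should be mapped onto.
CANONICAL = ["Date", "Budget", "Actual", "Department", "Notes"]

def map_headers(raw_headers):
    """Suggest a canonical name for each raw header; None means 'ask a human'."""
    mapping = {}
    for raw in raw_headers:
        match = difflib.get_close_matches(
            raw.strip().title(), CANONICAL, n=1, cutoff=0.6
        )
        mapping[raw] = match[0] if match else None
    return mapping

# "Bugdet" and "Actaul" resolve automatically; oddballs fall through for review.
print(map_headers(["Bugdet", "DATE ", "Actaul", "misc"]))
```

Anything the matcher can't resolve still goes through the click-through interface, but even an 80% hit rate takes a serious bite out of the total.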
Depending on your coding skills, you may or may not need outside assistance with this. I assume this is not data that can just get thrown out, or you wouldn't be asking...
If each document took an average of 20 seconds to process, that would be about 33 hours in total. An hour a day and it's done in a month. Or someone full-time, and it's done in a week.
Do you have a budget you can throw at this? Data archaeology is an actual thing! Hell, I'll do it for you for the right price... (wouldn't break the bank, depending on how urgent it is, of course!)
Either way, this ain't going to be fun for "someone"...
How is it possible to switch to another worksheet while the results or computations of the current worksheet are being fetched?
Currently we have to wait for the current computation to complete before moving to another sheet. Ideally, I would like to know how I can push the current worksheet's job to the background and resume working on another worksheet in Tableau.
Note: extracts are not an option, nor are multiple instances, as the source data has millions of records.
I don't think this is functionality Tableau offers right now, but here are a few things that might speed up development:
If possible, extract your data, rather than using a live connection. I know this isn't always an option, but it can remove a lot of overhead, particularly during development.
Optimize your extracts. Not having to recompute all of your calculated fields every time you make a query can make for some pretty notable speed boosts.
By far the best way to minimize your load times is to subset your data while you're building the worksheets. During development, it might not be necessary to load every row of your dataset. More often than not, a subset will be enough to confirm that your worksheets and calculations are working as you need them to. You could try filtering to just a month's worth of data, for example, or maybe just a handful of individuals/stores/dog breeds/Skrillex songs.
Nadir's suggestion to pause auto-updates is a great one, but if you're building more complicated views or more intricate calculations, not having real-time feedback on your work can make development a lot more challenging. However, if subsetting your data isn't an option, this might be your best route. One way to speed the process up a bit is to toss all of the sheets you want to load onto a dashboard and resume auto-updates from there. Note that this dashboard would not be a formatted production dashboard; it's simply serving as a drop point for the sheets you'd like to load all at once.
I've never found a way to achieve this. But it is worth noting that you can have multiple Tableau workbooks open at the same time and they run in separate threads.
So, if you can split your work across workbooks, then you can switch from one to the other during calculations quite easily without the calculation impeding your work. You may be able to merge workbooks later if you organise the split of your work the right way.
If you are using a live data source, you can pause auto-updates until you are ready to see the results/computations. I know this does not completely solve what you are trying to do, but it does give you a chance to go through and create the worksheets you need and then have it all load at once.
I am looking for an alternative spreadsheet to Excel, preferably but not necessarily open source, that allows a programmer to create a plugin that can update cells in the sheet from an external data source in real time. The spreadsheet would then internally compute all dependent calculation chains upon change of value.
This is similar functionality to what the RTD method does with Microsoft Excel. The rate of external data change could be moderate to high (whatever such relativistic terms mean).
Also the reverse process would be useful, i.e. detecting a change in cells and then sending that information to a plugin that can communicate with external processes.
Any recommendations or experience in trying this?
I am afraid you will not find any. The main consumers of real-time spreadsheets (grids) are big banks, and they usually invest in their own solutions. [Because they can afford to, and they have traditionally seen it as their advantage over the competition.] Some of the solutions are very dated but still going strong! Three years ago I worked on a system written in C++ (with TibcoRv as a backbone) that was already five years old. It is still alive and kicking.
One of the strong points of a bespoke grid is "Excel-like formulae", where a user can use a field from the provided data dictionary. So rather than referencing cells, you reference data from your systems. It makes formulae easier to implement and read. And of course you can export or share them; users really like that.
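For readers who have not met the pattern, here is a toy Python sketch of such a grid (all field names invented): formulas reference named fields from a data dictionary rather than cells, and a push from an external feed recomputes the dependent chain, roughly what RTD does in Excel:

```python
# Toy reactive grid: formulas reference named fields, not cell addresses.
# All field names are invented; no cycle detection, for brevity.

class Grid:
    def __init__(self):
        self.values = {}    # field name -> current value
        self.formulas = {}  # field name -> (function, dependency names)

    def add_formula(self, name, func, deps):
        self.formulas[name] = (func, deps)
        self._recompute_one(name)

    def set_value(self, name, value):
        """Called by an external feed; cascades to dependent formulas."""
        self.values[name] = value
        self._recompute(name)

    def _recompute(self, changed):
        for name, (_, deps) in self.formulas.items():
            if changed in deps:
                self._recompute_one(name)

    def _recompute_one(self, name):
        func, deps = self.formulas[name]
        if all(d in self.values for d in deps):  # wait until all inputs exist
            self.values[name] = func(*(self.values[d] for d in deps))
            self._recompute(name)  # fields derived from this one update too

grid = Grid()
grid.add_formula("mid", lambda bid, ask: (bid + ask) / 2, ["bid", "ask"])
grid.set_value("bid", 99.5)   # e.g. pushed in by a market-data plugin
grid.set_value("ask", 100.5)
print(grid.values["mid"])     # -> 100.0
```

The reverse direction from the question is the same hook run the other way: a set_value-style callback that forwards the change to an external process instead of (or as well as) recomputing.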
The following could be of some help:
http://www.dadisp.com
http://www.quantrix.com
http://www.resolversystems.com/products/
http://pyspread.sourceforge.net/
http://matrex.sourceforge.net/
These may not exactly satisfy your real-time requirement, but they are worth exploring.