Restricting access to Excel source data

I have an Excel template which reads data from a source Excel file using VLOOKUP and INDEX/MATCH functions. Is there a way to prevent the end user from accessing the source data file/sheet, e.g. by storing the source file in a remote location and making the lookups read from there?

Depending on what resources are available to you, it may be difficult to prevent users from simply going around whatever restrictions you put in place. Even if the data is in a database table, you will need measures in place to prevent users from querying it outside of your Excel template. I don't know your situation, but ideally there would be someone (e.g. a database administrator, infosec, or a back-end developer) who could help engineer a proper solution.
Having said that, I do believe your idea of using MS SQL Server could be a good way to go. You could create stored procedures instead of using raw SQL queries to limit access. See this link for more details:
Managing Permissions with Stored Procedures in SQL Server
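To sketch what that can look like in T-SQL (the table, role, and procedure names here are made up for illustration), the account the Excel template connects with gets EXECUTE on a procedure but no SELECT on the underlying table; ownership chaining lets the procedure read the table on the caller's behalf:

-- Hypothetical names; adjust to your own schema.
CREATE ROLE TemplateReader;
GO

-- No direct access to the source table:
DENY SELECT ON dbo.SourceData TO TemplateReader;
GO

-- A procedure that returns only what the template's lookups need:
CREATE PROCEDURE dbo.GetLookupRow
    @LookupKey NVARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT Name, Amount
    FROM dbo.SourceData
    WHERE LookupKey = @LookupKey;
END
GO

-- The only permission the template's login/role receives:
GRANT EXECUTE ON dbo.GetLookupRow TO TemplateReader;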
In addition, I would be worried about users figuring out other user IDs and arbitrarily accessing data. You could add a layer of protection with a mapping table, so that information can never be requested with a raw user ID. The table would be as follows:
Columns: randomKey, userId, creationDateTime
randomKey is just an x-digit random number/letter sequence
creationDateTime is a timestamp used for timeout purposes
Whenever someone needs access for a user ID, you run a stored procedure that adds a record to the mapping table: you input the user ID, the procedure creates a record and returns the key. You provide the user with the key, which they enter in your template. A separate stored procedure takes the key, resolves it to the user ID (using the mapping table), and returns the requested information. These keys expire: either they are single use (the procedure deletes the record from the mapping table) or they use a timeout (if creationDateTime is more than x hours/days old, no data is returned).
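A minimal T-SQL sketch of that mapping-table design (all table, column, and procedure names are placeholders, and collision handling for the random key is omitted):

CREATE TABLE dbo.KeyMap (
    randomKey        CHAR(6)   NOT NULL PRIMARY KEY,
    userId           INT       NOT NULL,
    creationDateTime DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

-- Issues a key for a given user id.
CREATE PROCEDURE dbo.CreateAccessKey
    @UserId INT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Key CHAR(6) =
        RIGHT('000000' + CAST(ABS(CHECKSUM(NEWID())) % 1000000 AS VARCHAR(6)), 6);
    INSERT INTO dbo.KeyMap (randomKey, userId) VALUES (@Key, @UserId);
    SELECT @Key AS AccessKey;
END
GO

-- Resolves a key and returns data only while the key is less than 24 hours old.
-- dbo.SourceData is a stand-in for whatever the template actually reads.
CREATE PROCEDURE dbo.GetDataByKey
    @Key CHAR(6)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT d.*
    FROM dbo.KeyMap     AS k
    JOIN dbo.SourceData AS d ON d.userId = k.userId
    WHERE k.randomKey = @Key
      AND k.creationDateTime > DATEADD(HOUR, -24, SYSUTCDATETIME());
END
GO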
For the keys, Mark Ransom shared an interesting solution for creating random IDs on which you could base your logic:
Generate 6 Digit unique number
Sounds like a lot of work, but if there is sensitivity around your data it's worth building a more robust process around it. There's probably a better way to approach this, but I hope it at least gives you food for thought.

No, it's not possible.
Moreover, you absolutely NEED those files open to refresh the values in formulas that refer to them. When you open a file with external references, their values are calculated from the local cache (which may not match the actual remote file contents). Only when you open the remote files do the values refresh.

Related

Best way to run a script for a large userbase?

I have users stored in a PostgreSQL database (~10 M) and I want to send emails to all of them.
Currently I have written a Node.js script which fetches users 1000 at a time (OFFSET and LIMIT in SQL) and queues the requests in RabbitMQ. This seems clumsy to me: if the Node process fails at any time, I have to restart it (I currently keep track of the number of users skipped per query, and can restart at the previous offset found in the logs). This might lead to some users receiving duplicate emails and some not receiving any. I could create a new table, or a new column indicating whether an email has been sent to that person, but in my current situation I can't do so. I can neither create a new table nor add a new column to the existing table. (Seems to me like an idempotency problem?)
How would you approach this problem? Do you think compound indexes might help? Please explain.
The best way to handle this is indeed to store who received an email, so there's no chance of doing it twice.
If you can't add tables or columns to your existing database, just create a new database for this purpose. If you want to be able to recover from crashes, you will need to store who got the email somewhere, so if you are given hard restrictions on not storing this in your main database, get creative with another storage mechanism.
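If a separate database (or even just a separate schema) is an option, a rough PostgreSQL sketch of the bookkeeping could look like this; every name is hypothetical, and it is shown as a separate schema for simplicity:

-- If it has to be a fully separate database, you'd need dblink/postgres_fdw
-- (or an exported list of ids) to combine it with the main users table.
CREATE SCHEMA IF NOT EXISTS mailer;

CREATE TABLE mailer.email_log (
    user_id BIGINT PRIMARY KEY,
    sent_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Record a user when queueing them; the primary key makes the operation
-- idempotent, so re-running the script never double-queues anyone.
INSERT INTO mailer.email_log (user_id)
VALUES (12345)
ON CONFLICT (user_id) DO NOTHING;

-- After a crash, simply resume with users that are not logged yet
-- (keyset-style batches by id rather than OFFSET/LIMIT):
SELECT u.id, u.email
FROM users u
LEFT JOIN mailer.email_log l ON l.user_id = u.id
WHERE l.user_id IS NULL
ORDER BY u.id
LIMIT 1000;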

Is there an accurate way to get all of the stored procedure column dependencies in Sybase ASE?

I am currently working within a Sybase ASE 15.7 server and need a dependable way to get all of the stored procedures which are dependent upon a particular column. The Sybase system procedure sp_depends is notoriously unreliable for this. I was wondering if anyone out there had a more accurate way to discover these dependencies.
Apparently, the IDs of the columns are supposed to be stored in a bitmap in the varbinary column sysdepends.columns. However, I have not yet found a bitmask which has been effective in decoding these column IDs.
Thanks!
A tedious solution could be to parse the stored procedure source code in the system table syscomments to find references to the tables and columns.
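For example, something along these lines (the column name is a placeholder). It's crude: syscomments.text is stored in 255-character chunks, so a name split across two rows can be missed, and comments or similarly named columns can produce false positives.

SELECT DISTINCT o.name
FROM   sysobjects  o,
       syscomments c
WHERE  c.id   = o.id
  AND  o.type = 'P'                 -- stored procedures only
  AND  c.text LIKE '%my_column%'    -- placeholder column name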
A partial solution might be to run sp_recompile on all relevant tables, then watch master..monCachedProcedures to see changes in the CompileDate. Note that the CompileDate will only change once the stored proc has been executed after the sp_recompile (it actually gets compiled on first execution).
This would at least give you an idea of the stored procedures that are in use and are dependent on the specified table.
Not exactly elegant...

Possibility of GUID collision in MS CRM Data migration

We are doing a CRM data migration in order to keep two CRM systems in sync, and to remove historical data from the primary CRM. The target CRM was created with the source system as its base. While we migrate the data, we keep the GUIDs of the records the same in order to maintain data integrity. This approach expects that, in the target system, each GUID is still available to assign to the new record. There are no new records created directly in the target system except emails, and those are very low in number. But apart from that, there are ways in which the system creates its own GUIDs: e.g. when we move a newly created entity to the target via a solution, it will not maintain the GUIDs of the entity and its attributes and will create its own, since we have no control over this. Some records created internally by the platform will also be assigned new GUIDs. So, since we do not have control over GUID creation in the target system (although the number is very small), I fear the situation where the source system has a GUID which the target has already consumed, and at data migration time it will give errors.
My question: is there any possibility that the above can happen? Because if it happens to us, the whole migration solution will lose its value.
SQL Server's NEWID() generates a 128-bit ID. All IDs generated on the same machine are guaranteed to be unique, but because yours have been generated across multiple machines, there's no guarantee.
That being said, from this source on GUIDs:
...for there to be a one in a billion chance of duplication, 103 trillion version 4 UUIDs must be generated.
So the answer is yes, there is a chance of collision, but it's so astronomically low that most consider the answer to effectively be no.
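For a back-of-the-envelope check of that figure, the standard birthday approximation for version 4 UUIDs (which carry 122 random bits) is:

P(\text{collision}) \approx \frac{n^2}{2 \cdot 2^{122}}

With n = 1.03 \times 10^{14} (103 trillion): \frac{(1.03 \times 10^{14})^2}{2 \cdot 2^{122}} \approx \frac{1.06 \times 10^{28}}{1.06 \times 10^{37}} \approx 10^{-9}, i.e. roughly one in a billion, matching the quote.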

Need to create an SSIS Package for users to directly modify a table

I need to allow a couple of users to modify a table in my database, preferably as part of an integrated package that then submits the changes into our live database.
Please allow me to explain further:
We have an automated import task from one database system into another, with data transformation on the way through.
As part of this task, various checks are run before the final import and any rows with incomplete or incorrect data are sent to a rejections table and deleted from the import table.
I now need to allow a couple of senior users the ability to view and correct the missing/incorrect entries in the rejection table, before re-staging it and submitting it to the live database.
(Obviously, it will be re-checked before submission and re-rejected if it is still wrong).
Can anyone tell me what I need to do in SSIS to display the contents of a specific table (e.g. MyDatabase.dbo.Reject_Table) to the user running this package from their local PC (the package will, of course, be located on the server).
Then they need the ability to modify the contents of the table (either one row at a time or en masse; not bothered which).
When that is done, they hit a "Continue" or "Next" type button, which then continues to run the remainder of the package, which I am more than comfortable writing.
It is only the interactive stage(s) that I am struggling with and I would really appreciate some advice.
Thanks
Craig
That is non-native functionality in SSIS.
You can write pretty much anything you want in a script task and that includes GUI components. (I once had a package play music). In your data flow, you would have a Script Component that edits each row passing through the component.
Why this is a bad idea
Suitability - This isn't really what SSIS is for. The biggest challenge you'll run into is that the data flow is tightly bound to the shape of the data. The reject table for Customer is probably different from the reject table for Phone.
Cost - How are you going to allow those senior users to run SSIS packages? If the answer involves installing SSIS on their machines, you are looking at a production license for SQL Server. That's roughly 8k to 23k per socket for SQL Server 2005-2008R2 and something insane per core for SQL Server 2012+.
What is a better approach
As always, I would decompose the problem into smaller tasks until I can solve it. I'd make two problem statements:
As a data steward, I need the ability to correct (edit) incomplete data so that data can be imported into our application.
As an X, I need the ability to import (workflow) corrected rejected data so that we can properly bill our customers (or whatever the reason is).
Editing data. I'd make a basic web page or thick client app to provide edit capability. A DataGridView would be one way of doing it. Heck, you could forgo custom development and just slap an Access front end on the tables and let them edit the data through that.
Import corrected data. This is where I'd use SSIS, but possibly not exclusively. I'd probably look at adding a column to all the reject tables that indicates whether a row is ready for reprocessing. For each reject table, I'd have a package that looks for any rows flagged as ready. I'd probably use a delete-first pattern to remove the flagged data and either insert it into the production tables or route it back into the reject table for further fixing. The mechanism for launching the packages could be whatever makes sense. Since I'm lazy, I'd:
Have a SQL Agent job that runs the packages
Create a stored proc which can start that job
Grant security on that stored proc to the data stewards
Provide the stewards a big red button that says Import
How that's physically implemented would depend on how you solved the edit question; a rough sketch follows below.
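A rough T-SQL sketch of that plumbing. Every table, role, and job name below is invented; also note that because ownership chaining doesn't cross into msdb, the proc may need EXECUTE AS or module signing (or the stewards need a SQL Agent role) to be allowed to start the job.

-- Flag column on each reject table (names are placeholders):
ALTER TABLE dbo.Reject_Table
    ADD ReadyForReprocessing BIT NOT NULL DEFAULT 0;
GO

-- The "big red button": a proc the stewards can execute to kick off the
-- SQL Agent job that runs the reprocessing package(s).
CREATE PROCEDURE dbo.usp_StartRejectImport
AS
BEGIN
    SET NOCOUNT ON;
    EXEC msdb.dbo.sp_start_job @job_name = N'Reprocess Rejects';
END
GO

-- A role for the senior users:
CREATE ROLE RejectDataStewards;
GRANT EXECUTE ON dbo.usp_StartRejectImport TO RejectDataStewards;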

Data retrieval - Database VS Programming language

I have been working with databases recently and before that I was developing standalone components that do not use databases.
With all the DB work I have a few questions that sprang up.
Why is a database query faster than retrieving the data from a file in a programming language?
To elaborate my question further -
Assume I have a table called Employee, with fields Name, ID, DOB, Email and Sex. For reasons of simplicity we will also assume they are all strings of fixed length and they do not have any indexes or primary keys or any other constraints.
Imagine we have 1 million rows of data in the table. At the end of the day this table is going to be stored somewhere on the disk. When I write a query Select Name,ID from Employee where DOB="12/12/1985", the DBMS picks up the data from the file, processes it, filters it and gives me a result which is a subset of the 1 million rows of data.
Now, assume I store the same 1 million rows in a flat file, each field similarly being fixed length string for simplicity. The data is available on a file in the disk.
When I write a program in C++ or C or C# or Java to do the same task of finding the Name and ID where DOB="12/12/1985", I will read the file record by record and check for each row of data whether DOB="12/12/1985"; if it matches, I present the row to the user.
This way of doing it in a program is much slower than the speed at which a SQL query returns the results.
I assume the DBMS is also written in some programming language and there is also an additional overhead of parsing the query and what not.
So what happens in a DBMS that makes it faster to retrieve data than through a programming language?
If this question is inappropriate for this forum, please delete it, but do give me some pointers to where I may find an answer.
I use SQL Server if that is of any help.
Why is a database query faster than a programming language data retrieval from a file
That depends on many things - network latency and disk seek speeds being two of the important ones. Sometimes it is faster to read from a file.
In your description of finding a row within a million rows, a database will normally be faster than seeking in a file because it employs indexing on the data.
If you pre-process your data file and provide index files for the different fields, you could speed up data lookups from the filesystem as well.
Note: databases are normally used not for this feature, but because they are ACID compliant and therefore suitable for environments where you have multiple processes (normally many clients on many computers) querying the database at the same time.
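To make the indexing point concrete in SQL Server terms (using the table and query from the question), an index on DOB lets the engine seek straight to the matching rows instead of scanning all one million:

-- Without this index the query scans the whole table; with it, the engine
-- seeks directly to the matching DOB values. INCLUDE makes the index cover
-- the query so no lookups back to the base table are needed.
CREATE NONCLUSTERED INDEX IX_Employee_DOB
    ON dbo.Employee (DOB)
    INCLUDE (Name, ID);

SELECT Name, ID
FROM dbo.Employee
WHERE DOB = '12/12/1985';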
There are lots of techniques to speed up various kinds of access. As @Oded says, indexing is the big solution to your specific example: if the database has been set up to maintain an index by date, it can go directly to the entries for that date, instead of reading through the entire file. (Note that maintaining an index does take up space and time, though; it's not free!)
On the other hand, if such an index has not been set up, and the database has not been stored in date order, then a query by date will need to go through the entire database, just like your flat-file program.
Of course, you can write your own programs to maintain and use a date index for your file, which will speed up date queries just like a database. And, you might find that you want to add other indices, to speed up other kinds of queries -- or remove an index that turns out to use more resources than it is worth.
Eventually, managing all the features you've added to your file manager may become a complex task; you may want to store this kind of configuration in its own file, rather than hard-coding it into your program. At the minimum, you'll need features to make sure that changing your configuration will not corrupt your file...
In other words, you will have written your own database.
...an old one, I know... just in case somebody finds this: the question contained "assume ... do not have any indexes"
...so the question was about the sequential data-read fight between the database and a flat file WITHOUT indexes, which the database wins...
And the answer is: if you read record by record from disk you do lots of disk seeking, which is expensive performance-wise. A database always loads whole pages by design, so it gets a batch of records at once. Less disk seeking is definitely faster. If you did a memory-buffered read from a flat file you could achieve the same or better read performance.