Privacy laws and the Azure platform

Privacy laws are normally outside the domain of us developers, but I think the topic belongs here on SO because we developers have a responsibility to warn our employers when they ask for something that would break the law, in this case privacy law. Normally we don't have to think much about legal matters, but this is becoming a much bigger issue these days. It's too easy for employers to forget about these things, and the consequences could be very harmful for future development.
Many countries limit how companies may store privacy-sensitive data in databases: for example, social security numbers, bank account numbers, criminal records, former employees, birth dates, relatives, sexual orientation and more. Such data is subject to restrictions that differ from country to country.
The Azure platform makes this even more complex, since Azure is owned by a US company (Microsoft) and US law dictates that Microsoft must hand over data if the Feds demand it for an investigation. (This article highlights it.) This could put Azure in conflict with specific laws in certain parts of the world.
What I need to know is: which countries have restrictions such that I cannot offer customers there an Azure-based solution that processes privacy-sensitive data? (Those countries would need a non-Azure, localized solution!)
This is important because I need to display a disclaimer warning those users that they might be in violation. Users state which country they are from, so the disclaimer can be limited to those users. (Each user will maintain data for possibly hundreds of their own customers, so it's a lot of sensitive data.)

There are too many different sets of laws for you to provide, let alone keep up to date, that kind of information on your web site.
What you could do is make users aware of the problem and state that they must take the laws of their country into consideration before signing up.

Related

How do we gather and document non-functional requirements in Agile

I know that in waterfall they are gathered and documented at an early stage of the SDLC, I believe the very first stage. Therefore, they are captured and documented before development and testing even start.
But I am confused about how that is done in Agile.
If I understand correctly, user stories should be written with acceptance criteria that capture non-functional requirements. But in Agile, we pick a project, create it, and start working on it right away.
So, my guess is that someone (perhaps the product owner) goes through the user stories and collects the acceptance criteria into a formatted document, which then becomes the Non-Functional Requirements document?

First, to answer your question, I must be clear that no Agile frameworks or methodologies attempt to define everything that a team might need to do (especially Scrum) so there is nothing wrong with adding extra artifacts or practices that the team finds useful as long as they aren't contradicting a defined practice.
There are a few places I typically see non-functional requirements recorded. Here are a few of the most common ones:
Definition of Done
The definition of done contains standards for quality that should be applied across all backlog items that come through. Oftentimes this includes things like "n% unit test coverage of code", "code and configuration changes have been peer reviewed", and "all automated regression tests have been run and pass". I've sometimes seen broader non-functional requirements like "no changes cause the application load time to exceed X ms".
Architectural Design Documents
You can still have these in Agile. Rather than establishing the finished architecture at the beginning of the project, they introduce constraints that the architecture has to stay within. As the project progresses and architectural decisions are made or changed, these documents are updated to reflect that information. Examples of constraints may include "System X is considered to be the authoritative source of customer personal data" or "Details needed for payment processing should never be available to a public-facing server in order to reduce attack opportunities on that data."
Product Chartering
Depending on the project, "starting right away" is a bit fluid. On very large projects or products, it is not uncommon to take a few days (in my experience, 1 - 3 is a good number) to charter the project. This would include identifying personas, making sure business stakeholders and team members have a shared understanding of the vision, talking through some expected user experiences and problems at a high level, etc. It is very common that non-functional needs come out here, and they should be recorded either in the DoD, existing architectural documents, or in some cases, in backlog items. One good example of this happening is something called a trade-off matrix. When building a trade-off matrix, we talk about constraints on the project like performance, adaptability, feature set, budget, time, etc. We identify one as a primary constraint, two as secondary, and all others are considered tertiary. This isn't a hard-and-fast rule, but it establishes a general understanding of how trade-offs on non-functional needs will be decided in the work.
Backlog Items
Ok, last one. Not all backlog items have to be User Stories. If you have an actionable non-functional requirement (set up a server, reconfigure a firewall, team needs to convert to a new version of the IDE) there is nothing that stops you from creating a backlog item for this. It isn't a User Story, but that's ok. I will warn that most teams find a correlation between the number of items in the backlog that are User Stories and their ability to effectively deliver value and adapt to changes along the way, so don't get carried away. But I'd rather see a team put a non-User Story item in their backlog than try to pass off those things as user stories like "As a firewall, I want to be updated, so we don't get h#XX0rD" <- real backlog item I saw.
As a final note: remember that in Agile, we strive to adapt to change, so don't worry about getting the DoD or architectural document perfect the first time. It can change as you learn more.

best practice for permission implementation in a system?

I have an application that contains different kinds of permissions. Following Role-Based Security (RBC), I group users into roles and assign different permissions to the roles. The permissions are defined in this style:
public enum Permission {
    View = 1,
    Create = 2,
    Edit = 4,
    Delete = 8,
    Print = 16
}
Everything is fine in simple systems, but when the system becomes a little more complex, more specific permissions appear, such as:
View Just His Issued Invoices
View All Invoices
Edit Just His Issued Invoices
Edit All Invoices
Create Sale Invoice
Create Purchase Invoice
Create Proforma
Create Sale Report On His Own Invoices
Create Daily Sale Report
Create Monthly Sale Report
...
As you can see, many different kinds of permissions arise in the system (it can grow to about 200 different permissions). So the problems are:
I cannot put them all in one enum, because the binary pattern (1, 2, 4, 8, ...) supports at most 64 different permissions, even with a 64-bit integer (int64).
A big enum (with about 200 items) is not pleasant to work with in code.
What are your ideas in this case?
Thanks in advance :-)

I'm not sure why you feel that you need to shove all the permissions into a single flags enum (or so I'm inferring from the values). Permission requests and grants can be represented as lists rather than as a single ORed value. If you use a list approach, you are free to create whatever permission representation you like. For example, you could use a non-flags enum or even multiple enums to represent your permissions.
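
As a rough illustration of the list approach, here is a minimal sketch in Python (the question's own snippet is C#, so treat this as a language-neutral idea). The permission names are taken from the question; `ROLE_PERMISSIONS`, the role names and `has_permission` are made up for the example.

```python
from enum import Enum


class Permission(Enum):
    # Plain (non-flags) enum: members do not need power-of-two values,
    # so the 64-member ceiling of a 64-bit flags enum does not apply.
    VIEW_OWN_INVOICES = "view_own_invoices"
    VIEW_ALL_INVOICES = "view_all_invoices"
    EDIT_OWN_INVOICES = "edit_own_invoices"
    CREATE_SALE_INVOICE = "create_sale_invoice"


# Hypothetical role-to-permission mapping. Each grant is a set of enum
# members rather than a single ORed integer.
ROLE_PERMISSIONS = {
    "clerk": frozenset({Permission.VIEW_OWN_INVOICES, Permission.CREATE_SALE_INVOICE}),
    "manager": frozenset({Permission.VIEW_ALL_INVOICES, Permission.EDIT_OWN_INVOICES}),
}


def has_permission(roles, permission):
    """Return True if any of the given roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)
```

With this shape, adding the 65th or the 200th permission is just another enum member (or another enum), and nothing about the check changes.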

It sounds like you need a level of indirection...
For example, you need a category (represented by an object, say) that represents "His Issued Invoices". You need a way to grant a role any of your basic permissions on that object. You need a way to test whether something is a member of that category.
Suppose "Jane" tries to view an invoice. Then you just need to check: Does Jane have a role which has View access to some category of which this invoice is a member?
This check might be slow, since you have to check all of Jane's roles against all of the invoice's categories. But presumably you can cache the result... Or you can use a "capability based" approach, where Jane asks the security manager for a handle (pointer) to the invoice with View access. The security manager does the check and hands Jane the handle, after which she can use that handle to do whatever Viewing operations the handle supports with no additional security checks.
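
The check described above could look roughly like this in Python. It is only a sketch under assumptions: a category is modelled as a membership function, and `CATEGORIES`, `ROLE_GRANTS`, the role names and `is_allowed` are all invented for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    number: str
    issued_by: str


# A category is a named membership test: (user, invoice) -> bool.
CATEGORIES = {
    "own_issued_invoices": lambda user, invoice: invoice.issued_by == user,
    "all_invoices": lambda user, invoice: True,
}

# Hypothetical grants: role -> set of (basic permission, category name).
ROLE_GRANTS = {
    "clerk": {("view", "own_issued_invoices"), ("edit", "own_issued_invoices")},
    "auditor": {("view", "all_invoices")},
}


def is_allowed(user, roles, action, invoice):
    """Does the user hold a role with `action` on a category containing the invoice?"""
    for role in roles:
        for granted_action, category in ROLE_GRANTS.get(role, ()):
            if granted_action == action and CATEGORIES[category](user, invoice):
                return True
    return False
```

The basic permissions stay few (view, edit, ...), and "his own" versus "all" becomes a question of which category the grant points at rather than a new permission each time.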

I agree with Nicole: it seems you made what looked like a good optimization, but you are now running into issues of scale.
Many RBC systems deal with a large number of permissions, which is one reason roles exist - regular users need only know what role they are in - leave it to the developers to figure the role-permission mapping out. Larger systems might provide a GUI for superusers to do the role-permission mapping, or even create permissions, but only to provide the power user ultimate flexibility.
However, because of J2EE, at the code level it all boils down to checking 'roles' programmatically. That tends to confuse things when what you actually want to test for is the permission to perform an operation. Just keep that semantic gap in mind.
In terms of optimization, consider not the method of assignment of permissions, but when and how you perform the check. In a web application, you may only need to check when the call from the front-end comes in, and perhaps network latency will dwarf any optimizations you perform here.
If you decide you do still want to optimize, you'll probably find simply caching the permissions at login is enough. The actual search for a permission will be all in memory, so will be tiny after the initial load from the database.
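
A minimal sketch of caching at login, in Python and under assumptions: `Session` and `load_permissions_from_db` are hypothetical names, and the loader is a stand-in for whatever query your system actually runs.

```python
class Session:
    """Holds the permissions resolved once at login."""

    def __init__(self, user, load_permissions):
        self.user = user
        # One call to the loader at login; every later check is an
        # in-memory set lookup.
        self.permissions = frozenset(load_permissions(user))

    def can(self, permission):
        return permission in self.permissions


# Stand-in for a database query; it records calls so the caching is visible.
calls = []


def load_permissions_from_db(user):
    calls.append(user)
    return {"jane": ["view_all_invoices", "create_sale_invoice"]}.get(user, [])
```

One trade-off to keep in mind: a permission revoked after login is not seen until the session is rebuilt, so you may want some way to invalidate sessions.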
To avoid the combinatorial explosion of permissions, establish some strong logic up front, write it down, and make sure you're covering all your bases. If you see the need for new dynamic permissions to be created, such as when new entities are added to your system, then watch out: this is better done in a mediator or manager pattern that can check your business rules before handing out the protected entity. Here you are stepping into the realm of libraries like Drools, which serve to expose business logic from your application so that it can be updated as business requirements change.

Best Practices / Patterns for Enterprise Protection/Remediation of SSNs (Social Security Numbers)

I am interested in hearing about enterprise solutions for SSN handling. (I looked pretty hard for any pre-existing post on SO, including reviewing the terrific SO automated "Related Questions" list, and did not find anything, so hopefully this is not a repeat.)
First, I think it is important to enumerate the reasons systems/databases use SSNs. (Note: these are reasons for the de facto current state. I understand that many of them are not good reasons.)
1. Required for interaction with external entities. This is the most valid case, where external entities your system interfaces with require an SSN. This would typically be government, tax and financial.
2. SSN is used to ensure system-wide uniqueness.
3. SSN has become the default foreign key used internally within the enterprise, to perform cross-system joins.
4. SSN is used for user authentication (e.g., log-on).
The enterprise solution that seems optimal to me is to create a single SSN repository that is accessed by all applications needing to look up SSN info. This repository substitutes a globally unique, random 9-digit number (ASN) for the true SSN. I see many benefits to this approach. First of all, it is obviously highly backwards-compatible: all your systems "just" have to go through a major, synchronized, one-time data-cleansing exercise, where they replace the real SSN with the alternate ASN. Also, it is centralized, so it minimizes the scope for inspection and compliance. (Obviously, as a negative, it also creates a single point of failure.)
This approach would solve issues 2 and 3, without ever requiring lookups to get the real SSN.
For issue #1, authorized systems could provide an ASN, and be returned the real SSN. This would of course be done over secure connections, and the requesting systems would never persist the full SSN. Also, if the requesting system only needs the last 4 digits of the SSN, then that is all that would ever be passed.
Issue #4 could be handled the same way as issue #1, though obviously the best thing would be to move away from having users supply an SSN for log-on.
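
To make the idea concrete, here is a toy sketch of such a repository in Python. Everything in it is an assumption for illustration: the class name `SsnVault`, its method names, and the in-memory dictionaries, which stand in for an encrypted, access-controlled store. The sample number uses area 000, which is never issued.

```python
import secrets


class SsnVault:
    """In-memory stand-in for the central SSN repository (illustration only)."""

    def __init__(self):
        self._asn_by_ssn = {}
        self._ssn_by_asn = {}

    def tokenize(self, ssn):
        """Return the ASN for an SSN, issuing a new random 9-digit one if needed."""
        if ssn in self._asn_by_ssn:
            return self._asn_by_ssn[ssn]
        while True:
            asn = f"{secrets.randbelow(10**9):09d}"
            if asn not in self._ssn_by_asn:  # retry on the rare collision
                break
        self._asn_by_ssn[ssn] = asn
        self._ssn_by_asn[asn] = ssn
        return asn

    def reveal(self, asn, caller_authorized):
        """Return the real SSN, for authorized callers only."""
        if not caller_authorized:
            raise PermissionError("caller may not read full SSNs")
        return self._ssn_by_asn[asn]

    def last_four(self, asn):
        """Return only the last four digits, for callers that need no more."""
        return self._ssn_by_asn[asn][-4:]
```

Downstream systems would store and join on the ASN only. Note that a random 9-digit ASN can coincide with some real person's SSN, so the ASN column should never be treated or labelled as an SSN.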
There are a couple of papers on this:
UC Berkeley
Oracle Vault

I have found a trove of great information at the Securosis site/blog. In particular, this white paper does a great job of summarizing, comparing and contrasting database encryption and tokenization. It is more focused on the credit card (PCI) industry, but it is also helpful for my SSN purpose.

It should be noted that SSNs are PII, but are not private. SSNs are public information that can easily be acquired from numerous sources, even online. That said, if SSNs are the basis of your DB primary key, you have a severe security problem in your logic. If this problem exists at a large enterprise, then I would stop what you are doing and recommend a massive data migration RIGHT NOW.
As far as protection goes, SSNs are PII that is both unique and small in payload, so I would protect that form of data no differently than a password for one-time authentication. The last four digits of an SSN are frequently used for verification or non-unique identification, as they are highly unique when coupled with another data attribute and are not PII on their own. That said, the last four digits of an SSN can be replicated in your DB for open alternative use.

I have come across a company, Voltage, that supplies a product which performs "format preserving encryption" (FPE). This substitutes an arbitrary, reversibly encrypted 9-digit number for the real SSN (in the example of SSN). I am just in the early stages of looking into their technical marketing collateral...

organizing information for a software development organization

Over time our information strategy has gone all over the place, and we are looking to have a clearer policy and a more explicit way for everyone to be in sync on information sharing. Some things to note: the org is 300+ people and is spread across multiple countries. Also, we have people who are comfortable in SharePoint, people who are comfortable in Confluence, etc., so there is definitely a "change" factor here.
Here are our current issues and what we are thinking about doing about them. I would love to hear feedback, suggestions, etc.
The content we have today:
1. Technical design info / architecture docs
2. Meeting minutes, action items, etc.
3. Project plans and roadmaps
4. Organization business mgmt info - travel, budget info, headcount info, etc.
5. Project pages with business analysis, requirements, etc.
Here are some of our main issues:
Where should data go - Confluence wiki versus SharePoint versus intranet site - we use the Confluence wiki for #1, #2, #3, #5, but we also use SharePoint for #1, #3, #4, #5. We are trying to figure out if we should assign each number to a specific place to make things consistent. We are using SharePoint more as a directory structure of documents, and we are using Confluence for more ad hoc, changeable content.
Stale data - this is maybe a cultural thing with the org, but at certain points in time data just becomes stale and is no longer relevant. What is the best way to ensure old data doesn't create a lot of noise and that the latest correct data is kept up to date? Should there be people in the org responsible for this, or should it implicitly be everyone's job? This is more of an issue when people leave, join, etc.
More active usage - what is the best way to get people off email and to stop and think "could this be useful for others? Let me put it in a centralized place instead of in email chains"?
Also, I'd welcome any other stories of good ways to improve an org's communication and information management.

A fundamental root cause of information clutter is "no ownership".
People are assigned to projects. The projects end (or are cancelled), the people move on and the documents remain behind to gather "dust" and become information clutter.
This is hard to prevent. The wiki vs. sharepoint doesn't address the clutter, it just shifts the technology base that's used to accumulate clutter.
Let's look at the clutter
Technical design info / architecture docs. Old ones don't matter. There's current and there's irrelevant. Wiki.
Last year's obsolete design information is -- well -- obsolete.
Meeting minutes, action items, etc. Action items become part of someone's backlog in a development sprint, or they're probably never going to get done. Backlogs are wiki items. Everything else is history that might be interesting but usually isn't. If it didn't create a sprint backlog item, update an architecture, or solve a development problem, the meeting was probably a waste of time.
Project plans and roadmaps. The sprint backlog matters -- this is what a "plan and roadmap" aspires to be. If you have to supplement your plans with roadmaps, you probably ought to give up on the planning and just use Scrum and just keep the backlog current.
The original plan is someone's guess at project inception time, and not really very interesting to the current project team.
Organization business mgmt info - travel, budget info, headcount info, etc. This is a weird mixture of highly structured stuff (budget, organization) and unstructured stuff ("travel"?)
How much history do you need? None? Wiki at best. Financial or HR System is where it belongs. But, in big organizations, the accounting systems can be difficult and cumbersome to use, so we create secondary sources of information like a SharePoint page with out-of-date budget numbers because the real budget numbers are buried inside Oracle Financials.
Project pages with business analysis, requirements, etc. This is your backlog. Your project roadmap and your requirements and your analysis ought to be a single document. In the wiki.
History rarely matters. Someone's concept at project inception time of what the requirements are doesn't matter very much any more. What the requirements evolved to in their final form matters far more than any history. This is wiki material.
How old is 'too old'?
I've worked with customers that have 30-year old software. The software -- obviously -- is relevant because it's in production.
The documentation, however, is all junk. The software has been maintained. It's full of change control records. The "original" specifications would have to be meticulously rewritten with each change control folded in. Since the change control documents can be remarkably pervasive, the only way to see where the changes were applied is to read the source and -- from that -- reverse engineer the current-state specification.
If we can only understand a 30-year old app by reverse engineering the source, then, chuck the 30-year old pile of paper. It's useless.
As soon as maintenance is done, the "original" specification has been devalued.
How to clean it up?
If you create the wiki page or sharepoint site, you own it forever.
When you leave, your replacement owns it forever.
Each manager is 100% responsible for every piece of information their staff creates. They have to delete things. The weak solution is to "archive" stuff. Which is just a polite way of saying "delete" without the "D-word".
Cleanup must be every manager's ongoing responsibility. If they can't remember what it is, or why they own it, they should be required (or "encouraged") to delete it. Everything unaccessed in the last two years should be archived without question. Everything 10 years old is just irrelevant history.
It's painful, and it doesn't appear to be value-creating work. After all, we work in IT. Our job is to "write" software, not delete it. No one will do it unless compelled on threat of firing.
The cost of storage is relatively low. The cost of cleanup appears higher.
How to stop the email chain?
Refuse to participate. Create a "Break the Chain" campaign focused on replacing email chains with wiki updates (or sharepoint updates).
Be sure your wiki provides links and is faster to edit than an email.
You can't force people to give up a really, really convenient solution (Email). You have to make the wiki more valuable and almost as convenient as email.
Ramp up the value on the wiki. Deprecate email chains. Refuse to respond to email chains. Refuse to accept "to do" action items through email.

You can use Confluence Wiki for storing documents as attachments and have the Wiki's paths work like the file paths in SharePoint.
Re: stale data: have ownership of the data (both person and team) and ensure that deliverables for the owners include maintenance of ALL the data.
As far as "Off email", this is hard to do as you can't force people to do this short of actively monitoring all email... but you can try some deliverables with metrics regarding content added to the Wiki. That way people would be more likely to want to re-use the work already done on the email to paste into Wiki to meet the "quota" instead of composing fresh stuff.
Our company and/or team has used all 3 of these approaches with some degree of success in the past.

Is there a reason not to have the wiki hold the files?
Also, perhaps limiting the mail server to not allowing attachments on internal emails is too draconian, but asking folks to put everything in the wiki that needs to be emailed more than once is pretty darn useful.

Efficient information management is indeed a very hard problem. We found that the "simpler is better" principle can work miracles in solving it.
Where should data go - we are big believers in the wiki approach. In fact, we use Confluence for sharing possibly every type of information, except really large binary files. For those, we use Dropbox. Its simplicity is an absolute killer feature. (Tip: you can integrate them with the Dropbox in Confluence plugin.)
Finding stale data - in our definition, stale data is something that is not updated or viewed for a specific period of time. The Archiving Plugin of Confluence can quickly and automatically find these, then report them to the authors and administrators, who may potentially update them (or remove them, see next item). There is, of course, information that never expires, but the plugin is able to skip them after you mark the corresponding pages.
Removing stale data - we are fairly aggressive on this. If the data is not (highly) relevant anymore, clean it up now! We can safely follow this practice, because we never actually delete data. We just move outdated data to hidden archive spaces using, again, the Archiving Plugin. If we change our mind later, it is very easy to find it in the archive, view it or even recover it.
More active usage - our rule: if the information is required to be persistent, don't email it. Put it on a wiki page instead. The hard thing for some people is to find the best location for the information (which space? where in the page hierarchy?). Badly organized spaces with vague scope are another big efficiency divider, unfortunately. Large companies may consider introducing a wiki gardener to cure this.
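
The "stale" definition above (neither updated nor viewed for a given period) is easy to express in code. The sketch below is plain Python and is not the Archiving Plugin's API; the page dictionaries, the field names and the one-year default are assumptions for the example.

```python
from datetime import datetime, timedelta


def find_stale_pages(pages, now, max_age=timedelta(days=365), never_expires=()):
    """Return titles of pages neither updated nor viewed within max_age.

    Pages whose title is listed in never_expires are skipped, mirroring the
    idea of marking pages that hold information that never expires.
    """
    cutoff = now - max_age
    return [
        page["title"]
        for page in pages
        if page["title"] not in never_expires
        and page["last_updated"] < cutoff
        and page["last_viewed"] < cutoff
    ]
```

The resulting list would then go to the authors and administrators for review, rather than being archived blindly.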

Potential legal issues with storing Social Security/Insurance Numbers (SSNs/SINs)? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
A client using our system has requested that we store the SSNs/SINs of the end users in our database. Currently, we store minimal information about users (name, email address, and optionally, country), so I'm not overly concerned about a security breach - however, I have a suspicion there could be legal issues about storing SSNs and not taking "appropriate" measures to secure them (coming from Australia, this is my first encounter with them). Is this a valid concern?
I also read on the Wikipedia page about SINs (Canada's equivalent to SSNs) that it should ONLY be used when absolutely necessary and definitely shouldn't be used as a general identifier, or similar.
So, are there any potential legal issues about this sort of thing? Do you have any recommendations?

The baseline recommendation would be to:
Inform the user that you are storing their SSN before they use your site/application. Since the request appears to be to collect the information after the fact, users should have a way to opt out of your system before they log in or before they enter their SSN.
Issue a legal guarantee that you will not provide, sell, or otherwise distribute the above information (along with their other personal information of course)
Have them check a checkbox stating that they understand that you really are storing their SSNs
but the most important part would probably be:
Hire a lawyer well-versed with legal matters over the web

Funny thing about SSNs... the law that created them, also clearly defined what they may be used for (basically tax records, retirement benefits, etc.) and what they are not allowed to be used for - everything else.
So the fact that the bank requires your SSN to open a checking account, your ISP asks for it for high speed internet access, airlines demand it before allowing you on a plane, your local grocery/pub keeps a tab stored by your SSN - that is all illegal. Shocking, isn't it...
All the hooha around identity theft, and how easy it is thanks to a single, unprotected "secret" that "uniquely" identifies you across the board (not to mention that it's sometimes used as authentication) - should never have been made possible.

Some good warnings have already been stated here.
I'll just add that, speaking of SIN (Canada's Social Insurance Number) codes, I believe it's possible to have collisions between a SIN and an SSN (in other words, the same number, but two different people/countries). It shouldn't be a surprise, since these are separate codification systems, but I can imagine someone doing data entry being inclined to stick a SIN into an SSN field and vice versa (think international students in college/university as one instance - I was told by a DBA friend that he saw this happen).
A given information system may be designed to not allow duplicates, and either way, you can see why there might be confusion and data integrity issues (using a SSN column as a unique key? Hmm).
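
One way to sidestep the collision, if such numbers must be stored at all, is to key on the numbering scheme together with the number instead of the number alone. A minimal Python sketch, with `PersonRegistry`, `DuplicateIdError` and the scheme labels invented for the example:

```python
class DuplicateIdError(Exception):
    """Raised when the same (scheme, number) pair is registered twice."""


class PersonRegistry:
    def __init__(self):
        self._name_by_id = {}

    def add(self, scheme, number, name):
        # The key is the pair (scheme, number), so an SSN and a SIN with the
        # same nine digits are treated as two different identifiers.
        key = (scheme, number)
        if key in self._name_by_id:
            raise DuplicateIdError(f"{scheme} {number} is already registered")
        self._name_by_id[key] = name

    def lookup(self, scheme, number):
        return self._name_by_id[(scheme, number)]
```

This does not stop someone from typing a SIN into a field labelled SSN, but it does mean the form has to ask which kind of number is being entered.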

Way too many organizations in the USA use SSNs as unique identifiers for people, despite the well-documented problems with them. Unless your application actually has something to do with government benefits, there's no good reason for you to store SSNs.
Given that so many organizations (mis)use them to identify people for things like credit checks, you really need to be careful with them. With nothing more than someone's name, address, and SSN, it's pretty easy to get credit under their name, and steal their identity.
The legal issues are along the lines of getting sued into oblivion for any leak of personal information that contains SSNs.

If it were me, I'd avoid them like the plague, or figure out some very, very secure way to store them. Additionally (I'm not a legal expert by any means, but...), see if you can put in writing somewhere that you are in no way responsible if any of this gets out.

At a minimum, you want to be sure that SSNs are never emailed without some protection. I think the built-in "password to open" in Excel is enough, legally. I think email is the weakest link, at least in my industry.
Every now and then, there is a news item "Laptop Stolen: Thousands of SSNs Possibly Compromised." It's my great fear that it could be my laptop. I put all SSN containing files in a PGP-protected virtual drive.
You do have good security on your database, don't you? If not, why not?
