Optimizations in security systems

I would like to know what we mean by an "optimized" security system (physical or logical security system).
Does it mean something like a system that can monitor the performance of services, SQL, DB maintenance, logs, etc.?
Thanks

Optimized is a general term; you will have to get specific in defining what you need in order to consider it optimized to an "acceptable" level. Plus, there are different kinds of "optimization", such as for speed, memory usage, maintainability, etc.
Are you trying to figure out some criteria so that you can market your product as "optimized" and be able to explain it if someone asks what you mean?
If so, you need to figure out what your customers (or potential customers) actually care about. If they care about video resolution and disk space usage (how much the system can store before having to archive elsewhere), then you need to make your application smart (optimized! :) in those areas.
THEN, you could be more specific in your marketing and say, "optimized to use XYZ resolution and store up to 2 weeks of video on a standard hardware setup!" - which would actually mean something tangible to your customers, and show them that you care about what they care about.

How to export specific price and volume data from the LMAX level 2 widget to Excel

Background -
I am not a programmer.
I do trade spot forex on an intraday basis.
I am willing to learn programming
Specific Query -
I would like to know how to export into Excel in real time 'top of book' price and volume data as displayed on the LMAX level 2 widget/frame on -
https://s3-eu-west-1.amazonaws.com/lmax-widget/website-widget-quote-prof-flex.html?a=rTWcS34L5WRQkHtC
In essence I am looking to export:
1) price and volume data where the coloured flashes occur;
2) price and volume data where the coloured flashes do not occur.
I understand that 1) and 2) will encompass all the top-of-book prices and volumes. However, I would like to keep 1) and 2) separate/distinguished as far as data collection is concerned.
Time period for which the collected data is intended to be stored: 2-3 hours.
What kind of languages do I need to know to do the above?
I understand that I need to be an advanced excel user too.
Long term goals -
I intend to use the above information to make discretionary intraday trading decisions.
In the long run I will get more involved with creating an algo or indicator to help with the decision making process, which would include the information above.
I understand that one needs to know how to code to get involved in activities such as the above. Hence I have started learning C++, mostly to get a feel for coding.
I have been searching all over the web as to where to start in this endeavor. However I am quite confused and overwhelmed with all the information.
Hence apart from the specific data export query, any additional guidelines would also be helpful.
As of now I use MT4 to trade. Hence I believe that to do the above I will need more than just MT4.
Any help would be highly appreciated.
Yes, MetaTrader4 is still not able (in spite of the marketing and PR efforts around all the white-labelled terminals' OrderBook add-ons) to provide OrderBook-L2/DoM data to your MQL4 / NewMQL4 algorithm for any decision making. Third-party software integration is needed to make MQL4 code aware of the real-time L2/DoM data.
The LMAX widget has an impressive look & feel; however, for your Excel export it would take a lot of programming effort to re-use it as an automated scanner producing data for 1) and 2), and there may be further, non-technical trouble in the form of legal/operational restrictions on operating an automated scanner on such a data source. To give an example, one data publisher's policy restricts automated options-pricing scanners for options on { FTSE | CAC | AMS | DAX }: they may re-visit the published data sources no more than once per quarter of an hour, and get blocked/black-listed otherwise. So care and proper data-source engineering are in order.
The size of the data collection is another issue. Excel has limits on the number of rows/columns that can be imported (about one million rows per worksheet in recent versions), and the larger the data files, the sooner CSV imports will hit those limits. L2/DoM data collected for 2-3 hours for just one single FX Major may go beyond such a limit, as there are many records per second (tens, if not hundreds, with just a few milliseconds between them); at 100 records per second, 3 hours is already about 1,080,000 rows. Files of that size typically take several minutes just to get written to disk, so a proper distributed-processing data-flow design and non-blocking file-I/O engineering are a must.
Real-time system design is the right angle from which to view the problem, rather than as just a programming-language exercise. Having mastered some programming language is a great move; nevertheless, robust real-time system design - and trading software is such a domain - requires, with all respect, a lot more insight and hands-on experience than making MQL4 code run multi-threaded and multi-processed with a few DLL services for a cloud/grid-based distributed processing system.
How much real-time traffic is expected to be there?
For just a raw idea of what the market can produce per second, per millisecond, per microsecond, consider a NYNEX traffic analysis for one instrument: a single second can show a wild relief of ticks, and re-sampling the same second at 5 msec reveals even finer structure. [Charts omitted.]
How to export
Check whether the data-source owner legally permits your automated processing.
Create your own real-time DataPump software, independent of the HTML-wrapped widget (a minimal sketch follows these steps).
Create your own DB-store to efficiently off-load scanned data records from the real-time DataPump.
Test the live data-source >> DataPump >> DB-store chain for performance and robustness, i.e. being able to serve error-free on a 24/6 duty for several FX Majors in parallel.
Integrate your DataPump-fed DB-store as a local data source for on-line/off-line interactions with your preferred { MT4 | Excel | quantitative-analytics } package.
Integrate monitoring of any production-environment irregularity in your real-time processing pipeline, which may range from network issues, VPN/hosting issues and data-source availability issues to an unexpected change in the scanned data source's format or access conditions.
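To make the DataPump and DB-store steps concrete, here is a minimal, hedged sketch in Python. Everything in it is an assumption for illustration: fetch_top_of_book() is a placeholder for whatever feed you are legally allowed to poll, SQLite stands in for the DB-store, and a bounded queue decouples the pump from disk I/O so file writes never block scanning.

```python
# Minimal DataPump -> DB-store sketch (all names are placeholders).
import queue
import sqlite3
import threading
import time

def fetch_top_of_book():
    """Placeholder: replace with your real, permitted data-source access."""
    return {"ts": time.time(), "bid": 1.0, "bid_qty": 100, "ask": 1.0001, "ask_qty": 120}

records = queue.Queue(maxsize=10_000)          # decouples the pump from disk I/O

def pump(stop_event, interval_s=0.25):
    """DataPump: poll the source and hand records to the writer without blocking."""
    while not stop_event.is_set():
        try:
            records.put_nowait(fetch_top_of_book())
        except queue.Full:
            pass                               # drop rather than stall the pump
        time.sleep(interval_s)

def writer(stop_event, db_path="l2_store.sqlite"):
    """DB-store: batch-insert queued records so file I/O never blocks the pump."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS top_of_book"
                " (ts REAL, bid REAL, bid_qty INTEGER, ask REAL, ask_qty INTEGER)")
    while not (stop_event.is_set() and records.empty()):
        batch = []
        try:
            batch.append(records.get(timeout=1.0))
            while len(batch) < 500:
                batch.append(records.get_nowait())
        except queue.Empty:
            pass
        if batch:
            con.executemany("INSERT INTO top_of_book VALUES (?,?,?,?,?)",
                            [(r["ts"], r["bid"], r["bid_qty"], r["ask"], r["ask_qty"])
                             for r in batch])
            con.commit()
    con.close()

if __name__ == "__main__":
    stop = threading.Event()
    threads = [threading.Thread(target=pump, args=(stop,)),
               threading.Thread(target=writer, args=(stop,))]
    for t in threads:
        t.start()
    time.sleep(10)                             # run for 10 seconds in this demo
    stop.set()
    for t in threads:
        t.join()
```

From a store like this, exporting a 2-3 hour window to CSV for Excel (or feeding MT4 / an analytics package) becomes a simple query rather than a real-time problem.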

How much risk is exposing all the sources to a third party? [closed]

I've been arguing with a co-worker about how necessary it is to wipe or destroy the hard disks that were used for storing the sources and are replaced with bigger ones or discarded.
His point is that no piece of source code exposed to a third party gives that party any competitive advantage. My point is that it only takes ten minutes to set up a wiping program and start it before leaving, and in the morning you have a disk that contains no data that could possibly be recovered - it doesn't hurt and it completely removes the risk.
Now really how risky is it to throw away a hard drive containing a working copy of a repository of a commercial product having 10 million lines of source code?
The Drake Equation states that
N = R * d * p * e * c * x * y * z
where
N is the probability that doing this will result in the bankruptcy of your company, leaving you and all your co-workers unemployed and starving.
R is the number of hard drives discarded every year without first being erased
d is the fraction of those hard drives that are fished out of dumpsters
p is the fraction of recovered drives that are ever plugged in and fired up
e is the number of such drives that are subsequently listed on eBay because their contents look interesting
c is the number of competitors you have who browse eBay looking for trade secrets
x is the probability that your discarded drive contains something they can use
y is the probability that they do actually use that information
z is the probability that their use of such information ruins your company.
To estimate the risk that someone will work out it was you and sue/prosecute you for the damage you caused, calculate
(N / t) * m
where t is the number of people on your team, and m is the number of managers who are paying enough attention to work out who did what.
If you can prove that any of the coefficients involved is zero, then your strategy is risk-free. Otherwise there's a very small chance you'll bankrupt your company, starve your colleagues and end up in jail.
I also would not worry that much about leaking source code - the source alone is of limited value without the technical and domain knowledge required to use it. If you just want to copy stuff, you'll pirate the binary. Still, it's probably better to keep it private if you don't want to release it.
I'd be more concerned about private data on the drives. Private or confidential business email, test data with confidential information (think employee database or similar). That might cause you/your company lawsuits from affected parties.
So definitely wipe the disks. Even just checking that there's nothing sensitive on the disks is more work than just wiping them.
Without wiping it, it's very risky. If you sell the disk on eBay, you'll find that most buyers will run recovery software on it.
In order to sell it safely, it's enough to overwrite the whole disk once. The myth that it's possible to recover data after it has been overwritten is really a myth. Not even the NSA could do it.
If you don't have a special disk-wiping tool, either use a scripting language to write a single big file onto the disk until it's full, or format the disk with the "quick format" option unchecked. On Linux/Unix, use dd if=/dev/zero of=/dev/xxx and be really, really sure that the device given is the correct one.
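If you go the "write one big file until the disk is full" route, here is a rough sketch. It assumes the old disk is freshly formatted and mounted at a hypothetical /mnt/old_disk; note that it only overwrites free space, so dd or shred on the raw device remains the safer choice.

```python
# Rough sketch of the "fill the disk with one big file" approach mentioned above.
# Assumes the disk to scrub is mounted at MOUNT_POINT and writable; this only
# overwrites free space, so prefer dd/shred on the raw device when possible.
import os

MOUNT_POINT = "/mnt/old_disk"      # hypothetical mount point of the disk to scrub
CHUNK = b"\0" * (4 * 1024 * 1024)  # write zeros in 4 MiB chunks

path = os.path.join(MOUNT_POINT, "filler.bin")
try:
    with open(path, "wb") as f:
        while True:
            f.write(CHUNK)
except OSError:                     # "No space left on device" ends the loop
    pass
```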
It all boils down to the unintended release of somebody's intellectual property with an associated value.
Who owns the Intellectual Property?
If it belongs to your company then the Board should be very annoyed to see an asset being released - it complicates corporate actions (Q. Is there any chance that other parties have access to this technology?)
SO WIPE THE DISK
If it belongs to a third party (perhaps work done by your company for them) then they'll be pissed (Q. Can we have our money back please?)
SO WIPE THE DISK
Aren't there any corporate IT standards in your organisation? Are you likely to get asked difficult questions?
SO WIPE THE DISK
I wouldn't take such a big risk. I'd do it on every hard disk I want to give/sell:
Boot up a Linux system from a Live CD/USB and run:
shred /dev/xxx
I'd say it depends on the source code. I'm sure Google wouldn't risk throwing away a hard drive that contains their complete search algorithm. That would sell on eBay. On the other hand, if yours is 'just another application' for some insurance company which won't interest any living soul except you and the company itself, then why bother?
Then again, if you're really concerned, just grab a big hammer and smash your hard drive to smithereens.
While the source code may be of limited value to a third party, there is always a bit more in the source code than just pure statements: there may be comments describing algorithms or trade secrets, names/email addresses of programmers or customers, or code describing some encryption scheme or copy protection. If somebody is knowledgeable and has patience, they can learn a lot from the code.
The bottom line: better be safe than sorry.
It is a question of how to treat information. Letting any proprietary information - source code, documents, or any other material that could fall into the hands of a competitor, or anyone else who might misuse it or take advantage of it - out of your control is simply to be avoided.
Unless it requires a huge investment, big chunks of your time, or generally causes hassle for you, there is no reason not to clean up before you throw it out of your sight.
It makes me nervous that IP and information security (even if nobody strictly classifies it as such) is still a matter of debate in most of our workplaces.
Any malicious, competent person with your source code has a much better chance of finding security holes and exploits in your system, even ones that you might not be aware of.
It may not compile, but it's a tour-guide into your world. Wipe it.

Eventual Consistency

I am in the early stages of design of an application that has to be highly available and scalable. I want to use an eventual consistency data model for this for a number of reasons. I know and understand why this is an unpopular architectural choice for many solutions, but it's important in my case.
I am looking for real-world advice, best-practices and gotchas to look out for when dealing with distributed / document-style databases. And particularly areas around e-commerce (shopping cart style) apps that traditionally are easier to put together with a relational db.
I understand using these types of DBs is challenging, but hey, Google and eBay use them so they can't be that hard ;-) Any advice would be appreciated.
If you want to have a distributed system (that "Eventual Consistency" thing) you need people to build, maintain, and operate it.
I have found that there are three classes of people who have very few problems with "Eventual Consistency":
People with a solid background in distributed systems. They have learned about Eventual Consistency, Byzantine failures and the like. If you understand that Paxos is not about holidays, you are probably one of them.
People experienced in network programming. They might miss the theoretical background but have an intuitive understanding of asynchrony and the "no global clocks & counters" paradigm. If you own at least 8 books by Richard Stevens, you are probably one of them.
Very experienced coders who have had little exposure to RDBMSs. Kernel guys, people from scientific computing and the gaming industry come to mind.
All in all, these people are much sought after on the job market. For example, 75% or so of the academics in distributed systems leave for institutions that run big, self-designed distributed systems, e.g. the stock exchanges.
The whole thing has become somewhat simpler with offerings like Hadoop, SimpleDB and CouchDB, but it is still a big challenge to build something on distributed systems technology.
On the other hand, RDBMSs are a very fine piece of engineering. They are well understood, and expertise in them is available on the job market. There are a lot of decent tools, education opportunities, and plenty of highly skilled experts available to be rented by the hour. So think twice about whether you can't get by with an RDBMS approach - perhaps coupled with some clever cheating. I usually point students to the LiveJournal architecture.
For Distributed Databases there is much less experience. That's exactly the reason you have found so little advice so far.
If you are determined to use "Eventual Consistency", I think that besides immature tools the main challenge is the mindset of everyone involved. Are your API users (coders) and application users (your employees and your customers) willing and able to accept the inconsistency? Can you hide it from certain classes of users? We are not used to the mindset that computers are inconsistent. Something is in stock or it isn't. "Maybe" isn't an answer users expect.
Also keep in mind that "eventual" can mean a very long time to algorithm designers. For how long can you accept inconsistency?
For a shopping cart application you might want to go truly distributed: use the client's browser as the data store. On checkout you can submit the cart to a server-side batch processing system (a rough sketch follows below). This means that for the catalog you only need read-only high availability (easier), and the cart submission is a very narrow interface with no need for transactions. Later on, the processing of the order has no (soft) real-time requirements and thus is easier.
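A minimal sketch of that narrow, transaction-free submission interface, using only the Python standard library. The /checkout path and the JSON-lines log file are assumptions for illustration; a real deployment would sit behind proper authentication and a message queue.

```python
# Minimal sketch of a narrow cart-submission endpoint: the browser holds the
# cart and POSTs it as JSON; submissions are only appended to a log, which a
# batch processor reconciles later (no transactions on the hot path).
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SUBMISSION_LOG = "cart_submissions.jsonl"   # consumed later by a batch job

class CheckoutHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/checkout":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            cart = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        with open(SUBMISSION_LOG, "a", encoding="utf-8") as log:
            log.write(json.dumps(cart) + "\n")  # append-only, no transactions
        self.send_response(202)                 # accepted for eventual processing
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), CheckoutHandler).serve_forever()
```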
BTW: Last time I checked on the eBay architecture, they were big on RDBMSs, but it may have changed since then. (Edit: it did change - see comments)
The only solution to your problem is to decide which tradeoffs in the CAP theorem are right for you, then begin implementing it.
mdorseif has a great point. There are many configurations of how far you trade off consistency, availability, and partitioning. You have two main options:
Go the route of an in-house distributed system (takes lots of expertise and research)
Vet and experiment with a number of distributed databases to decide which can handle your requirements at scale.
This is probably an over-simplification. A real production-ready pipeline is an eco-system. It'll at least get you on the right track.
Appnexus is an ad platform that uses hbase for very high availability and eventual consistency. They talk a lot about this here.
An article on http://highscaleability.com outlines how the New York Times implemented RabbitMQ alongside Cassandra across a WAN for fault tolerance and high availability.
MongoDB provides a great deal of flexibility in balancing consistency with availability through its implementation of write concerns (sketched below). There is excellent documentation that highlights exactly how to use them, with all the gotchas (including partitioning). It implements a two-phase commit to maintain state across the network (on its config servers).
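A hedged sketch of what those per-collection write-concern choices look like with PyMongo. The URI, database and collection names are assumptions; w="majority" plus journaling trades latency for durability, while w=1 favours availability.

```python
# Sketch of tuning MongoDB write concerns per collection (assumes a local
# replica set; adjust the URI for your deployment).
from pymongo import MongoClient, ReadPreference
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client.shop

# Orders: wait for a majority of nodes and the journal -- slower, safer.
orders = db.get_collection(
    "orders", write_concern=WriteConcern(w="majority", j=True, wtimeout=5000))

# Page-view counters: fire-and-forget style, availability over durability.
views = db.get_collection("page_views", write_concern=WriteConcern(w=1))

# Catalog reads can tolerate slightly stale data from a secondary.
catalog = db.get_collection(
    "catalog", read_preference=ReadPreference.SECONDARY_PREFERRED)

orders.insert_one({"cart_id": "abc123", "total": 42.50})
views.insert_one({"path": "/product/42"})
print(catalog.find_one({"sku": 42}))
```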
Google has a great paper on this subject; their Photon project implements a highly scalable, highly reliable system with the Paxos algorithm at its heart alongside a few other techniques. It also happens to be very consistent (with an end-to-end latency of about 10 s) and fault tolerant, standing up to regional failures.
All systems built on distributed computing models are built on CAP and BASE. The main concern here is that if our system provides availability and partition tolerance, we cannot have true consistency, but we can have eventual consistency.
The idea behind eventual consistency is that each node is always available to serve requests. As a trade-off, data modifications are propagated in the background to other nodes. This means that at any time the system may be inconsistent, but the data is still largely accurate.
Source: http://www.techspritz.com/eventual-consistency-and-base-model/
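As a toy illustration of that propagation model (a deliberate over-simplification, not any particular product's protocol), two replicas can each accept writes locally and converge via last-write-wins timestamps when a background sync runs:

```python
# Toy last-write-wins replicas: each accepts writes locally and converges when
# anti-entropy sync runs in the background. Real systems add vector clocks,
# hinted handoff, read repair, etc.
import time

class Replica:
    def __init__(self):
        self.store = {}                     # key -> (timestamp, value)

    def put(self, key, value):
        self.store[key] = (time.time(), value)

    def get(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        """Anti-entropy: keep whichever write has the newer timestamp."""
        for key, (ts, value) in other.store.items():
            if key not in self.store or self.store[key][0] < ts:
                self.store[key] = (ts, value)

a, b = Replica(), Replica()
a.put("stock:sku42", 7)                     # write lands on replica A
print(b.get("stock:sku42"))                 # None -- B is temporarily inconsistent
b.sync_from(a)                              # background propagation
print(b.get("stock:sku42"))                 # 7 -- eventually consistent
```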
How to achieve high availability and scalability using relational databases is well known and there is a vast body of knowledge out there on how to do this!
Google is a special case that does not apply to most sites: very, very high query volumes, very, very large amounts of data, and, most importantly, no service level agreements with most of its users. There is no correct answer to a web search, only better answers; for the average user Google is good enough, and if Google misses a vital page from a search list, you as a user cannot complain.
eBay is a rather different case: somehow they have persuaded their users and customers to accept poor service in exchange for theoretically lower prices - good for them, but this is not an option for every business.

Production, Test, Developer Environments vs Security

What are current practices for enabling developers to build systems that contain private data? Can anyone point to a "best practices" guide for that sort of thing?
We have a Catch-22 here in that developers need to write applications that go against systems that have data that is considered "private." The IT administration would like for us developers to not have access to the data (ie. provide a schema or data structure, but not data itself) whereas most developers (myself included) would like to have access to the production data since not having a representative dataset can lead to bad assumptions (eg. the format of data) and bugs later on.
Does anyone have any formalized "best practices" for this type of thing? Especially official guidelines from some "BigCo" (e.g. Microsoft, IBM) would help, since they are needed to convince management.
My view of the world may be different, as I'm based in the UK, but for the past 20-odd years, I've worked primarily in the public sector on systems handling sensitive data.
The rules are **completely** cut-and-dried. No production data is allowed on the development estate.
As a fundamental principle, we do not want to be responsible for the loss of sensitive data. The users are perfectly good at that, themselves.
Within the past 12 months, my wife has moved from the same regime to one in the private sector where they allow developers access to production data and she's horrified by it. The legal implications (in the UK, at least) can be severe.
Developers don't **need** access to production data. It's simply laziness. Define and create test data to exercise defined test cases (including edge cases) and don't rely on the random-esque nature of production data.
If you **must** use production data (i.e. you manage to convince someone who doesn't know any better that it's acceptable), ensure the data is anonymised **before** it reaches the development estate.
Often times, a subset of sanitized data will be provided that is representative of the private data, but not the private data itself.
At my company, we started using Red Gate's data generator to generate test data. There is a bit of setup, but you can use the tool to generate very usable test data. Yes, I would prefer to use live production data, but it's not feasible (especially if you need to take HIPAA into consideration). It uses a regex for each column and allows you to use lookup tables for related tables.
At MediumCo, we strip proprietary data out of our production data in Test and Dev. It has hurt us a little in the past to not have exactly-representative data, but the clients have asked about this point before, and it's usually not an issue, as the environments are populated with a lot of fake proprietary data.
I don't have any best practices paper or anything. But I would think that if you're developing out of an environment that is as protected as the environment that hosts the data in production, there wouldn't be a lot of argument to be made against it.
That is, if your production database is in a datacenter hosted and controlled and secured by your IT staff, if you have a development database that lives in the exact same scenario and doesn't offer any new ways to access the information - you would be in pretty good shape. As an added token of good will - it might be nice to offer to allow anyone worried about security a chance to do some kind of penetration test to ensure that you're telling the truth about security.
The other side of this, of course, is the analysis of the cost for not using the data: that is, it will lead to buggier code, which will cost $xxxxxx.xx in development time vs. virtually no cost to allow a small subset of your development team access to said data.
To avoid the need to manually sanitise/anonymise data, you could use random text replacement - to replace every alphanumeric character in each text field with a random alphanumeric. This:
keeps the data similar in length, size etc. from the developer's point of view
does not cause problems with character sets
leaves date and number fields untouched, which allows for accurate testing with respect to date ranges and quantities
will satisfy most privacy requirements
If you wanted to go a little further you could run random number-for-number replacement on telephone numbers and zip codes, while using alphanumeric replacement on other text fields (a minimal sketch of this replacement rule follows below).
Having an automated replacement script allows you to get up-to-date data dumps from the live system regularly, so your tests are up-to-date with respect to the size and variability of the data in practice.
It does mean that a small number of operations will not be realistic (e.g. indexing on name fields, which in real life are clustered around common letters) but these should be limited.
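A minimal sketch of that replacement rule. The field names are made up for illustration: letters map to random letters of the same case, digits to random digits, and everything else is left alone so lengths, separators and formats survive, while non-text fields (dates, numbers) pass through untouched.

```python
# Minimal sketch of the random-replacement anonymisation described above:
# letters -> random letters, digits in text fields -> random digits,
# everything else (punctuation, spacing, field lengths) left intact.
import random
import string

def anonymise_text(value: str) -> str:
    out = []
    for ch in value:
        if ch.isalpha():
            out.append(random.choice(string.ascii_uppercase if ch.isupper()
                                     else string.ascii_lowercase))
        elif ch.isdigit():
            out.append(random.choice(string.digits))
        else:
            out.append(ch)                  # keep separators, spaces, etc.
    return "".join(out)

def anonymise_row(row: dict, text_fields: set) -> dict:
    """Scrub only the designated text fields; dates/numbers stay untouched."""
    return {k: anonymise_text(v) if k in text_fields else v
            for k, v in row.items()}

row = {"name": "Jane O'Connor", "phone": "020 7946 0018",
       "joined": "2008-11-03", "balance": 1042.17}
print(anonymise_row(row, text_fields={"name", "phone"}))
```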

How do you measure if an interface change improved or reduced usability?

For an ecommerce website how do you measure if a change to your site actually improved usability? What kind of measurements should you gather and how would you set up a framework for making this testing part of development?
Multivariate testing and reporting is a great way to actually measure these kinds of things.
It allows you to test what combination of page elements has the greatest conversion rate, providing continual improvement on your site design and usability.
Google Web Optimiser has support for this.
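As a rough illustration of what the underlying measurement boils down to, conversion rates from two page variants can be compared with a two-proportion z-test. All figures below are made up; this is just a sketch of the arithmetic, not a full testing framework.

```python
# Back-of-the-envelope check that a change in conversion rate between two
# page variants is not just noise (two-proportion z-test; figures are made up).
from math import sqrt
from statistics import NormalDist

def conversion_significance(conv_a, visits_a, conv_b, visits_b):
    p_a, p_b = conv_a / visits_a, conv_b / visits_b
    pooled = (conv_a + conv_b) / (visits_a + visits_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided
    return p_a, p_b, z, p_value

# Old design: 120 purchases out of 4,800 visits; new design: 165 of 5,100.
print(conversion_significance(120, 4800, 165, 5100))
```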
The same methods you used to identify the usability problems to begin with - usability testing. Typically you identify your use cases and then run a lab study evaluating how users go about accomplishing certain goals. Lab testing is typically good with 8-10 people.
A more data-driven methodology we have adopted to understand our users is anonymous data collection (you may need user permission - make your privacy policies clear, etc.). This simply means evaluating which buttons/navigation menus users click on, how users delete something (i.e. when changing quantity, are more users entering 0 and updating the quantity, or hitting X?), and so on. This is a bit more complex to set up; you have to develop an infrastructure to hold this data (which is really just counters, i.e. "Times clicked x: 138838383, Times entered 0: 390393") and allow data points to be created as needed to plug into the design.
To push the measurement of an improvement of a UI change up the stream from end-user (where the data gathering could take a while) to design or implementation, some simple heuristics can be used:
Is the number of actions it takes to perform a scenario less? (If yes, then it has improved). Measurement: # of steps reduced / added.
Does the change reduce the number of kinds of input devices to use (even if # of steps is the same)? By this, I mean if you take something that relied on both the mouse and keyboard and changed it to rely only on the mouse or only on the keyboard, then you have improved useability. Measurement: Change in # of devices used.
Does the change make different parts of the website consistent? E.g. if one part of the e-commerce site loses changes made while you are not logged on and another part does not, that is inconsistent. Changing it so that they have the same behaviour improves usability (preferably toward the more fault-tolerant behaviour, please!). Measurement: make a graph (really a flow chart) mapping the ways a particular action can be done. Improvement is a reduction in the number of edges on the graph.
And so on... find some general UI tips, figure out some metrics like the above, and you can approximate usability improvement.
Once you have these design approximations of user improvement, and then gather longer term data, you can see if there is any predictive ability for the design-level usability improvements to the end-user reaction (like: Over the last 10 projects, we've seen an average of 1% quicker scenarios for each action removed, with a range of 0.25% and standard dev of 0.32%).
The first way can be fully subjective or partly quantified: user complaints and positive feedback. The problem with this is that you may have strong biases when it comes to filtering that feedback, so you had better make it as quantitative as possible. Having a ticketing system to file every report from the users and gathering statistics about each version of the interface might be useful. Just get your statistics right.
The second way is to measure the difference in a questionnaire taken about the interface by end-users. Answers to each question should be a set of discrete values and then again you can gather statistics for each version of the interface.
The latter way may be much harder to set up (designing a questionnaire, and possibly the controlled environment for it, as well as the guidelines to interpret the results, is a craft by itself), but the former makes it unpleasantly easy to mess up the measurements. For example, you have to consider the fact that the number of tickets you get for each version depends on how long that version has been in use, and that all time ranges are not equal (e.g. a whole class of critical issues may never be discovered before the third or fourth week of usage, or users might tend not to file tickets during the first days of use, even if they find issues, etc.).
Torial stole my answer. Still, if there is a measure of how long it takes to do a certain task, and the time is reduced while the task is still completed, then that's a good thing.
Also, if there is a way to record the number of cancels, then that would work too.
