How to cache XSDs to offload W3C servers?

I'm building a web service based on SAML-P and XACML, which requires a large number of XSDs to be considered by JAXB/XJC for each build. This takes forever and is exceedingly unreliable, I think because the W3C throttles XSD requests to ease the load on their servers (based on their blog posting).
Worse yet, some of the W3C XSDs contain obvious typos, so these must be downloaded and patched, and the schemaLocation of referring files edited to load the local copies. One of the primary SAML-P schemas has this problem (a double >> and wildly incorrect import addresses).
I think there is a way to make Eclipse (or JAXB, or something else; not sure what would solve this; maybe Xerces?) maintain a cache of XSDs and substitute these for the http:// refs in my build (perhaps even system-wide). But I've not managed to track this down to a workable recipe. Can someone help? Thanks!

You can use a CatalogResolver for this:
http://jaxb.java.net/guide/Fixing_broken_references_in_schema.html
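For example, a small OASIS XML catalog can rewrite the remote schema URLs to local, patched copies, and xjc can be pointed at it with its -catalog option. A minimal sketch (the file name and local directories are placeholders to adapt):

    <!-- catalog.xml: serve remote schemaLocations from local, patched copies -->
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
      <!-- any schemaLocation starting with these prefixes is resolved locally -->
      <rewriteSystem systemIdStartString="http://www.w3.org/"
                     rewritePrefix="schemas/w3c/"/>
      <rewriteSystem systemIdStartString="http://docs.oasis-open.org/"
                     rewritePrefix="schemas/oasis/"/>
    </catalog>

Then run the binding compiler against the catalog, e.g. xjc -catalog catalog.xml -d src-gen your-schema.xsd, so the build never has to touch the W3C servers at all.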

Related

Terraform providers vulnerability detection

Using a lot of (official and non-official) Terraform providers, I'm looking for a tool to perform security analysis on Terraform providers before executing terraform plan/apply commands (and so executing provider code). I want to prevent malicious code from providers from being executed blindly.
I'm basically executing the terraform providers mirror command to save local copies of the required providers, and I'm wondering if I can security-scan that result.
I tested kics, checkov, and tfsec, but they all look for security issues in my static Terraform code, not in the providers.
Do you have any good advice regarding this topic?
This is actually quite a good question. There are many other problems that can be reduced to the same generic question: how do you make sure that the thing you downloaded from the internet does not do anything malicious to you, e.g.:
How to make sure that a minecraft plugin does not hack you?
How to make sure that a spring boot dependency does not hack you?
How to make sure that a library xxx you attach to your project does not do harm to you?
Should you use docker image yyy in your project?
Truth is: everything you use has the potential to explode right in your face (or more correctly: right in the face of the system owner). That's why the system owner (usually a company) defines a set of rules about what is allowed and what is not. Not aware of any such rules? Below is a set of rules we came up with ourselves when thinking about on-boarding a new library for some projects to use:
Do not take random stuff from GitHub. Take only products with a longer history, a small bug backlog, little to no past issues in the CVE list, and active maintenance.
Do static code analysis yourself. Sometimes tools that work at the binary level can do that for you; sometimes you can only do it at the source level. In the case of Java libraries, check what tools like Dependency Track think about the library and version you are about to use.
Run the code and see how it works: what does it write, what does it read, what URLs does it communicate with (do a TCP dump if necessary).
Document everything you have done somewhere.
This gives you no 100% confidence that things will not go terribly wrong, but it is a systematic approach that will reduce the risk of doing something stupid.
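As one concrete starting point for the terraform providers mirror output: a small script can record a hash of every file in the mirror, so you can diff results between runs and compare them against the checksums published by the registry while doing the manual review described above. A minimal sketch in Python (the mirror directory path is an assumption):

    import hashlib
    import os

    MIRROR_DIR = "./terraform-mirror"  # assumed output dir of `terraform providers mirror`

    def hash_tree(root):
        """Walk the provider mirror and return {relative_path: sha256} for every file."""
        digests = {}
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                digests[os.path.relpath(path, root)] = h.hexdigest()
        return digests

    if __name__ == "__main__":
        for rel_path, digest in sorted(hash_tree(MIRROR_DIR).items()):
            print(digest, rel_path)

This does not detect malicious behaviour by itself, but it documents exactly which artifacts were reviewed and makes unexpected changes between runs visible.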

Maintaining a service required by two different apps

I have two node apps running on my server, each performing different tasks.
However, I now need to create a service that is going to be used by both of them. Obviously I don't want to create it in both of the apps and end up with two copies of the code to maintain.
My current thought is to have a separate repository only for this service, then require it from each app as an outsourced module.
I was wondering whether there are better methods, or whether this method might run into problems I'm not seeing.
Well, if you strictly follow the rule that only genuinely shared things go into the common package, I don't see any issues with that. The problem comes when you put logic into the shared repo that is really only used by one app: in that scenario you will need to rebuild both apps, since the repo or package is a dependency of both.
One issue I have seen people face with a shared repo is having to tweak things just because they live in a common place. For example, you have a method that does one job, and suddenly you want to use it somewhere else but with a tweak. You end up modifying the shared code to support the second consumer, but since it is shared, you will have to do regression testing of both apps.
Good candidates for a shared repo are things like drivers and clients. The rest is up to your project structure and judgement; in this case there is nothing strictly correct or incorrect. Hope this makes sense.
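A minimal sketch of the approach from the question, assuming the shared service lives in a (hypothetical) Git repository of its own; each app then declares it as an ordinary dependency in its package.json and requires it like any other module:

    {
      "dependencies": {
        "shared-service": "git+https://example.com/your-org/shared-service.git#v1.2.0"
      }
    }

Pinning a tag or commit (the #v1.2.0 fragment above) lets each app decide when to pick up changes to the shared code, which also softens the regression-testing concern mentioned above.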

What are the main reasons against the Windows Registry?

If I want to develop a registry-like system for Linux, which Windows Registry design failures should I avoid?
Which features would be absolutely necessary?
What are the main concerns (security, ease-of-configuration, ...)?
I think the Windows Registry was not a bad idea; the implementation just didn't fulfill the promises. A common place for configuration, including for example Apache config, database config, or mail server config, wouldn't be a bad idea and might improve maintainability, especially if it had options for (protected) remote access.
I once worked on a kernel-based solution but stopped because others said that registries are useless (because the Windows Registry is)... what do you think?
I once worked on a kernel-based solution but stopped because others said that registries are useless (because the Windows Registry is)... what do you think?
A kernel-based registry? Why? Why? A thousand times, why? Might as well ask for a kernel-based musical postcard or a kernel-based inetd, for all the point there is in putting it in there. If it doesn't need to be in the kernel, it shouldn't be in the kernel. There are many other ways to implement a privileged process that don't require deep hackery like that...
If I want to develop a registry-like system for Linux, which Windows Registry design failures should I avoid?
Make sure that applications can change many entries at once in an atomic fashion.
Make sure that there are simple command-line tools to manipulate it.
Make sure that no critical part of the system needs it, so that it's always possible to boot to a point where you can fix things.
Make sure that backup programs back it up correctly!
Don't let chunks of executable data be stored in your registry.
If you must have a single repository, at least use a proper database, so that you have tools to back up, restore, and recover it, and you can interact with it without needing a new set of custom APIs (a rough sketch of that idea follows below).
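For instance, with SQLite as that backing store, several related settings can be changed in one transaction, which also covers the "many entries at once in an atomic fashion" point above. A minimal sketch in Python (the database path and key names are made up for illustration):

    import sqlite3

    conn = sqlite3.connect("/var/lib/registry/settings.db")  # hypothetical location
    conn.execute("CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)")

    # The `with` block wraps the statements in a single transaction:
    # either all of these keys change, or (on error) none of them do.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            [
                ("mail/server", "smtp.example.org"),
                ("mail/port", "587"),
                ("mail/use_tls", "true"),
            ],
        )
    conn.close()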
The first one that comes to my mind is that you somehow need to avoid orphaned registry entries. At the moment, when you delete a program you also delete its configuration files, which live under some directory; with a registry system you need to make sure that when a program is deleted, its configuration in the registry is deleted as well.
IMHO, the main problems with the windows registry are:
Binary format. This loses you the availability of a huge variety of very useful tools. With a binary format, tools like diff, search, version control etc. have to be specially implemented, rather than using the best-of-breed tools that can operate on the common substrate of text. Text also offers the advantage of trivially embedded documentation/comments (also greppable), and easy programmatic creation and parsing by external tools. It's also more flexible: sometimes configuration is better expressed in a full Turing-complete language than shoehorned into a structure of keys and subkeys.
Monolithic. It's a big advantage to have everything for application X contained in one place. Move to a new computer and want to keep your settings for it? Just copy the file. While this is theoretically possible with the registry, so long as everything is under a single key, in practice it's a non-starter. Settings tend to be diffused in various places, and it is generally difficult to find where. This is usually given as a strength of the registry, but "everything in one place" generally devolves to "Everything put somewhere in one huge place".
Too broad. It's easy to think of it as just a place for user settings, but in fact the registry becomes a dumping ground for everything. 90% of what's there is not designed for users to read or modify; it is in fact a database of the serialised form of various structures used by programs that want to persist information. This includes things like the entire COM registration system, installed apps, etc. This is stuff that needs to be stored, but the fact that it's mixed in with things like user-configurable settings and stuff you might want to read dramatically lowers its value.

Distributing a bundle of files across an extranet

I want to be able to distribute bundles of files, about 500 MB per bundle, to all machines on a corporate "extranet" (which is basically a few LANs connected using various private mechanisms, including leased lines and VPN).
The total number of hosts is roughly 100, and the goal is to get a copy of the bundle from one host onto all the other hosts reliably, quickly, and efficiently. One important issue is that some hosts are grouped together on single fast LANs in which case the network I/O should be done once from one group to the next and then within each group between all the peers. This is as opposed to a strict central server system where multiple hosts might each fetch the same bundle over a slow link, rather than once via the slow link and then between each other quickly.
A new bundle will be produced every few days, and occasionally old bundles will be deleted (but that problem can be solved separately).
The machines in question happen to run recent Linuxes, but bonus points will go to solutions which are at least somewhat cross-platform (in which case the bundle might differ per platform but maybe the same mechanism can be used).
That's pretty much it. I'm not opposed to writing some code to handle this, but it would be preferable if it were one of bash, Python, Ruby, Lua, C, or C++.
I think all these problems have been solved by modern research into p2p networking and are well packaged in nice forms. A bit of scripting and BitTorrent should solve these problems. Torrent clients exist for all modern OSes; then a script on each machine checks a location for a new torrent file, starts the download, and deletes the old bundle once the download has finished.
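A rough sketch of that watcher script in Python, assuming a shared drop location for .torrent files and a command-line client such as transmission-remote; the paths and the exact client invocation are assumptions to adapt to whatever client you use:

    import glob
    import os
    import subprocess
    import time

    TORRENT_DIR = "/srv/bundles/torrents"              # assumed shared drop location
    SEEN_FILE = os.path.expanduser("~/.seen_torrents")

    def seen():
        """Names of torrents this machine has already handed to the client."""
        if not os.path.exists(SEEN_FILE):
            return set()
        with open(SEEN_FILE) as f:
            return set(line.strip() for line in f)

    def mark_seen(name):
        with open(SEEN_FILE, "a") as f:
            f.write(name + "\n")

    while True:
        for torrent in glob.glob(os.path.join(TORRENT_DIR, "*.torrent")):
            name = os.path.basename(torrent)
            if name not in seen():
                # hand the new torrent to the local client (adjust for your client)
                subprocess.run(["transmission-remote", "--add", torrent], check=True)
                mark_seen(name)
        time.sleep(60)

Deleting old bundles once a new one has finished downloading can be bolted onto the same loop.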
What about rsync?
I'm going to suggest you use compie's idea of rsync to copy the files, in which case you can use a scripting language of your choice.
On the propagating system you will need a script containing some form of representation of the hosts and a matrix between them weighted with the link speed. You then need to calculate a minimum spanning tree from that information. From that, you can send messages to the systems you intend to propagate to, detailing the MST and the bundle to fetch, whereupon that script/daemon begins the transfer. Each host then contacts the hosts over the fastest links...
You could implement it in bash (though Python might be better) or as a custom C daemon.
When you update the network you'll need to update the matrix based on latest information.
See: Prim's Algorithm.
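A minimal sketch of that calculation in Python, assuming the host/speed matrix is expressed as an adjacency dict of link costs (lower = faster); the host names and costs are made up for illustration:

    import heapq

    # link costs between hosts: lower = faster link (illustrative values)
    links = {
        "hub":    {"lan1-a": 1, "lan2-a": 10},
        "lan1-a": {"hub": 1, "lan1-b": 1, "lan2-a": 10},
        "lan1-b": {"lan1-a": 1},
        "lan2-a": {"hub": 10, "lan1-a": 10, "lan2-b": 1},
        "lan2-b": {"lan2-a": 1},
    }

    def prim_mst(graph, start):
        """Prim's algorithm: return the minimum spanning tree as (from, to, cost) edges."""
        visited = {start}
        frontier = [(cost, start, to) for to, cost in graph[start].items()]
        heapq.heapify(frontier)
        tree = []
        while frontier and len(visited) < len(graph):
            cost, frm, to = heapq.heappop(frontier)
            if to in visited:
                continue
            visited.add(to)
            tree.append((frm, to, cost))
            for nxt, c in graph[to].items():
                if nxt not in visited:
                    heapq.heappush(frontier, (c, to, nxt))
        return tree

    print(prim_mst(links, "hub"))

Each edge in the resulting tree tells a host which peer it should fetch the bundle from, so every slow link is crossed only once.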

Common Lisp: What's the best way to use libraries in a shared hosting environment?

I was thinking about this the other day and wanted to see what the SO community had to say about the subject.
As it stands right now Common Lisp is getting some attention as a web development platform, and with good reason (of which I'm sure you are already convinced).
I was wondering how one would go about using a library in a shared environment in a similar fashion to PHP.
If I set up something like SBCL as an interpreter to interpret FASL files, the way Python or PHP work, what would be the best way to use libraries (like clsql for instance)?
Most come as asdf-installable libraries, but it would be a stupid amount of overhead to require and install the library each and every time a request is made.
Keeping in mind this is for shared hosting, would it be best to:
1) Install system wide copies of the libraries for use in applications; reduces space, but there may be problems with using the correct version of the library.
2) Allow users (through a control panel) to install local copies for themselves; more space, no version problems.
3) Tell them to wrap it into a module and load it on demand like Python does (I'm not sure if/how this can be done with Lisp). Just being able to load a library for use would be the best option, but I don't think a lot of them are designed to be used this way.
Anyways, looking to hear your opinions, thanks.
There are two ways I would look at it:
start a Lisp for each request
This way it would be much better if the Lisp were a saved image with all necessary libraries and data loaded. But that approach does not look very promising to me.
run a Lisp and let a frontend (web browser, another web server, ...) connect to it
This way you can either start a saved image or a Lisp that loads a bunch of stuff once and serves the requests.
I like to use saved images/applications in a deployment scenario. They can be quickly started, contain all the necessary software and are independent of library changes.
So it might be useful to provide pre-configured Lisp images that contain the necessary software or let the user configure and save an image.
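For example, with SBCL (mentioned in the question) such an image with the libraries preloaded can be produced roughly like this; the system names and entry point are placeholders:

    ;; build-image.lisp: load everything once, then dump an executable core
    (require :asdf)
    (asdf:load-system :clsql)          ; plus whatever else the site needs
    (asdf:load-system :my-web-app)     ; hypothetical application system

    (sb-ext:save-lisp-and-die "my-web-app.core"
                              :executable t
                              :toplevel #'my-web-app:start)

Starting that saved executable brings the already-loaded libraries up immediately, instead of re-running ASDF on every request.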
