I read that if I set enable_user_defined_functions to true in cassandra.yaml, then user-defined functions (UDFs) present a security risk, since they are executed on the server side. In Cassandra 3.0 and later, UDFs are executed in a sandbox to contain the execution of malicious code. They are disabled by default.
My question is: are they executed in the sandbox after I set enable_user_defined_functions to true?
Unless you explicitly set enable_user_defined_functions_threads to false (which you really shouldn't do), the UDFs will be run asynchronously on a thread pool that is locked down with a restricted security manager and a special class loader.
You should still only allow trusted sources for your UDF code, though, in case there are security bugs.
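For context, a minimal sketch of registering a trivial Java UDF through the DataStax Java driver 4.x (the keyspace, function name, and local connection defaults are assumptions, and enable_user_defined_functions: true must already be set in cassandra.yaml):

```java
import com.datastax.oss.driver.api.core.CqlSession;

public class RegisterUdf {
    public static void main(String[] args) {
        // Assumes a local node on 127.0.0.1:9042 and an existing keyspace "ks".
        try (CqlSession session = CqlSession.builder().build()) {
            // The body ('return x * 2;') is plain Java, compiled and executed
            // server-side inside the UDF sandbox described above.
            session.execute(
                "CREATE OR REPLACE FUNCTION ks.double_it(x int) "
              + "RETURNS NULL ON NULL INPUT RETURNS int "
              + "LANGUAGE java AS 'return x * 2;'");
        }
    }
}
```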
[Question posted by a user on YugabyteDB Community Slack]
I am starting to port our Wildfly code that used “vanilla” PostgreSQL to now use YugabyteDB instead, and I am running into the following problem:
Suppose we have a J2EE bean called Manager that has a method methodX() annotated with @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED), a methodY() without any annotation that performs an UPDATE on a database table, and a methodZ(), similar to methodY(), that performs a DELETE on the same record of the database table as methodY(). methodX() performs some logic that does not access the database and afterwards calls methodY(). If, from within a method clientMethod() of a Client bean that does not have any annotation, we call methodX() on the Manager bean and after that call methodZ() on the Manager bean, then we get an error:
com.yugabyte.util.PSQLException: ERROR: Operation failed. Try again
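For reference, a minimal sketch of the bean structure described above (only the annotations and method names come from the description; the javax.* namespace, the entity name, and the JPQL are illustrative assumptions):

```java
import javax.ejb.EJB;
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class Manager {

    @PersistenceContext
    private EntityManager em;

    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public void methodX() {
        // some logic that does not access the database ...
        methodY();
    }

    public void methodY() {
        em.createQuery("UPDATE Item i SET i.value = :v WHERE i.id = :id")
          .setParameter("v", "x").setParameter("id", 1L).executeUpdate();
    }

    public void methodZ() {
        em.createQuery("DELETE FROM Item i WHERE i.id = :id")
          .setParameter("id", 1L).executeUpdate();
    }
}

@Stateless
class Client {

    @EJB
    private Manager manager;

    // No annotation, so the default REQUIRED attribute applies and a
    // transaction (T1 in the logs) is started here.
    public void clientMethod() {
        manager.methodX();  // NOT_SUPPORTED: T1 is suspended for this call
        manager.methodZ();  // runs in T1
    }
}
```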
By turning on the appropriate logging options on the PostgreSQL server run by YugabyteDB, I can see that clientMethod() had started a transaction T1 before calling methodX(), that when methodY() is called a new transaction T2 is started, and that when methodZ() is called transaction T1 is used.
This worked without any issues under “vanilla” PostgreSQL, so I would like to know if there are any configuration options we need to change from the defaults on the YugabyteDB yb-tserver or the PostgreSQL server that YugabyteDB runs for this to work on YugabyteDB, or if such functionality is not supported.
From further investigation, I found that while the default isolation level is “READ COMMITTED” in PostgreSQL, in YugabyteDB it is “Snapshot” (i.e., the equivalent of “REPEATABLE READ” in PostgreSQL). Also, in YugabyteDB the “READ COMMITTED” isolation level is by default mapped to “Snapshot”, unless the yb_enable_read_committed_isolation flag is set to true, in which case the “real” “READ COMMITTED” isolation level is supported. So, I set this flag, and also set ysql_default_transaction_isolation='READ COMMITTED', so that the same isolation level is used as in PostgreSQL. Having done this, my scenario works in YugabyteDB as well without any errors.
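If it helps, you can confirm what your sessions actually get from the application side with plain JDBC (a minimal sketch; the PostgreSQL JDBC driver, URL, and credentials shown are placeholder assumptions):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CheckIsolation {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://127.0.0.1:5433/yugabyte";  // YSQL default port
        try (Connection conn = DriverManager.getConnection(url, "yugabyte", "yugabyte");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW transaction_isolation")) {
            if (rs.next()) {
                System.out.println("transaction_isolation = " + rs.getString(1));
            }
            // A per-connection override, independent of the cluster-wide default:
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        }
    }
}
```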
However, I am concerned that I am using the READ COMMITTED isolation level, which YugabyteDB has made some effort to keep “hidden” unless someone explicitly asks for it. I also saw at https://docs.yugabyte.com/preview/architecture/transactions/isolation-levels a statement that the “Snapshot” isolation level is considered a good default for a distributed SQL database.
So, my question is whether it is a bad idea to use the READ COMMITTED isolation level in YugabyteDB, and if so, why.
Your input is greatly appreciated.
The main reason to support Read Committed is to be compatible with the default in PostgreSQL, which, like all defaults, is the most commonly used setting. Higher isolation levels prevent more anomalies and are therefore preferable if the application can handle them.
Here is an example of a write-consistency anomaly in PostgreSQL: READ COMMITTED anomalies in PostgreSQL - DEV Community
We have the same behavior in YugabyteDB, for compatibility, but you can prevent it with a higher isolation level, just as in PostgreSQL.
Note that, to prevent anomalies, higher isolation levels may raise more retryable exceptions, which can surface at the end of the transaction (at commit).
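A common way to handle that is a retry loop around the whole transaction. A minimal plain-JDBC sketch (not YugabyteDB-specific; the Work callback is a placeholder for your statements, and the SQLState values are the standard serialization-failure/deadlock codes):

```java
import java.sql.Connection;
import java.sql.SQLException;

public class RetryingTx {

    public interface Work {
        void run(Connection conn) throws SQLException;
    }

    public static void runWithRetry(Connection conn, Work work, int maxAttempts) throws SQLException {
        for (int attempt = 1; ; attempt++) {
            try {
                conn.setAutoCommit(false);
                work.run(conn);   // your UPDATE/DELETE statements
                conn.commit();    // the retryable error may only surface here
                return;
            } catch (SQLException e) {
                conn.rollback();
                boolean retryable = "40001".equals(e.getSQLState())    // serialization_failure
                                 || "40P01".equals(e.getSQLState());   // deadlock_detected
                if (!retryable || attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }
}
```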
I'm confused about the AUTOSAR memory protection mechanism.
I have two applications, one trusted and one non-trusted.
I configured a memory protection region ranging from 0x70000000 to 0x7100000 for the trusted application, and I configured an init task for the trusted application.
In the init task, if I try to write directly to a memory address inside the configured range, it works fine.
If, however, I try to write outside the configured range (still a valid memory address), I get an exception.
If this happened to a non-trusted application I could understand it, but this is a trusted one.
I thought a trusted application could write to the whole memory? What am I missing here?
AUTOSAR_SWS_OS (R19-11) has a configuration parameter called OsTrustedApplicationWithProtection:
Parameter to specify if a trusted OS-Application is executed with memory protection or not. true: OS-Application runs within a protected environment. This means that write access is limited. false: OS-Application has full write access (default)
It sounds a bit like your trusted OS-Application is configured with true here instead of false, and is therefore also write-restricted.
On the other hand, ch. 14 "Outlook on Memory Protection Configuration" states:
As stated before, memory protection configuration is not standardized yet. Nevertheless it seems helpful to contribute a recommendation in this chapter, how the configuration might work
Ch. 14.1 also gives hints on how the MPU config should be handled (the SWCD/BSWMD specifying the (CODE/VAR/CONST/..) memory sections and linker input sections), so you should not just use arbitrary memory definitions and access them directly, but use the AUTOSAR memory-mapping approach.
And what I do not understand in your case: why do you actually restrict the trusted application by giving the MPU config just this range, instead of restricting your non-trusted application's access?
In the IMap configuration there is an attribute, read-backup-data, that can be set to true, which enables a member to read the value from a backup copy, if available, when the owner of the key is another member.
http://docs.hazelcast.org/docs/latest-development/manual/html/Distributed_Data_Structures/Map/Backing_Up_Maps.html#page_Enabling+Backup+Reads
Then there is the Near Cache, which will start caching results locally for a few data structures.
http://docs.hazelcast.org/docs/latest-development/manual/html/Performance/Near_Cache/Hazelcast_Data_Structures_with_Near_Cache_Support.html
If we have 2 kinds of cluster setup:
2 members, and async-backup-count for a map is 1, and read-backup-data is true
2 members, nearcache enabled for this map
Would there be differences in these 2 approaches?
The 1st setup will probably use less memory, and it is not configurable. But what about read performance?
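For concreteness, a sketch of the two member-side configurations described above (map names are placeholders; the calls follow the Hazelcast 3.x-era API that the linked docs describe):

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.InMemoryFormat;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.NearCacheConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class TwoSetups {
    public static void main(String[] args) {
        Config config = new Config();

        // Setup 1: no sync backup, one async backup, reads allowed from the backup copy.
        MapConfig backupReadMap = new MapConfig("backup-read-map")
                .setBackupCount(0)
                .setAsyncBackupCount(1)
                .setReadBackupData(true);
        config.addMapConfig(backupReadMap);

        // Setup 2: a Near Cache on the member, keeping entries deserialized locally.
        NearCacheConfig nearCache = new NearCacheConfig()
                .setInMemoryFormat(InMemoryFormat.OBJECT);
        MapConfig nearCacheMap = new MapConfig("near-cache-map")
                .setNearCacheConfig(nearCache);
        config.addMapConfig(nearCacheMap);

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        hz.getMap("backup-read-map").put("k", "v");  // with 2 members, reads of "k" stay local on both
    }
}
```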
For a two-member cluster setup, enabling backup reads lets you access all the data locally, since both members hold all the entries, either as primary or as backup. This setup is not much different from using a Replicated Map (see here for details: http://docs.hazelcast.org/docs/latest-development/manual/html/Distributed_Data_Structures/Replicated_Map.html). So, when your cluster has only two members (and no clients), enabling backup reads can be more advantageous in terms of performance.
However, Near Cache has a bunch of configuration options, and you can decide how much data you need to access locally in any type of setup (including a client-server topology). You can also choose the in-memory data format of the Near Cache. These options can give you more performance than enabling backup reads.
The two options are not that different in single-entry read performance (assuming the Near Cache contains a valid entry), since neither performs a remote operation.
Why does being thread-safe matter in a web app? Pylons (a Python web framework) uses a global application variable, which is not thread-safe. Does this matter? Is it only a problem if I intend to use multi-threading? Or does it mean that one user might not have updated state if another user... I'm just confusing myself. What's so important about this?
Threading errors can lead to serious and subtle problems.
Say your system has 10 members. One more user signs up to your system, and the application adds him to the roster and increments the member count; "simultaneously", another user quits, and the application removes him from the roster and decrements the member count.
If you don't handle threading properly, your member count (which should be 10) could easily be 9, 10, or 11, and you'll never be able to reproduce the bug.
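To make that concrete, a minimal Java sketch (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

class Roster {
    private final List<String> members = new ArrayList<>();
    private int count = 0;

    // Without the synchronized keyword, a signup and a quit running at the same
    // time can interleave the read-modify-write of 'count' (and corrupt the list),
    // leaving it at 9, 10, or 11 when it should be 10.
    synchronized void add(String name)    { members.add(name);    count++; }
    synchronized void remove(String name) { members.remove(name); count--; }
    synchronized int size()               { return count; }
}
```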
So be careful.
You should care about thread safety. E.g., in Java you write a servlet that provides some functionality. The container deploys an instance of your servlet, and as HTTP requests arrive from clients over different TCP connections, each request is handled by a separate thread, which in turn calls your servlet. As a result, your servlet is called from multiple threads. So if it is not thread-safe, erroneous results will be returned to users, due to corruption of the shared data accessed by those threads.
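As an illustration (a hedged sketch; the servlet and its counter are made up, and the javax.servlet API is assumed to be on the classpath), an instance field of a servlet is exactly this kind of shared data:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// One servlet instance serves many request threads, so instance fields are shared state.
public class CounterServlet extends HttpServlet {

    // NOT thread-safe alternative: 'private int hits; hits++;' is a read-modify-write race.
    private final AtomicInteger hits = new AtomicInteger();

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.getWriter().println("hit #" + hits.incrementAndGet());
    }
}
```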
It really depends on the application framework (which I know nothing about in this case) and how the web server handles it. Obviously, any good webserver is going to be responding to multiple requests simultaneously, so it will be operating with multiple threads. That web server may dispatch to a single instance of your application code for all of these requests, or it may spawn multiple instances of your web application and never use a given instance concurrently.
Even if the app server does use separate instances, your application will probably have some shared state--say, a database with a list of users. In that case, you need to make sure that state can be accessed safely from multiple threads/instances of your web app.
Then, of course, there is the case where you use threading explicitly in your application. In that case, the answer is obvious.
Your web application is almost always multithreaded, even though you might not use threads explicitly. So, to answer your question: it's very important.
How can this happen? Usually, Apache (or IIS) will serve several requests simultaneously, calling your Python programs multiple times from multiple threads. So you need to take into account that your programs run concurrently in multiple threads and act accordingly.
(This was too long to add a comment to the other fine answers.)
Concurrency problems (read: multiple accesses to shared state) are a superset of threading problems. They can easily exist at an "above thread" level, such as the process/server level (the global variable in the case you mention is a per-process value, which in turn can lead to an inconsistent view/state if there are multiple processes).
Care must be taken to analyze the data-consistency requirements and then implement the software to fulfill those requirements. I would always err on the side of safety, and only relax it in carefully analyzed areas where that is acceptable.
However, note that CPython runs only one thread at a time for Python code execution (to get truly concurrent threads you need to write/use C extensions), so while you can still get race conditions on shared data, you won't get (all of) the same kinds of partial-write scenarios that plague C/C++ programs. But, once again: err on the side of a consistent view.
There are a number of existing methods of making access to a global atomic, across threads or processes. Use them.
My VPS account has been occasionally running out of memory. It's using Apache on Linux. Support says it's a slow memory leak and has enabled MaxRequestsPerChild to deal with it.
I have a few questions about this. When a child process dies, will it cause my scripts to lose session data? Does anyone have advice on how I can track down this memory leak?
Thanks
No, when a child process dies you will not lose any data unless it was in the middle of a request at the time (which should not happen if it exits due to MaxRequestsPerChild).
You should try to reproduce the memory leak using an identical software stack on your test system. You can use tools such as Valgrind to try to detect it.
You can also try a debug build of your web server and its modules, which will enable you to detect what's going on.
It's difficult to reproduce the behaviour of production systems in non-production ones. If you have auto-test coverage of your web application, you could try running your full auto-test suite, but in practice this is unlikely to cover every code path and may therefore miss the leaky one.
When a child process dies, will it cause my scripts to lose session data?
Without knowing what scripting language and session handler you are using (and the actual code), it's rather hard to say.
In most cases, when using scripting languages in modules or via [Fast]CGI, it's very unlikely that the session data would actually be lost, although if the process dies in the middle of processing a request it may not get the chance to write the updated session back to whatever is storing the session. And in the very unlikely event that it dies during the write-back, it may corrupt the session data. These are quite exceptional circumstances.
OTOH, if your application logic is implemented via a daemon (e.g. a Java container), then it's quite probable that memory leaks could accumulate (although these would be reported against a different process).
Note that if the problem is alleviated by setting MaxRequestsPerChild then it implies that the problem is occurring in an Apache module.
The production releases of Apache itself are, in my experience, very stable and free of memory leaks. However, I've not used all the modules. I'm not sure if ExtendedStatus gives a breakdown of memory usage by module - it might be worth checking.
I've previously seen problems with the memory management of modules loaded by the PHP module not respecting PHP's memory limits - these did clear down at the end of the request though.
C.