Implementation of Varnish's `vcl_hash`

My original understanding was that Varnish's VCL configuration files are just C files. My new understanding is that VCL is a domain-specific language that shares many similarities with C. However, I'm not familiar with modern C, I'm pretty rusty with my caveman college C, and I don't understand how Varnish's vcl_hash function works.
Specifically, a typical vcl_hash function looks like this:
sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (hash);
}
The hash_data function calls seem pretty straightforward: I assume they add the information that should be considered for the cache key to a data structure of some kind.
What's confusing me is the final call:
return (hash);
What is vcl_hash returning here? What is (hash)? Is it a function? If so, how is C calling it without parentheses (hash()), or is Varnish doing something clever behind the scenes?

The return statements in VCL refer to "actions": returning one effectively forces Varnish into the next phase of request processing. It seems strange to return (hash) from vcl_hash itself, as you would generally do that from vcl_recv to tell Varnish to kick off the object lookup phase.
Normally vcl_hash would return (lookup) or nothing at all (in which case processing continues with the "built-in" vcl_hash, which applies hostname and URL). Returning (lookup) short-circuits any additional vcl_hash implementations that might exist and goes straight into the lookup phase.

hash is a Varnish variable. From the Varnish documentation:
The hash key used to refer to an object in the cache. Used when both
reading from and writing to the cache.
Keep in mind that in Varnish versions older than 3.0 it is available as req.hash.
hash_data works as you said. More specifically (again, from the Varnish documentation):
hash_data(str)
Adds a string to the hash input. In default.vcl hash_data() is called on the host and URL of the request.
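To make the two ideas above concrete, here is a toy model in Python (not VCL, and not Varnish's real implementation). It only illustrates what the answers describe: hash_data appends strings to a running hash input, and return (hash) hands back the name of an action rather than calling a function. The class HashContext, the key() method, the choice of SHA-256, and the NUL separator are all assumptions made for this sketch.

```python
import hashlib


class HashContext:
    """Toy stand-in for the per-request hash state (hypothetical, not Varnish code)."""

    def __init__(self):
        self._digest = hashlib.sha256()

    def hash_data(self, value: str) -> None:
        # Feed the string into the running digest, followed by a separator,
        # so that ("ab", "c") and ("a", "bc") do not collide.
        self._digest.update(value.encode("utf-8"))
        self._digest.update(b"\0")

    def key(self) -> str:
        # The finished cache key, as a hex string.
        return self._digest.hexdigest()


def vcl_hash(ctx, url, host, server_ip):
    """Mirrors the vcl_hash subroutine from the question."""
    ctx.hash_data(url)
    ctx.hash_data(host if host else server_ip)
    # An action name that tells the caller which phase comes next;
    # nothing is being called here.
    return "hash"
```

In this model, two requests with the same URL and host produce the same key, and a request without a Host header falls back to the server IP and gets a different key.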

Related

Passing Varnish struct variables across VCL reloads

I built a Varnish VMOD that defines an object, which is instantiated in vcl_init and is always kept in memory, and used in individual requests.
My configuration is split up in several VCL files, that get loaded from a "master" VCL depending on some request parameters.
The master VCL also instantiates the object in question, which I want to use in another VCL. The reason why I don't instantiate the object in the same VCL I use it in, is that I have another VCL that defines some ACL-restricted routes to update the object from a data source.
E.g. master.vcl:
sub vcl_init {
    new myobj = mymodule.myobject();
}
sub vcl_recv {
    if (req.url ~ "^pub/") {
        return (vcl(pub_vcl));
    }
    // Other switches...
}
pub.vcl:
sub vcl_recv {
    if (myobj.mymethod()) {
        set req.http.x-bogus = "true";
    }
}
But in this case, compilation fails because myobj is undefined in pub.vcl, which means it does not carry from master.vcl.
I also thought about adding the test and header setting in master.vcl before loading pub.vcl, but that won't work because loading a VCL calls std.rollback(req) which unsets all the request headers, which is the only variable accessible in vcl_recv.
Is there a way to pass this state across VCL reloads?
Thanks.

You can't do this with objects directly as they are scoped by the VCL and can't "escape" it. As you've experienced, you need to load the labeled vcl first, so you also need to create the object in it.
But nothing prevents you from creating objects that reference a global variable so all objects have access to the same data.
Alternatively, you can use the Event function to use a PRIV_VCL (https://stackoverflow.com/a/60753085) also referencing a global pointer and avoid using objects completely. This is what is done here for example: https://github.com/varnish/varnish-modules/blob/master/src/vmod_vsthrottle.c#L345

Performing a distributed search through spark-solr

I'm using spark-solr in order to perform Solr queries. However, my searches don't work as they're supposed to because for some reason the requests being generated by spark prevent the searches from being distributed. I have discovered it by looking at the Solr logs where I saw that a distrib=false parameter is added to the sent requests. When executing the queries manually (not using spark) with distrib=true the results were fine.
I was trying to set the parameters sent by spark by changing the "solr.params" value in the options dictionary (I'm using pyspark):
options = {
    "collection": "collection_name",
    "zkhost": "server:port",
    "solr.params": "distrib=true"
}
spark.read.format("solr").options(**options).load().show()
This change did not have any effect: I still see in the logs that a distrib=false parameter is being sent. Other parameters passed through the "solr.params" key (such as fq=something) do have an effect on the results. But it looks like spark insists on sending distrib=false no matter what I do.
How do I force a distributed search through spark-solr?

The easy solution is to configure the request handler to run distributed queries using an invariant. The invariant forces the distrib parameter to true even if spark-solr tries to change it at query time. You can introduce the invariant by adding the following lines under the definition of your request handler entry in solrconfig.xml:
<lst name="invariants">
    <str name="distrib">true</str>
</lst>
While the invariant fixes the problem, I think it's a rather radical solution, because it silently overrides the value of a parameter. Once the invariant is in place you can no longer set distrib to false: even if your request explicitly does so, the value of distrib will still be true. This is too risky in my opinion, which is why I'm suggesting another solution that is harder to implement but doesn't suffer from that flaw.
The solution is to implement a search component that forces distrib=true only when it receives a forceDistrib=true flag as a parameter.
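import java.io.IOException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Note: SearchComponent also declares process(ResponseBuilder) and other
// abstract members; they are omitted here, as in the original snippet.
public class ForceDistribComponent extends SearchComponent {
    private static final String FORCE_DISTRIB_PARAM = "forceDistrib";

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        // Only act when the request explicitly opts in.
        if (!params.getBool(FORCE_DISTRIB_PARAM, false)) return;
        params.set(CommonParams.DISTRIB, true);
        // Reset the flag once it has been handled.
        params.set(FORCE_DISTRIB_PARAM, false);
        rb.req.setParams(params);
    }
}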
After building the component you can configure solr to use it by adding the component to solrconfig.xml and set your request handler to use it.
Adding the component to solrconfig.xml is done by adding the following entry to the solrconfig.xml file:
<searchComponent name="forceDistrib" class="ForceDistribComponent"/>
Configuring the request handler to use the forceDistrib component is done by adding it to the list of components under the request handler entry. It must be the first component in the list:
<arr name="components">
    <str>forceDistrib</str>
    <str>query</str>
    ...
</arr>
This solution, while more involved than simply introducing an invariant, is much safer.

PHP void functions

PHP 7.1 will introduce the void return type.
https://wiki.php.net/rfc/void_return_type
In which cases is it useful to explicitly declare that a function returns "void" instead of simply using return; without any type declaration?

This would be helpful to the developers working on the project for a few reasons:
- Provides a clear indication that the function does not return a value
- The above could equally apply to auto-generated docs
- Could be used for unit testing
- Could help prevent coding errors, where a developer accidentally tries to return, or expect, a value

using spring cache read only, how set spring cache redis read only

I use Spring Cache with Redis in two apps: one reads and writes the cache, the other should only read it. How can I configure this?
I tried the following, but it does not work:
@Cacheable(value = "books", key = "#isbn", condition = "false")
Can anyone help?

You have misunderstood the purpose of the @Cacheable annotation's "condition" attribute. Per the documentation...
If true, the method is cached - if not, it behaves as if the method is
not cached, that is executed every since time no matter what values
are in the cache or what arguments are used.
The condition attribute just determines whether the cache (e.g. Redis) is consulted first, before executing the (potentially expensive) method. If condition evaluates to false, then the method will always be executed and the result subsequently cached.
In the read-only app, I am assuming you want the cache consulted first and, if the value is not in the cache, the method executed, but WITHOUT caching the result. Is this correct?
If so, then you only need to specify the unless attribute instead of the condition attribute, like so...
@Cacheable(value="books", key="#isbn", unless="true")
void someBookMutatingOperation(String isbn, ...) { .. }
If, however, you want to avoid the cacheable method invocation in the read-only (version of the) app altogether and just consult the cache regardless of whether a value actually exists in the cache or not, then your problem is quite a bit more complex/difficult.
Spring's Cache Abstraction operates on the premise that if a value is not in the cache then it will return null to indicate a cache miss, which is then followed by a subsequent method invocation. Only when a cache returns a value for the specified key(s) will the method invocation be avoided.
Without a custom extension (perhaps using (additional) AOP interceptors) there is no way to avoid the OOTB behavior.
I will not elaborate on this later technique unless your use case requires it.
Hope this helps.

@John Blum
Thanks! Happy new year.
Your answer inspired me to read part of the Spring Cache source code: the CacheInterceptor class and the CacheAspectSupport class.
private Object execute(CacheOperationInvoker invoker, CacheOperationContexts contexts) {
    // Process any early evictions
    processCacheEvicts(contexts.get(CacheEvictOperation.class), true, ExpressionEvaluator.NO_RESULT);
    // Check if we have a cached item matching the conditions
    Cache.ValueWrapper cacheHit = findCachedItem(contexts.get(CacheableOperation.class));
    // Collect puts from any @Cacheable miss, if no cached item is found
    List<CachePutRequest> cachePutRequests = new LinkedList<CachePutRequest>();
    if (cacheHit == null) {
        collectPutRequests(contexts.get(CacheableOperation.class), ExpressionEvaluator.NO_RESULT, cachePutRequests);
    }
    Cache.ValueWrapper result = null;
    // If there are no put requests, just use the cache hit
    if (cachePutRequests.isEmpty() && !hasCachePut(contexts)) {
        result = cacheHit;
    }
    // Invoke the method if don't have a cache hit
    if (result == null) {
        result = new SimpleValueWrapper(invokeOperation(invoker));
    }
    // Collect any explicit @CachePuts
    collectPutRequests(contexts.get(CachePutOperation.class), result.get(), cachePutRequests);
    // Process any collected put requests, either from @CachePut or a @Cacheable miss
    for (CachePutRequest cachePutRequest : cachePutRequests) {
        cachePutRequest.apply(result.get());
    }
    // Process any late evictions
    processCacheEvicts(contexts.get(CacheEvictOperation.class), false, result.get());
    return result.get();
}
I think I should prevent the cachePutRequest from executing: if there is no cache hit, invoke the body of the @Cacheable method but don't cache the result. Using unless will prevent the method from being invoked. Is this correct?

@Tonney Bing
First of all, my apologies for misguiding you on my previous answer...
If condition evaluates to false, then the method will always be
executed and the result subsequently cached.
The last part is NOT true. In fact, the condition attribute does prevent the @Cacheable method result from being cached. But neither the condition nor the unless attribute prevents the @Cacheable service method from being invoked.
Also, my code example above was not correct. The unless attribute needs to be set to true to prevent caching of the @Cacheable method result.
After re-reading this section in the Spring Reference Guide, I came to realize my mistake and wrote an example test class to verify Spring's "conditional" caching behavior.
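The corrected behavior can be summarized with a small language-neutral model, written here in Python rather than Java. It is only a sketch of the semantics described in this answer, not Spring's implementation: cacheable_call is a hypothetical helper, the cache is a plain dict, and condition and unless are plain booleans, whereas in Spring they are SpEL expressions.

```python
def cacheable_call(cache, key, method, condition=True, unless=False):
    """Toy model of the @Cacheable semantics described above (hypothetical helper)."""
    if not condition:
        # condition == false: behave as if the method were not cached at all.
        # The cache is neither consulted nor updated.
        return method(key)
    if key in cache:
        # Cache hit: the method is not invoked.
        return cache[key]
    # Cache miss: the method is always invoked, whatever 'unless' says.
    result = method(key)
    if not unless:
        # 'unless' only vetoes storing the result.
        cache[key] = result
    return result
```

In both the condition=False and the unless=True case the method still runs on a miss, which is why neither attribute alone gives a cache-only, read-only mode.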
So...
With respect to your business use case, the way I understand it based on your original question and your response to my previous answer, you have a @Cacheable service method whose invocation needs to be suppressed in the read-only app regardless of whether the value is in the cache or not! In other words, the value should always be retrieved from the cache and the @Cacheable service method should NOT be invoked in read-only mode.
Now to avoid polluting your application code with Spring infrastructure component references, and specifically, with a Spring CacheManager, this is a good example of a "cross-cutting concern" (since multiple, mutating-based application service operations may exist) and therefore, can be handled appropriately using AOP.
I have coded such an example satisfying your requirements here.
This is a self-contained test class. The key characteristics of this test class include...
The use of external configuration (by way of the app.mode.read-only System property) to determine if the app is in read-only mode.
The use of AOP and a custom Aspect to control whether the subsequent invocation of the Join Point (i.e. the @Cacheable service method) is allowed (no, in a read-only context). In addition, I appropriately set the order in which the Advice (namely, the @Cacheable based advice along with the handleReadOnlyMode advice in the UseCacheExclusivelyInReadOnlyModeAspect Aspect) should fire based on precedence.
Take note of the @Cacheable annotation on the service method...
@Cacheable(value = "Factorials", unless = "T(java.lang.System).getProperty('app.mode.read-only', 'false')")
public Long factorial(long number) { .. }
You can see the intended behavior with the System.err output statements in the test class.
Hope this helps!

Why can't I add Contract.Requires in an overridden method?

I'm using Code Contracts (actually, I'm still learning how to use them).
I'm facing something that looks weird to me. I override a method defined in a 3rd party assembly, and I want to add a Contract.Requires statement like this:
public class MyClass : MyParentClass
{
    protected override void DoIt(MyParameter param)
    {
        Contract.Requires<ArgumentNullException>(param != null);
        this.ExecuteMyTask(param.Something);
    }
    protected void ExecuteMyTask(MyParameter param)
    {
        Contract.Requires<ArgumentNullException>(param != null);
        /* body of the method */
    }
}
However, I'm getting warnings like this:
Warning 1 CodeContracts:
Method 'MyClass.DoIt(MyParameter)' overrides 'MyParentClass.DoIt(MyParameter))', thus cannot add Requires.
[edit] changed the code a bit to show the alternative issue [/edit]
If I remove the Contract.Requires in the DoIt method, I get another warning, telling me that the requirement param != null is unproven.
I don't understand this warning. What is the cause, and can I solve it?

You can't add extra requirements which your callers may not know about. It violates Liskov's Substitution Principle. The point of polymorphism is that a caller should be able to treat a reference which actually refers to an instance of your derived class as if it refers to an instance of the base class.
Consider:
MyParentClass foo = GetParentClassFromSomewhere();
foo.DoIt(null);
If that's statically determined to be valid, it's wrong for your derived class to hold up its hands and say "No! You're not meant to call DoIt with a null argument!" The aim of static analysis of contracts is that you can determine validity of calls, logic etc at compile-time... so no extra restrictions can be added at execution time, which is what happens here due to polymorphism.
A derived class can add guarantees about what it will do - what it will ensure - but it can't make any more demands from its callers for overridden methods.
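The problem can be reproduced without Code Contracts at all. The following Python sketch (not C#; the classes Parent and Child and the function caller are hypothetical names for illustration) shows a caller written against the base class's contract breaking at run time once an override demands more of its argument:

```python
class Parent:
    def do_it(self, param):
        # The base contract accepts any argument, including None.
        return "parent handled %r" % (param,)


class Child(Parent):
    def do_it(self, param):
        # Strengthened precondition: stricter than what Parent promised callers.
        if param is None:
            raise ValueError("param must not be None")
        return "child handled %r" % (param,)


def caller(obj: Parent):
    # Written against Parent's contract, so passing None looks perfectly valid.
    return obj.do_it(None)
```

Calling caller(Parent()) succeeds, while caller(Child()) raises ValueError even though the caller did nothing wrong according to the type it was given. A static checker that only sees Parent cannot warn about this, which is why the extra Requires is rejected.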

I'd like to note that you can do what Jon suggested (this answer builds on his) but also have your contract without violating LSP.
You can do so by replacing the override keyword with new.
The base remains the base; all you did is introduce another piece of functionality (as the keywords literally suggest).
It's not ideal for static checking because the safety can easily be cast away (cast to the base class first, then call the method), but that is unavoidable because otherwise it would violate LSP, and you obviously do not want that. Better than nothing though, I'd say.
In an ideal world you could also override the method and call the new one, but C# won't let you do so because the methods would have the same signature (even though it would make perfect sense; that's the trade-off).
