Give reads priority over writes in Elasticsearch

I have an EC2 server running Elasticsearch 0.9 behind an nginx server for read/write access. My index has about 750k small-to-medium documents, and I have a fairly continuous stream of minimal writes (mainly updates) to the content. The speed and consistency I get with search are fine with me, but I have some sporadic timeout issues with multi-get (/_mget).
On some pages in my app, our server will request a multi-get of a dozen to a few thousand documents (this usually takes less than 1-2 seconds). The requests that fail do so with a 30,000 millisecond timeout from the nginx server. I'm assuming this happens because the index was temporarily locked for writing/optimizing purposes. Does anyone have any ideas on what I can do here?
A temporary solution would be to lower the timeout and return a user-friendly message saying the documents couldn't be retrieved (though they would still have to wait ~10 seconds to see the error message).
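For the timeout workaround, a minimal nginx sketch (the location block and the 10s value are illustrative, not my actual config):

location /_mget {
    proxy_pass http://127.0.0.1:9200;
    proxy_read_timeout 10s;  # fail fast instead of hanging for the full 30s
}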
Another of my thoughts was to give reads priority over writes: any time someone is trying to read a part of the index, don't allow any writes/locks to that section. I don't think this would be scalable, and it may not even be possible.
Finally, I was thinking I could have a read-only alias and a write-only alias. I can figure out how to set these up from the documentation, but I'm not sure whether it will actually work as I expect (and I'm not sure how to reliably test it in a local environment). If I set up aliases like this, would the read-only alias still have moments where the index was locked because of data being written through the write-only alias?
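For reference, here is how I would create the pair of aliases (the index and alias names are made up; this uses the standard _aliases endpoint):

POST /_aliases HTTP/1.1
Host: 127.0.0.1:9200
{
    "actions" : [
        { "add" : { "index" : "myindex", "alias" : "myindex_read" } },
        { "add" : { "index" : "myindex", "alias" : "myindex_write" } }
    ]
}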
I'm sure someone else has come across this before; what is the typical solution to make sure a user can always read from the index, with reads given higher priority than writes? I would consider increasing our server power if required. Currently we have two m2.xlarge EC2 instances: one primary and one replica, each with 4 shards.
An example dump of cURL info from a failed request (the error was "Operation timed out after 30000 milliseconds with 0 bytes received"):
{
    "url": "127.0.0.1:9200\/_mget",
    "content_type": null,
    "http_code": 100,
    "header_size": 25,
    "request_size": 221,
    "filetime": -1,
    "ssl_verify_result": 0,
    "redirect_count": 0,
    "total_time": 30.391506,
    "namelookup_time": 7.5e-5,
    "connect_time": 0.0593,
    "pretransfer_time": 0.059303,
    "size_upload": 167002,
    "size_download": 0,
    "speed_download": 0,
    "speed_upload": 5495,
    "download_content_length": -1,
    "upload_content_length": 167002,
    "starttransfer_time": 0.119166,
    "redirect_time": 0,
    "certinfo": [],
    "primary_ip": "127.0.0.1",
    "redirect_url": ""
}

After more monitoring with the Paramedic plugin, I noticed that I would get timeouts when my CPU hit ~80-98% (with no obvious spikes in indexing/search traffic). I finally stumbled across a helpful thread on the Elasticsearch forum: it seems this happens when the index is doing a refresh and large merges are occurring.
Merges can be throttled at the cluster or index level, and I've lowered indices.store.throttle.max_bytes_per_sec from the default 20mb to 5mb. This can be done at runtime with the cluster update settings API.
PUT /_cluster/settings HTTP/1.1
Host: 127.0.0.1:9200
{
    "persistent" : {
        "indices.store.throttle.max_bytes_per_sec" : "5mb"
    }
}
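To confirm the change took effect, the same endpoint can be read back (assuming your version supports GET on it; it should echo the persistent block above):

GET /_cluster/settings HTTP/1.1
Host: 127.0.0.1:9200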
So far Paramedic is showing a decrease in CPU usage, from an average of ~5-25% down to ~1-5%. Hopefully this helps me avoid the 90%+ spikes that were locking up my queries before; I'll report back by accepting this answer if I don't have any more problems.
As a side note, I guess I could have opted for more balanced EC2 instances (rather than memory-optimized ones). I'm happy with my current choice for now, but my next purchase will take CPU into account as well.


"shmop_open(): unable to attach or create shared memory segment 'No error':"?

I get this every time I try to create an account to ask this on Stack Overflow:
Oops! Something Bad Happened!
We apologize for any inconvenience, but an unexpected error occurred while you were browsing our site.
It’s not you, it’s us. This is our fault.
That's the reason I'm posting it here. I literally cannot ask it on Stack Overflow, even after spending hours of my day (on and off) repeating my attempts and solving a million reCAPTCHA puzzles. Can you maybe fix this error soon?
With no meaningful/complete examples, and basically no documentation whatsoever, I've been trying to use the "shmop" part of PHP for many years. Now I must find a way to send data between two different CLI PHP scripts running on the same machine, without abusing the database for this. It must work without database support, which means I'm trying to use shmop, but it doesn't work at all:
// I have no idea what the "key" is supposed to be. The manual says:
// "System's id for the shared memory block. Can be passed as a decimal or hex."
// So I've given it a 1 and also tried 123. It gave an error when I set the
// size to 64, so I increased it to 99999; that's when the error changed to
// the one I now face above.
$shmopid = shmop_open(1, 'w', 0644, 99999);

shmop_write($shmopid, 'meow 123', 0); // Write "meow 123" to the shared block.

while (1)
{
    // Read back the "meow 123", even though it's the same script right now
    // (since this is an example and minimal test).
    $shared_string = shmop_read($shmopid, 0, 8);
    var_dump($shared_string);
    sleep(1);
}
I get the error for the first line:
shmop_open(): unable to attach or create shared memory segment 'No error':
What does that mean? What am I doing wrong? Why is the manual so insanely cryptic for this? Why isn't this just a built-in "superarray" that can be accessed across the scripts?
About CLI:
It cannot work in standalone CLI processes, as an answer here says:
https://stackoverflow.com/a/34533749
The master process is the one that holds the shared memory block, so you will have to use php-fpm or mod_php or some other web/service-running version, and maybe even start/request/stop it all from a CLI PHP script.
About shmop usage itself:
Use "c" mode in shmop_open() for creating the segment before it can be used with "a" or "w".
I stumbled on this error in a different scenario, where shared memory is completely optional and just speeds up some repeated operations. So I wanted to try reading first, without knowing the memory size, and then allocate from the actual data when needed. In my case I had to call it as @shmop_open() to suppress this error output.
About shmop on Windows:
PHP 7 crashed the Apache worker process (causing a restart with status 3221225477) when trying to reallocate a segment with the same predefined (arbitrary-number) key but a different size, even after shmop_delete(). As a workaround, I took the filemtime() of the source file which contains the data to be stored in memory, and used that timestamp as the key in shmop_open(). It still wasn't flawless IIRC, and I don't know if it would cause memory leaks, but it was enough to test my code, which would mainly run on Linux anyway.
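A sketch of that workaround (the file path is hypothetical; the key is just the file's mtime):

<?php
$source = '/path/to/data.dat';  // hypothetical file holding the data to share
$key    = filemtime($source);   // changes whenever the source file changes
$size   = filesize($source);

$id = @shmop_open($key, 'c', 0644, $size); // "c" creates it on first use
if ($id !== false) {
    shmop_write($id, file_get_contents($source), 0);
}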
Finally, as of PHP version 8.0.7, shmop seems to work fine with Apache 2.4.46 and mod_php in Windows 10.

Native Transport Requests All time blocked

Trying to understand more about Native-Transport-Requests!
As I understand it, these are CQL requests, and when the limit is exceeded the overflow shows up as "All time blocked" NTRs.
My question is: how do I monitor these requests in real time and get some kind of report on them?
I see settings like max_queued_native_transport_requests and native_transport_max_threads. How do these settings affect the "All time blocked" count?
Have a look at the Cassandra JIRA ticket CASSANDRA-11363.
Also check this discussion for more info.
The recommendation is to start with the default values and tune from there. The defaults are:
max_queued_native_transport_requests: 1024
native_transport_max_threads: 128
Monitor your nodes, and if you see an increasing number of blocked Native-Transport-Requests, increase max_queued_native_transport_requests.
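One way to watch this in (near) real time is nodetool tpstats, which reports Active, Pending, Completed, Blocked and "All time blocked" per thread pool. For example:

# Print the header plus the Native-Transport-Requests line every 10 seconds
while true; do
    nodetool tpstats | grep -E 'Pool Name|Native-Transport-Requests'
    sleep 10
done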
Also, I think it's worth checking these discussions: 1, 2

Hazelcast High Response Times

We have a Java 1.6 application that uses Hazelcast 3.7.4, with a topology of two nodes. The application operates mainly with 3 maps.
In normal operation, response times when querying the maps are generally on the order of tens of milliseconds.
I have observed that in some circumstances, such as network cuts, the response time increases to huge values, for example 20 or 30 seconds! And this is impacting application performance.
I would like to know whether this kind of network micro-cut can increase query response times in this manner, whether any particular configuration can minimize it, and which other factors can provoke such high times.
Here are some examples of the queries we execute:
Example 1:
String sqlPredicate = "acui='"+acui+"'";
Collection<Agent> agents =
(Collection<Agent>) data.getMapAgents().values(new SqlPredicate(sqlPredicate));
Example 2:
boolean exist = data.getMapAgents().containsKey(agent);
Thanks so much for your help.
Best Regards,
Jorge
The Map operations are all TCP-socket based and are thus subject to your operating system's TCP stack.
See TCP_NODELAY.
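As a sketch of where those knobs live (assuming a Hazelcast 3.x Java client; the property name is from the Hazelcast docs, and the 10s value is illustrative, not a recommendation):

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;

public class TunedClient {
    public static void main(String[] args) {
        ClientConfig config = new ClientConfig();

        // Disable Nagle's algorithm (TCP_NODELAY) so small request/response
        // packets are sent immediately instead of being coalesced.
        config.getNetworkConfig().getSocketOptions().setTcpNoDelay(true);

        // Let invocations fail after 10s instead of stalling for the
        // (much longer) default when a connection silently dies.
        config.setProperty("hazelcast.client.invocation.timeout.seconds", "10");

        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
        System.out.println(client.getMap("agents").size()); // sample call
        client.shutdown();
    }
}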

How to determine Redis memory leak?

Since yesterday, our Redis servers have gradually (200MB/hour) been using more memory, while the number of keys (330K) and their data (132MB according to redis-rdb-tools) stay about the same.
Output of redis-cli info shows 6.89G used memory?!
redis_version:2.4.10
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.6
process_id:3437
uptime_in_seconds:296453
uptime_in_days:3
lru_clock:1905188
used_cpu_sys:8605.03
used_cpu_user:1480.46
used_cpu_sys_children:1035.93
used_cpu_user_children:3504.93
connected_clients:404
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:7400076728
used_memory_human:6.89G
used_memory_rss:7186984960
used_memory_peak:7427443856
used_memory_peak_human:6.92G
mem_fragmentation_ratio:0.97
mem_allocator:jemalloc-2.2.5
loading:0
aof_enabled:0
changes_since_last_save:1672
bgsave_in_progress:0
last_save_time:1403172198
bgrewriteaof_in_progress:0
total_connections_received:3616
total_commands_processed:127741023
expired_keys:0
evicted_keys:0
keyspace_hits:18817574
keyspace_misses:8285349
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:1619791
vm_enabled:0
role:slave
master_host:***BLOCKED***
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
db0:keys=372995,expires=372995
db6:keys=68399,expires=68399
The problem started when we updated our (.NET) client code from BookSleeve 1.1.0.4 to ServiceStack v3.9.71 to prepare for an upgrade to Redis 2.8, but a lot of other stuff was updated too. And our session state store (also Redis, but with the harbour client) does not show the same symptoms.
Where is all that Redis memory going? How can I troubleshoot its usage?
Edit: I just restarted this instance; memory returned to 350M and is now climbing again. The top 10 largest objects are still the same size, ranging from 100K up to 25M for the largest. The number of keys has dropped to 270K (from 330K earlier).
Here are some sources of "hidden" memory consumption in Redis:
Marc already mentioned the buffers maintained by the master to feed the slave. If a slave is lagging behind its master (because it runs on a slower box for instance), then some memory will be consumed on the master.
When long-running commands are detected, Redis logs them in the SLOWLOG area, which takes some memory. You may want to use the SLOWLOG LEN command to check the number of records you have there.
Communication buffers can also take memory. As far as I remember, with old versions of Redis (and 2.4 is quite old; you should really upgrade), they were unbounded, meaning that if you transfer a big object at some point, the communication buffer associated with that client connection grows and never shrinks. If many clients occasionally deal with large objects, this is a possible explanation; so are commands that retrieve very large amounts of data in one shot. For instance, a simple KEYS * command applied to a Redis server storing millions of keys will consume a significant amount of memory.
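A quick way to check both of the above from the command line (SLOWLOG and CLIENT LIST are available in 2.4; the omem field in CLIENT LIST shows per-connection output-buffer memory):

redis-cli SLOWLOG LEN
redis-cli CLIENT LIST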
You mentioned that you have objects as big as 25 MB. You have 404 client connections; if each of them needs to access such objects at some point in time, that adds up to about 10 GB of memory (404 x 25 MB).

oracle query - ORA-01652: unable to extend temp segment but only in some versions of sql*plus

This one has me rather confused. I've written a query which runs fine from my development client but fails on the production client with the error "ORA-01652: unable to extend temp segment by ...". In both cases the database and user are the same. On my development machine (MS Windows) I've got SQL*Plus (Release 9.0.1.4.0) and Toad 9.0 (both using version 9.0.4.0.1 of oci.dll). Both run the code without errors.
However, when I run the same file against the same database, using the same username/password, from a different machine, this time with client version 10.2.0.4.0 (from the 10.2.0.4-1 Oracle instant client), I get the error.
It occurs reproducibly.
Unfortunately I've only got limited access to the dictionary views on the database which is set up as read-only (can't even get an explain plan!).
I've tried working around the problem by tuning the query (I suspect that there is a large interim result set which is subsequently trimmed down) but have not managed to change the behaviour at either client.
It may be possible to deploy a different version of the client on the machine causing the problems - but currently that looks like downgrading to a previous version.
Any ideas?
TIA
Update
Based on Gary's answer below, I had a look at the glogin.sql scripts. The only difference was that 'SET SQLPLUSCOMPATIBILITY 8.1.7' was present on the working client but absent on the failing client; adding it did not resolve the problem.
I also tried
alter session set workarea_size_policy=manual;
alter session set hash_area_size=1048576000;
and
alter session set sort_area_size=1048576000;
to no avail :(
Update 2
I managed to reproduce the same behaviour, this time talking to an Oracle 8i backend. In this case the database was read-write. That allowed me to confirm that the different clients were, as I suspected, generating different plans. But why????
Looking at the output of 'SHOW PARAMETERS' both clients reported exactly the same settings!
Years ago I worked on a DR database that was fully READONLY, and even the TEMP tablespace wasn't writable. Any query that tried to spill to temp would fail (even if the temp space to be used was pretty trivial).
If this is the same situation, I wouldn't be surprised if there is a login.sql (or glogin.sql, or a logon trigger) that does an ALTER SESSION to set a larger PGA memory value for the session, and/or changes the optimizer goal to FIRST_ROWS.
If you can, compare the results of the following from both clients:
select * from v$parameter
where ismodified != 'FALSE';
Also, from each client, generate an execution plan for the problem SQL and see if they come up different:
EXPLAIN PLAN FOR SELECT ...;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Not really an answer - but a bit more information....
Our local DBAs were able to confirm that the 16Gb (!) TEMP tablespace was indeed being used and had filled up, but only from the Linux clients (I was able to recreate the error by making an oci8 call from PHP). In the case of the sqlplus client, I was actually using exactly the same file to run the query on both clients (copied via scp without text conversion, so line endings were CRLF, i.e. byte for byte the same as was run on the Windows client).
So the only rational conclusion was that the two client stacks were resulting in different execution plans!
Running the query from both clients approximately simultaneously on a DBMS with very little load gave the same result, meaning that the two clients also generated different sqlids for the query.
(and also Oracle was ignoring my hints - I hate when it does that).
There is no way Oracle should be doing this: even if it were doing some internal munging of the query before presenting it to the DBMS (which would give rise to the different sqlids), the client stack should be totally transparent regarding the choice of execution plan; that should only ever change based on the content of the query and the state of the DBMS.
The problem was complicated by not being able to see any explain plans, but for the query to use up so much temporary tablespace, it had to be doing a very ugly join (at least partially Cartesian) before filtering the result set. Adding hints to override this had no effect. So I resolved the problem by splitting the query into 2 cursors and doing a nested lookup using PL/SQL. A very ugly solution, but it solved my immediate problem; fortunately I just need to generate a text file.
For the benefit of anyone finding themselves in a similar pickle:
DECLARE
    CURSOR query_outer IS
        SELECT some_primary_key, some_other_stuff
        FROM atable
        WHERE ....;

    -- the cursor parameter needs a declared type; anchor it to the driving column
    CURSOR query_details (p_some_pk atable.some_primary_key%TYPE) IS
        SELECT COUNT(*) cnt, SUM(avalue) total
        FROM btable
        WHERE fk = p_some_pk
        AND ....;
BEGIN
    FOR m IN query_outer
    LOOP
        FOR n IN query_details(m.some_primary_key)
        LOOP
            dbms_output.put_line(....);
        END LOOP;
    END LOOP;
END;
The more I use Oracle, the more I hate it!
A bit more information - I've run into the same problem connecting to a different database - this time an Oracle 8i. And I can get EXPLAIN plans.
Although in this case the query completed on both clients, it took 50% longer running from Linux sql*plus 10.2.0.4.0 vs WinXP sql*plus 8.0.6.0
As I suspected, the plans are different, but both clients are using their default settings and the same optimizer mode. The optimizer seems to think the plan generated from the Linux client has a lower cost than the one from the XP client (although it takes longer to run). How does the optimizer prune the search path for potential plans?
