SharePoint Search Component has multiple process and use lots memory? - sharepoint

I'm using sharepoint2013 + windows2012. I noticed that the SP search component has 5 processes in taskmgr. each uses about 400-500 MB memory. Is this normal? I also tried
Set-SPEnterpriseSearchService -PerformanceLevel Reduced
But it did not change anything. Should I restart the server?
I never nooticed this on other SP server I worked before. Just curious, is it because of SP 2013, some default settings?
thanks

user3211586 ‘s link worked for me. Basically this article says:
Quick and Dirty
Kill the noderunner.exe (Microsoft Sharepoint Search Component) process via TaskManager
This will obviously break everything related to Search on the site
Production
Change the Search Service Performance Level with powerhsell
Get-SPEnterpriseSearchService | Set-SPEnterpriseSearchService –PerformanceLevel “PartlyReduced”
Performance Level Explained:
Reduced: Total number of threads = number of processors, Max Threads/host = number of processors
PartlyReduced: Total number of threads = 4 times the number of processors , Max Threads/host = 16 times the number of processors
Maximum: Total number of threads = 4 times the number of processors , Max Threads/host = 16 times the number of processors (threads are created at HIGH priority)
For the setting to take effect do an IISReset or restart the Search Service in Central Admin
I had the same issue as the OP and running Set-SPEnterpriseSearchService –PerformanceLevel “PartlyReduced” followed by IISRESET /noforce resolved the issue for me.

Please check below given article:
http://social.technet.microsoft.com/wiki/contents/articles/20413.sharepoint-2013-performance-issues-with-search-service-application.aspx

When I tried this method, and when I changed the config setting from 0 to any value between 1 and 500, it did reduce the memory usage but the search stopped working. After I reverted back the config settings to 0, the memory usage increased but search started working again.

Related

Not possible to increase number of threads in Julia

I changed the number of threads to 4 following the documentation. I know that the maximum number of threads is limited by Sys.CPU_THREADS which is 8 in my case. But I can't change it to 8. Why is this?
What I am doing:
set JULIA_NUM_THREADS=8 in cmd
There shows no error, but in Julia I still have Threads.nthreads() = 4
If you use Atom you can set the number of threads in the settings of the julia-client package (see screenshot). The default is set to number of cores which would probably be the four you are experiencing.

Native Transport Requests All time blocked

Trying to understand more about Native-Transport-Requests!
As we know these are cql requests and if limit exceeds the result will be all time blocked NTR.
My question is how do i monitor these requests in real time and get some kind of report on it.
I see some settings like max_queued_native_transport_requests and native_transport_max_threads. How these settings will have effect over all time blocked.
Have a look at JIRA-11363.
Also check this discussion for more info.
The recommendation is to start with the default values and tune from there. The default values are:
max_queued_native_transport_requests=1024
native_transport_max_threads: 128
Monitor you nodes and if you see an increasing number of blocked Native-Transport-Requests, then you need to increase max_queued_native_transport_requests.
Also, I think it's worth checking these discussions: 1, 2

PERF STAT does not count memory-loads but counts memory-stores

Linux Kernel : 4.10.0-20-generic (also tried this on 4.11.3)
Ubuntu : 17.04
I have been trying to collect stats of memory-accesses using perf stat. I am able to collect stats for memory-stores but the count for memory-loads return me a 0 value.
The below is the details for memory-stores :-
perf stat -e cpu/mem-stores/u ./libquantum_base.arnab 100
N = 100, 37 qubits required
Random seed: 33
Measured 3277 (0.200012), fractional approximation is 1/5.
Odd denominator, trying to expand by 2.
Possible period is 10.
100 = 4 * 25
Performance counter stats for './libquantum_base.arnab 100':
158,115,510 cpu/mem-stores/u
0.559922797 seconds time elapsed
For memory-loads, I get a 0 count as can be seen below :-
perf stat -e cpu/mem-loads/u ./libquantum_base.arnab 100
N = 100, 37 qubits required
Random seed: 33
Measured 3277 (0.200012), fractional approximation is 1/5.
Odd denominator, trying to expand by 2.
Possible period is 10.
100 = 4 * 25
Performance counter stats for './libquantum_base.arnab 100':
0 cpu/mem-loads/u
0.563806170 seconds time elapsed
I cannot understand why this does not count properly. Should I use a different event in any way to get proper data ?
The mem-loads event is mapped to the MEM_TRANS_RETIRED.LOAD_LATENCY_GT_3 performance monitoring unit event on Intel processors. The events MEM_TRANS_RETIRED.LOAD_LATENCY_* are special and can only be counted by using the p modifier. That is, you have to specify mem-loads:p to perf to use the event correctly.
MEM_TRANS_RETIRED.LOAD_LATENCY_* is a precise event and it only makes sense to be counted at the precise level. According to this Intel article (emphasis mine):
When a user elects to sample one of these events, special hardware is
used that can keep track of a data load from issue to completion.
This is more complicated than simply counting instances of an event
(as with normal event-based sampling), and so only some loads are
tracked. Loads are randomly chosen, the latency determined for each,
and the correct event(s) incremented (latency >4, >8, >16, etc). Due
to the nature of the sampling for this event, only a small percentage
of an application's data loads can be tracked at any one time.
As you can see, MEM_TRANS_RETIRED.LOAD_LATENCY_* by no means count the total number of loads and it is not designed for that purpose at all.
If you want to to determine which instructions in your code are issuing load requests that take more than a specific number of cycles to complete, then MEM_TRANS_RETIRED.LOAD_LATENCY_* is the right performance event to use. In fact, that is exactly the purpose of perf-mem and it achieves its purpose by using this event.
If you want to count the total number of load uops retired, then you should use L1-dcache-loads, which is mapped to the MEM_UOPS_RETIRED.ALL_LOADS performance event on Intel processors.
On the other hand, mem-stores and L1-dcache-stores are mapped to the exact same performance event on all current Intel processors, namely, MEM_UOPS_RETIRED.ALL_STORES, which does count all retired store uops.
So in summary, if you are using perf-stat, you should (almost) always use L1-dcache-loads and L1-dcache-stores to count retired loads and stores, respectively. These are mapped to the raw events you have used in the answer you posted, only more portable because they also work on AMD processors.
I have used a Broadwell(CPU e5-2620) server machine to collect all of the below events.
To collect memory-load events, I had to use a numeric event value. I basically ran the below command -
./perf record -e "r81d0:u" -c 1 -d -m 128 ../../.././libquantum_base 20
Here r81d0 represents the raw event for counting "memory loads amongst all instructions retired". "u" as can be understood represents user-space.
The below command, on the other hand,
./perf record -e "r82d0:u" -c 1 -d -m 128 ../../.././libquantum_base 20
has "r82d0:u" as a raw event representing "memory stores amongst all instructions retired in userspace".

Give reads priority over writes in Elasticsearch

I have an EC2 server running Elasticsearch 0.9 with a nginx server for read/write access. My index has about 750k small-medium documents. I have a pretty continuous stream of minimal writes (mainly updates) to the content. The speeds/consistency I receive with search is fine with me, but I have some sporadic timeout issues with multi-get (/_mget).
On some pages in my app, our server will request a multi-get of a dozen to a few thousand documents (this usually takes less than 1-2 seconds). The requests that fail, fail with a 30,000 millisecond timeout from the nginx server. I am assuming this happens because the index was temporarily locked for writing/optimizing purposes. Does anyone have any ideas on what I can do here?
A temporary solution would be to lower the timeout and return a user friendly message saying documents couldn't be retrieved (however they still would have to wait ~10 seconds to see an error message).
Some of my other thoughts were to give read priority over writes. Anytime someone is trying to read a part of the index, don't allow any writes/locks to that section. I don't think this would be scalable and it may not even be possible?
Finally, I was thinking I could have a read-only alias and a write-only alias. I can figure out how to set this up through the documentation, but I am not sure if it will actually work like I expect it to (and I'm not sure how I can reliably test it in a local environment). If I set up aliases like this, would the read-only alias still have moments where the index was locked due to information being written through the write-only alias?
I'm sure someone else has come across this before, what is the typical solution to make sure a user can always read data from the index with a higher priority over writes. I would consider increasing our server power, if required. Currently we have 2 m2x-large EC2 instances. One is the primary and the replica, each with 4 shards.
An example dump of cURL info from a failed request (with an error of Operation timed out after 30000 milliseconds with 0 bytes received):
{
"url":"127.0.0.1:9200\/_mget",
"content_type":null,
"http_code":100,
"header_size":25,
"request_size":221,
"filetime":-1,
"ssl_verify_result":0,
"redirect_count":0,
"total_time":30.391506,
"namelookup_time":7.5e-5,
"connect_time":0.0593,
"pretransfer_time":0.059303,
"size_upload":167002,
"size_download":0,
"speed_download":0,
"speed_upload":5495,
"download_content_length":-1,
"upload_content_length":167002,
"starttransfer_time":0.119166,
"redirect_time":0,
"certinfo":[
],
"primary_ip":"127.0.0.1",
"redirect_url":""
}
After more monitoring using the Paramedic plugin, I noticed that I would get timeouts when my CPU would hit ~80-98% (no obvious spikes in indexing/searching traffic). I finally stumbled across a helpful thread on the Elasticsearch forum. It seems this happens when the index is doing a refresh and large merges are occurring.
Merges can be throttled at a cluster or index level and I've updated them from the indicies.store.throttle.max_bytes_per_sec from the default 20mb to 5mb. This can be done during runtime with the cluster update settings API.
PUT /_cluster/settings HTTP/1.1
Host: 127.0.0.1:9200
{
"persistent" : {
"indices.store.throttle.max_bytes_per_sec" : "5mb"
}
}
So far Parmedic is showing a decrease in CPU usage. From an average of ~5-25% down to an average of ~1-5%. Hopefully this can help me avoid the 90%+ spikes I was having lock up my queries before, I'll report back by selecting this answer if I don't have any more problems.
As a side note, I guess I could have opted for more balanced EC2 instances (rather than memory-optimized). I think I'm happy with my current choice, but my next purchase will also take more CPU into account.

Ideal timeout period for dns lookup

In my rails app i do a nslookup using a ruby library resolv. If the site like dgdfgdfgdfg.com is entered its talking too long to resolve. in some instance like 20 sec.(mostly for non-existent sites) Because it cause the application to slowdown.
So i though of introducing a timeout period for the dns lookup.
What will be the ideal timeout period for the dns lookup so that resolution of actual site doesnt fail. will something like 10 sec will be fine?
There's no IETF mandated value, although §6.1.3.3 of RFC 1123 suggests a value not less than 5 seconds.
Perl's Net::DNS and the command line dig utility do default to 5 seconds between retries. Some versions of the Microsoft resolver appear to default to 3 seconds.
You can run some tests among the users to find out the right number compromising responsiveness / performance.
Also you can adjust that timeout dinamically depending on the network traffic.
For example, for every sucessful resolv, you save how much time it took you to resolv it. And every hour (for example) you can calculate an average and set double of its value as timeout (Remember that "average" is, roughly speaking, "the middle"). This way if your latency is high at some point, it autoadjust itself to increase the timeout period.

Resources