First, some specs about my machine:
CPU: Ryzen 5 1600
RAM: Corsair Vengeance LPX 16 GB (2x8GB) DDR4 3200 MHz C16 (plus an 8 GB Corsair stick at 3000 MHz, I think)
SSD: Corsair Force MP500 (Windows on it), SanDisk SSD PLUS 480 GB (only games etc.)
HDD: Western Digital 1 TB
Graphics Card: Nvidia GeForce GTX 1070 Ti
Motherboard: GA-AB350-Gaming 3
Power supply: Thermaltake Hamburg 530W
I sometimes play Garry's Mod, where the game now crashes on a specific add-on; this was never a problem before. If I code a WPF program in Visual Studio 2022, the player movement lags hard and isn't processed correctly. And if I have a stream (Netflix, Amazon, something like that) open in fullscreen on my second screen AND I tab out of my fullscreen game, the whole PC crashes, always ending with the same weird screen: https://imgur.com/a/9WU8251 (those are 2 crashes, same screen).
I think all of these problems are caused by the same component, and I suspect it's the CPU. What are your thoughts?
What is typical PCI-E slot utilisation during gaming, desktop work, 4K video watching, video converting, or AI training? E.g. on my Dell Precision 7520 laptop, the GPU on a PCI-E Gen3 x16 link peaks at about 10%. Since Gen4 doubles the per-lane bandwidth, does that mean it would be 5% on PCI-E Gen4?
I'm thinking of using e.g. an RTX 3060 or an Arc A770 for AI computing, connected to the laptop through a miniPCIe-to-PCIe-x16 riser.
As far as I know from benchmarks, gaming would be up to 20% slower in FPS over such a link, but IMHO AI workloads don't load the slot that much.
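Rather than guessing, you could sample the actual bus traffic. Here is a minimal sketch using NVML's Python bindings (the nvidia-ml-py package); the per-lane bandwidth figures are approximations, and device index 0 assumes a single-GPU machine:

# sample PCIe throughput via NVML; counters are ~20 ms snapshots in KB/s
from pynvml import (
    nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetPcieThroughput, nvmlDeviceGetCurrPcieLinkGeneration,
    nvmlDeviceGetCurrPcieLinkWidth,
    NVML_PCIE_UTIL_TX_BYTES, NVML_PCIE_UTIL_RX_BYTES,
)

nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)          # device 0: assumes one GPU
gen = nvmlDeviceGetCurrPcieLinkGeneration(handle)
width = nvmlDeviceGetCurrPcieLinkWidth(handle)

tx_kb = nvmlDeviceGetPcieThroughput(handle, NVML_PCIE_UTIL_TX_BYTES)
rx_kb = nvmlDeviceGetPcieThroughput(handle, NVML_PCIE_UTIL_RX_BYTES)

# approximate usable bandwidth per lane in GB/s (Gen3 ~0.985, Gen4 ~1.969)
per_lane = {3: 0.985, 4: 1.969}.get(gen, 0.985)
used = (tx_kb + rx_kb) / 1e6                    # KB/s -> GB/s
total = per_lane * width
print(f"Gen{gen} x{width}: {used:.2f} GB/s of ~{total:.1f} GB/s "
      f"({100 * used / total:.1f}% of the link)")
nvmlShutdown()

Run it in a loop while gaming or training to see how close the workload gets to the link's ceiling. Note that a miniPCIe riser is typically a Gen2 x1 link (roughly 0.5 GB/s), which is a far harsher cap than the x16-vs-x8 scaling the 20% FPS benchmarks measure.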
I am looking to upgrade an older machine we have at our lab to use for deep learning (PyTorch) in addition to my personal workstation. It's an older Dell workstation, but the relevant specs are as follows:
PSU: 950W
RAM: 64 GB DDR4 ECC
CPU: Xeon Bronze 3104 @ 1.7 GHz
Through the university we can acquire an RTX A4000 (I know, not the best price-to-performance), which is basically a 3070 Ti with more VRAM. The machine even has an older NVIDIA GPU I can use for display output when the A4000 is fully loaded, like I currently do on my personal setup. I am concerned that the CPU's low clock speed may cause a bottleneck. Does anyone have experience with a similar configuration?
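One way to sanity-check the bottleneck concern before committing: time pure GPU compute against the same work fed through a CPU-side DataLoader. A rough sketch, assuming PyTorch with CUDA; the model and tensor shapes here are placeholders, not your actual workload:

import time
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def bench():
    device = "cuda"
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000)).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    def step(x, y):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    # 1) pure GPU compute: data already resident on the GPU
    x = torch.randn(64, 3, 224, 224, device=device)
    y = torch.randint(0, 1000, (64,), device=device)
    torch.cuda.synchronize(); t0 = time.time()
    for _ in range(50):
        step(x, y)
    torch.cuda.synchronize()
    print(f"GPU-only: {time.time() - t0:.2f} s")

    # 2) the same 50 batches fed through CPU worker processes
    ds = TensorDataset(torch.randn(64 * 50, 3, 224, 224),
                       torch.randint(0, 1000, (64 * 50,)))
    dl = DataLoader(ds, batch_size=64, num_workers=4, pin_memory=True)
    torch.cuda.synchronize(); t0 = time.time()
    for xb, yb in dl:
        step(xb.to(device, non_blocking=True), yb.to(device, non_blocking=True))
    torch.cuda.synchronize()
    print(f"via DataLoader: {time.time() - t0:.2f} s")

if __name__ == "__main__":   # guard needed for DataLoader worker processes
    bench()

If the DataLoader run comes out much slower, the 1.7 GHz cores (or storage) are starving the GPU and raising num_workers may help; if the two times are close, the Bronze 3104 should be fine for this card.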
Thank you for the help!
I have been facing an intriguing problem lately.
I am working on a project with a pretty heavy front end in AngularJS and around a hundred Jest tests. I have 16 GB of RAM, but the project is so heavy that it sometimes fills up the RAM completely, and the computer often cannot handle running the project plus a yarn test run (which takes up 3 to 4 GB of RAM) or a Cypress workflow test at the same time without big latency problems.
To avoid big freezes (up to several minutes) and crashes, I increased the swap to 16 GB.
That being said, for various reasons I had to work on the project on Windows 10, and I faced none of these problems.
Everything runs smoothly and the graphical interface doesn't lag, even with screen sharing, even though the RAM is also completely filled up and the CPU is at 100%.
I am even able to run 20 yarn test processes at the same time without much lag, which seems completely impossible on Linux, even with the increased swap.
I've read that Windows uses RAM compression by default and Linux doesn't, but I only saw up to 549 MB of compressed RAM during my comparisons.
I first thought it could be a problem with GNOME, which is known to be heavy and sometimes buggy, but I tested with KDE as well and got the same results.
I have also heard that Windows allocates special resources to the graphical environment, whereas Linux may treat it like any other process, but that alone cannot explain all the problems, because the whole computer freezes on Linux and not on Windows.
So I'm starting to wonder if there is something about memory or process management that Windows does significantly better than Linux.
My config:
Computer model: Dell XPS 15 7590
Processor: Intel Core i7-9750H, 2.6 GHz base, 4.5 GHz max turbo (6 cores, 12 threads)
RAM: 16 GB
Graphics card: GTX 1650M
Screen: 4K 16:9
SSD: NVMe, 512 GB
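To make the Linux-vs-Windows comparison concrete, it may be worth logging memory pressure on both systems while the tests run. A small cross-platform sketch, assuming the psutil package is installed:

# print RAM and swap pressure once per second while the test suite runs
import time
import psutil

while True:
    vm = psutil.virtual_memory()
    sw = psutil.swap_memory()
    print(f"RAM {vm.percent:5.1f}%  swap {sw.percent:5.1f}%  "
          f"available {vm.available / 1e9:4.1f} GB")
    time.sleep(1)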
I was facing the same issue on Ubuntu 22.04 with 16 GB RAM and an Intel i5-12400 processor.
My solution was to limit the number of max workers in the Jest config; each Jest worker is a separate Node process with its own heap, so capping workers also caps total memory use:

// jest.config.js
module.exports = { maxWorkers: 4 };
I have developed a macOS app that relies heavily on multithreading (a call center simulator). It runs fine on my iMac 2019 and fills up all cores nicely. In my test scenario it simulates approx. 1.4 million telephone calls in total over 100 iterations, each iteration submitted as a dispatch item on a concurrent dispatch queue.
Now I have bought a new Mac mini with the M1 Apple Silicon chip, and I was eager to see how the performance develops on that test machine. Well, it's not bad, but not as good as I expected:
System | Duration
iMac 2019, Intel 6-core i5, 3.0 GHz, macOS Catalina 10.15.7 | 19.95 s
Mac mini, M1 8-core, macOS Big Sur 11.2, under Rosetta 2 | 26.85 s
Mac mini, M1 8-core, macOS Big Sur 11.2, native ARM | 17.07 s
Investigating a little further, I noticed that at the start of the simulation all 8 cores of the M1 Mac are filled up properly, but after a few seconds only the 4 efficiency cores are still in use.
I have read the Apple docs "Optimize for Apple Silicon with performance and efficiency cores" and double-checked that the dispatch queue for the iterations is set up properly:
let simQueue = DispatchQueue.global(qos: .userInitiated)
But no success: after a few seconds of running, the performance cores are obviously no longer utilized. I even tried to set up the queue with qos set to .userInteractive, but that didn't help either. I also flagged the dispatch items with the proper qos, but that didn't change anything. It looks to me like other apps (e.g. Xcode) do utilize the performance cores, even for longer periods.
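For reference, the pattern described above boils down to something like the following sketch (runIteration is a placeholder for one simulation pass of ~14k calls; .enforceQoS is the flag that asks GCD not to lower a work item's QoS):

import Dispatch

// placeholder for one simulation iteration
func runIteration(_ i: Int) { /* simulate the calls for iteration i */ }

let simQueue = DispatchQueue.global(qos: .userInitiated)
let group = DispatchGroup()

for i in 0..<100 {
    // tag each work item explicitly and ask GCD to keep its QoS
    simQueue.async(group: group, qos: .userInitiated, flags: .enforceQoS) {
        runIteration(i)
    }
}
group.wait()   // block until all 100 iterations have finished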
Does anybody know how to force an M1 Mac to utilize the performance cores?
"M1 8 core" is really "M1 4 performance + 4 power saving cores". I expect it to have be a bit more performance than an Intel 6 core, but not much. Exactly has you see, 15% faster than six Intel cores or about as fast as 7 Intel cores would be. The current M1 chips are low end processors. "A bit better than Intel six cores" is quite good.
Your code must be running on the performance cores; otherwise there would be no chance at all of coming close to the Intel performance. In that graph, nothing tells you which cores are being used.
What most likely happens is that all cores start running, each trying to do one eighth of the work, and after about 8 seconds the performance cores have finished their share. Then the work remaining on the power-saving cores migrates to the performance cores, and you are just misinterpreting the image as only the low-performance cores doing the work.
I would guess that Apple has put a preference on using efficiency cores over performance cores for several reasons: battery life being one, and most likely thermals as well. This is the big question mark with an SoC originally designed for smartphones and tablets, since macOS is a much heavier OS than iOS or iPadOS. Apple most likely felt that performance cores were only needed where maximum throughput was required. No doubt some of us with an M1 Mac mini, myself included, would like a way to adjust this balance between efficiency and performance cores. Personally, I would prefer all cores to be capable of switching between efficiency and performance, as in Intel's Speed Shift technology. This may come along as the M1 line advances into Mac Pro and other Pro models.
I have an NVIDIA GTX 750 Ti card, which is advertised as having 640 CUDA cores. Indeed, the NVIDIA settings application also reports this.
I'm trying to use this card for OpenCL development on Linux. The OpenCL environment (through PyOpenCL, if it makes a difference) reports that the number of compute units is 5. My understanding was that one compute unit on an NVIDIA device maps to one multiprocessor, which I took to be 32 SIMD units (each of which I assumed was a CUDA core).
Clearly, 5 * 32 = 160, not 640; rather, a quarter of what is expected.
Am I missing something regarding the meaning of a compute unit on NVIDIA hardware? The card is also driving the graphics output, which will be using some of the computational capability. Is a proportion of the processing capability reserved for graphics use, and if so, can I change this?
NVIDIA has a whitepaper for the GeForce GTX 750 Ti, which is worth a read.
An OpenCL compute unit translates to a streaming multiprocessor in NVIDIA GPU terms. Each Maxwell SMM in your GPU contains 128 processing elements ("CUDA cores"), and 128 * 5 = 640. The SIMD width of the device is still 32, but each compute unit (SMM) can issue instructions to four different warps at once.
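To see the numbers from PyOpenCL directly, something like this quick sketch should do (the [0] indices assume the NVIDIA platform and device come first on your system):

import pyopencl as cl

dev = cl.get_platforms()[0].get_devices()[0]   # assumes NVIDIA is platform 0
cu = dev.max_compute_units
print(dev.name, "reports", cu, "compute units")
# Maxwell packs 128 processing elements into each compute unit (SMM),
# so the advertised core count is cu * 128 -> 5 * 128 = 640 here
print("CUDA cores:", cu * 128)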