Hello, I would like to increase the observation space of FrozenLake-v0 in OpenAI Gym.
Is there a way to do this in an OpenAI Gym environment, using spaces like Discrete, Box, MultiDiscrete, or some others?
I'm working on a fullscreen Windows desktop application that's moderately graphics-intensive; it uses OpenGL but only renders 2D content. Nothing fancy, mostly pushing pixels to the screen (up to 4K, single monitor) and uploading textures. We're using VSync to control the rendering framerate, i.e. calling SwapBuffers() at the end of rendering to block until the next VBlank.
The main requirement we have is that the app runs at a solid 60FPS as it's used with a touchscreen, and interactions need to be as fluid as possible.
Because it's pretty basic, the app runs just fine on an 8th-gen Intel i7 CPU with an integrated Intel HD Graphics 630 GPU. Neither the CPU nor the GPU is anywhere near peak usage, and we can see that we're hitting a comfortable 60FPS through our in-app FPS meter. I also have it running with similar results on my Surface Book 2 with an Intel i7 and integrated Intel UHD Graphics 620 GPU.
However, what I've recently started noticing is that the app sometimes drops to 30FPS and then stays there, either for long periods of time or sometimes even permanently. Through our FPS meter, I can tell that we're not actually spending any time rendering; it's just our SwapBuffers() call that blocks arbitrarily for 2 frames, capping us at 30FPS. The only way to get back to 60FPS is to alt-tab to another app and back to ours, or simply to bring up the Windows menu and then go back to the app.
Because the app goes back to 60FPS afterwards, I'm positive that this is intended behavior of the Intel driver, probably meant for gaming (gamers prefer a stable 30FPS to irregular/occasional dropped frames, which make a game look choppy).
In our case, however, dropping an occasional frame isn't a big deal, whereas being capped at 30FPS makes our UI and interactions far less pleasing to the eye, especially when the app could easily render at a smooth 60FPS instead.
Is there any way to switch the driver behavior to prefer pushing 60FPS with occasional drops rather than capping at 30FPS?
OK, so I was able to figure this out with a bit of tweaking and reverse engineering: yes, this is an intended but unfortunate default behavior of the Intel driver, and it can be fixed via the Intel HD Graphics Control Panel app if available, or directly in the registry otherwise (which is the only way to fix the issue on the Surface Book and other Surface devices, where the custom Intel driver no longer exposes the Intel HD Graphics Control Panel app).
Starting with the simple solution: in the Intel HD Graphics Control Panel app, go to "3D", then "Application Settings". You'll first need to create an application profile by selecting the file on disk for the process that creates the OpenGL window. Once that's done, the setting you want to adjust is "Vertical Sync". By default, "Use Application Default Settings" is selected; this is the setting that causes the capping at 30FPS. Select "Use Driver Settings" instead to disable that behavior and always target 60FPS.
This would have been pretty obvious if it weren't for Intel's horrible choice of terms and incomprehensible documentation. To me it looks like the choices for the setting are inverted: I would expect the capping to happen when I select "Use Driver Settings", which implies the driver is free to adjust buffer swapping as it sees fit. Similarly, "Use Application Default Settings" implies that the app decides when to push frames, which is precisely the opposite of what the setting does. Even the little help bubbles in the app seem to contradict what these settings do...
PS: I'll post the registry-based solution in a separate answer to keep this one short.
Here is the registry-based answer, if your driver does not expose the Intel HD control panel (such as the driver used on the Surface Book and possibly other Surface laptops), or if you want to make that fix programmatically via regedit.exe or the Win32 API:
The application profiles created by the Intel HD control panel are saved in the registry under HKCU\Software\Intel\Display\igfxcui\3D using a key with the process file name (e.g. my_game.exe) and a REG_BINARY value with a 536-byte data blob divided like this:
Byte 0-3: Anisotropic Filtering (0 = use app default, 2-10 = multiplier setting)
Byte 4-7: Vertical Sync (0 = use app default, 1 = use driver setting)
Byte 8-11: MSAA (0 = use app default, 1 = force off)
Byte 12-15: CMAA (0 = use app default, 1 = override, 2 = enhance)
Byte 16-535: Application Display Name (wide-chars, for use in the control panel's application list)
Note: all values are stored in little-endian byte order.
In addition, you need to make sure that the Global value under the same key has its first byte set to 1, which acts as a sort of global toggle (the control panel sets it to 1 when one or more entries are added to the applications list, then back to 0 when the last entry is deleted from the list).
The Global value is also a REG_BINARY value, with 8 bytes encoded like this:
Byte 0-3: Global toggle for application entries (0 = no entries, 1 = entries)
Byte 4-7: Application Optimal mode (0 = enabled, 1 = disabled)
For example:
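(This is my own hedged illustration, not part of the original answer.) Assuming the layout above is accurate, a minimal Win32 sketch that creates a profile for a hypothetical my_game.exe and forces "Vertical Sync" to the driver setting could look like this; the display name and the missing error handling are placeholders:

// Sketch: write an application profile under the Intel 3D key, forcing
// Vertical Sync to "use driver setting" (bytes 4-7 = 1), plus the Global toggle.
// Layout and sizes follow the description above; my_game.exe is a placeholder.
#include <windows.h>
#include <cstdint>
#include <cstring>

int main()
{
    HKEY key = nullptr;
    if (RegCreateKeyExW(HKEY_CURRENT_USER,
                        L"Software\\Intel\\Display\\igfxcui\\3D",
                        0, nullptr, 0, KEY_SET_VALUE, nullptr, &key, nullptr) != ERROR_SUCCESS)
        return 1;

    uint8_t blob[536] = {};                  // 536-byte application blob, little-endian fields
    uint32_t vsync = 1;                      // bytes 4-7: 1 = use driver setting
    std::memcpy(blob + 4, &vsync, sizeof(vsync));
    const wchar_t name[] = L"My Game";       // bytes 16+: display name (wide chars)
    std::memcpy(blob + 16, name, sizeof(name));
    RegSetValueExW(key, L"my_game.exe", 0, REG_BINARY, blob, sizeof(blob));

    uint8_t global[8] = {};                  // 8-byte Global blob
    uint32_t hasEntries = 1;                 // bytes 0-3: 1 = one or more entries exist
    std::memcpy(global, &hasEntries, sizeof(hasEntries));
    RegSetValueExW(key, L"Global", 0, REG_BINARY, global, sizeof(global));

    RegCloseKey(key);
    return 0;
}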
I am trying to use a whole city network for a particular analysis, and I know it is very large. I have also set it up as a sparse network.
library(maptools)
library(rgdal)
StreetsUTM <- readShapeSpatial("cityIN_UTM")
# plot(StreetsUTM)
library(spatstat)
SS_StreetsUTM <- as.psp(StreetsUTM)
SS_linnetUTM <- as.linnet(SS_StreetsUTM, sparse=TRUE)

> SS_linnetUTM
Linear network with 321631 vertices and 341610 lines
Enclosing window: rectangle = [422130.9, 456359.7] x [4610458, 4652536] units
> SS_linnetUTM$sparse
[1] TRUE
I have the following problems:
It took 15-20 minutes to build the psp object.
It took almost 5 hours to build the linnet object.
Every time I want to analyse it for a point pattern or envelope, R crashes.
I understand I should try to reduce the network size, but:
I was wondering if there is a smart way to overcome this problem. Would rescaling help?
How can I throw more processing power at it?
I am also curious to know whether spatstat can be used with the parallel package.
In the end, what are the limitations on network size for spatstat?
R crashes when I use the commands from the spatstat book:
KN <- linearK(spiders, correction="none")                 # on my network (linnet), of course
envelope(spiders, linearK, correction="none", nsim=39)    # on my network
I do not think RAM is the problem; I have 16GB of RAM and a 2.5GHz dual-core i5 processor in a machine with an SSD.
Could someone guide me, please?
Please be more specific about the commands you used.
Did you build the linnet object from a psp object using as.linnet.psp (in which case the connectivity of the network must be guessed, and this can take a long time), or did you have information about the connectivity of the network that you passed to the linnet() command?
Exactly what commands to "analyse it for a point pattern or envelope" cause a crash, and what kind of crash?
The code for linear networks in spatstat is research code which is still under development. Faster algorithms for the K-function will be released soon.
I could only resolve this by simplifying my network in QGIS with the Douglas-Peucker algorithm in the Simplify Geometries tool. So it is a slight compromise on the geometry of the linear network in the shapefile.
I'm using the top command in several distros to feed a Bash script. Currently I'm calling it with top -b -n1.
I'd prefer a unified output in KiB or KB. However, top displays large values in megabytes or gigabytes. Is there an option to avoid these large units?
Please consider the following example:
4911 root 20 0 274m 248m 146m S 0 12.4 0:07.19 example
Edit: To answer 123's question, I transform the columns and send them to a log monitoring appliance. If there's no alternative, I'll convert the units via awk beforehand as per this thread.
Consider cutting out the middleman top and reading directly from /proc/[1-9]*/statm. All those files consist of one line of numbers, of which the first three correspond to top's VIRT, RES and SHR, respectively, in units of pages (normally 4096 B), so that by multiplying by 4 you get units of KiB.
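If it helps, here is a small sketch of my own (assuming Linux and the page size reported by sysconf) that prints VIRT, RES and SHR in KiB for one PID:

// Sketch: print VIRT, RES and SHR in KiB for one PID, read from /proc/<pid>/statm.
// The first three numbers in that file are in pages; sysconf(_SC_PAGESIZE) gives the page size.
#include <fstream>
#include <iostream>
#include <string>
#include <unistd.h>

int main(int argc, char** argv)
{
    std::string pid = (argc > 1) ? argv[1] : "self";
    std::ifstream statm("/proc/" + pid + "/statm");

    long virt_pages = 0, res_pages = 0, shr_pages = 0;
    if (!(statm >> virt_pages >> res_pages >> shr_pages)) {
        std::cerr << "could not read statm for pid " << pid << "\n";
        return 1;
    }

    long kib_per_page = sysconf(_SC_PAGESIZE) / 1024;   // normally 4
    std::cout << "VIRT=" << virt_pages * kib_per_page << " KiB"
              << " RES=" << res_pages * kib_per_page << " KiB"
              << " SHR=" << shr_pages * kib_per_page << " KiB\n";
    return 0;
}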
You need a config file. You can create it yourself as $HOME/.toprc or using top interactively. The latter is easy. You just need to press W while top is running in interactive mode.
But first you need to set top interactively to the state you want. To change the memory scale, press e until you see the units you want. (Then save with W.)
Either way, you need this set in your config: Task_mscale=0 for the lowest scale.
There are 3 parts to my application:
A numerical simulator solving a 21-variable differential equation with the Runge-Kutta method (taken directly from Numerical Recipes in C); the step size is 0.0001 s.
C code pinging a PIC-based microprocessor every 1 s and receiving data at about 3600 samples per second over the USB-COM port; it sends the relevant data to the front end over TCP/IP.
A Java front end reading the data from the numerical simulator via SWIG (for the C code) and JNI, modifying the parameters with input from the microprocessor, and finally plotting it in the GUI.
I now want to recode the Java front end in C++, with the option of using HTML/JavaScript for plotting.
Would rewriting the front end in C++ so that the numerical simulator runs on a separate thread be a good approach?
I don't understand threading well, though I have used it for the listening and plotting functions in the Java code. It seems like running everything on multiple threads instead of in separate processes would slow down my simulations.
Can I combine 1, 2 and 3 into a single program, or should they remain separate to retain the 0.0001 s simulation step and the ability to handle the large amount of microprocessor data?
Please help me pick a path forward!
Thanks in Advance!
On a multicore platform, multithreading will generally improve performance. However, general-purpose operating systems (GPOS) such as Linux and Windows are not deterministic, so there are no guarantees.
That said, the computational performance of a modern PC is such that it will hardly be stretched by this task and data rate, so perhaps it hardly matters?
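To make the threading idea concrete, here is a minimal, hypothetical C++ sketch (not your actual code): the simulator runs on its own std::thread and pushes samples into a mutex-protected queue that the front-end thread drains at its own pace. The Sample type and runSimulationStep() are placeholders.

// Sketch: simulator on its own thread, front end polling a shared queue.
// Sample and runSimulationStep() are placeholders for the real RK4 integrator.
#include <atomic>
#include <chrono>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Sample { double t; double y[21]; };   // one integration step's output (placeholder)

std::mutex queueMutex;
std::queue<Sample> sampleQueue;
std::atomic<bool> running{true};

Sample runSimulationStep()                   // placeholder for the Runge-Kutta step
{
    return Sample{};
}

void simulatorThread()
{
    while (running.load()) {
        Sample s = runSimulationStep();
        std::lock_guard<std::mutex> lock(queueMutex);
        sampleQueue.push(s);
    }
}

int main()
{
    std::thread sim(simulatorThread);

    // Front-end loop: drain whatever the simulator produced, then update the GUI/plot.
    for (int frame = 0; frame < 1000; ++frame) {
        std::vector<Sample> batch;
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            while (!sampleQueue.empty()) {
                batch.push_back(sampleQueue.front());
                sampleQueue.pop();
            }
        }
        // plot(batch);  // placeholder for the actual GUI update
        std::this_thread::sleep_for(std::chrono::milliseconds(16));
    }

    running.store(false);
    sim.join();
    return 0;
}

Whether the simulator keeps up at a 0.0001 s step is a separate question; the point is only that a dedicated thread plus a queue decouples simulation speed from GUI refresh.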
I have some serial code that I have started to parallelize using Intel's TBB. My first aim was to parallelize almost all the for loops in the code (I have even parallelized a for loop nested within another for loop), and having done that I now get some speedup. I am looking for more places/ideas/options to parallelize... I know this might sound a bit vague without much reference to the problem, but I am looking for generic ideas here which I can explore in my code.
Overview of the algorithm (the following is run over all levels of the image, starting with the smallest and increasing the width and height by 2 each time until the actual height and width are reached):
For all image pairs, starting with the smallest pair:
    For height = 2 to image_height - 2:
        Create a 5 x image_width ROI of both the left and right images.
        For width = 2 to image_width - 2:
            Create a 5 x 5 window of the left ROI centered at width and find the best match in the right ROI using NCC.
            Create a 5 x 5 window of the right ROI centered at width and find the best match in the left ROI using NCC.
            Disparity = current_width - best match
    Edge pixels that did not receive a disparity get the disparity of their neighbors.
    For height = 0 to image_height:
        For width = 0 to image_width:
            Check smoothness, uniqueness and order constraints (parallelized separately).
    For height = 0 to image_height:
        For width = 0 to image_width:
            For disparities that failed the constraints, use the average disparity of the neighbors that passed the constraints.
    Normalize all disparities and output to screen.
Just for some perspective, it may not always be worthwhile to parallelize something.
Just because you have a for loop where each iteration can be done independently of the others doesn't always mean you should.
TBB has some overhead for starting those parallel_for loops, so unless you're looping a large number of times, you probably shouldn't parallelize it.
But, if each loop is extremely expensive (Like in CirrusFlyer's example) then feel free to parallelize it.
More specifically, look for places where the overhead of setting up the parallel computation is small relative to the amount of work being parallelized.
Also, be careful about doing nested parallel_for loops, as this can get expensive. You may want to just stick with parallelizing the outer for loop.
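As a rough illustration (a sketch, not your code): assuming a per-row helper like computeRowDisparity() for the NCC matching, parallelizing only the outer row loop with TBB could look like this.

// Sketch: parallelize only the outer (row) loop of the matching stage with TBB.
// computeRowDisparity() is a hypothetical stand-in for the per-row 5x5 NCC matching.
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

void computeRowDisparity(int row /*, left/right ROIs, disparity map, ... */)
{
    // serial inner loop: 5x5 NCC matching for every column of this row
}

void computeDisparity(int image_height)
{
    tbb::parallel_for(
        tbb::blocked_range<int>(2, image_height - 2),
        [&](const tbb::blocked_range<int>& rows) {
            for (int row = rows.begin(); row != rows.end(); ++row)
                computeRowDisparity(row);
        });
}

Each task then carries a whole row's worth of work, so the parallel_for overhead is paid once per chunk of rows rather than once per pixel.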
The silly answer is anything that is time-consuming or iterative. I use Microsoft's .NET v4.0 Task Parallel Library, and one of the interesting things about it is its "expressed parallelism", an interesting term to describe "attempted parallelism": your code may say "use the TPL here", but if the host platform doesn't have the necessary cores it will simply invoke the old-fashioned serial code in its place.
I have begun to use the TPL on all my projects. Especially any place there are loops (this requires that I design my classes and methods so that there are no dependencies between loop iterations). But for any place that might previously have been plain old multithreaded code, I now look to see whether it's something I can place on different cores.
My favorite so far has been an application I have that downloads ~7,800 different URLs to analyze the contents of the pages and, if it finds the information it's looking for, does some additional processing. This used to take between 26 and 29 minutes to complete. My Dell T7500 workstation with dual quad-core 3GHz Xeon processors, 24GB of RAM, and Windows 7 Ultimate 64-bit edition now crunches the entire thing in about 5 minutes. A huge difference for me.
I also have a publish/subscribe communication engine that I have been refactoring to take advantage of the TPL (especially on "push" data from the server to clients: you may have 10,000 client computers that have registered interest in specific things, and once such an event occurs, I need to push data to all of them). I don't have this done yet, but I'm really looking forward to seeing the results on this one.
Food for thought ...