I may have missed this, but where in the documentation (for the latest Spark version, 3.1.2 at the time of posting) can we find the options that can be passed to the option method:
df.write.format('parquet').option(???).mode(saveMode='overwrite').saveAsTable('tmp')
For example, I have seen that maxRecordsPerFile is probably a valid option (https://mungingdata.com/apache-spark/partitionby/). Where can we find the whole list of options and their meanings? Do we still need to go through the source (as asked in "Where is the reference for options for writing or reading per format?")?
Also, is it better to pass options through .option() or within .saveAsTable()?
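To make the question concrete, here is a hedged sketch in PySpark (it assumes a Hive-enabled session for saveAsTable; maxRecordsPerFile and the Parquet compression option are ones I believe are valid, but the full list is exactly what I am looking for):

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.range(100)

# passing options through .option() calls on the writer
(df.write
   .format('parquet')
   .option('maxRecordsPerFile', 10000)  # generic write option
   .option('compression', 'snappy')     # parquet-specific option
   .mode('overwrite')
   .saveAsTable('tmp'))

# or passing them as keyword arguments to saveAsTable
df.write.saveAsTable('tmp', format='parquet', mode='overwrite',
                     maxRecordsPerFile=10000, compression='snappy')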
With version 3.02 of NSIS came the addition of the WriteRegMultiStr function. When the function is called in my script, I get an error:
Usage: WriteRegMultiStr /REGEDIT5 rootkey subkey entry_name hex_string_like_660000000000
root_key=(HKCR[32|64]|HKLM[32|64]|HKCU[32|64]|HKU|HKCC|HKDD|HKPD|SHCTX)
The call itself looks like this:
WriteRegMultiStr /REGEDIT5 HKLM "System\CurrentControlSet\Services\SomeService" "DependsOnService" "service1 service2"
Since there is no documentation for this specific function, which was added long after WriteRegStr and WriteRegDWORD were available, I have to wonder: how does one use it?
So far, the only advice I have found for writing REG_MULTI_SZ values is to use the NSIS Registry plug-in. Yet the function exists, so how can it be used?
Addendum:
Encoding the string to hex and passing it with or without quotation marks does not yield the desired result either.
I was actually able to find an answer after digging through the depths of the internet. Since I don't think this has been answered on Stack Overflow, I will leave a response here in case anyone else wants to use this function.
The structure of the command as described in the opening post is basically correct, but the value must be encoded precisely. My command looks like this:
WriteRegMultiStr /REGEDIT5 HKLM "System\CurrentControlSet\Services\SomeService" "DependsOnService" 54,00,63,00,70,00,69,00,70,00,00,00,41,00,66,00,64
For anyone intending to test this string, this is
Tcpip
Afd
encoded in hexadecimal regedit format. To be precise, this is the Regedit Version 5.0 format, as opposed to the REGEDIT4 format. A conversion editor can be used to produce it; I used OTConvertIt.
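If you would rather generate the hex string programmatically than use a conversion editor, a small sketch like this (plain Python; the function name is mine) produces the Regedit 5.0 encoding: each string as UTF-16LE bytes, each string null-terminated, plus a final null terminating the list:

# rough sketch: comma-separated hex for a REG_MULTI_SZ value in Regedit 5.0 format
def to_regedit5_hex(strings):
    data = b"".join(s.encode("utf-16-le") + b"\x00\x00" for s in strings)
    data += b"\x00\x00"  # empty string that terminates the list
    return ",".join(f"{byte:02x}" for byte in data)

print(to_regedit5_hex(["Tcpip", "Afd"]))
# -> 54,00,63,00,70,00,69,00,70,00,00,00,41,00,66,00,64,00,00,00,00,00

Whether WriteRegMultiStr wants the trailing terminators included is something I have not verified; trim them if the compiler complains.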
The script should then compile, assuming you run NSIS version 3.02 or higher.
As you found out, the value data must be in the exact same format as .reg files from Windows 2000+.
The reason this instruction works this way is that it is actually the same as WriteRegBin under the hood; very little code was added to support this new functionality.
In the future you might be able to drop the /REGEDIT5 switch and give it plain strings, but support for that has not been added yet.
The Registry plug-in does allow you to write these strings in a sane manner.
I use Spark 1.6.1.
We are trying to write an ORC file to HDFS using HiveContext and DataFrameWriter. While we can use
df.write().orc(<path>)
we would rather do something like
df.write().options(Map("format" -> "orc", "path" -> "/some_path"))
This is so that we have the flexibility to change the format or root path depending on the application that uses this helper library. Where can we find a reference to the options that can be passed into the DataFrameWriter? I found nothing in the docs here
https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/DataFrameWriter.html#options(java.util.Map)
Where can we find a reference to the options that can be passed into the DataFrameWriter?
The most definitive and authoritative answer is the sources:
CSVOptions
JDBCOptions
JSONOptions
ParquetOptions
TextOptions
OrcOptions
...
You may find some description in the docs, but there is no single page (one that could conceivably be auto-generated from the sources so it always stays up to date).
The reason is that the options are deliberately kept separate from the format implementations, precisely to give the per-use-case flexibility you want (as you duly noted):
This is so that we have the flexibility to change the format or root path depending on the application that uses this helper library.
Your question seems similar to How to know the file formats supported by Databricks? where I said:
Where can I get the list of options supported for each file format?
That's not possible, as there is no API to follow (like in Spark MLlib) to define options. Every format does this on its own, unfortunately, and your best bet is to read the documentation or (more authoritatively) the source code.
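As a hedged illustration (PySpark syntax with the Spark 2.x entry point; the Scala API is analogous), format-specific options end up on the writer like this, with the valid keys defined by the per-format *Options classes listed above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)

# 'header' and 'sep' come from CSVOptions, 'compression' from ParquetOptions
df.write.format("csv").option("header", "true").option("sep", "|").mode("overwrite").save("/tmp/out_csv")
df.write.format("parquet").option("compression", "snappy").mode("overwrite").save("/tmp/out_parquet")

The paths here are just placeholders.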
Sorry if the title is a bit confusing, but what are the options/conventions that Origen provides for setting up subblocks that aren't necessarily silicon models, or are just general helpers?
For example, I have a scan helper plugin that guides the user through creating a scan test program. I'd like to add a list of options/customizations to the top-level app. There are a few ways to do this:
I can add a list of attr_readers/methods. I think this looks a bit ugly though; it adds a bunch of stuff to the top-level that isn't used by anything else, and it blows up $dut.methods.
I could use parameters as defined here: http://origen-sdk.org/origen/guides/models/parameters/ and just call them in the scan tester app. But looking at the guides, I don't think that is the intended use case. It looks more like context switching, but maybe that was just the example use case.
I could add a scan_tester.setup method or something on the top-level. This just seems unnecessary though, since it's basically doing the same thing as #2 but requires a 'setup' method to be called. Yes, it's only one line, but if you mess up or forget to add that line then you've got some debugging to do that #2 would avoid (I can print a warning, for example, if the scan parameters aren't provided, to help catch typos, etc.).
I can set it up as a subblock (currently how I've got it), but this doesn't really fit: scan isn't a silicon model, so a base address is useless but still required, and it has no registers, etc.
Then there are other 'Ruby' things I could do (set it up via on_create, use a global variable, etc.), but these all seem less appealing than the options above for one reason or another (mainly, more setup required on my part than using any of the existing options).
Any one of these would work. But from a convention standpoint, which direction should my scan tester setup go? Is there another option I haven't considered? I'd lean towards option #2, as it looks the cleanest.
Thanks
This is a really good question.
There are actually two other options:
Add application config parameters from the plugin: http://origen-sdk.org/origen/release_notes/#v0_7_24
Define a constant as used by the JTAG and other early plugins: http://origen-sdk.org/jtag/#How_To_Use
I think #2 would be using parameters in a way that was not originally intended; maybe it could work, but I just can't picture it.
I don't really like #5 or #6, since they provide application-level and class-level configuration, which is sometimes what you want, but these days I more often see the need for (DUT) instance-level configuration.
So, my best answer here is that I don't know, but you are touching on a good point that we need to have an official API or at least a recommendation for this.
I think you should be open to the possibility of adding something new to Origen for this if you can think of something better.
As I'm writing this, I suppose #5 would also support instance-level configuration, albeit a bit long-winded:
def initialize(options = {})
  Origen.app.config.scan_chain_length = 6
end
My comment wouldn't keep its formatting, so I am posting it here, where it looks better:
@Ginty
What would you think of a 'component' API? For example, we could have:
# components.rb
component(:scan, TIPScan::ScanTester,
  # options
  wgl_dir: ...,                        # defaults to Origen.app.root/pattern/wgl
  custom_sort: proc { |wgl_name| ... },
)
# then we can do things like:
$dut.scan #=> TIPScan instance
$dut.component(:scan) #=> same as above
$dut.components #=> [TIPScan instance, ...]
$dut.has_component(:scan) #=> true etc.
Pretty much just a stripped-down subblock class to handle these. I think our IAR/C compilers and even CATI could benefit from this; it would make the setup cleaner and more customizable.
I am trying to use the Sphinx4 library for speech recognition, but I cannot seem to figure out the correct combination of acoustic model, dictionary, and language model. I have tried various combinations and I get a different error every time.
I am trying to follow the tutorial at http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4. I do not have a config.xml as I would if I were using ConfigurationManager instead of Configuration, because there is no perceivable way of passing the location of the config file to the Configuration itself (ConfigurationManager takes it as a constructor argument); and that might be my problem right there. I just do not know how to point to one, and since the tutorial says "It is possible to configure low-level components of the application through XML file although you should do that ONLY IF you understand what is going on.", I assume having a config.xml file is not compulsory.
Combining the latest dictionary (7b, obtained from SourceForge) with the latest acoustic model (cmusphinx-en-us-5.2.tar.gz, from SF again) and the language model (cmusphinx-5.0-en-us.lm.gz, from SF again) results in a NullPointerException in startRecognition. The issue is similar to the problem here: sphinx-4 NullPointerException at startRecognition, but the link given in that answer no longer works. I obtained 0.7a from SF (since that is the dict the link seems to point at), but with that one I get an even earlier error during execution: Error loading word: ;;;. I tried downloading the latest models and dict from the GitHub repo; that results in java.lang.IndexOutOfBoundsException: Index: 16128, Size: 16128.
Any help is much appreciated!
You need to use the latest code from GitHub:
http://github.com/cmusphinx/sphinx4
as described in the tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4
The correct models (en-us) are already included; you should not replace anything. You should not configure any XML files; use the samples provided in the sources.
I came across echofunc.vim today (from a link in SO). Since I'm rubbish at remembering the order of function parameters, it looked like a very useful tool for me.
But the documentation is a bit lean on installation! And I've not been able to find any supplementary resources on the internet.
I'm trying to get it running on a RHEL box. I've copied the script into ~/.vim/plugin/echofunc.vim; however, there is no prompt when I type a function name followed by '('. I've tried adding
let g:EchoFuncLangsUsed = ["php","java","cpp"]
to my .vimrc - still no prompting.
I'm guessing it needs to read from a dictionary somewhere. Although there is a file at /usr/share/vim/vim70/ftplugin/php.vim, it is the RH default and does not include an explicit function list.
I'm not too bothered about getting hints on the functions/methods I've defined; I'm just trying to get hints for the built-in functions. I can see there is a dictionary file available here which appears to provide the resources required for echofunc.vim, but I can't see how to set it up.
TIA,
It expects a tags file; the last line of the plugin's description explains exactly how to generate it:
ctags -R --fields=+lS .
It works here with PHP but not with JS. Your mileage may vary.
I didn't know about this plugin, thanks for the info.
You should try phpcomplete.vim, it shows a prototype of the current function in a scratchpad. It is PHP only, though.