I just started looking at FoundationDB and I have some trouble understanding how the layers work.
Are FoundationDB layers interoperable?
If I add data using SQL, can I then query that data using the graph layer?
How does that conversion/mapping work?
Regards Oskar
Short answer regarding the SQL layer: not yet.
Longer answer:
The FoundationDB storage engine maintains a mapping from bytes to bytes, with no additional encoding or structure imposed on top of that. This being the case, interoperability between layers is certainly possible, and in some cases may be a design goal.
A common set of encodings used by many layers is provided by the Tuple Layer (https://foundationdb.com/documentation/data-modeling.html#tuples), so higher-level layers using the Tuple Layer will, for instance, pack identical primitive values to identical strings of bytes. For true interoperability between two layers, however, each layer will have to understand the logic by which the other represents its higher-level data structures in terms of Tuples.
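To make the Tuple Layer point concrete, here is a hedged sketch using the Python bindings (the API version number is just a placeholder; use whatever your installed client supports):

```python
import fdb
fdb.api_version(510)  # placeholder; pick the API version your client supports

# Two layers that both use the Tuple Layer serialize the same primitive
# values to the same byte strings...
key_a = fdb.tuple.pack(("user", 42, "name"))
key_b = fdb.tuple.pack(("user", 42, "name"))
assert key_a == key_b

# ...and can decode each other's keys back into tuples,
print(fdb.tuple.unpack(key_a))  # ('user', 42, 'name')
# but what ('user', 42, 'name') *means* is still up to each layer's own conventions.
```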
As for the SQL layer, interoperability with other data-model layers released by FoundationDB is definitely a medium-term goal, but it isn't automatic in the current Alpha version.
I want to train a Doc2Vec model with a generic corpus and then continue training with a domain-specific corpus (I have read that this is a common strategy and I want to test the results).
I have all the documents, so I can build and tag the vocab at the beginning.
As I understand it, I should initially train all the epochs with the generic docs, and then repeat the epochs with the ad hoc docs. But this way I cannot place all the docs in one corpus iterator and call train() once (as is recommended everywhere).
So, after building the global vocab, I have created two iterators, the first one for the generic docs and the second one for the ad hoc docs, and called train() twice.
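For concreteness, here is a minimal sketch of the two-call setup I mean, using gensim's Doc2Vec (the corpora here are just tiny placeholders, and min_count=1 is set only so the toy example has a vocabulary):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Tiny placeholder corpora; in practice these are iterators over the real docs
generic_docs = [TaggedDocument(["some", "generic", "text"], ["g0"])]
domain_docs = [TaggedDocument(["some", "domain", "text"], ["d0"])]

# min_count=1 only so this toy example has a non-empty vocabulary
model = Doc2Vec(vector_size=100, epochs=10, min_count=1)

# Vocabulary is built once, over ALL documents
model.build_vocab(generic_docs + domain_docs)

# First pass: generic corpus only
model.train(generic_docs, total_examples=len(generic_docs), epochs=model.epochs)

# Second pass: domain-specific corpus only
model.train(domain_docs, total_examples=len(domain_docs), epochs=model.epochs)
```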
Is this the best way, or is there a more appropriate way?
If it is, how should I manage alpha and min_alpha? Is it a good decision not to specify them in the train() calls and let train() manage them?
Best
Alberto
This is probably not a wise strategy, because:
the Python Gensim Doc2Vec class hasn't ever properly supported expanding its known vocabulary after a 1st single build_vocab() call. (Up through at least 3.8.3, such attempts typically cause a Segmentation Fault process crash.) Thus if there are words that are only in your domain-corpus, an initial typical initialization/training on the generic-corpus would leave them out of the model entirely. (You could work around this, with some atypical extra steps, but the other concerns below would remain.)
if there is truly an important contrast between the words/word-senses used in your generic corpus and the different words/word-senses used in your domain corpus, the influence of the words from the generic corpus may not be beneficial, diluting domain-relevant meanings.
further, any followup training that just uses a subset of all documents (the domain corpus) will only be updating the vectors for that subset of words/word-senses, and the model's internal weights used for further unseen-document inference, in directions that make sense for the domain-corpus alone. Such later-trained vectors may be nudged arbitrarily far out of comparable alignment with other words not appearing in the domain-corpus, and earlier-trained vectors will find themselves no longer tuned in relation to the model's later-updated internal-weights. (Exactly how far will depend on the learning-rate alpha & epochs choices in the followup training, and how well that followup training optimizes model loss.)
If your domain dataset is sufficient, or can be grown with more domain data, it may not be necessary to mix in other training steps/data. But if you think you must try that, the best-grounded approach would be to shuffle all training data together, and train in one session where all words are known from the beginning, and all training examples are presented in balanced, interleaved fashion. (Or possibly, where some training texts considered extra-important are oversampled, but still mixed in with the variety of all available documents, in all epochs.)
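For contrast with the two-call setup in the question, here is a minimal sketch of that combined, single-session training (placeholder corpora again; the 2x oversampling of the domain docs is just one possible weighting, not a recommendation of specific values):

```python
import random
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Placeholder corpora; min_count=1 only so the toy example has a vocabulary
generic_docs = [TaggedDocument(["some", "generic", "text"], ["g0"])]
domain_docs = [TaggedDocument(["some", "domain", "text"], ["d0"])]

# Optionally oversample the domain docs (here 2x) to weight them more heavily,
# then shuffle so both sources stay interleaved across all epochs
all_docs = list(generic_docs) + 2 * list(domain_docs)
random.shuffle(all_docs)

model = Doc2Vec(vector_size=100, epochs=20, min_count=1)
model.build_vocab(all_docs)  # every word is known from the start
model.train(all_docs, total_examples=len(all_docs), epochs=model.epochs)
```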
If you see an authoritative source suggesting such a "train with one dataset, then another disjoint dataset" approach with the Doc2Vec algorithms, you should press them for more details on what they did to make that work: exact code steps, and the evaluations which showed an improvement. (It's not impossible that there's some way to manage all the issues! But I've seen many vague impressions that this separate-pretraining is straightforward or beneficial, and zero actual working writeups with code and evaluation metrics showing that it's working.)
Update with respect to the additional clarifications you provided at https://stackoverflow.com/a/64865886/130288:
Even with that context, my recommendation remains: don't do this segmenting of training into two batches. It's almost certain to degrade the model compared to a combined training.
I would be interested to see links to the "references in the literature" you allude to. They may be confused or talking about algorithms other than the Doc2Vec ("Paragraph Vectors") algorithm.
If there is any reason to give your domain docs more weight, a better-grounded way would be to oversample them in the combined corpus.
But by all means, test all these variants & publish the relative results. If you're exploring shaky hypotheses, I would ignore any advice from StackOverflow-like sources & just run all the variants that your reading of the literature suggests, to see which, if any, actually help.
You're right to recognize that the choice of alpha parameters is a murky area that could majorly influence what impact such add-on training has. There's no right answer, so you'll have to search-for and reason-out what might make sense. The inherent issues I've mentioned with such subset-followup-training could make it so that even if you find benefits in some combos, they may be more a product of a lucky combination of data & arbitrary parameters than a generalizable practice.
And: your specific question "if it is better to set such values or not provide them at all" reduces to: "do you want to use the default values, or values set when the model was created, or not?"
Which values might be workable, if at all, for this unproven technique is something that'd need to be experimentally discovered. That is, if you wanted to have comparable (or publishable) results here, I think you'd have to justify from your own novel work some specific strategy for choosing good alpha/epochs and other parameters, rather than adopt any practice merely recommended in a StackOverflow answer.
Can anyone please explain why a 1D convolutional neural network sometimes performs well on tabular data (better than a DNN)? I have seen this in some published papers (although the reason for using Conv1D is not provided), in Kaggle competitions, and also in Stack Overflow questions about the input shape of a 1D CNN for tabular data (e.g., Preparing feeding data to 1D CNN). While I know we use 1D CNNs for sequence data such as time series and NLP, what is the intuitive idea behind using a 1D CNN for tabular data? Why does it work? Is it due to a spatial correlation between features?
A large problem with tabular data is that it is not structured; there is often no relationship within the ordering of the columns. I believe that when applying a 1D CNN to tabular data you first have a linear layer that then feeds into the Conv1D layer. This enables the model to self-order the columns, creating a more structured representation. Patterns within the data can then be found through the Conv1D and remaining Dense layers.
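As a rough illustration of that arrangement (my own sketch, not from the question or any specific paper; layer sizes and the feature count are arbitrary), a Dense layer can feed a reshaped representation into a Conv1D layer:

```python
import tensorflow as tf

n_features = 20  # hypothetical number of tabular columns

model = tf.keras.Sequential([
    # Linear layer lets the model learn its own projection/"ordering" of the columns
    tf.keras.layers.Dense(64, activation="relu", input_shape=(n_features,)),
    # Treat the learned representation as a 1D "sequence" for the Conv1D layer
    tf.keras.layers.Reshape((64, 1)),
    tf.keras.layers.Conv1D(16, kernel_size=3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()
```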
I am not sure this is really an answer, but then the question is not really a question either... maybe I can at least help to explain.
This is not a general feature of CNNs and/or DNNs. It is very specific to the structure of the input data.
CNNs are suited for data that contain structures/patterns with additional translational symmetries. "Convolution" means mapping many different sub-parts of the input data ("windows" of any dimension) onto the same "kernel" network. Thus the network can learn universally, independent of the location of the "window".
I think it is even misleading to distinguish CNNs and DNNs at all. DNNs are multi-layer complex networks, and CNNs are typical substructures/layers of DNNs.
In TensorFlow 1.4, I found two functions that do batch normalization, and they look the same:
tf.layers.batch_normalization
tf.contrib.layers.batch_norm
Which function should I use? Which one is more stable?
Just to add to the list, there are several more ways to do batch norm in TensorFlow:
tf.nn.batch_normalization is a low-level op. The caller is responsible for handling the mean and variance tensors itself.
tf.nn.fused_batch_norm is another low-level op, similar to the previous one. The difference is that it's optimized for 4D input tensors, which is the usual case in convolutional neural networks. tf.nn.batch_normalization accepts tensors of any rank greater than 1.
tf.layers.batch_normalization is a high-level wrapper over the previous ops. The biggest difference is that it takes care of creating and managing the running mean and variance tensors, and calls a fast fused op when possible. Usually, this should be the default choice for you (see the sketch after this list).
tf.contrib.layers.batch_norm is the early implementation of batch norm, before it graduated to the core API (i.e., tf.layers). Its use is not recommended because it may be dropped in future releases.
tf.nn.batch_norm_with_global_normalization is another deprecated op. Currently, it delegates the call to tf.nn.batch_normalization, but it is likely to be dropped in the future.
Finally, there's also the Keras layer keras.layers.BatchNormalization, which in the case of the TensorFlow backend invokes tf.nn.batch_normalization.
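As a minimal TF 1.x sketch of using the recommended wrapper (the shapes and the loss here are placeholders, not anything from the question), note the UPDATE_OPS dependency needed so the running statistics actually get updated during training:

```python
import tensorflow as tf  # TensorFlow 1.x API, as in the question

x = tf.placeholder(tf.float32, [None, 64])
is_training = tf.placeholder(tf.bool, [])

h = tf.layers.dense(x, 128)
h = tf.layers.batch_normalization(h, training=is_training)
h = tf.nn.relu(h)

# Placeholder loss purely for illustration
loss = tf.reduce_mean(tf.square(h))

# The wrapper creates running mean/variance update ops in UPDATE_OPS;
# they must be run alongside the train step
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```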
As shown in the docs, tf.contrib is a contribution module containing volatile or experimental code. When a function is complete, it is removed from this module. Right now there are two, in order to remain compatible with historical versions.
So the former, tf.layers.batch_normalization, is recommended.
In the official Spark documentation,
VectorSlicer is a transformer that takes a feature vector and outputs a new feature vector with a sub-array of the original features. It is useful for extracting features from a vector column.
Does this select the important features from the set of features?
If that is the case how is it done without the mention of a dependent variable?
I am trying to perform data clustering and I need the important features that will contribute most to the clusters. Can I use VectorSlicer for this?
Does this select the important features from the set of features?
It doesn't. It literally slices the vector to select only specified indices.
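For illustration, a minimal PySpark sketch (the data, column names, and indices are made up): the slicer just keeps the listed indices, with no notion of "importance" involved.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorSlicer
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(Vectors.dense([0.1, 0.2, 0.3, 0.4]),)], ["features"])

# Keep only the features at indices 1 and 3 -- a purely positional selection
slicer = VectorSlicer(inputCol="features", outputCol="sliced", indices=[1, 3])
slicer.transform(df).show(truncate=False)
```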
and I need the important features that will contribute most to the clusters.
If you have categorical data, consider using ChiSqSelector.
Otherwise you can use dimensionality reduction like PCA. It won't be the same as feature selection but should provide similar benefits (keep only the most important signals, discard the rest).
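A similarly minimal PySpark sketch of the PCA alternative, again with made-up data; the projection directions are chosen purely from variance in the features, with no dependent variable involved:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import PCA
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(Vectors.dense([0.1, 0.2, 0.3, 0.4]),),
     (Vectors.dense([0.4, 0.3, 0.2, 0.1]),),
     (Vectors.dense([0.2, 0.2, 0.2, 0.2]),)], ["features"])

# Project onto the top-2 principal components
pca = PCA(k=2, inputCol="features", outputCol="pca_features")
pca.fit(df).transform(df).show(truncate=False)
```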
I'm looking for a neural network model with specific characteristics. This model may not exist...
I need a network which doesn't use "layers" as traditional artificial neural networks do. Instead, I want [what I believe to be] a more biological model.
This model will house a large cluster of interconnected neurons, like the image below. A few neurons (at bottom of diagram) will receive input signals, and a cascade effect will cause successive, connected neurons to possibly fire depending on signal strength and connection weight. This is nothing new, but, there are no explicit layers...just more and more distant, indirect connections.
As you can see, I also have the network divided into sections (circles). Each circle represents a semantic domain (a linguistics concept) which is the core information surrounding a concept; essentially a semantic domain is a concept.
Connections between nodes within a section have higher weights than connections between nodes of different sections. So the nodes for "car" are more connected to one another than nodes connecting "English" to "car". Thus, when a neuron in a single section fires (is activated), it is likely that the entire (or most of) the section will also be activated.
All in all, I need output patterns to be used as input for further output, and so on. A cascade effect is what I am after.
I hope this makes sense. Please ask for clarification where needed.
Are there any suitable models in existence that model what I've described, already?
Your neural network resembles one created using evolutionary algorithms, for example a genetic algorithm.
See the following articles for details.
Han - Evolutionary neural networks for anomaly detection based on the behavior of a program
Whitley - Genetic Algorithms and Neural Networks
To summarize this type of neural network: neurons and their connections are created using evolutionary techniques, so there is no strict layer structure. Han uses the following technique:
"Genetic Operations:
The crossover operator produces a new descendant by exchanging partial sections between two neural networks. It selects two distinct neural networks randomly and chooses one hidden node as the pivot point. Then, they exchange the connection links and the corresponding weights based on the selected pivot point.
The mutation operator changes a connection link and the corresponding weight of a randomly selected neural network. It performs one of two operations: addition of a new connection or deletion of an existing connection.
The mutation operator selects two nodes of a neural network randomly.
If there is no connection between them, it connects two nodes with random weights.
Otherwise, it removes the connection link and weight information.
"
[Figure from Whitley's article.]
@article{Han2005Evolutionary,
  author  = {Sang-Jun Han and Sung-Bae Cho},
  title   = {Evolutionary neural networks for anomaly detection based on the behavior of a program},
  journal = {IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics},
  year    = {2005},
  volume  = {36},
  number  = {3},
  pages   = {559--570},
  month   = {June}
}

@article{whitley1995genetic,
  title     = {Genetic algorithms and neural networks},
  author    = {Whitley, D.},
  journal   = {Genetic algorithms in engineering and computer science},
  pages     = {203--216},
  year      = {1995},
  publisher = {Citeseer}
}
All in all, I need output patterns to be used as input for further output, and so on. A cascade effect is what I am after.
That sounds like a feed-forward net with multiple hidden layers. Don't be scared of the word "layer" here; with multiple layers it would be just like what you have drawn there: something like a 5-5-7-6-7-6-6-5-6-5 structured net (5 inputs, 8 hidden layers with varying numbers of nodes in each, and 5 outputs).
You can connect the nodes to each other any way you like from one layer to another. You can leave some unconnected simply by using a constant zero as the weight between them, or, if object-oriented programming is used, simply leave unwanted connections out of the connection phase. Skipping layers might be harder with a standard NN model, but one way could be to use a dummy node for each layer a weight needs to cross. Just copying the original output*weight value from node to dummy would be the same as skipping a layer, and this would also keep the standard NN model intact.
If you want the net just to output some 1's and 0's, a simple step-function can be used as an activation function in each node: 1 for values more than 0.5, 0 otherwise.
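Here is a minimal NumPy sketch of those two ideas together (the layer sizes and random masks are arbitrary): zeroed mask entries stand in for "unconnected" node pairs, and the step function gives binary outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x):
    # 1 for values greater than 0.5, 0 otherwise
    return (x > 0.5).astype(float)

# Hypothetical 5-4-5 net. The 0/1 masks "remove" unwanted connections by
# forcing their weights to stay at zero.
W1 = rng.normal(size=(5, 4))
W2 = rng.normal(size=(4, 5))
mask1 = rng.integers(0, 2, size=(5, 4)).astype(float)
mask2 = rng.integers(0, 2, size=(4, 5)).astype(float)

def forward(x):
    h = step(x @ (W1 * mask1))
    return step(h @ (W2 * mask2))

print(forward(np.array([1.0, 0.0, 1.0, 1.0, 0.0])))
```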
I'm not sure if this is what you want, but this way you should be able to build the net you described. However, I have no idea how you are planning to teach your net to produce some semantic domains. Why not just let the net learn its own weights? This can be achieved with simple input-output examples and a backpropagation algorithm. If you use a standard model to build your net, the mathematics of the learning wouldn't be any different from any other feed-forward net. Last but not least, you can probably find a library that is suitable for this task with only minor changes, or no changes at all, to the code.
The answers involving genetic algorithms sound fine (especially the one citing Darrell Whitley's work).
Another alternative would be to simply connect nodes randomly. This is done, more or less, with recurrent neural networks.
You could also take a look at LeCun's highly successful convolutional neural networks for an example of an ANN with a lot of layers that is somewhat like what you've described here that was designed for a specific purpose.
Your network also mimics this:
http://nn.cs.utexas.edu/?fullmer:evolving
but it doesn't really allow the network to learn, only to be replaced. That may be covered here:
http://www.alanturing.net/turing_archive/pages/reference%20articles/connectionism/Turing%27s%20neural%20networks.html