I have a dataset containing general consumer reviews of products purchased by users. The dataset also includes the item name, price, and the star rating the consumer gave the product. Please suggest a way to approach this problem so that I can form clusters of similar users from the given information. As of now I'm extracting keywords from the reviews column. I have shared the dataset preview.
In my opinion you should try some text clustering methods. The most informative field in your dataset is probably the review text. So first you could change the representation of your input data (using e.g. tokenization and word embeddings) and then apply clustering methods such as DBSCAN or k-means, with something like t-SNE to visualise whether distinct groups exist.
A good starting point is:
https://www.kaggle.com/karthik3890/text-clustering
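As a minimal sketch of that pipeline (TF-IDF review vectors, averaged per user, plus scaled numeric features, then k-means), assuming a CSV with hypothetical columns 'user_id', 'review', 'stars' and 'price':

```python
# A minimal sketch; file name, column names and cluster count are assumptions to adapt.
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv("reviews.csv")  # hypothetical file

# Represent each review as a TF-IDF vector.
vectorizer = TfidfVectorizer(max_features=2000, stop_words="english")
review_vectors = vectorizer.fit_transform(df["review"].fillna("")).toarray()

# Average the review vectors per user and append scaled numeric features.
user_text = pd.DataFrame(review_vectors, index=df["user_id"]).groupby(level=0).mean()
user_numeric = df.groupby("user_id")[["stars", "price"]].mean()
user_numeric = pd.DataFrame(
    StandardScaler().fit_transform(user_numeric),
    index=user_numeric.index,
    columns=user_numeric.columns,
)
X = np.hstack([user_text.values, user_numeric.loc[user_text.index].values])

# Cluster the users; the number of clusters is a guess to be tuned.
labels = KMeans(n_clusters=5, random_state=0, n_init=10).fit_predict(X)
```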
I read several posts (here and here) online about LDA topic modeling. All of them use only unigrams. I would like to know why bigrams and trigrams are not used for LDA topic modeling.
It's a matter of scale. If you have 1,000 types (i.e. "dictionary words"), you might end up, in the worst case (which is not going to happen in practice), with 1,000,000 bigrams and 1,000,000,000 trigrams. These numbers are hard to manage, especially as a realistic text will contain far more types.
The gains in accuracy/performance don't outweigh the computational cost here.
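You can see the growth directly with scikit-learn's CountVectorizer; the tiny corpus below is only a placeholder, and a real corpus makes the effect far more dramatic:

```python
# Illustration of how the n-gram vocabulary grows with n on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "topic models describe documents as mixtures of topics",
    "each topic is a distribution over words",
    "bigrams and trigrams multiply the size of the vocabulary",
]

for n in (1, 2, 3):
    vectorizer = CountVectorizer(ngram_range=(n, n))
    vectorizer.fit(corpus)
    print(f"{n}-grams: {len(vectorizer.vocabulary_)} features")
```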
Which of the following NLP topics would be the easiest to work with?
Question answering
Paraphrase detection
Short text conversation
Author identification
The final one, author identification. You don't need any understanding of the language you are dealing with, which the first three presuppose.
There is already a lot of literature on the topic; generally you identify features in texts and map these onto a set of authors' known features. This can easily be done with cluster analysis or machine learning, so it's not actually as NLP-heavy as the others.
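A minimal sketch of that feature-and-classifier approach, using character n-grams as the stylistic features; the texts and author labels below are placeholders for a real authorship corpus:

```python
# Character n-gram TF-IDF features fed to a linear classifier; data is illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "It was the best of times, it was the worst of times.",
    "Call me Ishmael. Some years ago, never mind how long precisely...",
    "It is a truth universally acknowledged, that a single man...",
    "Whenever I find myself growing grim about the mouth...",
]
authors = ["dickens", "melville", "austen", "melville"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),  # stylistic character n-grams
    LogisticRegression(max_iter=1000),
)
model.fit(texts, authors)
print(model.predict(["It was a dark and stormy night."]))
```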
What would be a good or recommended way to model an SVG DOM tree in Google's Realtime API? Specifically, should I stringify the SVG DOM tree and use a collaborative string model, or is there a better way? Thanks.
It depends on what you want to do with it. If all you want to do is display something, without it being editable, then I would just store it as a blob, e.g. a static string.
If you want to be able to edit it, a collaborative string is problematic, as it's hard to guarantee that the result of merging different collaborators' actions will be well-formed XML.
Instead you could use custom objects to model the various nodes in the tree. You could do this either with a generic DOM-like model where nodes have arbitrary attributes, or with specific classes for the different element types. I think the latter would be the most powerful way to deal with it, and the nicest to work with, but also the most work to set up.
What are applications where search techniques, or more specifically planning techniques, are used? I am most interested in examples that are in real use.
I know that A* is used for path planning in robotics and that planning is used in logistics (details would be great), but what other usages are there?
For search in general, Google etc. come to mind with their inverted indices. Again, where else is it used?
For planning examples, including logistics challenges, take a look at this list. Each use case comes with multiple datasets and a problem definition.
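Since A* came up as the canonical path-planning example, here is a minimal sketch of A* on a small grid; the grid and the Manhattan heuristic are illustrative only, not tied to any particular application above:

```python
# Minimal A* on a 0/1 grid (1 = obstacle), with a Manhattan-distance heuristic.
import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    heuristic = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(heuristic(start), 0, start, [start])]  # (f, g, node, path)
    seen = set()
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                heapq.heappush(frontier, (cost + 1 + heuristic((nr, nc)),
                                          cost + 1, (nr, nc), path + [(nr, nc)]))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))  # path that detours around the obstacle row
```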
I am sorry for asking such a straightforward and simple question, but there is a lot of confusion regarding the use of partitions in activity diagrams:
Is it really necessary to create partitions?
Since each organisation has a number of working units/sub-units, which eventually have roles to play, would we need to create partitions in literally every activity diagram we draw for any process flow?
E.g. suppose we draw an activity diagram for online shopping: customers browse and search for items and later buy them. There are very few roles here, so we can clearly draw the diagram without partitions, but we could still create them; in both cases the diagram represents the system, so does it really make sense to create them?
The answer to both questions is No.
Partitioning is an optional feature for an activity diagram.
UML is most of all a means of communication. When partitioning adds useful information to the diagram, you should include it. When it doesn't add anything to the message presented by the diagram, then don't add it.