Problem understanding dataset to start analytics - statistics

I have a data set (I have provided a link to a photo of the first 10 lines of the data). For my project I need to filter the “MS” column for “ES” (Spain), analyze the data, and draw some conclusions from it with graphs, visualizations, etc.
But my problem is that, looking at the data, I just don’t know where to start, because it seems like there is nothing to do with the data that is there.
So I have come to all of you to see if you can give me some suggestions, or if anyone can see any sort of starting point.
These are the details of my project, if it helps:
Objectives: Represent the protected spaces of Spain and explain the data.
Evaluation criteria: The representation is comprehensive and “nice looking” (visualizations) and uses analytics (SQL, Python, Excel, etc.). In addition, there is a report with resources and the conclusions of the analysis, with additional representations.
Notes: It’s important that the clients (the “teachers”, and also people from companies in Spain) can understand the analysis from the report and visualizations.
Thanks! I have 1863 lines of data (already filtered to ES (Spain)).
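One possible starting point is to filter the rows and count how they are distributed over a categorical column, since those counts translate directly into bar charts for a non-technical audience. The sketch below assumes the data is available as CSV. Only the “MS” column comes from the question; the “SITE_CODE” and “SITE_TYPE” columns and the sample rows are hypothetical placeholders to replace with the real ones.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: only the "MS" column comes from the question;
# "SITE_CODE" and "SITE_TYPE" are placeholder column names for illustration.
SAMPLE = """SITE_CODE,MS,SITE_TYPE
ES0000001,ES,A
ES0000002,ES,B
FR0000001,FR,A
ES0000003,ES,B
ES0000004,ES,C
"""


def load_country(text: str, country: str = "ES") -> list[dict]:
    """Return only the rows whose MS column equals `country`."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if row["MS"] == country]


def summarize(rows: list[dict], column: str) -> Counter:
    """Count how many rows fall into each distinct value of `column`."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    spain = load_country(SAMPLE)
    print(len(spain), "rows for ES")
    for value, count in summarize(spain, "SITE_TYPE").most_common():
        print(value, count)
```

Repeating the count for each categorical column gives a quick overview of what the data can say. The same idea can be expressed as a pivot table in Excel or a `GROUP BY` query in SQL.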

Related

Grouping similar strings that have misspellings, spacing differences, etc

I have a data set of about 1 million employer names. These names are from a free-form text field so they include misspellings and variations in the way they are inputted (e.g. "Amazon" .. "Amzaon" .. "Amazon.com" .. "Amazon Web Services" .. "AWS").
I want to either A) group these 1 million so I have a somewhat accurate sense of how many unique employers are in the data set or B) be able to find all variations of any given employer.
So far, I've been using the data in Tableau, then filtering on "employer name" and searching all variations of the name I can think of. But it's tedious and I'm pretty sure I'm leaving many out.
I've also used the fuzzy add-in for Excel, but it hasn't worked that well on misspellings, special characters, etc.
Tableau just isn't suited for this kind of analysis straight out of the box, and I would highly recommend doing some pre-processing on your data before trying to build a workbook around it.
Like another commenter said, you could look into using Tableau Prep Builder for a one-time transformation on your data set, but if you wanted to automate this process it costs extra to add functionality to whatever Tableau Server installation you have.
If you're familiar with Python or R (and the integration between Tableau Server and those services is supported by your organization), you could look into building a script to run the transformation in real time, but it probably won't be very efficient.
Try experimenting with Tableau Prep Builder - the companion tool that comes with your Tableau Creator license. It has a group feature that is designed for just these problems.
In Prep Builder, you’ll just need to connect to your data, add a cleaning step, and then add a group to your cleaning step.
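For the Python route mentioned in the first answer, here is a minimal sketch of greedy fuzzy grouping with the standard library's `difflib`. This is a swapped-in technique, not what Tableau Prep's group feature does internally. The similarity cutoff of 0.8 is an assumption to tune on real data. Each name is compared against all existing groups, so for about 1 million names you would need to split candidates into blocks first (for example by first letter) or use a dedicated record-linkage library.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(name.split())


def group_names(names, cutoff=0.8):
    """Greedily assign each name to the most similar existing group.

    A name starts a new group when no existing representative has a
    difflib similarity ratio of at least `cutoff`.
    """
    groups = {}  # normalized representative -> list of raw names
    for raw in names:
        key = normalize(raw)
        match = difflib.get_close_matches(key, list(groups), n=1, cutoff=cutoff)
        representative = match[0] if match else key
        groups.setdefault(representative, []).append(raw)
    return groups


if __name__ == "__main__":
    sample = ["Amazon", "Amzaon", "AMAZON!", "amazon ", "Google", "Gogle"]
    for representative, members in group_names(sample).items():
        print(representative, members)
```

String similarity only catches typos and formatting noise. At this cutoff, "Amazon.com" (normalized to "amazon com") does not join the "amazon" group, and aliases like "AWS" share almost no characters with "Amazon", so they would still need a hand-maintained mapping.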

Creating a DTM on Alteryx Designer

I am new to Alteryx and am trying to use it for analysing unstructured data. I have a column of description in text form and I intend to use the K-Means Clustering tool for topic modelling. For K-means to work on text, I will need to convert my text into a Document Term Matrix (DTM) so that they appear as continuous variables to the clustering tool. However, I am struggling to find a way I can convert my text to a DTM.
Does anyone know a way to do so? I am currently looking at the R tool but am not exactly sure how to start with it either. Hoping that all of you experts here can help me out!
I have looked through posts on text analysis and realized that most fell back on the Microsoft Azure ML Text Analysis Macro. However, I would like to avoid using the macro (to not be restricted to limited runs every month for scalability) and instead use tools that are available in Alteryx.
Thanks to everyone in advance!
With Alteryx being more of a pictorial drag-and-drop workflow, it's not trivial to explain here; however, I've created the following workflow and posted the actual workflow file on the Alteryx forum. The workflow uses term frequencies from inauguration speeches but should apply to any collection of documents. It just splits the text into words on non-alphanumeric characters and does a summary. This is what the workflow looks like:
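The split-and-count idea in this answer can also be written as a script. Below is a minimal Python sketch of a document-term matrix. It is an illustration of the technique, not a reproduction of the Alteryx workflow from the answer. Each row is one document and each column one term, so every value is a count that a clustering tool can treat as a numeric variable. The tokenizer is an assumption: it lowercases the text and splits on anything that is not a letter or digit.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on any run of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def document_term_matrix(docs: list[str]):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))  # every term seen in any document
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


if __name__ == "__main__":
    vocab, matrix = document_term_matrix(["The cat sat.", "The dog sat on the cat!"])
    print(vocab)
    for row in matrix:
        print(row)
```

Before clustering, it is common to drop stop words and to reweight the raw counts (for example with TF-IDF), so that very frequent words such as "the" do not dominate the distances.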

I want to migrate an excel & word document report into a modern web app, but not sure where to start, can you help or point me in the right direction?

I have an Excel audit form which has a lot of yes/no questions. Each question has four fields: compliant, impact, probability and category.
The compliant field is the yes/no answer, and impact and probability are set numbers from 1 to 5. Each item then gets a risk score assigned, e.g. if impact is 1 and probability is 2, the risk score is 3.
At the end it generates a risk score for each of the categories & a nice graph which shows the risk distribution between the categories.
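The scoring step described above can be sketched as follows. This is a minimal Python sketch with loudly stated assumptions. The item risk score is `impact + probability`, which is what the example above implies (1 and 2 give 3). A category's score is the plain sum of its items' risk scores. The question does not say whether compliant items count, or whether the category score is a sum or an average. The `AuditItem` name and the sample questions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AuditItem:
    """One audit question and its answer (hypothetical structure)."""

    question: str
    compliant: bool
    impact: int       # 1-5
    probability: int  # 1-5
    category: str

    def __post_init__(self):
        # Reject values outside the 1-5 scale used in the form.
        for name in ("impact", "probability"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def risk_score(self) -> int:
        # Assumption taken from the example: impact 1, probability 2 -> 3.
        return self.impact + self.probability


def category_scores(items):
    """Sum item risk scores per category (assumption: a simple sum)."""
    totals = defaultdict(int)
    for item in items:
        totals[item.category] += item.risk_score
    return dict(totals)


if __name__ == "__main__":
    items = [
        AuditItem("Firewall enabled?", False, 1, 2, "Network"),
        AuditItem("Backups tested?", True, 3, 4, "Data"),
        AuditItem("Patches current?", False, 2, 2, "Network"),
    ]
    print(category_scores(items))
```

Keeping the formula in one place (`risk_score`) means that switching to another rule, such as `impact * probability` or ignoring compliant items, is a one-line change.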
Once the Excel form has been filled out, I use the data to pre-fill a Word template report.
A lot of the data in the Word document is standard for each report. However, there are tables that get manually populated by copying the data across from the Excel file.
I was thinking of a web app where the yes/no questions are asked and the answers are stored in a database. Once completed, the report can be generated from that data, which will save a lot of time compared to manually creating that Word report.
I am not really sure where to start. I am not a programmer, but I do have an IT degree and am happy to spend time learning. The main requirements are that it is easy to add and remove questions and to generate a nice client-facing report.
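The "easy to add and remove questions" requirement is mostly a data-modelling decision, whatever web framework ends up on top. A common approach is to store the questions as rows in a table rather than hard-coding them. Below is a minimal sketch using Python's built-in `sqlite3`. The table and column names are hypothetical. Removing a question is modelled as retiring it (`active = 0`), which is a design assumption: the question disappears from new audits, but the answers already saved for older audits stay valid.

```python
import sqlite3

# Hypothetical schema: questions are data, so adding one is an INSERT.
SCHEMA = """
CREATE TABLE questions (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    category TEXT NOT NULL,
    active INTEGER NOT NULL DEFAULT 1  -- 0 = retired, hidden from new audits
);
CREATE TABLE answers (
    audit_id INTEGER NOT NULL,
    question_id INTEGER NOT NULL REFERENCES questions(id),
    compliant INTEGER NOT NULL,        -- 1 = yes, 0 = no
    impact INTEGER NOT NULL CHECK (impact BETWEEN 1 AND 5),
    probability INTEGER NOT NULL CHECK (probability BETWEEN 1 AND 5),
    PRIMARY KEY (audit_id, question_id)
);
"""


def add_question(conn, text, category):
    """Insert a new question and return its id."""
    cur = conn.execute(
        "INSERT INTO questions (text, category) VALUES (?, ?)", (text, category)
    )
    return cur.lastrowid


def retire_question(conn, question_id):
    """Hide a question from new audits without deleting old answers."""
    conn.execute("UPDATE questions SET active = 0 WHERE id = ?", (question_id,))


def active_questions(conn):
    """Questions to show on a new audit form, grouped by category."""
    return conn.execute(
        "SELECT id, text, category FROM questions "
        "WHERE active = 1 ORDER BY category, id"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    first = add_question(conn, "Firewall enabled?", "Network")
    add_question(conn, "Backups tested?", "Data")
    retire_question(conn, first)
    print(active_questions(conn))
```

A report generator would then query `answers` joined to `questions` for one `audit_id` and fill the report template from the result.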
Can you provide me with some guidance on what the best framework to use would be and if there are any good tutorials that I could follow?
Thanks
I would check out what you can do using a combination of a form builder (Google Forms, Wufoo, Jotform), Zapier and DocMerge. This would allow you to string together an application that suits your needs without doing any programming.
Here are some examples of what's possible with Zapier (no affiliation): https://zapier.com/zapbook/google-docs/wufoo/
There are many websites offering what you are asking for, e.g. SurveyMonkey.

Text Mining - What is the best way to mine descriptive excel sheet data

I have university placement data pulled from databases into an Excel sheet. I need to text mine the job descriptions offered by companies, which is a descriptive field in every row, and then come up with an analysis of the profiles in demand.
Here is a snapshot of the data
Could anyone help me to kick start this activity?
Thanks
Saurabh
I am not a data expert but I have some data mining experience. I would try following these steps for starters:
Excel is not a good tool for such an analysis. Find a tool dedicated to data mining, e.g. RStudio. R has many useful out-of-the-box algorithms for data mining.
Cleanse the data, e.g. convert all text to lower case, remove stop words, remove punctuation, and remove extra whitespace.
Tokenize the data, e.g. into one-word tokens: "finance", "bachelor".
Decide how you will determine whether a certain profile is in demand. If by profile you mean how often certain tokens appear in the data compared with others, e.g. "finance", "bachelor", then simply create a frequency matrix. R lets you visualise this as a word cloud.
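The answer recommends R. For readers working in Python, the same cleanse, tokenize and count steps can be sketched with the standard library. The stop-word list here is a tiny illustrative assumption; real projects use a fuller list. The sample job descriptions are made up.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; use a fuller list in practice).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "with", "to", "or"}


def clean(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse extra whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def tokens(text: str) -> list[str]:
    """One-word tokens with stop words removed."""
    return [word for word in clean(text).split() if word not in STOP_WORDS]


def term_frequencies(descriptions) -> Counter:
    """Count how often each token appears across all job descriptions."""
    freq = Counter()
    for description in descriptions:
        freq.update(tokens(description))
    return freq


if __name__ == "__main__":
    jobs = [
        "Analyst in Finance, bachelor required.",
        "Finance intern; Bachelor of Commerce.",
        "Software engineer with Python.",
    ]
    print(term_frequencies(jobs).most_common(5))
```

The resulting counts can feed a bar chart or a word cloud. Counting each token at most once per description (for example with `freq.update(set(tokens(description)))`) instead measures how many postings mention a term, which may be closer to "in demand".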
This is to start you off :). I am sure there is much more to be suggested in this matter.

Converting data into information: Where to start?

We (my company) run a website that records lots of data, such as user registrations, visits, clicks and what users post, but so far we don't have a tool to monitor the whole thing or find patterns in it. We want to understand what kind of information we can get from it, so that management can make decisions based on it. In short, we want to do what people at Amazon or Google do with the data they collect.
Now, after the intro, I would like to know what this technology is called: is it data mining, machine learning, or something else? Where should we start to convert meaningless data into useful information?
I think what you need enters in the "realm" of: parsing data, creating graphs, showing statistics about some elements, etc.
There is no "easy" answer, I can only answer parts of your question.
There are no premade magical analytical tools; big companies have their own backend tools tuned to parse large amounts of data and produce data summaries that are then used to build graphs or for statistical analysis.
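To make "parse the data and produce summaries" concrete at a small scale, here is a minimal sketch. It assumes the site's activity can be exported as a CSV event log with `date`, `user_id` and `event` columns. That format and the sample rows are hypothetical; a real pipeline would read from the site's database or server logs. It computes two typical summaries: events per day and type, and distinct users per day.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical event log: one row per tracked user action.
LOG = """date,user_id,event
2024-01-01,u1,visit
2024-01-01,u1,click
2024-01-01,u2,visit
2024-01-02,u1,visit
2024-01-02,u3,register
"""


def summarize_events(log_text: str):
    """Return (events per (date, event), distinct users per date)."""
    events = Counter()        # (date, event) -> number of occurrences
    users = defaultdict(set)  # date -> set of user ids seen that day
    for row in csv.DictReader(io.StringIO(log_text)):
        events[(row["date"], row["event"])] += 1
        users[row["date"]].add(row["user_id"])
    daily_users = {day: len(ids) for day, ids in sorted(users.items())}
    return events, daily_users


if __name__ == "__main__":
    events, daily_users = summarize_events(LOG)
    for (day, event), count in sorted(events.items()):
        print(day, event, count)
    print(daily_users)
```

Summaries like these are the raw material for the graphs and statistics mentioned in the answer. Which ones are worth computing depends on the specific goals you set.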
I think the domain you are searching for is statistical data analysis. But there are many parts that go together here.
The best advice I can give you is to set specific goals for your analysis and then look for the best solution for each; your question is too open.
For example, if you are interested in visits, clicks and other website-related statistics, Google Analytics is a great tool and very easy to use.
