I have a dataset that lists states and their cotton and corn output. I wish to create a map of the U.S. to visually show the geographical distribution of the output. I think I should use Graph -> Map Chart -> Data (once I select the 2D map), but there I am stuck.
Any help will be greatly appreciated.
Let's suppose you want to create a map based on the MAPS.US dataset.
Your dataset with cotton and corn output should share a variable with MAPS.US that identifies the state: statecode (or state).
Then, with your dataset open, go to Graph -> Map Chart -> Data.
Next, edit "Map Data Source" and select the MAPS.US dataset. Then choose the "Id" role for the statecode variable and the "Response" role for the corn or cotton variable. Note that only one variable can be chosen as Response.
Click "Run" and see the resulting map.
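The matching that the "Id" role relies on can be illustrated outside SAS. The Python below is not SAS code; it is only a sketch of the idea: each row's state identifier must match a region ID in the map dataset, otherwise that row's response value has no region to be drawn on. The function name `match_response_to_map` and the sample codes and values are hypothetical.

```python
def match_response_to_map(map_ids, data, id_key, response_key):
    """Pair each row's response value with a map region via the shared ID.

    map_ids: region IDs present in the map dataset (e.g. the state codes).
    data: list of dicts, one per row of your own dataset.
    Returns (matched, unmatched_data_ids, unmapped_region_ids).
    """
    map_ids = set(map_ids)
    matched = {}
    unmatched = []
    for row in data:
        key = row[id_key]
        if key in map_ids:
            matched[key] = row[response_key]  # this region gets colored
        else:
            unmatched.append(key)             # ID not found in the map dataset
    unmapped = sorted(map_ids - set(matched))  # regions with no response value
    return matched, unmatched, unmapped


# Hypothetical codes and made-up output values, for illustration only.
matched, unmatched, unmapped = match_response_to_map(
    ["AL", "AR", "TX"],
    [{"statecode": "TX", "cotton": 10}, {"statecode": "ZZ", "cotton": 3}],
    "statecode",
    "cotton",
)
```

Checking `unmatched` before running the map is a quick way to spot state identifiers that are spelled or coded differently from the map dataset.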
Currently, I'm trying to create a combination chart (graph) in Spotfire where a specific record can be bucketed into one or more categories. For example, I have a category for newly created invoices as well as a category for invoices that are in process. An invoice can easily be both newly created and in process, and I want to show both categories as separate series so that I can monitor each category independently (e.g. if I have a population of 600 invoices, of which 500 are newly created and 300 are in process, I want to be able to drill down into the 500 newly created invoices or the 300 in-process ones, regardless of overlap).
Currently, I can create the graph by using a CASE statement for the Y-axis (e.g. if it's a "New Invoice" then 1, else 0), so the graph shows the correct number of records. However, with this method Spotfire doesn't know that only the records that satisfy the CASE statement should be marked; therefore, if I try to mark these specific transactions, I get details for all transactions.
Has anyone figured out a way around this? Obviously, if the categories were mutually exclusive, the marking would be easy, but since they overlap, I can't seem to crack what seems like a very simple problem.
I would suggest creating a calculated column that splits your data set accordingly. For example, use an if statement or case statements so that your calculated column returns "New Invoice", "In-progress Invoice", or "New and In-progress Invoice", based on your specific criteria for determining which bin each invoice falls into. Then you can use that new column to split the graph into series, and marking the specific invoices should work.
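The logic of that calculated column can be sketched in Python (in Spotfire it would be an if/case expression instead). The point is that every invoice gets exactly one label, so each marked bar maps to a well-defined set of rows. The function `invoice_bucket` and the boolean inputs are hypothetical stand-ins for your own criteria.

```python
from collections import Counter
from typing import Optional


def invoice_bucket(is_new: bool, in_process: bool) -> Optional[str]:
    """Return one mutually exclusive label per invoice."""
    if is_new and in_process:
        return "New and In-progress Invoice"
    if is_new:
        return "New Invoice"
    if in_process:
        return "In-progress Invoice"
    return None  # neither category; the calculated column would be empty


# Hypothetical invoices as (is_new, in_process) pairs.
invoices = [(True, False), (True, True), (False, True), (True, True)]
counts = Counter(invoice_bucket(n, p) for n, p in invoices)
```

With mutually exclusive labels, the "all new invoices" total is the sum of the "New Invoice" and "New and In-progress Invoice" series.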
Try creating a hierarchy with [OLD or NEW] at the top and [STATUS] (which may be Open, In Progress, or Closed) below it. Place this hierarchy on the X axis.
You would then see OLD, which can be further broken down into Open, In Progress, or Closed, and the same for NEW.
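What the two-level hierarchy displays can be sketched as nested counts: the first level is [OLD or NEW], the second is [STATUS]. This Python sketch only mimics the grouping; the row tuples and the function name `hierarchy_counts` are hypothetical.

```python
from collections import defaultdict


def hierarchy_counts(rows):
    """Count rows per [OLD or NEW] -> [STATUS], like a two-level X-axis hierarchy."""
    tree = defaultdict(lambda: defaultdict(int))
    for age, status in rows:
        tree[age][status] += 1
    # Convert to plain dicts for easier inspection.
    return {age: dict(statuses) for age, statuses in tree.items()}


# Hypothetical rows: (OLD or NEW, STATUS)
counts = hierarchy_counts(
    [("NEW", "Open"), ("NEW", "In Progress"), ("OLD", "Closed"), ("NEW", "Open")]
)
```

Each row lands in exactly one leaf, which is why marking a leaf selects only its own records.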
I am trying to rank subsets of my data in Spotfire.
Let's say I have a data table with the following 6 columns:
Individual, City, Zip Code, State, Amount1, and Amount2.
There are thousands of unique individuals in each Zip Code and many Zip Codes within each State. How would I display only the data from the top 5 Zip Codes within each State (as defined by the SUM() of Amount1)?
To summarize the order of operations: I want to sum Amount1 for each Zip Code, then rank the Zip Codes in descending order within each State (just an intermediate step, for explanatory purposes), and finally display only the top 5 Zip Codes within each State.
All I could think of was to create a calculated column that returns the Zip Code if it satisfies my conditions and NULL if it does not. I don't think it's the best approach, but here is the code I started with:
case WHEN DenseRank(Sum(Sum([Amount1]) over [ZipCode]) over [State],"desc")<6 then [ZipCode] ELSE NULL END
Any help would be great. Thanks!
Thanks for the clarification in the comments.
DenseRank(Sum([A1]) OVER ([ZipCode]),"desc",[State]) as [Rank]
The above function ranks each [ZipCode] within its respective [State] based on the SUM() of the amount in column [A1]. DenseRank() does NOT skip a ranking number after a tie, which means you could end up with more than 5 [ZipCode] values in your top 5. Rank() reduces this, because it skips numbers after a tie, although a tie at fifth place can still let more than five through.
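The difference between the two ranking styles can be checked with a small Python sketch. It assumes Rank() breaks ties the "minimum" way (standard competition ranking, 1-2-2-4) and DenseRank() the dense way (1-2-2-3); the function names and totals are hypothetical.

```python
def dense_rank_desc(values):
    """Dense rank, highest = 1; ties share a rank and no numbers are skipped."""
    distinct = sorted(set(values), reverse=True)
    position = {v: i + 1 for i, v in enumerate(distinct)}
    return [position[v] for v in values]


def min_rank_desc(values):
    """Competition rank, highest = 1; ties share the lowest rank, then numbers are skipped."""
    ordered = sorted(values, reverse=True)
    first = {}
    for i, v in enumerate(ordered):
        first.setdefault(v, i + 1)  # keep the first position each value reaches
    return [first[v] for v in values]


# Hypothetical per-ZipCode totals within one state, with a tie at 90.
totals = [100, 90, 90, 80, 70, 60, 50]
dense = dense_rank_desc(totals)  # [1, 2, 2, 3, 4, 5, 6]
comp = min_rank_desc(totals)     # [1, 2, 2, 4, 5, 6, 7]
```

With the filter `rank < 6`, the dense ranking keeps six zip codes here, while the competition ranking keeps five.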
Then, you can create a calculated column for your filter panel, or just filter it in the "Limit Data using Custom Expressions" section of your chart.
If([Rank] < 6,"Top 5", "Other") as [Zip Rank in State]
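The whole order of operations from the question (sum per zip code, rank within state, label the top N) can be sketched in Python to sanity-check the Spotfire results. It assumes each zip code belongs to a single state and uses dense ranking to match DenseRank(); `label_top_zips` and the sample rows are hypothetical.

```python
from collections import defaultdict


def label_top_zips(rows, n=5):
    """rows: (state, zip_code, amount1) tuples, one per individual.

    Returns {zip_code: "Top n" or "Other"}, ranking zip totals densely
    in descending order within each state.
    """
    totals = defaultdict(float)  # like Sum([A1]) OVER ([ZipCode])
    state_of = {}
    for state, zip_code, amount in rows:
        totals[zip_code] += amount
        state_of[zip_code] = state  # assumes one state per zip code

    by_state = defaultdict(list)
    for zip_code, total in totals.items():
        by_state[state_of[zip_code]].append((zip_code, total))

    labels = {}
    for zips in by_state.values():
        distinct = sorted({t for _, t in zips}, reverse=True)
        rank = {t: i + 1 for i, t in enumerate(distinct)}  # dense rank, "desc"
        for zip_code, total in zips:
            labels[zip_code] = f"Top {n}" if rank[total] <= n else "Other"
    return labels


# Hypothetical rows; n=2 keeps the example small.
labels = label_top_zips(
    [("TX", "z1", 4), ("TX", "z1", 6), ("TX", "z2", 7), ("TX", "z3", 3), ("CA", "z4", 5)],
    n=2,
)
```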
I have created a new experiment in Azure Machine Learning and added two datasets by manually uploading CSV files.
One contains the orders of a customer for whom I'd like to predict which products they will order next.
The second dataset has the same type of data, but from all other customers, as a reference for learning.
I have productid, amount, orderdate, and orderid for grouping the orders and placing them on a timeline.
The customer (dataset one) is always several months behind in ordering the latest products; therefore I added dataset two, with all other customers, as a reference.
The reference can also tell which products are more popular (ordered more often and by several customers), so perhaps I should add a customerid column to the dataset.
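The popularity signal mentioned above (how often a product is ordered, and by how many customers) is a simple aggregation once customerid is present. This Python sketch shows the computation on hypothetical rows; the function name and tuple layout are assumptions, and in Azure Machine Learning the same result would come from its own aggregation or scripting modules.

```python
from collections import defaultdict


def product_popularity(orders):
    """orders: (customerid, orderid, productid, amount) tuples.

    Returns {productid: (number_of_orders, number_of_distinct_customers)}.
    """
    order_ids = defaultdict(set)
    customers = defaultdict(set)
    for customerid, orderid, productid, _amount in orders:
        order_ids[productid].add(orderid)
        customers[productid].add(customerid)
    return {p: (len(order_ids[p]), len(customers[p])) for p in order_ids}


# Hypothetical order lines.
popularity = product_popularity([
    ("c1", "o1", "p1", 2),
    ("c1", "o1", "p2", 1),
    ("c2", "o2", "p1", 5),
    ("c3", "o3", "p1", 1),
    ("c2", "o4", "p2", 3),
])
```

Counting distinct orders and distinct customers separately keeps one heavy buyer from making a product look broadly popular.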
I know how to start and get the data in, and I know that it is common to split the data for training, feed it to Train Model together with an Ilearnerdotnet-type learner, pass the output to Score Model, and evaluate the model.
I do not know how to choose a classification type, or how it can produce an output for the next three months of orders. I have read some tutorials, but I just need someone who can give me some pointers.
Edit: I have added the customerid to the dataset, so I now have just one set, which I should split to focus on a specific customer.
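One way to split the combined set is sketched below in Python. It separates the target customer from the reference customers and holds out the target customer's most recent orders, since the goal is to predict future months. The chronological cutoff (rather than a random split) is an assumption, and `split_orders` and the sample rows are hypothetical.

```python
from datetime import date


def split_orders(rows, target_customer, cutoff):
    """rows: dicts with 'customerid' and 'orderdate' (datetime.date).

    Returns (reference, target_train, target_test):
      reference    - all other customers' orders, usable as extra training data
      target_train - the target customer's orders before the cutoff
      target_test  - the target customer's orders on or after the cutoff
    """
    reference, target_train, target_test = [], [], []
    for row in rows:
        if row["customerid"] != target_customer:
            reference.append(row)
        elif row["orderdate"] < cutoff:
            target_train.append(row)
        else:
            target_test.append(row)
    return reference, target_train, target_test


# Hypothetical rows; the cutoff marks the start of the held-out period.
rows = [
    {"customerid": "A", "orderdate": date(2016, 1, 5)},
    {"customerid": "A", "orderdate": date(2016, 5, 1)},
    {"customerid": "B", "orderdate": date(2016, 6, 1)},
]
reference, target_train, target_test = split_orders(rows, "A", date(2016, 4, 1))
```

Evaluating on the held-out months approximates the real use case of predicting the next three months of orders.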
Edit 2: I found these templates and will look into them: https://stackoverflow.com/a/36552849/169714
Go over this infographic: http://download.microsoft.com/download/0/5/A/05AE6B94-E688-403E-90A5-6035DBE9EEC5/machine-learning-basics-infographic-with-algorithm-examples.pdf
If the infographic above doesn't help, you can try all of the learners with this experiment and use the one with the best results: https://gallery.cortanaintelligence.com/Experiment/Algo-Evaluater-Compare-Performance-of-Multiple-Algos-against-Your-Data-1
I am interested in creating a gvNIX/Roo application that shows the locations of health facilities in Tanzania on a map. I am trying the tutorial available here. However, my data is in the format shown below, where the location data is in two columns (southings and eastings). The tutorial shows how to create three data types:
field geo --fieldName location --type POINT --class ~.domain.Owner
field geo --fieldName distance --type LINESTRING --class ~.domain.Owner
field geo --fieldName area --type POLYGON --class ~.domain.Owner
I am assuming I need the POINT data type to hold each health facility's location, but I am not sure how to get the two columns below (southings and eastings) into a single POINT variable. I am pretty new to GIS as well. The data is as follows (CSV format):
outlet_name,Status ,southings,eastings,streetward,name_of_outlet
REHEMA MEDICS,02,2.49993,32.89512,K/POLISI,REVINA
KIRUMBA MEDICS,02,2.50023,32.89503,K/POLISI,GEDION
KIRUMBA PHARMACY,02,2.50152,32.89742,K/POLISI,MAURETH
TULI MEDICS,02,2.48737,32.89686,KITANGIRI,TULI
JULLY MEDICS,02,2.53275,32.93855,BUZURUGA,JULLY
MAGOMA MEDICS,02,2.53181,32.94211,BUZURUGA,MAGOMA
MECO PHARMACY,02,2.52923,32.94730,MECCO,DORCAS
UPENDO MEDICS,02,2.52923,32.94786,MECCO,UPENDO
DORIS MEDICS,02,2.49961,32.89191,KABUHORO,DORIS
SOPHIA MEDICS,02,2.49975,32.89120,KABUHORO,ESTER
MWALONI PHAMCY,02,2.56351,32.89416,MWALONI,ESTER
SILVER PHAMACY,02,2.51728,32.90614,K/KILOMERO,WANDWATA
KIBO PHARMACY,02,2.51688,32.90710,MISSION,MARIAM
Thanks
You need to transform your coordinates to WKT (Well-Known Text) format in order to insert them into a column of your database (a PostgreSQL database with PostGIS support). To do this, follow these steps:
1. Find the SRID of your coordinate reference system (CRS), that is, the identifier that defines your coordinate system. Otherwise, your points won't match the real coordinates. You'll need the SRID in the last step.
2. Transform your data to WKT. The data needed for inserting the points is in the southings and eastings columns (I suppose they correspond to latitude and longitude, which are the most commonly used), so you'll need to combine these two columns into one single column in WKT format. Because the column is named southings, the values are presumably degrees south of the equator (which fits Tanzania), so the latitude should be negative. For your first row of data: Point(32.89512 -2.49993). Note the space (not a comma) between the numbers, and that their order is switched: WKT expects X (longitude/easting) first, then Y (latitude).
3. Proceed with the inserts using SQL syntax, but with PostGIS functions. An example for your first row would be: INSERT into health_facilities (outlet_name, Status, streetward, location) VALUES ('REHEMA MEDICS', 02, 'K/POLISI', ST_GeomFromText('Point(32.89512 -2.49993)', 4326)); Here "4326" is the SRID you found in step 1 (assuming it is the most common one, EPSG:4326).
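The conversion steps can be sketched in Python: read the CSV, build a WKT point per row, and produce a parameterized INSERT that calls ST_GeomFromText. This is a sketch under assumptions: SRID 4326, "southings" meaning degrees south (hence the negated latitude), the table and column names taken from the example insert, and `%s` placeholders in the style psycopg2 accepts. Status is kept as the text "02"; convert it if your column is numeric. The helper names are hypothetical.

```python
import csv
import io

# First rows of the question's CSV (header kept exactly, including "Status ").
SAMPLE = """outlet_name,Status ,southings,eastings,streetward,name_of_outlet
REHEMA MEDICS,02,2.49993,32.89512,K/POLISI,REVINA
KIRUMBA MEDICS,02,2.50023,32.89503,K/POLISI,GEDION
"""

INSERT_SQL = (
    "INSERT INTO health_facilities (outlet_name, Status, streetward, location) "
    "VALUES (%s, %s, %s, ST_GeomFromText(%s, 4326))"
)


def to_wkt_point(southing, easting):
    # WKT order is X (longitude/easting) first, then Y (latitude).
    # Assumption: southings are degrees south of the equator, so latitude is negative.
    lat = -float(southing)
    lon = float(easting)
    return f"POINT({lon} {lat})"


def build_inserts(csv_text):
    """Yield (sql, params) pairs, one per CSV row, ready for cursor.execute()."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        wkt = to_wkt_point(row["southings"], row["eastings"])
        params = (row["outlet_name"], row["Status "].strip(), row["streetward"], wkt)
        yield INSERT_SQL, params


statements = list(build_inserts(SAMPLE))
```

Passing the WKT string as a parameter, rather than formatting it into the SQL text, avoids quoting problems with outlet names.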
You can find more info here and here. There are also several pages where you can check coordinates and transform them between different CRSs, like this and this.
I have a dataset, and I would like to produce a prediction model based on that dataset using Microsoft Azure.
This dataset contains groups of events that together make up a bigger event; for example, a few rows that are close in time (there is a time column) together form one event.
Does anybody know how I can do this? Is there any way to create a prediction model that learns not from a certain column, but from a different dataset (of results, for that matter)?
Thanks
Yes, this is possible with Azure Machine Learning; have a look in the Gallery for various examples that show this.
For your specific question, I think it would be good to look at the Bike regression example. In that example they show how they aggregate information from multiple rows to score on a single feature.
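Aggregating rows that are close in time into one event can be sketched in Python with a simple gap rule: after sorting by time, a row joins the current event if it is at most `max_gap` after the previous row; otherwise it starts a new event. Each event is then summarized into one row of features for the model. The gap rule, the 5-minute threshold, the summary fields, and `group_into_events` are assumptions for illustration, not the method used in the Bike regression example.

```python
from datetime import datetime, timedelta


def group_into_events(rows, max_gap):
    """rows: (timestamp, value) tuples.

    Returns one summary dict per event, where consecutive rows at most
    max_gap apart (after sorting by time) belong to the same event.
    """
    events = []
    current = []
    for ts, value in sorted(rows):
        if current and ts - current[-1][0] > max_gap:
            events.append(current)  # gap too large: close the current event
            current = []
        current.append((ts, value))
    if current:
        events.append(current)
    return [
        {
            "start": ev[0][0],
            "end": ev[-1][0],
            "n_rows": len(ev),
            "total": sum(v for _, v in ev),
        }
        for ev in events
    ]


# Hypothetical rows: two close together, one an hour later.
t0 = datetime(2016, 1, 1, 12, 0)
events = group_into_events(
    [(t0, 1), (t0 + timedelta(minutes=2), 2), (t0 + timedelta(hours=1), 5)],
    max_gap=timedelta(minutes=5),
)
```

The resulting one-row-per-event table can then be joined with the dataset of results, so the model learns from event-level features.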