Filtering block data in Excel by rows (weather forecast data from National Weather Service) - excel

I need weather forecast data covering several years for 18 US cities. I can obtain the data from the National Weather Service, but the Excel file contains data for all US cities and the data is presented in blocks. I got the data in uncompressed format from here: http://www.mdl.nws.noaa.gov/~mos/archives/mex.html. For example, here is the data for January 2006: https://www.dropbox.com/s/lf552bgbwdyusli/01.2006.xlsx. I need only the min and max temperature for 18 particular cities, which means I need only the first 5 rows of data for each city. The data cannot be sorted by columns and cannot be transposed, as it contains text cells.
I was wondering whether there is a faster way to select the data I need than searching for each particular city and copying and pasting the data into a new worksheet. This method would be fine if I had only a couple of files, but I have around 100 of them.
Thank you very much for your help!
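One possible way to automate the selection is sketched below in Python. It is only a sketch under stated assumptions: each city block is assumed to start on a row whose first cell begins with the station identifier, and every block is assumed to have at least 5 rows. The function name extract_city_blocks and the station codes in the comments are hypothetical, not taken from the file.

```python
def extract_city_blocks(rows, stations, n_rows=5):
    """Return the first n_rows rows of every block that belongs to a wanted station.

    rows     -- worksheet rows as lists (or tuples) of cell values, in sheet order
    stations -- station identifiers to keep (hypothetical examples: "KBOS", "KDEN")
    Assumption: a block starts on a row whose first cell begins with the station ID,
    and every block contains at least n_rows rows.
    """
    wanted = set(stations)
    kept = []
    remaining = 0  # rows still to copy from the current wanted block
    for row in rows:
        # First whitespace-separated token of the first cell, if any
        first = str(row[0]).split() if row and row[0] is not None else []
        if first and first[0] in wanted:
            remaining = n_rows  # header row of a wanted block: start copying
        if remaining > 0:
            kept.append(row)
            remaining -= 1
    return kept
```

The rows could come from any reader, for example openpyxl's ws.iter_rows(values_only=True), and the same call can be repeated in a loop over all ~100 files.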

Related

Create a table that mimics an Excel Pivot Table with the Product Value Field for a too large set of data

I have a table with more than 1 million rows of data, so I can't put the information into Excel. The first column is the identifier and the second column is the percent increase on that identifier. The table records all the increases over a year, so an ID in the first column can have more than one increase during the year.
I want to calculate the total increase. If the data were small enough to fit in Excel, I would just pivot the table, putting the ID in the Rows bucket and the Product of the Increase in the Values bucket. This gives me one row for each unique ID and the total percentage increase for the year.
The matrix visualization in Power BI doesn't help since it doesn't have a similar Product summarization. On top of that, there are more than 1 million unique IDs in the dataset, so I can't export it due to the 150K row limit. I need to create a new table in Power BI that does the same thing, because I want to bring in related data from another table that includes categories and then average by category.
Is there a way to do this? Please let me know any questions you have and if you need any additional information or clarification. Thanks.
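For illustration only, the "product per ID" aggregation can be sketched outside Power BI in Python. The sketch assumes the increases are percentages that compound (for example +10% followed by +20% gives 32%); the function name total_increase_by_id and the record layout are hypothetical. Inside Power BI, the DAX iterator PRODUCTX may be worth looking at for the same purpose.

```python
from collections import defaultdict


def total_increase_by_id(records):
    """Compound the increases recorded for each ID.

    records -- iterable of (id, pct_increase) pairs, e.g. ("A", 10.0) for +10%
    Assumption: increases compound, so +10% then +20% gives 1.10 * 1.20 - 1 = 32%.
    """
    factors = defaultdict(lambda: 1.0)
    for key, pct in records:
        factors[key] *= 1.0 + pct / 100.0  # running product of growth factors
    # Convert each growth factor back to a percentage increase
    return {key: (factor - 1.0) * 100.0 for key, factor in factors.items()}
```

Because the records are consumed one at a time, the number of input rows is not limited by a worksheet; only one running factor per unique ID is kept in memory.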

How to Group Measures and Make Columns Table Source Name

I'm new to Tableau! I hope this has a simple answer. Thanks in advance!
I'm working with employee data and I need to create a matrix of headcount totals across years and months.
Final Matrix Output Example
I'm starting with 6 tables listing all active employees at the beginning of each year from 2015 through 2020. I then have a list of employees and the dates they were hired, i.e. all employee additions. I then have the same thing for terminations. All 8 of these tables are in the same Excel file, but as separate tables.
List of Data Tables
How can I take this data and create the matrix I linked above? I've tried creating calculated fields to count the number of active employees for each time period, but I can't then seem to get the matrix to organize itself correctly in a table.
Current Status
I feel like the easiest solution would be to query this so that I just have a snapshot of all active employees at the beginning of each month and year with month and year columns, but I'm not sure how to convert what I have now, into that sort of structure.
Thanks again.
I fear you will have to restructure your data extensively before building a view/crosstab, as is evident from the current state of your data (the screenshot you shared). You can do that much more easily in Excel. Meanwhile, I recommend reading the tidy data paper by Hadley Wickham, a renowned statistician/data scientist: https://vita.had.co.nz/papers/tidy-data.pdf
Still, here are the steps you can follow:
Step-1 Rename the columns of all headcount tables by removing the years from them (keep the year in the sheet names instead). This gives all of your headcount tables the same column names.
Step-2 UNION all these headcount tables in the data tab of Tableau. Keep the sheet names in a separate column, which will later be used to extract the year values.
Step-3 PIVOT all month columns to rows (in the data tab only).
Step-4 Extract the year from the file/sheet name column.
Step-5 This gives a table structure with three useful columns for building your crosstab: 1. Year (to be placed on columns); 2. Month (to be placed on rows); and 3. Headcount value (to be placed on the text marks card).
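The reshaping in the steps above can also be sketched outside Tableau. The Python below is a minimal illustration, not the Tableau workflow itself; the sheet names ("Headcount 2015"), the shortened month list, and the function names are assumptions made for the example.

```python
MONTHS = ["Jan", "Feb", "Mar"]  # shortened for illustration


def union_and_pivot(tables):
    """Stack per-year wide tables into long (year, month, headcount) rows.

    tables -- dict mapping a sheet name such as "Headcount 2015" (hypothetical)
              to a list of row dicts keyed by month name
    """
    long_rows = []
    for sheet_name, rows in tables.items():
        year = int(sheet_name.split()[-1])  # Step 4: year taken from the sheet name
        for row in rows:                    # Step 2: union of all sheets
            for month in MONTHS:            # Step 3: month columns become rows
                long_rows.append(
                    {"year": year, "month": month, "headcount": row[month]}
                )
    return long_rows


def crosstab(long_rows):
    """Step 5: months as rows, years as columns, summed headcount in each cell."""
    table = {}
    for r in long_rows:
        cell = table.setdefault(r["month"], {})
        cell[r["year"]] = cell.get(r["year"], 0) + r["headcount"]
    return table
```

The long format produced by union_and_pivot is the "tidy" structure the paper describes: one row per year and month, which any crosstab can then be built from.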

Extract data from sample sheet with another sheet telling what factor to multiply the data and when to extract

I have three source sheets that show expected performance of three types of cities for 24 months (Small, Medium and Large).
Typical view - this tab's called small city sample.
I have a 'city launch' sheet. Essentially it is a matrix that shows a timeline of possible city launches and the number of city launches of each type in a month (a factor plus the timeline spreading of the sample sheet).
For example, in Jan-20: 2 small cities in country A, 1 large city in country A, 1 medium city in country B, etc., with a total (SUMIFS) at the top.
I'm trying to figure out a macro that, based on the number of 'cities' launched in a specific month (from the city launch tab), extracts the data from the city tabs, multiplies it by the factors, then spreads it over the timeline and adds it to the preceding data in the consolidated tab.
The data in is static, as they are the base.
The consolidated tab is essentially a detailed version of sample city tabs where the 'levers' are essentially in .
How could I attempt this?
I would look at writing a function in the Consolidated tab which takes the "small", "medium" and "large" tables as arguments, and the city/size as an argument too. Then use the city/size argument to pick the relevant small/medium/large table, iterate over the headers to find the correct factor, and carry out your arithmetic in the function.
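As a rough sketch of that idea (in Python rather than a worksheet function or macro), the consolidation can be seen as adding up shifted, scaled copies of the sample profiles. The data layout is an assumption: each sample sheet is reduced to a single list of monthly values, and each launch is a (start month index, size, count) triple read from the launch matrix. The name consolidate is hypothetical.

```python
def consolidate(profiles, launches, horizon):
    """Sum shifted, scaled sample profiles into one consolidated timeline.

    profiles -- dict size -> list of monthly values (one row of a sample sheet)
    launches -- list of (month_index, size, count) taken from the launch matrix
    horizon  -- number of months in the consolidated timeline
    """
    total = [0.0] * horizon
    for start, size, count in launches:
        for offset, value in enumerate(profiles[size]):
            month = start + offset
            if month >= horizon:
                break  # the profile runs past the end of the timeline
            total[month] += count * value  # scale by the number of launches
    return total
```

For example, two small cities launched in month 0 and one large city launched in month 1 each contribute their own profile, offset by the launch month, and overlapping months are added together.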

Converting Hourly Data to Daily Data for many different Excel files

I have been downloading and organizing hourly water quality data into Excel for many different states, and have organized them by year. I have done data prep to make sure there are no zeros and that every day of the year (DOY) has 24 values, but the time series plots were too noisy, so I now need the daily average values instead.
Each site's annual data differs in how many days are available, and sometimes whole months are missing because nothing was recorded.
So my question is: how can I develop code that gives me the average daily value linked to a specific DOY and that I can apply to many different Excel sheets? The data appears like this:
And the files are saved like CA1_2012 (California Site 1 hourly data from 2012).
I know there is a lot on this topic but I have been trying everything and I can't get a code that works!
You can get the sum of the second column grouped by the values in the first column in MATLAB using accumarray:
[m,~,n] = unique(data(:,1));              % m: unique DOY values, n: group index of each row
sumdata = [m, accumarray(n,data(:,2))];   % one row per DOY with its summed value
For the mean I would suggest grpstats:
avgdata = grpstats(data, DOY, {'mean'});  % DOY: grouping vector of day-of-year values
or, as @gnovice suggested:
avgdata = accumarray(DOY, data, [], @mean);  % here data must be the vector of values
You can also get what you want by using a PivotTable in Excel: group the data by DOY and take the mean value for each group in the table (no coding required).
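The same grouping can be sketched in Python for readers without MATLAB. This is a minimal sketch assuming each record is a (DOY, value) pair already read from a sheet; the function name daily_means is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def daily_means(records):
    """Average the hourly values of each day of the year.

    records -- iterable of (doy, value) pairs
    Returns {doy: mean value}, ordered by doy; days with no data are simply absent.
    """
    by_day = defaultdict(list)
    for doy, value in records:
        by_day[doy].append(value)  # collect all hourly values of that day
    return {doy: mean(values) for doy, values in sorted(by_day.items())}
```

Since missing days or months just produce no entry, the same function can be applied unchanged to every site and year, for example in a loop over files named like CA1_2012.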

Selectively copying a data list in Excel

I'm looking to selectively copy a list of data in Excel for the purposes of reducing the quantity.
In the first column I have Date/time and in the second column I have a data value, in this case it's electrical meter readings.
The data is currently given every 15 minutes and what I'm trying to do is reduce that to every hour, i.e. effectively create a new column which extracts only the on-the-hour data from the original list (with no gaps in the rows, therefore condensing the length of the list).
Any advice much appreciated!
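One way to think about the reduction is as a filter that keeps only rows stamped exactly on the hour. The Python below is a sketch under that assumption (readings fall exactly on :00, :15, :30 and :45); the function name keep_hourly is hypothetical. In Excel itself, a helper column testing MINUTE of the timestamp against 0, followed by a filter, might achieve the same result.

```python
from datetime import datetime


def keep_hourly(readings):
    """Keep only the readings stamped exactly on the hour.

    readings -- list of (datetime, value) pairs at 15-minute intervals
    The result is a new, gap-free list in the original order.
    """
    return [(ts, v) for ts, v in readings if ts.minute == 0 and ts.second == 0]
```

If the hourly figure should instead be the consumption over the hour, the difference between consecutive kept meter readings can be taken afterwards.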
