I've got a spreadsheet with a few thousand tweets in it, each one time-stamped like so:
18/05/2014 21:14
Amongst other functions I've sorted out, I'd like to plot all the tweets on a graph grouped together by say 15-minute segments.
How can I group those different times (tweets) together?
I found a solution to this on another forum:
http://www.mrexcel.com/forum/excel-questions/359025-pivot-table-grouping-5-min-intervals.html
See the final answer.
It suggests adding a column to your raw data that contains a formula to round each time down to the previous increment.
For you, the formula would be as follows, where A1 contains the time stamp.
=TIME(HOUR(A1),FLOOR(MINUTE(A1),15),0)
In your example, 21:14 would become 21:00. 21:16, meanwhile, would become 21:15.
If you format your data as a table, the formula will auto-fill for all of the tweets.
Then, you can use a pivot chart based on the new column to create your graph. Since everything is rounded down, the label 21:00 covers all tweets posted from 21:00 up to, but not including, 21:15.
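If you ever want to check the bucketing outside Excel, here is a minimal Python sketch of the same round-down idea. The function names (floor_to_interval, count_by_interval) and the sample timestamps are made up for illustration. Unlike the TIME() formula, which returns a time of day only, this sketch keeps the date, so tweets from different days land in different buckets.

```python
from collections import Counter
from datetime import datetime


def floor_to_interval(ts, minutes=15):
    """Round a datetime down to the start of its interval.

    Assumes `minutes` divides 60 evenly (5, 10, 15, 20, 30 ...).
    """
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def count_by_interval(stamps, minutes=15, fmt="%d/%m/%Y %H:%M"):
    """Parse timestamp strings and count how many fall in each interval."""
    buckets = Counter(
        floor_to_interval(datetime.strptime(s, fmt), minutes) for s in stamps
    )
    return dict(sorted(buckets.items()))


# Hypothetical sample data in the same format as the question.
sample = ["18/05/2014 21:14", "18/05/2014 21:16",
          "18/05/2014 21:29", "18/05/2014 21:00"]
counts = count_by_interval(sample)
for start, n in counts.items():
    print(start.strftime("%d/%m/%Y %H:%M"), n)
```

The resulting counts per interval start are what the pivot chart would plot.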
I'm new to Excel so I don't know if it's even possible to do what I'm trying to.
I want to build a graph with four columns: three showing direct data and a fourth showing the average of older data. Every week I add a new column of data, which should become the first column in the graph, pushing the previous first column to second place and so on, until what was the third column becomes part of the average column. I'm uploading some images to make things clearer.
Sorry about the paint images =/
In case it's not clear: the average should be the mean of all older columns, and my graph should always have 4 columns no matter how many older ones there are.
I am open to other implementation methods that could give me a similar solution.
I'm new to Tableau and new to posting on Stack Overflow, so bear with me.
I have a dataset with variables such as State, County, Organization, 2020 Enrollment, 2021 Enrollment, and Delta (change in enrollment over those two years). What I want is a column that gives the percent delta in enrollment over these two years.
The first thing I tried was calculating a column just using the growth formula:
(ZN([2021Enrolled])-ZN([2020Enrolled]))/ZN([2020Enrolled])
In the Data View this works great: because nothing is being summed, I get the correct delta. But when I use this formula in my worksheet, it is calculated for each observation (there are several observations per county, per organization, for example) and the results are then summed. This gives an incorrect year-over-year delta.
What I am looking for is a way to calculate the % delta column based on the total enrollments for 2020 and 2021 in order to achieve the correct % delta.
I included two screenshots below showing what Tableau is giving, and then an Excel spreadsheet of the same data filtered on just one county to show the problem a little better.
Maybe a similar question has been asked before, but I was unsure just how to search this up. Any help would be appreciated.
Thanks!
Sam
Tableau view
Excel view
I found the answer: I was trying to create a calculated column in Data View. What I needed to do was create the calculated column in my worksheet view, so that it would only work on the data presented there.
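For anyone hitting the same thing, the difference between the two results is easy to reproduce outside Tableau. Below is a small Python sketch with made-up numbers (the rows and field names are hypothetical, not taken from the screenshots): summing the row-level deltas gives one figure, while computing the delta from the totals gives another. In Tableau terms the second version presumably corresponds to aggregating each field with SUM() before dividing, but treat that as an assumption about the setup rather than the confirmed fix.

```python
# Hypothetical rows: two observations for the same county.
rows = [
    {"county": "A", "enrolled_2020": 100, "enrolled_2021": 110},
    {"county": "A", "enrolled_2020": 50, "enrolled_2021": 40},
]


def sum_of_row_deltas(rows):
    """Compute the delta per row, then sum: what the worksheet was showing."""
    return sum(
        (r["enrolled_2021"] - r["enrolled_2020"]) / r["enrolled_2020"]
        for r in rows
    )


def delta_of_totals(rows):
    """Sum each year first, then compute the delta once on the totals."""
    total_2020 = sum(r["enrolled_2020"] for r in rows)
    total_2021 = sum(r["enrolled_2021"] for r in rows)
    return (total_2021 - total_2020) / total_2020


print(sum_of_row_deltas(rows))  # +10% and -20% added together
print(delta_of_totals(rows))    # 150 -> 150, so no change overall
```

Here the two rows cancel out in total enrollment, yet the summed row deltas suggest a 10% drop, which is the kind of mismatch described in the question.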
Let's say we have documented the time slots in which a production line was running. Between the products manufactured are time slots in which the machine was idling.
I now want to plot the machine status over time, basically as a boolean value (running vs idling).
I get the machine log and need the chart on the right.
The machining duration will ultimately be logged including seconds and may vary for each product.
The first - and probably biggest - challenge for me is to find a smart way to extract the status from the time stamps. My current first step is to create a table row for each minute and use the IF statement in H4 to check whether article 1 was being manufactured.
IF(AND([@Time]>Machine_log[@Start],[@Time]<Machine_log[@Finish]);;)
However, since the final list will span 24 hours or more and the number of articles quickly reaches 50 or more, I would love to avoid using nested IFs on this one.
I'm thankful for any input and open to inspiration :)
Thank you all in advance!
PS: Does anyone know a better way than a scatter chart with two values per X-value to display the chart as vertical lines/right angles like this?
One option is to add only those points that are necessary to the Status extraction table (which I named "Status"). (I named the Machine log table "Log").
Note: it looks like you are using a semicolon list separator, so you'll need to change the commas in the formulas below to semicolons.
Formula for the Time column:
=IF(ROW()=ROW(Status),MIN(Log[Start])-1/144,IFERROR(INDEX(Log[[Start]:[Finish]],INT((ROW()-ROW(Status)-1)/4)+1,MOD(INT((ROW()-ROW(Status)-1)/2),2)+1),MAX(Log[Finish])+1/144))
Formula for the Production running? column (enter into H4 and fill down):
=IF(SUMPRODUCT(--(Log[[Start]:[Finish]]=[@Time])),IF([@Time]=G3,3-H3,H3),1)
These formulas will pad your plot with 10 minutes of off time on either side.
To answer your question about avoiding two points for each x-value: no, each point on the plot has to have a corresponding data pair.
UPDATE IN RESPONSE TO COMMENT: I failed to mention that the above solution assumes the time data in the Machine log table are in ascending order. This means that if your data span more than one day, they will need to contain a date component or you can get plots where the line crosses back to the beginning. For example, if you have 23:57:00 followed by 00:10:00 with no date component, Excel treats these as 11:57 pm on 1 January 1900 and 12:10 am on 1 January 1900. (To see this, change the format to "General", and you'll see the values that Excel uses to encode date-time aren't in ascending order.) The solution is to enter the dates as "8/16/2020 23:57:00" and "8/17/2020 00:10:00" in the formula bar. If you're copying over from another data source, the date needs to be copied with the time. If the dates and times are in separate columns, your Start and Finish columns would each be a date column plus a time column.
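To make the logic of the Status table easier to follow, here is a rough Python sketch of the same idea: emit only the transition points, two per edge, plus 10 minutes of padding at each end. It is not a translation of the formulas above. The function name status_points and the sample log are hypothetical, and it uses 0/1 for idle/running instead of the values the formulas produce. Like the Excel solution, it assumes the log is in ascending order, does not overlap, and includes the date.

```python
from datetime import datetime, timedelta


def status_points(log, pad=timedelta(minutes=10)):
    """Turn (start, finish) pairs into (time, status) points for a step plot.

    `log` must be sorted ascending and non-overlapping. Each start and
    finish contributes two points with the same x-value, which is what
    draws the vertical edge on a scatter chart with straight lines.
    """
    if not log:
        return []
    points = [(log[0][0] - pad, 0)]             # idle padding before first run
    for start, finish in log:
        points += [(start, 0), (start, 1)]      # rising edge
        points += [(finish, 1), (finish, 0)]    # falling edge
    points.append((log[-1][1] + pad, 0))        # idle padding after last run
    return points


# Hypothetical log spanning midnight; the date component keeps it ascending.
log = [
    (datetime(2020, 8, 16, 23, 40), datetime(2020, 8, 16, 23, 57)),
    (datetime(2020, 8, 17, 0, 10), datetime(2020, 8, 17, 0, 25)),
]
points = status_points(log)
for t, s in points:
    print(t.isoformat(sep=" "), s)
```

Each log entry adds four points, so 50 articles need only about 200 rows instead of one row per minute.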
I have been downloading and organizing hourly water quality data into Excel for many different states, and have organized them by year. I have done data prep on them to make sure there are no zeros and that every day of the year (DOY) has 24 values, but the time series plots were too noisy, so I want to get daily average values instead.
Each site's annual data differs in how many days are available, and sometimes whole months are missing because nothing was recorded.
So my question is, how can I develop a code to give me the average daily value linked to a specific DOY that I can apply to many different Excel sheets. The data appears like this:
And the files are saved like CA1_2012 (California Site 1 hourly data from 2012)
I know there is a lot on this topic but I have been trying everything and I can't get a code that works!
You can get the sum of the second column, grouped by the values in the first column, in MATLAB using accumarray:
[m,~,n] = unique(data(:,1));
sumdata = [m, accumarray(n,data(:,2))];
For the mean, I would suggest grpstats (from the Statistics and Machine Learning Toolbox):
avgdata = grpstats(data, DOY, {'mean'});
or as @gnovice suggested:
avgdata = accumarray(DOY, data, [], @mean);
You can also get what you want by using a PivotTable in Excel: group the data by DOY and show the mean value for each group. (No coding required.)
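If neither MATLAB nor a PivotTable is convenient, the same group-and-average step is short in plain Python. This is only a sketch: the function name daily_means and the sample values are made up, and it assumes you have already read each sheet into (DOY, value) pairs. Days with no recordings simply do not appear in the result, so gaps of whole months need no special handling.

```python
from collections import defaultdict
from statistics import mean


def daily_means(records):
    """Average values per day of year.

    `records` is an iterable of (doy, value) pairs, in any order.
    Returns {doy: mean_value}, sorted by DOY; missing days are absent.
    """
    groups = defaultdict(list)
    for doy, value in records:
        groups[doy].append(value)
    return {doy: mean(values) for doy, values in sorted(groups.items())}


# Hypothetical hourly readings (shortened to a few values per day).
hourly = [(1, 7.0), (1, 9.0), (2, 4.0), (2, 6.0), (2, 8.0), (5, 3.0)]
averages = daily_means(hourly)
for doy, avg in averages.items():
    print(doy, avg)
```

The same function can be applied to each file (CA1_2012 and so on) in a loop once the reading step is in place.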
I am trying to plot some time series data, but in a way that has stumped me so far. The salient part here is that each data point is associated with an open date and a closed date. I would like a time series line graph that counts the number open on a given date.
Example: Open - Close
first record: 2/10/2013 - 3/1/2013
second record: 2/15/2013 - 3/5/2013
The graph I'm looking for would start at 0, rise to 1 on 2/10, rise again to 2 on 2/15, then drop to 1 on 3/1 and back to 0 on 3/5.
The actual dataset contains hundreds of records, so manual processing is out of the question. I'm sure there must be an easy way to do it, but I have not found it yet. Tried help and google search, but I'm not exactly sure what I'm looking for.
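Use the COUNTIFS() function like so (here the open dates are in column B, the close dates in column C, and the date being evaluated in E3):
=COUNTIFS($B$3:$B$7,"<="&E3,$C$3:$C$7,">="&E3)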
So, you specify the category labels, and then use the COUNTIFS() function to evaluate, for each category label, how many records are open at that time.
You can use the result of the Countifs function as the frequency for a histogram, time series, bar chart, etc.
Then, plot the data in columns E & F (or however your sheet happens to be arranged) to create the chart.
Edit
To include records with a blank close date in the count, modify the formula as follows:
=COUNTIFS($B$3:$B$7,"<="&E3,$C$3:$C$7,">="&E3)+COUNTIFS($B$3:$B$7,"<="&E3,$C$3:$C$7,"")
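For reference, here is the same counting logic as a small Python sketch. The function name open_count is made up, and the third record (with no close date) is a hypothetical addition to the two from the question. Note that, like the ">=" criterion in the formula, the comparison is inclusive, so a record still counts as open on its close date and the count drops on the following day.

```python
from datetime import date


def open_count(records, day):
    """Count records open on `day`.

    `records` holds (opened, closed) date pairs; closed=None means the
    record has no close date yet (a blank cell in the sheet). Both ends
    are inclusive, matching the "<=" and ">=" criteria in COUNTIFS.
    """
    return sum(
        1
        for opened, closed in records
        if opened <= day and (closed is None or closed >= day)
    )


records = [
    (date(2013, 2, 10), date(2013, 3, 1)),
    (date(2013, 2, 15), date(2013, 3, 5)),
    (date(2013, 2, 20), None),  # hypothetical record that is still open
]

for day in (date(2013, 2, 9), date(2013, 2, 10), date(2013, 2, 15),
            date(2013, 3, 2), date(2013, 3, 6)):
    print(day.isoformat(), open_count(records, day))
```

If the count should drop on the close date itself, change `closed >= day` to `closed > day` (and ">=" to ">" in the formula).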