I have a large smart meter data set with more than a million rows. The data looks like this:
customer number time load
1000 19501 1.5
.... ..... ...
1000 19548 1.5
1000 19600 1.5
1000 ... ..
1000 19648 1.5
. . .
1001 19501 1.5
. . .
Here the first column is the customer number, the second column is the date and time, and the third column is the load. The date-time code starts at 19501, runs up to 19548, then moves on to 19600, and so on for 7 days. I want to analyse this data in MATLAB using clustering. The data is in .txt format and, because of the large number of rows, it does not open in MATLAB.
I opened it in Excel (it does not read the whole file, but a million rows is good enough for me). I reduced the number of rows so that MATLAB can read them, and used filters to arrange the data by customer: all readings for the first customer from time 19501 to that customer's last reading, then the second customer, and so on. For my MATLAB clustering I need the readings for 19501-19548 in one row, then the next 48 readings for the same customer in the next row, and so on up to the last customer.
Is it possible to write MATLAB code that does this automatically, or should I look for something in Excel?
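For what it's worth, the reshaping step itself is small enough to script. Below is a minimal sketch in Python (not MATLAB), assuming whitespace-separated lines of `customer time load` and assuming the time code is a day code followed by a two-digit half-hour slot from 01 to 48 (so 19548 would be day 195, slot 48). Codes with a slot outside 01 to 48 are dropped under that assumption. The function name `reshape_to_daily_rows` is made up for illustration.

```python
from collections import defaultdict

SLOTS_PER_DAY = 48  # half-hourly readings, slots 01..48 (assumed from the question)

def reshape_to_daily_rows(lines):
    """Turn 'customer time load' lines into one row of 48 loads per customer-day.

    Assumes the time code is <day><2-digit slot>, e.g. 19501 = day 195, slot 01.
    Missing slots stay as None. Lines that do not parse are skipped.
    """
    rows = defaultdict(lambda: [None] * SLOTS_PER_DAY)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            customer, code, load = int(parts[0]), int(parts[1]), float(parts[2])
        except ValueError:
            continue  # header or filler line
        day, slot = divmod(code, 100)
        if 1 <= slot <= SLOTS_PER_DAY:
            rows[(customer, day)][slot - 1] = load
    # Sort by customer, then day, so each customer's days end up on consecutive rows.
    return [(cust, day, loads) for (cust, day), loads in sorted(rows.items())]
```

Because the function only iterates over lines, an open file object can be passed in directly (for example `reshape_to_daily_rows(open("meter.txt"))`, where the file name is hypothetical), so the full text file never has to be loaded into Excel first.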
Related
I have a large data file from a test in which I send a voltage that is incremented by 1 mV every 30 s (from 0 to 5 V) to test the accuracy of my system. The computer outputs a file with over 70000 rows of data, but all I am really concerned with is the data that occurs every 30 s. Is there a way to filter for only the data that aligns with the 30 s interval, ideally ending up with around 5000 rows of data?
I am stuck and I really don't want to manually sort through 70000 lines of data. Any help is greatly appreciated.
So you want to filter and only see the rows that occur every 30 seconds? You can add a calculated column in Excel to extract the seconds and filter by that column:
=RIGHT(TEXT(A1, "hh:mm:ss"),2)
This extracts the seconds from the time as two-character text, so you can then filter for the rows where it is 30 (and 00, if your 30 s steps also land on the full minute). Replace A1 with the cell in your time column.
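A rough sketch of the same kind of filter outside Excel, in Python. It assumes each row starts with an `hh:mm:ss` timestamp and that a row is logged exactly on every 30 s mark counted from the first row; `seconds_of_day` and `keep_every_30s` are made-up helper names.

```python
def seconds_of_day(stamp):
    """Convert 'hh:mm:ss' to seconds since midnight."""
    h, m, s = (int(x) for x in stamp.split(":"))
    return h * 3600 + m * 60 + s

def keep_every_30s(rows, step=30):
    """Keep rows whose timestamp (first field) is a multiple of `step` seconds
    after the first row. This also works across midnight for any `step` that
    divides 86400 evenly, because the day length is then a multiple of `step`."""
    if not rows:
        return []
    start = seconds_of_day(rows[0][0])
    return [r for r in rows if (seconds_of_day(r[0]) - start) % step == 0]
```

If the logger does not hit the 30 s marks exactly, this keeps too few rows, and picking the first row at or after each mark would be the safer variant.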
I need to find out on how many days (not how many times) the temperature has been below -15 degrees.
I have found data on a website that gives me the temperature every half hour, so I have 48 values for each date.
If the temperature has been below -15 more than once during a date, that date should only be counted once.
Any ideas?
This is my data:
More example data
According to your sample image's columns, put this in an unused cell.
=SUMPRODUCT((D$2:D$999<=-15)/(COUNTIFS(B$2:B$999, B$2:B$999, D$2:D$999, "<=-15")+(D$2:D$999>-15)))
Adjust the rows to your actual data range to minimize calculating blank rows.
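The idea behind the formula is a distinct count: each date counts once if any of its readings qualifies. A minimal sketch of that logic in Python, assuming the data has already been read into `(date, temperature)` pairs and using `<=` to match the formula above; `count_cold_days` is a made-up name.

```python
def count_cold_days(readings, threshold=-15.0):
    """Count distinct dates with at least one reading at or below `threshold`.

    `readings` is an iterable of (date, temperature) pairs. Putting the dates
    in a set is what makes several cold readings on one date count only once.
    """
    return len({date for date, temp in readings if temp <= threshold})
```

For a strictly "below -15" count, the comparison would be `<` instead of `<=`.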
I have been collecting data over the past few days from an energy metering project I have set up.
The recorded values are saved in a CSV file and then extracted via a USB drive.
I have opened and checked the CSV file in Excel, and the data has not been recorded the way I would like.
Instead of logging once every minute, it has logged every 7 seconds.
This is a problem because the template CSV file I created to average these figures no longer works.
I am trying to create a VBA macro that goes through all the data and deletes every row where the seconds value is higher than 6, for example:
Here are some of the values I am working with:
16:29:05 PAC3 239.8030701 50.01350021 1073.719116 4.450771332 0
16:29:05 PAC2 239.2398834 50.01499939 3046.500732 12.62684536 0
Above is how I would like it to look.
But it currently looks like the rows below, where there are several entries under the 16:30 time:
16:30:02 PAC3 239.6912689 50.06306076 1092.592651 4.229027748 0
16:30:02 PAC2 238.8809052 50.06230927 3535.760254 14.82234478 0
16:30:09 PAC3 239.8191681 50.07057571 999.7850342 4.125905514 0
16:30:09 PAC2 239.2037506 50.06982422 2644.371338 11.05446911 0
Because it is logging every 7 seconds, I am getting about 7 to 8 logs per minute per PAC.
So where the seconds value is greater than 6, I would like the whole row to be removed, continuing through the entire column. These cells are formatted as time but contain both a date and a time value.
I have searched for ways to complete this task but have found no solutions.
Any help appreciated.
If you use =SECOND() function you can extract the seconds from the time value. Then loop up the column starting at the bottom and delete the rows that contain the values you don't want.
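The reason for looping from the bottom up in VBA is that deleting a row shifts every row below it up by one, so a top-down loop skips the row that moves into the deleted position. As a sketch of the same filter outside Excel, here is a Python version that avoids the problem by building a new list instead of deleting in place. It assumes each row's first field ends in `hh:mm:ss`, with or without a date in front; `drop_late_seconds` is a made-up name.

```python
def drop_late_seconds(rows, max_second=6):
    """Keep only rows whose timestamp (first field) has seconds <= max_second.

    The seconds are taken as the text after the last ':' in the timestamp, so
    both '16:30:09' and a date-plus-time string ending in ':09' are handled.
    """
    kept = []
    for row in rows:
        seconds = int(row[0].split(":")[-1])
        if seconds <= max_second:
            kept.append(row)
    return kept
```

With the sample rows from the question, the 16:29:05 and 16:30:02 entries would be kept and the 16:30:09 entries dropped.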
I'm looking to selectively copy a list of data in Excel for the purposes of reducing the quantity.
In the first column I have Date/time and in the second column I have a data value, in this case it's electrical meter readings.
The data is currently given every 15 minutes and what I'm trying to do is reduce that to every hour, i.e. effectively create a new column which extracts only the data from the original list for every hour (with no gaps in the rows, so the list is condensed).
Any advice much appreciated!
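One way to picture the extraction is as a filter on the minutes part of the timestamp: keep a row only when it falls exactly on the hour. A minimal sketch in Python, assuming the date/time is text in a `dd/mm/yyyy hh:mm` layout (the real layout is not given in the question, so `fmt` is a guess to adjust); `hourly_only` is a made-up name.

```python
from datetime import datetime

def hourly_only(readings, fmt="%d/%m/%Y %H:%M"):
    """Keep (timestamp, value) pairs stamped exactly on the hour.

    `fmt` is an assumed timestamp layout; change it to match the real data.
    The result is a new, gap-free list in the original order.
    """
    kept = []
    for stamp, value in readings:
        t = datetime.strptime(stamp, fmt)
        if t.minute == 0 and t.second == 0:
            kept.append((stamp, value))
    return kept
```

This samples every fourth reading, which suits a running meter total. If the values are per-interval consumption instead, the four values in each hour would need to be summed rather than sampled.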
I made a little test machine that accidentally created a 'big' data set:
6 columns with +/- 550.000 rows.
The end result I am looking for is a graph with 6 lines: the horizontal axis runs over the 1 - 550.000 measurements and the vertical axis shows the values in the rows (capped at 200 or so). The data is a resistance measurement that should be between 0 - 30 or very big (broken); the software writes 'inf' in these cases.
My skill is limited to Excel, so this is what I have done so far:
Imported the data in Excel. The measurements are meaningful between 0 - 30 and inf is not good for a graph, so I did: if(cell>200){200}else{keep cell value}.
Making a graph is now a time-consuming exercise, Excel struggles with it, and the result is not good.
So I would like to take the average value of every 60 measurements to reduce the rows to below 10.000, i.e. =AVERAGE(H1:H60)
But I cannot get this to work.
Questions:
How do I reduce this data set and get a good graph?
Should I switch to other software that is more applicable?
FYI: I already changed the software of the testing device to take the average value of a bunch of measurements the next time... But I cannot repeat this test.
Download link of data set comma separated file 17MB
I think you are on the right track, however my guess is that you only want to get an average every 60 rows and are unsure how to do this.
Using MOD(Number, Divisor) inside an if statement will let you specify that the average should be calculated only once in every x number of cells.
Assuming you'll have one row above your data table for headers, you are looking for something along the lines of:
=IF(MOD(ROW(A61),60) = 1,AVERAGE(H2:H61),"")
Once you have this you can filter your average column to non-blank values and use this to create your graph.
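If switching tools is an option, the same cap-then-average reduction can be sketched in a few lines of Python. This assumes one column has been read into a list of numbers or numeric text, with broken readings written as `inf` as described in the question; `block_means` is a made-up name, and the block size of 60 and cap of 200 are taken from the question.

```python
def block_means(values, block=60, cap=200.0):
    """Average consecutive blocks of `block` values after capping each at `cap`.

    float('inf') parses the text 'inf', and min() then replaces it with `cap`.
    A final block shorter than `block` is averaged over the values it has.
    """
    capped = [min(float(v), cap) for v in values]
    means = []
    for i in range(0, len(capped), block):
        chunk = capped[i:i + block]
        means.append(sum(chunk) / len(chunk))
    return means
```

Roughly 550.000 rows in blocks of 60 comes out at a little over 9.000 averaged points per column, which is below the 10.000 target.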