I'm not sure if this is the correct place to ask this, but basically I have a .txt file containing values that came from 2 separate sensors.
Example of some data:
{"t":3838202,"s":0,"n":"x1","v":-1052}
{"t":3838203,"s":0,"n":"y1","v":44}
{"t":3838204,"s":0,"n":"z1","v":-84}
{"t":3838435,"s":0,"n":"x1","v":-1052}
{"t":3838436,"s":0,"n":"y1","v":36}
{"t":3838437,"s":0,"n":"z1","v":-80}
{"t":3838670,"s":0,"n":"x1","v":-1056}
{"t":3838671,"s":0,"n":"y1","v":52}
{"t":3838672,"s":0,"n":"z1","v":-88}
{"t":3838902,"s":0,"n":"x1","v":-1052}
{"t":3838903,"s":0,"n":"y1","v":48}
{"t":3838904,"s":0,"n":"z1","v":-80}
{"t":3839136,"s":0,"n":"x1","v":-1056}
{"t":3839137,"s":0,"n":"y1","v":40}
{"t":3839138,"s":0,"n":"z1","v":-80}
x2:-944
y2:108
z2:-380
{"t":3839841,"s":0,"n":"x1","v":-1052}
{"t":3839842,"s":0,"n":"y1","v":44}
{"t":3839843,"s":0,"n":"z1","v":-80}
x2:-948
y2:100
z2:-380
{"t":3840541,"s":0,"n":"x1","v":-1052}
{"t":3840542,"s":0,"n":"y1","v":40}
{"t":3840543,"s":0,"n":"z1","v":-84}
{"t":3840774,"s":0,"n":"x1","v":-1052}
{"t":3840775,"s":0,"n":"y1","v":40}
{"t":3840776,"s":0,"n":"z1","v":-84}
x2:-948
y2:108
z2:-368
I'm trying to get the data into Excel so that, for each "chunk" of x1/y1/z1 data, I keep only the last recorded set, discard the rest, and pair it with the next set of x2/y2/z2 data. I don't think I'm explaining it very well, but I basically want to take that text file and get this in Excel:
+---------+-------+----+-----+------+-----+------+
| t | x1 | y1 | z1 | x2 | y2 | z2 |
+---------+-------+----+-----+------+-----+------+
| 3839138 | -1056 | 40 | -80 | -944 | 108 | -380 |
| 3839843 | -1052 | 44 | -80 | -948 | 100 | -380 |
| 3840776 | -1052 | 40 | -84 | -948 | 108 | -368 |
+---------+-------+----+-----+------+-----+------+
I'm really stuck as to where I should even start.
I think like a programmer, so I would approach this problem in steps. If you are not a programmer, this might not be so helpful to you, and I am sorry for that.
First, define the data: how should each line be read and understood?
Second, write a parsing utility: a piece of code that interprets the data as it is read in and stores it in the form you want for your output.
Third, import data into Excel.
Based on the limited data you provided, I am not sure how you determine x1, y1, z1, x2, y2, z2 for each t, but I assume the records in curly braces drive that, judging by the values of s, n, and v in them. So first decide exactly how each line should be read: take the file one line at a time and work out how each line contributes to your output table. I assume, for example, that you would treat the lines in curly braces differently from the lines with standalone x/y/z values.
I hope this points you in the right direction.
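The three steps above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the function names `pair_sensor_rows` and `to_csv_text` are made up for illustration, the first sensor's lines are assumed to be valid JSON, the second sensor's lines are assumed to look like `name:value`, and x2, y2, z2 are assumed to always arrive together after at least one complete x1/y1/z1 set. The timestamp kept is that of the last first-sensor line before the x2 block.

```python
import csv
import io
import json

def pair_sensor_rows(lines):
    """Pair the last x1/y1/z1 set before each x2/y2/z2 block with that block."""
    rows = []
    first = {}      # most recent x1/y1/z1 values; later readings overwrite earlier ones
    last_t = None   # timestamp of the most recent first-sensor line
    second = {}     # x2/y2/z2 values collected for the current block
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("{"):
            rec = json.loads(line)
            first[rec["n"]] = rec["v"]
            last_t = rec["t"]
        else:
            name, value = line.split(":", 1)
            second[name] = int(value)
            if len(second) == 3:  # assumes x2, y2, z2 always come as a group of three
                rows.append([last_t, first["x1"], first["y1"], first["z1"],
                             second["x2"], second["y2"], second["z2"]])
                second = {}
    return rows

def to_csv_text(rows):
    """Render the paired rows as CSV text, which Excel can open directly."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["t", "x1", "y1", "z1", "x2", "y2", "z2"])
    writer.writerows(rows)
    return out.getvalue()

sample = [
    '{"t":3838902,"s":0,"n":"x1","v":-1052}',
    '{"t":3838903,"s":0,"n":"y1","v":48}',
    '{"t":3838904,"s":0,"n":"z1","v":-80}',
    '{"t":3839136,"s":0,"n":"x1","v":-1056}',
    '{"t":3839137,"s":0,"n":"y1","v":40}',
    '{"t":3839138,"s":0,"n":"z1","v":-80}',
    'x2:-944',
    'y2:108',
    'z2:-380',
]
rows = pair_sensor_rows(sample)
print(to_csv_text(rows))
```

To use it on the real file, pass the open file object instead of `sample` and write the returned text to a `.csv` file.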
Related
I do wonder how it is possible to make sliding windows in Pandas.
I have a dataframe with three columns.
Country | Number | DayOfTheYear
===================================
No | 50 | 0
No | 20 | 1
No | 37 | 2
I would love to see 14 day chunks for every country and day combination.
The country thing can be ignored for the moment, since I can filter those manually in some way. But imagine there is only one country: is there a smart way to get some sort of summed-up sliding window, resulting in something like the following?
Country | Sum | DatesOftheYear
===================================
No | 504 | 0-13
No | 207 | 1-14
No | 337 | 2-15
I would also accept it if they were disjoint, being only 0-13, 14-27, etc.
But I just cannot get along with Pandas. I know an old SQL solution, but does anybody have a nice idea for Pandas?
If you want a rolling window over your dataframe, you can simply use the .rolling method of pandas: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html
In your case: df["Number"].rolling(14).sum()
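A small sketch of both variants, assuming a single country and made-up data (28 consecutive days, with `Number` equal to the day index). The column names `Sum14` and `Block` are introduced here only for illustration. Note that `rolling(14)` places each sum on the last row of its window, so the first 13 rows are NaN.

```python
import pandas as pd

# Hypothetical data: one country, days 0..27, Number equal to the day index.
df = pd.DataFrame({
    "Country": ["No"] * 28,
    "Number": list(range(28)),
    "DayOfTheYear": list(range(28)),
})

# Sliding window: each row holds the sum of that day and the 13 days before it.
# Rows 0..12 are NaN because their window is incomplete.
df["Sum14"] = df["Number"].rolling(14).sum()

# Disjoint windows: integer-divide the day by 14 to label each block (0-13, 14-27, ...),
# then sum per country and block.
df["Block"] = df["DayOfTheYear"] // 14
disjoint = df.groupby(["Country", "Block"])["Number"].sum()

print(df[["DayOfTheYear", "Sum14"]].tail(3))
print(disjoint)
```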
I am printing a "Table" to the console. I will be using this same table structure for several different variables. However as you can see from Output below, the lines don't all align.
One way to resolve it would be to increase the number of decimal places (e.g. 6.730000 for Standard Deviation) which would push the line into place.
However, I do not want this many decimal places.
Is it possible to add extra 0s to the end of a number, and make these invisible?
I am planning on using this table structure for several variables, and the length of Mean, Stddev, and Median will likely never be more than 6 characters.
EDIT - I would really like to ensure that each value which appears in the table will be 6 characters long, and if it is not 6 characters long, add additional "invisible" zeros.
Input
# Create and structure Table to store descriptive statistics for each variable.
subtitle = "| Mean | Stddev | Median |"
structure = '| {:0.2f} | {:0.2f} | {:0.2f} |'
lines = '=' * len(subtitle)
# Print table.
print(lines)
print(subtitle)
print(lines)
print(structure.format(mean, std, median))
print(lines)
Output:
======================================
| Mean | Stddev | Median |
======================================
| 181.26 | 6.73 | 180.34 |
======================================
Didn't really figure this out - but found a workaround.
I just did the following:
"| {:^6} | {:^6} | {:^6} | {:^6} | {:^6} |"
This keeps the width between | consistent.
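For what it's worth, a width and a precision can be combined in one format spec, so the two decimal places are kept while every cell stays 6 characters wide. A minimal sketch using the sample values from the output above (the 6-character width is taken from the question; the padding is spaces rather than invisible zeros):

```python
mean, std, median = 181.26, 6.73, 180.34  # sample values from the output above

# ^6 centres the text in a 6-character cell; .2f keeps two decimal places.
subtitle = "| {:^6} | {:^6} | {:^6} |".format("Mean", "Stddev", "Median")
row = "| {:^6.2f} | {:^6.2f} | {:^6.2f} |".format(mean, std, median)
lines = "=" * len(subtitle)

print(lines)
print(subtitle)
print(lines)
print(row)
print(lines)
```

Swapping `^` for `>` right-aligns the numbers instead, which lines up the decimal points.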
Slightly wordy title, but here goes.
I have a grid in Excel with 3 columns (media spend, marginal revenue returns and media channel invested in), and I want to create the column below called desired cumulative spend.
The grid is structured this way because it represents an optimised spend laydown, ordered by how much of each media channel's budget should be invested before the marginal returns diminish to the point where another media channel should be substituted.
It is possible that this substitution can then be reversed back to the original channel if the new channel has a sharply diminishing curve, such that all marginal benefit associated to the new channel diminishes and the total spend level still means it is mathematically sensible to switch back to the original curve (maybe it has a lower base level but reduces less sharply). It is also possible that at the point in which the marginal benefit associated to the new channel diminishes, the best next step is to invest in a third channel.
The desired cumulative spend column has two elements to it:
- it is a simple accumulation of spend from row to row when the media channel is constant from row to row
- it is a slightly trickier accumulation of spend when the media channel changes: it then needs to refer back to the last spend level associated with the channel that has been substituted in. For row 4, the logic I am struggling with would need to take the running total from row 3, plus the new spend level in row 4, minus the spend level the last time this channel was used (row 2)
|spend | mar return | media | desired cumulative spend |
|------ |----------- |-------| ----------------------------------------- |
1 | £580 | 128 | chan1 | 580 |
2 | £620 | 121 | chan1 | 580+(620-580) |
3 | £900 | 115.8 | chan2 | 580+(620-580)+900 |
4 | £660 | 115.1 | chan1 | 580+(620-580)+900+(660-620) |
5 | £920 | 114 | chan2 | 580+(620-580)+900+(660-620)+(920-900) |
6 | £940 | 112 | chan2 | 580+(620-580)+900+(660-620)+(920-900)+(940-920) |
If my comment is the correct suggestion, then something like this should do it (£580 is at A2, so the first output is D2):
D2 =A2
D3 =D2+A3-IF(COUNTIF($C$2:C2,C3),INDEX(A:A,MAX(IF($C$2:C2=C3,ROW($A$2:A2)))))
D3 contains an array formula and must be confirmed with Ctrl+Shift+Enter.
Now you can simply copy down from D3.
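As a cross-check of the logic outside Excel, here is a minimal Python sketch of the same rule: add the new spend, then subtract the spend level recorded the last time the same channel appeared (or nothing if the channel is new). The function name `cumulative_spend` is made up for illustration, and the rows are the six from the question.

```python
def cumulative_spend(rows):
    """Running total where each row adds only the increase over the
    previous spend level of the same channel."""
    last_spend = {}  # channel -> spend level the last time it was used
    total = 0
    totals = []
    for spend, channel in rows:
        total += spend - last_spend.get(channel, 0)
        last_spend[channel] = spend
        totals.append(total)
    return totals

rows = [
    (580, "chan1"),
    (620, "chan1"),
    (900, "chan2"),
    (660, "chan1"),
    (920, "chan2"),
    (940, "chan2"),
]
totals = cumulative_spend(rows)
print(totals)  # [580, 620, 1520, 1560, 1580, 1600]
```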
I have a column of values in Excel that I need to modify by a scale factor. Original column example:
| Value |
|:-----:|
| 75 |
| 25 |
| 25 |
| 50 |
| 0 |
| 0 |
| 100 |
Scale factor: 1.5
| Value |
|:-----:|
| 112.5 |
| 37.5 |
| 37.5 |
| 75 |
| 0 |
| 0 |
| 150 |
The problem is I need them to be within a range of 0-100. My first thought was take them as percentages of 100, but then quickly realized that this would be going in circles.
Is there some sort of mathematical method or Excel formula I could use to handle this so that I actually make meaningful changes to the values, such that when these numbers are modified, 150 is 100 but 37.5 might not be 25 and I'm not just canceling out my scale factor?
Assuming your data begin in cell A1, you can use this formula:
=MIN(100,A1*1.5)
Copy downward as needed.
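The same clamping idea as a short Python sketch, using the values and scale factor from the question. Anything that would exceed 100 after scaling is simply capped, so distinct large inputs can collapse to the same output.

```python
values = [75, 25, 25, 50, 0, 0, 100]
scale = 1.5

# Scale each value, then cap the result at 100 (the equivalent of MIN(100, A1*1.5)).
clamped = [min(100, v * scale) for v in values]
print(clamped)
```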
You could do something like:
ScaledValue = (v - MIN(AllValues)) / (MAX(AllValues) - MIN(AllValues)) * (SCALE_MAX - SCALE_MIN) + SCALE_MIN
Say your raw data (a.k.a. AllValues) ranges from a MIN of 15 to a MAX of 83, and you want to scale it to a range of 0 to 100. To do that you would set SCALE_MIN = 0 and SCALE_MAX = 100. In the above equation, v is any single value in the data.
Hope that helps
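A minimal Python sketch of that formula, assuming made-up data running from 15 to 83 as in the example above. The name `min_max_scale` is made up for illustration, and the handling of the degenerate case where every value is equal is my own assumption, not part of the formula.

```python
def min_max_scale(values, scale_min=0.0, scale_max=100.0):
    """Linearly map values so min(values) -> scale_min and max(values) -> scale_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: with no spread in the data, map everything to scale_min.
        return [scale_min for _ in values]
    return [(v - lo) / (hi - lo) * (scale_max - scale_min) + scale_min
            for v in values]

scaled = min_max_scale([15, 49, 83])
print(scaled)  # [0.0, 50.0, 100.0]
```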
Another option is:
ScaledValue = PERCENTRANK.INC(AllValues, v)
In contrast to my earlier suggestion (which is linear and preserves the relative spacing of the data points), this preserves the order of the data but not the spacing. Using PERCENTRANK.INC will have the effect that sparse data gets compressed closer together and bunched data gets spread out.
You could also do a weighted combination of the two methods: give the linear method a weight of, say, 0.5 so that relative spacing is partially preserved.
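A rough Python sketch of such a weighted combination. The function names are made up for illustration, and `percent_rank` is only an approximation of PERCENTRANK.INC (the share of values strictly smaller than v, divided by n - 1); I have not checked that it matches Excel for tied values, so treat that part as an assumption. The data must contain at least two distinct values.

```python
def linear_fraction(values):
    """Min-max position of each value in [0, 1]; preserves relative spacing."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def percent_rank(values):
    """Approximate inclusive percent rank in [0, 1]; preserves order only."""
    n = len(values)
    return [sum(1 for other in values if other < v) / (n - 1) for v in values]

def blended_scale(values, weight=0.5, scale_max=100.0):
    """Weighted mix: `weight` on the linear method, the rest on the rank method."""
    lin = linear_fraction(values)
    rank = percent_rank(values)
    return [(weight * a + (1 - weight) * b) * scale_max
            for a, b in zip(lin, rank)]

data = [0, 10, 20, 100]  # hypothetical data with one far-away point
blended = blended_scale(data, weight=0.5)
print([round(x, 2) for x in blended])
```

With `weight=1.0` this reduces to the purely linear scaling, and with `weight=0.0` to the purely rank-based one.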
So for example purposes, I have the following table:
| | A | B |
| |------------|----------|
| 1 |Description |Amount |
| 2 |------------|----------|
| 3 |Item1 | 5.00|
| 4 |Item2** | 29.00|
| 5 |Item3 | 1.00|
| 6 |Item4** | 5.00|
| 7 |------------|----------|
| 8 |Star Total | 34.00|
| 9 |------------|----------|
I want to create a formula in B8 that calculates the sum of the amounts if the description of that amount contains "**" (or some other denoting text). In this particular example I would like a formula that returns 34 since only Item2 and Item4 contain "**".
I tried to use something like this, but it only worked based on the value in A3:
=SUMIF(A3:A6, ISNUMBER(SEARCH("**", A3)), B3:B6)
Any suggestions would be appreciated!
The asterisk is the wildcard symbol that can be used in Sumif(), so you may want to change the denoting text to some other symbols, for example ##. Then this formula will work:
=SUMIF(A2:A10,"*##*",B2:B10)
If you want to keep the asterisks, the formula gets a bit curlier.
=SUMIF(A2:A10,"*~*~**",B2:B10)
The two middle asterisks are escaped with the tilde character.
You can escape the wildcard character and turn it into a literal * by prefixing it with a swung dash (tilde, ~) and so leave your data unchanged:
=SUMIF(A2:A7,"*~*~*",B2:B7)
IMO worthwhile because asterisks are relatively 'elegant'.
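To sanity-check the expected total outside Excel, the "contains two asterisks" condition can be written as a plain substring test in Python, where `*` has no special meaning and needs no escaping. The item list below is copied from the example table.

```python
items = [
    ("Item1", 5.00),
    ("Item2**", 29.00),
    ("Item3", 1.00),
    ("Item4**", 5.00),
]

# Sum the amounts whose description contains the literal marker "**".
star_total = sum(amount for description, amount in items if "**" in description)
print(star_total)  # 34.0
```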