VBA code not working in Excel with dataset

I am working on a machine learning project and am using Excel to handle the dataset. I am new to both Excel and VBA.
I copied and pasted this dataset into an Excel spreadsheet and ran Text to Columns on it. Here's a snapshot of some of the data:
Snapshot of data
I want to reformat the data in the spreadsheet so that all of the data goes into a single row, then starts a new row after the "name" keyword.
For example, I want this:
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18
19 20 21 22 23 name
to become:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 name (all on one line)
without having to do it manually line by line.
I used the below VBA code to format the data how I want it:
Sub separateByName()
    Dim lRow As Long
    Dim lCol As Long
    Dim lCol2 As Long
    k = 1
    lRow = Cells(Rows.Count, 1).End(xlUp).Row
    For i = 1 To lRow
        lCol = Cells(i, Columns.Count).End(xlToLeft).Column
        For j = 1 To lCol
            lCol2 = Sheets("Sheet2").Cells(k, Columns.Count).End(xlToLeft).Column
            Sheets("Sheet2").Cells(k, lCol2 + 1).Value = Cells(i, j).Value
            If Cells(i, j).Value = "name" Then k = k + 1
        Next j
    Next i
End Sub
However, when I run it, the values in the result come out in what looks like a scrambled order.
This:
1 0 63 1 -9 -9 -9
-9 1 145 1 233 -9 50 20
1 -9 1 2 2 3 81 0
0 0 0 0 1 10.5 6 13
150 60 190 90 145 85 0 0
2.3 3 -9 172 0 -9 -9 -9
-9 -9 -9 6 -9 -9 -9 2
16 81 0 1 1 1 -9 1
-9 1 -9 1 1 1 1 1
1 1 -9 -9 name
2 0 67 1 -9 -9 -9
-9 4 160 1 286 -9 40 40
0 -9 1 2 3 5 81 0
1 0 0 0 1 9.5 6 13
108 64 160 90 160 90 1 0
1.5 2 -9 185 3 -9 -9 -9
-9 -9 -9 3 -9 -9 -9 2
5 81 2 1 2 2 -9 2
-9 1 -9 1 1 1 1 1
1 1 -9 -9 name
Became this:
1 0 63 1 -9 -9 -9 1 0 63 1 -9 -9 -9 -9 1 145 1 233 -9 50 20 1 -9 1 2 2 3 81 0 0 0 0 0 1 10.5 6 13 150 60 190 90 145 85 0 0 2.3 3 -9 172 0 -9 -9 -9 -9 -9 -9 6 -9 -9 -9 2 16 81 0 1 1 1 -9 1 -9 1 -9 1 1 1 1 1 1 1 -9 -9 name
-9 1 145 1 233 -9 50 20 2 0 67 1 -9 -9 -9 -9 4 160 1 286 -9 40 40 0 -9 1 2 3 5 81 0 1 0 0 0 1 9.5 6 13 108 64 160 90 160 90 1 0 1.5 2 -9 185 3 -9 -9 -9 -9 -9 -9 3 -9 -9 -9 2 5 81 2 1 2 2 -9 2 -9 1 -9 1 1 1 1 1 1 1 -9 -9 name
The "name" is correctly at the end, but the actual data is messed up.
Could anyone help me to fix this code for my dataset?
Thanks!

I also tested your code with this data and it works fine. Unqualified Cells(...) calls read from the active sheet, so make sure the data is on Sheet1 and Sheet2 is empty, then run the macro while Sheet1 is the active sheet. The reformatted data is then written to Sheet2.
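Since the data is meant for a machine learning project, the same reshaping can also be sketched outside Excel. The Python below is only an illustration, not part of the macro: the function name join_records and the small rows list are hypothetical, and it assumes every cell has already been read into a list of rows.

```python
def join_records(rows, marker="name"):
    """Concatenate cell values row by row, starting a new record after `marker`."""
    records, current = [], []
    for row in rows:
        for value in row:
            current.append(value)
            if value == marker:
                # The marker closes the current record, like k = k + 1 in the macro.
                records.append(current)
                current = []
    if current:
        # Keep trailing values that never reached a marker.
        records.append(current)
    return records


# Hypothetical sample data standing in for the spreadsheet rows.
rows = [
    ["1", "2", "3"],
    ["4", "5"],
    ["6", "name"],
    ["7", "8"],
    ["9", "name"],
]
records = join_records(rows)
print(records)
```

Because the input and output are separate lists, the source can never be overwritten while it is being read, which is the failure that running the macro with Sheet2 active produces.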

Related

How to return first item when the items in the pandas dataframe window are the same?

I am a python beginner.
I have the following pandas DataFrame, with only two columns; "Time" and "Input".
I want to loop over the "Input" column with a window size w = 3 (three consecutive values). For every window, if all the elements in it are 1's, I want to keep the first item as 1 and change the remaining values to 0's.
index Time Input
0 11 0
1 22 0
2 33 0
3 44 1
4 55 1
5 66 1
6 77 0
7 88 0
8 99 0
9 1010 0
10 1111 1
11 1212 1
12 1313 1
13 1414 0
14 1515 0
My intended output is as follows
index Time Input What_I_got What_I_Want
0 11 0 0 0
1 22 0 0 0
2 33 0 0 0
3 44 1 1 1
4 55 1 1 0
5 66 1 1 0
6 77 1 1 1
7 88 1 0 0
8 99 1 0 0
9 1010 0 0 0
10 1111 1 1 1
11 1212 1 0 0
12 1313 1 0 0
13 1414 0 0 0
14 1515 0 0 0
What should I do to get the desired output? Am I missing something in my code?

import pandas as pd
import re
pd.Series(list(re.sub('111', '100', ''.join(df.Input.astype(str))))).astype(int)
Out[23]:
0 0
1 0
2 0
3 1
4 0
5 0
6 1
7 0
8 0
9 0
10 1
11 0
12 0
13 0
14 0
dtype: int32
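The one-liner above assumes df already exists. A self-contained sketch of the same regex idea follows; it takes the Input values from the intended-output table, generalizes the pattern to a window size w (a name introduced here for illustration), and assumes Input holds only single-digit 0/1 values.

```python
import re

import pandas as pd

# Input column as listed in the intended-output table of the question.
df = pd.DataFrame({"Input": [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0]})

w = 3  # window size
# Join the 0/1 values into one string so a run of w ones can be matched as text.
bits = "".join(df["Input"].astype(str))
# Replace each non-overlapping run of w ones with a single 1 followed by zeros.
collapsed = re.sub("1" * w, "1" + "0" * (w - 1), bits)
df["What_I_Want"] = [int(ch) for ch in collapsed]
print(df)
```

re.sub scans left to right and never re-uses matched characters, so a run of six 1's is treated as two separate windows.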

how to delete the sample that don't have data for the whole period?

I have a dataset that looks like this.
sample day
1 -10
1 -9
. .
. .
. .
1 10
2 -10
3 -10
. .
. .
. .
3 10
I want to keep only the samples that cover the whole period from -10 to 10. In this case, sample 2 must be deleted. The missing period differs between samples: some go from -10 to 0, some from -10 to -8 (the number of rows per sample varies). How can I delete the incomplete samples in pandas or Excel?

IIUC, you need a boolean mask. If the full period is always -10 to 10, the days of a complete sample sum to 0, so you can keep the groups whose sum is 0:
print(df)
sample day
0 1 -10
0 1 -9
0 1 -8
0 1 -7
0 1 -6
0 1 -5
0 1 -4
0 1 -3
0 1 10
.......
1 2 4
1 2 5
df1 = df[df.groupby(['sample'])['day'].transform('sum').eq(0)]
print(df1)
sample day
0 1 -10
0 1 -9
0 1 -8
0 1 -7
0 1 -6
0 1 -5
0 1 -4
0 1 -3
0 1 -2
0 1 -1
0 1 0
0 1 1
0 1 2
0 1 3
0 1 4
0 1 5
0 1 6
0 1 7
0 1 8
0 1 9
0 1 10
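One caveat with the sum test is that a sample covering a symmetric but shorter range, such as -3 to 3, also sums to 0. A stricter sketch checks that every day from -10 to 10 is present. The data below is made up for illustration, and full_period is a name introduced here.

```python
import pandas as pd

# Days -10..10 inclusive, as described in the question.
full_period = set(range(-10, 11))

# Hypothetical data: sample 1 is complete, sample 2 stops at day 0,
# sample 3 covers only -3..3 (its days sum to 0 but it is incomplete).
df = pd.DataFrame({
    "sample": [1] * 21 + [2] * 11 + [3] * 7,
    "day": list(range(-10, 11)) + list(range(-10, 1)) + list(range(-3, 4)),
})

# Keep only the groups whose days include every value of the full period.
df1 = df.groupby("sample").filter(
    lambda g: full_period.issubset(g["day"].tolist())
)
print(df1["sample"].unique())
```

groupby(...).filter drops whole groups at once, so incomplete samples disappear regardless of which days they are missing.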

Create dataframe column based on the progression values of another column?

I've the following dataframe:
car_id time(seconds) is_charging
1 1 65 1
2 1 70 1
3 1 67 1
4 1 71 1
5 1 120 0
6 1 124 0
7 1 117 0
8 1 80 1
9 1 74 1
10 1 62 1
11 1 130 0
12 1 124 0
I want to create a new column that enumerates the charging and discharging periods of the 'is_charging' column, so that later on I can group by that new column and compute the mean, max, min, etc. of each period.
The resulting dataframe should be like this:
car_id time(seconds) is_charging periods_id
1 1 65 1 1
2 1 70 1 1
3 1 67 1 1
4 1 71 1 1
5 1 120 0 2
6 1 124 0 2
7 1 117 0 2
8 1 80 1 3
9 1 74 1 3
10 1 62 1 3
11 1 130 0 4
12 1 124 0 4
I've done this using a for statement, like this:
df['periods_id'] = 0

def computePeriodIDs():
    period_id = 1
    # Start from the charging state of the first row
    previous_charging_state = df.at[df.index[0], 'is_charging']
    for index in df.index:
        # A change of state starts a new period
        if df.at[index, 'is_charging'] != previous_charging_state:
            previous_charging_state = df.at[index, 'is_charging']
            period_id = period_id + 1
        df.at[index, 'periods_id'] = period_id
This is way too slow for the number of rows that I have. I've tried to vectorize it, especially with apply(), but due to my lack of understanding I haven't had much success, and I cannot find a similar problem online.
Can someone help me optimize this problem?

Try this:
df.is_charging.diff().ne(0).cumsum()
Out[115]:
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 3
9 3
10 3
11 4
12 4
Name: is_charging, dtype: int32
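To show how this fits the stated goal, here is a self-contained sketch that rebuilds the question's table, stores the result as periods_id, and then aggregates per period. The name summary is introduced here for illustration. The sketch covers a single car only; with several car_id values the change detection would need to be done per car, which is not shown.

```python
import pandas as pd

# Data copied from the question, with the same 1-based index.
df = pd.DataFrame(
    {
        "car_id": [1] * 12,
        "time(seconds)": [65, 70, 67, 71, 120, 124, 117, 80, 74, 62, 130, 124],
        "is_charging": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0],
    },
    index=range(1, 13),
)

# diff() is NaN on the first row and non-zero wherever the state changes;
# ne(0) marks those rows, and cumsum() turns the marks into running period ids.
df["periods_id"] = df["is_charging"].diff().ne(0).cumsum()

# Per-period statistics, as the question intends to compute afterwards.
summary = df.groupby("periods_id")["time(seconds)"].agg(["mean", "max", "min"])
print(summary)
```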

Incorrect logical indexing?

For the code:
dataset = pd.read_csv("/Users/Akshita/Desktop/EE660/donor_raw_data_medmean.csv", header=None, names=None)
# Separate data and label
X_label = dataset[1:19373][0]
X_data = dataset[1:19373]
print(X_data[X_label==1])
I get the output below (there are actually about 4000 samples with label=1):
0 1 2 3 4 5 6 7 8 9 ... 51 52 53 54 55 56 57 58 \
16386 1 17 60 0 1 0 0 0 0 1 ... 0 20 20 20 5 10 15 15
16396 1 137 60 0 1 0 0 0 0 1 ... 15 25 10 15 6 14 16 120
16399 1 89 54 0 1 0 0 0 0 1 ... 10 15 5 15 6 14 16 79
16402 1 89 75 0 1 0 0 0 0 1 ... 25 35 10 35 6 13 15 79
..
..
19356 1 101 80 1 0 0 1 0 0 2 ... 25 30 5 28 7 16 18 101
19363 1 65 70 1 0 0 1 0 0 1 ... 7 12 5 10 4 8 20 63
19372 1 29 70 0 0 0 1 0 0 2 ... 0 25 25 25 4 9 24 24
..
[859 rows x 61 columns]
and for
print(X_data[X_label==0])
I get the output below (there are about 15000 samples with label=0):
0 1 2 3 4 5 6 7 8 9 ... 51 52 53 54 55 56 57 58 \
16384 0 17 74 0 1 0 0 0 0 1 ... 0 15 15 15 4 10 17 17
16385 0 17 60 0 1 0 0 0 0 2 ... 0 15 15 15 4 11 17 17
16387 0 29 67 0 1 0 0 0 0 1 ... 0 20 20 20 5 11 23 28
16388 0 53 60 0 1 0 0 0 0 1 ... 5 30 25 30 5 11 26 52
16389 0 65 49 0 1 0 0 0 0 1 ... 30 35 5 27 6 13 16 56
..
..
19369 0 137 77 1 0 1 0 0 0 1 ... 9 10 1 10 6 13 21 130
19370 0 29 60 1 0 0 1 0 0 1 ... 0 15 15 15 3 9 23 23
19371 0 129 78 1 0 0 1 0 0 2 ... 20 25 5 25 7 24 8 129
What can I be doing wrong?

Replace 0 with -9

My data contain 0 values that I want to replace with -9, but without touching values such as 220 or 120 that merely contain a 0. How can I do it? For example, the data look like:
M1 M2 M3 M4
120 0 125 0
0 123 123 0
123 0 0 123
to
M1 M2 M3 M4
120 -9 125 -9
-9 123 123 -9
123 -9 -9 123

You would search for " 0 " and replace with " -9 "
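If the table is loaded into pandas (an assumption, since the question does not name a tool), a value-based replacement avoids the problem altogether, because it compares whole cell values rather than characters. A minimal sketch using the example data:

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "M1": [120, 0, 123],
    "M2": [0, 123, 0],
    "M3": [125, 123, 0],
    "M4": [0, 0, 123],
})

# replace() matches entire values, so 120 and 220 are left untouched.
result = df.replace(0, -9)
print(result)
```

This also handles zeros in the first and last columns, which a text search for " 0 " with surrounding spaces can miss.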
