If I want to select, in Python, all the features in a dataset excluding the first column and the target variables (the last 2 columns), what would that line of code be? I tried the code below and got an error. The DataFrame has 38 columns in total.
In short, I want to define 'features' as every column except the 1st and the last 2 columns.
features = df_model.loc[:,0:38]
Any help would be appreciated.
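A minimal sketch, assuming the column labels are strings (the usual case after read_csv with a header row). .loc slices by label, so an integer slice such as 0:38 typically raises a TypeError on string labels, while .iloc slices by integer position, where negative positions count from the end. The stand-in df_model and its c0..c37 labels are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for df_model: one row, 38 columns labelled c0..c37
df_model = pd.DataFrame([list(range(38))], columns=[f"c{i}" for i in range(38)])

# Position-based selection: skip column 0 and the last two columns
features = df_model.iloc[:, 1:-2]

# The last two columns (the target variables)
targets = df_model.iloc[:, -2:]

print(features.shape)  # (1, 35)
```

Because the slice uses positions rather than labels, the same line keeps working if the columns are renamed, as long as the first column and the two targets stay at the ends.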
I am trying to append the data from the 2 files below, but the "ytd" column ends up as a separate column, and the columns are reordered alphabetically. I would appreciate some guidance on how to keep the column order of the original file and put the ytd values of the 2 files in the same column. Many thanks.
My code is as below:
import pandas as pd
dfaq=pd.read_csv('aq2.csv')
dfmg1=pd.read_csv('test10320_2.csv')
dfaq1=dfmg1.append(dfaq, ignore_index=True)
aq.csv
mg10320.csv
Incorrect result:
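A hedged sketch, assuming the "ytd" headers in the two files differ only by surrounding whitespace (for example "ytd" vs "ytd "), which is one common reason the same column is split in two after appending; the actual headers in aq.csv and mg10320.csv are not shown, so check them with print(df.columns.tolist()). The small DataFrames below are hypothetical stand-ins. It also swaps DataFrame.append, which was removed in pandas 2.0, for pd.concat.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files; the second has "ytd " with a trailing space
df_first = pd.DataFrame({"name": ["a"], "ytd": [10], "month": [1]})
df_second = pd.DataFrame({"name": ["b"], "ytd ": [20], "month": [2]})

def strip_headers(df):
    # Remove stray whitespace from the column labels so matching columns share one label
    out = df.copy()
    out.columns = out.columns.str.strip()
    return out

first = strip_headers(df_first)
second = strip_headers(df_second)

# sort=False keeps columns in order of appearance instead of sorting them alphabetically
combined = pd.concat([first, second], ignore_index=True, sort=False)

# Put the first file's columns first, in their original order, followed by any extra columns
order = list(first.columns) + [c for c in combined.columns if c not in first.columns]
combined = combined[order]
print(combined)
```

If the headers also differ in case ("YTD" vs "ytd"), adding .str.lower() in strip_headers would handle that too, at the cost of lower-casing every header.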
I have jumbled time-series data, as shown in the first image. I want to check for, and flag, keys whose values do not change over time.
Actual Data
First I tried splitting the data into a separate time series for every unique key, each in its own worksheet, but with over 17k unique keys it is practically impossible to create and process that many worksheets because of RAM limits.
To explain the situation better: for key 123abc in column A, the value in "some number" (column C) is different at each point in time (column B). (Image 2)
No static values
But for key 431iow, the value in "some number" (column C) was the same for 3 consecutive quarters. (Image 3)
Some static values
I want to flag the rows whose value did not change from the previous quarter, leave the other rows blank, and add this flag as a new column. (Image 4)
Flags/Desired Outcome
I also want to make a consolidated file with the unique entries from column A and just their flags, as shown in Image 5.
Flag/Desired result
Can someone help me with this very specific issue?
Any help is appreciated.
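A minimal pandas sketch, assuming the sheet is loaded into one DataFrame and that the previous record for a key is the previous quarter (no gaps in the data). The column names key, date and value are hypothetical stand-ins for columns A, B and C, and the rows are made up. Grouping by key processes all 17k keys in a single table, so nothing has to be split into worksheets.

```python
import pandas as pd

# Hypothetical data: key = column A, date = column B, value = column C ("some number")
df = pd.DataFrame({
    "key":   ["123abc", "431iow", "123abc", "431iow", "431iow", "123abc"],
    "date":  pd.to_datetime(["2020-03-31", "2020-06-30", "2020-06-30",
                             "2020-03-31", "2020-09-30", "2020-09-30"]),
    "value": [5, 7, 6, 7, 7, 8],
})

# Put each key's rows in time order so the previous row means the previous quarter
df = df.sort_values(["key", "date"]).reset_index(drop=True)

# Previous value for the same key (NaN for each key's first row)
prev = df.groupby("key")["value"].shift(1)

# Flag rows whose value equals the previous quarter's value; leave the rest blank
df["flag"] = (df["value"] == prev).map({True: "static", False: ""})

# Consolidated view: one row per key, flagged if any of its rows is flagged
summary = (
    df.groupby("key")["flag"]
      .apply(lambda s: "static" if (s == "static").any() else "")
      .reset_index()
)
print(df)
print(summary)
```

The result can be written back with df.to_excel(...) and summary.to_excel(...) if a spreadsheet is needed.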
This is my Excel file:
Here I read the entire column A of the Excel sheet named "Lazy_eight", but the problem is that I have several sheets, and column A has a different number of elements in each. So I want to import only the numbers without specifying the length of the column vector.
I use the function readmatrix with the following syntax in order to read the entire column:
p_time = readmatrix('Input_signals.xlsx','Sheet','Lazy_eight','Range','A:A')
This is what I get in the MATLAB workspace:
So I would like to give the "readmatrix" function only the first element of the column to import and have it stop at the last element, without specifying the coordinates of that last element, so that I avoid the NaN you can see in the last image. I want to import only the numbers, without the NaN value.
I cannot specify both the first and the last element (as in 'Range', 'A3:A13') because column A (like the other columns) has a different number of elements in every sheet.
I've solved the problem by using the "rmmissing" function, which removes "NaN" values from an array.
The issue is sorting an array that is generated automatically from a data source by a formula that extracts the unique data points. (The data points are date/times.)
The data is extracted with this formula:
=INDEX(Table_ExternalData_1[SampleDateTime],MATCH(0,INDEX(COUNTIF($G$2:G2,Table_ExternalData_1[SampleDateTime]),0,0),0))
Once extracted, the data is not sorted. It comes from a database via an SQL string that pulls in data according to the date and time each data point was created.
Because of this, the extracted points are not in the correct order. I am trying to sort the extracted data points from earliest to latest so I can continue with the data sorting, but I need the sorted date/times to be placed in a separate row.
I have tried a pivot table, but it isn't quite what I need and produces a messier end result than I want.
All assistance is appreciated.
Example is below.
1
2
3
5
1
2
3
4
6
5
3
I need this.
1
2
3
4
5
6
I did end up finding a solution that I will be able to modify. Using a single row of a pivot table, I took just the date/time column and had the PivotTable function sort the data to be utilized as necessary.
Thank you.
The fact that the range in the example you give:
1) Consists of entries of a numeric datatype only
2) Does not contain any blanks
means that the solution is relatively simple.
Assuming that data is in A1:A11, first use a single cell somewhere within the worksheet to count the number of expected returns. For example, using B1 for this purpose, enter this formula in that cell:
=SUM(IF(FREQUENCY(A1:A11,A1:A11),1))
Your main formula is then:
=IF(ROWS($1:1)>B$1,"",SMALL(IF(FREQUENCY(A$1:A$11,A$1:A$11),A$1:A$11),ROWS($1:1)))
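the latter being copied down until you start to get blanks for the results. Both are array formulas, so in versions of Excel without dynamic arrays they must be confirmed with Ctrl+Shift+Enter rather than just Enter.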
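To check the logic of the two formulas outside Excel, here is a small Python sketch of what they compute on the example data: the B1 formula counts the distinct values, and each copied-down row k returns the k-th smallest distinct value, or a blank once k exceeds that count. It is an illustration of the logic only, not a replacement for the worksheet formulas; date/time values would sort the same way because they are ordered too.

```python
# The example data from A1:A11
values = [1, 2, 3, 5, 1, 2, 3, 4, 6, 5, 3]

# Distinct values in ascending order; len() is what the B1 formula returns
distinct = sorted(set(values))
count = len(distinct)

def kth_row(k):
    # k plays the role of ROWS($1:1) as the main formula is copied down (1, 2, 3, ...)
    return "" if k > count else distinct[k - 1]

# Simulate copying the formula two rows past the last distinct value
column = [kth_row(k) for k in range(1, count + 3)]
print(column)  # [1, 2, 3, 4, 5, 6, '', '']
```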
Regards
I have a spreadsheet with six columns, and I want to remove duplicate lines: any line whose values in columns 3 and 4 match the values in columns 3 and 4 of another line. For example, these two lines would need to be deduped:
alpha.txt, beta.txt, 03/12/15, exit, gamma.txt, bravo.txt
gamma.txt, bravo.txt, 03/12/15, exit, alpha.txt, beta.txt
Since columns 3 and 4 match, I want these to be deduped. I tried using the Data > Remove Duplicates feature within Excel (selecting the entire table but removing duplicates only by columns 3 and 4) but it fails to remove all of the duplicates.
Does anyone have an alternative method in Excel or perhaps via sort or some other Linux utility?
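As a non-Excel alternative, here is a minimal Python sketch, assuming the sheet is saved as comma-separated text in the shape shown above. It keeps the first line for each (column 3, column 4) pair, as Remove Duplicates does, and strips surrounding whitespace from the key fields. That stripping is an assumption: invisible spaces are one possible reason Remove Duplicates misses some rows, but the real cause in this sheet is unknown. The third sample line is hypothetical.

```python
import csv
import io

# Sample in the question's shape; the delta/echo line is a hypothetical extra row
data = """alpha.txt, beta.txt, 03/12/15, exit, gamma.txt, bravo.txt
gamma.txt, bravo.txt, 03/12/15, exit, alpha.txt, beta.txt
delta.txt, echo.txt, 03/13/15, exit, foxtrot.txt, golf.txt
"""

def dedupe_on_columns_3_4(lines):
    seen = set()
    kept = []
    # skipinitialspace=True drops the space that follows each comma
    for row in csv.reader(lines, skipinitialspace=True):
        # Columns 3 and 4 are indices 2 and 3; strip any remaining whitespace
        key = (row[2].strip(), row[3].strip())
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

result = dedupe_on_columns_3_4(io.StringIO(data))
for row in result:
    print(", ".join(row))
```

For a real file, pass open("file.csv", newline="") instead of the io.StringIO sample.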