File columns not appended correctly - python-3.x

I am trying to append the data of the two files below, but the "ytd" column is appended into a different column, and the columns are reordered alphabetically. I would appreciate some guidance on how to keep the column order of the original file and get the ytd values of the two files into one column. Many thanks.
My code is as below:
import pandas as pd

dfaq = pd.read_csv('aq2.csv')
dfmg1 = pd.read_csv('test10320_2.csv')
dfaq1 = dfmg1.append(dfaq, ignore_index=True)
(The contents of aq.csv and mg10320.csv and the incorrect result were attached as screenshots.)
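No answer was posted; a minimal sketch of a likely fix, using hypothetical stand-in frames since the real csv contents are only shown as screenshots. pd.concat with sort=False preserves column order (and replaces the deprecated DataFrame.append), and normalising the headers first keeps both files' ytd values in one column when the files spell the header differently (case or stray whitespace).

```python
import pandas as pd
from io import StringIO

# Hypothetical stand-ins for aq2.csv and test10320_2.csv
# (the real files are only shown as screenshots).
dfaq = pd.read_csv(StringIO("branch,ytd\nA,10\nB,20\n"))
dfmg1 = pd.read_csv(StringIO("branch,ytd\nC,30\nD,40\n"))

# If the two files spell the "ytd" header differently (case, trailing
# spaces), the values land in separate columns, so normalise headers first.
dfaq.columns = dfaq.columns.str.strip().str.lower()
dfmg1.columns = dfmg1.columns.str.strip().str.lower()

# pd.concat replaces the deprecated DataFrame.append; sort=False keeps the
# original column order instead of sorting the names alphabetically.
dfaq1 = pd.concat([dfmg1, dfaq], ignore_index=True, sort=False)
```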

Related

How to rename one column (based on a given position) if there are duplicate column names

I have a dataframe with 2 duplicate columns. I need to rename one of the columns based on the position given in a configuration file. Using rename() is not working because it renames both columns, even though I am passing the position of the column which needs to be renamed. Could you please suggest how to achieve this?
I want the logic to be generic, as I have multiple files and each file has duplicates with different column names, which are listed in the configuration file.
columns - state city country state
config file -
position HEADER
0 statename
Df.rename(columns = {Df.columns[position]:row['HEADER']}, inplace = True)
It is not working because both columns are renamed even when the position is passed.
IIUC you can convert the column names to a numpy array with to_numpy and alter the column name there. This changes the name at the specified position only, without reassigning all the column names. Unlike to_list, to_numpy also lets you assign to multiple positions at once:
Df.columns.to_numpy()[position] = row['HEADER']
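A runnable sketch of this approach, using a hypothetical frame modelled on the question's columns; the values position = 0 and "statename" are taken from the question's config file, and the array is assigned back to df.columns to make the change explicit:

```python
import pandas as pd

# Hypothetical frame with the duplicate "state" column from the question.
df = pd.DataFrame([["CA", "LA", "US", "California"]],
                  columns=["state", "city", "country", "state"])

position = 0          # from the config file
header = "statename"  # the HEADER value for that position

# rename() maps by name, so it would hit every column called "state";
# editing the names as a numpy array targets one position only.
cols = df.columns.to_numpy()
cols[position] = header
df.columns = cols
```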

How to remove duplicates rows from CSV file using Node.js

I have a csv file with 2,000,000 entries. Some rows are duplicates, which I want to find and remove.
Logic
The data will be like in the below format
column 1   column 2   column 3   column 4
XXX        123        abc        a
YYY        456        def        b
ZZZ        123        abc        c
ZZZ        123        abc        d
So I want to find duplicates without considering the last column. Per the table above, the last two entries are duplicates.
Of these duplicates, I want to remove the first entry and retain the second one.
Please suggest which method would best achieve this.
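The question asks for Node.js; purely to illustrate the dedup rule itself (compare on every column except the last, keep the second occurrence), here is a pandas sketch over the sample table, with hypothetical column names col1..col4:

```python
import pandas as pd
from io import StringIO

# Hypothetical reconstruction of the sample table from the question.
csv = """col1,col2,col3,col4
XXX,123,abc,a
YYY,456,def,b
ZZZ,123,abc,c
ZZZ,123,abc,d
"""
df = pd.read_csv(StringIO(csv))

# Duplicates are judged on every column except the last; keep="last"
# retains the second occurrence and drops the first, as requested.
deduped = df.drop_duplicates(subset=df.columns[:-1].tolist(), keep="last")
```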

Feature Selection to add specific columns

If I want to run a line of Python code to select all features in a dataset, excluding the target variable (the last 2 columns) and also excluding the first column, what would that line be? I tried the code below and got an error. The dataframe has 38 columns in total.
In short, I want to define 'features' as every column except the 1st and the last 2 columns.
features = df_model.loc[:,0:38]
Any help would be appreciated.
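A sketch of the positional selection the question describes, against a hypothetical 38-column frame standing in for df_model (not shown). The error comes from using loc, which selects by label; iloc selects by position and accepts negative bounds:

```python
import numpy as np
import pandas as pd

# Hypothetical 38-column frame standing in for df_model (not shown).
df_model = pd.DataFrame(np.zeros((3, 38)),
                        columns=[f"c{i}" for i in range(38)])

# iloc slices by position: drop the first column (index 0) and the last
# two (the target variables) in one step, leaving 35 feature columns.
features = df_model.iloc[:, 1:-2]
```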

Combine multiple csv files into one using awk

I want to combine two .csv files based on the unique id that exists in both files.
The first file consists of 17 columns and the second of 2 columns; in both files the first column is the same unique id.
In the resulting file 3, I would like 18 columns.
I have been trying paste
paste -d ' ' SPOOL1.csv SPOOL2.csv > MERGED.csv
but that of course does not take the unique id column into consideration.
I am not proficient in awk, so all help is appreciated.
Thanks
It sounds like, if the files are sorted on the id, then
join -t, SPOOL1.csv SPOOL2.csv > MERGED.csv
should get you closer; join matches lines on the first field and prints the id only once, giving the 18 columns (-t, handles the comma delimiters not shown in the original suggestion).
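For comparison outside awk and join, the same id-based merge can be sketched in pandas, with small hypothetical stand-ins for the two spool files:

```python
import pandas as pd
from io import StringIO

# Hypothetical stand-ins for SPOOL1.csv (17 columns in the question) and
# SPOOL2.csv (2 columns); in both, the first column is the shared unique id.
spool1 = pd.read_csv(StringIO("id,a,b\n1,x,y\n2,p,q\n"))
spool2 = pd.read_csv(StringIO("id,extra\n2,zz\n1,ww\n"))

# merge() matches rows on "id" regardless of row order, unlike paste,
# which only glues the files together line by line.
merged = spool1.merge(spool2, on="id")
merged.to_csv("MERGED.csv", index=False)
```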

Specify rows to read in from excel using pd.read_excel

I have a large excel file and want to select specific rows (not a continuous block) and columns to read in. With columns this is easy; is there a way to do this for rows? Or do I need to read in everything and then delete all the rows I don't want?
Consider an excel file with structure
,CW18r4_aer_7,,,,,,,
,Vegetation ,,,,,,,
Date,A1,A2,B1,B2,C1,C2,C3,C4
1/7/86,3.80,8.02,7.94,9.81,9.82,4.19,3.88,0.87
2/7/86,0.50,2.02,5.26,3.70,8.59,8.61,9.86,3.27
3/7/86,4.75,3.88,0.46,5.95,9.45,9.62,4.33,1.63
4/7/86,7.64,6.93,2.71,9.96,1.25,0.35,1.84,1.02
5/7/86,3.33,8.24,7.36,7.86,0.43,2.32,2.18,1.91
6/7/86,1.96,1.78,7.45,2.28,5.27,9.94,0.22,2.94
7/7/86,4.67,8.41,1.49,5.48,5.46,1.39,1.85,7.71
8/7/86,8.07,5.60,4.23,3.93,3.92,9.09,9.90,2.15
9/7/86,7.00,5.16,6.10,8.86,7.18,9.42,8.78,5.42
10/7/86,7.53,9.81,3.33,1.50,9.45,6.96,5.41,5.25
11/7/86,0.95,3.84,3.52,5.94,8.77,1.94,5.69,8.62
12/7/86,2.94,3.07,5.13,8.10,6.52,9.93,5.85,3.91
13/7/86,9.33,7.03,5.80,2.45,2.86,7.32,5.00,0.17
14/7/86,7.39,4.85,9.15,2.23,1.70,9.42,2.72,9.32
15/7/86,3.38,4.67,6.63,2.12,5.09,7.71,0.99,9.72
16/7/86,9.85,6.68,3.09,5.05,0.34,5.44,5.99,6.19
I want to take the headers from row 3 and then read in some of the rows and columns.
import pandas as pd
df = pd.read_excel("filename.xlsx", skiprows = 2, usecols = "A:C,F:I", userows = "4:6,13,17:19")
Importantly, this is not a block that can be described by say [A3:C10] or the like.
The userows option does not exist. I know I can skip rows at the top and at the bottom, so presumably I could make lots of data frames and knit them together. But is there a simple way to just read in what you need in one pass? My workaround is to create lots of excel spreadsheets that contain only what I need for the different data frames, but this leaves things very open to me making a mistake I can't find.
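There is no userows argument, but in recent pandas versions skiprows accepts a callable: it is given each 0-based row index and skips the row when it returns True, which covers non-contiguous row selections in one read. A self-contained sketch using a csv stand-in (the file, the junk rows, and the chosen row set are hypothetical; the same skiprows argument works with read_excel):

```python
import pandas as pd
from io import StringIO

# CSV stand-in for the spreadsheet: two junk rows, then the header on
# row index 2, then data rows.
csv = """junk1
junk2
Date,A1,A2
1/7/86,3.80,8.02
2/7/86,0.50,2.02
3/7/86,4.75,3.88
"""

# Keep only these 0-based row indices; the header row (index 2)
# must be included so it is not skipped.
wanted = {2, 3, 5}

# The callable receives each row index and returns True to skip it;
# e.g. pd.read_excel("filename.xlsx", skiprows=lambda i: i not in wanted,
#                    usecols="A:C,F:I") applies the same idea.
df = pd.read_csv(StringIO(csv), skiprows=lambda i: i not in wanted)
```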