I would like to exclude some columns when running prcomp.
For example, my CSV file has columns A to I, but for prcomp I only want columns B to H. Column A has no column heading; it is just a bunch of ID numbers that are not part of the data frame.
I've attached a picture of what the CSV file looks like and of what the data frame looks like in R.
I'm using this command. Is it correct?
pca.mydata <- prcomp(mydata[, 2:8], scale = TRUE)
[screenshot: CSV file]
[screenshot: data frame in RStudio]
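For readers checking the logic without R at hand, here is a rough NumPy sketch of what prcomp(mydata[, 2:8], scale = TRUE) computes, using hypothetical random data in place of mydata (note that R's 2:8 is 1-based and inclusive, while Python slicing is 0-based):

```python
import numpy as np

rng = np.random.default_rng(0)
mydata = rng.normal(size=(10, 9))   # stand-in for the 9-column (A..I) data frame

X = mydata[:, 1:8]                  # columns B..H, i.e. R's mydata[, 2:8]
# scale = TRUE: centre each column and divide by its sample std dev
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# prcomp uses SVD internally; the rows of Vt are the loadings ($rotation)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt.T                   # the principal-component scores ($x)
```

So yes, selecting with mydata[, 2:8] drops columns A and I before the PCA; the sketch above only illustrates what the call then computes.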
I have a "Find and Replace" tool in Alteryx which finds a column value in CSV file 1 and replaces it using a lookup against CSV file 2, which has two columns: Word and ReplacementWord.
Example: Address is a column in CSV file 1 with values like St.Xyz,NY,100067, and CSV file 2 has

Word    ReplacementWord
NY      NewYork
ZBW     Zimbabwe
...

The final output should be

Address
St.Xyz,NewYork,100067

Please help.
I tried to reproduce your scenario in my environment; to achieve the desired output I followed the steps below.
In the data flow activity I took two sources:
Source 1 is the file which contains the actual address.
Source 2 is the file which contains the country codes with names.
After that I took a lookup to merge the files based on the country code. In the lookup condition I provided split(Address,',')[2] to split the address string on commas and take the second value, which will be the country code (NY in St.Xyz,NY,100067), matched against column_1 of the second source.
Lookup data preview:
Then I took a Derived Column and gave the column name as Address with the expression replace(Address, split(Address,',')[2], Column_2). It replaces the value we split out in the lookup with the value of Column_2 in the Address string.
Derived column preview:
Then I took a Select and deleted the unwanted columns.
Select preview:
Now provide this to the sink dataset.
Output
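The lookup-and-replace logic above can be sketched in plain Python (hypothetical data matching the question's example; note that split(Address,',')[2] in the data-flow expression language is 1-based, so it corresponds to parts[1] here):

```python
# CSV file 2 as a lookup dict (hypothetical contents from the question)
replacements = {"NY": "NewYork", "ZBW": "Zimbabwe"}

def fix_address(address):
    code = address.split(',')[1]          # second comma-separated field
    # replace the code with its lookup value, leaving unknown codes alone
    return address.replace(code, replacements.get(code, code))

print(fix_address("St.Xyz,NY,100067"))    # St.Xyz,NewYork,100067
```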
I have an Excel file as shown below:
[screenshot: input file]
Now I want to filter the fruits from the "Items" column and check which entries in the "list" column are not present in the "Name" column. For example, here "grapes" is not present in the "Name" column, so I want grapes as output in the next column, as shown below.
[screenshot: expected output]
The same is to be done for many items, filtering each one in turn, as I have many items.
Please suggest or give some hints so that I can start this code.
I am naming the Excel file Book1.
import pandas as pd

frame = pd.read_excel("Book1.xlsx")
# split each comma-separated string in the "list" column into a Python list
frame_list_as_String = frame.list.tolist()
frame_list = [x.split(',') for x in frame_list_as_String]
frame_Name = frame.Name.tolist()
frame_col3 = []
for item in frame_list:
    # set difference: entries of this row's list not found in the Name column
    frame_col3.append(list(set(item) - set(frame_Name)))
frame["col3"] = frame_col3
frame.to_excel("df.xlsx", index=False)
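A more pandas-idiomatic variant of the same set difference, shown on hypothetical in-memory rows in place of Book1.xlsx (this also keeps the original order of each row's list, which set() does not):

```python
import pandas as pd

# hypothetical rows mirroring the question's layout
frame = pd.DataFrame({
    "Name": ["apple", "banana", "mango"],
    "list": ["apple,banana,grapes", "mango,apple", "banana"],
})
names = set(frame["Name"])
frame["col3"] = frame["list"].apply(
    lambda s: [x for x in s.split(',') if x not in names]
)
print(frame["col3"].tolist())   # [['grapes'], [], []]
```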
I have a text file which is tabulated. When I open the file in Python using pandas, it shows that the file contains only one column, but there are actually many columns in it. I've tried pd.DataFrame and sep='\s*' and sep='\t', but I can't select a column since pandas sees only one. I've also tried specifying the header, but the header moves to the far right and the whole file is still treated as a single column. The .loc method with a specific column number always returns rows instead. I want to select the first column (A, A), third column (HIS, PRO) and fourth column (0, 0), and print them to a CSV file.
Here is the code I have used along with some file components.
1) After opening the file using pd:
[599 rows x 1 columns]
2) The file format:
pdb_id: 1IHV
0 radii_filename: MD_threshold: 4
1 A 20 HIS 0 MaximumDistance
2 A 21 PRO 0 MaximumDistance
3 A 22 THR 0 MaximumDistance
Any help will be highly appreciated.
3) code:
import pandas as pd
df= pd.read_table("file_path.txt", sep= '\t')
U= df.loc[:][2:4]
If anybody gets a file like this, it can be opened and specific columns selected with the following code:

cols = [0, 2, 3]                          # 0-based positions of the wanted columns
result = []
with open('file.txt', "r") as f:
    for x in f:
        fields = x.split()                # split on any whitespace run
        if len(fields) > max(cols):       # skip short header lines
            result.append([fields[i] for i in cols])
for w in result:
    print('\t'.join(w))

Where cols lists the columns you want to select.
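Since the original goal was a CSV file, here is a sketch that also writes the selected columns out with the standard csv module (the inline string is a hypothetical stand-in for the tabulated file from the question):

```python
import csv
import io

raw = """pdb_id: 1IHV
0 radii_filename: MD_threshold: 4
1 A 20 HIS 0 MaximumDistance
2 A 21 PRO 0 MaximumDistance
"""
cols = [1, 3, 4]                       # chain, residue name, value
rows = []
for line in io.StringIO(raw):
    fields = line.split()
    if len(fields) == 6:               # keep only the 6-field data rows
        rows.append([fields[i] for i in cols])

with open("selected.csv", "w", newline="") as out:
    csv.writer(out).writerows(rows)
# rows == [['A', 'HIS', '0'], ['A', 'PRO', '0']]
```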
I have many text files. I want to import only one column from each text file and write them to an Excel file. How do I do this?
For example, with text files 1 (1).txt and 1 (2).txt:

file 1 (1).txt has columns A, B, C, D
file 1 (2).txt has columns A1, B1, C1, D1

I would like to get

A.xlsx == B B1
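One way to sketch this with pandas (io.StringIO stands in for the real text files here, and the file names and layout are assumptions from the example above; writing the result with to_excel needs openpyxl installed):

```python
import io
import pandas as pd

# hypothetical stand-ins for 1 (1).txt and 1 (2).txt
f1 = io.StringIO("A B C D\n1 2 3 4\n")
f2 = io.StringIO("A1 B1 C1 D1\n5 6 7 8\n")

# take the second column (B, B1) from each file and put them side by side
out = pd.concat(
    [pd.read_csv(f, sep=r"\s+").iloc[:, 1] for f in (f1, f2)], axis=1
)
# out.to_excel("A.xlsx", index=False) would save the combined columns
print(list(out.columns))   # ['B', 'B1']
```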
The data is imported into MATLAB from a .csv file using xlsread, and looks like this:
0203.ENG
0412.DXY
....
How do I return the row and column of '0412.DXY'?
Thanks!
I assume you've got a cell array of raw data read by xlsread:
[num, txt, raw] = xlsread(filename);
Then you can find the row and column of a particular string with
[r, c] = find(strcmp(raw, '0412.DXY'));
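For anyone doing the same in Python, a rough NumPy analogue of find(strcmp(...)) on hypothetical data (NumPy indices are 0-based where MATLAB's are 1-based):

```python
import numpy as np

# hypothetical stand-in for the cell array of strings
raw = np.array([["0203.ENG"], ["0412.DXY"]])
r, c = np.argwhere(raw == "0412.DXY")[0]   # first match's (row, column)
print(r, c)   # 1 0
```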