Import a specific column from text files into Excel

I have many text files. I want to import only one column from each text file and write it to an Excel file. How do I do this?
For example, take text files 1(1).txt and 1(2).txt:
file 1(1).txt has columns A, B, C, D
file 1(2).txt has columns A1, B1, C1, D1
I would like to get
A.xls == B B1
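A minimal pandas sketch of this, assuming the text files are whitespace-delimited and the wanted column is the second one (B and B1); the file pattern and output name are placeholders:

import glob
import pandas as pd

columns = []
for path in sorted(glob.glob("file 1(*).txt")):            # placeholder file pattern
    df = pd.read_csv(path, sep=r"\s+")                     # assumes whitespace-delimited columns
    columns.append(df.iloc[:, 1].reset_index(drop=True))   # second column: B, B1, ...

# Write the collected columns side by side into one Excel file (requires openpyxl).
pd.concat(columns, axis=1).to_excel("A.xlsx", index=False)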

Related

Find and replace function in Alteryx - how can it be done in Azure Data Flow?

I have a "Find and replace " tool in Alteryx which finds the Col value of csv file1 and replaces it with the look up csv file2 which has 2 columns like
Word and ReplacementWord.
Example :
Address is a col in Csv file1 which has value like St.Xyz,NY,100067
And Csv file 2 has
Word ReplacementWord
NY NewYork
ZBW Zimbawe etc....
Now the final Output should be
Address
St.Xyz,NewYork,100067
Please help guys .
I tried to reproduce your scenario in my environment; to achieve the desired output I followed the steps below.
In the data flow activity I took two sources:
Source 1 is the file which contains the actual address.
Source 2 is the file which contains the country codes with names.
After that I used a lookup to merge the files based on the country code. In the lookup condition I provided split(Address,',')[2] to split the address string on commas and take the second value (the split index here is 1-based), which is the country code, e.g. NY in St.Xyz,NY,100067, and matched it against Column_1 of the second source.
Lookup data preview:
Then I took a Derived Column, named the column Address, and gave it the expression replace(Address, split(Address,',')[2], Column_2). It replaces the value we split out in the lookup with the value of Column_2.
Derived column preview:
Then I took a Select and deleted the unwanted columns.
Select preview:
Finally, I provided this to the sink dataset.
Output
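For reference, a minimal pandas sketch of the same lookup-and-replace logic outside Data Flow; the file names are placeholders, and it assumes the Address values are read intact (e.g. quoted in the CSV):

import pandas as pd

addresses = pd.read_csv("file1.csv")   # contains the Address column, e.g. "St.Xyz,NY,100067"
lookup = pd.read_csv("file2.csv")      # columns: Word, ReplacementWord

# Map each Word to its ReplacementWord, e.g. NY -> NewYork.
mapping = dict(zip(lookup["Word"], lookup["ReplacementWord"]))

def replace_code(address):
    parts = address.split(",")
    # The second comma-separated token is the country code, as in the Data Flow expression.
    parts[1] = mapping.get(parts[1], parts[1])
    return ",".join(parts)

addresses["Address"] = addresses["Address"].map(replace_code)
addresses.to_csv("output.csv", index=False)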

How to exclude a column in R when using prcomp

I would like to exclude some columns when running prcomp.
For example, my CSV file has columns A to I, but for prcomp I would only like to use columns B to H.
Column A has no column heading; it is just a bunch of ID numbers that are not part of the data frame.
I've attached a picture of what the CSV file looks like and of what the data frame looks like in R.
I'm using this command. Is it correct?
pca.mydata <- prcomp(mydata[, 2:8], scale = TRUE)
CSV file
Picture of RStudio

Schema for the CSV file

I have one text file containing a single column, and another CSV file containing the data.
I need to read the schema from the text file and merge it with the CSV file.
Is this possible automatically, without using StructType or a case class, i.e. it just reads the text file, copies the whole column, transposes it, and pastes it as the first row (the header) of that CSV file?
Text file
Column Header
Name
Age
Roll Number
Section
CSV File
Fred 25 123 A
Eyaz 26 456 B
O/P
Name Age Roll_Number Section
Fred 25 123 A
Eyaz 26 456 B
Any help would be highly appreciated.
Thanks for your time!
// y holds the header names read from the text file, one per line
val dd = y.collect()
// Build a schema with one StringType field per header line
val schema = StructType(dd.map(fieldName => StructField(fieldName, StringType, nullable = true)))
println(schema)
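A minimal PySpark sketch of the same idea that avoids an explicit StructType by renaming the columns after reading. The file names are placeholders; it assumes the CSV has no header row and the same number of columns as there are names in the text file (the sample output uses Roll_Number, so names with spaces may need underscores):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# One column name per line in the header text file.
header_names = [row.value for row in spark.read.text("headers.txt").collect()]

# Read the header-less CSV and rename its columns from the text file.
df = spark.read.csv("data.csv", header=False).toDF(*header_names)
df.show()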

How to select a column from a text file which has no header using Python

I have a text file which is tabulated. When I open the file in Python using pandas, it shows that the file contains only one column, even though there are many columns in it. I've tried pd.DataFrame, sep='\s*', and sep='\t', but I can't select a column since pandas sees only one. I've even tried specifying the header, but the header moves to the far right and the whole file is still parsed as one column. I've also tried the .loc method with a specific column number, but it always returns rows. I want to select the first column (A, A), the third column (HIS, PRO), and the fourth column (0, 0).
I want to get the specific columns mentioned above and print them to a CSV file.
Here is the code I have used along with some file components.
1) After opening the file using pd:
[599 rows x 1 columns]
2) The file format:
pdb_id: 1IHV
0 radii_filename: MD_threshold: 4
1 A 20 HIS 0 MaximumDistance
2 A 21 PRO 0 MaximumDistance
3 A 22 THR 0 MaximumDistance
3) code:
import pandas as pd
df = pd.read_table("file_path.txt", sep='\t')
U = df.loc[:][2:4]
Any help will be highly appreciated.
If anybody gets a file like this, it can be opened and the columns can be selected using the following code:
cols = (0, 2, 3)              # 0-based indices of the columns you want to select
f = open('file.txt', "r")
lines = f.readlines()
result = []
for x in lines:
    # split each line on whitespace and keep only the wanted fields
    result.append([x.split()[i] for i in cols])
for w in result:
    s = '\t'.join(w)
    print(s)
Here cols holds the (0-based) indices of the columns you want to select.
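A shorter pandas-based sketch of the same selection, assuming the file is whitespace-delimited and has no header row; the file names, the two skipped metadata lines, and the column positions are assumptions to adjust:

import pandas as pd

# Treat any run of whitespace as the delimiter; skiprows=2 skips the leading metadata lines.
df = pd.read_csv("file.txt", sep=r"\s+", header=None, skiprows=2)

# Keep the first, third and fourth columns (0-based positions 0, 2, 3) and write them out.
df.iloc[:, [0, 2, 3]].to_csv("selected_columns.csv", index=False, header=False)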

Python Pandas: check cells for a range of numbers; copy them, or skip if not there

I would use the pandas isin or iloc functions, but the Excel format is complex: there are sometimes data followed by columns of no info, and the main pool of entries are columns with up to three pieces of data in a cell, separated only by a '|'. Some of the cells are missing a number, and I want to skip those but copy the ones that have it.
My current code is below. I have a giant Excel file with thousands of entries and, worse, the columns/rows are not neat: there are several pieces of data in each column cell per row. What I've noticed is that a number called 'tail #' is missing in some of them. What I want to do is search for that number; if the cell has it, copy that cell, and if it does not, go to the next column in the row, then repeat that for all cells. There is a giant header, but when I transformed the file into CSV I removed that with formatting. This is also why I am looking for a number, because there are several headers: for example, years such as 2010, followed by several empty columns until the next one maybe ten columns later. Also, please note that under this header of years are several columns of data per row that are separated by two columns with no info. The info in a column looks like this: '13|something something|some more words'. If it has a number, as you see, I want to copy it. The numbers seem to range from 0 to no greater than 30. Lastly, I'm trying to write this using pandas, but I may need a more manual way to do things because isin and iloc were not working.
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import os.path as op
from openpyxl import Workbook
import re

def extract_export_columns(df, list_of_columns, file_path):
    column_df = df[list_of_columns]
    column_df.to_csv(file_path, index=False, sep="|")

# Original file
input_base_path = 'C:/Users/somedoc input'
main_df_data_file = pd.read_csv(op.join(input_base_path, 'som_excel_doc.csv '))

# Filter for tail numbers
tail_numbers = main_df_data_file['abcde'] <= 30
main_df_data_file[tail_numbers]

# Iterate over list
# number_filter = main_df_data_file.Updated.isin(["15"])
# main_df_data_file[number_filter]
# print(number_filter)
# for row in main_df_data_file.values:
#     for value in row:
#         print(value)
#     print(row)
# to check the condition

# Product of code
output_base_path = r'C:\Users\some_doc output'
extract_export_columns(main_df_data_file,
                       ['Updated 28 Feb 18 Tail #'],
                       op.join(output_base_path, 'UBC_example3.txt'))
The code I have loads the CSV and successfully creates a text file. I want to build the body of the function to scan the Excel/CSV file and copy the data that contains a number to a text file.
https://drive.google.com/file/d/1stXxgqBeo_sGksVYL9HHdn2IflFL_bb8/view?usp=sharing
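A minimal sketch of the scanning step described above, assuming the cells of interest look like '13|something|more words' and that a leading integer between 0 and 30 before the '|' marks a cell worth copying; the file names are placeholders:

import re
import pandas as pd

df = pd.read_csv("som_excel_doc.csv", dtype=str)   # placeholder file name

def has_tail_number(cell):
    # A cell qualifies if it starts with an integer between 0 and 30 followed by '|'.
    if not isinstance(cell, str):
        return False
    match = re.match(r"(\d+)\|", cell)
    return match is not None and 0 <= int(match.group(1)) <= 30

# Collect every qualifying cell, scanning row by row and column by column.
kept = [cell for row in df.itertuples(index=False) for cell in row if has_tail_number(cell)]

with open("cells_with_tail_numbers.txt", "w") as out:
    out.write("\n".join(kept))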
