I am looking to merge about 15 different Excel files to create one dataset. I know the variables are coded the same in each file. The problem is that the starting row of the data is inconsistent from one .xls to the next. Is there a way to use PROC IMPORT and identify specific rows to import for each file?
Thanks!
Assuming you are using DBMS=EXCEL, you have the RANGE option available to you:
proc import file="myfile.xlsx" out=mydataset dbms=excel replace;
range="'Sheet1$A1:Z1000'";
run;
Obviously change Sheet1, A1, and Z1000 to match what you need.
The SAS documentation on PROC IMPORT contains further information on the other DBMS options, including DBMS=XLS.
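If you'd rather automate the row detection than hard-code a RANGE for each of the 15 files, this is also easy to script outside SAS. Since the variable names are the same in every file, you can scan each sheet for the row that contains them. A minimal sketch of that detection logic in Python (the sample rows and column names here are hypothetical):

```python
# Locate the header row in a sheet whose data does not start at row 1.
# `rows` is a list of lists (one list per spreadsheet row), e.g. as
# produced by xlrd/openpyxl; `expected` is the set of known column names.

def find_header_row(rows, expected):
    for i, row in enumerate(rows):
        # Compare the non-empty cells of this row against the known names
        cells = {str(c).strip() for c in row if str(c).strip()}
        if expected.issubset(cells):
            return i
    raise ValueError("header row not found")

# Example: two junk rows before the real header
sheet = [
    ["Report generated 2013-05-01", "", ""],
    ["", "", ""],
    ["id", "name", "amount"],   # <- the header we want
    [1, "a", 10],
    [2, "b", 20],
]
hdr = find_header_row(sheet, {"id", "name", "amount"})
data = sheet[hdr + 1:]          # everything below the header
```

With pandas you could then pass skiprows=hdr to read_excel for each file and concatenate the results into one dataset.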
I am coming from a Java background and have minimal experience with Python. I have to read an Excel file and validate one of its column values against the DB, to verify whether those rows exist in the DB or not.
I know the exact libraries and steps in Java with which I could do this work.
But I am facing problems choosing how to do this work in Python.
So far I have identified some things I can do:
Read the Excel file in Python.
Use pyodbc to validate the values.
Can pandas help me refine those steps, rather than doing things the hard way?
Yes, pandas can help. But you phrase the question in a "please google this for me" way; expect this question to be down-voted a lot.
I will give you the answer for the Excel part. Surely you could have found this yourself with a little effort?
import pandas as pd
df = pd.read_excel('excel_file.xls')
Read the documentation.
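For the validation half of your question, the usual pattern is to pull the column out of the DataFrame and check it against the database in one parameterized query. A sketch using sqlite3 as a stand-in for your real database (with pyodbc you would only swap the connection line; the table and column names here are made up):

```python
import sqlite3

# Stand-in for the values read from Excel, e.g. df['emp_id'].tolist()
# after df = pd.read_excel('excel_file.xls')
excel_ids = [101, 102, 999]

# In your case: conn = pyodbc.connect(<your connection string>)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO employees VALUES (?)", [(101,), (102,), (103,)])

# One parameterized IN query instead of one query per spreadsheet row
placeholders = ",".join("?" * len(excel_ids))
found = {row[0] for row in conn.execute(
    f"SELECT emp_id FROM employees WHERE emp_id IN ({placeholders})",
    excel_ids)}

# Values present in the spreadsheet but missing from the DB
missing = [v for v in excel_ids if v not in found]
print(missing)
```

Using a single IN query keeps the round-trips down, which matters once the spreadsheet has thousands of rows.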
Using the xlrd module, you can retrieve information from a spreadsheet in Python. Note that xlrd is read-only; for writing or modifying workbooks you would use a companion library such as xlwt or openpyxl. A user can also walk through the various sheets and retrieve data based on some criteria.
The xlrd module is used to extract data from a spreadsheet.
# Reading an excel file using Python
import xlrd
# Give the location of the file
loc = ("path of file")
# To open Workbook
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
# Print the value in row 0, column 0
print(sheet.cell_value(0, 0))
Put open_workbook inside a try statement to catch errors such as a missing or unreadable file (xlrd raises xlrd.XLRDError for a bad file). Note that pyodbc.Error is only raised by the database calls, not by xlrd, so catch it around those instead.
I'm looking for a way to automatically do a find and replace after I have imported data from a CSV file. The date data in my CSV file has a time stamp that I do not want to use. Yes, I can do it manually but would like to automate it if possible.
The data in the left column is what I want to change to match what is in the right column.
Sample Data
Thanks.
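One way to automate this, if a small script is an option, is to strip the time stamp before the data ever reaches the spreadsheet. A sketch with Python's standard csv module, assuming the dates look like 2013-04-01 00:00:00 and sit in the first column (adjust the column index and format to your data):

```python
import csv, io

# Stand-in for your CSV file; in practice: open('data.csv', newline='')
raw = """date,amount
2013-04-01 00:00:00,10
2013-04-02 00:00:00,20
"""

out = io.StringIO()
reader = csv.reader(io.StringIO(raw))
writer = csv.writer(out, lineterminator="\n")

writer.writerow(next(reader))          # copy the header row unchanged
for row in reader:
    row[0] = row[0].split(" ")[0]      # keep only the date part
    writer.writerow(row)

print(out.getvalue())
```

The same find-and-replace could also be done inside Excel with a formula or a short VBA macro, but pre-processing the CSV keeps the workbook untouched.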
Part of my job is to pull a report weekly that lists patching information for around 75000 PCs. I have to filter some erroneous data, based on certain criteria, and then summarize this data myself and update it in a separate spreadsheet. I am comfortable with pivot tables / formulas, but it ends up taking a good couple of hours.
Is there a way to import data from a CSV file into a template that already has in place my formulas/settings, etc. if the data has the same columns, but a different amount of rows each time?
If you're comfortable with programming, you can use macros. In this case, you would connect to your CSV file, extract the information, and put it in the corresponding places in your spreadsheet. This question covers most of what you need to start off: macro to Import csv file into an excel non active worksheet.
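If VBA isn't a hard requirement, the filter-and-summarize step itself is also easy to script, and a script doesn't care how many rows the weekly file has. A sketch with Python's standard library (the column names and the filter criterion are invented for illustration):

```python
import csv, io
from collections import Counter

# Stand-in for this week's export; in practice: open('report.csv', newline='')
raw = """hostname,patch_status,os
pc1,Patched,Win10
pc2,Missing,Win10
pc3,Patched,Win7
bad-row,Unknown,Win10
pc4,Missing,Win7
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Drop the erroneous records, whatever your actual criteria are
clean = [r for r in rows if r["patch_status"] != "Unknown"]

# Summarize patch status per OS, independent of the row count
summary = Counter((r["os"], r["patch_status"]) for r in clean)
for (os_name, status), n in sorted(summary.items()):
    print(os_name, status, n)
```

The summary could then be written into your template workbook, or simply pasted in, replacing the manual pivot-table step.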
How can I save data from an Excel sheet to a .RData file in R? I want to use one of the packages in R and load my dataset with data(dataset), so I think I have to save the data as a .RData file and then load that into the package. My data is currently in an Excel spreadsheet.
My Excel sheet has column names x, y, and time.lag.
I have saved it as .csv.
Then I use:
x <- read.csv('filepath', header=TRUE)
Then I say
data(x)
and it shows: dataset 'x' not found
There are also several packages that allow reading directly from XLS and XLSX files; we've had questions on that topic here before. However you decide to read in the data, saving to a .RData file can be handled with save(), save.image(), saveRDS() and probably some others I'm not thinking of. Note also that data() only loads datasets bundled with packages; after read.csv() your data is already in the object x, so you can use it directly without calling data().
Save your Excel data as a .csv file and import it using read.csv() or read.table().
The help page for each will explain the options.
For example, if you have a file called myFile.xls, save it as myFile.csv.
library(BBMM)
# load an example dataset from BBMM
data(locations)
# from the BBMM help file
BBMM <- brownian.bridge(x=locations$x, y=locations$y, time.lag=locations$time.lag[-1], location.error=20, cell.size=50)
bbmm.summary(BBMM)
# output of summary(BBMM)
Brownian motion variance : 3003.392
Size of grid : 138552 cells
Grid cell size : 50
# substitute your dataset, read from myFile.csv, for locations
myData <- read.csv(file='myFile.csv', header=TRUE)
head(myData) # shows the first six rows of your imported data
# use whatever you need from the BBMM package now ....
Check the RODBC package. You can find an example in R Data Import/Export. You can query data from an Excel sheet as if it were a database table.
The benefit of reading an Excel sheet with RODBC is that you get dates (if you work with any) in a proper format. With an intermediate CSV, you would need to specify the column type, unless you want it to be a factor or string. You can also query only a portion of your data if you need to, which makes subset() unnecessary.
I have little to no experience in SAS. But what I would like to do is read in 2 excel spreadsheets into 2 separate temporary datasets.
The file names are C:\signature_recruit.xls and C:\acceptance_recruit.xls.
How do I accomplish this?
For simplicity, you will want your Excel files to look like a SAS data set. That means you should have only rows and columns of data. If desired, the first row can contain the names of the columns (variables).
Now you can either write PROC IMPORT code yourself to read the Excel file, or use the Import Wizard to click through the process. The wizard has a helpful feature: after you click through the dialogs, it can save a program containing the PROC IMPORT code it generated to read the Excel file. You can then save and reuse this code as needed.
To start the import wizard, go to File->Import Data. The default option is to import an Excel file. Browse to the spreadsheet and answer the questions. Repeat for both spreadsheets.
With luck, this should be all you need to do to get the file into SAS. Here is a link to some more info and examples.
An alternative to cmjohns' PROC IMPORT approach above is to use DDE. It's an older technology and is more difficult to use, but it provides greater flexibility for complex scenarios.
Plenty has been written on doing this. For example:
http://www.lexjansen.com/wuss/2010/DataPresentation/3015_4_DPR-Smith.pdf
Cheers
Rob