I am totally new to Stata and am wondering how to import .xlsx data into Stata. Let's say the data is in the subdirectory Data and is named "a b c.xlsx". So, relative to the working directory, the data is in /Data
I am trying to do
import excel using "\Data\a b c.xlsx", sheet("a")
but it's not working
"it's not working" is anything but a useful error report. For future questions, please report the exact error given by Stata.
Let's say the file is in the directory /home/roberto then
clear
set more off
import excel using "/home/roberto/a b c.xlsx"
list
should work.
If you are already in /home/roberto (which you can verify using display c(pwd)), then
import excel using "a b c.xlsx"
should work.
Using backslashes to refer to directories is not encouraged. See Stata tip 65: Beware the backstabbing backslash, by Nick Cox.
See also help cd.
I frequent a real estate website that shows recent transactions, from which I will download data to parse within a Pandas dataframe. Everything about this dataset remains identical every time I download it (regarding the column names, that is).
The name of the Excel output may change, though. For example, if I have already downloaded a few of these to my Downloads folder, the exported file may be named "Generic_File_(3)" or "Generic_File_(21)", depending on how many older "Generic_File" exports are already in that folder.
Ideally, I'd like my workflow to look like this: export this Excel file of real estate sales, then run a Python script to read in the most recent export as a Pandas dataframe. The catch is, I don't want to have to go in and change the filename in the script to match the number appended to the Excel export every time. I want the pd.read_excel method to simply read the "Generic_File" that is appended with the largest number (which will obviously correspond to the most recent export).
I suppose I could always just delete old exports out of my Downloads folder so the newest, freshest export is always named the same ("Generic_File", in this case), but I'm looking for a way to ensure I don't have to do this. Are wildcards the best path forward, or is there some other method to always read in the most recently downloaded Excel file from my Downloads folder?
I would use the os module and write a function that reads the file names in the Downloads folder. By parsing those file names you could then find the file that follows your specified format and has the highest copy number. Something like the following might help you get started.
import os

downloads_dir = 'C:/Users/[username here]/Downloads/'
# Keep only regular files; testing for a '.' in the name would also
# match folders with dots and miss files without an extension.
files = [name for name in os.listdir(downloads_dir)
         if os.path.isfile(os.path.join(downloads_dir, name))]
# ** INSERT CODE HERE TO IDENTIFY THE FILE OF INTEREST **
Regex might be the best way to find matches if you have a diverse listing of files in your downloads folder.
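As a rough sketch of the regex idea, the function below picks the name with the highest copy number from a list of file names. The function name latest_export, the default stem "Generic_File" and the ".xlsx" extension are assumptions made for illustration; the question does not state the extension, so adjust the pattern to whatever your browser actually produces.

```python
import re

def latest_export(filenames, stem="Generic_File", ext=".xlsx"):
    """Return the name with the highest copy number, or None if nothing matches."""
    # Matches "Generic_File.xlsx", "Generic_File_(3).xlsx" and "Generic_File (3).xlsx".
    pattern = re.compile(
        rf"^{re.escape(stem)}(?:[ _]?\((\d+)\))?{re.escape(ext)}$"
    )
    best_name, best_number = None, -1
    for name in filenames:
        match = pattern.match(name)
        if match is None:
            continue
        # An export without a numeric suffix counts as copy number 0.
        number = int(match.group(1)) if match.group(1) else 0
        if number > best_number:
            best_name, best_number = name, number
    return best_name
```

You could pass it the result of os.listdir on the Downloads folder and hand the joined path to pd.read_excel. If the numbering ever stops tracking download order, picking the candidate with the newest modification time (os.path.getmtime) may be the safer criterion.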
I'm trying to display data from Python in Excel. Ideally, a pandas dataframe's worth of data would appear in a new, unsaved Excel instance. My search has turned up ways to create Excel files, and lots of ways to 'open' an Excel file to read data from it, but no way to display it. My current approach was to create a file and then figure out how to open it, but I consider that approach second-best.
I found this. Another way would be to export your data to CSV and import it in Excel.
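A minimal sketch of the CSV route, using only the standard library: write the rows to a temporary .csv file and then hand that file to whatever application is registered for it. The helper names write_temp_csv and open_with_default_app are made up for this example. With a pandas dataframe you would more likely call df.to_csv(path, index=False) instead of the csv module.

```python
import csv
import os
import tempfile

def write_temp_csv(header, rows):
    """Write header and rows to a new temporary .csv file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    # "utf-8-sig" adds a byte order mark, which helps Excel detect UTF-8.
    with os.fdopen(fd, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path

def open_with_default_app(path):
    """Open the file with its associated application (Windows only)."""
    # os.startfile exists only on Windows builds of Python.
    if hasattr(os, "startfile"):
        os.startfile(path)
    else:
        raise NotImplementedError("os.startfile is only available on Windows")
```

This still creates a file on disk, so it does not give you a truly unsaved workbook; it only keeps the file out of the way in the temp directory.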
I found you can 'run' files with the os module (os.startfile, which is available on Windows only). As long as your computer knows what to do with the file type, you can create the xlsx file with whatever method and then run it to display:
import xlsxwriter
import os
w = xlsxwriter.Workbook(r"C:\Temp\test.xlsx")
s=w.add_worksheet("Sheet1")
s.write(1,3,"7")
w.close()
os.startfile(r"C:\Temp\test.xlsx")
I'm still not sure whether you can work with an unsaved, open instance of Excel.
Using an existing SSIS package, I was trying to import .xlsx files we received from a client. I received the error message:
External table is not in the expected format
These files will open in XL
When I use XL (currently XL2010) to Save As... the file without making any changes:
The new file imports just fine
The new file is 330% the size of the original file
When changing .xlsx to .zip and investigating the contents with WinZip:
The original file has only 4 .xml files and a _rels folder (with 2 .rels files).
The new file has the expected .xlsx contents.
Does anyone know what kind of file this could be?
It would be nice to develop my SSIS package to work with these original files, without having to open and re-save each file. There are only 12 files, so if there are no other options, opening and saving each file is not that big of a deal, and I could automate it with VBA going forward.
Thanks for any help anyone can provide,
CTB
There are many Excel file formats.
The file you are trying to import may be in a different Excel format with the extension changed to .xlsx (someone else may have edited it), or it may have been created with a different Excel version or by another tool.
There is a third-party application called TrIDNet File Identifier, a utility designed to identify file types from their binary signatures. You can use it to determine the real format of the file.
Also, a quick search on "External table is not in the expected format" suggests that this error is thrown when the Excel version declared in the connection string differs from the version of the selected file. Check the connection string used in the Excel connection manager; it might help identify which version the package expects.
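If you would rather check the signature yourself, the sketch below looks at the first bytes of the file. It relies on two well-known signatures: ZIP archives (which is what a real .xlsx is) start with PK\x03\x04, and legacy OLE2 compound files such as .xls start with D0 CF 11 E0 A1 B1 1A E1. The function name sniff_excel_container is made up for this example, and it only tells you the container type, not whether SSIS will accept the file.

```python
import zipfile

ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx, .xlsm, any ZIP archive
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy .xls and other OLE2 files

def sniff_excel_container(path):
    """Return (kind, members): kind is 'zip', 'ole2' or 'unknown'."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if header.startswith(OLE2_MAGIC):
        return "ole2", []
    if header.startswith(ZIP_MAGIC) and zipfile.is_zipfile(path):
        # List the archive members so two files can be compared part by part.
        with zipfile.ZipFile(path) as archive:
            return "zip", sorted(archive.namelist())
    return "unknown", []
```

Running it on an original file and on its re-saved copy gives two member lists to compare, which is the same check as renaming to .zip but easier to repeat for all 12 files.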
I am trying to import a tab of a spreadsheet into SAS. I am using the code below.
proc import
datafile="[directories....]\4. Media\2. Campaign Info\Tracking_Sheet_JA.xlsx"
out=test1
dbms=xlsx replace;
sheet="db_data";
run;
and get the error message
ERROR: XLSX file does not exist -> [directories....]\4. Media\2.
Campaign Info\Tracking_Sheet_JA.xlsx
I thought it might be something to do with the spaces or the . in the directory names; however, when I pick another random file from that directory and import it using exactly the same code as above, it works fine. For example:
proc import
datafile="[directories....]\4. Media\2. Campaign Info\9feb.xlsx"
out=test1
dbms=xlsx replace;
sheet="uk";
run;
I am certain the file extensions and tab references are correct. Could there be something to do with the settings of my excel file that would cause this?
Thanks.
There could be a few issues. 1: Someone else could have the file open (on a shared drive; I assume you are at work?). 2: You could have the file open yourself while trying to import it into SAS. 3: The file name could simply be wrong. In Excel, is your file saved with the underscores? 4: Excel permissions can be tricky at work, but if other Excel files in that directory import fine, then I don't think that should be an issue, unless maybe someone has locked the file. I'm not too sure about the permissions side.
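To rule out point 3 quickly, one option is to list the directory outside SAS and look for names that are close to the one you typed. The sketch below is one way to do that; the function name check_expected_file is hypothetical, and the directory argument is whatever your real path is.

```python
import difflib
import os

def check_expected_file(directory, expected_name):
    """Return (found, candidates) for expected_name inside directory."""
    names = os.listdir(directory)
    if expected_name in names:
        return True, [expected_name]
    # Suggest up to three similarly spelled names, e.g. a space instead of
    # an underscore or a stray character before the extension.
    candidates = difflib.get_close_matches(expected_name, names, n=3, cutoff=0.6)
    return False, candidates
```

Printing each candidate with repr() makes trailing spaces and similar invisible differences easier to spot.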
I am trying to use the Utilities > Member Import Utility to create an XML file that I can then use to import member data.
I have about seventy members to import. I was able to work through the mapping with what appeared to be a good match, but when I click the button, I get the following error:
Line does not match structure
I am using a .csv file to bring in the data and I have selected comma as the delimiter. I can map the fields, but when I click Create XML I get the parse error.
Any suggestions on how to resolve this?
Thanks.
I found the answer. It appears to automatically understand that the path is relative. When I simply put the name of the file in it went in with error.
So the correct path is: customer.txt
However, because the username is a number and not alphanumeric, it cannot be imported.
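For the "Line does not match structure" error in the question, one thing worth checking before mapping is whether every line of the .csv has the same number of fields as the first line. The sketch below assumes that a differing field count is what triggers the error, which the utility's message suggests but the thread does not confirm; the function name find_mismatched_lines is made up for this example.

```python
import csv
import io

def find_mismatched_lines(csv_text, delimiter=","):
    """Return (line_number, field_count) for rows that differ from the first row."""
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input, nothing to compare
    expected = len(header)
    problems = []
    for row in reader:
        # Blank lines come back as empty rows and are reported too.
        if len(row) != expected:
            problems.append((reader.line_num, len(row)))
    return problems
```

Typical culprits are an unquoted comma inside a field, a missing trailing column, or a blank line at the end of the file.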