I'm trying to generate some reporting from existing reports in CSV format. These CSV files contain more than just data: a report name, a report date, and multiple data sets. Rows don't necessarily have the same number of fields or consistent data.
I was curious whether some spreadsheet-type library is available. This is how I imagine it would work:
load some csv file into spreadsheet
report_title = spreadsheet("A1")
report_date = spreadsheet("B2")
sales_data_spreadsheet = spreadsheet("A6:E22")
sales_total = sales_data_spreadsheet("SUM(E1:E17)")
expenses_data_spreadsheet = spreadsheet("A26:E38")
expenses_total = expenses_data_spreadsheet("SUM(E1:E11)")
Microsoft Excel?
You don't have spreadsheets, you have described flat files with mixed formats and some metadata. What in your flat files says that the sales data is in A6:E22? In fact what does A6:E22 mean outside the context of Microsoft Excel and in the context of your data?
There are lots of ways to handle this data, from parsing it yourself long-hand, with code to manage each data format, to loading it into a set of database tables and using SQL to break it into pieces. Which you choose depends on what the data is, where it comes from, and what you are going to do with it. If you provide a bit more of that information, an approach will be easier to recommend.
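As a rough illustration of the parse-it-yourself route, here is a minimal Python sketch. It assumes you already know where each section sits in the file, which is exactly what the file itself does not tell you. The helper names (cell, block, column_total) and the sample report text are hypothetical and not part of any library.

```python
import csv
import io
import re


def col_to_index(letters):
    """Convert a column label such as 'A' or 'AB' to a zero-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_cell(ref):
    """Split a reference such as 'E22' into zero-based (row, column)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    return int(match.group(2)) - 1, col_to_index(match.group(1))


def cell(rows, ref):
    """Return one cell, or '' when the row is too short to contain it."""
    r, c = parse_cell(ref)
    return rows[r][c] if r < len(rows) and c < len(rows[r]) else ""


def block(rows, ref):
    """Return the rectangle named by a range like 'A6:E22', padding short rows."""
    start, end = ref.split(":")
    r0, c0 = parse_cell(start)
    r1, c1 = parse_cell(end)
    width = c1 - c0 + 1
    out = []
    for row in rows[r0:r1 + 1]:
        piece = row[c0:c1 + 1]
        out.append(piece + [""] * (width - len(piece)))
    return out


def column_total(rows, letter):
    """Sum one column of a block, skipping blank or non-numeric cells."""
    c = col_to_index(letter)
    total = 0.0
    for row in rows:
        try:
            total += float(row[c])
        except (ValueError, IndexError):
            continue
    return total


# Hypothetical report text standing in for a real CSV file.
SAMPLE = """Sales report,
,2012-01-31
Region,Units,Amount
North,3,10.5
South,4,20.0
Total,,
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
report_title = cell(rows, "A1")
report_date = cell(rows, "B2")
sales = block(rows, "A4:C5")            # data rows only, header excluded
sales_total = column_total(sales, "C")  # column letters are relative to the block
```

Because the ranges are hard-coded, this only works while every report keeps the same layout. If the sections move around, you would need to locate them by a marker row instead.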
I've authored such a library in Python on top of the Google Spreadsheets API. The library's interface is not exactly the same as in your example, but it uses objects to represent sheets and cells.
The library's API is pretty straightforward even if you're new to Python.
First, you have to upload your CSVs into Google Spreadsheets, and then you can access it:
# Load the module
import gspread
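# Log in with your Google account
# (gspread.login() comes from early gspread releases; current releases
# authenticate with OAuth2 credentials instead, e.g. gspread.service_account())
gc = gspread.login('_your_google_account_email_','password')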
# Open a spreadsheet and worksheet
wks = gc.open("name of the spreadsheet").sheet1
# Update a cell
wks.update_acell('B2', "it's down there somewhere, let me take another look.")
# Select a range
sales_data = [float(c.value) for c in wks.range('E1:E17')]
# Sum it up
sales_total = sum(sales_data)
Alternatively, you can import your CSV data into an Excel file and use this wonderful Python library: xlrd
In Visio I am creating an Org Chart, using the 'Import Organization Data', and using 'Information that's already stored in a file or database'. When I select my xlsx file, it pulls in all of the data. However, what if I wanted to only create an org chart out of a subset of the data? Currently I'm applying a filter to the data in Excel, copying the result to a new Excel file, and using that new file to import into Visio. A slightly less bad version of this would be if I could at least copy the filtered data into a different sheet in the same file, but the Visio Import doesn't even seem to let me select which sheet to use. This is very annoying - is there a better way?
Though I could never get Visio to ask me which table/sheet I wanted to use within a single file, I found what I consider an acceptable workaround using inspiration from @y4cine's suggestion.
I created separate "slice" xlsx files, where in each of those I used a Power Query against the data in the main xlsx file. Then I can point Visio to one of those slice files and it will happily make an org chart with the slice of data I was interested in.
A bit clunky, but it sure beats repeated copy/pasting :)
It is known that you could upload an Excel file in Visual Analyzer as a dataset and use that Excel file in Analysis as a separate Subject Area.
However, there was no way (or at least we couldn't find one) to make any connections between this Excel dataset and other subject areas, for example connecting the Excel file's date column to OBIEE's Calendar.Day column.
With new OAS, is there any update on this? Can we somehow make relationships between user-defined datasets and subject areas from rpd? Or is this feature not implemented?
Once you're on OAS you can create data sets which mash up any data source you want. Excel uploaded as a data set can be combined with other uploaded data sets, data sets created by data flows as well as Subject Areas. You have full freedom.
I believe the only way to create a relationship between data sources is to import them into the Analytics repository.
If you can import the Excel file as a data source into the repository, you may be able to relate it to other data sources. Here are some links:
https://datacadamia.com/dat/obiee/obis/obiee_excel_importation
https://www.ascentt.com/importing-excel-file-into-obiee-11g/
I hope these help.
Hakan
How can I save data from an Excel sheet to an .RData file in R? I want to use one of the packages in R and load my dataset with data(dataset). I think I have to save the data as an .RData file and then load that into the package. My data is currently in an Excel spreadsheet.
My Excel sheet has column names like x, y, time.lag.
I have saved it as .csv.
Then I use:
x=read.csv('filepath', header=T,)
Then I call
data(x)
and it reports that dataset 'x' is not found.
There are also several packages that allow directly reading from XLS and XLSX files. We've even had a question on that topic here and here for example. However you decide to read in the data, saving into an RData can be handled with save, save.image, saveRDS and probably some others I'm not thinking about.
Save your Excel data as a .csv file and import it using read.csv() or read.table().
The help page for each explains the options.
For example, if you have a file called myFile.xls, save it as myFile.csv.
library(BBMM)
# load an example dataset from BBMM
data(locations)
# from the BBMM help file
BBMM <- brownian.bridge(x=locations$x, y=locations$y, time.lag=locations$time.lag[-1], location.error=20, cell.size=50)
bbmm.summary(BBMM)
# output of bbmm.summary(BBMM)
Brownian motion variance : 3003.392
Size of grid : 138552 cells
Grid cell size : 50
# substitute myData for locations to use your own dataset, read from myFile.csv;
# read.csv() already returns a data frame, so no data() call is needed
myData <- read.csv(file='myFile.csv', header=TRUE)
head(myData) # shows the first 6 rows of your imported data
# use whatever you need from the BBMM package now ....
Check the RODBC package. You can find an example in R Data Import/Export. You can query data from an Excel sheet as if it were a database table.
The benefit of reading an Excel sheet with RODBC is that you get dates (if you work with any) in a proper format. With an intermediate CSV, you'd need to specify the column type, unless you want it to be a factor or string. You can also query only a portion of your data if needed, which makes subset() unnecessary.
I want to import data from Excel into the corresponding tables based on IDs in different columns, for example customer data matched on the CustomerID present in the Customer table.
This means we have to extract data from both the table and the Excel source on the basis of IDs.
Could you please help me out with this?
Use the SQL Server Data Import Wizard - see an article on it here.
(source: databasedesign-resource.com)
This wizard allows you to define your Excel file to import, it allows you to define the target where to put the data, it allows you to define mappings between columns in Excel and columns in your SQL table, and much more.
Update: based on your comment to the other answer, if you need to import the Excel sheet and match it up to some pre-existing lookup data, then you should definitely look at SQL Server Integration Services (SSIS), which exists for exactly this kind of import/lookup scenario.
Your question's grammar is a bit all over the place, so I'm not entirely sure what you are asking about, but here goes.
You can save your Excel spreadsheet as a CSV file and then import that into your database. There are a number of tutorials on this if you search Google. Try searching "import CSV into database".
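To make the CSV route concrete, here is a minimal Python sketch using the standard csv and sqlite3 modules. It loads the exported rows into a staging table and then matches them against an existing Customer table on CustomerID. SQLite is only a stand-in for whatever database you actually use, and the table layout, the OrderTotal column, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export of the Excel sheet; column names are assumptions.
CSV_TEXT = """CustomerID,OrderTotal
1,19.99
2,5.00
7,12.50
"""

conn = sqlite3.connect(":memory:")

# Stand-in for the pre-existing Customer table.
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

# Load the CSV rows into a staging table first, converting types explicitly.
conn.execute("CREATE TABLE Staging (CustomerID INTEGER, OrderTotal REAL)")
reader = csv.DictReader(io.StringIO(CSV_TEXT))
conn.executemany(
    "INSERT INTO Staging VALUES (?, ?)",
    [(int(r["CustomerID"]), float(r["OrderTotal"])) for r in reader],
)

# Rows whose CustomerID exists in the Customer table.
matched = conn.execute(
    "SELECT c.Name, s.OrderTotal FROM Staging s "
    "JOIN Customer c ON c.CustomerID = s.CustomerID "
    "ORDER BY c.CustomerID"
).fetchall()

# Rows whose CustomerID is unknown and needs review before import.
unmatched = conn.execute(
    "SELECT s.CustomerID FROM Staging s "
    "LEFT JOIN Customer c ON c.CustomerID = s.CustomerID "
    "WHERE c.CustomerID IS NULL"
).fetchall()
```

Keeping the staging table separate means rows with unknown IDs can be inspected instead of silently dropped.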
I need to import tabular data into my database. The data is supplied via spreadsheets (mostly Excel files) from multiple parties. The format of each of these files is similar but not the same and various transformations will be necessary to massage the data into the final format suitable for import. Furthermore the input formats are likely to change in the future. I am looking for a tool that can be run and administered by regular users to transform the input files.
Now let me list some of the transformations I am looking to do:
swap columns:
Input is:
|Name|Category|Price|
|data|data |data |
Output is
|Name|Price|Category|
|data|data |data |
rename columns
Input is:
|PRODUCTNAME|CAT |PRICE|
|data |data|data |
Output is
|Name|Category|Price|
|data|data |data |
map columns according to a lookup table, like in the above examples:
replace every occurrence of the string "Car" by "automobile" in the column Category
basic maths:
multiply the price column by some factor
basic string manipulations
Let's say that the format of the Price column is "3 x $45"; I would want to split that into two columns, amount and price
filtering of rows by value: exclude all rows containing the word "expensive"
etc.
I have the following requirements:
it can run on any of these platforms: Windows, Mac, Linux
Open Source, Freeware, Shareware or commercial
the transformations need to be editable via a GUI
if the tool requires end user training to use that is not an issue
it can handle on the order of 1000-50000 rows
Basically I am looking for a graphical tool that will help the users normalize the data so it can be imported, without me having to write a bunch of adapters.
What tools do you use to solve this?
The simplest solution IMHO would be to use Excel itself - you'll get all the Excel built-in functions and macros for free. Have your transformation code in a macro that gets called via Excel controls (for the GUI aspect) on a spreadsheet. Find a way to insert that spreadsheet and macro in your client's Excel files. That way you don't need to worry about platform compatibility (it's their file, so they must be able to open it) and all the rest. The other requirements are met as well. The only training would be to show them how to enable macros.
The Mule Data Integrator will do all of this from a CSV file. So you can export your spreadsheet to a CSV file and load the CSV file into the MDI. It can even load the data directly into the database, and the user can specify all of the transformations you requested. The MDI will work fine in non-Mule environments. You can find it at mulesoft.com (disclaimer: my company developed the transformation technology that this product is based on).
You didn't say which database you're importing into, or what tool you use. If you were using SQL Server, then I'd recommend using SQL Server Integration Services (SSIS) to manipulate the spreadsheets during the import process.
I tend to use MS Access as a pipeline between multiple data sources and destinations - but you're looking for something a little more automated. You can use macros and VB script with Access to help through a lot of the basics.
However, you're always going to have data consistency problems with users mis-interpreting how to normalize their information. Good luck!
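If a scripted fallback is acceptable for some of the feeds, the transformations listed in the question (rename, reorder, lookup mapping, splitting "3 x $45", multiplying the price, filtering rows) can be sketched in plain Python. This is not a GUI tool and does not replace any of the products mentioned above. The function name, the mapping tables, and the sample rows are all hypothetical, and the lookup is assumed to match whole cell values.

```python
import re

# Hypothetical per-supplier configuration.
RENAME = {"PRODUCTNAME": "Name", "CAT": "Category", "PRICE": "Price"}
LOOKUP = {"Car": "automobile"}
PRICE_PATTERN = re.compile(r"\s*(\d+)\s*x\s*\$(\d+(?:\.\d+)?)\s*")


def transform(rows, factor=1.0):
    """Normalize a list of row dicts into Name, Amount, Price, Category."""
    out = []
    for row in rows:
        # Rename columns to the target names.
        row = {RENAME.get(key, key): value for key, value in row.items()}
        # Filter: skip any row containing the word "expensive".
        if any("expensive" in str(value) for value in row.values()):
            continue
        # Map Category values through the lookup table (exact match).
        category = LOOKUP.get(row["Category"], row["Category"])
        # Split a price such as "3 x $45" into an amount and a unit price.
        match = PRICE_PATTERN.fullmatch(row["Price"])
        if match is None:
            raise ValueError(f"unexpected price format: {row['Price']!r}")
        amount = int(match.group(1))
        # Basic maths: multiply the unit price by a factor.
        price = float(match.group(2)) * factor
        # Building the dict in this order also fixes the output column order.
        out.append(
            {"Name": row["Name"], "Amount": amount, "Price": price, "Category": category}
        )
    return out


sample = [
    {"PRODUCTNAME": "Mini", "CAT": "Car", "PRICE": "3 x $45"},
    {"PRODUCTNAME": "Yacht", "CAT": "expensive boat", "PRICE": "1 x $900"},
]
result = transform(sample, factor=2.0)
```

Raising on an unexpected price format is a deliberate choice here: with input formats that change over time, a loud failure is easier to notice than quietly wrong data.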