How can I recover "raw" data from an excel spreadsheet? - excel

I have the following problem:
Part number: 625009E11
Excel rep: 6.25009E+16
I want to recover the original information. Is this possible? Or, does Excel automatically dump the original data if it can format things as a number? (I also have another, similar, problem with leading zeroes.)

You might be able to recover part numbers with some clever scripting, but for leading zeroes you're pretty much screwed. Excel "helpfully" tries to cast strings it recognizes as other data types and does not keep a copy of the original. This has been a known problem for almost a decade, especially in bio research: http://www.biomedcentral.com/1471-2105/5/80
Don't use Excel as a database, kids.
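The "clever scripting" for part numbers could look something like the sketch below. It assumes the part numbers had the shape digits-E-digits and that the original mantissa had no leading or trailing zeros; the function name is hypothetical. Any part number that breaks those assumptions (say 6250090E10, which Excel would show the same way) cannot be told apart, so treat the output as a guess to check against a known-good list.

```python
from decimal import Decimal

def guess_part_number(excel_text):
    """Guess a digits-E-digits part number from Excel's scientific notation."""
    # normalize() drops trailing zeros from the mantissa, e.g. 6.250090E+16 -> 6.25009E+16
    value = Decimal(excel_text).normalize()
    sign, digits, exponent = value.as_tuple()
    if sign or not isinstance(exponent, int) or exponent <= 0:
        # negative, NaN/Infinity, or no positive exponent left to restore
        return None
    mantissa = "".join(str(d) for d in digits)
    return f"{mantissa}E{exponent}"

print(guess_part_number("6.25009E+16"))
```

This only works while the displayed value still carries every significant digit; if Excel rounded the mantissa for display, those digits are gone.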

Related

Excel to PSPP without the hassle with variable names

I have some data in an Excel file. Now I need to find the significance values, which is not possible in Excel, only in PSPP. But when I import my Excel file into PSPP (after converting it to a CSV file), it causes a lot of problems, especially with variable names.
Could anyone please tell me some easy solutions?
Excel to PSPP: Make sure your excel file is prepared:
Red boxes in the import process of the CSV file indicate problems in the rows, such as words in what otherwise looks like a variable with numeric values.
The variable names must not be too long, so fix that first; PSPP works like a very old version of SPSS in this respect.
Then look carefully at the steps when importing.
Choose the second row to be the first (as the first row is variable names, and not a case).
Then click the box for choosing the top row as variable names.
It is less smooth than SPSS in the initial procedure, but it is a fantastic free alternative to SPSS.
Once the dataset was ready, I found it worked well for the same analyses as in SPSS.
I suppose your problem was solved a long time ago, but maybe someone else will benefit from this.
A site for sharing information about the PSPP would be great...
One should first convert the Excel file into CSV (for example with Apple's Numbers on a Mac) and then import the converted file into PSPP. Easy.
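Shortening the variable names before the import can be scripted rather than done by hand. The sketch below is only an illustration: the 8-character limit and the letters/digits/underscore rule are assumptions modelled on very old SPSS versions, so adjust `max_len` to whatever your PSPP version accepts. The function name is hypothetical.

```python
import re

def clean_names(names, max_len=8):
    """Return short, unique, identifier-like variable names for a CSV header."""
    seen = set()
    result = []
    for raw in names:
        # Replace anything that is not a letter, digit, or underscore.
        name = re.sub(r"[^A-Za-z0-9_]", "_", raw.strip())
        # Make sure the name starts with a letter.
        if not name or not name[0].isalpha():
            name = "v" + name
        name = name[:max_len]
        # Add a numeric suffix when truncation produces a duplicate.
        candidate, n = name, 1
        while candidate.lower() in seen:
            suffix = str(n)
            candidate = name[:max_len - len(suffix)] + suffix
            n += 1
        seen.add(candidate.lower())
        result.append(candidate)
    return result

print(clean_names(["Age of respondent", "Age of respondent (years)", "2nd score"]))
```

Apply it to the first row of the CSV (for example with the `csv` module) and write the file back out before importing.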

print to Excel file instead of pdf in SAP

I have a transaction in SAP - ZHR_TM01 (possibly built by our IT department) that prints the timesheets of our employees that are swiping a card.
I need all this data in Excel format, but the only option I know is to type "PDF!" in the command bar when I'm in the print preview of the timesheet, which converts all selected timesheets to PDF. To get the data into Excel I then need to use Acrobat's converter. This is somewhat unprofessional, and working with the sheet becomes very "convert dependent": every conversion comes out slightly different from the previous ones, the columns/rows are not consistent, etc.
Is there a way to retrieve the data directly in some readable, consistent format, since the data obviously exists?
Is there a command analogous to PDF! that converts to Excel format, or to any other format?
It will help me big time.
Thanks!!
If the function code PDF! works, the printout is most likely implemented using a Smart Form. In this case, it should be possible to create an alternative download function, e.g. with SALV. I'd recommend contacting the person who originally developed the transaction to get an estimate; I'm not qualified to get into the details of HR...
See if you can convert to a .csv or .txt file. Once you have it in either of those formats you should be able to import them into Excel and delimit the columns with greater accuracy.

Excel - Variable number of leading zeros in variable length numbers?

The format of our member numbers has changed several times over the years, such that 00008, 9538, 746, 0746, 00746, 100125, and various other permutations are valid, unique and need to be retained. Exporting from our database into the custom Excel template needed for a mass update strips the leading zeros, such that 00746 and 0746 are both truncated to 746.
Inserting the apostrophe trick, or formatting as text, does not work in our case, since the data seems to be already altered by the time we open it in Excel. Formatting as zip won't work since we have valid numbers less than five digits in length that cannot have zeros added to them. And I am not having any luck with "custom" formatting as that seems to require either adding the same number of leading zeros to a number, or adding enough zeros to every number to make them all the same length.
Any clues? I wish there was some way to set Excel to just take what it's given and leave it alone, but that does not seem to be the case! I would appreciate any suggestions or advice. Thank you all very much in advance!
UPDATE - thanks everybody for your help! Here are some more specifics. We are using a 3rd party membership management app -- we cannot access the database directly, we need to use their "query builder" tool to get the data we want to mass update. Then we export using their "template" format, which is called XLSX but there must be something going on behind the scenes, because if we try to import a regular old Excel, we get an error. Only their template works.
The data is formatted okay in the database, because all of the numbers show correctly in the web-based management tool. Also, if I export to CSV, save it as a .txt and import it into Excel, the numbers show fine.
What I have done is similar to ooo's explanation below -- I exported the template with the incorrect numbers, then exported as CSV/txt, and copied / pasted THOSE numbers into the template and re-imported. I did not get an error, which is something I guess, but I will not be able to find out if it was successful until after midnight! :-(
Assuming the data is not corrupt in the database, then try and export from the database to a csv or text file.
The following can then be done to ensure the import is formatted correctly
Text file with comma delimiter:
In Excel, choose Data / From Text, select Delimited, then click Next.
In step 3 of the import wizard, for each column/field you want as text, highlight the column and select Text.
The data should then be placed as text and retain leading zeros.
Again, all of this assumes the database contains non-corrupt data and you are able to export a simple text or csv file. It also assumes you have Excel 2010 but it can be done with minor variation across all versions.
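The same idea applies if you inspect or pre-process the export outside Excel: read every field as text and nothing gets reinterpreted. A minimal sketch with Python's standard `csv` module follows; the column names and sample rows are made up for illustration.

```python
import csv
import io

def read_all_as_text(csv_text):
    """Parse CSV text; csv.reader returns every field as a string."""
    return list(csv.reader(io.StringIO(csv_text)))

# Hypothetical export with three distinct member numbers.
sample = "member_id,name\n00746,Ann\n0746,Bob\n746,Cy\n"
rows = read_all_as_text(sample)
for member_id, name in rows[1:]:
    print(repr(member_id), name)
```

Because no field is ever converted to a number, 00746, 0746 and 746 stay distinct.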
Hopefully, @ooo's answer works for you. I'm providing another answer mainly for informational purposes, and because I don't feel like dealing with the constraints on comments.
One thing to understand is that Excel is very aggressive about treating "numeric-looking" data as actual numbers. If you were to open the CSV by double-clicking and letting Excel do its thing (rather than using ooo's careful procedure), those numbers would still have come up as numbers (no leading zeros). As you've found, one way to counteract this is to append clearly nonnumeric characters onto your data (before Excel gets its grubby hands on it), to really convince Excel that what it's dealing with is text.
Now, if the thing that uploads to their software is a file ending in .xlsx, then most likely it is the current Excel format (a compressed XML document, used by Excel 2007 and later). I suppose by "regular old Excel" you mean .xls (which still works with the newer Excels in "compatibility mode").
So in case what you've tried so far doesn't work, there are still avenues to explore before resorting to appending characters to the end of your data. (I'll update this answer as needed.)
You're on the right track with the apostrophe.
You'll need to store your numbers in excel as text at the time they are added to the file.
What are you using to create the original excel file / export from database?
This will likely be where your focus needs to be regarding your export.
For example one approach is that you could potentially modify the database export to include the ' symbol prefix before the numbers so that excel will know to display them as text.
I use the formula =text(cell,"# of zeros of the field") to add preceding zeros.
Example, Cell C2 has 12345 and I need it to be 10 characters long. I would put =text(c2,"0000000000").
The result will be 0000012345.
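For comparison, the same fixed-width padding done in a script instead of a formula might look like this (the function name is hypothetical). Note that, like the TEXT formula, it pads everything to one width, so it cannot restore numbers whose original zero count varied, such as 0746 versus 00746.

```python
def pad_member_number(value, width):
    """Left-pad a number with zeros to a fixed width, like =TEXT(C2,"0000000000")."""
    return str(value).zfill(width)

print(pad_member_number(12345, 10))
```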

Parsing CSV with commas in fields

I have a csv file with commas inside of fields that are non-enclosed. I unfortunately must parse this file and cannot get it replaced with a properly formatted one.
I really don't even know where to begin.
OK. What I'm seeing is the following: You have about 8,000 rows that essentially have a CSV syntax error in them. You can manually figure out which they are, but manually fixing 8,000 entries is a bit much.
The obvious first approach would be to work out how you can manually tell which columns have this issue. If it is something you can define rules for, you are in business. If it's simple enough, you can write a small text editor macro to go through the file and do it for you. If your text editor doesn't support macros, use awk. If you are on Windows and don't have awk, then go get it.
If it is too complicated for that, fix your real problem. Go fix whatever generated this CSV file to generate it right. If it was someone else's code you don't have access to, tell them to fix it. "You are generating 8,000 unparsable entries" seems like a pretty good argument in my book. Sooner or later they will probably generate a new revision of this file for you to process, so this is really the Right Thing to do.
There's probably nothing you can do with it short of analyzing the records manually in a text editor. The comma delimiters are essentially useless if there is no discernible way to distinguish them from valid commas in the data.
If you can get a cleaner file from whoever created the bad one, that's probably far less trouble than trying to fix up the one you've got.
You could run an Excel macro to replace the commas with some other character (let's say $, something not in your file) for the time being; then, once you've parsed the file, you could run the results through some code to turn that character back into the original commas.
EDIT: I am assuming that you have access to the original file seeing as you've tagged excel here?
I think the best you can hope for is 80% automatic, which means you'll be doing over 1,000 manually best case. You just need to be clever about the data that's there. Read each line in and count the commas. If it's the right amount, write it out to a new file. If it's too many, send it to the exception handler.
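The comma-counting pass described above can be sketched in a few lines. The function name and the sample rows are made up; `expected_commas` is whatever a well-formed row in your file contains.

```python
def split_by_comma_count(lines, expected_commas):
    """Separate rows with the expected number of commas from the rest."""
    good, exceptions = [], []
    for line in lines:
        if line.count(",") == expected_commas:
            good.append(line)
        else:
            exceptions.append(line)
    return good, exceptions

# Hypothetical rows: three commas expected per row.
sample = [
    "2011-01-05,Acme,12B34-567,OK",
    "2011-01-06,Acme, Inc.,12B34-568,OK",
]
good, exceptions = split_by_comma_count(sample, expected_commas=3)
print(len(good), "good,", len(exceptions), "to review")
```

Write `good` to the new file and feed `exceptions` to whatever rules you come up with next.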
Start with what you absolutely know about the data. Is the first column a TimeStamp? If you know that, you can go from "20 commas when there should be 18" to "19 commas when there should be 17". I know that doesn't exactly lift your spirits but it's progress. Is there a location, like a plant name, somewhere in there? Maybe you can develop a list from the good data and search for it in the bad data. If column 7 should be the plant name, go through your list of plant names and see if one of them exists. If so, count the commas between that and the start and between that and the end (or another good comma location that you've established).
If you have some unique data, you can use a regex to find its location in the string and, again, count commas before and after to see if it's where it should be. For example, a Lat/Long reading or a part number in the format 99A99-999.
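As a rough illustration of the regex idea, the sketch below looks for a part number in the 99A99-999 format and reports how many commas sit before and after it. The pattern assumes "A" means one uppercase letter and "9" one digit; the function name and the sample row are invented.

```python
import re

# Assumed reading of "99A99-999": two digits, one uppercase letter, two digits, dash, three digits.
PART_NUMBER = re.compile(r"\b\d{2}[A-Z]\d{2}-\d{3}\b")

def locate_anchor(line, pattern=PART_NUMBER):
    """Return (commas before match, commas after match), or None if no match."""
    match = pattern.search(line)
    if match is None:
        return None
    before = line[:match.start()].count(",")
    after = line[match.end():].count(",")
    return before, after

# Hypothetical row where the company name contains a stray comma.
print(locate_anchor("2011-01-05,Acme, Inc.,12B34-567,OK"))
```

If the part number should be the third column (two commas before it) and the count comes back as three, the extra comma is somewhere in the columns before the match.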
If you can post five or ten rows of good data, maybe someone can suggest more specific ways to identify columns and their locations.
Good luck.

Convert richtext strings to excel

I have a form that has TinyMCE for richtext formatting. All of our data is available to export as an HTML report, PDF Report, and Excel Spreadsheet (report).
The fields, that we allow richtext in, show up as the formatted values in both the HTML and PDF reports, but in Excel we show them as strings. For instance:
<b>this part is bold</b><br />line 2 here.
I need a way to make that show up as bold/line-break in Excel rather than just showing that string, or at least a way to strip the HTML tags out of there and just show plain text (though I would really like to at least keep the line breaks). Is there some type of macro I can include in the Excel download, or some C++ program that can convert it, or something?
Thanks for your time!
I've done something similar with PHPExcel
The trick is to take your formatted data and find a pattern. In your case, it would probably be table rows/table cells. Iterate through that structure setting the excel cell values as you go. For complex formatting you could fairly simply regex replace what is necessary to get formatted as you desire. The theory may sound a little complicated, but once you get down to it, it's only an hour or two's worth of work.
Certainly there are equivalent programs based on other server technologies. But this one has worked brilliantly for me over the years, and I trust it to work on sites for very big clients with crazy inbound traffic numbers...and it's never failed. It's the only reliable way I've found to write perfect, properly formatted Excel without requiring the user to jump through hoops to get a specific browser.
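For the fallback mentioned in the question (plain text with the line breaks kept), a small parser is usually safer than regex replacement. This is a minimal sketch in Python rather than PHP, using only the standard library; the class and function names are hypothetical, and it only treats `<br>` and closing `</p>` as line breaks.

```python
from html.parser import HTMLParser

class TextWithBreaks(HTMLParser):
    """Collect text content, turning <br> and </p> into newlines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <br />.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "p":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(html):
    parser = TextWithBreaks()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()

print(html_to_text("<b>this part is bold</b><br />line 2 here."))
```

The newline only shows up as a line break inside an Excel cell if the cell has wrap text enabled.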
