I'm having an issue handling oddly formatted Excel data and writing it to CSV as strings. In my sample data, the Excel table I am importing has a column ('Item_Number'), and the odd data in the cells looks like: ="0001", ="00201", 2002AA, 1003B.
When I output to CSV, the results look like: 1, 201, 2002AA, 1003B.
When I output to Excel, the results are correct: 0001, 00201, 2002AA, 1003B.
All of the dtypes are objects. Am I missing a parameter in my .to_csv() call?
df = pd.read_excel(filename, sheet_name='Sheet1', converters={'Item_Number': str})
df.to_csv('Test_csv.csv')
df.to_excel('Test_excel.xlsx')
Tried different iterations of replacing the '=' and '"' characters, but with no luck (note that .str.replace returns a new Series, so the result has to be assigned back):
df['Item_Number'] = df['Item_Number'].str.replace('=', '').str.replace('"', '')
Currently using the excel output but curious if there is a way to preserve string formatting in CSV. Thanks :)
Opening an excel spreadsheet with Python 3 Pandas that has data that looks like ="0001" will go to the dataframe correctly. CSV will turn it back to "1". Keeping the same format to CSV is apparently a known issue (from my comment above). To keep the formatting I have to add =" " back into the data like this:
df['Item_Number'] = '="' + df['Item_Number'] + '"'
Not sure if there is a cleaner approach that will have an Excel-opened CSV file show 0001 without the quotes and equals sign.
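For reference, the wrapping trick can be sketched end to end with throwaway data (the column values here are hypothetical stand-ins for the imported sheet):

```python
import io

import pandas as pd

# Hypothetical stand-in for the imported sheet.
df = pd.DataFrame({"Item_Number": ["0001", "00201", "2002AA", "1003B"]})

# Wrap each value as ="...": Excel evaluates the formula to the literal
# string, so leading zeros survive a round trip through CSV.
df["Item_Number"] = '="' + df["Item_Number"] + '"'

buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue())
```

Opened in Excel, each cell then displays 0001, 00201, etc.; opened in a plain text editor, you will still see the `="..."` quoting (with the inner quotes doubled per CSV escaping rules).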
I have the following data set:
I convert from .TXT to .CSV format, but all the data ends up in a single column. How can I split it into separate columns?
I have 200,000 rows of data.
My original CSV data:
A
ath;006400005;1
I want to CSV data :
A B C
ath 006400005 1
In Google Sheets:
=ARRAYFORMULA(IFERROR(SPLIT(A1:A; ";")))
Separate it by commas, not semicolons.
In your .txt file, find and replace ";" with ",", then convert it to .csv.
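If you would rather do the conversion in code than with find-and-replace, a minimal Python sketch using the csv module (file names and the sample row are hypothetical) looks like:

```python
import csv

# Create a small sample input, standing in for the real .txt file.
with open("data.txt", "w", newline="") as f:
    f.write("ath;006400005;1\n")

# Re-read with ';' as the delimiter and write out comma-separated.
with open("data.txt", newline="") as src, open("data.csv", "w", newline="") as dst:
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst)  # default delimiter is ","
    writer.writerows(reader)
```

Be aware that a field like 006400005 will still lose its leading zeros if Excel later opens the CSV and interprets it as a number, as discussed in the question above.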
I am trying to convert an Excel file to CSV.
In this file I have a column with dates; sometimes they are filled in, but sometimes we do not have a date yet.
I cannot get it to work as expected.
Besides the Python file we also have a JSON file with a lot of config in it, because we have to swap out many values with new values during the conversion.
Attempt 1
def format_dates():
    for datecolumn in config["dateformats"]:
        if datecolumn["column"] in columns:
            outputdata[datecolumn["column"]] = outputdata[datecolumn["column"]].dt.strftime(datecolumn["format"])
This converted the dates but if a date in a row was empty the script failed.
For our next attempt we did it like this.
for datecolumn in config["dateformats"]:
    if datecolumn["column"] in columns:
        outputdata[datecolumn["column"]] = pd.to_datetime(outputdata[datecolumn["column"]], format=datecolumn["format"], errors='coerce')
This did not raise any errors, but after checking the converted file it seems that no dates were migrated at all, so this was a bust.
In the JSON file we have the following config for the date format:
"dateformats": [
    {
        "column": "date column",
        "datatype": "date",
        "format": "%d/%m/%Y %H:%M"
    }
]
So what I need is for this column to be migrated to the CSV file with the correct date format; if the column is not filled in, it should be empty in the new file.
All help will be greatly appreciated.
Regards
Dave
Since attempt 1 works unless the date column is empty, you just have to modify that a bit.
If the empty date column causes an exception you could catch that and just substitute an empty value for the date field in that row of the CSV file.
Or you could test that the value of the date column is not the empty string and then convert.
Since your question does not contain the actual trace of the failure, I cannot be more specific.
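One possible way to keep attempt 1 but survive empty cells, sketched here with stand-in data since the real config and dataframe aren't shown, is to rely on the fact that .dt.strftime leaves NaN for NaT entries, which can then be filled with empty strings:

```python
import pandas as pd

# Hypothetical stand-ins for the real config and dataframe.
config = {"dateformats": [{"column": "date column", "format": "%d/%m/%Y %H:%M"}]}
outputdata = pd.DataFrame({
    "date column": pd.to_datetime(["2017-01-01 08:30", None]),  # one empty date
})

for datecolumn in config["dateformats"]:
    col = datecolumn["column"]
    if col in outputdata.columns:
        # strftime turns NaT into NaN; fillna('') leaves those cells blank.
        outputdata[col] = outputdata[col].dt.strftime(datecolumn["format"]).fillna("")

print(outputdata["date column"].tolist())
```

With this, a filled-in date comes out formatted and a missing date comes out as an empty cell in the CSV, which matches the stated requirement.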
I need to import an Excel file. The Excel file has a few columns, and the first column, A, is a date column. Column A has the date format DDMMMYYYY, e.g. '01Jan2017', and in Excel the data type is a date type. But when I import it to SAS, all the other columns keep their data type (numeric, character, etc.) and value, while column A becomes a number, e.g. '42736' for '01Jan2017'. How do I import the data as-is, without converting the data type?
libname out '/path';
proc import out=out.sas_output_dataset
    datafile='/path/excel_file.xlsx'
    DBMS=XLSX
    REPLACE;
    sheet="Sheet1";
run;
It is hard to know without seeing the data. The below is general information, it may not answer your precise problem.
To avoid common errors you should set mixed=yes in your libname statement. You may also want to include a stringdates=yes statement.
The mixed=yes option allows for any out-of-range Excel date values.
stringdates=yes brings all dates into SAS in character format, so you will need to use the input() function to convert them into SAS dates. For a string like '01Jan2017', that would be:
Date = input(Date, date9.);
I would suggest that you import the excel with the import wizard in SAS. Afterwards right-click on the query and extract the code, see here: SAS Import Query DE
In the generated code itself you can format each imported column into the desired format.
For the possible format see: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/leforinforref/n0p2fmevfgj470n17h4k9f27qjag.htm
Hope this helps.
A value of '42736' for '01Jan2017' is an indication that the column in the Excel file has a mix of cells with date values and cells with string values. In that case SAS will make the variable character and store the date values as a digit string that represents the raw number excel uses for the date. To convert '42736' to a date value you need to first convert it to a number and then adjust the number for the difference in the base date used by Excel.
date_value = input(date_string,32.) + '30DEC1899'd ;
To convert the strings that look like '01JAN2017' use the DATE informat instead.
date_value = input(date_string,date11.);
You could add logic to do both to handle a column with mixed values.
date_value = input(date_string, ?? date11.);
if missing(date_value) then
    date_value = input(date_string, ?? 32.) + '30DEC1899'd;
To have the new variable print the date values in a human readable style attach a date type format to the variable.
format date_value date9. ;
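The same base-date arithmetic can be sanity-checked outside SAS. Here is a small Python sketch mirroring the '30DEC1899'd offset (the function name is just illustrative):

```python
from datetime import date, timedelta

# Excel (Windows) serial dates count days from a base of 1899-12-30,
# the same constant the SAS code above adds to the raw number.
def excel_serial_to_date(serial: int) -> date:
    return date(1899, 12, 30) + timedelta(days=serial)

print(excel_serial_to_date(42736))  # the serial from the question
```

Running this confirms that serial 42736 corresponds to 01Jan2017, as described in the question.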
I just started using Matlab (R2015a) a few weeks ago, and although I have searched for an answer to this question (and tried a few workarounds) I haven't had any luck. Hopefully, it's an easy fix!!
I am trying to write one column of serial dates at high precision (I need milliseconds) and many columns of data to a .csv file. I don't want insane precision for everything, just the first column of dates.
Here's what I've found:
- csvwrite doesn't allow for differing precisions.
- xlswrite doesn't have enough precision (even though my serial date is a double, and yes, I looked at the spreadsheet cell).
- dlmwrite appends data in row format, so writing the dates and then appending the rest of the data doesn't work (though soooo close!).
Now I'm trying with fprintf:
hz_time is the serial date (double)
data1 and data2 are 4x25 (double) and 4x7 (double) respectively
hz_time = 1.0e+05 *
[7.357583607870371, 7.357583607928241, 7.357583607986110, 7.357583608043980]
STR_data = [data1, data2];
filename = (strcat('Processed_',files(k1).name));
file = fopen(filename,'w');
fprintf(file,'%.20f\n',hz_time);
fprintf(file,'%f%f%f%f%f%f%f%f%\n',STR_data);
fclose('all')
Currently, this code appends data1 and data2 in one cell at the end of the STR_date_time column. When I try concatenating hz_time and the data matrices together (using strcat) I fail:
STR_data = strcat([hz_time, data1, data2])
Warning: Out of range or non-integer values truncated during conversion to character.
I'm sure it's probably my formatting...
My end goal is to export this data (into a .csv or excel spreadsheet or something) so that the first column has the serial date (loads of precision) and columns 2-8 have the other data in it.
Any help would be much appreciated.
Thanks in advance!
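One way to get differing precisions per column is to format each row explicitly: many decimals for the date column, fewer for the data columns. A minimal Python sketch of that idea (the values are hypothetical stand-ins for hz_time, data1 and data2):

```python
# Hypothetical stand-ins for hz_time and the concatenated data matrices.
hz_time = [735758.3607870371, 735758.3607928241]
data = [[1.5, 2.25, 3.0], [4.0, 5.5, 6.75]]

with open("out.csv", "w") as f:
    for t, row in zip(hz_time, data):
        # Many decimal places for the serial date, fewer for the data columns.
        f.write("%.10f," % t + ",".join("%.6f" % v for v in row) + "\n")
```

The MATLAB fprintf approach in the question is the same idea; the key is that the date column and the data columns each get their own conversion specifier, with delimiters between them.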
If I load a CSV file into Excel, the value 123.320000 becomes 123.32.
I need to view all contents as they are. Is there any way to stop Excel from hiding trailing zeros?
Reading other posts, I found that something like "=""123.3200000" could work, but that would mean running a regex on the file every time I want to view it, since it comes in xxxx|###|xxx format and I have no control over the generation part.
How exactly are you loading the CSV file?
If you import it as "Text" format then Excel will retain all formatting, including leading/trailing zeros.
In Excel 2010 you import from the "Data" tab and choose "From Text", find your CSV file then when prompted choose to format the data as "Text"
I'm assuming that once the imported values are in the sheet, you want to treat them as numbers and not as text, i.e. you want to be able to sum, multiply, etc. Loading the values as text will prevent you from doing this -- until you convert the values back to numbers, in which case you will lose the trailing zeros, which brings you back to your initial conundrum.
Keeping in mind that there is no difference between the values 123.32 and 123.3200000, what you want is just to change the display format such that the full precision of your value is shown explicitly. You could do this in VBA like so:
strMyValue = "123.3200000"
strFormat = "#."
' Append a 0 to the format string for each figure after the decimal point.
For i = 1 To Len(strMyValue) - InStr(strMyValue, ".")
    strFormat = strFormat & "0"
Next i
With Range("A1")
    .Value = CDbl(strMyValue)
    .NumberFormat = strFormat
    ' Value now shown with same precision as in strMyValue.
End With