After concat() in Pandas, the new DF is missing "00" - python-3.x

I ran into this problem:
After web scraping (bs4) I now have two Excel files, and everything works fine up to that point.
When I concat() or even just append them with pandas, I get the "major" Excel file. Unfortunately, in the column "EAN" many rows are missing a leading "0" or even "00".
For example:
Product 1: 36073...
Product 2: 883...
Product 3: 7370...
It should be like this:
Product 1: 36073...
Product 2: 00883...
Product 3: 07370...
This problem only happens when the EAN starts with "0" or "00".
I also checked the individual Excel files that I want to concat(): the zeros are there, so the problem only appears when I combine the two into one file.
I am just using this standard script:
d = pd.read_excel('one.xlsx')
d1 = pd.read_excel('two.xlsx')
three = pd.concat([d, d1])
three.to_excel('major.xlsx')
Has anybody experienced the same problem?
Thanks and best regards!

The column is probably being read as an integer by default, which drops the leading zeros. Read the column as a string in both files before concatenating:
d = pd.read_excel('one.xlsx', converters={'EAN': str})
d1 = pd.read_excel('two.xlsx', converters={'EAN': str})
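A small sketch of the effect, using in-memory CSV text in place of the Excel files so that it runs without any workbook on disk. The sample products and EAN values are made up, and `dtype={"EAN": str}` is used here as an equivalent of the `converters` argument (read_excel accepts `dtype` as well). The last step assumes 13-digit EANs and is only relevant if the zeros have already been lost.

```python
import io

import pandas as pd

# Hypothetical stand-ins for one.xlsx and two.xlsx.
one = "Product,EAN\nProduct 1,4006381333931\nProduct 2,0012345678905\n"
two = "Product,EAN\nProduct 3,0712345678911\n"

# Default parsing infers an integer column, so leading zeros vanish.
as_int = pd.concat(
    [pd.read_csv(io.StringIO(one)), pd.read_csv(io.StringIO(two))],
    ignore_index=True,
)

# Forcing the column to str keeps every digit.
as_str = pd.concat(
    [
        pd.read_csv(io.StringIO(one), dtype={"EAN": str}),
        pd.read_csv(io.StringIO(two), dtype={"EAN": str}),
    ],
    ignore_index=True,
)

# If the zeros are already gone, pad them back (assumes 13-digit EANs).
restored = as_int["EAN"].astype(str).str.zfill(13)

print(as_int["EAN"].tolist())
print(as_str["EAN"].tolist())
print(restored.tolist())
```

Keeping identifiers such as EANs as text from the moment they are read avoids having to guess their original width later.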

Related

Create a text cell value based on row entries and corresponding columns

I understand this is a tough way of wording the problem I have. Please try and help me.
I want to create a Column called Orders which contains cells based on corresponding item values.
So if I have columns: FlatNo, Truffle, Pineapple, Mango, Chocochips; I want to create a column called Orders which has value:
FlatNo - A51
Mango - 1
Chocochips - 1
(if no values in the Pineapple & Truffle Columns, none show up in Orders columns)
See image
How do I do that? Thank you in advance.
You can use IF and &. The & operator simply joins the different pieces together.
The following formula should give you the result for the Orders column. I have put the quantity of each item in parentheses before the item name.
="Flat No. "&A2&IF(ISBLANK(B2),"","-("&B2&")"&$B$1)&IF(ISBLANK(C2),"","-("&C2&")"&$C$1)&IF(ISBLANK(D2),"","-("&D2&")"&$D$1)&IF(ISBLANK(E2),"","-("&E2&")"&$E$1)
For instance the third order is shown like this: Flat No. E-23-(1)Truffle -1 Pc Rs 60-(3)Mango -1 Pc Rs 60
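The same skip-the-blanks concatenation can be sketched in Python for anyone who wants to check the logic outside the spreadsheet. The function name, the item headers, and the sample row are made up for illustration; only the output layout follows the formula above.

```python
def build_order(flat_no, headers, quantities):
    """Join the flat number with every item whose quantity cell is not blank."""
    parts = ["Flat No. " + str(flat_no)]
    for header, qty in zip(headers, quantities):
        if qty is None or qty == "":  # mirrors ISBLANK: skip empty cells
            continue
        parts.append("(" + str(qty) + ")" + header)
    return "-".join(parts)


# Hypothetical header row and one order row.
headers = ["Truffle", "Pineapple", "Mango", "Chocochips"]
print(build_order("A51", headers, [None, None, 1, 1]))
# Flat No. A51-(1)Mango-(1)Chocochips
```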

Pandas: get first datetime-in and last datetime-out in one row

First of all, thanks in advance; there are always answers here, so we learn a lot from the experts. I'm a noob using pandas (it's super handy for what I have tried and achieved so far).
I have this data, handed to me like this (I don't have access to the origin), sometimes 20k rows or more. The 'in' and 'out' columns may have one or more entries per date, so after an 'in' the next entry could be an 'out' or another 'in', which leaves me with blank cells. That's the problem (see first image).
I want to keep the first datetime-in in one column and the last datetime-out in another, both in one row (see second image). The data comes in a CSV file. I am currently doing this work manually with LibreOffice Calc (yep).
So far I have tried locating and relocating, merging, grouping... nothing works for me, so I feel frustrated. Would you please lend me a hand? Here is a minimal sample of the file
By the way, English is not my native language. Thanks so much!
First:
out_column = df["out"].tolist()
This gives you all the out dates as a list; we will need that later.
in_column = df["in"].tolist()  # "in" is a Python keyword, so I suggest renaming that column
I treat NaT as NaN (null) in this case.
Now we have to find which rows to keep. We do that by going through the in column and keeping only the first row and each row that follows a NaN:
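filtered_df = []  # boolean mask, one entry per row
tracker = False   # True when the previous row had no "in" value
for index, element in enumerate(in_column):
    if index == 0 or tracker:
        # Keep the first row and every row that follows a missing "in".
        filtered_df.append(True)
        tracker = False
        continue
    if pd.isna(element):  # assumes pandas is imported as pd
        tracker = True
    filtered_df.append(False)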
Then you filter your df by this boolean list:
df = df[filtered_df]
Now you fix up your out column by removing the null values:
out_column = [value for value in out_column if not pd.isna(value)]
Last but not least, you overwrite your old out column with the new one (this only works if the number of remaining out values equals the number of rows kept):
df["out"] = out_column
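A different approach from the loop above, offered only as a sketch: group the rows by calendar day and take the earliest 'in' and the latest 'out' per group. It assumes the columns are named in and out, that each row carries exactly one of the two timestamps, and that "first in" and "last out" are meant per day. The sample data is made up, since the original sample is not available.

```python
import pandas as pd

# Hypothetical sample: each row has either an "in" or an "out" timestamp.
df = pd.DataFrame({
    "in": ["2021-01-04 08:00", None, "2021-01-04 13:00", None,
           "2021-01-05 08:30", None],
    "out": [None, "2021-01-04 12:00", None, "2021-01-04 17:00",
            None, "2021-01-05 16:45"],
})
df["in"] = pd.to_datetime(df["in"])
df["out"] = pd.to_datetime(df["out"])

# Calendar day of whichever timestamp the row carries.
day = df["in"].fillna(df["out"]).dt.date.rename("date")

# min/max skip missing values, so the blank cells do not get in the way.
summary = (
    df.groupby(day)
      .agg(first_in=("in", "min"), last_out=("out", "max"))
      .reset_index()
)
print(summary)
```

Unlike the list-based version, this does not depend on the rows alternating in any particular order, but it would need a different grouping key if a shift can span midnight.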

Excel formulas - count based on two criteria

I've been struggling to get this formula to work. I have a spreadsheet where I need to count how many cells in one column (BH2:BH915) contain a value (X) when the same row of another column (N2:N915) contains either 1 or 0. I've tried a bunch of versions to get it to work - this is the latest:
=sum(countifs(N2:N915="1","0") and (BH2:BH915="X"))
Can anyone tell me where I'm going wrong?
You'll need to add the count of records with 0 and X to the count of records with 1 and X:
=COUNTIFS(N2:N915, 0, BH2:BH915, "X") + COUNTIFS(N2:N915, 1, BH2:BH915, "X")
You could also use
=SUM(COUNTIFS(BH2:BH915,"x",N2:N915,{0,1}))
or if you knew that the numbers in N2:N915 were integers
=COUNTIFS(BH2:BH915,"x",N2:N915,">="&0,N2:N915,"<="&1)
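For checking the expected result outside Excel, the row-wise logic behind these formulas can be sketched in Python. The function name and the sample lists are hypothetical; None stands for a blank cell, and the comparison ignores case because COUNTIFS text criteria are case-insensitive.

```python
def count_marked_rows(flags, marks, allowed=(0, 1), wanted="X"):
    """Count rows where the flag is one of `allowed` and the mark equals `wanted`."""
    return sum(
        1
        for flag, mark in zip(flags, marks)
        if flag in allowed and str(mark).upper() == wanted.upper()
    )


# Hypothetical stand-ins for N2:N915 and BH2:BH915.
flags = [0, 1, 2, None, 1]
marks = ["X", "x", "X", "X", ""]
print(count_marked_rows(flags, marks))  # 2
```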

Spotfire extract decimal numbers from string column

I have a string column that looks like this:
ColumnA
"POINT (10.203942930 45.2903203)"
"POINT (11.356898730 25.2548565)"
from which I would like to extract the numbers and create two separate columns:
column1
10.203942930
11.356898730
column2
45.2903203
25.2548565
I have tried RXReplace, but I get one single number with no decimal point...
RXReplace([col], "[^0-9]", "", "g")
Any help will be really appreciated.
Thanks in advance.
@thundermils - Please try this solution.
Step 1: Create a calculated column 'calc' which separates the text 'POINT' from "POINT (10.203942930 45.2903203)"
left(right([Column A],Len([Column A]) - Find("(",[Column A])),-1)
Now, separate the two numbers into two separate columns.
Step 2: Create 'calc1' column with the below custom expression
Trim(left([calc],Find(" ",[calc])))
Step 3: Create 'calc2' column with the below custom expression
Trim(right([calc],Len([calc]) - Find(" ",[calc])))
The final output has the two numbers in separate columns, 'calc1' and 'calc2'.
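The original RXReplace attempt failed because the pattern "[^0-9]" also removes the decimal points and the space between the two numbers, which fuses everything into one run of digits. As a way to test the extraction logic outside Spotfire, here is a Python sketch with one capture group per number. The pattern and the function name are illustrative assumptions, not Spotfire syntax.

```python
import re

# Two decimal numbers inside "POINT (...)", separated by whitespace.
POINT_RE = re.compile(
    r"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)"
)


def split_point(text):
    """Return the two numbers as strings, or None if the text does not match."""
    match = POINT_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_point("POINT (10.203942930 45.2903203)"))
# ('10.203942930', '45.2903203')
```

The values are kept as strings so that trailing zeros such as the one in 10.203942930 survive; convert with float() only if numeric columns are needed.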

Find string (from table) in cell in matlab

I want to find the location of one string (which I take from a table) inside a cell array:
A is my table, and B is the cell array.
I have tested:
strncmp(A(1,8),B(:,1),1)
but it couldn't find the location.
I have tested many commands like:
ismember, strmatch, find(strcmp), find(strcmpi), find(ismember), strfind, etc., but they all give me errors, mostly because of the type of my data!
Please suggest a solution.
You want strfind:
>> strfind('0123abcdefgcde', 'cde')
ans =
7 12
If A is a table and B a cell array, you need to index this way:
strfind(B{1}, A.VarName{1});
For example:
>> A = cell2table({'cde'},'VariableNames',{'VarName'}); %// create A as table
>> B = {'0123abcdefgcde'}; %// create B as cell array of strings
>> strfind(B{1}, A.VarName{1})
ans =
7 12
Luis Mendo's answer is absolutely correct, but I want to add some general information.
Your problem is that all the functions you tried (strfind, ...) only work on normal strings, not on cell arrays. With the way you index A and B in your code snippet, the results are still cell arrays (of dimension (1,1)). You need to use curly brackets {} to "get rid of" the cell array and get the contained string. Luis Mendo shows how to do this.
Modified solution from a MathWorks forum, for the case of a single-column table with ragged strings:
find(strcmp('mystring',mytable{:,:}))
will give you the row number.
