I have a huge file containing decimal numbers that looks like this:
31.3043 31.3043 31.3043 31.3043 31.3043 31.3043 31.3043
200 200 200 200 200 200 200
121.739 121.739 121.739 121.739 121.739 121.739 121.739
10.4348 10.4348 10.4348 10.4348 10.4348 10.4348 10.4348
5.2174 5.2174 5.2174 5.2174 5.2174 5.2174 5.2174 5.2174
What I want to match is only the numbers above 10:
31.3043 31.3043 31.3043 31.3043 31.3043 31.3043 31.3043
200 200 200 200 200 200 200
121.739 121.739 121.739 121.739 121.739 121.739 121.739
10.4348 10.4348 10.4348 10.4348 10.4348 10.4348 10.4348
and to exclude the numbers below 10:
5.2174 5.2174 5.2174 5.2174 5.2174 5.2174 5.2174 5.2174
This does the job:
Ctrl+F
Find what: (?<![.\d])\d{2,}(?:\.\d+)?
check Wrap around
check Regular expression
Search
Explanation:
(?<![.\d]) : negative lookbehind, makes sure there is no digit or dot before the current position
\d{2,} : 2 or more digits
(?:\.\d+)? : optional non-capturing group for the decimal part (a quick Python check of the pattern follows below)
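If you want to sanity-check the same pattern outside the editor, here is a minimal Python sketch (the sample string is just the example data from above, not your real file):

import re

# Same pattern as in the Notepad++ search: no digit or dot immediately before,
# then 2 or more integer digits, then an optional decimal part.
pattern = re.compile(r'(?<![.\d])\d{2,}(?:\.\d+)?')

sample = "31.3043 200 121.739 10.4348 5.2174"
print(pattern.findall(sample))
# ['31.3043', '200', '121.739', '10.4348'] -- 5.2174 is not matched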
I have a dataframe which looks like:
A B C
a 100 200
a NA 100
a 200 NA
a 100 100
b 200 200
b 100 200
b 200 100
b 200 100
I use the aggregate function on column B and column C as:
ag=data.groupby(['A']).agg({'B':'sum','C':'sum'}).reset_index()
Output:
A B C
a NULL NULL
b 700 600
Expected Output:
A B C
a 400 400
b 700 600
How can I modify my aggregate function so that NULL values are ignored?
Maybe you already thought about this and it is not possible in your problem, but you can replace the NA values with 0 in the dataframe before this operation. If you don't want to change the original dataframe, you can do it on a copy; replace already returns a new dataframe, so the original stays intact:
ag=data.replace(np.nan,0).groupby(['A']).agg({'B':'sum','C':'sum'}).reset_index()
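A minimal, self-contained sketch of that approach, using the example data from the question (column names as above):

import numpy as np
import pandas as pd

data = pd.DataFrame({
    'A': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'B': [100, np.nan, 200, 100, 200, 100, 200, 200],
    'C': [200, 100, np.nan, 100, 200, 200, 100, 100],
})

# replace() returns a new dataframe, so the original data is left untouched.
ag = data.replace(np.nan, 0).groupby(['A']).agg({'B': 'sum', 'C': 'sum'}).reset_index()
print(ag)
#    A      B      C
# 0  a  400.0  400.0
# 1  b  700.0  600.0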
I have a Dataframe like
A% B%
2 3
- 2.1
100 0
- 5
I want the output exported to Excel to look like:
A% B%
2.00% 3.00%
- 2.10%
100.00% 0.00%
- 5.00%
I produce these A% and B% columns using the following logic:
((df['some_values1'] / df['some_values2']) *100.00).round(2).astype(str)+('%')
but when I export the df to Excel, the - gets converted to NaN, and whole numbers like 2, 100 and 0 do not get two decimals (they come out as 2%, 100% and 0% instead of 2.00%, 100.00% and 0.00%), whereas it works fine with 2.1 and other fractional values.
Thanks in advance for any help.
You can define a function that suits your needs:
def to_percent_format(p):
    if str(p).strip() != "-":
        return "{:.2%}".format(p/100)
    else:
        return p.strip()
Indeed:
>>> to_percent_format(3)
'3.00%'
>>> to_percent_format("-")
'-'
And just apply it element-wise to your dataframe (apply with axis=1 passes whole rows to the function, so use applymap for element-wise application):
df = df.applymap(to_percent_format)
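Putting it together, a small sketch (assuming the percentage columns already hold the numbers with '-' placeholders; the output file name is just a placeholder):

import pandas as pd

df = pd.DataFrame({'A%': [2, '-', 100, '-'], 'B%': [3, 2.1, 0, 5]})

def to_percent_format(p):
    if str(p).strip() != "-":
        return "{:.2%}".format(p / 100)
    else:
        return p.strip()

# Element-wise application; on pandas >= 2.1, DataFrame.map does the same thing.
formatted = df.applymap(to_percent_format)
print(formatted)
#         A%     B%
# 0    2.00%  3.00%
# 1        -  2.10%
# 2  100.00%  0.00%
# 3        -  5.00%

# The values are plain strings, so Excel keeps them exactly as formatted
# (requires an Excel writer such as openpyxl):
# formatted.to_excel("output.xlsx", index=False)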
I am having trouble using reshaped data with pandas. Imagine I have a dataframe in long format like:
town year type var1 var2
a 2010 a 100 200
b 2010 a 100 200
c 2010 a 100 200
a 2011 a 100 200
b 2011 a 100 200
c 2011 a 100 200
a 2010 b 100 200
b 2010 b 100 200
c 2010 b 100 200
a 2011 b 100 200
b 2011 b 100 200
c 2011 b 100 200
I then reshape it into wide format like so:
df = pd.pivot_table(df, index="town", columns=["year", "type"], values=["var1", "var2"])
var1 var2
year 2010 2011 2010 2011
type a b a b a b a b
town
a 100 200 100 200 100 200 100 200
b 100 200 100 200 100 200 100 200
c 100 200 100 200 100 200 100 200
How do I then access the resulting dataframe? For instance if I wanted to get data for all the towns, but only for the year 2010 and type b? I have tried using df.query but that results in a buffer type mismatch. I have tried using:
df[df["year"] == 2010]
But that results in a key error. Any help would be gratefully received. Thanks
Use slicers:
idx = pd.IndexSlice
df = df.loc[:, idx[:, 2010, 'b']]
print (df)
var1 var2
year 2010 2010
type b b
town
a 100 200
b 100 200
c 100 200
Or DataFrame.xs:
df = df.xs((2010, 'b'), axis=1, level=[1,2])
print (df)
var1 var2
town
a 100 200
b 100 200
c 100 200
Another solution filters by Index.get_level_values with boolean masks chained by & (bitwise AND); because we are filtering columns, DataFrame.loc is needed (the first : means all rows):
m1 = df.columns.get_level_values('year') == 2010
m2 = df.columns.get_level_values('type') == 'b'
df = df.loc[:, m1 & m2]
print (df)
var1 var2
year 2010 2010
type b b
town
a 100 200
b 100 200
c 100 200
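For reference, a self-contained sketch that rebuilds the pivoted frame from the long data in the question and applies the IndexSlice selection:

import pandas as pd

long_df = pd.DataFrame({
    'town': ['a', 'b', 'c'] * 4,
    'year': [2010, 2010, 2010, 2011, 2011, 2011] * 2,
    'type': ['a'] * 6 + ['b'] * 6,
    'var1': [100] * 12,
    'var2': [200] * 12,
})

wide = pd.pivot_table(long_df, index='town', columns=['year', 'type'],
                      values=['var1', 'var2'])

# All rows, year 2010, type 'b', for both var1 and var2.
idx = pd.IndexSlice
print(wide.loc[:, idx[:, 2010, 'b']])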
import pandas as pd
df = pd.read_csv('test.csv')
df1 = df.groupby(['year', 'type']).sum()
df1
Reading the file into a dataframe and then just using groupby gets the table; I think it is easier.
What I get is:
var1 var2
year type
2010 a 300 600
b 300 600
2011 a 300 600
b 300 600
In theory, for every color-altering CSS filter function like grayscale, invert, opacity, saturate and sepia there exists an equivalent transformation achievable through the SVG filter feColorMatrix.
Actually almost all such operations are described here.
For instance, sepia is a shorthand for:
<filter id="sepia">
<feColorMatrix type="matrix"
values="(0.393 + 0.607 * [1 - amount]) (0.769 - 0.769 * [1 - amount]) (0.189 - 0.189 * [1 - amount]) 0 0
(0.349 - 0.349 * [1 - amount]) (0.686 + 0.314 * [1 - amount]) (0.168 - 0.168 * [1 - amount]) 0 0
(0.272 - 0.272 * [1 - amount]) (0.534 - 0.534 * [1 - amount]) (0.131 + 0.869 * [1 - amount]) 0 0
0 0 0 1 0"/>
</filter>
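As a quick illustration of the interpolation above, here is a small Python sketch (not from the question, just evaluating the bracketed expressions for a given amount):

# Evaluate the sepia feColorMatrix coefficients for a given amount in [0, 1],
# following the bracketed formula above.
def sepia_matrix(amount):
    a = 1 - amount
    return [
        [0.393 + 0.607 * a, 0.769 - 0.769 * a, 0.189 - 0.189 * a, 0, 0],
        [0.349 - 0.349 * a, 0.686 + 0.314 * a, 0.168 - 0.168 * a, 0, 0],
        [0.272 - 0.272 * a, 0.534 - 0.534 * a, 0.131 + 0.869 * a, 0, 0],
        [0, 0, 0, 1, 0],
    ]

# amount = 0 gives the identity matrix, amount = 1 gives the full sepia matrix.
for row in sepia_matrix(1):
    print(row)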
With hue-rotate though it's slightly more complicated; the actual definition (the feColorMatrix type="hueRotate" matrix from the SVG spec, with a the rotation angle) is:
[0.213 + cos(a)*0.787 - sin(a)*0.213   0.715 - cos(a)*0.715 - sin(a)*0.715   0.072 - cos(a)*0.072 + sin(a)*0.928   0   0
 0.213 - cos(a)*0.213 + sin(a)*0.143   0.715 + cos(a)*0.285 + sin(a)*0.140   0.072 - cos(a)*0.072 - sin(a)*0.283   0   0
 0.213 - cos(a)*0.213 - sin(a)*0.787   0.715 - cos(a)*0.715 + sin(a)*0.715   0.072 + cos(a)*0.928 + sin(a)*0.072   0   0
 0                                     0                                     0                                     1   0]
This is pretty much how it's implemented in Chromium.
My question would be: what is the exact math behind these coefficients, and why exactly were they chosen? Do they stand for approximations of some irrational numbers, or something else?
I have a table, as shown below:
TABLE1
5670 Paid for A 1000 A PK1
5670 Paid for B 200 B PK2
5120 Paid for C 300
5120 Paid for D 400 PK2
5120 Paid for E 500 E PK2
5120 Paid for F 600 F
TABLE2
T1CODE Name
A Group A
B Group B
E Group E
F Group F
The output from the above data:
Row Labels Sum of Amount
(blank) 700
A 1000
B 200
E 500
F 600
Grand Total 3000
Required output using DAX:
Row Labels Sum of Amount
A 1000
B 200
E 500
F 600
Grand Total 3000
Try this:
Sum Of Values =
CALCULATE (
SUM ( 'TABLE1'[Amount] ),
FILTER ( 'TABLE1', 'TABLE1'[Row Labels] <> BLANK () )
)