Append dataframes while ignoring column names

Append dataframes while ignoring column names - python-3.x

I have a loop which produces a dataframe at every iteration.
DimensionAll=[]
for i in range(0,10):
###code here###
DimensionAll.append(MatrixDimension)
where MatrixDimension is as follows:
Cameroun Rwanda Niger Zambia Mali Angola Ethiopia
ECON 0.092811 0.088966 0.077843 0.101176 0.080969 0.045516 0.084101
FOOD 0.052086 0.035915 0.037474 0.025168 0.039382 0.015079 0.083499
ENV 0.018479 0.043677 0.004737 0.003744 0.009258 0.034044 0.010285
HEA 0.000061 0.029189 0.012335 0.001238 0.019010 0.007995 0.017359
PERS 0.056941 0.005222 0.048715 0.030879 0.070985 0.064726 0.023330
COM 0.000000 0.000000 0.000000 0.000000 0.009809 0.005251 0.099614
POL 0.025177 0.090846 0.005273 0.029481 0.001929 0.065365 0.034342
How can I ignore the column names when appending? is there a different way to append or concatenate the dataframes after each iteration (while keeping the column names at top without repetition)?

This piece of code gave me the answer i was looking for :
DimensionAll=[]
for i in range(0,10):
###code here###
DimensionAll.append(MatrixDimension)
DimensionAll= pd.concat(DimensionAll)

Related

How to search for specific text in csv within a Pandas, python

Hello I want to find the account text # in the title column, and save it in the new csv. Pandas can do it, I tried to make it but it didn't work.
This is my csv http://www.sharecsv.com/s/c1ed9790f481a8d452049be439f4e3d8/Newnormal.csv
this is my code:
import pandas as pd
data = pd.read_csv("Newnormal.csv")
data.dropna(inplace = True)
sub ='#'
data["Indexes"]= data["title"].str.find(sub)
print(data)
I want results like this
From, to, title Xavier5501,KudiiThaufeeq,RT #KudiiThaufeeq: Royal
Rape, Royal Harassment, Royal Cocktail Party, Royal Pedo, Royal
Bidding, Royal Maalee Bayaan, Royal Slavery..et
Thank you.

reduce records to only those that have an "#" in title
define new column which is text between "#" and ":"
you are left with some records where this leave NaN in to column. I've just filtered these out
df = pd.read_csv("Newnormal.csv")
df = df[df["title"].str.contains("#")==True]
df["to"] = df["title"].str.extract(r".*([#][A-Z,a-z,0-9,_]+[:])")
df = df[["from","to","title"]]
df[~df["to"].isna()].to_csv("ToNewNormal.csv", index=False)
df[~df["to"].isna()]
output
from to title
1 Xavier5501 #KudiiThaufeeq: RT #KudiiThaufeeq: Royal Rape, Royal Harassmen...
2 Suzane24979006 #USAID_NISHTHA: RT #USAID_NISHTHA: Don't step outside your hou...
3 sandeep_sprabhu #USAID_NISHTHA: RT #USAID_NISHTHA: Don't step outside your hou...
4 oliLince #Timothy_Hughes: RT #Timothy_Hughes: How to Get a Salesforce Th...
7 rismadwip #danielepermana: RT #danielepermana: Pak kasus covid per hari s...
... ... ... ...
992 Reptoid_Hunter #sapiofoxy: RT #sapiofoxy: I literally can't believe we ha...
994 KPCResearch #sapiofoxy: RT #sapiofoxy: I literally can't believe we ha...
995 GreySparkUK #VoxSmartGlobal: RT #VoxSmartGlobal: The #newnormal will see mo...
997 Gabboa10 #HuShameem: RT #HuShameem: One of #PGO_MV admin staff test...
999 wanjirunjendu #ntvkenya: RT #ntvkenya: AAK's Mugure Njendu shares insig...

How can I import Excel file columns in Python and find correlation coefficient between them?

I have a lot of columns of numbers (for example, AAA, BBB, CCC, DDD and EEE) in Excel file.
I need to import these columns into Python and find correlation coefficient between every 2 columns.
Only show columns which have correlation coefficient from +0.5 to +1 and -0.5 to -1.
import pandas as pd
data = pd.read_excel('SO.xlsx')
df = pd.DataFrame(data)
df.corr()

Here is a really simple solution to this issue; I don't have your data so I've done it with sample data I found. Here we go:
import pandas as pd
data = pd.read_excel('https://global.oup.com/us/companion.websites/fdscontent/uscompanion/us/static/companion.websites/9780199734177/Example_1_rawdata.xls')
df = pd.DataFrame(data)
df.corr()
The output looks like this:
Hugs Comps PerAd SocAc ProAd ComSt PhyHlp Encour Tutor
Hugs 1.000000 0.666100 0.149995 0.616721 0.541132 0.653129 0.473344 0.549393 0.565627
Comps 0.666100 1.000000 0.247194 0.575720 0.509667 0.642069 0.424696 0.543826 0.487571
PerAd 0.149995 0.247194 1.000000 0.222337 0.081263 0.163510 0.090505 0.181000 0.120080
SocAc 0.616721 0.575720 0.222337 1.000000 0.409031 0.559579 0.338293 0.447923 0.348733
ProAd 0.541132 0.509667 0.081263 0.409031 1.000000 0.666905 0.733851 0.464976 0.754339
ComSt 0.653129 0.642069 0.163510 0.559579 0.666905 1.000000 0.595900 0.540038 0.671789
PhyHlp 0.473344 0.424696 0.090505 0.338293 0.733851 0.595900 1.000000 0.432037 0.717585
Encour 0.549393 0.543826 0.181000 0.447923 0.464976 0.540038 0.432037 1.000000 0.412042
Tutor 0.565627 0.487571 0.120080 0.348733 0.754339 0.671789 0.717585 0.412042 1.000000
If you add the following it will replace all the the values with a Pearson correlation below 0.5 with nulls:
df[df > 0.5]

How can I read and process 100 bytes at a time from a large CSV file?

The below csv is only a snippet of my main data file.
customer.csv
customer_id,order_id,number_of_items
10,4736,9
5,3049,1
1,4689,3
6,4114,9
1,4524,15
2,3727,16
3,3507,7
7,3988,3
5,4993,16
6,1945,4
7,3081,7
3,3707,2
5,1739,12
9,4167,17
7,3242,12
2,3109,10
10,2197,20
10,3528,13
8,4917,2
5,1713,19
8,4224,4
7,2160,2
10,2044,19
10,2956,8
3,3906,2
5,2288,16
7,1854,20
7,4404,2
9,1622,2
7,3685,2
10,2755,10
3,3390,10
6,1424,6
3,2127,15
4,1221,15
9,2994,14
1,1413,13
7,2771,7
3,4579,13
10,2208,4
CURRENTLY ALL I HAVE
import os
os.path.getsize("customer.csv") # outputs, 424 bytes
HOW I THINK I NEED TO PROCEED
I think I need to do something with open csv and read bytes? Then look at each row bit wise?
Please note, I am not looking specifically for someone to just give me an answer on how to do this (although that would be appreciated). Therefore, if someone could just point me in the right direction or give me some topics to look into that would be great. Side note, I know I am supposed to use encoding and decoding somewhere for this task.

This script will use the csv to load the data from customer.csv and compute the average using the builtin statistics module:
import csv
from statistics import mean
with open('customer.csv', newline='') as csvfile:
data = csv.DictReader(csvfile)
# group the customers by customer_id
customers = {}
for order in data:
customers.setdefault(order['customer_id'], []).append(int(order['number_of_items']))
# print the `average`:
print('{:<15} {}'.format('customer_id', 'average'))
for k, v in customers.items():
print('{:<15} {:.2f}'.format(k, mean(v)))
Prints:
customer_id average
10 11.86
5 12.80
1 10.33
6 6.33
2 13.00
3 8.17
7 6.88
9 11.00
8 3.00
4 15.00

how to find exponential weighted moving average using dataframe.ewma?

Previously I used the following to calculate the ewma
dataset['26ema'] = pd.ewma(dataset['price'], span=26)
But, in the latest version of pandas pd.ewma has been removed. How to calculate using the new method dataframe.ewma?
dataset['26ema'] = dataset['price'].ewma(span=26)
This is giving an error 'AttributeError: 'Series' object has no attribute 'ewma'

Use Series.ewm:
dataset['price'].ewm(span=26)
See GH11603 for the relevant PR and mapping of the old API to new ones.
Minimal Code Example
s = pd.Series(range(5))
s.ewm(span=3).mean()
0 0.000000
1 0.666667
2 1.428571
3 2.266667
4 3.161290
dtype: float64

How to improve performance of Watir element.present? method?

I experience IMO high delays with the Ruby script using Watir. Here's the problem description: I am testing AJAX based application and I wanted to avoid using of sleep to make sure page gets loaded:
class Page
attr_accessor :expected_elements
def loaded?
# code to make sure AJAX page is loaded
end
end
So instead of this:
def loaded?
# static delay sufficient to get page loaded
sleep(MAX_PAGE_LOAD_TIME)
true
end
I wanted to have something like this:
def loaded?
Watir::Wait.until(MAX_PAGE_LOAD_TIME) do
expected_elements.all? do |element|
element.present?
end
end
end
The problem is the evaluation of the block takes too long. The more elements I check for presence the higher this delay gets. I experienced roughly this delay for one iteration:
Firefox -> 130ms
IE -> 615ms
Chrome -> 115 ms
So to check if N elements are present I get N times corresponding delay... Well the consequence is that eventhough MAX_PAGE_LOAD_TIME expires the Watir::Wait::TimeoutError is not thrown because block evaluation has not been finished yet... So I ended up in the situation where the check for elements presence introduces higher delay than the static delay which is sufficient enough to get page loaded.. I tried to improve performance by locating elements by xpath, but the performance gain was not significant..
What am I doing wrong? Is there a way to speed-up execution time for present? method?? Do these delays correspond with your experience - or are they high?
I checked if the problem could be in the browser-server communication, but here the delays are very low.. I got 100ms time difference for the server response including backend DB request. Of course it takes some time to render page based on this response, but for sure it does not take seconds.
My configuration:
- Win7 OS,
- Firefox 17.0.1
- IE 8.0.7601.17514 with IEDriverServer_x64_2.26.2
- Chrome 23.0.1271.97 m with chromedriver_win_23.0.1240.0,
- Ruby 1.9.3p125,
- watir-webdriver (0.6.1),
- selenium-webdriver (2.27.2)
Thank you for your help!
Based on the discussion I post a sample of benchmarking code:
Benchmark.bm do |x|
x.report("text") do
b.span(:text => "Search").present?
end
end
Benchmark.bm do |x|
x.report("xpath") do
b.span(:xpath => "/html/body/div/div/div[2]/div[2]/div/div/div/div/div/div/div/div[2]/div/div/div[2]/div/div/div/div/div/div[2]/div/div/div/div/div/div/div/div[2]/div/div/div[2]/div/div/div/div/div/div[2]/div/div/div/div/div/span/span").present?
end
end
user system total real
text 0.000000 0.000000 0.000000 ( 0.140405)
xpath 0.000000 0.000000 0.000000 ( 0.120005)
Additional benchmarking results:
container_xpath = "/html/body/div/div/div[2]/div[2]/div/div/div/div/div/div/div/div[2]/div/div/div[2]/div/div/div/div/div/div[2]/div/div/div/div/div/div/div/div/div/div/div/div/div/div/div[2]/div/div/div/div[2]/div/div/div"
Benchmark.bm do |x|
x.report("cntnr") do
#c = b.div(:xpath => container_xpath)
#c.present?
end
end
Benchmark.bm do |x|
x.report("lb1") do
#c.div(:xpath => "div[1]/div/div").present?
##c.div(:text => "Company:").present?
end
end
Benchmark.bm do |x|
x.report("lb2") do
#c.div(:xpath => "div[3]/div/div").present?
##c.div(:text => "Contact person:").present?
end
end
Benchmark.bm do |x|
x.report("lb3") do
#c.div(:xpath => "div[5]/div/div").present?
##c.div(:text => "Address:").present?
end
end
And the results were:
Results for container reference and relative xpath:
user system total real
cntnr 0.000000 0.000000 0.000000 ( 0.156007)
lb1 0.000000 0.000000 0.000000 ( 0.374417)
lb2 0.000000 0.000000 0.000000 ( 0.358816)
lb3 0.000000 0.000000 0.000000 ( 0.358816)
Results for container reference and div's text:
user system total real
cntnr 0.000000 0.000000 0.000000 ( 0.140402)
lb1 0.000000 0.000000 0.000000 ( 0.358807)
lb2 0.000000 0.000000 0.000000 ( 0.358807)
lb3 0.000000 0.000000 0.000000 ( 0.374407)
When absolute xpaths were used:
container_xpath = "/html/body/div/div/div[2]/div[2]/div/div/div/div/div/div/div/div[2]/div/div/div[2]/div/div/div/div/div/div[2]/div/div/div/div/div/div/div/div/div/div/div/div/div/div/div[2]/div/div/div/div[2]/div/div/div"
Benchmark.bm do |x|
x.report("cntnr") do
#c = b.div(:xpath => container_xpath)
#c.present?
end
end
lb1_xpath = container_xpath + "/div[1]/div/div"
Benchmark.bm do |x|
x.report("lb1_x") do
b.div(:xpath => lb1_xpath).present?
end
end
lb2_xpath = container_xpath + "/div[3]/div/div"
Benchmark.bm do |x|
x.report("lb2_x") do
b.div(:xpath => lb2_xpath).present?
end
end
lb3_xpath = container_xpath + "/div[5]/div/div"
Benchmark.bm do |x|
x.report("lb3_x") do
b.div(:xpath => lb3_xpath).present?
end
end
Results were:
user system total real
cntnr 0.000000 0.000000 0.000000 ( 0.140404)
lb1_x 0.000000 0.000000 0.000000 ( 0.124804)
lb2_x 0.000000 0.000000 0.000000 ( 0.156005)
lb3_x 0.000000 0.000000 0.000000 ( 0.140405)

Okay, this answer assumes your site is using jquery. If it's not, you'll have to figure out the library in use and modify the method accordingly...
Write a method that waits for the Ajax calls to finish...
def wait_for_ajax(timeout = 10)
timeout.times do
return true if browser.execute_script('return jQuery.active').to_i == 0
sleep(1)
end
raise Watir::Wait::TimeoutError, "Timeout of #{timeout} seconds exceeded on waiting for Ajax."
end
Call that method when you first load the page you're testing. Then iterate through your expected elements to see if they're present (if you have an array of Watir elements to check, make sure you use .each with it, not .all? as you have in your code there).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Append dataframes while ignoring column names - python-3.x

This piece of code gave me the answer i was looking for : DimensionAll=[] for i in range(0,10): ###code here### DimensionAll.append(MatrixDimension) DimensionAll= pd.concat(DimensionAll)

Related

How to search for specific text in csv within a Pandas, python

How can I import Excel file columns in Python and find correlation coefficient between them?

How can I read and process 100 bytes at a time from a large CSV file?

how to find exponential weighted moving average using dataframe.ewma?

How to improve performance of Watir element.present? method?

Categories

Resources