pandas.read_clipboard only reads whole lines not columns - python-3.x

I transferred all my Python 3 code from macOS to Ubuntu 18.04, and in one program I need to use pandas.read_clipboard(). At that point the clipboard contains multiple lines, with columns separated by tabs and each element in quotation marks.
After just trying
import pandas as pd
df = pd.read_clipboard()
I'm getting this error: pandas.errors.ParserError: Expected 8 fields in line 3, saw 11. Error could possibly be due to quotes being ignored when a multi-char delimiter is used. Line 3 looks like "word1" "word2 and another" "word3" ...; ignoring the quotation marks you count 11 elements, but respecting them you count 8.
In the next step I tried
import pandas as pd
df = pd.read_clipboard(sep='\t')
and I'm getting no errors, but it results only in a Series with each line of the clipboard source as one element.
Maybe a solution would be to write code that separates the elements of each line after this step, but because it works very well under macOS (with just pd.read_clipboard()) I hope there's a better solution.
Thank you for helping.

I wrote a workaround for my question. It's not an exact solution, but because I just need the elements of one column in a list I solved it like this:
import pyperclip

# read clipboard
cb = pyperclip.paste()
# lines in array
cb_arr = cb.splitlines()

column = []
for cb_line in cb_arr:
    # words in array
    cb_words = cb_line.split("\"")
    # pick element of column 1
    word = cb_words[1]
    column.append(word)

# delete column name
column.pop(0)
print(column)
Maybe it helps someone else, too.
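A possibly simpler fix: read_clipboard forwards its keyword arguments to read_csv, so passing both sep and quotechar explicitly may make the quoted, tab-separated data parse directly (pd.read_clipboard(sep='\t', quotechar='"')). Since the clipboard itself can't be reproduced here, this sketch demonstrates the same parsing on an in-memory string with read_csv; the sample data is made up:

```python
import io
import pandas as pd

# Hypothetical sample mimicking the clipboard contents:
# tab-separated columns, every field wrapped in double quotes.
sample = '"name"\t"phrase"\t"tag"\n"word1"\t"word2 and another"\t"word3"\n'

# read_clipboard passes extra kwargs through to read_csv, so the same
# arguments should work there: pd.read_clipboard(sep='\t', quotechar='"')
df = pd.read_csv(io.StringIO(sample), sep='\t', quotechar='"')
print(df)
```

With an explicit quotechar, the tab inside "word2 and another" is the only separator that counts, so the row parses into the expected number of fields.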

Related

Getting KeyError for pandas df column name that exists

I have
data_combined = pd.read_csv("/path/to/creole_data/data_combined.csv", sep=";", encoding='cp1252')
So, when I try to access these rows:
data_combined = data_combined[(data_combined["wals_code"]=="abk") &(data_combined["wals_code"]=="aco")]
I get a KeyError 'wals_code'. I then checked my list of col names with
print(data_combined.columns.tolist())
and saw the col name 'wals_code' in the list. Here's the first few items from the print out.
[',"wals_code","Order of subject, object and verb","Order of genitive and noun","Order of adjective and noun","Order of adposition and NP","Order of demonstrative and noun","Order of numeral and noun","Order of RC and noun","Order of degree word and adjective"]
Anyone have a clue what is wrong with my file?
The problem is the delimiter you're using when reading the CSV file. With sep=';', you instruct read_csv to use semicolons (;) as the separators for columns (cells and column headers), but it appears from your columns printout that your CSV file actually uses commas (,).
If you look carefully, you'll notice that your columns printout actually displays a list with one long string, not a list of individual strings representing the column names.
So, use sep=',' instead of sep=';' (or just omit it entirely, as ',' is the default value for sep):
data_combined = pd.read_csv("/path/to/creole_data/data_combined.csv", encoding='cp1252')
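A quick way to see the symptom: with the wrong separator, the whole header line becomes one column name, so any real column lookup raises a KeyError. A minimal sketch with made-up data:

```python
import io
import pandas as pd

# Hypothetical comma-separated data, mimicking the real file.
raw = 'wals_code,feature\nabk,SOV\naco,VSO\n'

wrong = pd.read_csv(io.StringIO(raw), sep=';')   # one giant column
right = pd.read_csv(io.StringIO(raw), sep=',')   # two proper columns

print(wrong.columns.tolist())  # ['wals_code,feature']
print(right.columns.tolist())  # ['wals_code', 'feature']
```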

Python 3.7 Start reading from a specific point within a csv

Hey, I could really use help here. I've tried for an hour to find a solution for Python but was unable to find one.
I am using Python 3.7
My input is a file provided by a customer - I cannot change it. It is structured in the following way:
It starts with random text not in CSV format and from line 3 on the rest of the file is in csv format.
text line
text line
text line or nothing
Enter
[Start of csv file] "column Namee 1","column Namee 2" .. until 6
"value1","value2" ... until 6 - continuing for many lines.
I wanted to strip the first 3 lines to create a pure CSV file, but I was unable to find code that does this for a specific line range. It also seems like the wrong approach, as starting to read from a certain point should be possible.
Then I thought split() was the solution, but it did not work for this format. The values are sometimes numbers, dates or strings, and you cannot use the seek() method because the lines start differently.
Right now my dictreader takes the first line as an index and consequently the rest is rendered in chaos.
import csv
import pandas as pd
from prettytable import PrettyTable

with open(r'C:\Users\Hans\Downloads\file.csv') as csvfile:
    csv_reader = csv.DictReader(r'C:\Users\Hans\Downloads\file.csv', delimiter=',')
    for lines in csvfile:
        print(lines)
If an answer for Python already exists, please link it; I was not able to find one.
Thank you so much for your help. I really appreciate it.
I will insist on the pandas option, given that the documentation clearly states that the skiprows parameter allows skipping the first n lines. I tried it with the example provided by @Chris Doyle (saving it to a file named line_file.csv) and it works as expected.
import pandas as pd
f = pd.read_csv('line_file.csv', skiprows=3)
Output
    name  num symbol
0  chris    4      $
1   adam    7      &
2  david    5      %
If you know the number of lines you want to skip then just open the file and read that many lines then pass the filehandle to Dictreader and it will read the remaining lines.
import csv

skip_n_lines = 3

with open('test.dat') as my_file:
    for _ in range(skip_n_lines):
        print("skipping line:", my_file.readline(), end='')
    print("###CSV DATA###")
    csv_reader = csv.DictReader(my_file)
    for row in csv_reader:
        print(row)
FILE
this is junk
this is more junk
last junk
name,num,symbol
chris,4,$
adam,7,&
david,5,%
OUTPUT
skipping line: this is junk
skipping line: this is more junk
skipping line: last junk
###CSV DATA###
OrderedDict([('name', 'chris'), ('num', '4'), ('symbol', '$')])
OrderedDict([('name', 'adam'), ('num', '7'), ('symbol', '&')])
OrderedDict([('name', 'david'), ('num', '5'), ('symbol', '%')])
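If the number of junk lines varies from file to file, a more defensive variant (my own sketch, not part of the answers above) is to scan for the first line that splits into the expected number of comma-separated fields and hand the rest to DictReader:

```python
import csv
import io

def read_after_junk(text, expected_fields=3):
    """Skip leading non-CSV junk, then parse the rest with DictReader.

    Heuristic: the CSV part begins at the first line that splits into
    `expected_fields` comma-separated values.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if not line.strip():
            continue  # skip blank lines safely
        if len(next(csv.reader([line]))) == expected_fields:
            return list(csv.DictReader(io.StringIO("\n".join(lines[i:]))))
    return []

sample = "this is junk\nthis is more junk\nlast junk\nname,num,symbol\nchris,4,$\n"
rows = read_after_junk(sample)
print(rows)
```

The heuristic assumes none of the junk lines happens to contain the same number of commas as the real header, which holds for the customer file described above.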

Remove double quotes while printing string in dataframe to text file

I have a dataframe which contains one column with multiple strings. Here is what the data looks like:
Value
EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1
There are almost 100,000 such rows in the dataframe. I want to write this data into a text file.
For this, I tried the following:
df.to_csv(filename, header=None,index=None,mode='a')
But I am getting the entire string in quotes when I do this. The output I obtain is:
"EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1"
But what I want is:
EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1 -> No Quotes
I also tried this:
df.to_csv(filename, header=None, index=None, mode='a', quoting=csv.QUOTE_NONE)
However, I get an error that an escapechar is required. If I add escapechar='/' to the code, I get '/' in multiple places (but no quotes), and I don't want the '/' either.
Is there any way I can remove the quotes while writing to a text file WITHOUT adding any other escape characters?
Based on OP's comment, I believe the semicolon is what's messing things up: I no longer get an unwanted \ when using tabs to delimit the csv.
import pandas as pd
import csv
df = pd.DataFrame(columns=['col'])
df.loc[0] = "EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1"
df.to_csv("out.csv", sep="\t", quoting=csv.QUOTE_NONE, quotechar="", escapechar="")
Original Answer:
According to this answer, you need to specify escapechar="\\" to use csv.QUOTE_NONE.
Have you tried:
df.to_csv("out.csv", sep=",", quoting=csv.QUOTE_NONE, quotechar="", escapechar="\\")
I was able to write a df to a csv using a single space as the separator, and got the quotes around strings removed, by replacing the in-string spaces in the dataframe with non-breaking spaces before writing it as a csv.
df = df.applymap(lambda x: str(x).replace(' ', u"\u00A0"))
df.to_csv(outpath+filename, header=True, index=None, sep=' ', mode='a')
I couldn't use a tab-delimited file for the output I was writing, though that solution also works using additional keyword arguments to df.to_csv(): quoting=csv.QUOTE_NONE, quotechar="", escapechar="".
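If the frame really holds a single string column and the goal is a plain text file, another option (a sketch under that assumption; the column name 'Value' is taken from the question) is to bypass the csv machinery entirely and write the strings directly, so no quoting or escaping rules apply at all:

```python
import pandas as pd

df = pd.DataFrame(
    {"Value": ["EU-1050-22345,201908 XYZ DETAILS, CD_123_123;CD_123_124,2;1"]}
)

# Join the raw strings with newlines -- no quoting or escaping involved.
text = "\n".join(df["Value"]) + "\n"
with open("out.txt", "w") as fh:
    fh.write(text)
```

This sidesteps the problem because to_csv quotes fields containing the separator, whereas a plain write never inspects the content.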

I try to sum all numbers in a txt file, putting two workable variables in one line and it returns 0. What's gone wrong?

I used 3 lines of code which worked well. Then I tried to contract them into one line, which I believe can be done by putting the two variables together. But for some reason, the contracted code only returned 0 instead of the actual sum that the original version computed. What's gone wrong in the contracted code?
hand = open('xxxxxx.txt')
# This is a text file that contains many numbers in random positions
import re
num = re.findall('[0-9]+', hand.read())
# I used regular expression on variable 'num' to extract all numbers from the file and put them into a list
numi = [int(i) for i in num]
# I used variable 'numi' to convert all numbers in string form to integer form
print(sum(numi))
# Successfully printed out the sum of all integers
print(sum([int(i) for i in re.findall('[0-9]+', hand.read())]))
# Here is the problem. I attempted to contract variables 'num' and 'numi' into one line of code, but I only got 0 instead of the actual sum.
If you execute all the code as shown above, it is normal to get 0, because you didn't re-open the file after reading it the first time. Just re-open the file 'hand', or keep only the final line you want to use and delete the three lines before it.
This code works fine for me -
hand = open('xxxxx.txt')
import re
print(sum([int(i) for i in re.findall('[0-9]+', hand.read())]))
You have to close the file and reopen it before running the last line.
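The underlying issue is that a file object is an exhausted stream after the first read(); either hand.seek(0) rewinds it, or, more simply, read the contents into a string once and reuse that. A minimal sketch (io.StringIO stands in for the real file, and the numbers are made up):

```python
import io
import re

# Stand-in for open('xxxxxx.txt'); the numbers are made up.
hand = io.StringIO("abc 12 def 30\nxyz 8")

text = hand.read()          # read the file exactly once
nums = [int(i) for i in re.findall("[0-9]+", text)]
print(sum(nums))            # 50

# A second hand.read() here would return '' -- the stream is exhausted.
# hand.seek(0) would rewind it if re-reading were really needed.
```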

Gap Analysis/Report for CSV in Python 3.6.2

Start   End
MM0001  MM0009
MM0010  MM0020
MM0030  MM0039
MM0059  MM0071
Good afternoon. I wanted to create code in Python 3.6.2 that will allow me to look for gaps in rows of consecutive numbers, such as the ones above. It would then output the missing ranges to the screen in a format similar to below:
MM0021 MM0029
MM0040 MM0051
MM0052 MM0058
I've created some code for this program based on an answer I found on here, but I don't believe it's complete, and I believe it was written for Python 2.7. I have however used it as a basis for what I'm trying to do.
import csv

with open('thedata.csv') as csvfile:
    reader = csv.reader(csvfile)
    for line, row in enumerate(reader, 1):
        if not row:
            print 'Start of line', line, 'Contents', row
Any help will be greatly appreciated.
import csv

def out(*args):
    print('{},{}'.format(*(str(i).rjust(4, "0") for i in args)))

prev = 0
data = csv.reader(open('thedata.csv'))
print(*next(data), sep=', ')  # header
for line in data:
    start, end = (int(s.strip()[2:]) for s in line)
    if start != prev + 1:
        out(prev + 1, start - 1)
    prev = end
It's really ugly, sorry, but it should work. It outputs comma-separated text. If something doesn't work, or you want it to save to a file, just comment.
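The gap logic itself is easy to check in isolation. Here is a self-contained variant (my own sketch; the row data comes from the question, and the MM prefix is restored in the output, which the answer above drops):

```python
def find_gaps(rows, prefix="MM", width=4):
    """Return the (start, end) label ranges missing between consecutive rows.

    `rows` is a list of (start, end) label pairs like ('MM0001', 'MM0009').
    """
    gaps = []
    prev_end = None
    for start_label, end_label in rows:
        start = int(start_label[len(prefix):])
        end = int(end_label[len(prefix):])
        if prev_end is not None and start > prev_end + 1:
            gaps.append((f"{prefix}{prev_end + 1:0{width}d}",
                         f"{prefix}{start - 1:0{width}d}"))
        prev_end = end
    return gaps

rows = [("MM0001", "MM0009"), ("MM0010", "MM0020"),
        ("MM0030", "MM0039"), ("MM0059", "MM0071")]
print(find_gaps(rows))  # [('MM0021', 'MM0029'), ('MM0040', 'MM0058')]
```

Note this reports the gap between MM0039 and MM0059 as the single range MM0040-MM0058; the expected output in the question splits that range in two, but no rule for the split is given.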
