I have a simple csv with the following contents:
Pattern, Mode, Bandwidth
Random, Read, 23.988
Random, Write, 30.628
Seq, Read, 38.000
Seq, Write, 33.785
I want to produce a similar grouped bar chart as this one:
import altair as alt
import pandas as pd
df = pd.read_csv("simple.csv")
alt.Chart(df).mark_bar().encode(
    x='Bandwidth:Q',
    y='Mode:N',
    row='Pattern:N'
)
This just hangs Altair (I have to kill the Jupyter notebook session to get out of it).
That said, if I manually put in the data with pd.DataFrame([...], [...], columns=[...]), the same drawing command seems to work, at least partially.
It looks like you have spaces in your CSV file, so the column names are not 'Mode' and 'Bandwidth', but rather ' Mode' and ' Bandwidth' (with leading spaces).
The best fix would be to remove spaces from your CSV file. If that is not possible, then in pandas, you can pass the skipinitialspace=True argument to pd.read_csv to strip these spaces when reading the data into a dataframe.
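For example, a minimal sketch using the CSV content from the question (read from a string here instead of simple.csv, just to keep it self-contained):

```python
import pandas as pd
from io import StringIO

# The same CSV content as above, with a space after each comma
raw = "Pattern, Mode, Bandwidth\nRandom, Read, 23.988\nRandom, Write, 30.628\n"

# skipinitialspace=True strips the whitespace that follows each delimiter,
# so the column names come out as 'Mode', not ' Mode'
df = pd.read_csv(StringIO(raw), skipinitialspace=True)
print(list(df.columns))  # ['Pattern', 'Mode', 'Bandwidth']
```

Note that the flag also strips the leading spaces from the data values themselves, so 'Mode' contains 'Read' rather than ' Read'.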
Never mind, it appears I didn't pass skipinitialspace=True when I read the CSV file, and that messed up the column names.
I'm trying to replace multiple tabs with only one tab using python3 + Pandas in a given .csv file, but I'm not able to find a way to solve this problem. My function should be:
def function(csv_file):
    # remove multiple tabs --> means a \t \t b ==> a \t b
    [...]
The file must remain a csv file.
How could I do it?
A csv is just a text file that can be parsed with tailored tools, but can also be read as plain text. So, you can use regex to substitute consecutive \t instances.
You still need to provide more details, but take this as a provisional answer.
import re

with open('test.csv', 'r') as fo:
    text = fo.read()

print(text)
print(repr(text))
text = re.sub(r'\t+', r'\t', text)
print(text)
print(repr(text))
Output
test sdasdf
asfasdf asdf asfasdf asdf
'test\t\tsdasdf\nasfasdf\tasdf\tasfasdf\t\tasdf'
# after regex
test sdasdf
asfasdf asdf asfasdf asdf
'test\tsdasdf\nasfasdf\tasdf\tasfasdf\tasdf'
Notice the last print does not have any consecutive tabs.
Now you can write back to csv.
import os

with open('test_temp.csv', 'w') as fo:
    fo.write(text)

# os.remove('test.csv')
# os.rename('test_temp.csv', 'test.csv')
It is a good idea to write a temp file, remove the original, and finally rename the temp. This is so you have a safe copy at all times for odd situations like corrupt file writes, power outages, or any other contingency.
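Since you mentioned pandas, another option is to let pandas do the collapsing at parse time: a regex separator such as '\t+' (one or more tabs) treats any run of tabs as a single delimiter. This is a sketch using the same test.csv name as above (recreated here so the example stands alone); note that regex separators require the slower pure-python engine:

```python
import pandas as pd

# Recreate test.csv with runs of tabs between some of the fields
with open("test.csv", "w") as fo:
    fo.write("a\t\tb\nc\td\n")

# sep=r'\t+' treats each run of consecutive tabs as one delimiter;
# regex separators are only supported by the python engine
df = pd.read_csv("test.csv", sep=r"\t+", engine="python")

# Write the frame back out with a single tab between fields
df.to_csv("test.csv", sep="\t", index=False)
```

The round trip leaves a file with exactly one tab between fields.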
I have a file that has several lines like this:
(lp0
I200
aV<!DOCTYPE HTML
When I read this file in python, the file is read as it is, like this:
(lp0
I200
aV<!DOCTYPE HTML
but when I read it in PySpark, I get the following value:
(lp0\nI200\naV<!DOCTYPE HTML
How can I get the original value back from the PySpark read?
I read the file as this:
rdd = sc.wholeTextFiles("file:///home/hadoopuser/gc/data_from_gc/part-04068",use_unicode=False)
Thanks in advance.
Your system is probably reading the file correctly in both cases, and in both cases the text almost assuredly contains the '\n' (newline) characters, even if you don't see them.
For example, in Python, if you use the print() function, any text with newline characters will display to the screen, but you won't see the actual characters, you will simply see the text, with text wrapping, as shown above.
In some tools, and PySpark may be one of them (again, not seeing your code), displaying the output of a calculation by evaluating a statement at the prompt, rather than printing it, shows the string representation of the variable, and that representation includes the newline characters.
NOTE: If you give us the appropriate snippets of code, we can try to see where things have gone awry and provide better solutions.
For example:
In [4]: h = 'hello\nworld!'
In [5]: h # Here we are simply evaluating the Python Statement
Out[5]: 'hello\nworld!'
In [6]: print(h) # Here we are printing the content of h
hello
world!
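If you actually want the individual lines back rather than one blob per file, you can split on the newline yourself. A small sketch using the sample contents from the question (with use_unicode=False, wholeTextFiles hands you bytes, hence the b'' literals):

```python
# One file's contents as returned in a (path, contents) pair
contents = b'(lp0\nI200\naV<!DOCTYPE HTML'

# Splitting on the newline byte recovers the original lines
lines = contents.split(b'\n')
print(lines)  # [b'(lp0', b'I200', b'aV<!DOCTYPE HTML']
```

In PySpark this would typically go into a flatMap over the (path, contents) pairs.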
I've used the following code in numerous programs and it has always worked...until now.
a = open('Filename.csv', 'r')
ba = a.read()
a.close()
b = list(zip(*(e.split(',') for e in ba)))
It has always split the csv file on the commas. Now I'm trying the same code with a csv file and it is splitting the file on each and every character, regardless of letters or numbers, capital or lowercase.
Is there better code to use to split up a file on the commas?
Oops, I think I just found my stupid mistake. Copying code from a couple of different sources doesn't always work out the best: it should have been readlines(), not read(). Once I saw it, after more head pounding, it finally caught my eye.
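For reference, a corrected sketch (recreating a small Filename.csv so it runs on its own): iterating over a string from read() yields single characters, while iterating over the file object itself yields whole lines.

```python
# Recreate a small Filename.csv for the demonstration
with open('Filename.csv', 'w') as a:
    a.write('1,2\n3,4\n')

# Iterating over the file object gives one line per iteration,
# so each e here is a full line, not a single character
with open('Filename.csv', 'r') as a:
    rows = [e.rstrip('\n').split(',') for e in a]

# Transpose the rows into columns, as the original zip(*...) intended
b = list(zip(*rows))
print(b)  # [('1', '3'), ('2', '4')]
```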
I'm writing a python3 program that generates a text file that is post-processed with asciidoc into the final report in html and pdf.
The python program generates thousands of files with graphics to be included in the final report. The filenames for the files are generated with tempfile.NamedTemporaryFile.
The problem it that the character set used by tempfile is defined as:
characters = "abcdefghijklmnopqrstuvwxyz0123456789_"
so I end up with some files with names like "_6456_", and asciidoc interprets the "_" as formatting and inserts some html that breaks the report.
I need to either find a way to "escape" the filenames in asciidoc or control the characters in the temporary file.
My current solution is to rename the temporary file after I close it, replacing the "_" with some other character (not in the list of characters used by tempfile, to avoid a collision), but I have the feeling that there is a better way to do it.
I will appreciate any ideas. I'm not very proficient with Python yet; I think overloading _RandomNameSequence in tempfile would work, but I'm not sure how to do it.
regards.
Hack way, based on manipulating tempfile internals:
import tempfile

class MyRandomSequence(tempfile._RandomNameSequence):
    characters = "xyz123"

tempfile._name_sequence = MyRandomSequence()
# make your temporary file
Example:
>>> tempfile.NamedTemporaryFile()
<open file '<fdopen>', mode 'w+b' at 0x1013b5540>
>>> k=_
>>> k.name
'/var/folders/Su/SuMQtmxiE941sUwe8d91lE+++TU/-Tmp-/tmp33x22z'
Maybe you could create a temporary directory using tempfile.mkdtemp() and generate the filenames manually, such as file1, file2, ..., filen. This way you easily avoid "_" characters, and you can just delete the temporary directory after you are finished with it.
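A minimal sketch of that approach (the file1.png-style names are just illustrative; use whatever extension your graphics have):

```python
import os
import tempfile

# One temporary directory for the whole run; the basenames are
# generated manually, so they never contain "_"
tmpdir = tempfile.mkdtemp()
paths = [os.path.join(tmpdir, "file%d.png" % i) for i in range(1, 4)]
print([os.path.basename(p) for p in paths])  # ['file1.png', 'file2.png', 'file3.png']
```

When the report is built, a single shutil.rmtree(tmpdir) cleans everything up.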
Why don't you create a generator yourself?
Example:
import string
from random import choice

def generate(size=9):
    # string.letters is Python 2 only; Python 3 uses string.ascii_letters
    return ''.join(choice(string.ascii_letters + string.digits) for _ in range(size))
CVS diff has the option to display revisions side by side and denote diffs with usual patch symbols like:
import zlib                                       import zlib
                                                > import time
import traceback                                  import traceback
import cElementTree as ElementTree                import cElementTree as ElementTree
from util import infopage                         from util import infopage
                                                > from util.timeout import Timeout
Is there anyway to pipe that output to vimdiff so that it displays those two columns in two side-by-side buffers along with all the diff-highlighting goodness of vimdiff?
I'm aware of tools like cvsvimdiff.vim and the like, but the problem with those is that they only work on one file at a time, whereas the cvs diff output lists multiple files.
Once you have that text in a Vim buffer, you can easily split it into two buffers yourself. Looks like your sample input does the split at 50 characters.
So use <C-v> to visual-block highlight half of the diff, cut it, paste it in a new buffer, remove trailing whitespace and the > separator characters, and there you go. Or write a function to do it, something like this (which assumes the split is always at 50):
function! SplitCVSDiff()
  exe "norm gg_\<C-v>51\<Bar>Gd:vnew\<CR>p"
  silent! %s/\v\s+(\> )?$//
endfunction
Might have to be made more robust, I'm not familiar with the exact style of output CVS uses. Shouldn't be hard though.
I would write a script, say vimdiff_cvs file.cc, which does this:
Store the diff of file.cc locally, delete the file, and update from the repository. Copy that pristine version to ~/.vimdiff/file.cc.repo.
Restore file.cc by applying the patch.
Call vimdiff file.cc ~/.vimdiff/file.cc.repo.