Splitting a character column in R using separate function based on delimiter - string

I am trying to split a character column in R using the code below:
df %>% separate("x1", c("x1", "x2"), "-", fill = "right")
The function does not work. The data comes from a CSV file, and the "-" in the text has been replaced by "...". I have tried using "..." as the delimiter, but it still does not work.
Please see a picture of the column I am trying to split: Column x1
I have tried the code above and I am not sure what I am missing. Note that the contents of column x1 are separated by "-" in the CSV but appear as "..." in the tibble loaded using read.csv.

Related

Extract last word in string in R - error faced

First, I wish to extract the first word and the last word of the Description column (this column contains at least 3 words) into newly created columns firstword and lastword. However, the word() function is not applied to all the rows. As a result, many rows have an empty lastword even though they actually have a last word (as you can see from the Description column). This is shown in the first two lines of code.
Second, I am also trying to get the third line of code to replace lastword with firstword if lastword is empty. However, it isn't working.
Is there a way to rectify this?
c1$lastword = word(c1$Description,start=-1) #extract last word
c1$firstword = word(c1$Description,start=1) #extract first word
c1$lastword=ifelse(c1$lastword == " ", c1$firstword, c1$lastword)
I realised that there was whitespace at the beginning of some rows of the Description variable, which isn't visible when the data is viewed in R.
Removing the whitespace using stri_trim() solved the issue.
c1$Description = stri_trim(c1$Description, "left") #remove whitespace

How to remove the FIRST whitespace from a python dataframe on a certain column

I extracted a PDF table using tabula.read_pdf, but some of the data entries a) show a whitespace between the values and b) include two sets of values in one column, as shown in columns "Sports 2019/2018" and "Total 2019/2018": https://imgur.com/a/MviV6N9
In order to use df_1=df1["Sports 2019/2018"].str.split(expand=True) to split the two values, which are separated by a space, I need to remove the FIRST space shown in the first value so that it doesn't split into three columns.
I've tried df1["Sports 2019/2018"] = df1["Sports 2019/2018"].str.replace(" ", "") but this removes all the spaces, which would then combine the two values.
Is there a way to remove the first whitespace in column "Sports 2019/2018" so that it resembles the values in "Internet 2019/2018"?
df1["Sports 2019/2018"] = df1["Sports 2019/2018"].str.replace(" ", "", n = 1)
n=1 is an argument that limits the replacement to the first match found.
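To see the effect of limiting the replacement to one match, here is a minimal sketch using plain Python strings rather than a DataFrame. The built-in str.replace takes an optional count that plays the same role as n=1 above. The sample values are made up for illustration, since the actual table is only shown in the linked image.

```python
# Hypothetical cell values: the first number contains a stray space,
# and a second number follows after another space.
cells = ["1 234 987", "5 678 432"]

# Remove only the first space in each cell (count=1).
fixed = [cell.replace(" ", "", 1) for cell in cells]

# Now a plain whitespace split yields exactly two values per cell.
pairs = [cell.split() for cell in fixed]

print(fixed)
print(pairs)
```

Without the count, every space would be removed and the two values would run together, which is the problem described in the question.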

Python 3 Pandas write to CSV format column as string

I'm having an issue handling oddly formatted Excel data and writing it to CSV as strings. In my sample data, the Excel table I am importing has a column ('Item_Number') whose cells contain odd data such as: ="0001", ="00201", 2002AA, 1003B.
When I try to output to csv, the results look like: 1, 201, 2002AA, 1003B.
When I try to output to excel, the results are correct: 0001, 00201, 2002AA, 1003B.
All of the dtypes are objects. Am I missing a parameter in my .to_csv() command?
df = pd.read_excel(filename, sheet_name='Sheet1', converters={'Item_Number': str})
df.to_csv('Test_csv.csv')
df.to_excel('Test_excel.xlsx')
I tried different ways of replacing the = and " characters, but nothing changed.
df.Item_Number.str.replace('=','')
Currently using the excel output but curious if there is a way to preserve string formatting in CSV. Thanks :)
Opening an Excel spreadsheet with pandas in Python 3 reads data that looks like ="0001" into the DataFrame correctly. In the CSV it turns back into 1. Keeping the same format in CSV is apparently a known issue (see my comment above). To keep the formatting, I have to add =" " back into the data like this:
df['Item_Number'] = '="' + df['Item_Number'] + '"'
I'm not sure if there is a cleaner way to have a CSV file opened in Excel show 0001 without the quotes and equals sign.
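As a rough sketch of the workaround above, the snippet below wraps each value in =" " and writes it with Python's standard csv module to an in-memory buffer, so the raw CSV text can be inspected. The item numbers are the ones from the question; the helper name as_excel_text is made up for this example. The csv writer doubles the embedded quotes, which is the usual CSV escaping for quote characters.

```python
import csv
import io


def as_excel_text(value):
    """Wrap a value as ="value" (hypothetical helper for this sketch)."""
    return '="' + str(value) + '"'


item_numbers = ["0001", "00201", "2002AA", "1003B"]

# Write to an in-memory buffer instead of a file so the text can be checked.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Item_Number"])
for item in item_numbers:
    writer.writerow([as_excel_text(item)])

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the wrapped values unchanged.
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows[1])
```

Whether the wrapper is worth it depends on who consumes the file: other CSV readers will see the literal ="0001" text, so this is best kept for files that are only meant to be opened in Excel.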

Import/convert list of space separated numbers to list of arrays in Matlab

I'm new to Matlab and I want to convert a column with space separated numbers (source is an Excel file) to a list of arrays.
In a first step I want to create a list of arrays like this:
Then I want to transpose the list like this:
What's the correct command for this conversion?
I know it's a simple question, but I couldn't find a similar one.
First use xlsread to read in the raw text. The text will be read in as a cell array where each row of text is placed in a cell. Once you do this, it's a matter of splitting up the strings by spaces to create an additional cell array of cells per row, then inputting these cells into a function that creates an array of numbers. You can use cellfun combined with strsplit and str2double. Assuming your Excel file is called list.xls, do something like this:
[~,~,RAW] = xlsread('list.xls');
list = cellfun(@str2double, cellfun(@strsplit, RAW, 'uni', 0), 'uni', 0).';
list contains the desired output. I've also transposed the result as this is what you desire. I created an Excel file that's in the same fashion as how you've mentioned in your post. This is what I get when I run the code. First I'll show what list looks like, then we'll examine what the actual contents are:
>> list
list =
[1x4 double] [1x5 double] [1x6 double]
>> celldisp(list)
list{1} =
5405 5414 5420 9999
list{2} =
5405 5414 5430 5341 9999
list{3} =
5405 5419 5419 5419 5412 9999
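For comparison, the same split-then-convert idea can be sketched outside MATLAB. The Python snippet below only illustrates the technique and is not a replacement for the MATLAB code above. It assumes the rows have already been read in as strings, and it reuses the numbers from the output shown above.

```python
# Rows as they might come out of the spreadsheet: one string per row.
raw_rows = [
    "5405 5414 5420 9999",
    "5405 5414 5430 5341 9999",
    "5405 5419 5419 5419 5412 9999",
]

# Split each string on whitespace, then convert every piece to a number.
arrays = [[float(token) for token in row.split()] for row in raw_rows]

for array in arrays:
    print(array)
```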
Here's also what the MATLAB Variable Editor looks like:

Writing lists of values to a csv file in different columns using Python

I need help with writing values to a csv file.
I have 4 lists of values that I would like to write to a CSV file, but not in the usual way. Normally the csv module writes the values of a list in the same row, but this time I would like each list to go into its own column, with one row per value. That way, all the list 1 data would be in column A of Excel, all the list 2 data would be in column B, and so on. I have tried a lot of commands and got only partway there.
My lists are named: It_5minute, Iiso_5min, IHDKR_5min and Iperez_5min.
My actual commands:
with open('Test.csv', 'w') as f:
    w = csv.writer(f)
    for row in zip(It_5minute, Iiso_5min, IHDKR_5min, Iperez_5min):
        w.writerow(row)
With these commands I get the list values in the same column (instead of every list in a different column), with each value separated by a comma. I have attached an Excel image to clarify the problem. I want each list in a separate column, to be able to do operations with the data easily. Can anybody help me? Thank you very much.
P.S.: It would be nice to write the name of each list at the top of every column, too.
First, in Python 2, change with open('Test.csv', 'w') as f: to with open('Test.csv', 'wb') as f:, because the csv module there expects the file to be opened in binary mode. In Python 3, keep text mode and pass newline='' instead: open('Test.csv', 'w', newline='').
Then state the delimiter explicitly (in this case a comma) and, optionally, the quoting mode:
import csv
with open('Test.csv', 'wb') as f:  # Python 2; in Python 3 use open('Test.csv', 'w', newline='')
    w = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL)  # replace the delimiter with whatever suits you
    for row in zip(It_5minute, Iiso_5min, IHDKR_5min, Iperez_5min):
        w.writerow(row)
If this doesn't work, you have to set the delimiter manually in the Excel Text Import Wizard. You can read how to here
Common Delimiters:
Tab = '\t'
semicolon = ';'
comma = ','
space = ' '
eg: comma selected as the delimiter
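The question also asks for the list names at the top of each column. A minimal sketch of that, assuming Python 3 and short made-up sample values in place of the real lists, writes a header row first and then the zipped rows. It writes to an in-memory buffer so the result can be inspected; with a real file, open('Test.csv', 'w', newline='') would take its place. The semicolon delimiter is an assumption that may suit Excel installations whose list separator is a semicolon; swap in ',' otherwise.

```python
import csv
import io

# Made-up sample data standing in for the four lists in the question.
It_5minute = [1.0, 2.0, 3.0]
Iiso_5min = [0.1, 0.2, 0.3]
IHDKR_5min = [10, 20, 30]
Iperez_5min = [7, 8, 9]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";")

# Header row: one name per column.
writer.writerow(["It_5minute", "Iiso_5min", "IHDKR_5min", "Iperez_5min"])

# zip() groups the i-th element of every list, giving one row per index,
# so each list ends up in its own column.
writer.writerows(zip(It_5minute, Iiso_5min, IHDKR_5min, Iperez_5min))

csv_text = buffer.getvalue()
print(csv_text)
```

If the lists can differ in length, note that zip() stops at the shortest one; itertools.zip_longest is the standard-library alternative when the shorter columns should be padded instead.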
