pig latin - load with text qualifier - text

I am trying to load a data file in a Pig Latin script.
The data has 2 columns, but the 2nd column uses a text qualifier. Sample data is below:
DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200"
When I try loading the data as below, the 2nd column is not recognized as a single column:
deviceList = load 'deviceList.csv' Using PigStorage(',') as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );
How can I define the text qualifier while loading the data set?

Try this, and let me know if you need a different output format.
input.txt
DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200"
PigScript:
A = LOAD 'input.txt' AS line;
deviceList = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\w+),(.*)$')) as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );
DUMP deviceList;
Output:
(DEVICE_ID,SUPPORTED_TECH)
(a2334,"GSM900,GSM1500,GSM200")
(a54623,"GSM900,GSM1500")
(a86646,"GSM1500,GSM200")
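To see what the regular expression in the script above captures, here is a minimal Python sketch that applies the same pattern to the sample rows. Python's re module stands in for Pig's REGEX_EXTRACT_ALL, and split_line is a hypothetical helper name used only for illustration; it is not part of Pig.

```python
import re

# Same pattern as in the Pig script: the first field is made of word
# characters, and everything after the first comma is the second field.
PATTERN = re.compile(r'^(\w+),(.*)$')

def split_line(line):
    """Return (DEVICE_ID, SUPPORTED_TECH), or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groups() if match else None

rows = [
    'DEVICE_ID,SUPPORTED_TECH',
    'a2334,"GSM900,GSM1500,GSM200"',
    'a54623,"GSM900,GSM1500"',
]
for row in rows:
    print(split_line(row))
```

As in the Pig output, the header row also matches and the double quotes stay inside the second field; stripping them would be a separate step.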


Batch Bookmark & Extraction of Bookmarked lines into MULTIPLE new file in EmEditor

I would like to ask a follow-up to this question: How to search and bookmark multiple strings at once in emeditor?
@Yutaka gave this response: https://i.stack.imgur.com/HLYoo.png and my follow-up question is about it.
How can the lines for each bookmarked string be saved or opened separately in EmEditor when using "extract bookmark lines to new file"?
I would appreciate your response.
I expect the result for each bookmarked string to open in a different file in EmEditor.
Batch Extract
The Batch Extract button in the Batch Find dialog will output all matched lines to a file, with labels indicating the entry that the lines matched with.
Example output:
===== a =====
a
apple
===== b =====
b
Macro
This macro searches for each string in the given array with the extract option and saves each result to a separate file. Make sure to change the marked lines.
var filePath = document.FullName;
var strings = ["a", "b", "e"]; // Change these
for (var i = 0; i < strings.length; i++) {
    editor.OpenFile(filePath);
    document.selection.Find(strings[i], eeFindSelectAll | eeFindExtract, 0);
    document.Save("%HOMEPATH%\\Downloads\\" + strings[i] + ".txt"); // Change path
}
I tested it on this file. Make sure to save it before running the macro.
a
b
c
d
apple
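Outside EmEditor, the same idea (one output file per search string) can be sketched in Python. This is only an illustration of the logic, not the EmEditor macro API: extract_matching_lines and save_each are hypothetical helper names, and matching here is a plain case-sensitive substring test, which is an assumption about how the strings should match.

```python
from pathlib import Path

def extract_matching_lines(lines, strings):
    """Map each search string to the lines that contain it."""
    return {s: [line for line in lines if s in line] for s in strings}

def save_each(result, out_dir):
    """Write the lines for each search string to '<string>.txt' in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for s, matched in result.items():
        path = out_dir / (s + ".txt")
        path.write_text("\n".join(matched) + "\n", encoding="utf-8")
        paths.append(path)
    return paths

# The test file from the answer above, one entry per line.
lines = ["a", "b", "c", "d", "apple"]
result = extract_matching_lines(lines, ["a", "b", "e"])
for s, matched in result.items():
    print(s, matched)
```

Calling save_each(result, some_directory) would then produce a.txt, b.txt and e.txt, mirroring what the macro saves.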

Writing from Powershell to Excel: How to set the cell format for the value?

I am reading values from different Excel files, and composing a new one containing information from all the others. While doing that, Excel seems to automatically change '.' to a comma ','. How do I prevent that?
I am using PowerShell ISE on Win10 and Office365. I tried reading and writing both 'value2' and 'text', and I tried casting value2 to a string when writing it. This did not work. The variables in PowerShell hold the correct values as strings. The moment I save the new Excel file, the correct format is gone.
Example: the value is "123.456". I can read it, and the PowerShell variable shows "123.456". When I write it to Excel and open the file afterwards, it reads:
123,456 and is interpreted as a number instead of text.
How I read the value
[...]
$tmp += ($worksheet.cells.item($intRow,$col).value2)
How I write the value (I tried "value", and "text" for both)
[...]
elseif($value -eq 6){
$sheet.Cells.item($intRow,$columncounter).value2 = ($tmp[$value]).ToString()
}
[...]
This is how I open the excel file for writing:
$objExcel=New-Object -ComObject Excel.Application
$objExcel.Visible=$false
$resultbook = $objExcel.Workbooks.Add()
$sheet = $resultbook.ActiveSheet
$sheet.Name = "Data"
This is how I save the excel file
$resultbook.SaveAs($name)
$resultbook.close()
Expected: Input == Output, example: 1234.5678 --> 1234.5678
Actual Result: Input != Output, example 1234.5678 --> 1234,5678
It works fine for all other strings, texts, numbers except those containing dots.
I presume there must be a way to specify the cell format in the target file, however I did not find any documentation on that.

Exporting data from cell to Excel file

Let's say I have a cell named data like this:
data{1} = vector1;
data{2} = vector2;
...
data{n} = vectorn;
All vectors (with numerical values) in data have the same 1xN size.
Now, I want to export this data file into an .xlsx document where each row is a vector and I want to label each column. The result should be something like this:
label1 label2 ... labelN
vector1(1,1) vector1(1,2) ... vector1(1,N)
... ... ... ...
vectorn(1,1) vectorn(1,2) ... vectorn(1,N)
I tried to do this using:
n=10;
N=5;
for i=1:n
data{i}=rand(1,N);
end
filename='test.xlsx';
xlswrite(filename,data)
but my .xlsx file comes with all the data from data in just one row. And I don't know how to do the labels.
Please help me.
This can be done using vertcat, num2cell, sprintf, strsplit and xlswrite as follows:
modified_data = num2cell(vertcat(data{:})); % Converting 1xn cell into nxN cell
% Generating Column Headers as specified in the question
col_header = strsplit(sprintf('label%d ' , 1:N));
col_header = col_header(1:end-1);
% If N is small (e.g., N=5), you can enter the column headers directly:
% col_header = {'label1','label2','label3','label4','label5'};
filename='test.xlsx'; % Name of the excel file to be written
xlswrite(filename,[col_header; modified_data]); % Writing the excel file
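For comparison, the same two steps (stack the row vectors into one table, then put a generated header row on top) can be sketched in Python. This is an assumption-laden illustration rather than a MATLAB replacement: build_table is a hypothetical helper, the sample numbers are made up, and the result is written as CSV text because the Python standard library has no .xlsx writer.

```python
import csv
import io

def build_table(data, prefix="label"):
    """Return a header row (label1..labelN) followed by one row per vector."""
    n_cols = len(data[0])
    header = [f"{prefix}{k}" for k in range(1, n_cols + 1)]
    return [header] + [list(row) for row in data]

# Two made-up 1x3 vectors standing in for data{1}, data{2}.
data = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
table = build_table(data)

buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(table)
print(buffer.getvalue())
```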
It's because you put the whole rand(1,N) vector into one cell (data{i}). To get each value in its own cell you have to create an nxN matrix of cells, which is easiest if you transform the entire matrix:
n=10;
N=5;
data=rand(n,N);
celldata=num2cell(data);
filename='test.xlsx';
xlswrite(filename,celldata);
Otherwise you have to use two loops, but that is not great performance-wise.

How to compare matlab array with entries in a data structure

I am trying to write code in Matlab that will allow me to do the following. Part of the code generates an array D and uses an input file to create a structure called EEG, which contains a lot of information. Specifically, I am interested in the "labels" field of the chanlocs field of the EEG structure. It contains entries like 'F7', 'F8', 'FP1', ..., 17 entries in total. The generated array D also contains entries like these, but in a different order.
For example, D = ['F7','F8', 'FP1'] and EEG.chanlocs.labels = ['FP1','F7','F8'].
They contain the same entries but in a different order, and for what I am trying to do the order is important.
What I basically want is to have Matlab scan all entries of D and, for each one, find the index of the matching entry in EEG.chanlocs.labels.
Example: if D(1) = 'F7', I want it to return i = 2, because F7 is the 2nd entry in EEG.chanlocs.labels. In this way I want it to scan all of D and return the corresponding indices in EEG.chanlocs.labels.
What I have tried so far is:
for i=1:17
    if any(strcmp(D(:),[EEG.chanlocs(i).labels]))
        msgbox(sprintf('i is: %d',i));
    else
        msgbox(sprintf('Error'));
    end
end
But it does not work and it returns weird things... I am not entirely sure what to try...
Can anybody help? Any help would be greatly appreciated!!
Thanks.
Edited:
The following code shows how I obtain D. I give the user 3 prompt windows to input certain data. I then store the inputs from each of these in "data" or "data2" or "data3" and then I put all of them together in D.
uiwait(msgbox(sprintf('Please enter your new references for each electrode.\nFor FP1, FP2, O1 and O2 provide two references.')));
prompt = {'Fp1','F7','T3','T5','O1'};
prompt2 = {'FP2','F8','T4','T6','O2'};
prompt3 = {'C3','CP3','Cz','CPz','C4','CP4'};
dlg_title = 'Input references';
num_lines = 1;
%def = {'20','hsv'};
answer = inputdlg(prompt,dlg_title,num_lines );
answer2 = inputdlg(prompt2,dlg_title,num_lines );
answer3 = inputdlg(prompt3,dlg_title,num_lines );
for i=1:5
data(i,:) = answer(i,:);
data2(i,:) = answer2(i,:);
end
for i=1:6
data3(i,:) = answer3(i,:);
end
D(1:5)=data(:);
D(6:10)=data2(:);
D(11:16)=data3(:);
D=D';
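The lookup described in the question (for every entry of D, report its position in the list of channel labels) can be sketched language-independently in Python. This is a hedged illustration, not MATLAB code: find_indices is a hypothetical helper, positions are 1-based to match MATLAB indexing, and returning 0 for a label that is not found is an assumption about the desired behavior. The comparison is case-sensitive, so 'Fp1' and 'FP1' would not match.

```python
def find_indices(D, labels):
    """For each entry of D, return its 1-based position in labels (0 if absent)."""
    position = {}
    for i, label in enumerate(labels, start=1):
        # Keep the first occurrence if a label appears more than once.
        position.setdefault(label, i)
    return [position.get(entry, 0) for entry in D]

D = ["F7", "F8", "FP1"]
labels = ["FP1", "F7", "F8"]
print(find_indices(D, labels))
```

With the example from the question, the first result is 2, because F7 is the 2nd entry of the label list.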

Adding a newline character within a cell (CSV)

I would like to import product descriptions that need to be broken up logically by things like description, dimensions, finishes, etc. How can I insert a line break so that it shows up when I import the file?
This question was answered well at Can you encode CR/LF in into CSV files?.
Consider also reverse engineering multiple lines in Excel. To embed a newline in an Excel cell, press Alt+Enter. Then save the file as a .csv. You'll see that the double-quotes start on one line and each new line in the file is considered an embedded newline in the cell.
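The quoting behavior described above can be reproduced with Python's standard csv module, which wraps any field containing a newline in double quotes. This is a minimal sketch; the product row is made-up sample data, not taken from the question.

```python
import csv
import io

# Made-up sample row: the second cell contains embedded newlines.
rows = [
    ["name", "description"],
    ["Table", "Description: oak table\nDimensions: 120 x 80 cm\nFinish: matte"],
]

# Write the rows to an in-memory CSV; the multi-line cell gets quoted.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original cell, newlines included.
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))
print(parsed == rows)
```

In the printed text the opening double quote sits on one line and the closing quote on a later one, which is the same layout Excel produces when a cell containing Alt+Enter line breaks is saved as .csv.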
I struggled with this as well, but here's the solution. If you add " before and after the CSV string you are trying to display, it will be consolidated into one cell while honoring the newlines.
csvString += "\""+"Date Generated: \n" ;
csvString += "Doctor: " + "\n"+"\"" + "\n";
I have the same issue: when I export the content of an email to CSV, I want to keep the line breaks when importing it into Excel.
I export the content like this: ="Line 1"&CHAR(10)&"Line 2"
When I import it into Excel (Google), it treats the value as a string, so the line still does not break.
We need to make Excel treat it as a formula via:
Format -> Number | Scientific.
This is not a good way, but it resolved my issue.
Suppose you have a text variable containing:
const text = 'wonderful text with \n newline'
The newline in the CSV file is interpreted correctly once the string is enclosed in double quotes and spaces:
'" ' + text + ' "'
On Excel for Mac 2011, the newline had to be a \r instead of an \n
So
"\"first line\rsecond line\""
would show up as a cell with 2 lines
I was concatenating the variable and adding multiple items in the same row, and the code below worked for me. The "\n" newline must be added both at the start and at the end of each item; if you add it only at the end, the last 1-2 characters are pushed onto a new line.
$itemCode = '';
foreach ($returnData['repairdetail'] as $checkkey => $repairDetailData) {
    if ($checkkey > 0) {
        $itemCode .= "\n" . trim(@$repairDetailData['ItemMaster']->Item_Code) . "\n";
    } else {
        $itemCode .= "\n" . trim(@$repairDetailData['ItemMaster']->Item_Code) . "\n";
    }
    $repairDetaile[] = array(
        $itemCode,
    );
}
// Write every collected row to the CSV file
foreach ($repairDetaile as $csvData) {
    fputcsv($csv_file, $csvData, ',', '"');
}
fclose($csv_file);
I converted a pandas DataFrame to a csv string using DataFrame.to_csv() and then I looked at the results. It included \r\n as the end of line character(s). I suggest inserting these into your csv string as your row separation.
Depending on the tools used to generate the CSV string, you may need to escape the \ character (\\r\\n).
