I have the Groovy code below. I'm trying to read a set of input records and merge them into one or more records that share a common key combination.
The key combination is shown below. After reading the input file, I write the key and fields into a HashMap (see code). Now I need to check each key in the input file: if the key has already been seen, I have to write the merged output record; otherwise I just need to write the output record without merging. My question:
What is the command to insert a field into the output record?
import java.util.Properties;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
for( int i = 0; i < dataContext.getDataCount(); i++ ) {
InputStream is = dataContext.getStream(i);
Properties props = dataContext.getProperties(i);
reader = new BufferedReader(new InputStreamReader(is));
/* This is how to declare HashMap */
def forcastMap = [:]
String Key;
String Shipfrom = "";
String Item = "";
String Fcast = "";
String Shipto = "";
String Planned_Arrival_Date = "";
String Qty = "";
String PrevKey = "";
String line = null // readLine() returns a String, not a List
while ((line = reader.readLine()) != null)
{
if(line.length() > 20) //Make sure it is a data line so we can do substring manipulation
{
Shipfrom = line.substring(35,12)
Item = line.substring(50,50)
Fcast = line.substring(10,50)
Shipto = line.substring(75,10)
Planned_Arrival_Date = line.substring(85,8)
Qty = line.substring(90,12)
Key = (Shipfrom + Item + Fcast + Shipto)
// Map.put takes one value per key, so store date and quantity together as a list
forcastMap.put(Key, [Planned_Arrival_Date, Qty])
if (Key != PrevKey) {
}
}
}
//dataContext.storeStream(is, props);
}
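One way to approach the merge is to group the records by key in an insertion-ordered map and emit one output line per key at the end. The following is a minimal Java sketch of that idea, not a Boomi-specific answer: the class name, the `field` helper, the `|` and `,` output separators, and the sample data are all hypothetical. Note also that `substring(begin, end)` in Java and Groovy takes an end index rather than a length, so a call such as `line.substring(35,12)` throws an exception; the `field` helper below takes a start and a length instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeByKey {

    // Extracts a fixed-width field from a zero-based start and a length.
    // Hypothetical helper: the real offsets depend on the record layout.
    static String field(String line, int start, int length) {
        int end = Math.min(line.length(), start + length);
        return start >= end ? "" : line.substring(start, end).trim();
    }

    // Groups "date:qty" pairs under their key, preserving first-seen order,
    // then builds one merged output line per key.
    static List<String> merge(List<String[]> records) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (String[] r : records) {
            // r[0] = key, r[1] = planned arrival date, r[2] = quantity
            byKey.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1] + ":" + r[2]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byKey.entrySet()) {
            out.add(e.getKey() + "|" + String.join(",", e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample records: two share the key "K1", one has the key "K2".
        List<String[]> records = List.of(
            new String[] {"K1", "20240101", "10"},
            new String[] {"K2", "20240102", "5"},
            new String[] {"K1", "20240108", "7"});
        merge(records).forEach(System.out::println);
    }
}
```

A key that occurs only once simply produces an output line with a single date/quantity pair, so the merged and unmerged cases go through the same path.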
I have a simple method that reads a CSV file and converts it to Excel:
public static void main(String[] args) throws Exception {
CSVReader csvReader = new CSVReader(new FileReader("P:\\employees.csv"));
SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook();
SXSSFSheet sxssfSheet = sxssfWorkbook.createSheet("Sheet");
String[] dataRow = null;
int rowNum = 0;
while ((dataRow = csvReader.readNext()) != null) {
Row currentRow = sxssfSheet.createRow(rowNum);
for (int i = 0; i < dataRow.length; i++) {
String cellValue = dataRow[i];
currentRow.createCell(i).setCellValue(cellValue);
}
rowNum++;
}
sxssfWorkbook.write(new FileOutputStream("P:\\employees.xlsx"));
}
But there's a problem with the cell data type: all my data is now stored as text. I want to find columns by their name (for example age, paid_total), not by index, and set a numeric (float) data type for these columns. Something like this (sorry for the SQL-like style, it's simpler for me to describe): WHEN columnName IN ('age', 'paid_total') SET allColumnType AS NUMERIC. How can I do this? Or is it only possible with indexes?
CSV files are always plain text files without data types. But if you know exactly which column should have which data type, then a type-safe Excel sheet can be created. This can be done by column index as well as by column header. To detect types by column header, the headers must be kept in a separate data structure, which is useful in any case.
Let's take the example employees.csv from here: https://gist.github.com/kevin336/acbb2271e66c10a5b73aacf82ca82784.
Then the following should work:
import java.io.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.xssf.streaming.*;
import org.apache.poi.ss.SpreadsheetVersion;
import org.apache.poi.ss.util.AreaReference;
import org.apache.poi.ss.util.CellReference;
import com.opencsv.CSVReader;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeFormatter;
import java.time.LocalDate;
class CreateExcelFromCSVDifferentDataTypes {
public static void main(String[] args) throws Exception {
try (
SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook(); FileOutputStream fileout = new FileOutputStream("./employees.xlsx");
CSVReader csvReader = new CSVReader(new FileReader("./employees.csv"));
) {
sxssfWorkbook.setCompressTempFiles(true);
CellStyle dateStyle = sxssfWorkbook.createCellStyle();
dateStyle.setDataFormat(sxssfWorkbook.getCreationHelper().createDataFormat().getFormat("dd-MMM-yy"));
SXSSFSheet sxssfSheet = sxssfWorkbook.createSheet("Sheet");
sxssfSheet.setRandomAccessWindowSize(100);
String[] strHeaders = null;
String[] dataRow = null;
int rowNum = 0;
while ((dataRow = csvReader.readNext()) != null) {
if (rowNum == 0) strHeaders = dataRow;
Row currentRow = sxssfSheet.createRow(rowNum);
for (int i = 0; i < dataRow.length; i++) {
String cellValue = dataRow[i];
if (rowNum > 0 && "HIRE_DATE".equals(strHeaders[i])) {
DateTimeFormatter formatter= new DateTimeFormatterBuilder().parseCaseInsensitive().appendPattern("dd-MMM-yy").toFormatter(java.util.Locale.ENGLISH);
LocalDate localDate = LocalDate.parse(cellValue, formatter);
currentRow.createCell(i).setCellValue(localDate);
currentRow.getCell(i).setCellStyle(dateStyle);
} else if (rowNum > 0 && "SALARY".equals(strHeaders[i])) {
double d = Double.valueOf(cellValue);
currentRow.createCell(i).setCellValue(d);
} else {
currentRow.createCell(i).setCellValue(cellValue);
}
}
rowNum++;
}
sxssfWorkbook.write(fileout);
sxssfWorkbook.dispose();
}
}
}
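The header-driven type decision can also be separated from the Excel writing. The sketch below uses only the Java standard library and the column names from the question (age, paid_total); the class and method names are hypothetical. It converts one CSV row into typed values, which a caller could then pass to `setCellValue(double)` or `setCellValue(String)` as in the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class TypedCsvRows {

    // Column names to be treated as numeric (taken from the question's example).
    static final Set<String> NUMERIC_COLUMNS = Set.of("age", "paid_total");

    // Converts one data row into typed values:
    // Double for numeric columns, the original String otherwise.
    static List<Object> convert(String[] headers, String[] row) {
        List<Object> typed = new ArrayList<>();
        for (int i = 0; i < row.length; i++) {
            String name = i < headers.length ? headers[i] : "";
            if (NUMERIC_COLUMNS.contains(name)) {
                try {
                    typed.add(Double.valueOf(row[i].trim()));
                } catch (NumberFormatException e) {
                    // Keep unparsable values as text instead of failing the whole file.
                    typed.add(row[i]);
                }
            } else {
                typed.add(row[i]);
            }
        }
        return typed;
    }

    public static void main(String[] args) {
        String[] headers = {"name", "age", "paid_total"};
        String[] row = {"Ann", "31", "1250.50"};
        System.out.println(convert(headers, row));
    }
}
```

Falling back to text for unparsable values is a design choice of this sketch; throwing an exception instead may be preferable when bad input should stop the conversion.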
I am new to Groovy. I have the following code, in which I want to print the strings to the console, but it doesn't work:
import java.io.FileWriter;
import java.util.Arrays;
import java.io.Writer;
import java.util.List;
//Default separator
char SEPARATOR = ',';
//get path of csv file (creates new one if its not exists)
String csvFile = "c:";
println "========================= csvFile";
println csvFile;
String[] params = {"hello"};
writeLine(params, SEPARATOR);
//function write line in csv
public void writeLine(String[] params, char separator)
{
boolean firstParam = true;
println params;
StringBuilder stringBuilder = new StringBuilder();
String param = "";
for (int i = 0; i < params.length; i++)
{
//get param
param = params[i];
println param;
//if the first param in the line, separator is not needed
if (!firstParam)
{
stringBuilder.append(separator);
}
//Add param to line
stringBuilder.append(param);
firstParam = false;
}
//prepare file to next line
stringBuilder.append("\n");
//add to file the line
println stringBuilder.toString();
}
It gives the following output:
In Groovy, you have to use square brackets to declare an array:
String[] params = ["hello"]
By the way, the whole code could be simplified to this:
String[] params = ["hello"]
def writeLine(params, separator=','){
println params.join(separator)
}
writeLine(params)
I want to update 1 cell in an Excel (xlsx) file using OLEDB.
I have attached my code.
The first time you run, it fills in the values because INSERT is working.
The second time you run, it APPENDS the values because the UPDATE fails, and my Catch performs an INSERT. My goal is to have the UPDATE command work. When the UPDATE command executes, I get an error message:
No value given for one or more required parameters.
at System.Data.OleDb.OleDbCommand.ExecuteCommandTextErrorHandling(OleDbHResult hr)
at System.Data.OleDb.OleDbCommand.ExecuteCommandTextForSingleResult(tagDBPARAMS dbParams, Object& executeResult)
at System.Data.OleDb.OleDbCommand.ExecuteCommandText(Object& executeResult)
at System.Data.OleDb.OleDbCommand.ExecuteCommand(CommandBehavior behavior, Object& executeResult)
at System.Data.OleDb.OleDbCommand.ExecuteReaderInternal(CommandBehavior behavior, String method)
at System.Data.OleDb.OleDbCommand.ExecuteNonQuery()
at ConsoleApp3.Program.InsertCSVIntoSheet(String excelPath, String sheet, String csvPath) in C:\Users\jbendler.atlab\source\repos\ConsoleApp3\Program.cs:line 175
This code comes from a demo Console App. This is the program.cs file.
The main part of the code is located at the bottom of the InsertCSVIntoSheet method. The output xlsx file is simply a new workbook, unedited. The input file is simply a text file, named with a .csv extension, that contains the following separated by carriage-return/linefeed - so the file has 5 lines with one character per line:
A
B
C
D
E
Thanks.
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Transactions;
namespace ConsoleApp3
{
class Program
{
static void Main(string[] args)
{
InsertCSVIntoSheet(@"c:\temp\book1.xlsx", "NewSheet", @"c:\temp\test.csv");
}
private static void InsertCSVIntoSheet(string excelPath, string sheet, string csvPath)
{
int column;
int row;
int pos;
bool done;
char readUntilChar;
string csvData;
string columnName;
string cell;
string excelSheetName;
List<Tuple<int, int, object>> values = new List<Tuple<int, int, object>>();
string connectionString = CreateOleDbConnectionStringForExcel(excelPath);
OleDbCommand oleDbCommand = new OleDbCommand();
decimal decimalTest;
DateTime dateTimeTest;
int status;
int numColumns;
// Put CSV in to row/column/value Tuples
using (StreamReader reader = new StreamReader(csvPath))
{
csvData = reader.ReadToEnd();
row = 1;
// Split the csv data by new line
foreach (string line in csvData.Split(new string[] { "\r\n" }, StringSplitOptions.None))
{
if (!string.IsNullOrEmpty(line))
{
column = 1;
pos = 0;
cell = string.Empty;
// Split each line by ',' to get each cell. A value wrapped in '"' can include a ','
while (pos < line.Length)
{
cell = string.Empty;
// Check the first character. If it is a '"' then we assume the cell is surrounded
// in '"' and do NOT include the '"' in the output to the excel cell.
if (line[pos] == '"')
{
readUntilChar = '"';
pos++;
}
else
{
readUntilChar = ',';
}
done = line[pos] == readUntilChar;
if (line[pos] == '"')
{
// Skip over second '"' for a blank ("")
pos++;
}
while (!done)
{
cell += line[pos++];
if (pos == line.Length || line[pos] == readUntilChar)
{
if (readUntilChar == '"')
{
// Skip the '"'
pos++;
}
done = true;
}
}
// Skip over the ','
pos++;
if (!string.IsNullOrEmpty(cell))
{
// Test the data to determine the type (check for decimal and DateTime).
if (decimal.TryParse(cell, out decimalTest))
{
values.Add(new Tuple<int, int, object>(row, column, decimalTest));
}
else if (DateTime.TryParse(cell, out dateTimeTest))
{
values.Add(new Tuple<int, int, object>(row, column, dateTimeTest));
}
else
{
// Write out the value as a string
values.Add(new Tuple<int, int, object>(row, column, cell));
}
}
column++;
}
}
row++;
}
}
using (TransactionScope transactionScope = new TransactionScope(TransactionScopeOption.Suppress))
{
excelSheetName = GetExcelSheetNames(connectionString).Where(n => n.ToUpper().StartsWith(sheet.ToUpper())).FirstOrDefault();
//Set the connection string to recognize the header and to operate in Update mode(IMEX= 0)
using (OleDbConnection oleDbConnection = new OleDbConnection(connectionString.Replace("IMEX=1", "IMEX=0")))
{
oleDbConnection.Open();
oleDbCommand = new OleDbCommand();
oleDbCommand.Connection = oleDbConnection;
oleDbCommand.CommandType = CommandType.Text;
if (excelSheetName != null)
{
// Delete Sheet
oleDbCommand.CommandText = "DROP TABLE [" + sheet + "]";
status = oleDbCommand.ExecuteNonQuery();
}
else
{
excelSheetName = sheet + "$";
}
numColumns = values.Max(v => v.Item2);
oleDbCommand.CommandText = "CREATE TABLE [" + sheet + "](";
for (int index = 0; index < numColumns; index++)
{
oleDbCommand.CommandText += "Column" + index.ToString() + " CHAR(255), ";
}
oleDbCommand.CommandText = oleDbCommand.CommandText.Substring(0, oleDbCommand.CommandText.Length - 2) + ")";
status = oleDbCommand.ExecuteNonQuery();
}
using (OleDbConnection oleDbConnection = new OleDbConnection(connectionString.Replace("IMEX=1", "IMEX=0")))
{
oleDbConnection.Open();
oleDbCommand.Connection = oleDbConnection;
// Write out new values
foreach (Tuple<int, int, object> tuple in values)
{
try
{
columnName = GetExcelColumnName(tuple.Item2) + (tuple.Item1 + 1).ToString();
oleDbCommand.CommandText = "UPDATE [" + excelSheetName + columnName + ":" + columnName + "] SET " + "F1" + " = '" + tuple.Item3.ToString() + "'";
status = oleDbCommand.ExecuteNonQuery();
}
catch (OleDbException oledbex)
{
oleDbCommand.CommandText = "INSERT INTO [" + excelSheetName + "] VALUES ('" + tuple.Item3.ToString() + "')";
status = oleDbCommand.ExecuteNonQuery();
}
}
}
}
}
private static List<string> GetExcelSheetNames(string connectionString)
{
OleDbConnection oleDbConnection = null;
DataTable dataTable = null;
List<string> excelSheetNames = null;
using (oleDbConnection = new OleDbConnection(connectionString))
{
oleDbConnection.Open();
dataTable = oleDbConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
}
if (dataTable != null)
{
excelSheetNames = new List<string>(dataTable.Rows.Cast<DataRow>().Where(r => r["TABLE_NAME"].ToString().EndsWith("$") || r["TABLE_NAME"].ToString().EndsWith("$'")).Select(r => r["TABLE_NAME"].ToString().ToUpper()));
}
return excelSheetNames;
}
private static string CreateOleDbConnectionStringForExcel(string sourceFile)
{
var fileInfo = new FileInfo(sourceFile);
switch (fileInfo.Extension.ToUpper())
{
case ".XLS":
return string.Format("Provider=Microsoft.ACE.OLEDB.12.0;;Data Source='{0}';Extended Properties='Excel 8.0;HDR=No;IMEX=1'", sourceFile);
case ".XLSX":
case ".XLSM":
return string.Format("Provider=Microsoft.ACE.OLEDB.12.0;;Data Source='{0}';Extended Properties='Excel 12.0;HDR=No;IMEX=1'", sourceFile);
default:
throw new NotSupportedException("File type not supported.");
}
}
private static string GetExcelColumnName(int columnNumber)
{
string columnName = String.Empty;
int dividend = columnNumber;
int modulo;
while (dividend > 0)
{
modulo = (dividend - 1) % 26;
columnName = Convert.ToChar(65 + modulo).ToString() + columnName;
dividend = (int)((dividend - modulo) / 26);
}
return columnName;
}
}
}
The following code works fine to submit a streaming job to the cluster.
string statusFolderName = @"/tutorials/wordcountstreaming/status";
var jobcred = new BasicAuthCredential();
jobcred.UserName = "username";
jobcred.Password = "pass";
jobcred.Server = new Uri("https://something.azurehdinsight.net");
// Define the Hadoop streaming MapReduce job
StreamingMapReduceJobCreateParameters myJobDefinition = new StreamingMapReduceJobCreateParameters()
{
JobName = "my word counting job",
StatusFolder = statusFolderName,
Input = "/example/data/gutenberg/davinci.txt",
Output = "/tutorials/wordcountstreaming/output",
Reducer = "wc.exe",
Mapper = "cat.exe"
};
myJobDefinition.Files.Add("/example/apps/wc.exe");
myJobDefinition.Files.Add("/example/apps/cat.exe");
var jobClient = JobSubmissionClientFactory.Connect(jobcred);
// Run the MapReduce job
JobCreationResults mrJobResults = jobClient.CreateStreamingJob(myJobDefinition);
----------------------Mapper---------------------------
namespace wc
{
class wc
{
static void Main(string[] args)
{
string line;
var count = 0;
if (args.Length > 0)
{
Console.SetIn(new StreamReader(args[0]));
}
while ((line = Console.ReadLine()) != null)
{
count += line.Count(cr => (cr == ' ' || cr == '\n'));
}
Console.WriteLine(count);
}
}
}
How do I get the name of the text file as the key?
I want the output to show a key and a value, the key being the name of the file and the value being the number of words in the file.
I have multiple files.
To get the name of the text file being processed by the mapper as the key, you can read the following environment variable in your mapper:
string Key = Environment.GetEnvironmentVariable("map_input_file");
Modify your Mapper code as:
namespace wc
{
class wc
{
static void Main(string[] args)
{
string line;
var count = 0;
if (args.Length > 0)
{
Console.SetIn(new StreamReader(args[0]));
}
while ((line = Console.ReadLine()) != null)
{
count += line.Count(cr => (cr == ' ' || cr == '\n'));
}
string Key = Environment.GetEnvironmentVariable("map_input_file");
var output = String.Format("{0}\t{1}",Key, count);
Console.WriteLine(output);
}
}
}
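For comparison, the same idea can be sketched as a streaming mapper in Java. Hadoop streaming passes job configuration values to the mapper process as environment variables, with dots replaced by underscores. The fallback name `mapreduce_map_input_file` is an assumption for newer Hadoop versions, and the class name and the "unknown" default are hypothetical. Unlike the C# code above, which counts spaces, this sketch counts whitespace-separated tokens.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

class FileWordCountMapper {

    // Returns the input file name from the environment, or "unknown" if
    // neither variable is set (e.g. when run outside of Hadoop streaming).
    static String inputFileKey() {
        String key = System.getenv("map_input_file");
        if (key == null) {
            key = System.getenv("mapreduce_map_input_file"); // assumed newer name
        }
        return key == null ? "unknown" : key;
    }

    // Counts whitespace-separated tokens across all lines of the input.
    static long countWords(Reader in) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    count += trimmed.split("\\s+").length;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Streaming mappers read records from stdin and write key<TAB>value to stdout.
        long count = countWords(new InputStreamReader(System.in));
        System.out.println(inputFileKey() + "\t" + count);
    }
}
```

Because a large file may be split across several mappers, a reducer that sums the counts per key would still be needed to get one total per file.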
Hope this helps.
I have a ListBox in XAML and I am binding some data to it, as shown below:
lstsumitedreport = new ObservableCollection<ClsGetSubmittedReport>();
if (ResultCode == "1")
{
JArray arry = (JArray)obj["GetSubmittedReportComment"];
if (arry != null)
{
total = arry.Count;
for (int i = 0; i < arry.Count; i++)
{
JObject obj1 = (JObject)arry[i];
int reportId = (int)obj1["ReportId"];
string positiveCmnt = (string)obj1["PositiveComment"];
string NagativeCmnt = (string)obj1["negativecomment"];
DateTime dt = (DateTime)obj1["TimeStamp"];
string time = (string)obj1["TimeStampString"];
DateTime dtlocal = DateTime.ParseExact(time, "yyyy/MM/dd HH:mm:ss", CultureInfo.InvariantCulture);
string timestamp = dtlocal.ToString("MM/dd/yyyy hh:mm tt", CultureInfo.InvariantCulture);
lstsumitedreport.Add(new ClsGetSubmittedReport(reportId, positiveCmnt, NagativeCmnt, timestamp, Csi, goodimagepath, bedimagepath, goodimgline, bedimgline, font, PHeight, NHeight));
}
}
TransactionList.ItemsSource = null;
this.TransactionList.ItemsSource = lstsumitedreport;
tbOutstandingCount.Text = total.ToString();
}
Here TransactionList is the name of my ListBox.
While debugging, I found that after the line below executes, it takes too much time to display the data:
this.TransactionList.ItemsSource = lstsumitedreport;
How can I solve this problem?