GParsPool.withPool(numberPool) {
connection.withBatch(10000) { stmt ->
inputFile.eachParallel { data ->
//GParsPool.withPool() {
stmt.addBatch("DELETE FROM user WHERE number = ${data.toLong()} ")
println "IN"
//}
}
println "OUT"
Long startTimee = System.currentTimeMillis()
stmt.executeBatch()
println "deleted Batch"
Long endTime = System.currentTimeMillis()
println "Time taken for each batch: " + ((endTime - startTimee) / 1000)
}
}
The Above code is used to delete data from database. I First get the data from file and then match the each file data with database data and perform the delete query. But I have 5533179 records its take to much time. Even I have used the gpars but i get the same performance issue that is given by without using gpars. I have set the numberPool=5 but same issue. Even I increase the numberPool again same issue
why don't you use the SQL in operator? thus you can process the data much faster.
UPDATE:
from the top of my head:
GParsPool.withPool(numberPool) {
Map buffPerThread = [:].withDefaults{ [] }
inputFile.eachParallel { data ->
def buff = buffPerThread[ Thread.currentThread().id ]
buff << data.toLong()
if( 1000 == buff.size() ){
sql.execute 'DELETE FROM user WHERE number in (?)', [ buff ]
buff.clear()
}
}
}
I wouldn't use conn.withBatch here, as the in statement give the desired batching already
Related
Hi everyone,
I am trying to test C programs that use an user input... Like a learning app. So the avaliator(teacher) can write tests and I compile the code with a help of a docker and get back the result of the program that I send. After that I verify if one of the case tests fails..
for that I have two strings, like this:
result = "input_compiled1540323505983: /home/compiler/input/input.c:9: main: Assertion `B==2' failed. timeout: the monitored command dumped core Aborted "
and an array with case tests that is like:
caseTests = [" assert(A==3); // A must have the value of 3;", " assert(B==2); // B must have the value of 2; ", " assert(strcmp(Fulano, "Fulano")==0); //Fulano must be equal to Fulano]
I need to send back from my server something like this:
{ console: [true, true, true ] }
Where each true is the corresponding test for every test in the array of tests
So, I need to test if one string contains the part of another string... and for now I did like this:
criandoConsole = function(arrayErros, arrayResult){
var consol = arrayErros.map( function( elem ) {
var local = elem.match(/\((.*)\)/);
if(arrayResult.indexOf(local) > -1 ) {
return false;
}
else return true;
});
return consol;
}
I am wondering if there are any more efective way of doing that. I am using a nodejs as server. Does anyone know a better way?!
ps: Just do like result.contains(caseTests[0]) did not work..
I know this is changing the problem, but can you simplify the error array to only include the search terms? For example,
result = "input_compiled1540323505983: /home/compiler/input/input.c:9: main: Assertion `B==2' failed. timeout: the monitored command dumped core Aborted ";
//simplify the search patterns
caseTests = [
"A==3",
"B==2",
"strcmp(Fulano, \"Fulano\")==0"
]
criandoConsole = function(arrayErros, arrayResult){
var consol = arrayErros.map( function( elem ) {
if (arrayResult.indexOf(elem) != -1)
return false; //assert failed?
else
return true; //success?
});
return consol;
}
console.log(criandoConsole(caseTests,result));
I have been stuck with this problem for a month.
I've been trying to use firebase to create a FIFO Inventory. I am using firebase cloud function to update the FIFO inventory. However if I stress test the following code just 10 times with for loop for both insert (push) and remove (pop), it breaks because of the concurrent update.
Does anyone have another solution for this?
Insert FIFO/PUSH:
let fifoRef = admin.database().ref('fifo/' + item.itemId + '/').push();
let fifo = {
price: data.items[uniqueId].price,
in: data.items[uniqueId].quantity,
quantity: data.items[uniqueId].quantity,
}
fifoRef.set(fifo);
Get FIFO Value/POP (Here I simply update the quantity for POP):
// get fifo
let fifoReference = 'fifo/' + item.itemId;
let fifoRef = admin.database().ref(fifoReference);
fifoRef.once('value').then(currentData => {
let fifo = currentData.val();
for (let key in fifo) {
let val = fifo[key];
if (val.quantity > 0) {
// get fifo quantity
let fifoQuantityRef = admin.database().ref(fifoReference + '/' + key + '/quantity/');
// get local cache value
let fifoQuantityListener = fifoQuantityRef.on('value', () => {
// transaction start
fifoQuantityRef.transaction(function (quantity) {
if (quantity) {
if (quantity > 0 && quantitySubtotal > 0) {
if (quantity >= quantitySubtotal) {
// minus inventory amount
let amount = calculator(quantitySubtotal + "*" + val.price);
quantity = calculator(quantity + "-" + quantitySubtotal);
quantitySubtotal = 0;
// update quantity
return quantity;
} else {
let amount = calculator(quantity + "*" + val.price);
quantitySubtotal = calculator(quantitySubtotal + "-" + quantity);
return 0;
}
}
}
return quantity;
}, (error, committed, result) => {
fifoQuantityRef.off('value', fifoQuantityListener);
}, true);
});
Brainstorming:
I just need Insight on how to get the VALUE using FIFO. From my understanding Firebase is best used to insert and remove not with transaction. But if I am using only insert and remove, how to create a FIFO? If I create FIFO with the quantity of 1, the data stored will be too large.
I did try to use Google Datastore, however datastore persistence is extremely slow (over 2 seconds for writing). Which is not possible to be used with firebase which have persistence of less than 1 second. The problem arise when PUSH and POP is done within 1 seconds, datastore insert is not persisted yet.
Any other brainstorming ideas?
I have parquet data partitioned by date & hour, folder structure:
events_v3
-- event_date=2015-01-01
-- event_hour=2015-01-1
-- part10000.parquet.gz
-- event_date=2015-01-02
-- event_hour=5
-- part10000.parquet.gz
I have created a table raw_events via spark but when I try to query, it scans all the directories for footer and that slows down the initial query, even if I am querying only one day worth of data.
query:
select * from raw_events where event_date='2016-01-01'
similar problem : http://mail-archives.apache.org/mod_mbox/spark-user/201508.mbox/%3CCAAswR-7Qbd2tdLSsO76zyw9tvs-Njw2YVd36bRfCG3DKZrH0tw#mail.gmail.com%3E ( but its old)
Log:
App > 16/09/15 03:14:03 main INFO HadoopFsRelation: Listing leaf files and directories in parallel under: s3a://bucket/events_v3/
and then it spawns 350 tasks since there are 350 days worth of data.
I have disabled schemaMerge, and have also specified the schema to read as, so it can just go to the partition that I am looking at, why should it print all the leaf files ?
Listing leaf files with 2 executors take 10 minutes, and the query actual execution takes on 20 seconds
code sample:
val sparkSession = org.apache.spark.sql.SparkSession.builder.getOrCreate()
val df = sparkSession.read.option("mergeSchema","false").format("parquet").load("s3a://bucket/events_v3")
df.createOrReplaceTempView("temp_events")
sparkSession.sql(
"""
|select verb,count(*) from temp_events where event_date = "2016-01-01" group by verb
""".stripMargin).show()
As soon as spark is given a directory to read from it issues call to listLeafFiles (org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala). This in turn calls fs.listStatus which makes an api call to get list of files and directories. Now for each directory this method is called again. This hapens recursively until no directories are left. This by design works good in a HDFS system. But works bad in s3 since list file is an RPC call. S3 on other had supports get all files by prefix, which is exactly what we need.
So for example if we had above directory structure with 1 year worth of data with each directory for hour and 10 sub directory we would have , 365 * 24 * 10 = 87k api calls, this can be reduced to 138 api calls given that there are only 137000 files. Each s3 api calls return 1000 files.
Code:
org/apache/hadoop/fs/s3a/S3AFileSystem.java
public FileStatus[] listStatusRecursively(Path f) throws FileNotFoundException,
IOException {
String key = pathToKey(f);
if (LOG.isDebugEnabled()) {
LOG.debug("List status for path: " + f);
}
final List<FileStatus> result = new ArrayList<FileStatus>();
final FileStatus fileStatus = getFileStatus(f);
if (fileStatus.isDirectory()) {
if (!key.isEmpty()) {
key = key + "/";
}
ListObjectsRequest request = new ListObjectsRequest();
request.setBucketName(bucket);
request.setPrefix(key);
request.setMaxKeys(maxKeys);
if (LOG.isDebugEnabled()) {
LOG.debug("listStatus: doing listObjects for directory " + key);
}
ObjectListing objects = s3.listObjects(request);
statistics.incrementReadOps(1);
while (true) {
for (S3ObjectSummary summary : objects.getObjectSummaries()) {
Path keyPath = keyToPath(summary.getKey()).makeQualified(uri, workingDir);
// Skip over keys that are ourselves and old S3N _$folder$ files
if (keyPath.equals(f) || summary.getKey().endsWith(S3N_FOLDER_SUFFIX)) {
if (LOG.isDebugEnabled()) {
LOG.debug("Ignoring: " + keyPath);
}
continue;
}
if (objectRepresentsDirectory(summary.getKey(), summary.getSize())) {
result.add(new S3AFileStatus(true, true, keyPath));
if (LOG.isDebugEnabled()) {
LOG.debug("Adding: fd: " + keyPath);
}
} else {
result.add(new S3AFileStatus(summary.getSize(),
dateToLong(summary.getLastModified()), keyPath,
getDefaultBlockSize(f.makeQualified(uri, workingDir))));
if (LOG.isDebugEnabled()) {
LOG.debug("Adding: fi: " + keyPath);
}
}
}
for (String prefix : objects.getCommonPrefixes()) {
Path keyPath = keyToPath(prefix).makeQualified(uri, workingDir);
if (keyPath.equals(f)) {
continue;
}
result.add(new S3AFileStatus(true, false, keyPath));
if (LOG.isDebugEnabled()) {
LOG.debug("Adding: rd: " + keyPath);
}
}
if (objects.isTruncated()) {
if (LOG.isDebugEnabled()) {
LOG.debug("listStatus: list truncated - getting next batch");
}
objects = s3.listNextBatchOfObjects(objects);
statistics.incrementReadOps(1);
} else {
break;
}
}
} else {
if (LOG.isDebugEnabled()) {
LOG.debug("Adding: rd (not a dir): " + f);
}
result.add(fileStatus);
}
return result.toArray(new FileStatus[result.size()]);
}
/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala
def listLeafFiles(fs: FileSystem, status: FileStatus, filter: PathFilter): Array[FileStatus] = {
logTrace(s"Listing ${status.getPath}")
val name = status.getPath.getName.toLowerCase
if (shouldFilterOut(name)) {
Array.empty[FileStatus]
}
else {
val statuses = {
val stats = if(fs.isInstanceOf[S3AFileSystem]){
logWarning("Using Monkey patched version of list status")
println("Using Monkey patched version of list status")
val a = fs.asInstanceOf[S3AFileSystem].listStatusRecursively(status.getPath)
a
// Array.empty[FileStatus]
}
else{
val (dirs, files) = fs.listStatus(status.getPath).partition(_.isDirectory)
files ++ dirs.flatMap(dir => listLeafFiles(fs, dir, filter))
}
if (filter != null) stats.filter(f => filter.accept(f.getPath)) else stats
}
// statuses do not have any dirs.
statuses.filterNot(status => shouldFilterOut(status.getPath.getName)).map {
case f: LocatedFileStatus => f
// NOTE:
//
// - Although S3/S3A/S3N file system can be quite slow for remote file metadata
// operations, calling `getFileBlockLocations` does no harm here since these file system
// implementations don't actually issue RPC for this method.
//
// - Here we are calling `getFileBlockLocations` in a sequential manner, but it should not
// be a big deal since we always use to `listLeafFilesInParallel` when the number of
// paths exceeds threshold.
case f => createLocatedFileStatus(f, fs.getFileBlockLocations(f, 0, f.getLen))
}
}
}
To clarify Gaurav's answer, that code snipped is from Hadoop branch-2, Probably not going to surface until Hadoop 2.9 (see HADOOP-13208); and someone needs to update Spark to use that feature (which won't harm code using HDFS, just won't show any speedup there).
One thing to consider is: what makes a good file layout for Object Stores.
Don't have deep directory trees with only a few files per directory
Do have shallow trees with many files
Consider using the first few characters of a file for the most changing value (such as day/hour), rather than the last. Why? Some object stores appear to use the leading characters for their hashing, not the trailing ones ... if you give your names more uniqueness then they get spread out over more servers, with better bandwidth/less risk of throttling.
If you are using the Hadoop 2.7 libraries, switch to s3a:// over s3n://. It's already faster, and getting better every week, at least in the ASF source tree.
Finally, Apache Hadoop, Apache Spark and related projects are all open source. Contributions are welcome. That's not just the code, it's documentation, testing, and, for this performance stuff, testing against your actual datasets. Even giving us details about what causes problems (and your dataset layouts) is interesting.
I would like to write a system groovy script which inspects the queued jobs in Jenkins, and extracts the build parameters (and build cause as a bonus) supplied as the job was scheduled. Ideas?
Specifically:
def q = Jenkins.instance.queue
q.items.each { println it.task.name }
retrieves the queued items. I can't for the life of me figure out where the build parameters live.
The closest I am getting is this:
def q = Jenkins.instance.queue
q.items.each {
println("${it.task.name}:")
it.task.properties.each { key, val ->
println(" ${key}=${val}")
}
}
This gets me this:
4.1.next-build-launcher:
com.sonyericsson.jenkins.plugins.bfa.model.ScannerJobProperty$ScannerJobPropertyDescriptor#b299407=com.sonyericsson.jenkins.plugins.bfa.model.ScannerJobProperty#5e04bfd7
com.chikli.hudson.plugin.naginator.NaginatorOptOutProperty$DescriptorImpl#40d04eaa=com.chikli.hudson.plugin.naginator.NaginatorOptOutProperty#16b308db
hudson.model.ParametersDefinitionProperty$DescriptorImpl#b744c43=hudson.mod el.ParametersDefinitionProperty#440a6d81
...
The params property of the queue element itself contains a string with the parameters in a property file format -- key=value with multiple parameters separated by newlines.
def q = Jenkins.instance.queue
q.items.each {
println("${it.task.name}:")
println("Parameters: ${it.params}")
}
yields:
dbacher params:
Parameters:
MyParameter=Hello world
BoolParameter=true
I'm no Groovy expert, but when exploring the Jenkins scripting interface, I've found the following functions to be very helpful:
def showProps(inst, prefix="Properties:") {
println prefix
for (prop in inst.properties) {
def pc = ""
if (prop.value != null) {
pc = prop.value.class
}
println(" $prop.key : $prop.value ($pc)")
}
}
def showMethods(inst, prefix="Methods:") {
println prefix
inst.metaClass.methods.name.unique().each {
println " $it"
}
}
The showProps function reveals that the queue element has another property named causes that you'll need to do some more decoding on:
causes : [hudson.model.Cause$UserIdCause#56af8f1c] (class java.util.Collections$UnmodifiableRandomAccessList)
Can someone give me a clue how can I get the fast access of the excel data. Currently the excel contains more than 200K records and when I retrieve from the X++ code it takes a lot of time to retrieve all the records.
Following are the classes I am using to retrieve the data.
1 - SysExcelApplication, SysExcelWorksheet and SysExcelCells.
I am using the below code to retrieve cells.
excelApp.workbooks().open(filename);
excelWorksheet = excelApp.worksheets().itemFromName(itemName);
excelCells = excelWorkSheet.cells();
///pseudo code
loop
excelCells.item(rowcounter, column1);
similar for all columns;
end of loop
If any of the special property needs to be set here please tell me.
Overall performance will be a lot better (huge!) if you can use CSV files. If you are forced to use Excel files, you can easy and straigforward convert this excel file to a csv file and then read the csv file. If you can't work that way, you can read excel files throug ODBC (using a query string like connecting to a database) that will perform better that the Office API.
First things, reading Excel files (and any other file) will take a while for 200 K records.
You can read an Excel file using ExcelIo, but with no performance guaranties :)
As I see it, you have 3 options (best performance listed first):
Convert your Excel file to CSV file, then read with CommaIo.
Read the Excel file using C#, then call back to X++
Accept the fact and take the time
use CSV, it is faster, below is code example:
/* Excel Import*/
#AviFiles
#define.CurrentVersion(1)
#define.Version1(1)
#localmacro.CurrentList
#endmacro
FilenameOpen filename;
CommaIo file;
Container con;
/* File Open Dialog */
Dialog dialog;
dialogField dialogFilename;
dialogField dialogSiteID;
dialogField dialogLocationId;
DialogButton dialogButton;
InventSite objInventSite;
InventLocation objInventLocation;
InventSiteID objInventSiteID;
InventLocationId objInventLocationID;
int row;
str sSite;
NoYes IsCountingFound;
int iQty;
Counter insertCounter;
Price itemPrice;
ItemId _itemid;
EcoResItemColorName _inventColorID;
EcoResItemSizeName _inventSizeID;
dialog = new Dialog("Please select file");
dialogSiteID = dialog.addField(extendedTypeStr(InventSiteId), objInventSiteId);
dialogLocationId = dialog.addField(extendedTypeStr(InventLocationId), objInventLocationId);
dialogFilename = dialog.addField(extendedTypeStr(FilenameOpen));
dialog.filenameLookupFilter(["#SYS100852","*.csv"]);
dialog.filenameLookupTitle("Please select file");
dialog.caption("Please select file");
dialogFilename.value(filename);
if(!dialog.run())
return;
objInventSiteID = dialogSiteID.value();
objInventLocationID = dialogLocationId.value();
/*----- validating warehouse*/
while
select maxof(InventSiteId) from objInventLocation where objInventLocation.InventLocationId == objInventLocationId
{
If(objInventLocation.InventSiteID != objInventSiteID)
{
warning("Warehouse not belongs to site. Please select valid warehouse." ,"Counting lines import utility");
return;
}
}
filename = dialogFilename.value();
file = new commaIo(filename,'r');
file.inFieldDelimiter(',');
try
{
if (file)
{
ttsbegin;
while(file.status() == IO_Status::OK)
{
con = file.read();
if (con)
{
row ++;
if(row == 1)
{
if(
strUpr(strLtrim(strRtrim( conpeek(con,1) ))) != "ITEM"
|| strUpr(strLtrim(strRtrim( conpeek(con,2) ))) != "COLOR"
|| strUpr(strLtrim(strRtrim( conpeek(con,3) ))) != "SIZE"
|| strUpr(strLtrim(strRtrim( conpeek(con,4) ))) != "PRICE"
)
{
error("Imported file is not according to given format.");
ttsabort;
return;
}
}
else
{
IsCountingFound = NoYes::No;
_itemid = "";
_inventColorID = "";
_inventSizeID = "";
_itemid = strLtrim(strRtrim(conpeek(con,1) ));
_inventColorID = strLtrim(strRtrim(conpeek(con,2) ));
_inventSizeID = strLtrim(strRtrim(conpeek(con,3) ));
itemPrice = any2real(strLtrim(strRtrim(conpeek(con,4) )));
}
}
}
if(row <= 1)
{
ttsabort;
warning("No data found in excel file");
}
else
{
ttscommit;
}
}
}
catch
{
ttsabort;
Error('Upload Failed');
}