I am trying to upload a CSV through the Taxonomy CSV upload module, but it gives an error about the 255-character limit on term names when I upload terms with more characters.
Is there any way to increase this limit so that I could upload my terms?
Thanks
The term name, by default, is limited to 255 characters in the MySQL database. As far as I can see, there is no way to go beyond this without first making some major modifications to the database, which could, in turn, cause you many other issues. If your terms are longer than 255 characters, your best bet is to shorten the names to fewer than 255 characters.
For reference purposes, the Taxonomy CSV import module does limit the number of characters you can upload by default. The limits are hard-coded in the following file: taxonomy_csv/import/taxonomy_csv.import.parser.api.inc. Inside this file, you will find the numbers on the following lines:
50, 142, 159, 196, 235, 279, 306, and 341
As always, back up your database before changing any settings or structure.
https://nodejs.org/api/readline.html
provides this solution for reading large files like CSVs line by line:
const { createReadStream } = require('fs');
const { createInterface } = require('readline');
const { once } = require('events');

(async function processLineByLine() {
  try {
    const rl = createInterface({
      input: createReadStream('big-file.txt'),
      crlfDelay: Infinity
    });

    rl.on('line', (line) => {
      // Process the line.
    });

    await once(rl, 'close');
    console.log('File processed.');
  } catch (err) {
    console.error(err);
  }
})();
But I don't want to read the entire file from beginning to end, only parts of it, say from line number 1 to 10000, 20000 to 30000, etc.
Basically I want to be able to set a 'start' & 'end' line for a given run of my function.
Is this doable with readline & fs.createReadStream?
If not, please suggest an alternate approach.
PS: It's a large file (around 1 GB) & loading it in memory causes memory issues.
But I don't want to read the entire file from beginning to end, only parts of it, say from line number 1 to 10000, 20000 to 30000, etc.
Unless your lines are of fixed, identical length, there is NO way to know where line 10,000 starts without reading from the beginning of the file and counting lines until you get to line 10,000. That's how text files with variable-length lines work. Lines in the file are not physical structures that the file system knows anything about. To the file system, the file is just a gigantic blob of data; the concept of lines is something we invent at a higher level, so the file system and OS know nothing about them. The only way to know where lines are is to read the data and "parse" it into lines by searching for line delimiters. So, line 10,000 is only found by counting line delimiters from the beginning of the file until you reach the 10,000th one.
There is no way around it, unless you preprocess the file into a more efficient format (like a database) or create an index of line positions.
Basically I want to be able to set a 'start' & 'end' line for a given run of my function.
The only way to do that is to "index" the data ahead of time so you already know where each line starts/ends. Some text editors made to handle very large files do this. They read through the file (perhaps lazily) reading every line and build an in-memory index of what file offset each line starts at. Then, they can retrieve specific blocks of lines by consulting the index and reading that set of data from the file.
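To make that concrete, here is a minimal sketch of that indexing approach built on readline and fs.createReadStream. It is untested and assumes UTF-8 text with plain '\n' line endings; the file name and function names are just placeholders.

const fs = require('fs');
const readline = require('readline');
const { once } = require('events');

// Pass 1: record the byte offset at which every line starts.
async function buildLineIndex(path) {
  const offsets = [0];               // line 1 starts at byte 0
  let pos = 0;
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity
  });
  rl.on('line', (line) => {
    pos += Buffer.byteLength(line) + 1;   // +1 for '\n' (use +2 for '\r\n' files)
    offsets.push(pos);                    // start offset of the next line
  });
  await once(rl, 'close');
  return offsets;
}

// Pass 2: read only lines startLine..endLine (1-based, inclusive)
// by handing createReadStream the byte range for that block.
async function readLines(path, offsets, startLine, endLine) {
  const rl = readline.createInterface({
    input: fs.createReadStream(path, {
      start: offsets[startLine - 1],
      end: offsets[endLine] - 1
    }),
    crlfDelay: Infinity
  });
  const lines = [];
  rl.on('line', (line) => lines.push(line));
  await once(rl, 'close');
  return lines;
}

The index costs one full pass over the file, but after that any block (say lines 20001 to 30000) can be pulled out with a single ranged read, and only that block is ever held in memory.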
Is this doable with readline & fs.createReadStream?
Without fixed length lines, there's no way to know where in the file line 10,000 starts without counting from the beginning.
It's a large file (around 1 GB) & loading it in memory causes memory issues.
Streaming the file a line at a time with the linereader module or others that do something similar will handle the memory issue just fine so that only a block of data from the file is in memory at any given time. You can handle arbitrarily large files even in a small memory system this way.
A newline is just a character (or two characters if you're on Windows); you have no way of knowing where those characters are without processing the file.
You are however able to read only a certain byte range in a file. If you know for a fact that every line contains 64 bytes, you can skip the first 100 lines by starting your read at byte 6400, and you can read only 100 lines by stopping your read at byte 12800.
Details on how to specify start and end points are available in the createReadStream docs.
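As a small illustration of those options (assuming fixed 64-byte lines, newline included, and a placeholder file name): lines 101-200 occupy bytes 6400-12799, and the end option is inclusive.

const fs = require('fs');

// Read only lines 101-200 of a file whose lines are all exactly 64 bytes,
// newline included. 'end' is inclusive, so stop at byte 12799.
const stream = fs.createReadStream('fixed-width-lines.txt', {
  start: 6400,
  end: 12799
});

stream.on('data', (chunk) => {
  // chunk only ever contains data from the requested byte range
  process.stdout.write(chunk);
});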
Hi, and thanks for any help. Is there a way to work with files larger than 10MB? I have to check for updates on items in a file that would be uploaded, but the file contains all items in the system and is approximately 20MB. This 10MB limit is killing me. I see streaming for file save and appending, but not for file reading, so I am open to any suggestions. The provider in this instance doesn't offer the facility to chunk the files. Thanks in advance for your help.
If you are using SS2 to process a file from the File Cabinet and you use file.lines.iterator() to process it, the 10MB size limit applies per line rather than to the file as a whole.
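For example, a rough sketch of that pattern (the script type, the internal id 123 and the CSV handling are placeholders, not taken from your account):

/**
 * @NApiVersion 2.x
 * @NScriptType ScheduledScript
 */
define(['N/file'], function (file) {
    function execute(context) {
        var csvFile = file.load({ id: 123 });   // internal id of the CSV in the File Cabinet
        var lineNumber = 0;
        csvFile.lines.iterator().each(function (line) {
            lineNumber++;
            if (lineNumber === 1) {
                return true;                    // skip the header row, keep iterating
            }
            var values = line.value.split(',');
            // ... compare values against the corresponding item record here ...
            return true;                        // return false to stop early
        });
    }
    return { execute: execute };
});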
I believe returning a file object from a map/reduce script's getInputData stage automatically parses the file into lines.
The 10MB file size limit comes into play if you try to create a file larger than 10MB.
If you are trying to read in an external file via script, one approach that I've used is to proxy the call via an external service, e.g. query an AWS Lambda function that checks for the file and saves it to S3, then returns the file path and size to your SuiteScript. The SuiteScript then asks for "pages" of the file that are less than 10MB and saves those. If you are uploading something like a .csv, the Lambda function can send the header with each paged request.
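A very rough sketch of the SuiteScript side of that idea follows. The proxy URL, the page/pageSizeMB parameters and the "empty body means done" convention are all assumptions about the external service, not a real AWS or NetSuite API.

/**
 * @NApiVersion 2.x
 * @NScriptType ScheduledScript
 */
define(['N/https'], function (https) {
    var PROXY_URL = 'https://example.com/file-proxy';   // hypothetical Lambda endpoint

    function execute(context) {
        var page = 0;
        var done = false;
        while (!done) {
            var response = https.get({
                url: PROXY_URL + '?page=' + page + '&pageSizeMB=5'
            });
            var lines = response.body.split('\n');
            // assumed convention: the proxy repeats the CSV header on every page
            for (var i = 1; i < lines.length; i++) {
                // ... process one CSV row here ...
            }
            done = lines.length <= 1;   // assumed convention: empty body means no more pages
            page++;
        }
    }
    return { execute: execute };
});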
Now I have a question. I need to use a Weka filter to handle data that is an object of the Instances class. The code is here:
CSVLoader loader = new CSVLoader();
loader.setSource(new File("path/to/file.csv"));
Instances data = loader.getDataSet();
data.setClassIndex(data.numAttributes() - 1);
LibSVM classifier = new LibSVM();
Evaluation eval = new Evaluation(data);
classifier.setOptions(LIBSVM_OPTIONS);
eval.crossValidateModel(classifier, data, 10, new Random(1));
When I run the code, it gives this error: weka.classifiers.functions.LibSVM: Cannot handle string attributes!
Some attributes are string type.
I want to use a Weka filter to handle the data object. There are some string elements in the data. The raw CSV file data is like this:
title1,title2,title3,title4,title5,title6
123, 122, 112, 121, 121, 123
121, 123, 121, 123, inf, 121
123, inf, 123, 123, 123, 123
Of course, the CSV file is saved from Excel, with a name like abc.csv. Most of the values in the file are numbers, but there are some string-type elements, for example inf. I want to use a Weka filter to replace the string inf with a large number. I don't want to preprocess the CSV file first using the opencsv package and then use CSVLoader to load the new, all-numeric file; I need to use a Weka filter to handle this after using CSVLoader and creating the Instances object.
I have searched a lot but can't find the answer. So, can I use a Weka filter to replace the string inf with a large number, so that all attributes become numeric?
Thanks!
If there is only one specific string that you need to be able to substitute - for example the string "inf" - then according to the CSVLoader class documentation you should be able to handle this using the setMissingValue method. By setting the missing value string to inf, you will import all inf values as missing values. If all the rest of the data in a column is numeric, that column should then get correctly imported as a numeric attribute.
If you really want Weka to treat these replaced values as a large number, you can then apply weka.filters.unsupervised.attribute.ReplaceMissingWithUserConstant. However, I would check whether that really makes sense in modelling terms - what does it actually mean when one of the attributes is inf? At a guess, if the size of the value you substitute affects the result of the model, then you probably shouldn't do it.
If your data contains more than one different string that you need to handle, I don't see a straightforward way of doing it with Weka filters. However, instead of passing a file to CSVLoader, the docs say you can also pass a java.io.InputStream. If you can't process and save a new CSV file for some reason, perhaps you could write a subclass of InputStream that filters out the string values as it reads the file.
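Putting both steps together, a sketch along these lines should work. The file name and the 1000000 constant are placeholders, and the -R option for the numeric replacement constant is my assumption - check the filter's listOptions() output for your Weka version.

import java.io.File;

import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.CSVLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.ReplaceMissingWithUserConstant;

public class LoadCsvWithoutStrings {
    public static void main(String[] args) throws Exception {
        // Step 1: treat "inf" as a missing value while loading, so every
        // column is imported as a numeric attribute rather than a string.
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("abc.csv"));
        loader.setMissingValue("inf");
        Instances data = loader.getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Step 2 (optional): replace the resulting missing values with a
        // large constant instead of leaving them missing.
        ReplaceMissingWithUserConstant replace = new ReplaceMissingWithUserConstant();
        replace.setOptions(Utils.splitOptions("-R 1000000"));
        replace.setInputFormat(data);
        Instances filtered = Filter.useFilter(data, replace);

        System.out.println(filtered.toSummaryString());
    }
}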
I have a .bin file on my hard drive.
Its recl is nx*ny*4. Its dimensions are (241,121): 241 in the x dimension, 121 in the y dimension.
How would I convert it using fortran to an ascii file that I can open and read numbers off of?
So far I have tried
real :: g1(241,121)
open(unit=1,file=gaugemax2010.bin',status='old',
form='unformatted',access='direct',recl=nx*ny*4)
open(unit=5,file='g2010.txt',status='unknown',
form='unformatted',access='direct',recl=1)
read(1, rec=1) ((g1(i,j,),i=1,nx,j=1,ny)
write(5, rec=1) (g1(i,j,),i=1,241),h=1,121)
end
and it has not worked
FORM='UNFORMATTED' opens a file for binary content. For pure text you have to specify FORM='FORMATTED'.
For more details on the OPEN statement see here: Opening Binary Files in Fortran: Status, Form, Access
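A minimal sketch of the corrected program, written in free-form Fortran, could look like the following. The array dimensions come from your post; whether recl is counted in bytes or in 4-byte words depends on your compiler, and the output format is just one reasonable choice. Note that units 5 and 6 are usually preconnected to stdin/stdout, so it is safer to use other unit numbers.

program bin2txt
  implicit none
  integer, parameter :: nx = 241, ny = 121
  real :: g1(nx, ny)
  integer :: i, j

  ! Read the whole array as a single direct-access unformatted record.
  ! recl may be in bytes or in 4-byte words depending on the compiler.
  open(unit=11, file='gaugemax2010.bin', status='old', &
       form='unformatted', access='direct', recl=nx*ny*4)
  read(11, rec=1) g1
  close(11)

  ! Write plain text: one row of the array per line.
  open(unit=12, file='g2010.txt', status='replace', form='formatted')
  do j = 1, ny
     write(12, '(241F12.4)') (g1(i, j), i = 1, nx)
  end do
  close(12)
end program bin2txt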
I'm trying to sort ~13,000 documents on my Mac's local CouchDB database by date, but it gets hung up on document 5407 each time. I've tried increasing the time-out tolerance on Futon but to no avail. This is the error message I'm getting:
for row in db.view('index15/by_date_time', startkey=start, endkey=end):
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/client.py", line 984, in __iter__
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/client.py", line 1003, in rows
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/client.py", line 990, in _fetch
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/client.py", line 880, in _exec
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/http.py", line 393, in get_json
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/http.py", line 374, in get
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/http.py", line 419, in _request
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/http.py", line 239, in request
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/http.py", line 205, in _try_request_with_retries
socket.error: 54
Incidentally, this is the same error message that is produced when I have a typo in my script.
I'm using couchpy to create the view as follows:
def dateTimeToDocMapper(doc):
    from dateutil.parser import parse
    from datetime import datetime as dt
    if doc.get('Date'):
        # [year, month, day, hour, min, sec]
        _date = list(dt.timetuple(parse(doc['Date']))[:-3])
        yield (_date, doc)
While this is running, I can open a Python shell and, using server.tasks(), see that the indexing is indeed taking place.
>>> server.tasks()
[{u'status': u'Processed 75 of 13567 changes (0%)', u'pid': u'<0.451.0>', u'task': u'gmail2 _design/index11', u'type': u'View Group Indexer'}]
But each time it gets stuck at 5407 of the 13567 changes (it takes ~8 minutes to get that far). I have examined what I believe to be document 5407 and it doesn't appear to be anything out of the ordinary.
Incidentally, if I try to restart the process after it stops, I get this response from server.tasks()
>>> server.tasks()
[{u'status': u'Processed 0 of 8160 changes (0%)', u'pid': u'<0.1224.0>', u'task': u'gmail2 _design/index11', u'type': u'View Group Indexer'}]
In other words, CouchDB seems to have recognized that it has already processed the first 5407 of the 13567 changes and now has only 8160 left.
But then it almost immediately quits and gives me the same socket.error: 54.
I have been searching the internet for the last few hours to no avail. I have tried initiating the indexing from other locations, such as Futon. As I mentioned, one of my errors was an OS timeout error, and increasing the time_out thresholds in Futon's configuration seemed to help with that.
Please, if anyone could shed light on this issue, I would be very grateful. I'm wondering if there's a way to restart the process once it's already indexed 5407 documents, or better yet, a way to prevent it from quitting a third of the way through in the first place.
Thanks so much.
From what I gather, CouchDB builds your view contents by sending all documents to your couchpy view server, which runs your Python code on that document. If that code fails for any reason, CouchDB will be notified that something went wrong, which will stop the update of the view contents.
So, there is something wrong with document 5408 that causes your Python code to misbehave. If you need more help, I suggest you post that document here. Alternatively, look into the logs for your couchpy view server: they might contain information about how your code failed.
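For example, a defensive version of your map function (just a sketch, assuming the failure is an unparseable or oddly typed Date value) would skip the offending document instead of killing the whole view build:

def dateTimeToDocMapper(doc):
    from dateutil.parser import parse
    from datetime import datetime as dt
    if doc.get('Date'):
        try:
            # [year, month, day, hour, min, sec]
            _date = list(dt.timetuple(parse(doc['Date']))[:-3])
        except (ValueError, TypeError, OverflowError):
            # skip documents whose Date cannot be parsed
            return
        yield (_date, doc)

If the view build completes with this version, you know the problem is a parsing failure on that one document rather than anything wrong with CouchDB itself.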