When I publish a batch (upload CSV files) in Amazon Mechanical Turk, why does it always throw an error for the row after my last data observation? - amazon

I am trying to publish a batch on Amazon Mechanical Turk.
All the design part and csv file organizing part have been done by my professor and I. I am pretty sure these parts are correct.
However, my data only has 27921 rows (the last line number in the CSV is 27921). But after I click the publish tab, MTurk always pops up an error message about line 27922, which is completely empty in my file!
I have tried to download the template and paste my original data into that template. It didn't work.
The Error is:
Line 27922: Row has 1 column while the header has 2

I just had the exact same problem.
For some reason MTurk doesn't treat a trailing blank line as the end of the file.
I opened the CSV file in a text editor (in my case Notepad++, but I guess a regular text editor will work as well) and just deleted the last line.


SSIS: failed to retrieve long data / truncation errors

I'm getting one of those two errors when trying to export data from a set of Excel spreadsheets.
Simplified scenario:
two Excel spreadsheets containing one text column
in file 1 the text is never longer than 200 characters
in the 2nd - it is.
SSIS is supposed to import them automatically from a folder - easy and simple, but...
The Excel source component decides what data type is used here.
When, using a sample file I created with sample text data, it decides on DT_WSTR(255), it fails on the second file with the truncation error.
When I force it to use DT_NTEXT (by putting longer text in the sample file), it fails on the first file, complaining "Failed to retrieve long data for column"... because the first file doesn't contain longer texts...
Has anybody found a solution/work-around for this? I mean - except manually changing the source data?
Any help much appreciated.
We can use a Flat File Connection Manager instead of the Excel Connection Manager. When we create a Flat File Connection Manager we can set the data type and length explicitly. To do so, first save the Excel file as a CSV or tab-delimited file. Then use this file to create the Flat File Connection. Drag and drop a Flat File Source into the Data Flow tab. In the Flat File Source Editor dialog box click the New button, which launches the Flat File Connection Manager Editor dialog box. In the General tab specify the full file path, then switch to the Advanced tab and set the data type and column width for each column.
Click OK and close the dialog box; this creates our connection manager. The connection manager can now read the full-length data, but we also have to set the data type and length of the Output Columns so that the data reaches the output pipeline. To do that, right-click the Flat File Source, choose Show Advanced Editor, and set the same data type and length on the Output Columns.
When we finish, the package runs successfully without any truncation error and inserts all the data into the target database.

How to transfer each line of a text file to Excel cell?

I need to transfer some PDF table content to Excel. I used the PyMuPDF module to extract the PDF content into a .txt file, which is easier to work with, and I did that successfully.
As you can see in the .txt file, I was able to transfer each column and row of the PDF; they are displayed sequentially.
- I need some way to read the txt strings sequentially so I can put each line of the txt into a .xlsx cell.
- I also need some way to set up triggers to start reading the document sequentially, and to decide which lines to throw away.
Example: start reading after a specific word, stop reading when some word is reached. Things like this. These documents have headers and useless information that also get transcribed to the txt file, so I need to ignore some contents of the txt and gather only the useful information to put in the .xlsx cells.
*I'm using the xlrd library; I would like to know how I can work with it here. (optional)
I don't know if it is a problem, but when I tried to count the number of lines, it returned only 15, while the document has 568 lines in total.
with open(nome_arquivo_nota, 'r') as arquivo:
    count = 0
    for line in arquivo:  # iterate the file object, not the filename string
        count += 1
print(count)
(The original code forgot the "as arquivo" binding and looped over the filename string itself, so it counted the characters of the file name - hence 15 - instead of the lines of the file.)
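For the trigger idea (start after one word, stop at another), a stdlib sketch along these lines could work; the marker words and file names here are hypothetical, and the output is written as CSV, which Excel opens directly (a true .xlsx would need a library such as openpyxl):

```python
import csv

def extract_between(lines, start_word, stop_word):
    """Yield non-blank lines that appear after a line containing
    start_word, stopping at the first line containing stop_word."""
    reading = False
    for line in lines:
        if not reading and start_word in line:
            reading = True        # trigger: start on the NEXT line
            continue
        if reading and stop_word in line:
            break                 # trigger: ignore everything after
        if reading and line.strip():
            yield line.strip()

def txt_to_csv(txt_path, csv_path, start_word, stop_word):
    """Put each kept txt line into its own row of a CSV file."""
    with open(txt_path, encoding="utf-8") as f:
        kept = list(extract_between(f, start_word, stop_word))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for line in kept:
            writer.writerow([line])
```

The same `extract_between` generator could feed an openpyxl worksheet instead of `csv.writer` if a real .xlsx is required.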

excel breaks content in row to another row

I have an Excel sheet that I exported from a website, and I have noticed that in some particular rows the content jumps to a new line. I have searched online, but found no credible answer to my problem.
What is the cause of this and how can it be solved?
I have even tried to copy them one by one to make them be on the same line, but I can't keep on doing that.
Here is a link to my file:
download
so that you can have a view of what I am talking about.
The address field in your file contains newlines in certain records. I suggest you open the file in Notepad and join these lines together before importing the file (make sure you turn word wrap off to see the lines correctly).

save datawindow as text in powerbuilder with some additional text

***Process Date From:
01/05/2012 0:00
Group;Member
Status:****
Rcp Cd Health Num Rcp Name Rcp Dob
1042231 1 MARIA TOVAR DIAS 14-Feb-05
1042256 2 KHALID KHAN 04-Mar-70
1042257 3 SAMREEN ISMAT 25-Mar-80
1042257 5 SAMREEN ISMAT 25-Mar-80
1042257 4 SAMREEN ISMAT 25-Mar-80
I want my PowerBuilder datawindow SaveAs text to look like this. The bold text (marked with asterisks above) is the additional text I want to add; the rest is the current SaveAs result.
Text files cannot contain formatting. There's no way to get bold text in a plain text file. I suggest adding the text to your datawindow header band (bolded, with an expression to make sure it only displays on the first page), then saving the results as HTML.
Well, you didn't mention which version of PB you are using, so I'll assume a recent one in which case you have some better options such as SaveAsAscii and/or SaveAsFormattedText which offer more flexibility in displaying column headers, computed fields, etc.
If you want to add the top section, I would add one or more additional dummy columns (or computed fields) to your dataobject for the additional data. Then either populate the dummy columns manually after retrieve, or via expression in computed field. You could put all of it in one computed field that wraps, or use four different ones (e.g. process_date_label, process_datetime, group_status, status).
The two newer versions of SaveAs work better for you because they output the column header text instead of the column name. SaveAsAscii came out fairly early, around PowerBuilder version 7. SaveAsFormattedText is relatively new, arriving around PB version 11; it is a lot like SaveAsAscii but lets you choose the file encoding.
If you need more explicit detail let me know but I am sure you can get something to work using SaveAsAscii and extra columns.
Pseudo code:
- Do the SaveAs to a temp file
- Open the temp file for read in line mode
- Open the output file for write (replace) in line mode
- Write your additional text lines to the output file (note: you can include CRLF to write multiple lines at once)
- Loop:
  - Read a line from the temp file
  - If EOF, exit the loop (note: 0 is not EOF, -100 is EOF)
  - Write the line to the output file
- Close the temp file and the output file
- Delete the temp file

CSV Exporting: Preserving leading zeros

I'm working on a .NET application which exports CSV files to open in Excel and I'm having a problem with preserving leading zeros when the file is opened in Excel. I've used the method mentioned at http://creativyst.com/Doc/Articles/CSV/CSV01.htm#CSVAndExcel
This works great until the user decides to save the CSV file within Excel. If the file is opened again in Excel then the leading zeros are lost.
Is there anything I can do when generating the CSV file to prevent this from happening?
This is not a CSV issue.
This is Excel loving to play with CSV files.
Change the extension to something else.
As @GSerg mentions, this is not a CSV issue.
If your users must edit/save in Excel, then after opening the CSV file they need to select the entire worksheet, right-click, choose "Format Cells", and from the Category list select "Text". This preserves the leading zeros, since the numbers are treated as plain text.
Alternatively, you could use the Open XML SDK 2.0, or some other Excel library, to create an .xlsx file from your CSV data and programmatically set the cell type to Text, taking the end users out of the equation...
I found a nice way around this: if you add a space anywhere in the phone number, the cell is no longer treated as a number and is treated as a text cell in both Excel and Apple's iWork Numbers.
It's the only solution I've found so far that plays nicely with Numbers.
Yes, I realise the number then has a space, but this is easy to process out of large chunks of data; you just have to select a column and remove all spaces.
Also, if this is web related, most web forms are fine with users entering a space in a number field, e.g. you can still tap-to-call on mobiles.
The challenge is to get the space in there in the first place.
In use:
01202123456 becomes 1202123456
but
01202 123456 stays 01202 123456
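If you generate the CSV in code, inserting the space on export and stripping it on import is a one-liner each way; a sketch (the position of the space is arbitrary, chosen here to match the example above):

```python
def add_space(number, pos=5):
    """Insert a space so spreadsheet apps treat the value as text
    and keep the leading zero."""
    return number[:pos] + " " + number[pos:]

def remove_spaces(number):
    """Undo the trick when reading the data back."""
    return number.replace(" ", "")
```

The round trip is lossless: `remove_spaces(add_space(n))` returns the original number.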
Ok, new discovery.
Using Quick Preview on Mac to view a CSV file, the telephone column displays perfectly, but opening the file fully with Numbers or Excel ruins that column.
So on some level Mac OS X is capable of handling that column correctly with no user meddling.
I am now working on the best/easiest way to make a website output a universally accepted CSV with telephone numbers preserved.
But maybe with that info someone else has an idea on how to make Numbers handle the file in the same way that Quick Preview does?
