Uploading multiple text files, doc, docx, or pdf files
Unstructured documents
With some kinds of import, the exact structure matters a lot. But with text import, you can import documents with absolutely no structure if you wish. Each document is imported as one source. If you want, you can provide additional metadata (e.g. year of publication) by also uploading an Excel sheet containing this data.
Structured documents: question ID
If you wish, and if it makes sense to do so, you can also structure your documents into sections which we call questions, with a question ID for each section.
The app can then use the question ID to help you analyse your data, e.g., to filter only for answers to a particular question.
Note that as usual the IDs must be simple alphanumeric labels with no spaces.
question_id: QH (Optionally some more text here which is not part of the ID)
Yes, my health is better overall …
question_id: QJ (Optionally some more text here)
The floods affected my income…
It does not matter whether every question appears in every document, or even if some questions appear in only one document.
Note that all the text following a section header will be included as part of that question, until the next question ID or the end of the document.
Note that if the same question ID is repeated within one document, the text of those sections will be merged.
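To make the rules above concrete, here is a minimal Python sketch of how a document could be split into questions. It is not the app's actual code: the function name split_by_question, the regular expression, and the sample text are assumptions made for illustration. It shows the three rules described above: only the simple alphanumeric label counts as the ID, text belongs to a question until the next header, and repeated IDs are merged.

```python
import re

# Assumed header format: "question_id:" then an alphanumeric ID; anything
# after the ID on the same line is treated as a comment and ignored.
QUESTION_HEADER = re.compile(r"^question_id:\s*([A-Za-z0-9]+)")

def split_by_question(text):
    """Return a dict mapping each question ID to the text lines under it."""
    sections = {}
    current = None  # text before the first header is stored under None
    for line in text.splitlines():
        match = QUESTION_HEADER.match(line)
        if match:
            current = match.group(1)
            sections.setdefault(current, [])  # a repeated ID reuses its list
        elif line.strip():
            sections.setdefault(current, []).append(line.strip())
    return sections

# Illustrative document in which QH appears twice.
doc = """question_id: QH (health question)
Yes, my health is better overall
question_id: QJ (income question)
The floods affected my income
question_id: QH
I also sleep better"""

sections = split_by_question(doc)
```

Here sections["QH"] holds both health answers in document order, because the second QH header continues the first section rather than starting a new one.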
Structured documents: source ID
Usually we assume that one document comes from one source. But sometimes you might have sources mixed within longer documents.
If you wish, and if it makes sense to do so, you can also structure your documents by source_id.
source_id: M23 (Optionally some more text here)
My health is better overall …
source_id: M24 (Optionally some more text here)
But my health is worse…
This means that, within each file, any statement that does not have a source_id tag takes the document title as its source ID; all other statements take their source ID from the in-document tags.
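The fallback rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the app's implementation: the function name assign_sources, the regular expression, and the sample document title "interviews" are all hypothetical.

```python
import re

# Assumed header format: "source_id:" then an alphanumeric ID, optionally
# followed by more text that is not part of the ID.
SOURCE_HEADER = re.compile(r"^source_id:\s*([A-Za-z0-9]+)")

def assign_sources(text, document_title):
    """Return (source_id, statement) pairs for one document."""
    current = document_title  # fallback until the first source_id tag
    statements = []
    for line in text.splitlines():
        match = SOURCE_HEADER.match(line)
        if match:
            current = match.group(1)
        elif line.strip():
            statements.append((current, line.strip()))
    return statements

# Illustrative document: the first line comes before any source_id tag.
doc = """Notes from the field visit
source_id: M23 (first respondent)
My health is better overall
source_id: M24 (second respondent)
But my health is worse"""

pairs = assign_sources(doc, "interviews")
```

In this sketch the opening line is attributed to "interviews", while the two answers are attributed to M23 and M24.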
Structured documents: question ID and source ID
If you wish, and if it makes sense to do so, you can also structure your documents into two levels of structure:
- source_id
- question_id
The app expects questions to be nested inside sources, like this:
source_id: M23 (Optionally some more text here)
question_id: QH (Optionally some more text here)
Yes, my health is better overall …
question_id: QJ (Optionally some more text here)
The floods affected my income…
You can have more than one source inside each document.
Structured documents: source IDs nested within question IDs
You can even do it the other way round:
question_id: QH (Optionally some more text here)
source_id: M23 (Optionally some more text here)
Yes, my health is better overall …
source_id: M44 (Optionally some more text here)
No, my health is worse overall …
question_id: QJ (Optionally some more text here)
source_id: M44 (Optionally some more text here)
The floods affected my income…
The proviso is that the default listing of statements still shows questions nested within sources, as in every other case, not the other way round as in your transcript.
You can still use the statements panel to search for individual questions, and the corresponding sections will then appear as they do in the transcript.
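The following Python sketch illustrates why the nesting order in the transcript does not change the default listing. It is a simplified model under assumptions, not the app's code: the names parse_statements and default_listing are hypothetical, and the sketch assumes that each statement simply takes the most recent source_id and question_id seen above it.

```python
import re

# Assumed tag format, matching the examples above.
TAG = re.compile(r"^(source_id|question_id):\s*([A-Za-z0-9]+)")

def parse_statements(text, document_title):
    """Return (source_id, question_id, statement) rows in transcript order."""
    source, question = document_title, None
    rows = []
    for line in text.splitlines():
        match = TAG.match(line)
        if match:
            kind, value = match.groups()
            if kind == "source_id":
                source = value
            else:
                question = value
        elif line.strip():
            rows.append((source, question, line.strip()))
    return rows

def default_listing(rows):
    """Order rows by source first, then question (questions within sources)."""
    return sorted(rows, key=lambda row: (row[0], row[1] or ""))

# Illustrative transcript with sources nested inside questions.
doc = """question_id: QH
source_id: M23
Yes, my health is better overall
source_id: M44
No, my health is worse overall
question_id: QJ
source_id: M23
My income is unchanged
source_id: M44
The floods affected my income"""

rows = parse_statements(doc, "interviews")
listing = default_listing(rows)
```

The rows come out of the parser in transcript order (QH for both sources, then QJ), while the listing groups both of M23's answers before both of M44's.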
PDF files
Simple PDF files import just fine. However, badly structured PDFs can be a problem. Documents with multiple columns are sometimes read incorrectly. The app is unlikely to import tables in a useful way. Pictures are not imported.
Getting your IDs in the right order
Your source and question IDs will be displayed in the app in alphabetical order. So be careful: if you have IDs like D1, D2, … D10, D11, etc., they will be sorted like this:
D1 D10 D11 D12 D13 D14 D2 D3 D4 D5 D6 D7 D8 D9
You probably want to add leading zeros so that they sort like this:
D01 D02 D03 D04 D05 D06 D07 D08 D09 D10 D11 D12 D13 D14
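If you have many IDs to fix, a short script can add the zeros before you upload. The Python sketch below is one way to do it; the helper name pad_id and the default width of two digits are assumptions, and it only handles IDs made of letters followed by digits.

```python
import re

def pad_id(identifier, width=2):
    """Zero-pad the trailing number of an ID, e.g. 'D1' becomes 'D01'."""
    match = re.fullmatch(r"([A-Za-z]*)(\d+)", identifier)
    if not match:
        return identifier  # leave IDs of any other shape unchanged
    prefix, number = match.groups()
    return prefix + number.zfill(width)

ids = ["D1", "D2", "D10", "D11"]

unpadded_order = sorted(ids)                   # alphabetical: D10 sorts before D2
padded_order = sorted(pad_id(i) for i in ids)  # numeric order is preserved
```

With more than 99 IDs, pass width=3 so that every number has the same length.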
Forced breaks
Each paragraph (or, in the case of PDFs, each page) becomes one statement. At the moment there is no way to force a statement to split into smaller pieces or to join up smaller statements. Let us know if this is a problem.