"Data cleaning": prepare your pdf files for optimal coding

Hi everybody,

I was wondering how other participants in this forum have handled data (text) preparation for qualitative analysis.

Specifically, when I work with PDF files, the layout (two columns, text boxes, etc.) sometimes means that a code cannot capture a full sentence without also taking in part of another column or box. I know that NVivo handles Word documents better (automatically removing page numbers and footnotes, for example).

What standard data preparation do you do for a literature review? Do you export each PDF to a Word document and then fix the formatting there, one document at a time? I use the OCR in Acrobat Reader, but it creates text boxes in Word, so it does not completely solve the issue.

Many thanks!


Hello @Manolo Cabran,

Since most research databases store their files in PDF format, it is to be expected that most of your literature review resources will be PDFs.

However, while NVivo tries to determine the order of text on a PDF page, you may sometimes run into difficulties, as you described above. How NVivo handles a PDF depends heavily on the PDF's formatting, and with OCRed files such difficulties become more frequent.
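
To illustrate why layout matters: a PDF page holds positioned pieces of text, and the reading order has to be inferred from those positions. Below is a minimal Python sketch of that inference for a plain two-column page. It is not how NVivo does it; the `(x, y, text)` word tuples, the `two_column_order` function, and the idea of splitting the page at its midpoint are all assumptions made for illustration, and the coordinates would have to come from whatever PDF extraction tool you use.

```python
from typing import List, Tuple

# Hypothetical input format: (x, y, text) for each word, with y growing downward.
Word = Tuple[float, float, str]


def two_column_order(words: List[Word], page_width: float) -> str:
    """Return the text of a two-column page, left column first."""
    mid = page_width / 2  # assumption: the column gutter sits at the page midpoint
    left = [w for w in words if w[0] < mid]
    right = [w for w in words if w[0] >= mid]

    def read(column: List[Word]) -> str:
        # Within one column, read top to bottom, then left to right.
        ordered = sorted(column, key=lambda w: (w[1], w[0]))
        return " ".join(w[2] for w in ordered)

    # Separate the columns with a blank line; skip a column that is empty.
    return "\n\n".join(part for part in (read(left), read(right)) if part)
```

Text boxes, full-width headings, and footnotes break the midpoint assumption, which is the same kind of case where coding across columns goes wrong.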

If the resource is significant for my research and I cannot find a better-formatted version, I move its content into a text document in NVivo.
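
If you copy the text out of the PDF first, a small script can tidy it before it goes into the text document. This is only a sketch in Python using the standard library, under the assumptions that page numbers sit on their own line, that blank lines separate paragraphs, and that a hyphen at the end of a line marks a word split across lines; `clean_extracted_text` is a name made up for this example.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Tidy text copied from a PDF before coding it."""
    lines = raw.splitlines()

    # Drop lines that hold nothing but a page number, e.g. "12" or "Page 12".
    page_number = re.compile(r"\s*(page\s+)?\d+\s*", flags=re.IGNORECASE)
    kept = [ln for ln in lines if not page_number.fullmatch(ln)]
    text = "\n".join(kept)

    # Rejoin words hyphenated across a line break: "analy-\nsis" -> "analysis".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Blank lines mark paragraph breaks; inside a paragraph, unwrap the lines.
    paragraphs = re.split(r"\n\s*\n", text)
    paragraphs = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in paragraphs if p)
```

The hyphen rule also joins genuine compounds that happen to break at the hyphen ("well-" / "known" becomes "wellknown"), so the result is worth a quick read-through before coding.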