
I've just obtained a license for NVIVO8 and have begun exploring. It looks great. I've posted a lot to these forums about PDFs, and I wonder if I am the only person who is so eager to have NVIVO do a good job with PDFs, or if there are others who would like to see a very strong capability for this in the future. I don't think NVIVO8 SP1 has really improved PDF-handling all that much, and I wonder what plans QSR has for this.

 

As it is now, it seems that PDFs are essentially converted to rich-text format. As a result, text boxes, captions and images seem to be presenting problems for the conversion process, and a great deal of formatting is lost, making it more difficult to navigate through the converted document. It also appears that any errors made during the OCR process are preserved in the text that NVIVO uses, making reading a bit of an interpretive adventure. Finally, I am having problems with many PDFs, and it seems that any PDFs I have obtained from JSTOR are not converting (possibly because of the graphics on the first page?). Thus, it seems that NVIVO is still facing significant challenges converting PDFs to text, and the text that is produced is somewhat clunky and not very easily readable.

 

The ideal for NVIVO would be that the graphic page images that are part of the PDF be preserved, with the text tied to these images, so that it would be possible to code the PDF exactly as it appears on the original (and indeed on the original document, if it is a scan from a journal, etc.). A good example of the incorporation of PDFs into another program is the PDF viewer that is part of Microsoft Outlook 2007 (although it is not possible to mark up the previewed PDF pages in that program).

 

I've pressed this issue because I find NVIVO to be a very powerful tool for analyzing notes, and I'm very eager to be able to combine my coding of fieldnotes with my coding of much of the literature that I'm using for my research. I'd like NVIVO to be the central place where all of my data is organized, and tied together by a set of common codes, rather than having NVIVO be just one more tool which I use in combination with other organizing and coding schemes in separate, unconnected realms of my hard drive.

Guest PKodiak

I agree. Better PDF support would completely change the way I use NVIVO. It sounds like QSR has plans to really improve PDF management in the program, based on Adam's comments to similar posts in the past. I hope future versions will be able to handle PDFs better. I agree, too, that currently the program is not quite there yet. How the text looks is important for doing the data analysis (can't imagine coding "clunky" text!).

 

Perhaps there are plans for Service Pack 2?


I've been looking at other notes-analysis software and other file/data 'tagging' software (including tools like Zotero) to see if there are similar problems or calls for features popping up in those communities, and indeed it does seem so. For instance this thread over at Atlas.ti: http://forum.atlasti.com/showthread.php?p=...inear#post11526 , which is quite recent and goes over similar territory.


Zotero is a great program for managing different types of source materials and has a lot of nascent tools for note-taking, annotating, tagging etc. It will be interesting to see how these develop and the various tools they develop to manage PDFs.

 

(Now that Firefox 3 is out, it looks like Zotero will add synchronization and sharing features over the next few months. It's amazing that an open source Firefox plug-in that's only been around for a short time has already out-done all the over-priced reference manager software that's been around for ages.)


I am doing a fairly large literature review using Nvivo8. On 70% of my PDFs I have no problem. On 10% I get some corruption - for some reason it converts all the ff or fi in the document to a $ or ". You can fix that in the downloaded file, but it is a hassle. Another 20% simply do not import. Sometimes I can work around that by going to the PDF, copying the text into a Word file and then importing that. But other times the problem is that the PDF is locked so that you cannot copy the text.
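
For anyone cleaning such text by hand, the ff/fi corruption described above can be partly repaired outside NVivo before re-importing. What follows is only a minimal sketch in Python, not an NVivo feature: the names `LIGATURES`, `repair_token` and `repair_text` are hypothetical, and it assumes you can supply a vocabulary of correctly spelled words. It tries each ligature in place of a stray `$` or `"` and keeps a replacement only when the result is a known word, so genuine quotation marks and prices are left alone.

```python
import re

# Letter groups that are commonly typeset as a single ligature glyph.
LIGATURES = ("ffi", "ffl", "ff", "fi", "fl")


def repair_token(token, vocabulary, bad_chars='$"'):
    """Swap a stray character for a ligature if that yields a known word."""
    for bad in bad_chars:
        if bad not in token:
            continue
        for lig in LIGATURES:
            candidate = token.replace(bad, lig)
            if candidate.lower() in vocabulary:
                return candidate
    return token  # no known word found: leave the token untouched


def repair_text(text, vocabulary):
    """Apply repair_token to every run of word characters, '$' and '"'."""
    return re.sub(r'[\w$"]+', lambda m: repair_token(m.group(0), vocabulary), text)


if __name__ == "__main__":
    vocab = {"effect", "define", "office"}  # assumed word list
    print(repair_text('The e$ect is hard to de"ne at the o$ce, cost $5.', vocab))
```

The vocabulary check is the design choice that matters here: a blind search-and-replace of every `$` would also damage currency amounts.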

 

This is such a potentially powerful function that I hope they get it right. The ability to keep all your literature coded in one file can serve you for years.

 

I am finding that the program crashes a lot with many pdf files in it. Right now I can't get it to open at all. It just completely crashes when I open my file, although the volunteering project opens fine. I have tried my usual fix - resetting the SQL files, but it does not work now.


I figured out my crash problem. I use a program called Sugarsync to back up online and synchronize my files between my home computer and my school computer. I learned that if the program is in the process of uploading my Nvivo data file I cannot open it with Nvivo. Now, that might make sense, but it should give an error message instead of just crashing.


I've had similar frustrations and issues. I started using the free trial of NVivo 8, converting several NVivo 7 files, so that I could use PDF files (I had previously done the conversion of PDF files using RTF in Word). After several crashes, and getting the service pack, I am very disappointed at the lack of graphics that get imported into NVivo 8. Still a few crashes, but it does let you read in PDF files.

 

I've decided, however, that I am not any better off with NVivo 8, and would like to go back to NVivo 7 (which I have a paid license for). However, I'm stuck in that NVivo 7 can't read NVivo 8 files. Any suggestions or help would be appreciated! Thanks, George


yes, yes, and yes. i can second just about everything stated by others in this thread. utilizing pdf's is absolutely imperative to my research, and i find myself spending wasteful hours trying to read jumbled conversions and picking out truncated words or mis-conversions that my text queries don't pick up.


Hi - I'm currently thinking of using NVIVO 8 to do a large literature review (involving at least 100 PDF documents and potentially many more). I would be interested to hear about other people's experiences of importing and querying so many PDFs.

 

Other previous posts mention the problems when importing PDFs with images and unusual formats. For my own purposes, this is not such a concern as I don't need to preserve all of the formatting, I only need the text. I've tested importing several PDFs and it seems to import the text well enough for me.

 

My question is more about how NVIVO performs with large project files incorporating hundreds of PDFs. QSR Support say that NVIVO should be able to handle project files of up to 4 GB in size. Has anyone got any experience of working with large project files, e.g. 300-500 MB or more?

 

I look forward to hearing from others about their experiences,

 

Dan


Hi there,

 

the new version of ATLAS.ti (V6.0, or maybe 6.x now?) is out and brings full native PDF support that lets you work with PDF files in their native layout! :rolleyes: (It keeps the original PDF's layout, graphics, tables and all, so all data always remains complete and in the original layout.) That's great! As far as I can see, all other QDA packages make you strip PDFs down to text files, which is not a sufficient way to work with PDFs.

 

Cheers

 

Joan


Wow, Joan, that is worth checking out. I still find this to be a frustrating issue with NVIVO, although I think I prefer the NVIVO user interface to ATLAS.

 

Dan, not sure what to say--no experience with super-large file size for NVIVO. But I've always worried about the structure of the data in NVIVO. Why is it all in one massive file and not in a bunch of little ones (as is the case with other large database programs I'm familiar with)? I'm no programmer, so I leave that to the experts. Good luck with the project.

 

It would be great to hear from NVIVO development team about ideas for PDF support. Is QSR considering improving that feature? Or is this pretty much the way it will look for a while?

 

I have been exploring other options for the coded PDF library that many of us seem to want...

I also use Endnote as my citation management system, and I'd always hoped Endnote would find a way to allow you to code PDFs and access them through the Endnote program. But even more useful would be a way to link Endnote to NVIVO, ie., beyond the static method of PDFing a static version of the endnote library and importing it to NVIVO. I am constantly making notes and updating my Endnote library, so that would not be too helpful.

I also use Endnote as my citation management system, and I'd always hoped Endnote would find a way to allow you to code PDFs and access them through the Endnote program. But even more useful would be a way to link Endnote to NVIVO, ie., beyond the static method of PDFing a static version of the endnote library and importing it to NVIVO. I am constantly making notes and updating my Endnote library, so that would not be too helpful.

 

I'll put in another plug for Zotero here. You can't annotate PDFs directly (yet) but you can annotate the PDFs in your Zotero collection using other programs from within Zotero, see discussion here.

 

Other features that may be of interest to qual researchers: you can use Zotero to directly annotate web pages and other types of documents: see Highlighting and Annotation. You can also add notes to items and tags and you can retrieve PDF Metadata and you can do cool things with timelines (click links for screencasts). There's also a plugin that allows annotation of video/audio files: see Vertov. And it works with Word and OpenOffice Writer.

 

Zotero is undergoing fairly rapid development (see Roadmap)--it's about to become a great collaboration tool you can use across multiple platforms from anywhere. It's open source, so outside developers are free to contribute code or develop plug-ins to create new features. If you search their forums you'll see a number of users pondering how to relate Zotero use to QDA tasks and existing QDA software, e.g. Tagging phrases in pdfs.


Hi all,

 

I'll definitely check out Zotero again. The last time I looked I didn't get a sense of where the software was really headed (there was a lot of discussion on forums there, and it seemed Zotero was a bit of a prisoner of its open-sourceness...).

 

I've just been checking out the Papers for Mac program (http://mekentosj.com/papers/). This is a reference management program with fully integrated PDF support, and seems to include the ability to make annotations and to analyze metadata/tags. It looks pretty powerful, and is just in its early stages of development.

 

I know that NVIVO doesn't aspire to be a reference management program, but there seems to be some connection between the PDF issue and the question of how to make NVIVO a tool for qualitative research that could move beyond just coding and analyzing fieldnotes to coding and analyzing text (and not just as in "textual analysis").

 

What do people think?

 

Importantly, given how expensive the QSR license is for NVIVO, what does the NVIVO development team think about all of this? Where to from NVIVO 8? Anybody listening? It would be great to get a sense for what QSR's commitments are, now that NVIVO is a very capable tool. Does the development team have ideas for the future that involve some of what we're discussing here?

 

J


Hello all,

 

We at QSR have been taking much interest in this thread. The majority of the time we avoid jumping into a particular thread to ensure members fully debate and discuss the issues at hand.

 

You may be interested to know that QSR International undertook an extensive online qualitative survey and in-depth interviews with our users in the middle of last year. We analyzed this information using NVivo and have since used the results to prioritize our future development plans for NVivo.

 

I can confidently state that the majority of the issues raised in this thread are already in our future development plans and that development has already commenced on some of them. We are not in a position to disclose these details as yet. More details will follow late this year, and there will be an opportunity to provide feedback on the new functionality prior to its release.

 

Thanks and regards

Adam


Hi Adam,

 

Thanks for responding! That's good news -- I'm sure we will all look forward to hearing what the future holds.

 

Best regards,

J


Old PDFs can never be read without OCR software, and OCR is a heavy operation that produces errors (o instead of 0, l instead of i, ...). I am not in favor of turning NVivo into an enormous program that does everything, but badly. For me, NVivo is a specialized tool. For other, general tasks, MS Word, Excel, EndNote, OCR software and Express Scribe will always be better.

 

I notice that I have a slightly different approach: a less qualitative, more quantitative perspective. For me, sources (PDFs, for example, but even video) can always be converted to .txt before being analyzed. I sometimes work with 1000-2000 different sources, so what matters to me is speed, power and automation*. Matrices sometimes take many hours to compute. Above 300-400 sources in a node, or 300-400 nodes in a project, NVivo becomes slow.

 

Would it be possible to develop NVivo along two axes: extending the supported formats (PDF, HTML, Word [recognition of footnotes would be useful], MPG, JPG, etc.), but also offering the option of dropping that support (that is, working in .txt format) to get a very fast and powerful tool able to deal with big corpora (quali-quanti)?

 

 

 

 

 

*A command syntax would be very useful for automating tasks (a side topic here):

1. First example, a list of words:

[value]
price
money
cash
wealth
luxury
£
$
...

IF text = [value] THEN CODE (tree node\theme\value) = paragraph@text (or word@text, or source@text)
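
To make the intent of this first rule concrete, here is a minimal sketch in Python of what it would do outside NVivo. It is not NVivo syntax and does not touch an NVivo project; `VALUE_TERMS` and `code_paragraphs` are hypothetical names, and a "paragraph" is assumed to be a block of text separated by blank lines. Each paragraph that mentions any term from the list is collected under the node name.

```python
import re

# The word list from the example above (the [value] list).
VALUE_TERMS = ["price", "money", "cash", "wealth", "luxury", "£", "$"]


def code_paragraphs(text, terms, node="tree node\\theme\\value"):
    """Return {node: [paragraphs mentioning any term]} (substring match)."""
    # re.escape keeps characters such as "$" literal inside the pattern.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return {node: [p for p in paragraphs if pattern.search(p)]}


if __name__ == "__main__":
    sample = "The price was high.\n\nWe walked home.\n\nIt cost $5."
    for paragraph in code_paragraphs(sample, VALUE_TERMS)["tree node\\theme\\value"]:
        print(paragraph)
```

Switching the split from paragraphs to words or whole sources would cover the `word@text` and `source@text` variants.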

 

2. Second example, import source:

IF READ "c:\corpus\" = [value] THEN IMPORT SOURCE
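
Read as "import only the files in a folder that mention one of the listed words", the second rule could be sketched as follows. This is an assumption about what the rule means, written in Python rather than any NVivo syntax; `select_sources` is a hypothetical name, and the sources are assumed to be plain .txt files as described earlier in this post.

```python
from pathlib import Path


def select_sources(folder, terms):
    """Return the .txt files in `folder` whose text mentions any term."""
    selected = []
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps badly encoded files from stopping the scan.
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        if any(term.lower() in text for term in terms):
            selected.append(path)
    return selected
```

The returned paths would then be the candidates to import by hand.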

 

 

3. Count example:

a = COUNT ([value]) in SOURCE
IF a>3 THEN CODE....
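
The count rule can be sketched the same way. Again this is only an illustration in Python under stated assumptions: `count_terms` and `sources_to_code` are hypothetical names, sources are held as a name-to-text dictionary, and the threshold of 3 comes from the example above.

```python
import re


def count_terms(text, terms):
    """Count every occurrence of any term in the text (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return len(pattern.findall(text))


def sources_to_code(sources, terms, threshold=3):
    """Return the names of sources with more than `threshold` matches."""
    return [name for name, text in sources.items()
            if count_terms(text, terms) > threshold]
```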

 

 

4. Attributes and test:

test=0
FOR first_appearance = 1.1.2009 TO 31.12.2009
    IF date$attribute = [value] AND test=0 THEN
        test=id@attribute
        date=date@attributes
    END IF
NEXT
CREATE MEMOS (first appearance, "the first appearance of "[value] "provide from " test " the " date)
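
One possible reading of this last rule is "find the earliest dated source in 2009 that mentions a listed word, then write a memo naming it". The Python sketch below follows that reading, which is an interpretation rather than something the pseudocode states outright. `first_appearance` and `memo_text` are hypothetical names, and each source is assumed to be a tuple of (id, date attribute, text).

```python
from datetime import date


def first_appearance(sources, terms, start, end):
    """Return (source_id, source_date) of the earliest match, or None."""
    hits = [
        (source_date, source_id)
        for source_id, source_date, text in sources
        if start <= source_date <= end
        and any(term.lower() in text.lower() for term in terms)
    ]
    if not hits:
        return None
    source_date, source_id = min(hits)  # earliest date wins
    return source_id, source_date


def memo_text(label, hit):
    """Build the memo sentence for a (source_id, source_date) hit."""
    source_id, source_date = hit
    return (f"The first appearance of {label} is in {source_id} "
            f"on {source_date.isoformat()}")
```

Taking the minimum over all matches replaces the `test=0` flag in the loop above: the flag only exists to stop after the first hit.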

I've been looking at other notes-analysis software and other file/data 'tagging' software (including tools like Zotero) to see if there are similar problems or calls for features popping up in those communities, and indeed it does seem so. For instance this thread over at Atlas.ti: http://forum.atlasti.com/showthread.php?p=...inear#post11526 , which is quite recent and goes over similar territory.

 

 

 

Have a look at the freeware reference manager Mendeley (www.mendeley.com). They have no problems handling PDF files, making notes and highlights in the PDF files, and using tags etc. So I must admit that I simply don't understand why NVIVO can't import PDF files as PDF files. Mendeley can even read the PDF files and create a reference list based on what it automatically reads in the file.

Have a look at the freeware reference manager Mendeley (www.mendeley.com).

 

Søren-- funny, I just voted for, and made a comment on, your suggestion at the Mendeley forum that they should be allowing the tagging of individual text portions, like NVIVO. I noted there that a.nnotate.com does allow text tagging, but is not a reference manager and does not have a very good user interface. Mendeley seems to have a good interface, and I'm interested to see if they can move quickly to improve the software beyond its current abilities. Mendeley does not, for instance, have hierarchical tagging, and does not allow for queries and various sorts of higher end analytical actions that are available in NVIVO.

 

Indeed, what is really fantastic about Mendeley's PDF browser is that the PDFs remain as individual pdf documents on your hard drive. In NVIVO, all documents are brought into one large NVIVO project file and are no longer 'live' documents for use elsewhere.

 

The ultimate software program, in my opinion, would be a sort of mash-up of NVIVO and a reference manager. In this new program, PDFs, word documents, audio files, etc., would all remain in the normal folder structure of the computer, perhaps in a special directory monitored by the program. They could still be edited with the applications that generated them, and they could be shared as word docs, pdfs, etc. with others. When opened with the new mash-up program, they could be coded (ie, with tags / nodes), and the program would be able to search through its monitored directories to do queries, word searches, etc. Tags (nodes) and other metadata could also be written directly to the documents, improving sharing and portability. The new program would also be a reference management program like Zotero or Mendeley. It would contain a reference database with records for all of a user's bibliography, it would allow downloading of bibliographic information, it would allow searching of databases like JSTOR etc. for reference papers, it would allow sharing and collaboration among users. Just like current programs. All of the documents that Zotero monitors could also be linked to entries in the bibliographic database. So, for a library of PDFs monitored by the new program, if the PDFs are research papers, each one would be linked to bibliographic information (indeed, that information would be associated with the file, or, even better, written to it). Note that within these documents, selecting text, annotating, making links, and tagging (with nodes) are all possible, and NVIVO-like searching and analysis are possible. 
This way the same sorts of analysis that one does on one's fieldnotes is possible on one's bibliographic material as well; furthermore, the same codes and conceptual structures generated through the coding process become useful in looking at the literature, and questions that arise from literature review are instantly connected to observations, jottings, and notes that one has made elsewhere. Linkages connecting the ideas of disparate authors come into focus, and their relevance to the research project is made apparent. As you write, you are able to continuously move back and forth from your own work to your analytical materials.

 

Perhaps collaboration between NVIVO and Mendeley or Zotero would help to move us in that direction?


Regarding PDFs, NVIVO, and reference management programs, I added a suggestion to the suggestions forum (see here).

 

Alan, regarding Zotero's annotation ambitions, I hope they do it. I really like the promise of Zotero, and use it as my principal citation manager and PDF collection organizer, but I'm frustrated by the fact that often some very basic functions suggested in their forums are left as old tickets for years. I sometimes wonder if the open source model, without any method of revenue generation, can really work. A slightly different model is that of MediaMonkey, a fantastic iTunes alternative (which is in fact much better than iTunes itself) where users are charged to download premium versions of the software--that money pays the developers, who also generate revenue by creating add-ons to the software, I think.

 

I do think that NVIVO will cease to be useful, and will eventually expire, if any of the current reference management programs implement features like hierarchical coding structures, tagging of annotations and of selected portions of text (and not just tagging of whole documents, as is currently the case), and annotations which refer to specific portions of text, and if any of them creates a user interface that allows for better scanning and analysis across documents (as NVIVO does, and does well). This is the kind of innovation that would get me to gladly switch to Mendeley, for instance; indeed, if Sente were to add this functionality I'd switch completely to Mac without hesitation. For now, I actually like NVIVO (and am, of course, 'locked in' until someone creates a way of exporting coding to another program, or I have money for a research assistant), and I'd love to see NVIVO move beyond the recent pattern of releases which strike me as somewhat underwhelming and costly -- press releases heralding PDF integration, for instance, when the reality is that PDF text can be incorporated into NVIVO, but not the formatting, and not the actual living PDF documents..., etc. This is one reason I recently suggested that NVIVO take comments and suggestions by forum members more seriously, engage in a dialogue with users, and respond from time to time to users' concerns (which are very seldom addressed on the forums, as far as I can see).

...Alan, regarding Zotero's annotation ambitions, I hope they do it. I really like the promise of Zotero, and use it as my principal citation manager and PDF collection organizer...

 

I think it will take time for the annotation tools to appear because their vision is quite large. A lot of different projects seem to be collaborating through the Open Annotation Collaboration I mentioned previously. It is worth checking out the videos of AXE integration in Zotero and Pliny here. More on Pliny, which works with PDFs, here. And for relation to QDA see this paper.

 

At the same time that TEI was under development quite a different approach to computer supported research of text was underway. Tools were developed within the Social Sciences to support their approach to textual interpretation called “qualitative analysis” (see some further discussion about some of the significance of these tools within the Social Sciences in Kelle 1997, and some of my thoughts about their significance for humanities scholarship in Bradley 2003). The first prominent piece of software of this kind was the redoubtable Nud*ist program, and this field has continued with more recent, and more complex, software like NVivo and Atlas.ti. Pliny takes up a similar theme. Indeed, several significant ideas from software such as NVivo and Atlas.ti appear in Pliny, but they have been adapted somewhat to create an environment that is intended to match more closely the needs of the humanist rather than the social scientist.


The problems with .PDFs are usually down to the .PDF file itself and the quality of OCR being used to create it in the first instance.

Most software packages like NVivo are going to have very mixed results depending upon the quality of .PDFs that are being imported.

 

I'm not particularly vexed by the .PDF problem as I work with NVivo 8 on a laptop and a bloated NVivo program size is a serious performance issue.

I therefore don't import .PDFs as internals because of the size of my library (700+ .PDFs).

I have used proxy externals throughout and use these as containers for the selected parts of .PDFs that are being cited or referenced. The "See also" links in memos are linked to dummy externals that create the link to the .PDF itself. As I use Endnote X2, I am quite content to leave the original .PDFs in their Endnote Library folder structure.

 

Perhaps we can understand the reticence of software publishers to add a feature that will provoke complaints about their software that are not their fault. I would suggest that QSR bundle one of the .PDF cleaning programs to avoid this problem when, and if, they add more .PDF functionality.

 

:)


I'm coming back to this discussion.

I do not think NVivo should try to do everything. NVivo is not a database tool like EndNote, FileMaker, or even Zotero. It should not be used only to classify subjects and arrange them. I think that if you have 500 PDFs to organize, NVivo is not the right software for you. If you have 30 PDFs, you can convert them manually. As Nigel wrote, PDFs take up too much space. I often work with .txt files.

NVivo should focus more on textual analysis: implementing automatic processing tools (I return to my idea of a command syntax), co-occurrence analysis (for example, factorial correspondence analysis), a chronological tool (for when the corpus is chronologically ordered: emergence of themes), distinguishing and linking actors and themes, concept mapping, ...

