# PubMedMining-vignette

This package provides easy and fast term-based text mining of the PubMed article repository. To find articles relevant to your research topic, you need to:

• Choose the main terms of your research focus (here: fixterms)
• Choose important terms that might be related to your focus (here: pubterms)
• (Optional) Define an output path for the result files (default: current working directory)
• Have stable internet access

The terms are stored as character strings in the corresponding variables “fixterms” and “pubterms”. The desired output path can be stored in the “output” variable.

library(PubMedMining)
fixterms = c("bike", "downhill")  # main terms of the research focus
pubterms = c("dangerous", "extreme", "injuries")  # terms related to the focus
output = getwd()  # or "YOUR/DESIRED/PATHWAY"
pubmed_textmining(fixterms, pubterms, output)

The function generates two kinds of results, both as .txt files:

• PMI scores: for each fixterm, a table of pointwise mutual information (PMI) scores with one score per pubterm
• Relevant articles: for each fixterm+pubterm pair, a text file listing the titles and publication years of relevant articles
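
To get a feel for how many term pairs a query involves, the fixterm+pubterm combinations can be enumerated in base R. This is only an illustrative sketch: the variable `pairs` is hypothetical and not part of the package, and how the package builds its queries internally is not shown here.

```r
# Illustration only: enumerate every fixterm+pubterm pair with base R.
fixterms <- c("bike", "downhill")
pubterms <- c("dangerous", "extreme", "injuries")

# One row per combination of a fixterm with a pubterm
pairs <- expand.grid(fixterm = fixterms, pubterm = pubterms,
                     stringsAsFactors = FALSE)

print(pairs)
nrow(pairs)  # 2 fixterms x 3 pubterms = 6 pairs
```

With the example terms this gives six pairs, so six fixterm+pubterm combinations are evaluated.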

Definition of Pointwise Mutual Information (PMI) scoring:
PMI compares how often two terms co-occur with how often each term occurs on its own. Good collocation pairs have a high PMI because their probability of co-occurrence is only slightly lower than the probability of occurrence of each individual word. Conversely, a pair of words whose individual probabilities of occurrence are considerably higher than their probability of co-occurrence gets a low PMI score. If PMI = -Inf, no articles were found for the respective collocation pair.
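
The standard textbook definition is PMI(x, y) = log( p(x, y) / (p(x) · p(y)) ). The following minimal sketch computes it from article counts. The function `pmi`, its arguments, the base-2 logarithm, and the counts are all assumptions made for illustration; the package may estimate the probabilities or choose the log base differently.

```r
# Hypothetical helper (not part of the package): PMI from raw article counts.
# n_xy:    number of articles containing both terms
# n_x,n_y: number of articles containing each term on its own
# n_total: total number of articles considered
pmi <- function(n_xy, n_x, n_y, n_total) {
  p_xy <- n_xy / n_total  # probability of co-occurrence
  p_x  <- n_x / n_total   # probability of term x
  p_y  <- n_y / n_total   # probability of term y
  log2(p_xy / (p_x * p_y))  # base 2 is an assumption
}

# Made-up counts, for illustration only
pmi(n_xy = 50, n_x = 100, n_y = 200, n_total = 10000)  # about 4.64
pmi(n_xy = 0,  n_x = 100, n_y = 200, n_total = 10000)  # -Inf: no co-occurrence
```

The second call shows where a score of -Inf comes from: with zero co-occurring articles, the ratio inside the logarithm is 0, and the logarithm of 0 is -Inf in R.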