Bibliographic/OOoBib Functional Requirements/Keywords
Revision as of 14:24, 26 September 2006
This document has been placed on the wiki so that members of the OpenOffice community can assist in developing the design and documentation for the enhanced bibliographic facility.
Keywords
One way to sort articles better is by keywords (see my post on keywords).
(Tell me the title and date, and I will insert a link to the message. David Wilson)
However, there is another way, which I will briefly describe here.
There are a number of categories a research paper can belong to:
- Basic Research
- Theoretical Research (especially in Math/Physics)
- Modeling
- Trials:
  - randomized controlled trial
  - Meta-analysis
  - other trial
- Review
- Guideline
- Correspondence
- Editorial
- Epidemiologic Study
- Case Report
- Images in clinical medicine (some journals have such a feature; could be a subgroup of Case Report)
- Questions/ Question-Answers
If there are other relevant categories, feel free to add them as well.
This is especially useful when searching for all trials on a given topic (e.g. when writing a meta-analysis, a review or a guideline), or for a specific case report.
I have more than 2,500 articles saved on my computer, and searching for the correct file is a nightmare. 2,500 articles may seem a huge number, but in infectious diseases this is only a minimum to start with.
A dedicated field for this information is useful. Although custom fields exist, this should be a standard feature: it allows far more powerful searching (and grouping) of articles.
Submitted as issue number 66353 by discoleo at OpenOffice.org.
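To make the idea concrete, here is a minimal Python sketch of such a standard field. The ArticleType enumeration and the Article record are hypothetical names, not an existing OOoBib data model; the values are the categories listed above, with the trial subtypes flattened into a shared "Trial:" prefix so that all trials can be found at once.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class ArticleType(Enum):
    # Hypothetical enumeration built from the category list above.
    BASIC_RESEARCH = "Basic Research"
    THEORETICAL_RESEARCH = "Theoretical Research"
    MODELING = "Modeling"
    RANDOMIZED_CONTROLLED_TRIAL = "Trial: randomized controlled trial"
    META_ANALYSIS = "Trial: meta-analysis"
    OTHER_TRIAL = "Trial: other trial"
    REVIEW = "Review"
    GUIDELINE = "Guideline"
    CORRESPONDENCE = "Correspondence"
    EDITORIAL = "Editorial"
    EPIDEMIOLOGIC_STUDY = "Epidemiologic Study"
    CASE_REPORT = "Case Report"
    IMAGES_IN_CLINICAL_MEDICINE = "Images in clinical medicine"
    QUESTIONS = "Questions / Question-Answers"

    def is_trial(self) -> bool:
        # All trial subtypes share the "Trial:" prefix.
        return self.value.startswith("Trial:")


@dataclass
class Article:
    title: str
    article_type: ArticleType  # standard field, not a custom one


def group_by_type(articles):
    """Group articles by their article type."""
    groups = defaultdict(list)
    for article in articles:
        groups[article.article_type].append(article)
    return dict(groups)


def find_trials(articles):
    """Return every trial, e.g. as input for a meta-analysis or a review."""
    return [a for a in articles if a.article_type.is_trial()]
```

With a fixed list of values, "all trials on a given topic" becomes a simple filter instead of a free-text search over custom fields.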
Implementation ideas
How should this be implemented? Most bibliographic and document systems I have seen seem to think that adding a keyword field is enough and let users invent their own categories. I have been involved in IT development and document management systems, and have had enough lectures from librarians (i.e. professional indexers) to know that this just leads to a big, unmanageable mess, which librarians are often called in to fix.
Also, a good keyword system has a well-defined set of aliases. One insurance company was paying different compensation for fractured limbs than for broken limbs, because its compensation history search system did not have these aliases defined. The cases and the compensation history diverged as each staff member used his or her preferred term.
So: should we build predefined document category sets, so that a user could select one for each document collection, e.g. Medical Research, Physical Sciences, Social Sciences? David Wilson
Discussion
After careful thought, I am increasingly convinced that standardization is both highly useful and needed. While I acknowledge that it will be difficult to achieve a working standardization in the immediate future, it deserves hard work, and I hope it will be implemented at some point in the more distant future.
Until we get a working standardization, it is nevertheless pertinent to implement various other mechanisms needed for a more comprehensive keyword solution.
I wish to discuss three points:
- limitations of current keywords
- how to standardize
- how to implement the standardization
Why Standardise
As more and more research data becomes available, it becomes increasingly difficult to use this data efficiently. The problem stems from the simple fact that you do NOT get what you want. Most published data will end up somewhere in the nirvana of computer storage, without ever being read by those who would benefit most from it. This problem is likely to deepen in the near future, as more and more journals appear and huge amounts of data are published.
To illustrate this further, it is helpful to perform some searches: entering a common term generates such a huge number of hits that it is impossible even to read all the titles. Searching for a more unusual term might narrow the results, but there are still thousands of hits. I do indeed have serious problems when searching for something: there is so much literature available that I easily get overwhelmed, although most of it is not relevant to my work. Refining a search is becoming increasingly difficult, and the time spent searching can exceed the time needed to read the actual article.
Pubmed has recognized this as well and has implemented various search strategies to increase search accuracy (see e.g. Clinical Queries, http://www.ncbi.nlm.nih.gov/entrez/query/static/clinical.shtml ). However, this is only a workaround for the actual problem and will ultimately become insufficient, too.
Limitations of Current Keyword Strategies
Before discussing the steps necessary to implement a standardization, it is worth pointing out some limitations of current indexing strategies based on keywords.
- currently, an article may have a number of keywords defined
- this list is a plain text list
- this plain structure is one of the reasons keywords fail
- to be extensive (aka sensitive), you must define many keywords
- and this undoubtedly reduces the specificity, i.e. a search would also retrieve many articles that are not actually needed; [it would also be very impractical to store such huge keyword lists]
- to solve this paradox, one needs a hierarchical tree structure:
- one keyword might imply another term as well
- entering both terms as keywords, however, creates very large keyword lists and causes the problems mentioned above
- hence the need for a hierarchical tree (see Hierarchical Keyword Tree below): one term automatically points to one (or more) trees containing further search terms/keywords
- the strength of this approach is that we may later change the structure of these trees to adapt them to particular search needs (see below)
- these trees would not be defined as a standard; instead, each user would create his own trees/relations to maximize his search results (both sensitivity and specificity)
How to standardise
This is a huge task, and I believe there is a reason why no standardization exists to date. Therefore, before starting from scratch, it would be wise to search for work already done:
- search for standards
- contact librarians, other groups
- contact others who might be interested or have done work in this field (e.g. Pubmed; I will try to contact the Pubmed team and hope for an answer)
Some journals already sort their articles by specific features (e.g. Circulation, the journal of the American Heart Association; Chest; and others). It could therefore be somewhat easier to implement part of the standardisation in these fields, because professional societies already use such schemes. Other fields, however, are covered less well and could cause some pain.
It is probably best to ask the professional societies to create such a framework.
How to implement this
In order to be used in practice, the program MUST suggest appropriate categories to the end user. This is easier to accomplish for the major article categories, but becomes increasingly difficult for more detailed keywords. (YES, I believe that all keywords should be standardized, as pointed out above, perhaps sometime in the future.)
Specific procedure:
- scan the journal title: in most instances, journals publish only articles from a very narrow field (except perhaps Nature and Science)
- scan title and abstract for some standard words (aka the keywords for that specific field)
- depending on the words found, suggest an article category/ subcategory: e.g. medicine/ surgery/ abdominal surgery/ randomized controlled trial; or veterinary medicine/ dog / infectious diseases/ rabies/ vaccine
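The procedure above could be sketched roughly as follows. This is only an illustration under stated assumptions: JOURNAL_FIELDS and ARTICLE_TYPE_PHRASES are hypothetical lookup tables (the two journal abbreviations come from the journal list on this page), and simple phrase matching stands in for a real, standardized keyword list per field.

```python
import re

# Hypothetical table: journal abbreviation -> field path (step 1).
JOURNAL_FIELDS = {
    "J Antimicrob Chemother": ["medicine", "infectious diseases"],
    "Eur Heart J": ["medicine", "cardiology"],
}

# Hypothetical table: phrase in title/abstract -> article category (step 2).
# Checked in order, so more specific phrases come first.
ARTICLE_TYPE_PHRASES = [
    ("meta-analysis", "meta-analysis"),
    ("randomized controlled trial", "randomized controlled trial"),
    ("randomised controlled trial", "randomized controlled trial"),
    ("case report", "case report"),
    ("guideline", "guideline"),
    ("review", "review"),
]


def suggest_category(journal, title, abstract=""):
    """Suggest a category path such as ['medicine', 'cardiology', 'review']."""
    path = list(JOURNAL_FIELDS.get(journal, []))  # unknown journal: no field
    text = f"{title} {abstract}".lower()
    for phrase, article_type in ARTICLE_TYPE_PHRASES:
        # Whole-word match, so "review" does not match "reviewed".
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            path.append(article_type)
            break  # step 3: suggest only the first (most specific) match
    return path
```

Because only the first matching phrase is used, a "systematic review and meta-analysis" is suggested as a meta-analysis, not as a review. The result is only a suggestion that the user confirms or corrects.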
I will continue in the next section with a more thorough discussion of this implementation.
Requirements
- Keywords
- Article categories
- Journal category/classification
Keywords
Alias
alias: these are synonyms, i.e., the 2 words are equivalent
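A minimal sketch of alias handling, assuming each alias group is stored as a list of equivalent terms and that the first term in a group serves as the canonical form (a hypothetical convention). Keywords attached to articles and search terms would both be normalized the same way, so "fractured limb" and "broken limb" (the insurance example above) retrieve the same records.

```python
def build_alias_map(groups):
    """Map every term (lower-cased) to the canonical term of its alias group.

    The first term of each group is taken as canonical (hypothetical convention).
    """
    alias_map = {}
    for group in groups:
        canonical = group[0].lower()
        for term in group:
            key = term.lower()
            if key in alias_map and alias_map[key] != canonical:
                # A term in two groups would silently merge unrelated synonyms.
                raise ValueError(f"{term!r} appears in two different alias groups")
            alias_map[key] = canonical
    return alias_map


def normalize(terms, alias_map):
    """Replace each keyword by its canonical alias; unknown terms pass through."""
    return {alias_map.get(t.lower(), t.lower()) for t in terms}
```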
Hierarchical Keyword Tree
Hierarchical tree structure:
- the presence of one term automatically implies another term, although the two are not aliases/synonyms, e.g.
  - endocarditis implies infection, bacteremia, heart valves and medicine, too;
  - a non-medical example: whale implies mammal, ocean and water
- dynamic trees
  - these trees must NOT be rigid
  - rather, they should be dynamic: a user may want to change the relationships later to optimize some search results, and change them again for another search
- intersecting trees
  - one keyword may belong to more than one tree:
    - endocarditis -> heart valves -> cardiology; and endocarditis -> bacteremia -> infection
    - a non-medical example: whale -> mammal -> animal; and whale -> ocean -> hydrosphere
The users should be able to:
- write their own trees / tree relationships
- store these trees for future use
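One possible way to store trees for future use, assuming (hypothetically) that a user's relations are kept as a mapping from each term to the set of broader terms it implies, saved as a JSON file. This is not an existing OOoBib file format; keeping several such files would let a user switch between tree sets for different searches.

```python
import json


def save_trees(path, relations):
    """Store user-defined relations (term -> set of broader terms) as JSON."""
    # JSON has no sets, so each set is written as a sorted list.
    serializable = {term: sorted(parents) for term, parents in relations.items()}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(serializable, f, indent=2, ensure_ascii=False)


def load_trees(path):
    """Load relations back, restoring each list of broader terms as a set."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {term: set(parents) for term, parents in data.items()}
```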
Because this concept is so important, I will expand the endocarditis example:
    cardiology <- heart valves <- endocarditis <- diagnosis, treatment, epidemiology (all 3 belong to this node)
    infection <-|
                |- endocarditis <- Staphylococcus aureus, Streptococcus, fastidious organisms
                |- bacteremia <- endocarditis <- (various bacteria, see previous tree)
As can be seen, endocarditis may belong to three different trees, and I may use any one, two or all three of them, depending on what I wish to search for.
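A minimal Python sketch of these requirements, not a finished design; KeywordGraph, relate, unrelate and implied_terms are hypothetical names. Because endocarditis has several broader terms, the relations form a directed graph rather than a single tree. Adding or removing a relation changes later searches (dynamic trees), and the optional within argument picks which trees to use for a given search. The narrower nodes (diagnosis, bacteria) and medicine are left out for brevity.

```python
from collections import defaultdict


class KeywordGraph:
    """Keyword relations: each term points to the broader terms it implies.

    A term may have several broader terms (intersecting trees), so this is a
    directed graph. Relations can be changed at any time (dynamic trees).
    """

    def __init__(self):
        self.parents = defaultdict(set)  # term -> set of broader terms

    def relate(self, narrower, broader):
        """Record that `narrower` implies `broader`."""
        self.parents[narrower].add(broader)

    def unrelate(self, narrower, broader):
        """Remove a relation, e.g. to tune the trees for another search."""
        self.parents[narrower].discard(broader)

    def _reachable(self, term):
        """The term plus every broader term reachable from it (cycle-safe)."""
        seen, stack = set(), [term]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen

    def implied_terms(self, term, within=None):
        """Expand a search term with everything it implies.

        `within` optionally restricts the expansion to the trees whose root
        terms are listed, i.e. the trees chosen for this particular search.
        """
        terms = self._reachable(term)
        if within is None:
            return terms
        roots = set(within)
        # Keep only terms lying on a path that leads to a chosen root.
        return {term} | {t for t in terms if self._reachable(t) & roots}


# The endocarditis example above, expressed as relations.
graph = KeywordGraph()
graph.relate("endocarditis", "heart valves")
graph.relate("heart valves", "cardiology")
graph.relate("endocarditis", "infection")
graph.relate("endocarditis", "bacteremia")
graph.relate("bacteremia", "infection")
```

A cardiology search would call graph.implied_terms("endocarditis", within={"cardiology"}) and ignore the infection branch, while an infection search would pick the other trees.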
Article Categories
The article category should contain both the field of work (e.g. medicine) and the type of article (e.g. review). Therefore we should have:
- category: see Journal Classification below
- article type: see at the top of this page
Journal Classification
This describes what is needed to implement a standardized journal classification.
We need to define/create lists with:
- basic categories: this needs to be defined at the top of the hierarchy; every article belongs to one (or more) of these basic categories
- list of journals: needed for the next point;
- basic category for journals: we will need to apply one or more categories to every journal.
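A small sketch of the last two points, assuming the category codes used in the Journal Category column of the journal tables on this page (med, infx, abx, ...). The set of defined codes below is a hypothetical, incomplete example, not an agreed list.

```python
# Hypothetical, incomplete set of category codes defined so far; the codes
# themselves are those used in the "Journal Category" column on this page.
DEFINED_CATEGORIES = {"all", "biomed", "med", "vet", "infx", "abx", "microbiol", "cardio"}


def parse_categories(cell):
    """Split a cell such as 'med, infx, abx' into a set of codes."""
    return {code.strip() for code in cell.split(",") if code.strip()}


def check_journal(name, cell, defined=DEFINED_CATEGORIES):
    """Return a journal's categories after checking that it has at least one
    category and that every category comes from the defined list."""
    categories = parse_categories(cell)
    if not categories:
        raise ValueError(f"{name}: every journal needs at least one category")
    unknown = categories - defined
    if unknown:
        raise ValueError(f"{name}: undefined categories {sorted(unknown)}")
    return categories
```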
Basic Field / Top Categories
Question: Do we need subcategories OR, more specifically, how do we define subcategories?
Some journals sort their articles into standardised subcategories (usually the 3rd or 4th level of the hierarchy):
- Chest (respiratory and critical care medicine): see http://www.chestjournal.org/current.shtml (other issues may contain additional entries)
- Circulation (cardiology): see e.g. the contents on http://circ.ahajournals.org/content/vol114/issue6/ (see other issues as well)
- Infectious Diseases:
  - some basic categories: see http://jcm.asm.org/content/vol44/issue8/
- many more journals may sort their articles by such highlighting criteria, so some work has already been done by professional societies
These lists are incomplete. Please fill in whenever you find additional information.
Various publishers sort their publications into comprehensive specialty lists, e.g. http://www.us.elsevierhealth.com/Medicine
Top Categories
- mathematics
- physics
- quantum mechanics (these would be subcategories, ... or still main categories)
- astrophysics
- others
- biology: part of biomedical sciences?
- biomedical sciences
- non-surgical/internal medicine
- cardiology
- endocrinology
- diabetology
- gastroenterology
- hepatology
- haematology/ hematology
- infectious diseases: should be separate entity?
- pulmonology/ respiratory medicine
- nephrology
- neurology
- geriatric medicine: one node higher?
- immunology/ rheumatology: should be separate?
- many subspecialities
- dermatology
- intensive care / critical care
- cognitive sciences/ psychiatry
- paediatrics/ pediatrics
- radiology
- surgery
- abdominal surgery
- cardio-vascular surgery/ cardiothoracic surgery
- emergency medicine
- obstetrics and gynecology
- neurosurgery
- ophthalmology
- orthopedics
- otolaryngology/ ENT surgery
- plastic surgery
- urology
- many subspecialities
- dentistry
- nursing
Should these be higher categories?
- infectious diseases
- microbiology (could be one hierarchical node higher)
- virology
- parasitology
- tropical medicine
- epidemiology
- microbiology (could be subspeciality of infectious diseases)
- infectious diseases
Feel free to expand this list!!!
Journals
This list will include the full name of the journal, the abbreviated name and the journal category.
Please note that this list is important NOT only for this feature:
- some journals require the FULL journal name in the bibliography (e.g. JAC requires Journal of Antimicrobial Chemotherapy and not J Antimicrob Chemother)
- others require the abbreviated name (actually most journals fit here)
- some journals have very short aliases (like JAC, CID, NEJM), which I would like to use when entering a bibliographic entry by hand; BUT these are not the official abbreviations and should therefore be converted automatically to the official abbreviation
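A minimal sketch of this conversion, assuming a hypothetical Journal record holding the three name forms from the journal tables on this page (full name, official abbreviation, custom shortcut). Whatever the user types is looked up in a single index, and the citation always uses the official form that the target journal requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journal:
    full_name: str
    abbreviation: str  # official abbreviation
    shortcut: str = ""  # custom, unofficial shortcut (e.g. JAC)


# Two rows taken from the Infectious Diseases table on this page.
JOURNALS = [
    Journal("Journal of Antimicrobial Chemotherapy", "J Antimicrob Chemother", "JAC"),
    Journal("Clinical Infectious Diseases", "Clin Infect Dis", "CID"),
]


def build_index(journals):
    """Index journals by full name, abbreviation and shortcut (case-insensitive)."""
    index = {}
    for j in journals:
        for key in (j.full_name, j.abbreviation, j.shortcut):
            if key:
                index[key.lower()] = j
    return index


def cite_name(typed, index, style="abbreviated"):
    """Convert whatever the user typed into the name a citation style requires.

    style="full" for journals (such as JAC) that require the full name;
    any other style gives the official abbreviation (the common case).
    The custom shortcut itself never ends up in the bibliography.
    """
    journal = index.get(typed.strip().lower())
    if journal is None:
        raise KeyError(f"unknown journal: {typed!r}")
    return journal.full_name if style == "full" else journal.abbreviation
```

Only two rows are included here; the full table (or the imported Pubmed list) would be loaded into the same index.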
I have imported 5269 journals from Pubmed (see gawk-script below)
- Journal List Last Updated: September 20, 2006
- the gawk script makes it easy to update the list
- this list does not contain the URL, nor Journal Category, but I will work to automate that, too
- I believe the list is too large to post here (but it can easily be recreated with the gawk script, and I can compress it and post it somewhere as an attachment)
Sites With Journal Lists
There are various sites having extensive journal lists:
- Pubmed:
- http://www.ncbi.nlm.nih.gov/entrez/linkout/journals/jourlists.cgi?typeid=1&type=journals&operation=Show
- this is by far the most comprehensive list for medical literature
- this list is restricted largely to medical journals
- it contains the expanded journal name, which is usually more comprehensive than the full journal name needed for citation by some journals (but it may be useful because it gives a better understanding what the journal is about)
- see the gawk script for importing this list
- a full list of journals is available via ftp from ftp://ftp.ncbi.nih.gov/pubmed (some 19,400 entries)
- this list contains the useful journal full name, i.e. the name that is used for citation purposes by some of the journals
- I will try to write a script to automate the importing of journal names (I have not had time to do so yet)
- Oxford University Press: http://www.oxfordjournals.org - I am currently importing the medical journals; any help with importing the non-medical journals is welcome
- Springer: http://www.springerlink.com
- Blackwell: http://www.blackwellsynergy.com
- Non-medical sites:
- http://www.nal.usda.gov/catalog/download_jia.shtml contains a MS Access DB with ~2,000 journal entries, mostly non-medical
- Others: please expand
Journal List
I also have this list as an OOo Writer document (containing tables). I will expand it whenever I have time. One useful addition to this list would be each journal's URL.
Infectious Diseases Journals
Full Journal Name | Short Journal Name (Abbreviation) | Custom Shortcut | Journal Category | URL |
---|---|---|---|---|
American Journal of Infection Control | Am J Infect Control | AJIC | med, infx | http://journals.elsevierhealth.com/periodicals/ymic/issues |
Antimicrobial Agents and Chemotherapy | Antimicrob Agents Chemother | AAC | med, infx, abx | |
Chemotherapy | Chemotherapy | | med | |
Clinical Infectious Diseases | Clin Infect Dis | CID | med, infx | |
Clinical Microbiology Reviews | Clin Microbiol Rev | CMR | med, infx | |
Emerging Infectious Diseases | Emerg Infect Dis | | med, infx | |
European Journal of Clinical Microbiology | Eur J Clin Microbiol | | med, infx | |
European Journal of Clinical Microbiology and Infectious Diseases | Eur J Clin Microbiol Infect Dis | | med, infx | |
Infection | Infection | | med, infx | |
Infection Control and Hospital Epidemiology | Infect Control Hosp Epidemiol | | med, infx | |
Infectious Disease Clinics of North America | Infect Dis Clin North Am | | med, infx | |
International Journal of Antimicrobial Agents | Int J Antimicrob Agents | | med, infx, abx | |
Journal of Antimicrobial Chemotherapy | J Antimicrob Chemother | JAC | med, infx, abx | |
Journal of Bacteriology | J Bacteriol | | med, infx | |
Journal of Clinical Microbiology | J Clin Microbiol | JCM | med, infx | |
Journal of Hospital Infection | J Hosp Infect | | med, infx | |
Journal of Infectious Diseases | J Infect Dis | JID | med, infx | |
Journal of Medical Microbiology | J Med Microbiol | JMM | med, infx, microbiol | |
Microbes and Infection | Microbes Infect | | med, infx | |
Microbiological Reviews | Microbiol Rev | | med, infx, microbiol | |
Research in Microbiology | Res Microbiol | | med, infx, microbiol | http://www.sciencedirect.com/science/journal/09232508 |
Reviews of Infectious Diseases | Rev Infect Dis | | med, infx | |
Scandinavian Journal of Infectious Diseases | Scand J Infect Dis | | med, infx | |
Veterinary Microbiology | Vet Microbiol | | biomed, vet, microbiol | |
International Journal of Systematic and Evolutionary Microbiology | Int J Syst Evol Microbiol | IJSEM | biomed, med, infx, microbiol | http://ijs.sgmjournals.org |
General Medical Journals
Full Journal Name | Short Journal Name (Abbreviation) | Custom Shortcut | Journal Category | URL |
---|---|---|---|---|
American Journal of Medicine | Am J Med | | med, all | http://www.sciencedirect.com/science/journal/00029343 |
Annals of Internal Medicine | Ann Intern Med | | med, intern | http://www.annals.org |
British Medical Journal | BMJ | BMJ | med, all | http://bmj.bmjjournals.com |
Journal of the American Medical Association | JAMA | JAMA | med, all | http://jama.ama-assn.org |
Lancet | Lancet | | med, all | http://www.thelancet.com |
New England Journal of Medicine | N Engl J Med | NEJM | med, all | http://www.nejm.org |
Statistics
Journal of Statistical Software, http://www.stat.ucla.edu/journals/jss/
All Categories
Full Journal Name | Short Journal Name (Abbreviation) | Custom Shortcut | Journal Category | URL |
---|---|---|---|---|
Nature | Nature | | all | |
Science | Science | | all | |
Cell/ Molecular Biology
Full Journal Name | Short Journal Name (Abbreviation) | Custom Shortcut | Journal Category | URL |
---|---|---|---|---|
Journal of Biological Chemistry | J Biol Chem | JBC | biomed, cell biol, chem | |
Proceedings of the National Academy of Sciences of the USA | Proc Natl Acad Sci USA | PNAS | biomed, cell biol, all | |
Oxford Journals
Incomplete: there is still a lot of work to do! When I finish, I will move these entries into their respective categories.
Full Journal Name | Short Journal Name (Abbreviation) | Custom Shortcut | Journal Category | URL |
---|---|---|---|---|
Age and Ageing | Age Ageing | | med, geront | http://ageing.oxfordjournals.org/ |
Alcohol and Alcoholism | Alcohol Alcohol | | med, behav | http://alcalc.oxfordjournals.org/ |
American Journal of Epidemiology | Am J Epidemiol | | med, epidem | http://aje.oxfordjournals.org/ |
Annals of Occupational Hygiene | Ann Occup Hyg | | med, epidem, hygiene | http://annhyg.oxfordjournals.org/ |
Annals of Oncology | Ann Oncol | | med, oncol | http://annonc.oxfordjournals.org/ |
BJA: British Journal of Anaesthesia | Br J Anaesth | BJA | med, ICU | http://bja.oxfordjournals.org/ |
Brain | Brain | | med, neuro | http://brain.oxfordjournals.org/ |
Brief Treatment and Crisis Intervention | Brief Treat Crisis Interven | | med, behav | http://brief-treatment.oxfordjournals.org/ |
British Medical Bulletin | Br Med Bull | | med, all | http://bmb.oxfordjournals.org/ |
Continuing Education in Anaesthesia, Critical Care & Pain | Contin Educ Anaesth Crit Care Pain | | med, ICU | http://ceaccp.oxfordjournals.org/ |
Europace | Europace | | med, cardio | http://europace.oxfordjournals.org/ |
European Heart Journal | Eur Heart J | | med, cardio | http://eurheartj.oxfordjournals.org/ |
The European Journal of Orthodontics | Eur J Orthod | | med, dentist | http://ejo.oxfordjournals.org/ |
The European Journal of Public Health | Eur J Public Health | | med, epidem | http://eurpub.oxfordjournals.org/ |
Evidence-based Complementary and Alternative Medicine | Evid Based Complement Alternat Med | eCAM | med, alt | http://ecam.oxfordjournals.org/ |
Family Practice | Fam Pract | | med | http://fampra.oxfordjournals.org |
Health Education Research | Health Educ Res | | med, epidem | http://her.oxfordjournals.org |
Health Policy and Planning | Health Policy Plan | | med, epidem | http://heapol.oxfordjournals.org |
Health Promotion International | Health Promot Int | | med, epidem | http://heapro.oxfordjournals.org |
Human Reproduction | Hum Reprod | | med, gyn | http://humrep.oxfordjournals.org |
Human Reproduction Update | Hum Reprod Update | | med, gyn | http://humupd.oxfordjournals.org |
GAWK HELPER SCRIPT
This section contains some useful gawk scripts for formatting the different journal lists.
Requirements:
- awk/gawk:
- if you are on a UNIX machine, almost surely you will have it installed on your computer
- if you are on a WINDOWS machine, almost surely you won't have it; you can get gawk for free from the GnuWin32 project at http://www.sourceforge.net
Format PUBMED Journal List
The latest PUBMED Journal List can be downloaded from: http://www.ncbi.nlm.nih.gov/entrez/linkout/journals/jourlists.cgi?typeid=1&type=journals&format=text&operation=Show
Use:
- Save the above list as a plain text file (Limitation: it does NOT contain the very short abbreviation, the URL or the journal category); you also need to delete the first line of that text file manually (it is not a journal entry!!!)
- Save the following script as a file, e.g. this-script-file.awk
- run the gawk script, e.g. gawk -f "this-script-file.awk" your-plain-text-file.txt
- the script will create a new text file, Journals-Pubmed-Extracted.txt, that will contain the list with
- Full Journal Names
- Abbreviations and
- ISSN (the journal entries are UNIQUE)
- I will work to automate the URL import, too
- Journal Category will remain a manual task
    # This program EXTRACTS JOURNAL NAMES from the PUBMED JOURNAL TEXT LIST, v1.01
    # The latest list can be downloaded from:
    # http://www.ncbi.nlm.nih.gov/entrez/linkout/journals/jourlists.cgi?typeid=1&type=journals&format=text&operation=Show
    # I have imported 5269 journals from Pubmed,
    # Journal List Last updated: September 20, 2006

    BEGIN {
        val = ""       # PREVIOUS ENTRY, USED TO DETECT DUPLICATES
        cel[1] = ""    # ARRAY TAKING THE VALUES
                       # 1: JOURNAL FULL NAME
                       # 2: JOURNAL ABBREVIATION
                       # 3: ISSN
    } # END BEGIN

    # START ACTUAL PROGRAM <------------------------------------>
    # CLEAN UP SPACES
    / /    { gsub(/ +/, " ") }     # COLLAPSE MULTIPLE SPACES INTO ONE
    /^ /   { gsub(/^ /, "") }      # REMOVE LEADING SPACE
    / $/   { gsub(/ $/, "") }      # REMOVE TRAILING SPACE
    / [:]/ { gsub(/ [:]/, ":") }   # REMOVE SPACE BEFORE ':'
    { if (length($0) == 0) { next } }   # SKIP EMPTY LINES
    {
        split($0, cel, "|")
        # DELETE ENDING "." FROM JOURNAL NAME
        i = match(cel[1], /[.]$/)
        if (i > 0) { cel[1] = substr(cel[1], 1, i - 1) }
        s = cel[1] "\t" cel[2] "\t" cel[3]
        if (s == val) { next }   # SKIP DUPLICATE ENTRY (ONLY CONSECUTIVE DUPLICATES ARE DETECTED)
        val = s                  # STORE PREVIOUS VALUE TO FIND DUPLICATES
        print s >> "Journals-Pubmed-Extracted.txt"
    }
Format Data for this wiki
This script will format the table text for use on this wiki page.
Use:
- Save your Journal Table as a plain text file, with the cells separated by tab and the rows as separate lines
- Save the following script as a file, e.g. this-script-file.awk
- run the gawk script, e.g. gawk -f "this-script-file.awk" your-plain-text-file.txt
- the script will create a new text file, Journals-OOo.txt, that will contain the formatted text, suitable to paste into this wiki page
GAWK SCRIPT
    BEGIN {
        # OPEN THE WIKI TABLE
        intro = "{| cellspacing=\"0\" cellpadding=\"5\" border=\"1\""
        print intro >> "Journals-OOo.txt"
    } # END BEGIN

    # START ACTUAL PROGRAM <------------------------------------>
    # CLEAN UP SPACES
    / /  { gsub(/ +/, " ") }   # COLLAPSE MULTIPLE SPACES INTO ONE
    /^ / { gsub(/^ /, "") }    # REMOVE LEADING SPACE
    / $/ { gsub(/ $/, "") }    # REMOVE TRAILING SPACE
    { if (length($0) == 0) { next } }   # SKIP EMPTY LINES
    /\t/ { gsub(/\t/, "\n| ") }   # EACH TAB STARTS A NEW TABLE CELL ON ITS OWN LINE
    { print "|-\n| " $0 >> "Journals-OOo.txt" }   # EACH INPUT LINE BECOMES A NEW TABLE ROW

    END {
        print "|}" >> "Journals-OOo.txt"   # CLOSE THE WIKI TABLE
    }