Bibliographic/OOoBib Functional Requirements/Name Sorting
Revision as of 01:24, 25 October 2006
The sorting and cataloguing of names and subjects is a complex subject. Library cataloguers have developed many rules in their attempts to provide a coherent index structure over a vast and variable range of reference materials.
Library cataloguers use two processes when sorting bibliographic data by name and subject.
The first is the ‘standardisation’ of names and subjects: a list of somewhat arbitrary rules is applied to modify them for sorting, but not for display.
The second is the alphabetical sorting process, which is subject to the rules of the language and character set used.
The list of rules used for the pre-sorting process is complex, and some of them would, I imagine, be difficult to automate reliably. (For example, the work "The 1847 issue of U. S. stamps." is catalogued as "Eighteen forty-seven issue of U. S. stamps.") However, some of the name rules, such as 'Ignore initial al- in Arabic names' or the treatment of Von, van, Sir, Lord, etc., could be automated.
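As a rough illustration of the name rules that look automatable, the following Python sketch builds a separate sort key and leaves the display form untouched. The rule data (`RANK_DESIGNATIONS`, `IGNORED_PREFIX`) and the function name `name_sort_key` are hypothetical, chosen only for this example; a real cataloguing rule set would be much larger and would need exceptions.

```python
import re

# Hypothetical rule data for illustration only; not a complete cataloguing rule set.
RANK_DESIGNATIONS = {"mrs.", "dr.", "maj.", "sir", "lord"}
IGNORED_PREFIX = re.compile(r"^al-", re.IGNORECASE)  # 'Ignore initial al- in Arabic names'


def name_sort_key(display_name: str) -> str:
    """Return a key used only for sorting; the display form is never modified."""
    key = IGNORED_PREFIX.sub("", display_name.strip())
    # 'Ignore designation of rank': drop words that match a known designation.
    words = [w for w in key.split() if w.lower() not in RANK_DESIGNATIONS]
    return " ".join(words).lower()


names = [
    "al-Khuwarizmi, Muhammad",
    "Brown, Dr. John",
    "Brown, Mrs. J. Crosby",
    "Labor research association",
]
# Sorting uses the key, but the list still holds the original display strings.
ordered = sorted(names, key=name_sort_key)
```

The plain string comparison used by `sorted` here is only a stand-in for the second process (language-specific alphabetical sorting).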
These rules are language specific and, I would guess, country or even institution specific. In the examples quoted in the table below, some of the rules might even be offensive to some people. (For example, the rule for treating “R. Accademia nazionale dei Lincei, Rome” is “Ignore foreign royalty (except British)”.)
So proper treatment of name and subject sorting requires, at the least, national pre-sort processing modules and national language sorting modules.
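One possible way to organise such modules is a registry keyed by language, sketched below in Python. The registry name `PRESORT_MODULES`, the language codes and the two functions are assumptions made for this example. The rules they apply are taken from the examples on this page ('& in English becomes and', 'Ignore apostrophes in English', 'Treat apostrophe as space in French'). The final `sorted` call compares plain strings and merely stands in for a proper national language sorting module.

```python
from typing import Callable, Dict, List


def presort_english(text: str) -> str:
    # '& in English becomes and'; 'Ignore apostrophes in English'
    return text.replace("&", "and").replace("'", "").lower()


def presort_french(text: str) -> str:
    # 'Treat apostrophe as space in French'
    return text.replace("'", " ").lower()


# Hypothetical registry: one pre-sort module per language.
PRESORT_MODULES: Dict[str, Callable[[str], str]] = {
    "en": presort_english,
    "fr": presort_french,
}


def sort_entries(entries: List[str], language: str) -> List[str]:
    """Sort display strings using the pre-sort module registered for a language."""
    presort = PRESORT_MODULES[language]
    return sorted(entries, key=presort)
```

Keeping the pre-sort step separate from the comparison step means either part can be swapped per country or institution without touching the other.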
The list below comes from a student exercise in The Art of Computer Programming, Volume 3: Sorting and Searching by Donald E. Knuth:
Text of card                                    Remarks
R. Accademia nazionale dei Lincei, Rome         Ignore foreign royalty (except British)
1812; ein historischer roman.                   Achtzehnhundert zwölf
Bibliothèque d'histoire révolutionnaire.        Treat apostrophe as space in French
Bibliothèque des curiosités.                    Ignore accents on letters
Brown, Mrs. J. Crosby                           Ignore designation of rank
Brown, John                                     Names with dates follow those without
Brown, John, mathematician                      ...the latter are subarranged by
Brown, John, of Boston                          descriptive words
Brown, John, 1715-1766                          Arrange identical names by birthdate
BROWN, JOHN, 1715-1766                          Works “about” follow works “by”
Brown, John, d. 1811                            Sometimes birthdate must be estimated
Brown, Dr. John, 1810-1882                      Ignore designation of rank
Brown-Williams, Reginald Makepeace              Hyphen treated as space
Brown America.                                  Book titles follow compound names
Brown & Dallison's Nevada directory.            & in English becomes "and"
Brownjohn, Alan
Den', Vladimir Éduardovich, 1867                Ignore apostrophe in names
The den.                                        Ignore an initial article
Den lieben süssen Mädeln.                       ...provided it's in nominative case
Dix, Morgan, 1827-1908                          Names before words
1812 ouverture.                                 Dix-huit cent douze
Le XIXe siècle français.                        Dix-neuvième
The 1847 issue of U. S. stamps.                 Eighteen forty-seven
1812 overture.                                  Eighteen twelve
I am a mathematician.                           (by Norbert Wiener)
IBM journal of research and development.        Initials are like one-letter words
ha-I ha-chad.                                   Ignore initial article
Ia; a love story.                               Ignore punctuation in titles
International Business Machines Corporation
al-Khuwārizmī, Muḥammad ibn Mūsā, fl. 813-846   Ignore initial al- in Arabic names
Labour; a magazine for all workers.             Respell it Labor
Labor research association
Labour, see Labor                               Cross-reference card
McCall's cookbook                               Ignore apostrophes in English
McCarthy, John, 1927                            Mc = Mac
Machine-independent computer programming.       Treat hyphen as space
MacMahon, Maj. Percy Alexander, 1854-1929       Ignore designation of rank
Mrs. Dalloway.                                  Mrs. = Mistress
Mistress of mistresses.
Royal society of London
St. Petersburger Zeitung.                       St. = Saint, even in German
Saint-Saëns, Camille, 1835-1921
Ste. Anne des Monts, Quebec                     Sainte
Seminumerical algorithms.
Uncle Tom's cabin.
U. S. Bureau of the census.                     U. S. = United States
Vandermonde, Alexandre Théophile, 1735-1796
Van Valkenburg, Mac Elwyn, 1921-                Ignore space after prefix in surnames
Von Neumann, John, 1903-1957
The whole art of legerdemain.
Who's afraid of Virginia Woolf?                 Ignore apostrophe in English
Wijngaarden, Adriaan van, 1916-                 Surname doesn't begin with lower case letter
Most of these rules are subject to certain exceptions, and there are many other rules not illustrated here.
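A few of the more mechanical title rules from the table (abbreviation expansion, Mc = Mac, ignoring accents, treating a hyphen as a space, ignoring punctuation and an initial article) can be sketched in Python as follows. This is only a minimal sketch: the names `ABBREVIATIONS`, `INITIAL_ARTICLES`, `strip_accents` and `title_sort_key` are hypothetical, the article list assumes English only, and none of the exceptions mentioned above are handled.

```python
import re
import unicodedata

# Hypothetical rule data drawn from the table; English-only and far from complete.
ABBREVIATIONS = {"St.": "Saint", "Mrs.": "Mistress", "U. S.": "United States"}
INITIAL_ARTICLES = ("the ", "a ", "an ")


def strip_accents(text: str) -> str:
    """'Ignore accents on letters': decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def title_sort_key(title: str) -> str:
    """Build a sort key for a title; the title itself is left unchanged."""
    key = title
    for abbreviation, expansion in ABBREVIATIONS.items():
        key = key.replace(abbreviation, expansion)
    key = re.sub(r"^Mc", "Mac", key)           # Mc = Mac
    key = strip_accents(key)
    key = key.replace("-", " ")                # treat hyphen as space
    key = re.sub(r"[^\w\s]", "", key)          # ignore punctuation and apostrophes
    key = re.sub(r"\s+", " ", key).strip().lower()
    for article in INITIAL_ARTICLES:           # ignore an initial article
        if key.startswith(article):
            key = key[len(article):]
            break
    return key


titles = [
    "Mrs. Dalloway.",
    "McCall's cookbook",
    "Machine-independent computer programming.",
    "Mistress of mistresses.",
]
ordered_titles = sorted(titles, key=title_sort_key)
```

Rules such as spelling out "1847" as "Eighteen forty-seven", or deciding whether "Den" is an article in the nominative case, are deliberately left out, since they depend on language knowledge that simple string substitution cannot supply.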