Post #1827764 by Australis in the thread "Diacritics"

Oct 1, 2018 5:35 AM CST
Plants SuperMod
Name: Joshua
Melbourne, Victoria, Australia (Zone 10a)
Köppen Climate Zone Cfb
Plant Database Moderator Forum moderator Region: Australia Cat Lover Bookworm Hybridizer
Orchids Lilies Irises Seed Starter Container Gardener Garden Photography
Baja_Costero said:
"I don't know the ins and outs of searching but I would imagine if you strip the diacritical marks from input strings and target strings (convert all instances of ñ to n for searches, for example) you will get the most comprehensive results. The same would work in theory for accented vowels (a search for o would also return instances of ó)."

Unfortunately, it's not simple to do this. Computers can't easily "see" that an accented character is similar to its unaccented counterpart in the basic ASCII character set. There would need to be a large reference table matching each accented character to an ASCII equivalent. Whilst this is quite possible, it would take some time to build (plus there is the difficulty of deciding which character sets to support), and the cost-to-benefit ratio would make it a very low priority for Dave.
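For what it's worth, the Unicode standard already publishes part of such a reference table: many accented letters have a canonical decomposition into a base letter plus a combining mark. The minimal sketch below assumes Python's standard `unicodedata` module and a hypothetical helper called `strip_diacritics`; the sample strings are made up. It also illustrates why this is not a complete answer: letters with no decomposition, such as ø, pass through unchanged.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (hypothetical helper)."""
    # NFD splits e.g. "ñ" into "n" followed by U+0303 COMBINING TILDE.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only non-combining characters, i.e. the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for sample in ["Señor", "Rosé", "Søren"]:
        # "ø" has no canonical decomposition, so "Søren" is left as-is.
        print(sample, "->", strip_diacritics(sample))
```

Running the same function over both the search term and the stored names is one way to get the behaviour Baja_Costero describes, where a search for "o" also matches "ó".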

To make the system easy to maintain and operate, we limit database entries to the ASCII character set. Unfortunately, this means that some cultivar names are not represented accurately in the database, but it also means most people will be able to find them easily.
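A rule like this is straightforward to check when a name is entered. The sketch below is only an illustration of the policy, not the site's actual code; `find_non_ascii` and `check_cultivar_name` are hypothetical names, and it assumes Python 3.9 or later for the `list[str]` annotation.

```python
def find_non_ascii(name: str) -> list[str]:
    """Return the characters in `name` that fall outside ASCII (code points 0-127)."""
    return [ch for ch in name if ord(ch) > 127]


def check_cultivar_name(name: str) -> None:
    """Raise ValueError if the name breaks the ASCII-only rule (hypothetical check)."""
    bad = find_non_ascii(name)
    if bad:
        # Report code points so the submitter can see exactly what to replace.
        codes = ", ".join(f"U+{ord(ch):04X}" for ch in bad)
        raise ValueError(f"{name!r} contains non-ASCII characters: {codes}")


if __name__ == "__main__":
    check_cultivar_name("Volcanic Glow")  # passes silently
    print(find_non_ascii("Rosé"))
```

Reporting the offending characters, rather than just rejecting the name, makes it easier for whoever submits the entry to put the accented spelling in the also-sold-as field instead.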

zuzu said:
"... cultivars are alphabetized according to the name in the cultivar field, not the name in the also-sold-as field. Most of the people who are active in forums with custom databases (roses, irises, daylilies, etc.) search for cultivars in the alphabetical listings and don't use the search engine for that purpose. Also, when mods and admins process new plant proposals, they look through the alphabetical listings to see whether the plant is already there under a slightly different spelling. They can't use the search engine for that purpose. Some of these custom databases are quite large. A daylily misalphabetized because of a diacritical mark could be pages and pages away from its appropriate spot in the alphabetical listings."

To add to Zuzu's comment, the reason a cultivar name with a diacritical mark or accented character may end up a long way from where one would expect it is that even the "alphabetical" order we are accustomed to is actually determined by the order of the characters in the character set. The ASCII character set is the basis for many of the extended character sets now in use, so accented characters and characters with diacritical marks are listed after the complete ASCII set (i.e. after all the numerals and the upper- and lower-case English letters, which within ASCII come in that order).
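The effect is easy to reproduce. Python's default string sort compares code points, so with the made-up names below "Zebra" lands before "apple" and "Émile" lands after everything else. The hypothetical `sort_key` shows one possible alternative: fold case and strip combining marks before comparing. This is a simplification; full locale-aware collation (for example the Unicode Collation Algorithm) handles many more cases.

```python
import unicodedata

names = ["apple", "Émile", "Zebra", "emerald"]

# Default ordering compares code points: "Z" (90) < "a" (97) < "e" (101) < "É" (201).
print(sorted(names))
# -> ['Zebra', 'apple', 'emerald', 'Émile']


def sort_key(text: str) -> str:
    """Case-fold and drop combining marks so 'Émile' sorts beside 'emerald'."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(sorted(names, key=sort_key))
# -> ['apple', 'emerald', 'Émile', 'Zebra']
```

Note that the key only affects ordering: the names themselves keep their accents in the sorted output.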

As Zuzu has already mentioned, the easiest way to accommodate names with diacritics is to add them to the also-sold-as field. This allows both the ASCII version and the correct cultivar name to be searched.
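The also-sold-as approach amounts to searching more than one field per record. The sketch below uses a hypothetical `Cultivar` record with an ASCII `name` and an `also_sold_as` list; the field names and sample entries are illustrative, not the database's real schema or data. It assumes Python 3.9 or later and returns a record if the query appears, case-insensitively, in any of its names.

```python
from dataclasses import dataclass, field


@dataclass
class Cultivar:
    name: str  # ASCII-only primary cultivar name
    also_sold_as: list[str] = field(default_factory=list)  # may hold accented spellings


def search(records: list[Cultivar], query: str) -> list[Cultivar]:
    """Return records whose name or any also-sold-as entry contains the query."""
    q = query.casefold()
    return [
        r for r in records
        if any(q in candidate.casefold() for candidate in [r.name, *r.also_sold_as])
    ]


catalogue = [Cultivar("Senorita Rose", ["Señorita Rose"]), Cultivar("Volcanic Glow")]
print([r.name for r in search(catalogue, "señorita")])  # -> ['Senorita Rose']
print([r.name for r in search(catalogue, "senorita")])  # -> ['Senorita Rose']
```

Either spelling finds the same record, while the ASCII `name` remains the single value used for the alphabetical listing.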