Morph Codes [updated 6/85]

KEY TO CODING USED IN PACKARD'S MORPHOLOGICAL ANALYSIS

Coding for CATSS Morphological Output

"TYPE" CODES (3 columns maximum, to identify part of speech)

N = Noun (up to 3 columns)
  N1  = 1st declension (fem. in -H)
  N1A = stem ending in -A (fem.)
  N1M = masc. with nom. in -HS
  N1S = stem in -H, nom. in -A (f.)
  N1T = masc. with nom. in -AS
  N2  = 2nd declension (masc./fem.)
  N2N = neuters (in -ON)
  N3  = 3rd declension
  N3D,E,G,H,I,K,M,N,P,R,S,T,U,V,W indicate categories of 3rd decl. nouns (see other sheet)
  N alone = indeclinable proper noun

A = Adjective (up to 3 cols.)
  A1  = -OS/-H/-ON pattern endings
  A1A = -OS/-A/-ON
  A1B = -OS/-OS/-ON
  A1C = -OUS/-OUS/-OUN
  A1S = nom. in -A, stem in -H
  A3  = 3rd declension patterns
  A3E,H,N,U,C as for 3rd decl. nouns

R = Pronouns (2 columns)
  RA = Article
  RD = Demonstrative
  RI = Interrogative/Indefinite
  RP = Personal/Possessive
  RR = Relative
  RX = O(/STIS

C = Conjunction
X = Particle
I = Interjection
M = Indeclinable Number
P = Preposition
D = Adverb

V = Verbs (up to 3 cols.)
  (I in col.3 = augmented, whatever the stem)

  "Progressive" Stems
  V1 = regular present
  V2 = contracts in -EW
  V3 = contracts in -AW
  V4 = contracts in -OW
  V5 = regular -MI verbs
  V6 = -A stem -MI verbs
  V7 = -E stem -MI verbs
  V8 = -O stem -MI verbs
  V9 = EI)MI/ and EI)=MI

  Aorist Stems
  VA = 1st aorist active
  VB = 2nd aorist act. #1
  VZ = 2nd aorist act. #2 (irreg)
  VH, VE, VO = -H, -E, -O stem -MI verbs
  VC = #1 aor. & fut. pass. (Q-type)
  VD = #2 aor. & fut. pass. (non-Q)
  VV = labial aor. & fut. pass.
  VS = dental aor. & fut. pass. (+ zeta)
  VQ = guttural aor. & fut. pass.

  Perfect Stems
  (I in col.3 = Plupf. augm.)
  VX = perfect active
  VM = perfect middle
  VP = labial perf. midd.
  VT = dental perf. midd. (+ zeta)
  VK = guttural perf. midd.

  Future Stems
  VF  = regular future
  VF2 = liquid type (+ zeta)
  VF3 = E)LAU/NW type
  VFX = future perfect

"PARSE" CODE (up to 6 columns, as needed, to parse each form)

Nouns and Pronouns (3 columns)
  col.1 = case:   N(om) G(en) D(at) A(ccus) V(ocative)
  col.2 = number: S(ing) D(ual) P(l)
  col.3 = gender: M(asc) F(em) N(eut)

Adjectives (up to 4 columns)
  cols.1-3 as with Nouns (above)
  col.4 = degree if irregular: C(ompar) S(uperl)

Verbs (up to 5 cols., except Ptcp)
  col.1 = tense:  P(resent) I(mperfect) F(ut) A(or) X(Perfect) Y(Pluperfect)
  col.2 = voice:  A(ctive) M(iddle) P(assive)
  col.3 = mood:   I(ndicative) D(Imperative) S(ubjunct) O(pt) N(Infin) P(Ptcp)
  col.4 = person: 1 2 3
  col.5 = number as Noun col.2
  cols.4-6 for Ptcpl as for Noun (case, number, gender)

UNPACKING THE LXX MORPHOLOGICAL ANALYSIS FILES

As an experiment in compressing the size of files in which the columnar structure produces many blank spaces, the Morphological Analysis files have been "packed" in the following manner:

The original structure of the files designated cols. 1-25 for the text word, 26-36 for the analysis coding, and 37 onward for the dictionary form, in which as many as four separate elements might be recorded if a word contains three prefixal forms (rarely; two is more frequent, e.g. I(/STHMI A)PO KATA). The packed files replace any blank space between the first two fields with the symbol "<", and any space at the end of the second field with ">"; if the final field contains one or more prefixed forms, the blank space between the root form and the first prefix is replaced with "@".
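The packing rule just described can be sketched in Python. This is only an illustration under stated assumptions, not the program actually used to produce the files: the function name pack_record is hypothetical, the root form and its prefixes are assumed to be separated by blanks in the dictionary field, and records beginning with a tilde are assumed to pass through unchanged (as they do when unpacking).

```python
def pack_record(line: str) -> str:
    """Pack one fixed-column record (hypothetical helper, not the original tool).

    Assumed input layout, taken from the description above:
    cols. 1-25 text word, cols. 26-36 analysis, cols. 37 onward dictionary form.
    """
    # Assumption: tilde records and blank lines are copied unchanged.
    if line.startswith("~") or not line.strip():
        return line

    text = line[:25].rstrip()        # cols. 1-25
    analysis = line[25:36].rstrip()  # cols. 26-36
    dictionary = line[36:].strip()   # cols. 37 onward

    # Assumption: the root form is the first blank-delimited element and
    # everything after it is the prefix material.
    root, _, prefixes = dictionary.partition(" ")
    prefixes = prefixes.strip()

    packed = f"{text}<{analysis}>{root}"
    if prefixes:
        packed += "@" + prefixes
    return packed


if __name__ == "__main__":
    record = "E)CANE/TEILEN".ljust(25) + "VAI AAI3S".ljust(11) + "TE/LLW E)K A)NA"
    print(pack_record(record))
```

Blanks inside the analysis field and between successive prefixes are left as they are; only the trailing padding of each field is dropped.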
Examples of records in unpacked and packed forms follow:

Unpacked (text, analysis, dictionary form)     Packed
KAI\            C          KAI/                KAI\<C>KAI/
GA/R            X          GA/R                GA/R<X>GA/R
O(              RA NSM     O(                  O(<RA NSM>O(
AU)TO\N         RD ASM     AU)TO/S             AU)TO\N<RD ASM>AU)TO/S
LE/GOUSA        V1 PAPNSF  LE/GW               LE/GOUSA<V1 PAPNSF>LE/GW
E)CHLEI/FQHSAN  VVI API3P  A)LEI/FW E)K        E)CHLEI/FQHSAN<VVI API3P>A)LEI/FW@E)K
E)CANE/TEILEN   VAI AAI3S  TE/LLW E)K A)NA     E)CANE/TEILEN<VAI AAI3S>TE/LLW@E)K A)NA

A program to unpack the MORPH files is available for IBYCUS and IBM/DOS machines. It reconstructs the column structure so that the text appears in columns 1-25, the analysis in 26-36, the dictionary form from 37 onward, and any prefixes from 56 onward (allowing 6 spaces for each prefix). Its basic algorithm is as follows:

  1. Get each record from the file.
  2. If the record begins with tilde (~), keep it intact.
  3. Otherwise, if the record contains "<" and ">", pad the material before the "<" with blanks to 25 columns, pad the material between "<" and ">" to 11 columns, and eliminate the "<" and ">" symbols.
  4. If the record contains "@", pad the material after ">" and before "@" to 19 columns, eliminate the "@", and retain the remaining material.
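The algorithm above can be sketched in Python as follows. This is a minimal illustration, not the IBYCUS or IBM/DOS program itself: the function name unpack_record is hypothetical, and a record that lacks "<" or ">" is assumed to be passed through unchanged, a case the description does not cover. As in the stated algorithm, the material after "@" is retained as it stands; no attempt is made to re-space individual prefixes into 6-column slots.

```python
def unpack_record(record: str) -> str:
    """Restore the column layout of one packed record (hypothetical helper)."""
    # Records beginning with a tilde are kept intact.
    if record.startswith("~"):
        return record

    text, lt, rest = record.partition("<")
    analysis, gt, dictionary = rest.partition(">")
    if not lt or not gt:
        # Assumption: records without both markers are left alone.
        return record

    # Text word in cols. 1-25, analysis in cols. 26-36.
    out = text.ljust(25) + analysis.ljust(11)

    if "@" in dictionary:
        # Root form padded to 19 columns (cols. 37-55); prefixes start at col. 56.
        root, _, prefixes = dictionary.partition("@")
        out += root.ljust(19) + prefixes
    else:
        out += dictionary
    return out


if __name__ == "__main__":
    for packed in ("O(<RA NSM>O(", "E)CANE/TEILEN<VAI AAI3S>TE/LLW@E)K A)NA"):
        print(unpack_record(packed))
```

Slicing the result at the documented column boundaries (1-25, 26-36, 37-55, 56 onward) recovers the four fields, which gives a quick way to check a conversion.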