The two datasets "mushroom.training" and "mushroom.test" were created from the
"Mushroom Database". The original dataset had one further attribute (stalk-root,
listed as attribute 22 below) that contained missing values; it has been removed.
The training dataset contains 7423 examples and the test dataset 701 examples
(8124 in total). The first attribute is the class of each example ("e" or "p");
the remaining 21 attributes are categorical (nominal).

The description of the database that follows is taken from the UCI Knowledge
Discovery in Databases Archive.
http://kdd.ics.uci.edu/

====================================================================================================================

1. Title: Mushroom Database

2. Sources:
    (a) Mushroom records drawn from The Audubon Society Field Guide to North
        American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred
        A. Knopf
    (b) Donor: Jeff Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu)
    (c) Date: 27 April 1987

3. Past Usage:
    1. Schlimmer,J.S. (1987). Concept Acquisition Through Representational
       Adjustment (Technical Report 87-19). Doctoral dissertation, Department
       of Information and Computer Science, University of California, Irvine.
       --- STAGGER: asymptoted to 95% classification accuracy after reviewing
           1000 instances.
    2. Iba,W., Wogulis,J., & Langley,P. (1988). Trading off Simplicity
       and Coverage in Incremental Concept Learning. In Proceedings of
       the 5th International Conference on Machine Learning, 73-79.
       Ann Arbor, Michigan: Morgan Kaufmann.
       -- approximately the same results with their HILLARY algorithm

4. Relevant Information:
    This data set includes descriptions of hypothetical samples
    corresponding to 23 species of gilled mushrooms in the Agaricus and
    Lepiota Family (pp. 500-525). Each species is identified as
    definitely edible, definitely poisonous, or of unknown edibility and
    not recommended. This latter class was combined with the poisonous
    one. The Guide clearly states that there is no simple rule for
    determining the edibility of a mushroom; no rule like "leaflets
    three, let it be" for Poisonous Oak and Ivy.

5. Number of Instances: 8124

6. Number of Attributes: 22 (all nominally valued)

7. Attribute Information: (classes: edible=e, poisonous=p)
     1. cap-shape:                bell=b,conical=c,convex=x,flat=f,
                                  knobbed=k,sunken=s
     2. cap-surface:              fibrous=f,grooves=g,scaly=y,smooth=s
     3. cap-color:                brown=n,buff=b,cinnamon=c,gray=g,green=r,
                                  pink=p,purple=u,red=e,white=w,yellow=y
     4. bruises?:                 bruises=t,no=f
     5. odor:                     almond=a,anise=l,creosote=c,fishy=y,foul=f,
                                  musty=m,none=n,pungent=p,spicy=s
     6. gill-attachment:          attached=a,descending=d,free=f,notched=n
     7. gill-spacing:             close=c,crowded=w,distant=d
     8. gill-size:                broad=b,narrow=n
     9. gill-color:               black=k,brown=n,buff=b,chocolate=h,gray=g,
                                  green=r,orange=o,pink=p,purple=u,red=e,
                                  white=w,yellow=y
    10. stalk-shape:              enlarging=e,tapering=t
    11. stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
    12. stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
    13. stalk-color-above-ring:   brown=n,buff=b,cinnamon=c,gray=g,orange=o,
                                  pink=p,red=e,white=w,yellow=y
    14. stalk-color-below-ring:   brown=n,buff=b,cinnamon=c,gray=g,orange=o,
                                  pink=p,red=e,white=w,yellow=y
    15. veil-type:                partial=p,universal=u
    16. veil-color:               brown=n,orange=o,white=w,yellow=y
    17. ring-number:              none=n,one=o,two=t
    18. ring-type:                cobwebby=c,evanescent=e,flaring=f,large=l,
                                  none=n,pendant=p,sheathing=s,zone=z
    19. spore-print-color:        black=k,brown=n,buff=b,chocolate=h,green=r,
                                  orange=o,purple=u,white=w,yellow=y
    20. population:               abundant=a,clustered=c,numerous=n,
                                  scattered=s,several=v,solitary=y
    21. habitat:                  grasses=g,leaves=l,meadows=m,paths=p,
                                  urban=u,waste=w,woods=d
    22. stalk-root:               bulbous=b,club=c,cup=u,equal=e,
                                  rhizomorphs=z,rooted=r,missing=?

8. Missing Attribute Values: 2480 of them (denoted by "?"), all for
   attribute #22, which has been removed.

9. Class Distribution for training set:
    --    edible: 3851 (51.9%)
    -- poisonous: 3572 (48.1%)
    --     total: 7423 instances
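
The record layout described above (class label first, then the 21 remaining
attributes) can be sketched in Python as follows. This is only an illustrative
sketch under stated assumptions: it assumes each record is a single line of
comma-separated one-letter codes and that the columns follow the order of
attributes 1-21 in section 7; neither is confirmed by this description. The
names parse_record and class_distribution are hypothetical helpers, not part
of any distributed tool.

```python
from collections import Counter

# Attribute names for attributes 1-21 of section 7 (stalk-root removed).
# ASSUMPTION: the data files store the columns in this order.
ATTRIBUTES = [
    "cap-shape", "cap-surface", "cap-color", "bruises?", "odor",
    "gill-attachment", "gill-spacing", "gill-size", "gill-color",
    "stalk-shape", "stalk-surface-above-ring", "stalk-surface-below-ring",
    "stalk-color-above-ring", "stalk-color-below-ring", "veil-type",
    "veil-color", "ring-number", "ring-type", "spore-print-color",
    "population", "habitat",
]

CLASS_NAMES = {"e": "edible", "p": "poisonous"}


def parse_record(line, sep=","):
    """Split one record into (class label, {attribute name: code}).

    ASSUMPTION: fields are separated by `sep` (a comma by default).
    """
    fields = [field.strip() for field in line.strip().split(sep)]
    if len(fields) != 1 + len(ATTRIBUTES):
        raise ValueError(
            "expected %d fields, got %d" % (1 + len(ATTRIBUTES), len(fields))
        )
    label, values = fields[0], fields[1:]
    if label not in CLASS_NAMES:
        raise ValueError("unknown class label: %r" % label)
    return label, dict(zip(ATTRIBUTES, values))


def class_distribution(labels):
    """Map each class label to (count, percentage rounded to one decimal)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: (count, round(100.0 * count / total, 1))
        for label, count in counts.items()
    }
```

Applying parse_record to every line of "mushroom.training" and passing the
resulting labels to class_distribution recomputes the per-class counts and
percentages of section 9 directly from the data, which is a quick way to
check that a file was read with the intended separator and column order.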