Wikipedia has 4,362,397 articles in English. But how many of those are seriously encyclopedic, and what are the most important articles?
We’ve been looking closely at Wikipedia for an upcoming app, and we wanted to know which articles matter most. We calculated an importance score for every article based on five factors: how richly linked the article is within Wikipedia (the number and quality of links to the page), how many languages it has been translated into, the brevity of its title, how popular it is (web hits), and how many citations or references it has (scholarliness).
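As a rough illustration of how the five factors above could be combined, here is a minimal sketch of a weighted score. It is not our actual formula: the `Article` fields, the `WEIGHTS` values, and the log scaling are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    link_score: float  # assumed: a PageRank-style value in [0, 1]
    languages: int     # number of language editions
    page_views: int    # web hits over some fixed period
    citations: int     # number of references in the article


# Hypothetical weights, for illustration only.
WEIGHTS = {
    "links": 0.40,
    "languages": 0.25,
    "brevity": 0.10,
    "views": 0.15,
    "citations": 0.10,
}


def importance(article: Article, weights: dict = WEIGHTS) -> float:
    """Combine the five factors into a single weighted score."""
    features = {
        "links": article.link_score,
        # log1p dampens very large counts so one factor cannot dominate
        "languages": math.log1p(article.languages),
        "brevity": 1.0 / max(len(article.title), 1),  # shorter title scores higher
        "views": math.log1p(article.page_views),
        "citations": math.log1p(article.citations),
    }
    return sum(weights[name] * value for name, value in features.items())


def rank(articles: list) -> list:
    """Return the articles sorted from most to least important."""
    return sorted(articles, key=importance, reverse=True)
```

Calling `rank(articles)` on a list of `Article` records returns them in descending order of score. In practice each factor would need to be normalized to a comparable range before the weights mean much.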
The following are our results. The ranking is arbitrary but interesting, so we wanted to share it:
Top 100 English Wikipedia articles:
France, Germany, Canada, Australia, England, United_States, China, Japan, Russia, London, Italy, India, Animal, Poland, Brazil, Iran, Spain, California, Romania, Europe, Mexico, Sweden, Scotland, Switzerland, Netherlands, Turkey, Israel, Paris, Philippines, Pakistan, Norway, United_Kingdom, Insect, Indonesia, Denmark, Greece, Arthropod, Belgium, Chicago, Syria, Texas, Argentina, Marriage, Singapore, Egypt, Malaysia, Austria, Ukraine, Taiwan, Virginia, Islam, Wales, Finland, Florida, Ireland, Philadelphia, Portugal, Rome, Azerbaijan, Afghanistan, Latin, Bird, Boston, Pennsylvania, YouTube, Hungary, Serbia, Vietnam, Berlin, Plant, Quebec, Buddhism, Croatia, Massachusetts, Christianity, Bulgaria, World_War_II, Thailand, Facebook, Protein, Earth, Africa, Chile, Village, Species, Iraq, Colombia, Burma, Slovenia, Toronto, Moscow, Cuba, Mathematics, BBC, Montreal, Fungus, Peru, Chordate, Estonia
Jesus, Jews, Nigeria, Lepidoptera, Ontario, Slavery, Ohio, Sydney, Illinois, Napoleon, Basketball, Melbourne, Maryland, Internet, Human, Tokyo, Jazz, Lebanon, Mumbai, Nepal, Istanbul, Bangladesh, Agriculture, Google, Asia, Seattle, Hawaii, Beijing, Warsaw, Iceland, Athens, Philosophy, Venezuela, Atlanta, Michigan, Jerusalem, English_language, Detroit, Cyprus, Guitar, Ethiopia, Vienna, NASA, Kenya, Mollusca, Morocco, Minnesota, Cricket, Association_football, Hinduism, Slovakia, Oxygen, Amsterdam, Bacteria, Algeria, Enzyme, Manhattan, Microsoft, Prague, Alaska, Edinburgh, Television, Belarus, Judaism, Milan, Kerala, Latvia, Vancouver, Mammal, Census, Tennis, DNA, Madrid, Economics, New_York_City, Houston, Oregon, New_Zealand, Baseball, Cancer, Copenhagen, Moon, Barcelona, Dublin, NATO, Manchester, Armenia, Wisconsin, Lithuania, Liverpool, Protestantism, Gene, Madagascar, Indiana, Ecuador, Muhammad, Gold, Sun, Law, Alabama
Hangul, Renaissance, Nazism, Physics, Linux, Bible, Budapest, Water, Hydrogen, Albania, Malta, Baltimore, City, Science, Louisiana, Colorado, Birmingham, Soviet_Union, Antarctica, Stockholm, Jordan, World_War_I, Uruguay, Evolution, HIV/AIDS, Jamaica, Singing, Communism, Somalia, Glasgow, Education, Tanzania, Bolivia, Film, Arizona, Pittsburgh, Kentucky, Libya, Luxembourg, Missouri, Wikipedia, Connecticut, Tuberculosis, Ghana, Euro, Kolkata, Sociology, Alberta, Psychology, Twitter, Novel, Sanskrit, Oklahoma, Zimbabwe, Socialism, Shanghai, Kazakhstan, Aristotle, Anime, UNESCO, Dallas, Religion, Dubai, Dog, Ottawa, Mars, Yemen, Venice, Hamburg, Sicily, South_Africa, Greenland, Delhi, Copper, Asteroid, Biology, Quran, Fish, Los_Angeles, Rice, Munich, Seoul, Catholic_Church, CBS, Watt, Chennai, Miami, Cambodia, Archaeology, Actor, Tennessee, Belgrade, Tunisia, New_York, Atheism, Pope, Christmas, Cameroon, Genus, Vermont
Computer, Caribbean, Brooklyn, European_Union, Democracy, Oslo, Utah, DVD, Iron, Bangkok, Florence, Ecology, Aluminium, History, Frog, Music, Moldova, Chemistry, Horse, Language, God, Sudan, Mongolia, Iowa, Uganda, Denver, Austria-Hungary, Lisbon, Automobile, Qatar, Jakarta, Naples, Nevada, Maize, Panama, Fascism, Maine, Kuwait, Arkansas, Cat, Malaria, Haiti, Medicine, Augustus, Star, Kiev, Dinosaur, Hindi, Beetle, Mississippi, Newspaper, San_Francisco, Lutheranism, Sugar, Amphibian, Moth, Brussels, Damascus, Muslim, Album, Cleveland, Piano, Bahrain, Midfielder, Reptile, Eminem, Nicaragua, Cairo, Hong_Kong, Plato, Korea, Germans, Culture, Maharashtra, IBM, South_Korea, Bristol, Petroleum, Homosexuality, NBC, Minneapolis, Macau, Guatemala, Angola, Monaco, Uzbekistan, Manitoba, Manila, Bavaria, Karnataka, United_Nations, Astronomy, Tree, River, Namibia, Belfast, Kansas, Spanish_language, Poetry, Geneva
University, Americas, Frankfurt, Laos, Charlemagne, Electron, Al-Qaeda, Population, Queensland, Virus, Bangalore, Brisbane, Engineering, Blues, Wheat, Submarine, Hollywood, Barack_Obama, Calgary, Cornwall, Sri_Lanka, IPhone, Poverty, Cologne, Blog, Chess, Atom, Steel, Scandinavia, Cardiff, Snake, Shiva, Helsinki, Carbon, Rock_music, Globalization, Zinc, Suicide, Prussia, Mali, Catholicism, Roman_Empire, Fruit, Linguistics, Manga, Fiji, Middle_Ages, Eukaryote, Radio, Brain, Tehran, Canberra, Edmonton, Milk, Coal, Perth, Alps, Liberia, Stroke, Kosovo, Coffee, Anthropology, Cincinnati, Theology, Municipality, Lion, Pneumonia, Crusades, Hertz, Government, Catalonia, Montenegro, Capitalism, Milwaukee, Cattle, Honduras, Wyoming, North_America, Mauritius, French_language, Oman, Food, Electricity, Bucharest, Volleyball, Vikings, Christian, Auckland, Sheep, Lawyer, Liberalism, Telecommunication, Tourism, Ethanol, Elephant, Gujarat, Winnipeg, Kyrgyzstan, Gibraltar, Earthquake
Volcano, Paraguay, Feminism, Turin, Sculpture, MTV, Lake, Senegal, Freemasonry, Painting, Butterfly, Beirut, Saskatchewan, Jupiter, Bhutan, Boxing, Advertising, Silver, Marxism, HIV, Adelaide, Siberia, Marseille, Czechoslovakia, Ottoman_Empire, Brunei, Nebraska, Karachi, Gastropoda, Golf, Urdu, Idaho, Constantinople, Forest, Wine, Mesopotamia, Theatre, Endemism, Baghdad, Oxford, Technology, Nitrogen, Leeds, Anatolia, Delaware, War, Palestine, Belize, Sony, Bollywood, Statistics, Tasmania, Schizophrenia, Johannesburg, Art, Terrorism, Suriname, Stuttgart, Mozambique, Pregnancy, Lead, Racism, Intel, Wii, Toyota, Potato, Vietnam_War, Temperature, Geology, American_Civil_War, Thessaloniki, Greeks, Opera, Biodiversity, Guam, Bermuda, Zambia, Photography, Beer, Extinction, Czech_Republic, Spider, Saudi_Arabia, Balkans, American_football, Rihanna, Barbados, Sport, Desert, Ultraviolet, Cambridge, Anarchism, Email, Baptism, Antisemitism, Java, Kent, Indianapolis, German_language, Politics
Mecca, Drama, Jainism, Sufism, Moses, Metallica, Tibet, Sheffield, Ecosystem, Taliban, Metabolism, Conservatism, Batman, Algorithm, Crete, Cocaine, Alcohol, New_Jersey, Planet, Celts, Zagreb, Honolulu, Coca-Cola, Lyon, Mountain, Venus, Vertebrate, Abortion, Bat, Violin, Romanticism, Maldives, Sofia, Yorkshire, Superman, Honda, Nintendo, Havana, Meat, Anglicanism, Republic, Inflation, Guyana, Ammonia, Jay-Z, Geography, Fossil, Copyright, Neolithic, Sulfur, Sharia, Energy, Helicopter, Mineral, Guangzhou, Genetics, Blood, Ship, Obesity, Diamond, Cold_War, Smallpox, Osaka, Bishop, Yahoo!, Yugoslavia, Chad, Library, Physician, Bratislava, Tajikistan, Andalusia, Asphalt, Ethics, Red, Methodism, HBO, Lima, Professor, Town, Prostitution, Apple, Writer, Puerto_Rico, Blue, Tax, Taoism, Liver, CNN, Time, Sardinia, HTML, Myspace, Architecture, Hydroelectricity, Taipei, Potassium, William_Shakespeare, George_Washington, Pinyin
Uranium, Riga, Hypertension, Ljubljana, Cotton, Bihar, Wiki, Wellington, Calcium, X-ray, ITunes, Soil, Elizabeth_II, Quakers, Macintosh, Mayor, Honey, Flower, Alcoholism, Satire, Country, Assam, Lancashire, Walmart, Soybean, Himalayas, Concrete, Asthma, Mining, Antwerp, Lahore, Baku, Gospel, Montevideo, Feudalism, Castle, Allmusic, WWE, Genoa, Police, Calvinism, Yoga, Primate, Alexandria, Saturn, Eritrea, Saint_Petersburg, Krishna, Homer, Lesbian, Barley, Dresden, Antibacterial, Logic, Baptists, Turkmenistan, Ant, Mitochondrion, Rape, Strasbourg, Leipzig, Judo, Kidney, Bali, Tiger, Nationalism, Mythology, Heart, Disease, Botswana, Seville, Dhaka, Salt, Insurance, Algae, Michael_Jackson, Malayalam, BMW, Unicode, Sodium, Tobacco, Satellite, Oak, Patent, Metro-Goldwyn-Mayer, Banana, Harvard_University, Bank, Rapping, IPad, PHP, Byzantine_Empire, Organism, Vilnius, Mosque, Santiago, Sparta, Marketing, Mahabharata, Slavs
Synthesizer, Transylvania, Talmud, Book, Nokia, Malawi, French_Revolution, Magnesium, Glacier, Rajasthan, Danube, Constitution, Cher, Hewlett-Packard, Cheese, Tea, Crustacean, Liechtenstein, Dorset, Software, Agnosticism, Photosynthesis, Northern_Ireland, Anatomy, Flowering_plant, Nile, Guinea, Infrared, Oceania, Helium, Gothenburg, Rotterdam, Sarajevo, Wi-Fi, North_Korea, Ronald_Reagan, Immigration, Friends, Easter, Apollo, Glass, Goa, Sex, Queens, Cholera, Geometry, Plastic, Ocean, Muscle, Reggae, Microsoft_Windows, FIFA, Andorra, Russians, Tallinn, Autism, EMI, Gravitation, Smartphone, Shark, Pornography, Olympic_Games, Tram, Tornado, York, Xinjiang, Website, Vegetarianism, Influenza, Ancient_Rome, UEFA, Limestone, Database, Sea, Leaf, Zoroastrianism, Universe, Motorcycle, Politician, Museum, Chromosome, Trinity, Samoa, Torah, Hezbollah, Bologna, Bill_Clinton, Death, Rhine, Deforestation, Nickel, Romanization, Vagina, Abraham_Lincoln, Metal, Eucharist, Burundi, Southampton, Akbar, Thermodynamics
No ranking is perfect, and importance is subjective. Some people will want more asteroids or car models; others will want more football players or music albums. However, the above listing is relatively stable: if we adjust the relative weights of the various factors, the articles reshuffle a little, but the list looks basically the same.
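One way to put a number on that stability is to jitter the weights at random and measure how much of the top of the list survives. The sketch below assumes a hypothetical `features` table that maps each title to already-normalized factor values. The function names, the 20% jitter, and the trial count are all illustrative choices, not the procedure we used.

```python
import random


def top_k(features: dict, weights: dict, k: int) -> set:
    """Return the titles of the k highest-scoring articles."""
    scores = {
        title: sum(weights[name] * value for name, value in factors.items())
        for title, factors in features.items()
    }
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def mean_overlap(features: dict, weights: dict, k: int,
                 trials: int = 100, jitter: float = 0.2, seed: int = 0) -> float:
    """Average fraction of the baseline top-k that survives random reweighting."""
    rng = random.Random(seed)
    baseline = top_k(features, weights, k)
    total = 0.0
    for _ in range(trials):
        # Scale each weight by a random factor in [1 - jitter, 1 + jitter].
        perturbed = {
            name: w * rng.uniform(1 - jitter, 1 + jitter)
            for name, w in weights.items()
        }
        total += len(baseline & top_k(features, perturbed, k)) / k
    return total / trials
```

A result close to 1.0 means the top of the list barely changes when the weights move, which is what "relatively stable" describes here.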
Another side effect of ranking Wikipedia articles is that we can evaluate the signal-to-noise ratio. Very loosely speaking, we believe that approximately half a million Wikipedia articles are solid encyclopedic topics. The remaining 3.8 million tend to cover geographical locations (e.g., a town in Siberia), popular culture artifacts (music albums, old TV shows), lesser-known companies, politicians, sports figures, and other people. Often the lowest-ranking articles were wikispam and had already been removed from Wikipedia by dutiful Wikipedia editors.