Data Is Plural — Structured Archive

A collection of interesting data sets to learn from: 1.7 billion Reddit comments, how America injures itself, and many others!

data
journalism
science

edition position headline text links hattips

2015.10.21 1 Every place name in the United States. Sometimes, bureaucracy creates poetry. Since 1890, the U.S. Board on Geographic Names has been cataloguing, standardizing, and promulgating official names for the places we hike, swim, work, and call home. Along the way, it began publishing Geographic Names Information System (GNIS), a searchable and downloadable database containing all of its domestic nomenclature. In Alaska alone, the database lists names for 167 dams, 303 post offices, 666 glaciers, 2,704 capes, and 9,575 streams. My favorite: Confusion Creek. [h/t @emilymbadger] http://geonames.usgs.gov/index.html http://geonames.usgs.gov/domestic/index.html https://www.google.com/maps/place/Confusion+Creek,+Alaska/@68.4510925,-152.0233116,15.94z/data=!4m2!3m1!1s0x50d80cfac6a29911:0xc46bfa2a83d54866 https://twitter.com/emilymbadger/status/653982851386310656

2015.10.21 2 “There’s finally federal data on low-income college graduation rates—but it’s wrong.” The Hechinger Report casts doubt on the Pell grant graduation numbers contained in the Department of Education’s recently-released College Scorecard. Why the discrepancy? “[W]hile schools are required by law to provide the graduation rates of Pell recipients to any applicants who ask, a loophole protects them from having to report the same figures to the government.” Oof. http://hechingerreport.org/theres-finally-federal-data-on-low-income-college-graduation-rates-but-its-wrong/ https://collegescorecard.ed.gov/data/

2015.10.21 3 What police-related data does your city publish? The Police Open Data Census, created by Code for America fellows in Indianapolis, is tracking “currently available open datasets about police interactions with citizens in the US,” including officer-involved shootings, use of force, and citizen complaints. The census currently covers 36 police departments. Related: The NYPD says it will start tracking all officer use-of-force incidents (not just gunfire) next year, the New York Times reports. https://codeforamerica.github.io/PoliceOpenDataCensus/ http://www.nytimes.com/2015/10/01/nyregion/new-york-police-will-document-virtually-all-instances-of-force.html

2015.10.21 4 How often do Wikipedia editors edit? The Wikimedia Foundation has published a dataset enumerating monthly revision counts for every editor, across all of its wikis. The foundation is asking for help investigating a few perplexing trends. For example: Why has the number of “very active editors” (those with 100+ edits per month) increased while the number of merely “active” editors has plateaued? https://blog.wikimedia.org/2015/09/25/wikipedia-editor-numbers/

2015.10.21 5 Four years of rejected license plates. WNYC, through a freedom-of-information request to the New York DMV, obtained a list of vanity plate approvals and denials from late 2010 to late 2014. Among the denials: “RUBMYDUB,” “S5SS5S5S,” “RFLMAO,” and “CBSNEWS.” (Strangely, “NBC4” was approved. Go figure.) The files and related story were published in August, but the data are timeless. [h/t @veltman] https://github.com/datanews/license-plates http://www.wnyc.org/story/new-yorkers-vanity-license-plates/ https://twitter.com/veltman/status/628972777882652672

2015.10.28 1 Data-shaming the robocallers. If you can’t beat ’em, post spreadsheets about ’em. Earlier this month, the Federal Communications Commission started publishing a dataset of complaints against telemarketers and robocalls. The FCC says the file will be updated weekly. It’s already being put to use: A clever programmer has crammed all the offending numbers into a single phone “contact” so that you can block them all at once. [h/t Shale Craig] https://consumercomplaints.fcc.gov/hc/en-us/articles/205239443-Data-on-Unwanted-Calls https://github.com/shalecraig/telemarketing https://twitter.com/__shale__/status/657423817623506944

2015.10.28 2 The demographics of traffic stops. This weekend, the New York Times published a front-page article on “the disproportionate risk of driving while black.” Among other findings: “officers were more likely to conduct [searches] when the driver was black, even though they consistently found drugs, guns or other contraband more often if the driver was white.” The investigation drew on several statewide traffic-stop datasets that track the race and gender of stopped drivers. The “seven states with the most sweeping reporting requirements,” in order of how easy it seems (to me) to get detailed data: Connecticut, North Carolina, Missouri, Nebraska, Maryland, Illinois, and Rhode Island. http://www.nytimes.com/2015/10/25/us/racial-disparity-traffic-stops-driving-black.html http://ctrp3.ctdata.org/ http://trafficstops.ncdoj.gov/Default.aspx?pageid=2 https://www.ago.mo.gov/home/vehicle-stops-report http://www.ncc.nebraska.gov/statistics/trafficstops/ http://www.goccp.maryland.gov/msac/law-enforcement.php http://www.idot.illinois.gov/transportation-system/local-transportation-partners/law-enforcement/illinois-traffic-stop-study http://www.ri.gov/press/view/23152

2015.10.28 3 Where do Americans spend their days? Most population numbers tell you where people live. But legions of Americans commute for work across city, county, and state lines. The Census Bureau’s Commuter-Adjusted Daytime Population Data accounts for these daily migrations. Manhattan’s (non-tourist) population doubles from 1.5 million to 3 million, by far the largest influx by raw numbers. But Lake Buena Vista, Fla., takes the percentage-growth prize. The city’s entire resident population could fit in two sedans, but its “daytime population” includes 33,000 workers, a not-insubstantial number of them dressed as Mickey Mouse. [h/t Steven Romalewski] https://www.census.gov/hhes/commuting/data/daytimepop.html https://en.wikipedia.org/wiki/Lake_Buena_Vista,_Florida https://twitter.com/SR_spatial/status/656827844128034816

2015.10.28 4 Finally, free access to detailed U.S. import/export data. Prior to October 15th, the Census Bureau’s USA Trade Online tool cost $300/year. No longer. The newly-free dataset covers more than 17,000 commodities, including a category for “magic tricks, practical joke articles; parts and accessories.” [h/t Noah Veltman] https://usatrade.census.gov/ http://www.census.gov/newsroom/press-releases/2015/cb15-tps87.html https://en.wikipedia.org/wiki/Harmonized_Tariff_Schedule_for_the_United_States http://www.census.gov/foreign-trade/statistics/graphs/GOTM/201508/index.html https://twitter.com/veltman

2015.10.28 5 Porn. Sexualitics.org is on a mission: “to contribute to human sexuality understanding through a Big Data approach.” Last year, the site posted detailed metadata on 800,000 adult videos, including titles, descriptions, view counts, and tags. It powers Porngram, an only-kinda-safe-for-work charting tool. http://sexualitics.org/ http://sexualitics.github.io/ http://porngram.sexualitics.org/

2015.11.04 1 Maternity leave policies at hundreds of American companies. The 600+ entries in this searchable, sortable database range from 3M to Amazon to Zynga, and list both paid and unpaid leave. The database, run by the women-in-the-workplace website FairyGodBoss.com, culls from published policies and employee tips. An introductory blog post provides more information. https://fairygodboss.com/maternity-leave-resource-center https://www.fairygodboss.com/ http://blog.fairygodboss.com/2015/10/21/our-maternity-leave-database-is-here/

2015.11.04 2 MoMA, mo’ data. This July, the Museum of Modern Art published a dataset containing 120,000 artworks from its catalog, joining the UK’s Tate, the Smithsonian’s Cooper Hewitt, and other forward-thinking museums. The MoMA data contains the names of the artwork and artist, the dates created and acquired, and the medium — but no images. Related: Artist Jer Thorp encourages you to “perform” the data. Also related: Every museum in the United States. [h/t Nadja Popovich] https://github.com/MuseumofModernArt/collection https://github.com/tategallery/collection https://github.com/cooperhewitt/collection http://www.penn.museum/collections/data.php https://www.rijksmuseum.nl/en/api https://www.brooklynmuseum.org/opencollection/api/ https://medium.com/@blprnt/a-sort-of-joy-1d9d5ff02ac9 https://www.imls.gov/research-evaluation/data-collection/museum-universe-data-file https://twitter.com/popovichn

2015.11.04 3 All licensed firearm dealers since 2010. The Bureau of Alcohol, Tobacco, Firearms, and Explosives publishes a searchable and downloadable licensing database. License-holders fall into eleven categories. Among them: run-of-the-mill dealers, ammunition manufacturers, collectors of “curios and relics,” pawnbrokers, and importers of “destructive devices.” The ATF’s website contains monthly and state-by-state archives. [h/t Marc DaCosta] [Correction, 2015-11-04: There are only nine categories of license-holders. The published ATF data includes only eight of them; it does not include "Collector of Curios and Relics." Thanks to @MikeStucka for flagging this mistake.] https://data.atf.gov/Licensees/Federal-Firearms-Licensee-Listing-2010-to-2015/qg4c-kex6 https://www.atf.gov/firearms/curios-relics https://www.atf.gov/firearms/firearms-guides-importation-verification-firearms-national-firearms-act-definitions-1 https://www.atf.gov/firearms/listing-federal-firearms-licensees-ffls-2015 https://twitter.com/marc_dacosta

2015.11.04 4 One thousand ways to say “dog.” Trans-New Guinea is the world’s third-largest language family. But it’s also among the poorest-studied. TransNewGuinea.org, an online database launched in 2013, is trying to change that. It now contains more than 1,000 New Guinea languages and lists 145,000 word translations — including 1,065 entries for “dog.” It even has an API. A recent PLOS ONE journal article provides additional background and statistics. [h/t Simon J. Greenhill] http://transnewguinea.org/ http://transnewguinea.org/langu
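The rows above follow the column order edition, position, headline, text, links, hattips, but the columns are flattened into a single line. Below is a minimal sketch of how one such row might be split back into fields. It rests on assumptions that do not come from the archive itself: that every row starts with a `YYYY.MM.DD` edition and an integer position, that the prose contains no `http(s)://` URLs, and that hat-tip links are the ones pointing at twitter.com. The names `Entry` and `parse_row` are hypothetical, and the headline is left joined to the text because the flattened row has no reliable delimiter between them.

```python
import re
from dataclasses import dataclass, field

# Assumed row shape: "<YYYY.MM.DD> <position> <prose> <url> <url> ..."
ROW_RE = re.compile(r"^(\d{4}\.\d{2}\.\d{2})\s+(\d+)\s+(.*)$", re.DOTALL)
URL_RE = re.compile(r"https?://\S+")


@dataclass
class Entry:
    edition: str
    position: int
    text: str  # headline and body, still joined
    links: list = field(default_factory=list)
    hattips: list = field(default_factory=list)


def parse_row(row: str) -> Entry:
    """Split one flattened archive row into its fields (heuristic)."""
    match = ROW_RE.match(row.strip())
    if match is None:
        raise ValueError("row does not start with an edition and a position")
    edition, position, rest = match.groups()

    urls = URL_RE.findall(rest)
    # Everything before the first URL is treated as headline plus text.
    first_url = URL_RE.search(rest)
    text = rest[: first_url.start()].strip() if first_url else rest.strip()

    # Assumption: hat-tip links are the twitter.com ones; the rest are sources.
    hattips = [u for u in urls if "twitter.com/" in u]
    links = [u for u in urls if u not in hattips]
    return Entry(edition, int(position), text, links, hattips)


# Shortened sample built from one of the rows above.
sample = (
    "2015.10.28 4 Finally, free access to detailed U.S. import/export data. "
    "[h/t Noah Veltman] https://usatrade.census.gov/ https://twitter.com/veltman"
)
entry = parse_row(sample)
```

A truncated row, such as the last one above, still parses. Its final link is simply kept in whatever cut-off form it appears.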

Info
Tags Data, Journalism, Science
Type Google Sheet
Published 19/04/2024, 23:14:05
