File: count_us_census.py

chromium 138.0.7204.183-1
#!/usr/bin/python
import sys
import codecs

def usage():
    return '''
This script converts surname/name data from the US 1990 census into a format zxcvbn
recognizes. To use, first obtain the census files:

http://www2.census.gov/topics/genealogy/1990surnames

Download dist.all.last, dist.female.first, and dist.male.first.

Then run:

%s dist.all.last     ../data/surnames.txt
%s dist.female.first ../data/female_names.txt
%s dist.male.first   ../data/male_names.txt

for each file.
''' % ((sys.argv[0],) * 3)

def main(input_filename, output_filename):
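    # Keep only the first whitespace-separated field (the name) of each
    # non-blank line, lowercased, and write one name per output line.
    # The input is opened first so a missing input file does not leave
    # behind an empty output file.
    with codecs.open(input_filename, 'r', 'utf8') as src, \
         codecs.open(output_filename, 'w', 'utf8') as dst:
        for line in src:
            if line.strip():
                name = line.split()[0].lower()
                dst.write(name + '\n')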

if __name__ == '__main__':
    if len(sys.argv) != 3:
        print(usage())
    else:
        main(*sys.argv[1:])
    sys.exit(0)
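
Worked example of the conversion that main() performs, run on in-memory text rather than files. This is a sketch only: extract_names is a hypothetical helper that is not part of the script, and the sample rows and their column layout are illustrative assumptions, not data copied from the census files.

```python
import io


def extract_names(lines):
    """Yield the lowercased first field of every non-blank line.

    Hypothetical helper mirroring the loop body of main(); it is not
    part of the original script.
    """
    for line in lines:
        if line.strip():
            yield line.split()[0].lower()


# Illustrative rows only (assumed layout: a name followed by numeric columns).
sample = io.StringIO(
    u"SMITH          1.006  1.006      1\n"
    u"\n"
    u"JOHNSON        0.810  1.816      2\n"
)

names = list(extract_names(sample))
print(names)
```

The blank line is skipped and only the first column survives, so the output file would contain one lowercase name per line.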