Python Functions, Files, and Dictionaries 1: File and CSV

runestone

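The open function returns a file object.

After the close operation, the file object can no longer be read from or written to.

Prefixing a string literal '...' or "..." with r or R keeps escape sequences unexpanded, so the string holds the characters exactly as written. Such strings are called raw strings (r-strings).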
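As a minimal sketch of the raw-string behavior described above, the following compares a normal literal with a raw one. The variable names are only illustrative.

```python
# Compare a normal string literal with a raw string literal.
normal = "a\nb"   # \n is expanded into one newline character
raw = r"a\nb"     # the backslash and the n stay as two separate characters

print(len(normal))      # 3
print(len(raw))         # 4
print(raw == "a\\nb")   # True: same as escaping the backslash by hand
```

Raw strings are mostly useful where backslashes should be taken literally, such as regular expressions or Windows paths.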

read file
write file
.csv Format
1. If the elements of a list are themselves lists, using the outer list as the sequence in a for statement binds the loop variable to each inner list in turn.
2. course_2_assessment_1

read file

fileref = open("olympics.txt", "r")
print(type(fileref))
# open() only returns a file object; nothing is read until a read method is called.
print("The .read() method returns a string")
contents = fileref.read()
print(contents[:3])
print(type(contents))
print("The .readlines() method returns a list")
fileref.seek(0)  # rewind; after .read() the position is at the end and .readlines() would return []
lines = fileref.readlines()
print(lines[:3])
print(type(lines))
print(r"Printing each line as is keeps its \n; strip() gives the string without it")
for lin in lines[:3]:
    print(lin)
for lin in lines[:3]:
    print(lin.strip())  # strip() removes leading and trailing whitespace, including the newline
fileref.close()

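Specifying the path.

A relative path does not need a leading /.

To count lines you could use count("\n") on the contents, but taking len() of the list returned by readlines() works too.

file.readlines() builds a list whose items are the lines of the text file.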
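The two ways of counting lines mentioned above can be compared in a short sketch. The file name and its contents are made up for illustration, and a temporary directory is used so the example does not touch real files.

```python
import os
import tempfile

# Hypothetical sample data standing in for a real text file.
sample = "first\nsecond\nthird\n"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write(sample)

# Option 1: count newline characters in the whole contents.
with open(path, "r") as f:
    newline_count = f.read().count("\n")

# Option 2: take the length of the list returned by readlines().
with open(path, "r") as f:
    lines = f.readlines()
line_count = len(lines)

print(newline_count, line_count)
```

The two counts agree here because every line ends with "\n". If the last line has no trailing newline, count("\n") comes out one lower than len(readlines()).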

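After one .read(), the file position is at the end of the file, so a second .read() returns an empty string. To read the contents again, open the file again with open() (or rewind with .seek(0)).

Each call to .read() returns a new string object.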

filename = "squared_numbers.txt"
outfile = open(filename, "w")
for number in range(1, 7):
    square = number * number
    outfile.write(str(square) + "\n")
outfile.close()
infile = open(filename, "r")
print("First print(infile.read()):")
print(infile.read())
print(infile.read()[:10])
print("print(infile.read()[:10]) printed nothing.")
print("Once a file has been .read(), the position stays at the end until the file is reopened.")
print("id")
print(id(infile.read()))
print(id(infile.read()[:10]))
print("Even print(infile.read()) prints nothing now")
print(infile.read())
infile.close()

print('After infile = open(filename, "r") again, print(infile.read()[:10]) works')
infile = open(filename, "r")
print("id:" + str(id(infile)) + " the id is different!")
print(infile.read()[:10])
print(infile.read()[:12])  # empty again: the previous read() already reached the end
infile.close()
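Reopening is not the only way to read a file a second time. The sketch below, which assumes a small temporary file with made-up contents, shows that .seek(0) moves the position back to the start so that .read() returns the contents again.

```python
import os
import tempfile

# Write three squares to a temporary file (the file name is illustrative).
path = os.path.join(tempfile.mkdtemp(), "squares.txt")
with open(path, "w") as outfile:
    for number in range(1, 4):
        outfile.write(str(number * number) + "\n")

with open(path, "r") as infile:
    first = infile.read()    # whole contents
    second = infile.read()   # empty string: the position is at the end
    infile.seek(0)           # move the position back to the beginning
    third = infile.read()    # whole contents again

print(repr(first), repr(second), repr(third))
```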

with <create some object that understands context> as <some name>:

open() also creates a new object each time it is called.

with open('mydata.txt', 'r') as md:
    for line in md:
        print("id of md: " + str(id(md)))
        print(line.strip())
        print("id of line: " + str(id(line)) + "\n")
print("No md.close() is needed after the with open('mydata.txt', 'r') as md: block")
print("////////")
md = open('mydata.txt', 'r')
for line in md:
    print("id of md: " + str(id(md)))
    print(line.strip())
    print("id of line: " + str(id(line)) + "\n")
md.close()
print("////////")
with open('mydata.txt', 'r') as md:
    contents = md.read()
    print("id of contents: " + str(id(contents)))
    # md.read() already consumed the whole file, so this loop body never runs
    for line in md:
        print("id of md: " + str(id(md)))
        print(line.strip())
        print("id of line: " + str(id(line)) + "\n")

When a text file object is used as the sequence in a for statement, the loop variable is bound to one line at a time.

For a file object,

for line in <name assigned to a file>:

splits the file at each \n and hands over the lines one by one!

with open('mydata.txt', 'r') as md:
    for line in md:
        print(line.strip())
        print(type(md))
        print(line.count("\n"))
        print(str(type(line)) + "\n")
    contents = md.read()  # empty string: the loop already consumed the file
    print(contents)

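Iterating over the file object directly is probably most useful when reading every line at once with .readlines() would take too long or use too much memory, because the loop holds only one line at a time.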
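A small sketch of line-by-line processing without building a list first is shown below. The file contents are made up, and the "longest line" task is only an illustrative example of work done per line.

```python
import os
import tempfile

# Hypothetical sample file.
path = os.path.join(tempfile.mkdtemp(), "mydata_sample.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta gamma\nxy\n")

line_count = 0
longest = ""
with open(path, "r") as md:
    for line in md:                  # one line at a time, no full list in memory
        line_count += 1
        text = line.rstrip("\n")     # drop only the trailing newline
        if len(text) > len(longest):
            longest = text

print(line_count, longest)
```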

write file

To write a tuple-like row whose fields contain double quotes into a CSV file, enclose the whole string literal in 'single quotes'.

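newDoc = "newDoc.csv"

ref = open(newDoc, "w")
# ref.write(1)  # TypeError: write() accepts only a string
ref.write("one")
ref.write("\n" + "2")
ref.write("\n" + "(3,4)")
# ref.write("\n" + "3", "4", 5)  # TypeError: write() takes exactly one argument
ref.write("\n" + '"3", "4", 5')  # single quotes outside so the double quotes are written as is
ref.close()

ref = open(newDoc, "r")
print(ref.read())
ref.close()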

Insert "\n" wherever a line break is needed; write() does not add one automatically.
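As a minimal sketch of the point about "\n", the following writes one row per line inside a with block, so no explicit close() is needed. The file name and the rows are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "rows_sample.csv")
rows = ["one", "2", "(3,4)"]

with open(path, "w") as ref:
    for row in rows:
        ref.write(row + "\n")   # write() adds no newline on its own

was_closed = ref.closed         # the with block closed the file

with open(path, "r") as ref:
    written = ref.read()

print(repr(written))
```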

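When values returned from functions go into a CSV file, does each one have to be written separately? write() takes exactly one string, so each value must be converted with str() and either written on its own or combined into a single string first.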

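def p1(x):
    return x*2
p1 = p1(2)  # p1 now names the result 4, not the function

def p2(x):
    return x*3
p2 = p2(2)  # p2 now names the result 6

newDoc = "newDoc.csv"
ref = open(newDoc, "w")
# ref.write("\n" + p1)  # TypeError: cannot concatenate str and int
ref.write(str(p1))  # write() needs a string, so convert the int first
ref.write("\n")
# ref.write(p1, p2)  # TypeError: write() takes exactly one argument
ref.write(str(p1) + "," + str(p2))
ref.write("\n")
ref.write('p1, p2')  # writes the literal text p1, p2, not the values
ref.write("\n")
ref.write(str(p1 + p2))  # the sum 10, converted to a string
ref.write("\n")
ref.write("\n")
ref.close()

ref = open(newDoc, "r")
print(ref.read())
ref.close()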
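One possible way to avoid a separate write() per value is to collect the results and join them into one row. The sketch below uses hypothetical helper names (double, triple) and a temporary file.

```python
import os
import tempfile

def double(x):
    return x * 2

def triple(x):
    return x * 3

# Collect the return values, convert each to str, and join with commas.
values = [double(2), triple(2)]
row = ",".join(str(v) for v in values)

path = os.path.join(tempfile.mkdtemp(), "values_sample.csv")
with open(path, "w") as ref:
    ref.write(row + "\n")       # one write() call for the whole row

with open(path, "r") as ref:
    saved = ref.read()

print(repr(saved))
```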

The .close() method flushes any buffered output to disk, so the file is only reliably saved once it has been closed!

.csv Format

Give files in CSV format the extension .csv.

The extension is what the OS uses to guess which application should open the file.

It’s a good idea to follow the conventions. If a file contains CSV formatted data, name it with the extension .csv, not .txt.

A file can still contain CSV-formatted data even if its extension is not .csv.

Some data has to be formatted before it is written.

row_string = '{},{},{}'.format(olympian[0], olympian[1], olympian[2])

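At first this looked like discouraged coding style, but the textbook's criticism is aimed at the string-concatenation version, not at .format: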

The equivalent string concatenation would be very hard to read. An alternative, also clear way to do it would be with the .join method: row_string = ','.join([olympian[0], str(olympian[1]), olympian[2]]).
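The .format and .join approaches can be placed side by side in a short sketch. The olympian tuples below are made-up sample data, and the file name is illustrative.

```python
import os
import tempfile

# Made-up sample rows: (name, age, sport).
olympians = [("Athlete A", 31, "Skiing"), ("Athlete B", 30, "Sailing")]

path = os.path.join(tempfile.mkdtemp(), "olympians_sample.csv")
with open(path, "w") as outfile:
    outfile.write("Name,Age,Sport\n")   # header row
    for olympian in olympians:
        by_format = "{},{},{}".format(olympian[0], olympian[1], olympian[2])
        by_join = ",".join([olympian[0], str(olympian[1]), olympian[2]])
        same = by_format == by_join     # both build the same row string
        outfile.write(by_join + "\n")

# Read the file back and split each data row on commas.
with open(path, "r") as infile:
    data_lines = infile.readlines()[1:]  # skip the header
parsed = [line.strip().split(",") for line in data_lines]

print(parsed)
```

Note that .join needs every item to be a string, which is why the age is wrapped in str(), while .format converts it automatically. Neither approach quotes fields that themselves contain commas; the standard csv module handles that case.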