Working with CSV in Ruby: A Tutorial

Juanito Fatas

CSV stands for Character-Separated Values, also known as Comma-Separated Values.

The delimiter can be any character, though; it does not have to be a comma.

Ruby already ships with a library for handling CSV; require 'csv' is all you need.

The official CSV documentation, where you can look up every available method:

Suppose we have a CSV file, students.csv, with the following content:

name   attendance  GPA  comment
Bob    144         4.0  Great at everything!
Alice  124         3.9  Great, very talented!
Steve  119         3.5  He is "good", but could be better
David  100         3.0  He needs to attend more classes!

How is this table written as CSV? (The header row is left out here.)

Bob,144,4.0,Great at everything!
Alice,124,3.9,"Great, very talented!"
Steve,119,3.5,"He is ""good"", but could be better"
David,100,3.0,He needs to attend more classes!

Two rows in the comment column deserve attention:

  • Alice's comment is wrapped in a pair of double quotes because it contains a comma.

  • Steve's comment is wrapped in double quotes, and each literal double quote inside it is written as two double quotes.
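To see these two quoting rules in action, here is a minimal sketch using CSV.generate_line and CSV.parse_line from the csv library. The variable names are only for illustration.

```ruby
require 'csv'

# Alice's comment contains a comma; Steve's contains double quotes.
alice = ['Alice', '124', '3.9', 'Great, very talented!']
steve = ['Steve', '119', '3.5', 'He is "good", but could be better']

# generate_line quotes a field only when it needs to.
alice_line = CSV.generate_line(alice)
steve_line = CSV.generate_line(steve)

puts alice_line
puts steve_line

# Parsing the generated lines gives back the original fields.
p CSV.parse_line(alice_line) == alice
p CSV.parse_line(steve_line) == steve
```

Fields that need no quoting, such as the name and the numbers, are written as-is.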

To process a CSV file, you first have to read it in. There are two ways to do that:

  • Read the whole file at once and keep it in memory.

  • Read one row at a time.

Whichever way you read it, Ruby treats each row as an array. Open Pry or IRB and try it.

> require 'csv'
> students = CSV.read 'students.csv'
=> [["Bob", "144", "4.0", "Great at everything!"],
    ["Alice", "124", "3.9", "Great, very talented!"],
    ["Steve", "119", "3.5", "He is \"good\", but could be better"],
    ["David", "100", "3.0", "He needs to attend more classes!"]]

> CSV.foreach('students.csv') do |row|
>   p row
> end
["Bob", "144", "4.0", "Great at everything!"]
["Alice", "124", "3.9", "Great, very talented!"]
["Steve", "119", "3.5", "He is \"good\", but could be better"]
["David", "100", "3.0", "He needs to attend more classes!"]
=> nil
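The row-by-row style pays off when you do not need every row. The sketch below assumes a small sample file written to a temporary directory (the path and data are made up for the example) and stops reading as soon as a matching row is found.

```ruby
require 'csv'
require 'tmpdir'

found = nil
Dir.mktmpdir do |dir|
  # Hypothetical sample file, created only so the example is self-contained.
  path = File.join(dir, 'students.csv')
  File.write(path, <<~CSV_TEXT)
    Bob,144,4.0,Great at everything!
    Alice,124,3.9,"Great, very talented!"
    David,100,3.0,He needs to attend more classes!
  CSV_TEXT

  # Without a block, CSV.foreach returns an Enumerator, so Enumerable
  # methods such as find pull rows one at a time and stop early.
  found = CSV.foreach(path).find { |row| row[0] == 'Alice' }
end

p found
```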

Parsing a string:

> CSV.parse "Ruby,1995,Rails,2007"
=> [["Ruby", "1995", "Rails", "2007"]]

Parsing a string with a block:

> CSV.parse("Ruby,1995,Rails,2007") { |row| p row }
["Ruby", "1995", "Rails", "2007"]
=> nil

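As the examples show, passing only a string behaves like CSV.read (all rows are returned at once), while also passing a block behaves like CSV.foreach (rows are yielded one at a time).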

In fact,

CSV.read('students.csv')

# is equivalent to

CSV.parse(File.read('students.csv'))
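A quick way to check this equivalence yourself is sketched below. It assumes a throwaway file in a temporary directory; the file contents are sample data, not part of the original students.csv.

```ruby
require 'csv'
require 'tmpdir'

from_read = from_parse = nil
Dir.mktmpdir do |dir|
  # Hypothetical sample file used only for this comparison.
  path = File.join(dir, 'students.csv')
  File.write(path, <<~CSV_TEXT)
    Bob,144,4.0,Great at everything!
    Steve,119,3.5,"He is ""good"", but could be better"
  CSV_TEXT

  from_read  = CSV.read(path)
  from_parse = CSV.parse(File.read(path))
end

# Both calls produce the same array of rows.
p from_read == from_parse
```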

What if the fields are not separated by commas?

# students_col.csv

Ruby;1995;First appear
Rails;2007;2.0
Perl6;2048;"Who;Knows;When"

That is easy: specify the delimiter with the :col_sep option when reading:

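# Pass options as keyword arguments; a braced hash raises ArgumentError on Ruby 3.0+.
new_students = CSV.read('students_col.csv', col_sep: ';')
# The same result, using the row-by-row reader:
new_students = CSV.foreach('students_col.csv', col_sep: ';').to_a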
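The same option works with CSV.parse, which makes it easy to try without a file. This sketch parses the semicolon-separated data from above as a string, plus a made-up tab-separated line to show that any single delimiter works.

```ruby
require 'csv'

data = <<~CSV_TEXT
  Ruby;1995;First appear
  Rails;2007;2.0
  Perl6;2048;"Who;Knows;When"
CSV_TEXT

rows = CSV.parse(data, col_sep: ';')

# The quoted field keeps its semicolons instead of being split.
p rows.last

# A tab works the same way (sample data, made up for the example).
tab_rows = CSV.parse("Ruby\t1995\n", col_sep: "\t")
p tab_rows
```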

Besides :col_sep, the other available options are listed at: http://ruby-doc.org/stdlib-2.1.2/libdoc/csv/rdoc/CSV.html#method-c-new

When CSV reads data in, every field comes back as a string, stored in an array. Take the earlier example:

> str_arr = CSV.parse "Ruby,1995,Rails,2007"
> str_arr[0][1].class
=> String

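Now suppose some fields are numbers and we need to do arithmetic on a few of them. Arithmetic does not work on strings, so what do we do? Pass the converters: :numeric option: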

> numeric_arr = CSV.parse "Ruby,1995,Rails,2007", converters: :numeric
> numeric_arr[0][1].class
# Integer on Ruby 2.4 and later; older versions report Fixnum.
=> Integer
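The :numeric converter handles both integers and floats and leaves everything else alone. A minimal sketch, reusing Bob's row from students.csv:

```ruby
require 'csv'

row = CSV.parse_line('Bob,144,4.0,Great at everything!', converters: :numeric)

# Fields that look like numbers are converted; the rest stay strings.
p row
p row.map(&:class)
```

On Ruby 2.4 and later, the attendance field becomes an Integer and the GPA field becomes a Float.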

For example, take a weekly expense sheet:

day expense
Mon 100
Tue 120
Wed 130
Thu 140
Fri 220
Sat 320
Sun 100

We want to work out how much was spent this week:

# week_expense.csv
Mon,100
Tue,120
Wed,130
Thu,140
Fri,220
Sat,320
Sun,100

Collect the second field of each row into an array, then add the values up.

CSV.foreach('week_expense.csv', converters: :numeric).inject([]) do |acc, row|
  acc << row[1]
end.reduce(:+)
=> 1130

Without the numeric conversion, the values are concatenated as strings instead (String#+):

CSV.foreach('week_expense.csv').inject([]) do |acc, row|
  acc << row[1]
end.reduce(:+)
=> "100120130140220320100"
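As an alternative sketch, the total can be computed without building the intermediate array, using Enumerable#sum (available since Ruby 2.4). The data is inlined as a string here so the example runs without week_expense.csv.

```ruby
require 'csv'

week = <<~CSV_TEXT
  Mon,100
  Tue,120
  Wed,130
  Thu,140
  Fri,220
  Sat,320
  Sun,100
CSV_TEXT

# Convert numeric fields on the way in, then sum the second field of each row.
total = CSV.parse(week, converters: :numeric).sum { |row| row[1] }
p total
```

With a real file, CSV.foreach('week_expense.csv', converters: :numeric).sum { |row| row[1] } should work the same way, since foreach without a block returns an Enumerator.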

Let's read the original students.csv back in and add a new row of data:

Obie,144,3.8,Great great!

Then write it back.

First, read it in:

students = CSV.read('students.csv')
students << 'Obie,144,3.8,Great great!'.split(',')

Then write each row (as an array) back:

CSV.open('students.csv', 'w') do |csv|
  students.each { |student| csv << student }
end
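The whole read, append, and write cycle can be tried safely on a throwaway file. The sketch below assumes a temporary directory and sample data. It also builds the new row with CSV.parse_line rather than String#split, a swapped-in choice so that a comment containing a comma would still be handled correctly.

```ruby
require 'csv'
require 'tmpdir'

written = nil
Dir.mktmpdir do |dir|
  # Hypothetical sample file standing in for students.csv.
  path = File.join(dir, 'students.csv')
  File.write(path, "Bob,144,4.0,Great at everything!\n")

  students = CSV.read(path)
  # parse_line understands CSV quoting, unlike a plain split(',').
  students << CSV.parse_line('Obie,144,3.8,"Great, great!"')

  # Mode 'w' truncates the file, so every row is written again.
  CSV.open(path, 'w') do |csv|
    students.each { |student| csv << student }
  end

  written = File.read(path)
end

puts written
```

The comment with a comma is quoted again on the way out, so the file stays valid CSV.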

(To be continued)