Writing unicode strings to Excel 2007
I am connecting to a MS SQL server using pyodbc
. Furthermore, I am trying to write to an Excel 2007/10 .xlsx
file using openpyxl
.
This is my code (Python 2.7):
import pyodbc
from openpyxl import Workbook
cnxn = pyodbc.connect(host = 'xxx',database='yyy',user='zzz',password='ppp')
cursor = cnxn.cursor()
sql = "SELECT TOP 10 [customer clientcode] AS Customer,
[customer dchl] AS DChl,
[customer name] AS Name,
...
[name3] AS [name 3]
FROM mydb
WHERE [customer dchl] = '03'
ORDER BY [customer id] ASC"
#load data
cursor.execute(sql)
#get colnames from openpyxl
columns = [column[0] for column in cursor.description]
#using optimized_write cause it will be about 120k rows of data
wb = Workbook(optimized_write = True, encoding='utf-8')
ws = wb.create_sheet()
ws.title = '03'
#append column names to header
ws.append(columns)
#append rows to
for row in cursor:
ws.append(row)
wb.save(filename = 'test.xlsx')
cnxn.close()
This works, at least up until the point I encounter a customer with, for example, the name: "mún"
. My code does not fail, everything writes to Excel and all is well. That is until I actually open the Excel file- this causes an error saying that the file is corrupted and needs to be repaired. Upon repairing the file, all data is lost.
I know the code works for customers with regular names (only ASCII), it is as soon as there's an accented character or anything that the Excel file gets corrupted.
I have tried to print a single row (with a difficult cust. name). This is the result:
row
is a tuple, and this one of the indices: 'Meexf9s Tilburg'
So either writing the xf9 (ú)
character causes an error, or MS Excel cannot cope with it. I have tried various ways of encoding a row to unicode ( unicode(row,'utf-8')
or u''.join(row)
) etc., though nothing works. Either I try something idiotic resulting in an error, or the Excel file still errors.
Any ideas?
In the end I found two solutions:
First one was converting the row given by the cursor to a list, and decoding the elements within the list:
for row in cursor:
l = list(row)
l[5] = l[5].decode('ISO-8859-1')
(do this for all neccesary cols)
ws.append(l)
I figured this would have been hell, cause there were 6 columns needing conversion to unicode, and there were 120k rows, though everything went quite fast actually! In the end it became apparent that I could/should just cast the data in the sql statement to unicode ( cast(x as nvarchar) AS y) which made the replacements unnecessary. I did not think of this at first cause i thought that it was actually supplying the data in unicode. My bad.
您可以使用encode()
将unicode转换为字符串:
l=[u'asd',u'qw',u'fdf',u'sad',u'sadasd']
l[4]=l[4].encode('utf8')
链接地址: http://www.djcxy.com/p/69916.html
上一篇: 在LinkedIn上发布数据时出错?