How do I read a date in Excel format in Python?
You can use xlrd.
From its documentation, you can read that dates are always stored as numbers; however, you can use xldate_as_tuple
to convert it to a python date.
Note: the version on the PyPI seems more up-to-date than the one available on xlrd's website.
Read date from excel as string Python
I don't think you were supposed to cast the date as a string. Note I made a couple of other changes:
- used for loops instead of while loops
- please don't use single letter var names (like
i
, andj
) xldate_as_datetime
makes more sense thanxldate_as_tuple
note if you want the date displayed in a specific format try strftime
import xlrd
workbook = xlrd.open_workbook(file)
worksheet = workbook.sheet_by_name('Sheet1')
num_rows = worksheet.nrows
num_cols = worksheet.ncols
values = []
for row in range(1, num_rows):
row_values = []
for col in range(num_cols):
if col == 3:
datetime_ = xldate.xldate_as_datetime(worksheet.cell_value(row, col), workbook.datemode)
row_values.append(datetime_)
else:
row_values.append(worksheet.cell_value(row, col))
values.append(row_values)
Read date from Excel file and get a delta with today's date
You can use pd.Timestamp.today
and then extract the amount of days from the timedelta which it returns with dt.days
:
# Example dataframe
df = pd.DataFrame({'Date':['20-6-2019', '21-6-2019', '25-6-2019']})
df['Date'] = pd.to_datetime(df['Date'])
Date
0 2019-06-20
1 2019-06-21
2 2019-06-25
(df['Date'] - pd.Timestamp.today()).dt.days
0 3
1 4
2 8
Name: Date, dtype: int64
Note in this case 3 days for 20-6-2019
is correct, since there are 3 full days between date now and the 20th of june.
Converting to Excel "Date" format (within Excel file) using python and pandas from another date format from html table
You need to use pd.ExcelWriter
to create a writer
object, so that you can change to Date format WITHIN Excel; however, this problem has a couple of different aspects to it:
- You have non-date values in your date column, including "Legend:", "Cash rate decreased", "Cash Rate increased", and "Cash rate unchanged".
- As mentioned in the comments, you must pass
format='%d %b %Y'
topd.to_datetime()
as that is the Date format you are converting FROM. - You must pass
errors='coerce'
in order to returnNaT
for those that don't meet the specified format - For the
pd.to_datetime()
line of code, you must add.dt.date
at the end, because we use adate_format
parameter and not adatetime_format
parameter in creating thewriter
object later on. However, you could also excludedt.date
and change the format of thedatetime_format
parameter. - Then, do
table = table.dropna()
to drop rows with any columns withNaT
- Pandas does not change the Date format WITHIN Excel. If you want to do that, then you should use
openpyxl
and create awriter
object and pass thedate_format
. In case someone says this, you CANNOT simply do:pd.to_datetime(table['Effective Date'], format='%d %b %Y', errors='coerce').dt.strftime('%m/%d/%y')
or.dt.strftime('%d/%m/%y')
, because that creates a "General" date format in EXCEL. - Output is ugly if you do not widen your columns, so I've included code for that as well. Please note that I am on a USA locale, so passing
d/m/yyyy
creates a "Custom" format in Excel.
NOTE: In my code, I have to pass m/d/yyyy
in order for a "Date" format to appear in EXCEL. You can simply change to date_format='d/m/yyyy'
since my computer has a different locale than you (USA) that Excel utilizes for "Date" format.
Source + More on this topic:
import pandas as pd
import html5lib
import datetime
import locale
import pytz
import lxml as lx
import openpyxl as oxl
url = "https://www.rba.gov.au/statistics/cash-rate/"
tables = pd.read_html(url)
table = tables[0]
table['Effective Date'] = pd.to_datetime(table['Effective Date'], format='%d %b %Y', errors='coerce').dt.date
table = table.dropna()
table.to_excel('rates.xlsx')
writer = pd.ExcelWriter("rates.xlsx",
engine='xlsxwriter',
date_format='m/d/yyyy')
# Convert the dataframe to an XlsxWriter Excel object.
table.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects in order to set the column
# widths, to make the dates clearer.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
worksheet.set_column('B:E', 20)
# Close the Pandas Excel writer and output the Excel file.
writer.save()
Properly parsing excel format for dates
Unfortunately, stovfl's solution was not actually generalized to all xlsx formats. After much searching through Microsoft documentation I was finally able to find this page that documents some of the rules of excel number_format.
Important things to note:
- mm and m refer to minutes ONLY if the most immediate preceding code is a hh or h (hours) or if the most immediate following code is ss or s (seconds), otherwise mm and m refer to months.
- Most characters that are not codes must be preceded with a backslash
- Characters surrounded by quotation marks are interpreted literally (not as codes)
- There's a big list of characters that display without any escaping or quotation marks
- There is the concept of
sections
, separated by a semicolon. For the purpose of this solution I chose to ignore sections (because if the sections are actually used the resulting output will not look like a date). - Certain codes in excel have no equivalent code in strftime. For example,
mmmmm
displays the first letter of the month. For my solution I chose to replace these codes with similar strftime codes (formmmmm
I chose%b
, which displays an abbreviation of the month). I noted these codes in comments
Anyways, I just built a function that given an excel number_format date string, returns the python strftime equivalent. I hope this can help someone looking for a way to get "What you see is what you get" from excel to python.
EXCEL_CODES = {
'yyyy': '%Y',
'yy': '%y',
'dddd': '%A',
'ddd': '%a',
'dd': '%d',
'd': '%-d',
# Different from excel as there is no J-D in strftime
'mmmmmm': '%b',
'mmmm': '%B',
'mmm': '%b',
'hh': '%H',
'h': '%-H',
'ss': '%S',
's': '%-S',
# Possibly different from excel as there is no am/pm in strftime
'am/pm': '%p',
# Different from excel as there is no A/P or a/p in strftime
'a/p': '%p',
}
EXCEL_MINUTE_CODES = {
'mm': '%M',
'm': '%-M',
}
EXCEL_MONTH_CODES = {
'mm': '%m',
'm': '%-m',
}
EXCEL_MISC_CHARS = [
'$',
'+',
'(',
':',
'^',
'\'',
'{',
'<',
'=',
'-',
'/',
')',
'!',
'&',
'~',
'}',
'>',
' ',
]
EXCEL_ESCAPE_CHAR = '\\'
EXCEL_SECTION_DIVIDER = ';'
def convert_excel_date_format_string(excel_date):
'''
Created using documentation here:
https://support.office.com/en-us/article/review-guidelines-for-customizing-a-number-format-c0a1d1fa-d3f4-4018-96b7-9c9354dd99f5
'''
# The python date string that is being built
python_date = ''
# The excel code currently being parsed
excel_code = ''
prev_code = ''
# If the previous character was the escape character
char_escaped = False
# If we are in a quotation block (surrounded by "")
quotation_block = False
# Variables used for checking if a code should be a minute or a month
checking_minute_or_month = False
minute_or_month_buffer = ''
for c in excel_date:
ec = excel_code.lower()
# The previous character was an escape, the next character should be added normally
if char_escaped:
if checking_minute_or_month:
minute_or_month_buffer += c
else:
python_date += c
char_escaped = False
continue
# Inside a quotation block
if quotation_block:
if c == '"':
# Quotation block should now end
quotation_block = False
elif checking_minute_or_month:
minute_or_month_buffer += c
else:
python_date += c
continue
# The start of a quotation block
if c == '"':
quotation_block = True
continue
if c == EXCEL_SECTION_DIVIDER:
# We ignore excel sections for datetimes
break
is_escape_char = c == EXCEL_ESCAPE_CHAR
# The am/pm and a/p code add some complications, need to make sure we are not that code
is_misc_char = c in EXCEL_MISC_CHARS and (c != '/' or (ec != 'am' and ec != 'a'))
# Code is finished, check if it is a proper code
if (is_escape_char or is_misc_char) and ec:
# Checking if the previous code should have been minute or month
if checking_minute_or_month:
if ec == 'ss' or ec == 's':
# It should be a minute!
minute_or_month_buffer = EXCEL_MINUTE_CODES[prev_code] + minute_or_month_buffer
else:
# It should be a months!
minute_or_month_buffer = EXCEL_MONTH_CODES[prev_code] + minute_or_month_buffer
python_date += minute_or_month_buffer
checking_minute_or_month = False
minute_or_month_buffer = ''
if ec in EXCEL_CODES:
python_date += EXCEL_CODES[ec]
# Handle months/minutes differently
elif ec in EXCEL_MINUTE_CODES:
# If preceded by hours, we know this is referring to minutes
if prev_code == 'h' or prev_code == 'hh':
python_date += EXCEL_MINUTE_CODES[ec]
else:
# Have to check if the next code is ss or s
checking_minute_or_month = True
minute_or_month_buffer = ''
else:
# Have to abandon this attempt to convert because the code is not recognized
return None
prev_code = ec
excel_code = ''
if is_escape_char:
char_escaped = True
elif is_misc_char:
# Add the misc char
if checking_minute_or_month:
minute_or_month_buffer += c
else:
python_date += c
else:
# Just add to the code
excel_code += c
# Complete, check if there is still a buffer
if checking_minute_or_month:
# We know it's a month because there were no more codes after
minute_or_month_buffer = EXCEL_MONTH_CODES[prev_code] + minute_or_month_buffer
python_date += minute_or_month_buffer
if excel_code:
ec = excel_code.lower()
if ec in EXCEL_CODES:
python_date += EXCEL_CODES[ec]
elif ec in EXCEL_MINUTE_CODES:
if prev_code == 'h' or prev_code == 'hh':
python_date += EXCEL_MINUTE_CODES[ec]
else:
python_date += EXCEL_MONTH_CODES[ec]
else:
return None
return python_date
Tested with python 3.6.7 using openpyxl 3.0.1
How to convert date format when reading from Excel - Python
import datetime
df = pd.DataFrame({'data': ["11/14/2015 00:00:00", "11/14/2015 00:10:00", "11/14/2015 00:20:00"]})
df["data"].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %H:%M:%S').strftime('%b-%y'))
EDIT
If you'd like to work with df.columns
you could use map
function:
df.columns = list(map(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %H:%M:%S').strftime('%b-%y'), df1.columns))
You need list if you are using python 3.x because it's iterator by default.
Convert date from excel in number format to date format python
from datetime import datetime
excel_date = 42139
dt = datetime.fromordinal(datetime(1900, 1, 1).toordinal() + excel_date - 2)
tt = dt.timetuple()
print(dt)
print(tt)
As mentioned by J.F. Sebastian, this answer only works for any date after 1900/03/01
EDIT: (in answer to @R.K)
If your excel_date
is a float number, use this code:
from datetime import datetime
def floatHourToTime(fh):
hours, hourSeconds = divmod(fh, 1)
minutes, seconds = divmod(hourSeconds * 60, 1)
return (
int(hours),
int(minutes),
int(seconds * 60),
)
excel_date = 42139.23213
dt = datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(excel_date) - 2)
hour, minute, second = floatHourToTime(excel_date % 1)
dt = dt.replace(hour=hour, minute=minute, second=second)
print(dt)
assert str(dt) == "2015-05-15 00:13:55"
how to read date in a specific format in openpyxl
If you are on Windows, try
value = datetime.date.strftime(sh['A1'].value, "%#d-%b-%y")
If on Linux, try
value = datetime.date.strftime(sh['A1'].value, "%-d-%b-%y")
Output:
'1-Jan-22'
Related Topics
How to Open a Password Protected Excel File Using Python
How to Sum Dictionaries Values With Same Key Inside a List
How to Delete a Specific Line in a File
How to Suppress Scientific Notation When Printing Float Values
Valueerror: Cannot Reshape Array of Size 30470400 into Shape (50,1104,104)
Easiest Way to Replace a String Using a Dictionary of Replacements
How Does \R (Carriage Return) Work in Python
How to Close a Tkinter Window by Pressing a Button
How to Make Type Cast for Python Custom Class
How to Compare 2 Indexes in Same List in Python
How to Stop a Running Function Without Exiting the Tkinter Window Entirely
How to Compute the Gradients of Image Using Python
Getting Value in a Dataframe in Pyspark
Move Seaborn Plot Legend to a Different Position