🔵 🔵 🔵


Primary

၊၊||၊|။

pandas.to_datetime() ⚬|Documentation|1st|20251021204643-00-⌔

pandas.to_datetime — pandas 2.3.3 documentation#pandas.to_datetime

pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=False, format=None, exact=<no_default>, unit=None, origin='unix', cache=True)

Convert argument to datetime.

This function converts a scalar, array-like, Series or DataFrame/dict-like to a pandas datetime object.

Parameters:
arg: int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like

The object to convert to a datetime. If a DataFrame is provided, the method expects minimally the following columns: "year", "month", "day". The column “year” must be specified in 4-digit format.

errors: {‘raise’, ‘coerce’}, default ‘raise’
  • If 'raise', then invalid parsing will raise an exception.
  • If 'coerce', then invalid parsing will be set as NaT.
dayfirst: bool, default False

Specify a date parse order if arg is str or is list-like. If True, parses dates with the day first, e.g. "10/11/12" is parsed as 2012-11-10.

Warning: dayfirst=True is not strict, but will prefer to parse with day first.

yearfirst: bool, default False

Specify a date parse order if arg is str or is list-like.

  • If True parses dates with the year first, e.g. "10/11/12" is parsed as 2010-11-12.
  • If both dayfirst and yearfirst are True, yearfirst is preceded (same as dateutil).

Warning: yearfirst=True is not strict, but will prefer to parse with year first.

utc: bool, default False

Control timezone-related parsing, localization and conversion.

  • If True, the function always returns a timezone-aware UTC-localized Timestamp, Series or DatetimeIndex. To do this, timezone-naive inputs are localized as UTC, while timezone-aware inputs are converted to UTC.
  • If False (default), inputs will not be coerced to UTC. Timezone-naive inputs will remain naive, while timezone-aware ones will keep their time offsets. Limitations exist for mixed offsets (typically, daylight savings), see Examples section for details.

See also: pandas general documentation about timezone conversion and localization.

format: str, default None

The strftime to parse time, e.g. "%d/%m/%Y". See strftime documentation for more information on choices, though note that "%f" will parse all the way up to nanoseconds. You can also pass:

  • “ISO8601”, to parse any ISO8601 time string (not necessarily in exactly the same format);
  • “mixed”, to infer the format for each element individually. This is risky, and you should probably use it along with dayfirst.

Note: If a DataFrame is passed, then format has no effect.

exact: bool, default True

Control how format is used:

  • If True, require an exact format match.
  • If False, allow the format to match anywhere in the target string.

Cannot be used alongside format='ISO8601' or format='mixed'.

unit: str, default ‘ns’

The unit of the arg (D,s,ms,us,ns) denote the unit, which is an integer or float number. This will be based off the origin. Example, with unit='ms' and origin='unix', this would calculate the number of milliseconds to the unix epoch start.

origin: scalar, default ‘unix’

Define the reference date. The numeric values would be parsed as number of units (defined by unit) since this reference date.

  • If 'unix' (or POSIX) time; origin is set to 1970-01-01.
  • If 'julian', unit must be 'D', and origin is set to beginning of Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC.
  • If Timestamp convertible (Timestamp, dt.datetime, np.datetimt64 or date string), origin is set to Timestamp identified by origin.
  • If a float or integer, origin is the difference (in units determined by the unit argument) relative to 1970-01-01.
cache: bool, default True

If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets. The cache is only used when there are at least 50 values. The presence of out-of-bounds values will render the cache unusable and may slow down parsing.

Returns:
datetime

If parsing succeeded. Return type depends on input (types in parenthesis correspond to fallback in case of unsuccessful timezone or out-of-range timestamp parsing):

  • scalar: Timestamp (or datetime.datetime)
  • array-like: DatetimeIndex (or Series with object dtype containing datetime.datetime)
  • Series: Series of datetime64 dtype (or Series of object dtype containing datetime.datetime)
  • DataFrame: Series of datetime64 dtype (or Series of object dtype containing datetime.datetime)
Raises:
ParserError

When parsing a date from string fails.

ValueError

When another datetime conversion error happens. For example when one of ‘year’, ‘month’, day’ columns is missing in a DataFrame, or when a Timezone-aware datetime.datetime is found in an array-like of mixed time offsets, and utc=False, or when parsing datetimes with mixed time zones unless utc=True. If parsing datetimes with mixed time zones, please specify utc=True.

See also:
DataFrame.astype

Cast argument to a specified dtype.

to_timedelta

Convert argument to timedelta.

convert_dtypes

Convert dtypes.

Notes:

Many input types are supported, and lead to different output types:

  • scalars can be int, float, str, datetime object (from stdlib datetime module or numpy). They are converted to Timestamp when possible, otherwise they are converted to datetime.datetime. None/NaN/null scalars are converted to NaT.
  • array-like can contain int, float, str, datetime objects. They are converted to DatetimeIndex when possible, otherwise they are converted to Index with object dtype, containing datetime.datetime. None/NaN/null entries are converted to NaT in both cases.
  • Series are converted to Series with datetime64 dtype when possible, otherwise they are converted to Series with object dtype, containing datetime.datetime. None/NaN/null entries are converted to NaT in both cases.
  • DataFrame/dict-like are converted to Series with datetime64 dtype. For each row a datetime is created from assembling the various dataframe columns. Column keys can be common abbreviations like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’]) or plurals of the same.

The following causes are responsible for datetime.datetime objects being returned (possibly inside an Index or a Series with object dtype) instead of a proper pandas designated type (Timestamp, DatetimeIndex or Series with datetime64 dtype):

  • when any input element is before Timestamp.min or after Timestamp.max, see timestamp limitations.
  • when utc=False (default) and the input is an array-like or Series containing mixed naive/aware datetime, or aware with mixed time offsets. Note that this happens in the (quite frequent) situation when the timezone has a daylight savings policy. In that case you may wish to use utc=True.
Examples:

Handling various input formats

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’]) or plurals of the same

>>> df = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})
>>> pd.to_datetime(df)
0   2015-02-04
1   2016-03-05
dtype: datetime64[us]

Using a unix epoch time

>>> pd.to_datetime(1490195805, unit="s")
Timestamp('2017-03-22 15:16:45')
>>> pd.to_datetime(1490195805433502912, unit="ns")
Timestamp('2017-03-22 15:16:45.433502912')

Warning: For float arg, precision rounding might happen. To prevent unexpected behavior use a fixed-width exact type.

Using a non-unix epoch origin

>>> pd.to_datetime([1, 2, 3], unit="D", origin=pd.Timestamp("1960-01-01"))
DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'],
             dtype='datetime64[s]', freq=None)

Differences with strptime behavior

"%f" will parse all the way up to nanoseconds.

>>> pd.to_datetime("2018-10-26 12:00:00.0000000011", format="%Y-%m-%d %H:%M:%S.%f")
Timestamp('2018-10-26 12:00:00.000000001')

Non-convertible date/times

Passing errors='coerce' will force an out-of-bounds date to NaT, in addition to forcing non-dates (or non-parseable dates) to NaT.

>>> pd.to_datetime("invalid for Ymd", format="%Y%m%d", errors="coerce")
NaT

Timezones and time offsets

The default behaviour (utc=False) is as follows:

  • Timezone-naive inputs are converted to timezone-naive DatetimeIndex:
>>> pd.to_datetime(["2018-10-26 12:00:00", "2018-10-26 13:00:15"])
DatetimeIndex(['2018-10-26 12:00:00', '2018-10-26 13:00:15'],
             dtype='datetime64[us]', freq=None)
  • Timezone-aware inputs with constant time offset are converted to timezone-aware DatetimeIndex:
>>> pd.to_datetime(["2018-10-26 12:00 -0500", "2018-10-26 13:00 -0500"])
DatetimeIndex(['2018-10-26 12:00:00-05:00', '2018-10-26 13:00:00-05:00'],
             dtype='datetime64[us, UTC-05:00]', freq=None)
  • However, timezone-aware inputs with mixed time offsets (for example issued from a timezone with daylight savings, such as Europe/Paris) are not successfully converted to a DatetimeIndex. Parsing datetimes with mixed time zones will raise a ValueError unless utc=True:
>>> pd.to_datetime(
...     ["2020-10-25 02:00 +0200", "2020-10-25 04:00 +0100"]
... )
ValueError: Mixed timezones detected. Pass utc=True in to_datetime
or tz='UTC' in DatetimeIndex to convert to a common timezone.
  • To create a Series with mixed offsets and object dtype, please use Series.apply() and datetime.datetime.strptime():
>>> import datetime as dt
>>> ser = pd.Series(["2020-10-25 02:00 +0200", "2020-10-25 04:00 +0100"])
>>> ser.apply(lambda x: dt.datetime.strptime(x, "%Y-%m-%d %H:%M %z"))
0    2020-10-25 02:00:00+02:00
1    2020-10-25 04:00:00+01:00
dtype: object
  • A mix of timezone-aware and timezone-naive inputs will also raise a ValueError unless utc=True:
>>> from datetime import datetime
>>> pd.to_datetime(
...     ["2020-01-01 01:00:00-01:00", datetime(2020, 1, 1, 3, 0)]
... )
ValueError: Mixed timezones detected. Pass utc=True in to_datetime
or tz='UTC' in DatetimeIndex to convert to a common timezone.

Setting utc=True solves most of the above issues:

  • Timezone-naive inputs are localized as UTC
>>> pd.to_datetime(["2018-10-26 12:00", "2018-10-26 13:00"], utc=True)
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 13:00:00+00:00'],
             dtype='datetime64[us, UTC]', freq=None)
  • Timezone-aware inputs are converted to UTC (the output represents the exact same datetime, but viewed from the UTC time offset +00:00).
>>> pd.to_datetime(["2018-10-26 12:00 -0530", "2018-10-26 12:00 -0500"], utc=True)
DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'],
             dtype='datetime64[us, UTC]', freq=None)
  • Inputs can contain both string or datetime, the above rules still apply
>>> pd.to_datetime(["2018-10-26 12:00", datetime(2020, 1, 1, 18)], utc=True)
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2020-01-01 18:00:00+00:00'],
             dtype='datetime64[us, UTC]', freq=None)

Printed 2026-06-28.

(echo:: @ )

Link to original

Secondary

• • •