pandas.read_csv - Pandas 1.5.3 documentation (2023)

pandas.read_csv(filepath_or_buffer, *, sep=_NoDefault.no_default, delimiter=None, header='infer', names=_NoDefault.no_default, index_col=None, usecols=None, squeeze=None, prefix=_NoDefault.no_default, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=None, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, encoding_errors='strict', dialect=None, error_bad_lines=None, warn_bad_lines=None, on_bad_lines=None, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None, storage_options=None)

Read a comma-separated values (csv) file into DataFrame.

Also supports optionally iterating or splitting the file into chunks.

For more help, see the online docs for IO Tools.

filepath_or_buffer : str, path object or file-like object

Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.

If you want to pass in a path object, pandas accepts any os.PathLike.

By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.
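As a minimal sketch of the input types described above, the following reads the same CSV text once from an in-memory buffer and once from a path object. The sample text, the temporary directory, and the file name table.csv are assumptions made for illustration only.

```python
import tempfile
from io import StringIO
from pathlib import Path

import pandas as pd

csv_text = "a,b\n1,2\n3,4\n"

# File-like object: anything with a read() method, here an in-memory buffer.
df_buffer = pd.read_csv(StringIO(csv_text))

# Path object: write the same text to a temporary file and pass a pathlib.Path.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "table.csv"
    csv_path.write_text(csv_text)
    df_path = pd.read_csv(csv_path)

# Both routes should yield the same DataFrame.
print(df_buffer.equals(df_path))
```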

sep : str, default ','

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and will automatically detect the separator with Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.
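The two Python-engine behaviors described above (delimiter sniffing and regex separators) might look like this in practice. The sample strings and the pipe-with-spaces pattern are illustrative assumptions, and engine="python" is passed explicitly so that pandas does not have to fall back with a warning.

```python
from io import StringIO

import pandas as pd

semicolon_text = "a;b;c\n1;2;3\n4;5;6\n"
# sep=None asks the Python engine to sniff the delimiter via csv.Sniffer.
sniffed = pd.read_csv(StringIO(semicolon_text), sep=None, engine="python")

pipe_text = "x | y\n1 | 2\n3 | 4\n"
# A separator longer than one character is treated as a regular expression.
regex_split = pd.read_csv(StringIO(pipe_text), sep=r"\s*\|\s*", engine="python")

print(sniffed.columns.tolist())
print(regex_split.columns.tolist())
```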

delimiter : str, default None

Alias for sep.

header : int, list of int, None, default 'infer'

Row number(s) to use as the column names, and the start of the data. The default behavior is to infer the column names: if no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a MultiIndex on the columns, e.g. [0, 1, 3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

names : array-like, optional

List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.
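The interaction between header and names can be sketched as follows. The column names old_a, old_b, a and b are made up for the example.

```python
from io import StringIO

import pandas as pd

text = "old_a,old_b\n1,2\n3,4\n"

# No names passed: behaves like header=0, names come from the first line.
inferred = pd.read_csv(StringIO(text))

# names plus header=0: the first line is consumed as a header and replaced.
renamed = pd.read_csv(StringIO(text), header=0, names=["a", "b"])

# names without header: behaves like header=None, so the first line is data.
as_data = pd.read_csv(StringIO(text), names=["a", "b"])

print(len(renamed), len(as_data))
```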

index_col : int, str, sequence of int / str, or False, optional, default None

Column(s) to use as the row labels of the DataFrame, either given as string name or column index. If a sequence of int / str is given, a MultiIndex is used.

Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.
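A small sketch of the index_col behaviors described above, using invented sample data in which every data row ends with a stray delimiter:

```python
from io import StringIO

import pandas as pd

# Each data row has a trailing comma, so it has one more field than the header.
text = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

# Default: pandas treats the extra leading field as the index.
default = pd.read_csv(StringIO(text))

# index_col=False: keep the first column as data and drop the trailing field.
forced = pd.read_csv(StringIO(text), index_col=False)

# A column name can also be given to pick the row labels explicitly.
by_name = pd.read_csv(StringIO("id,x\n10,1\n20,2\n"), index_col="id")

print(default.index.tolist(), forced["a"].tolist())
```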

usecols : list-like or callable, optional

Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). If names are given, the document header row(s) are not taken into account. For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved, use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.
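The list-like and callable forms can be sketched like this. The three-column sample text is an assumption for illustration.

```python
from io import StringIO

import pandas as pd

text = "foo,bar,baz\n1,2,3\n4,5,6\n"

# Element order in usecols is ignored: columns come back in file order.
subset = pd.read_csv(StringIO(text), usecols=["bar", "foo"])

# Re-select after reading to impose a specific column order.
reordered = pd.read_csv(StringIO(text), usecols=["bar", "foo"])[["bar", "foo"]]

# Callable form: keep the columns whose upper-cased name is in the list.
picked = pd.read_csv(StringIO(text), usecols=lambda name: name.upper() in ["FOO", "BAZ"])

print(subset.columns.tolist(), reordered.columns.tolist(), picked.columns.tolist())
```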

squeeze : bool, default False

If the parsed data only contains one column, then return a Series.

Deprecated since version 1.4.0: Append .squeeze("columns") to the call to read_csv to squeeze the data.

prefix : str, optional

Prefix to add to column numbers when there is no header, e.g. 'X' for X0, X1, ...

Deprecated since version 1.4.0: Use a list comprehension on the DataFrame's columns after calling read_csv.

mangle_dupe_cols : bool, default True

Duplicate columns will be specified as 'X', 'X.1', ... 'X.N', rather than 'X' ... 'X'. Passing in False will cause data to be overwritten if there are duplicate names in the columns.

Deprecated since version 1.5.0: Not implemented, and a new argument to specify the pattern for the names of duplicated columns will be added instead.

dtype : type name or dict of column -> type, optional

Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}. Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.

New in version 1.5.0: Support for defaultdict was added. Specify a defaultdict as input where the default determines the dtype of the columns which are not explicitly listed.
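As an illustrative sketch of per-column dtypes, the example below keeps leading zeros by reading one column as str and uses the nullable 'Int64' type for a column with a missing value. The column names zip, count and score are hypothetical.

```python
from io import StringIO

import pandas as pd

text = "zip,count,score\n01234,1,1.5\n98765,,2.5\n"

# Without dtype, the zip column is inferred as integers and loses its leading zero.
untyped = pd.read_csv(StringIO(text))

# With dtype, each listed column is read as the requested type.
typed = pd.read_csv(
    StringIO(text),
    dtype={"zip": str, "count": "Int64", "score": "float64"},
)

print(untyped["zip"].tolist())
print(typed["zip"].tolist())
```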

engine : {'c', 'python', 'pyarrow'}, optional

Parser engine to use. The C and pyarrow engines are faster, while the python engine is currently more feature-complete. Multithreading is currently only supported by the pyarrow engine.

New in version 1.4.0: The "pyarrow" engine was added as an experimental engine, and some features are unsupported, or may not work correctly, with this engine.

converters : dict, optional

Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
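A converters dict might be used as in the sketch below, where each function receives the raw string of one cell. The price and code columns and the dollar-sign format are assumptions for illustration.

```python
from io import StringIO

import pandas as pd

text = "price,code\n$1.50,ab\n$2.25,cd\n"

converted = pd.read_csv(
    StringIO(text),
    converters={
        # Strip the currency symbol, then convert the remainder to float.
        "price": lambda raw: float(raw.lstrip("$")),
        # Upper-case the raw string.
        "code": str.upper,
    },
)

print(converted)
```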

true_values : list, optional

Values to consider as True.

false_values : list, optional

Values to consider as False.
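For instance, a column of yes/no strings can be read as booleans by listing both spellings. The sample column name flag is made up for this sketch.

```python
from io import StringIO

import pandas as pd

text = "flag\nyes\nno\nyes\n"

# Every value in the column is covered by one of the two lists,
# so the column is parsed as booleans.
flags = pd.read_csv(StringIO(text), true_values=["yes"], false_values=["no"])

print(flags["flag"].tolist())
```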

skipinitialspace : bool, default False

Skip spaces after the delimiter.

skiprows : list-like, int or callable, optional

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
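The list-like and callable forms of skiprows can be compared on a small sample. The two junk lines in the text below are invented for the example.

```python
from io import StringIO

import pandas as pd

# Lines 0 and 2 (0-indexed) are not part of the table.
text = "generated by tool\na,b\n---\n1,2\n3,4\n"

# List-like: explicit line numbers to skip.
by_list = pd.read_csv(StringIO(text), skiprows=[0, 2])

# Callable: evaluated against each line index; True means skip.
by_callable = pd.read_csv(StringIO(text), skiprows=lambda i: i in [0, 2])

print(by_callable)
```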

skipfooter : int, default 0

Number of lines at the bottom of the file to skip (unsupported with engine='c').

nrows : int, optional

Number of file lines to read. Useful for reading parts of large files.

na_values : scalar, str, list-like, or dict, optional

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.

keep_default_na : bool, default True

Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:

  • If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.

  • If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.

  • If keep_default_na is False, and na_values are specified, only the NaN values specified in na_values are used for parsing.

  • If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.

Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
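The combinations listed above can be checked on a tiny sample. The marker string "missing" and the column names are assumptions for this sketch.

```python
from io import StringIO

import pandas as pd

text = "id,val\n1,NA\n2,missing\n"

# Defaults only: "NA" is a default marker, "missing" is ordinary text.
defaults = pd.read_csv(StringIO(text))

# keep_default_na=True plus na_values: both markers count as missing.
extended = pd.read_csv(StringIO(text), na_values=["missing"])

# keep_default_na=False plus na_values: only "missing" counts, "NA" stays text.
strict = pd.read_csv(StringIO(text), na_values=["missing"], keep_default_na=False)

print(defaults["val"].isna().tolist())
print(extended["val"].isna().tolist())
print(strict["val"].isna().tolist())
```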

na_filter : bool, default True

Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.

verbose : bool, default False

Indicate number of NA values placed in non-numeric columns.

skip_blank_lines : bool, default True

If True, blank lines are skipped instead of being interpreted as NaN values.

parse_dates : bool or list of int or names or list of lists or dict, default False

The behavior is as follows:

  • boolean. If True -> try parsing the index.

  • list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.

  • list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.

  • dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'

If a column or index cannot be represented as an array of datetimes, say because of an unparsable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.

Note: A fast-path exists for iso8601-formatted dates.
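A minimal sketch of the list-of-names form, using ISO 8601 dates so that the fast path applies. The column names when and value are hypothetical.

```python
from io import StringIO

import pandas as pd

text = "when,value\n2023-01-01,1\n2023-01-02,2\n"

# Without parse_dates the column is left as text.
raw = pd.read_csv(StringIO(text))

# With parse_dates the named column is converted to datetimes.
parsed = pd.read_csv(StringIO(text), parse_dates=["when"])

print(parsed["when"].dt.day.tolist())
```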

infer_datetime_format : bool, default False

If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.

keep_date_col : bool, default False

If True and parse_dates specifies combining multiple columns, then keep the original columns.

date_parser : function, optional

Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. Pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.

dayfirst : bool, default False

DD/MM format dates, international and European format.

cache_dates : bool, default True

If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets.

New in version 0.25.0.

iterator : bool, default False

Return TextFileReader object for iteration or getting chunks with get_chunk().

Changed in version 1.2: TextFileReader is a context manager.

chunksize : int, optional

Return TextFileReader object for iteration. See the IO Tools docs for more information on iterator and chunksize.

Changed in version 1.2: TextFileReader is a context manager.
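Chunked reading might be sketched as follows, using the reader as a context manager. The five-row sample and the chunk size of 2 are arbitrary choices for the example.

```python
from io import StringIO

import pandas as pd

text = "n\n1\n2\n3\n4\n5\n"

chunk_sizes = []
total = 0

# With chunksize set, read_csv returns a TextFileReader instead of a DataFrame.
with pd.read_csv(StringIO(text), chunksize=2) as reader:
    for chunk in reader:
        # Each chunk is an ordinary DataFrame with at most 2 rows.
        chunk_sizes.append(len(chunk))
        total += int(chunk["n"].sum())

print(chunk_sizes, total)
```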

compression : str or dict, default 'infer'

For on-the-fly decompression of on-disk data. If 'infer' and 'filepath_or_buffer' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2' (otherwise no compression). If using 'zip' or 'tar', the archive must contain only one data file to be read in. Set to None for no decompression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'tar'}; other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdDecompressor or tarfile.TarFile, respectively. As an example, the following could be passed for Zstandard decompression using a custom compression dictionary: compression={'method': 'zstd', 'dict_data': my_compression_dict}.

New in version 1.5.0: Added support for .tar files.

Changed in version 1.4.0: Zstandard support.
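When the input is a buffer rather than a path, there is no file extension to infer from, so the method can be named explicitly. A minimal sketch with gzip, using made-up sample data:

```python
import gzip
from io import BytesIO

import pandas as pd

# Compress a small CSV in memory to stand in for a .gz file.
compressed = gzip.compress(b"a,b\n1,2\n3,4\n")

# Name the compression method explicitly for the binary buffer.
df = pd.read_csv(BytesIO(compressed), compression="gzip")

print(df)
```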

thousands : str, optional

Thousands separator.

decimal : str, default '.'

Character to recognize as decimal point (e.g. use ',' for European data).
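A sketch of reading European-style numbers, where '.' groups thousands and ',' marks the decimal point. The semicolon-separated sample is an assumption for illustration.

```python
from io import StringIO

import pandas as pd

text = "amount;label\n1.234,5;x\n10,25;y\n"

# sep must differ from the decimal character, hence the semicolon.
european = pd.read_csv(StringIO(text), sep=";", thousands=".", decimal=",")

print(european["amount"].tolist())
```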

lineterminator : str (length 1), optional

Character to break file into lines. Only valid with C parser.

quotechar : str (length 1), optional

The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.

quoting : int or csv.QUOTE_* instance, default 0

Control field quoting behavior per csv.QUOTE_* constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).

doublequote : bool, default True

When quotechar is specified and quoting is not QUOTE_NONE, indicate whether or not to interpret two consecutive quotechar elements INSIDE a field as a single quotechar element.
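With the default quotechar and doublequote settings, a quoted field can carry both the delimiter and an escaped quote character. The sample row below is invented for this sketch.

```python
from io import StringIO

import pandas as pd

# The first field contains a comma; the second contains doubled quotes.
text = 'name,remark\n"Smith, J.","He said ""hi"""\n'

quoted = pd.read_csv(StringIO(text), quotechar='"', doublequote=True)

print(quoted.iloc[0].tolist())
```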

escapechar : str (length 1), optional

One-character string used to escape other characters.

comment : str, optional

Indicates that the remainder of the line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in 'a,b,c' being treated as the header.
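The example from the paragraph above can be run directly. The second data row with a trailing comment is an addition for illustration.

```python
from io import StringIO

import pandas as pd

text = "#empty\na,b,c\n1,2,3\n4,5,6#note\n"

# The fully commented first line is ignored, so header=0 picks up "a,b,c".
# On the last line, everything from '#' onward is dropped.
commented = pd.read_csv(StringIO(text), comment="#", header=0)

print(commented)
```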

encoding : str, optional

Encoding to use for UTF when reading/writing (e.g. 'utf-8'). List of Python standard encodings.

Changed in version 1.2: When encoding is None, errors="replace" is passed to open(). Otherwise, errors="strict" is passed to open(). This behavior was previously only the case for engine="python".

Changed in version 1.3.0: encoding_errors is a new argument. encoding no longer has an influence on how encoding errors are handled.

encoding_errors : str, optional, default "strict"

How encoding errors are treated. List of possible values.

New in version 1.3.0.

dialect : str or csv.Dialect, optional

If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting. If it is necessary to override values, a ParserWarning will be issued. See csv.Dialect documentation for more details.

error_bad_lines : bool, optional, default None

Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these "bad lines" will be dropped from the DataFrame that is returned.

Deprecated since version 1.3.0: The on_bad_lines parameter should be used instead to specify behavior upon encountering a bad line.

warn_bad_lines : bool, optional, default None

If error_bad_lines is False, and warn_bad_lines is True, a warning for each "bad line" will be output.

Deprecated since version 1.3.0: The on_bad_lines parameter should be used instead to specify behavior upon encountering a bad line.

on_bad_lines : {'error', 'warn', 'skip'} or callable, default 'error'

Specifies what to do upon encountering a bad line (a line with too many fields). Allowed values are:

  • 'error', raise an exception when a bad line is encountered.

  • 'warn', raise a warning when a bad line is encountered and skip that line.

  • 'skip', skip bad lines without raising or warning when they are encountered.

New in version 1.3.0.

New in version 1.4.0:

  • callable, function with signature (bad_line: list[str]) -> list[str] | None that will process a single bad line. bad_line is a list of strings split by the sep. If the function returns None, the bad line will be ignored. If the function returns a new list of strings with more elements than expected, a ParserWarning will be emitted while dropping extra elements. Only supported when engine="python"
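The options above can be compared on a sample with one over-long row. The sample data and the trimming lambda are assumptions for this sketch; the callable form requires engine="python" as noted.

```python
from io import StringIO

import pandas as pd

# The second data row has three fields instead of two.
text = "a,b\n1,2\n3,4,5\n6,7\n"

# Default ('error'): the bad line raises a ParserError.
raised = False
try:
    pd.read_csv(StringIO(text))
except pd.errors.ParserError:
    raised = True

# 'skip': the bad line is silently dropped.
skipped = pd.read_csv(StringIO(text), on_bad_lines="skip")

# Callable: receive the split fields and return a repaired row.
trimmed = pd.read_csv(
    StringIO(text),
    on_bad_lines=lambda bad_line: bad_line[:2],
    engine="python",
)

print(raised, skipped["a"].tolist(), trimmed["a"].tolist())
```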

delim_whitespace : bool, default False

Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the sep. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.

low_memory : bool, default True

Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types, either set False or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless; use the chunksize or iterator parameter to return the data in chunks. (Only valid with C parser).

memory_map : bool, default False

If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.

float_precision : str, optional

Specifies which converter the C engine should use for floating-point values. The options are None or 'high' for the ordinary converter, 'legacy' for the original lower precision pandas converter, and 'round_trip' for the round-trip converter.

Changed in version 1.2.

storage_options : dict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://" and "gcs://") the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.

New in version 1.2.

DataFrame or TextParser

A comma-separated values (csv) file is returned as a two-dimensional data structure with labeled axes.

See also

DataFrame.to_csv
Write DataFrame to a comma-separated values (csv) file.

read_csv
Read a comma-separated values (csv) file into DataFrame.

read_fwf
Read a table of fixed-width formatted lines into DataFrame.

