Only read selected columns
Say the data are in the file data.txt. You can use the colClasses argument of read.table() to skip columns. Here the data in the first 7 columns are "integer", and we set the remaining 6 columns to "NULL", indicating that they should be skipped:
> read.table("data.txt", colClasses = c(rep("integer", 7), rep("NULL", 6)),
+ header = TRUE)
Year Jan Feb Mar Apr May Jun
1 2009 -41 -27 -25 -31 -31 -39
2 2010 -41 -27 -25 -31 -31 -39
3 2011 -21 -27 -2 -6 -10 -32
Change "integer" to one of the accepted types as detailed in ?read.table, depending on the real type of the data.
data.txt looks like this:
$ cat data.txt
"Year" "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
2009 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25
2010 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25
2011 -21 -27 -2 -6 -10 -32 -13 -12 -27 -30 -38 -29
and was created by using
write.table(dat, file = "data.txt", row.names = FALSE)
where dat is
dat <- structure(list(Year = 2009:2011, Jan = c(-41L, -41L, -21L), Feb = c(-27L,
-27L, -27L), Mar = c(-25L, -25L, -2L), Apr = c(-31L, -31L, -6L
), May = c(-31L, -31L, -10L), Jun = c(-39L, -39L, -32L), Jul = c(-25L,
-25L, -13L), Aug = c(-15L, -15L, -12L), Sep = c(-30L, -30L, -27L
), Oct = c(-27L, -27L, -30L), Nov = c(-21L, -21L, -38L), Dec = c(-25L,
-25L, -29L)), .Names = c("Year", "Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = "data.frame",
row.names = c(NA, -3L))
If the number of columns is not known beforehand, the utility function count.fields() will read through the file and count the number of fields in each line.
## returns a vector with one element per line: the number of fields on that line
## (sep defaults to whitespace, which matches data.txt; use sep = "\t" for tab-delimited files)
count.fields("data.txt")
## returns the maximum, which can be used to size colClasses
max(count.fields("data.txt"))
Read specific columns with pandas or another Python module
An easy way to do this is with the pandas library:
import pandas as pd
fields = ['star_name', 'ra']
df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
# See the keys
print(df.keys())
# See the content of 'star_name'
print(df.star_name)
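The problem here was the spaces in the header. Setting skipinitialspace=True removes the spaces that follow each delimiter, including those in the header row, so ' star_name' becomes 'star_name' and matches the name given in usecols.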
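The same header problem can appear outside pandas. As a minimal sketch, the standard-library csv module accepts an option with the same name; the sample data and the helper read_columns below are made up for illustration and are not part of the original answer.

```python
import csv
import io

# Hypothetical sample: note the space after each comma, including in the header.
SAMPLE = "star_name, ra, dec\nSirius, 101.28, -16.71\nVega, 279.23, 38.78\n"

wanted = ["star_name", "ra"]


def read_columns(text, columns, skipinitialspace):
    """Return one dict per row, restricted to the requested columns that exist."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=skipinitialspace)
    return [{c: row[c] for c in columns if c in row} for row in reader]


without_flag = read_columns(SAMPLE, wanted, skipinitialspace=False)
with_flag = read_columns(SAMPLE, wanted, skipinitialspace=True)

print(without_flag[0])  # 'ra' is absent: the header key is ' ra', with a space
print(with_flag[0])     # both requested columns are found
```

Without the flag, the lookup for 'ra' silently finds nothing because the actual key carries a leading space, which is the same mismatch that makes usecols fail in pandas.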
How to skip reading certain columns in readr
There is an answer out there, I just didn't search hard enough:
https://github.com/hadley/readr/issues/132
Apparently this was a documentation issue that has been corrected. This functionality may eventually get added but Hadley thought it was more useful to be able to just update one column type and not drop the others.
Update: The functionality has been added.
The following code is from the readr documentation:
read_csv("iris.csv", col_types = cols_only( Species = col_factor(c("setosa", "versicolor", "virginica"))))
This will read only the Species column of the iris data set. To read only specific columns, you must also pass a column specification for each of them, e.g. col_factor(), col_double(), etc.
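How to select specific columns from read_csv which start with a specific word?
You could read the file twice, once for the headers only and once for the actual data, but it is simpler to pass a callable to usecols. pandas calls it with each column name and reads only the columns for which it returns True: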
df = pd.read_csv('data.csv', usecols=lambda col: col.startswith('A_') or col.startswith('X_'))
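To check which columns a predicate would select before reading a large file, it can be applied to a list of names directly. The sketch below uses a made-up header and a hypothetical function name, wanted; it also relies on str.startswith accepting a tuple of prefixes, which shortens the condition.

```python
def wanted(col):
    """True for column names that start with one of the chosen prefixes."""
    return col.startswith(("A_", "X_"))


# Hypothetical header row, for illustration only.
header = ["id", "A_1", "B_1", "X_9"]

selected = [col for col in header if wanted(col)]
print(selected)
```

A named function like this can be passed as usecols=wanted in place of the lambda.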
How to read specific columns from multiple CSV files, and skip columns that do not exist in some of the files using Python Pandas
You could read only the column names from each CSV file and check them against your desired columns as follows:
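import csv
import pandas as pd

desired_col = ["user_id", "event_type"]  # only two columns selected here
for file_name in csv_files:  # csv_files: list of CSV paths defined elsewhere
    with open(file_name, newline="") as f:
        csv_cols = next(csv.reader(f))  # read only the header row
    # keep the desired columns that exist in this file
    cols = [col for col in desired_col if col in csv_cols]
    df = pd.read_csv(file_name, usecols=cols)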
So each time you read a new CSV file, you first read its column names and then check desired_col against csv_cols.
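When many files are involved, it can help to know which requested columns each file lacks. The following is a minimal sketch using only the standard library; the function name split_columns and the two sample files are hypothetical and exist only to make the example self-contained.

```python
import csv
import os
import tempfile


def split_columns(path, desired):
    """Return (present, missing): which desired columns the CSV header has."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle), [])  # empty file -> no columns
    present = [col for col in desired if col in header]
    missing = [col for col in desired if col not in header]
    return present, missing


# Demo on two throwaway files with made-up contents.
tmp_dir = tempfile.mkdtemp()
samples = {
    "a.csv": "user_id,event_type,ts\n1,click,10\n",
    "b.csv": "user_id,ts\n2,20\n",
}
report = {}
for name, text in samples.items():
    path = os.path.join(tmp_dir, name)
    with open(path, "w", newline="") as handle:
        handle.write(text)
    report[name] = split_columns(path, ["user_id", "event_type"])

print(report)
```

The present list can be passed as usecols to pd.read_csv, and the missing list can be logged or used to add empty columns afterwards.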
How to plot and symbolize only selected columns from a CSV in d3?
Map the data to filter out columns not included in keys:
d3.csv("ratings.csv").then(data => {
  const keys = ['date', 'Dixit', 'Dominion'];
  const filteredData = data.map(item =>
    keys.reduce((obj, key) => ({...obj, [key]: item[key]}), {}));
  ...
});