Empty lines don't work if the data ends with an empty line #10

Open
EmilHvitfeldt opened this issue Apr 19, 2021 · 0 comments
EmilHvitfeldt (Owner) commented Apr 19, 2021

library(ggpage)
library(tidytext)
library(tidyverse)

text <- "Modeling as a statistical practice can encompass a wide variety of activities. 
This book focuses on supervised or predictive modeling for text, using text data 
to make predictions about the world around us. We use the tidymodels framework 
for modeling, a consistent and flexible collection of R packages developed to 
encourage good statistical practice.

Supervised machine learning using text data involves building a statistical 
model to estimate some output from input that includes language. The two types 
of models we train in this book are regression and classification. Think of 
regression models as predicting numeric or continuous outputs, such as 
predicting the year of a United States Supreme Court opinion from the text of 
that opinion. Think of classification models as predicting outputs that are 
discrete quantities or class labels, such as predicting whether a GitHub issue 
is about documentation or not from the text of the issue. Models like these can
be used to make predictions for new observations, to understand what features 
or characteristics contribute to differences in the output, and more. We can 
evaluate our models using performance metrics to determine which are best, which 
are acceptable for our specific context, and even which are fair."

tibble(text = text) %>%
  unnest_tokens(text, text, token = function(x) str_split(x, "\n")) %>%
  ggpage_quick()
#> Warning: Use of `data_1$x_space_right` is discouraged. Use `x_space_right`
#> instead.
#> Warning: Use of `data_1$x_page` is discouraged. Use `x_page` instead.
#> Warning: Use of `data_1$x_space_left` is discouraged. Use `x_space_left`
#> instead.
#> Warning: Use of `data_1$x_page` is discouraged. Use `x_page` instead.
#> Warning: Use of `data_1$line` is discouraged. Use `line` instead.
#> Warning: Use of `data_1$y_page` is discouraged. Use `y_page` instead.
#> Warning: Use of `data_1$line` is discouraged. Use `line` instead.
#> Warning: Use of `data_1$y_page` is discouraged. Use `y_page` instead.

text <- "Modeling as a statistical practice can encompass a wide variety of activities. 
This book focuses on supervised or predictive modeling for text, using text data 
to make predictions about the world around us. We use the tidymodels framework 
for modeling, a consistent and flexible collection of R packages developed to 
encourage good statistical practice.

Supervised machine learning using text data involves building a statistical 
model to estimate some output from input that includes language. The two types 
of models we train in this book are regression and classification. Think of 
regression models as predicting numeric or continuous outputs, such as 
predicting the year of a United States Supreme Court opinion from the text of 
that opinion. Think of classification models as predicting outputs that are 
discrete quantities or class labels, such as predicting whether a GitHub issue 
is about documentation or not from the text of the issue. Models like these can
be used to make predictions for new observations, to understand what features 
or characteristics contribute to differences in the output, and more. We can 
evaluate our models using performance metrics to determine which are best, which 
are acceptable for our specific context, and even which are fair.
"

tibble(text = text) %>%
  unnest_tokens(text, text, token = function(x) str_split(x, "\n")) %>%
  ggpage_quick()
#> Warning: Use of `data_1$x_space_right` is discouraged. Use `x_space_right`
#> instead.
#> Warning: Use of `data_1$x_page` is discouraged. Use `x_page` instead.
#> Warning: Use of `data_1$x_space_left` is discouraged. Use `x_space_left`
#> instead.
#> Warning: Use of `data_1$x_page` is discouraged. Use `x_page` instead.
#> Warning: Use of `data_1$line` is discouraged. Use `line` instead.
#> Warning: Use of `data_1$y_page` is discouraged. Use `y_page` instead.
#> Warning: Use of `data_1$line` is discouraged. Use `line` instead.
#> Warning: Use of `data_1$y_page` is discouraged. Use `y_page` instead.

Created on 2021-04-18 by the reprex package (v1.0.0)
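
The two reprexes differ only in the newline before the closing quote of `text`. Here is a minimal sketch of where that difference enters, assuming it comes from the `str_split()` tokenizer used above. `stringr::str_split()` keeps empty strings, so a trailing newline produces one extra empty token at the end. (Base R's `strsplit()` would silently drop that trailing match.) The `trim_trailing_newlines()` helper is hypothetical and only a possible workaround. It has not been checked against how ggpage renders the result.

```r
library(stringr)

# Two inputs that differ only in a trailing newline
no_trailing   <- "First paragraph.\n\nSecond paragraph."
with_trailing <- "First paragraph.\n\nSecond paragraph.\n"

# str_split() keeps empty strings, including the one after a final "\n"
lines_a <- str_split(no_trailing, "\n")[[1]]
lines_b <- str_split(with_trailing, "\n")[[1]]
print(lines_a)  # "First paragraph." ""  "Second paragraph."
print(lines_b)  # same as above, plus a trailing ""

# Hypothetical workaround (untested with ggpage): strip trailing newlines
# before tokenizing so the last token is not an empty string
trim_trailing_newlines <- function(x) str_remove(x, "\\n+$")
lines_c <- str_split(trim_trailing_newlines(with_trailing), "\n")[[1]]
print(lines_c)
```

In the reprex, the same trimming would go on `text` before it is passed to `tibble()`.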

EmilHvitfeldt changed the title from "Empty lines doesn" to "Empty lines doesn't work if the data ends with a empty line" on Apr 19, 2021