Get Started
Install and Load¶
First, download and install Python from python.org.
Then, install the package:
pip install git+https://github.com/receptiviti/receptiviti-python.git
Each time you start a Python session, load the package:
import receptiviti
Set Up API Credentials¶
You can find your API key and secret on your dashboard.
You can set these credentials up in Python permanently or temporarily:
Permanent¶
Open or create a ~/.env file, then add these environment variables with your key and secret:
RECEPTIVITI_KEY=""
RECEPTIVITI_SECRET=""
These can be read in with the receptiviti.readin_env() function, which is automatically called if credentials are not otherwise provided (and the dotenv argument is True).
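For example, you can load the file explicitly at the start of a session (a minimal sketch; receptiviti.readin_env() is the function described above, called here with no arguments):
import os
import receptiviti

# read the variables in ~/.env into os.environ
receptiviti.readin_env()
print("key set:", "RECEPTIVITI_KEY" in os.environ)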
Temporary¶
Add your key and secret, and run this at the start of each session:
import os
os.environ["RECEPTIVITI_KEY"]="32lettersandnumbers"
os.environ["RECEPTIVITI_SECRET"]="56LettersAndNumbers"
Confirm Credentials¶
Check that the API is reachable and that your credentials are recognized:
receptiviti.status()
Status: OK Message: 200: Hello there, World!
<Response [200]>
If your credentials are not recognized, you'll get a response like this:
receptiviti.status(key=123, secret=123)
Status: ERROR Message: 401 (1411): Unrecognized API key pair. This call will not count towards your plan.
<Response [401]>
results = receptiviti.request("texts to score")
Or a character vector:
results = receptiviti.request(["text one", "text two"])
Or from a DataFrame
:
import pandas
data = pandas.DataFrame({"text": ["text a", "text b"]})
# directly
results = receptiviti.request(data["text"])
# by column name
results = receptiviti.request(data, text_column="text")
Text in files¶
You can enter paths to files that contain one text per line:
# single
results = receptiviti.request("files/file.txt")
# multiple
results = receptiviti.request(
files = ["files/file1.txt", "files/file2.txt"]
)
Or point to comma-delimited files with a column containing text. Here, the text_column argument specifies which column contains the text:
# single
results = receptiviti.request("files/file.csv", text_column="text")
# multiple
results = receptiviti.request(
files = ["files/file1.csv", "files/file2.csv"],
text_column="text"
)
Or you can point to a directory containing text files:
results = receptiviti.request(directory = "files")
By default, .txt files will be looked for, but you can specify .csv files with the file_type argument:
results = receptiviti.request(
directory = "files",
text_column="text", file_type="csv"
)
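Equivalently, you could collect the file paths yourself (a sketch using the standard library's glob; the directory and column names are illustrative):
from glob import glob
import receptiviti

# gather all .csv files in the directory and score their "text" columns
results = receptiviti.request(
    files=glob("files/*.csv"),
    text_column="text"
)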
results = receptiviti.request("texts to score")
results.iloc[:, :3]
| | text_hash | summary.word_count | summary.words_per_sentence |
| --- | --- | --- | --- |
| 0 | acab8277267d0efee0828f94e0919ddf | 3 | 3 |
Here, the first column (text_hash) is the MD5 hash of the text, which identifies unique texts and is stored in the main cache.
The entered text can also be included with the return_text argument:
results = receptiviti.request("texts to score", return_text=True)
results[["text_hash", "text"]]
| | text_hash | text |
| --- | --- | --- |
| 0 | acab8277267d0efee0828f94e0919ddf | texts to score |
You can also select a subset of frameworks, rather than having all of them returned:
results = receptiviti.request("texts to score", frameworks="liwc")
results.iloc[:, :5]
| | text_hash | analytical_thinking | clout | authentic | emotional_tone |
| --- | --- | --- | --- | --- | --- |
| 0 | acab8277267d0efee0828f94e0919ddf | 0.99 | 0.5 | 0.01 | 0.257742 |
By default, a single framework will have column names without the framework name, but you can retain the prefix with framework_prefix=True:
results = receptiviti.request(
"texts to score",
frameworks="liwc", framework_prefix=True
)
results.iloc[:, :4]
| | text_hash | liwc.analytical_thinking | liwc.clout | liwc.authentic |
| --- | --- | --- | --- | --- |
| 0 | acab8277267d0efee0828f94e0919ddf | 0.99 | 0.5 | 0.01 |
Aligning Results¶
Results are aligned with the texts you originally enter, including any duplicates or invalid entries.
This means you can add the results object to your original data:
data = pandas.DataFrame({
"id": [1, 2, 3, 4],
"text": ["text a", float("nan"), "", "text a"]
})
results = receptiviti.request(data["text"])
# combine data and results
data.join(results).iloc[:, :5]
| | id | text | text_hash | summary.word_count | summary.words_per_sentence |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | text a | 42ff59040f004970040f90a19aa6b3fa | 2.0 | 2.0 |
| 1 | 2 | NaN | NaN | NaN | NaN |
| 2 | 3 | | NaN | NaN | NaN |
| 3 | 4 | text a | 42ff59040f004970040f90a19aa6b3fa | 2.0 | 2.0 |
You can also provide a vector of unique IDs to be returned with results so they can be merged with other data:
results = receptiviti.request(["text a", "text b"], ids=["a", "b"])
results.iloc[:, :4]
| | id | text_hash | summary.word_count | summary.words_per_sentence |
| --- | --- | --- | --- | --- |
| 0 | a | 42ff59040f004970040f90a19aa6b3fa | 2 | 2 |
| 1 | b | 4db2bfd2c8140dffac0060c9fb1c6d6f | 2 | 2 |
# merge with a new dataset
data = pandas.DataFrame({
"id": ["a1", "b1", "a2", "b2"],
"type": ["a", "b", "a", "b"]
})
data.join(results.set_index("id"), "type").iloc[:, :5]
| | id | type | text_hash | summary.word_count | summary.words_per_sentence |
| --- | --- | --- | --- | --- | --- |
| 0 | a1 | a | 42ff59040f004970040f90a19aa6b3fa | 2 | 2 |
| 1 | b1 | b | 4db2bfd2c8140dffac0060c9fb1c6d6f | 2 | 2 |
| 2 | a2 | a | 42ff59040f004970040f90a19aa6b3fa | 2 | 2 |
| 3 | b2 | b | 4db2bfd2c8140dffac0060c9fb1c6d6f | 2 | 2 |
Saved Results¶
Results can also be saved to a .csv file:
receptiviti.request("texts to score", "~/Documents/results.csv", overwrite=True)
results = pandas.read_csv("~/Documents/results.csv")
results.iloc[:, :4]
| | id | text_hash | summary.word_count | summary.words_per_sentence |
| --- | --- | --- | --- | --- |
| 0 | 1 | acab8277267d0efee0828f94e0919ddf | 3 | 3 |
Preserving Results¶
The receptiviti.request function tries to avoid sending texts to the API as much as possible:
- As part of the preparation process, it excludes duplicates and invalid texts.
- If enabled, it checks the primary cache to see if any texts have already been scored.
  - The primary cache is an Arrow database located by the cache argument.
  - Its format is determined by cache_format.
  - You can skip checking it initially, while still writing results to it, with cache_overwrite=True.
  - It can be cleared with clear_cache=True.
- It will check for any responses to previous, identical requests.
  - Responses are stored in the receptiviti_request_cache directory of your system's temporary directory (tempfile.gettempdir()).
  - You can avoid using this cache with request_cache=False.
  - This cache is cleared after a day.
If you want to make sure no texts are sent to the API, you can use make_request=False. This will use the primary and request caches, but will fail if any texts are not found there.
If a call fails before results can be written to the cache or returned, all received responses will still be in the request cache, but those will be deleted after a day.
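A sketch of these options together (the cache location is illustrative, and passing a path string to cache is an assumption; the argument names are those listed above):
import receptiviti

# first run: score texts, writing results to a local primary cache
receptiviti.request(["text one", "text two"], cache="receptiviti_cache")

# later run: read from the caches only; fails if any text is not found there
results = receptiviti.request(
    ["text one", "text two"],
    cache="receptiviti_cache",
    make_request=False,
)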
Handling Big Data¶
The receptiviti.request function will handle splitting texts into bundles, so the limit on how many texts you can process at once will come down to your system's amount of random access memory (RAM). Several thousand texts should be fine, but with millions of texts, you may not be able to have all of the results loaded at once. To get around this, you can fully process subsets of your texts, as in the sketch below.
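A sketch of that approach (the input file, output names, and chunk size are illustrative; the output-file argument is the one shown under Saved Results):
import pandas
import receptiviti

texts = pandas.read_csv("texts.csv")["text"]
chunk_size = 100_000  # adjust to what fits in your RAM

# fully process each subset, writing results to disk rather than keeping them loaded
for i in range(0, len(texts), chunk_size):
    receptiviti.request(
        texts.iloc[i : i + chunk_size],
        f"results_{i}.csv",
    )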
A benefit of processing more texts at once is that requests can be parallelized, but this is more RAM-intensive, and the primary cache is updated less frequently (as it is only updated at the end of a complete run).
You could also parallelize your own batches, but be sure to set cores to 1 (to disable the function's parallelization), and do not enable the primary cache (to avoid attempting to read from the cache while it is being written to by another instance); see the sketch at the end of this section.
Not using the primary cache is also more efficient, but you may want to ensure you are not sending duplicate texts between calls. The function handles duplicate texts within calls (only ever sending unique texts), but depends on the cache to avoid sending duplicates between calls.
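A sketch of such a setup (batch size, file names, and worker count are illustrative; cores is the argument named above, and duplicates are dropped up front since the primary cache is not there to catch them):
import pandas
import receptiviti
from multiprocessing import Pool

def score_batch(args):
    i, batch = args
    # cores=1 disables the function's own parallelization, and no cache
    # argument is passed, so workers cannot contend over the primary cache
    receptiviti.request(batch, f"batch_{i}.csv", cores=1)

if __name__ == "__main__":
    texts = pandas.read_csv("texts.csv")["text"].drop_duplicates()
    batches = [texts.iloc[i : i + 50_000] for i in range(0, len(texts), 50_000)]
    with Pool(4) as pool:
        pool.map(score_batch, list(enumerate(batches)))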