Get Started
Install and Load¶
First, download and install Python from python.org.
Then, install the package:
pip install git+https://github.com/receptiviti/receptiviti-python.git
Each time you start a Python session, load the package:
import receptiviti
Set Up API Credentials¶
You can find your API key and secret on your dashboard.
You can set these credentials up in Python permanently or temporarily:
Permanent¶
Open or create a ~/.env file, then add these environment variables with your key and secret:
RECEPTIVITI_KEY=""
RECEPTIVITI_SECRET=""
These can be read in with the receptiviti.readin_env() function, which is automatically called if credentials are not otherwise provided (and the dotenv argument is True).
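For example, you could load these variables explicitly at the start of a session (a minimal sketch; readin_env is also called automatically when credentials are not otherwise provided):
import receptiviti

# read RECEPTIVITI_KEY and RECEPTIVITI_SECRET from a .env file into the session's environment
receptiviti.readin_env()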
Temporary¶
Add your key and secret, and run this at the start of each session:
import os
os.environ["RECEPTIVITI_KEY"]="32lettersandnumbers"
os.environ["RECEPTIVITI_SECRET"]="56LettersAndNumbers"
Confirm Credentials¶
Check that the API is reachable and that your credentials are recognized:
receptiviti.status()
Status: OK Message: 200: Hello there, World!
<Response [200]>
If your credentials are not recognized, you'll get a response like this:
receptiviti.status(key=123, secret=123)
Status: ERROR Message: 401 (1411): Unrecognized API key pair. This call will not count towards your plan.
<Response [401]>
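If you prefer not to store credentials at all, the same key and secret arguments shown above can be passed directly (values here are placeholders):
receptiviti.status(
    key="32lettersandnumbers",
    secret="56LettersAndNumbers",
)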
results = receptiviti.request("texts to score")
Or a character vector:
results = receptiviti.request(["text one", "text two"])
Or from a DataFrame:
import pandas
data = pandas.DataFrame({"text": ["text a", "text b"]})
# directly
results = receptiviti.request(data["text"])
# by column name
results = receptiviti.request(data, text_column="text")
Text in Files¶
You can enter paths to files that contain a separate text on each line:
# single
results = receptiviti.request("files/file.txt")
# multiple
results = receptiviti.request(
    files=["files/file1.txt", "files/file2.txt"]
)
Or paths to comma-delimited files with a column containing text. Here, the text_column argument specifies which column contains the text:
# single
results = receptiviti.request("files/file.csv", text_column="text")
# multiple
results = receptiviti.request(
    files=["files/file1.csv", "files/file2.csv"],
    text_column="text"
)
Or you can point to a directory containing text files:
results = receptiviti.request(directory = "files")
By default, .txt files will be looked for, but you can specify .csv files with the file_type argument:
results = receptiviti.request(
    directory="files",
    text_column="text",
    file_type="csv"
)
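If you want to try these file-based examples, you could first create a small sample directory (a sketch; the paths match the hypothetical ones used above):
import os

# create a "files" directory containing one text per line
os.makedirs("files", exist_ok=True)
with open("files/file1.txt", "w", encoding="utf-8") as f:
    f.write("text one\ntext two\n")
with open("files/file2.txt", "w", encoding="utf-8") as f:
    f.write("text three\ntext four\n")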
results = receptiviti.request("texts to score")
results.iloc[:, :3]
| | text_hash | summary.word_count | summary.words_per_sentence |
| --- | --- | --- | --- |
| 0 | acab8277267d0efee0828f94e0919ddf | 3 | 3 |
Here, the first column (text_hash) is the MD5 hash of the text, which identifies unique texts, and is stored in the main cache.
The entered text can also be included with the return_text argument:
results = receptiviti.request("texts to score", return_text=True)
results[["text_hash", "text"]]
| | text_hash | text |
| --- | --- | --- |
| 0 | acab8277267d0efee0828f94e0919ddf | texts to score |
You can also select a subset of frameworks, rather than having all available frameworks returned:
results = receptiviti.request("texts to score", frameworks="liwc")
results.iloc[:, :5]
| | text_hash | analytical_thinking | clout | authentic | emotional_tone |
| --- | --- | --- | --- | --- | --- |
| 0 | acab8277267d0efee0828f94e0919ddf | 0.99 | 0.5 | 0.01 | 0.257742 |
By default, a single framework will have column names without the framework name, but you can retain the prefix with framework_prefix=True:
results = receptiviti.request(
    "texts to score",
    frameworks="liwc",
    framework_prefix=True
)
results.iloc[:, :4]
| | text_hash | liwc.analytical_thinking | liwc.clout | liwc.authentic |
| --- | --- | --- | --- | --- |
| 0 | acab8277267d0efee0828f94e0919ddf | 0.99 | 0.5 | 0.01 |
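The prefix can also make downstream selection easier; for example, you could pull out one framework's columns with standard pandas (not a receptiviti feature):
# select only the LIWC columns from the prefixed results
liwc_scores = results.filter(like="liwc.")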
Aligning Results¶
Results are returned in a way that aligns with the texts you originally entered, including any duplicates or invalid entries.
This means you can add the results object to the original data:
data = pandas.DataFrame({
    "id": [1, 2, 3, 4],
    "text": ["text a", float("nan"), "", "text a"]
})
results = receptiviti.request(data["text"])
# combine data and results
data.join(results).iloc[:, :5]
| | id | text | text_hash | summary.word_count | summary.words_per_sentence |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | text a | 42ff59040f004970040f90a19aa6b3fa | 2.0 | 2.0 |
| 1 | 2 | NaN | NaN | NaN | NaN |
| 2 | 3 | | NaN | NaN | NaN |
| 3 | 4 | text a | 42ff59040f004970040f90a19aa6b3fa | 2.0 | 2.0 |
You can also provide a vector of unique IDs to be returned with results so they can be merged with other data:
results = receptiviti.request(["text a", "text b"], ids=["a", "b"])
results.iloc[:, :4]
| | id | text_hash | summary.word_count | summary.words_per_sentence |
| --- | --- | --- | --- | --- |
| 0 | a | 42ff59040f004970040f90a19aa6b3fa | 2 | 2 |
| 1 | b | 4db2bfd2c8140dffac0060c9fb1c6d6f | 2 | 2 |
# merge with a new dataset
data = pandas.DataFrame({
    "id": ["a1", "b1", "a2", "b2"],
    "type": ["a", "b", "a", "b"]
})
data.join(results.set_index("id"), "type").iloc[:, :5]
| | id | type | text_hash | summary.word_count | summary.words_per_sentence |
| --- | --- | --- | --- | --- | --- |
| 0 | a1 | a | 42ff59040f004970040f90a19aa6b3fa | 2 | 2 |
| 1 | b1 | b | 4db2bfd2c8140dffac0060c9fb1c6d6f | 2 | 2 |
| 2 | a2 | a | 42ff59040f004970040f90a19aa6b3fa | 2 | 2 |
| 3 | b2 | b | 4db2bfd2c8140dffac0060c9fb1c6d6f | 2 | 2 |
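The same merge could also be expressed with pandas.merge (standard pandas; the rename avoids a clash between the two id columns):
merged = data.merge(
    results.rename(columns={"id": "text_id"}),
    left_on="type",
    right_on="text_id",
)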
Saved Results¶
Results can also be saved to a .csv file:
receptiviti.request("texts to score", "~/Documents/results.csv", overwrite=True)
results = pandas.read_csv("~/Documents/results.csv")
results.iloc[:, :4]
| | id | text_hash | summary.word_count | summary.words_per_sentence |
| --- | --- | --- | --- | --- |
| 0 | 1 | acab8277267d0efee0828f94e0919ddf | 3 | 3 |
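If a saved results file grows large, you could read it back in chunks with standard pandas rather than loading it all at once:
import pandas

# summarize saved results without holding the full file in memory
for chunk in pandas.read_csv("~/Documents/results.csv", chunksize=10_000):
    print(chunk["summary.word_count"].mean())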
Preserving Results¶
The receptiviti.request function tries to avoid sending texts to the API as much as possible:
- As part of the preparation process, it excludes duplicates and invalid texts.
- If enabled, it checks the primary cache to see if any texts have already been scored.
    - The primary cache is an Arrow database located by the cache argument.
    - Its format is determined by cache_format.
    - You can skip checking it initially while still writing results to it with cache_overwrite=True.
    - It can be cleared with clear_cache=True.
- It will check for any responses to previous, identical requests.
    - Responses are stored in the receptiviti_request_cache directory of your system's temporary directory (tempfile.gettempdir()).
    - You can avoid using this cache with request_cache=False.
    - This cache is cleared after a day.
If you want to make sure no texts are sent to the API, you can use make_request=False.
This will use the primary and request cache, but will fail if any texts are not found there.
If a call fails before results can be written to the cache or returned, all received responses will still be in the request cache, but those will be deleted after a day.
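For example, a cache-only call might look like this (a sketch; the cache path is a hypothetical location, and the arguments are those described above):
import receptiviti

# fails if any of these texts are missing from the primary or request cache
results = receptiviti.request(
    ["text a", "text b"],
    cache="~/receptiviti_cache",  # hypothetical primary cache location
    make_request=False,
)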
Handling Big Data¶
The receptiviti.request function will handle splitting texts into bundles, so the limit on how many texts you can process at once will come down to your system's amount of random access memory (RAM).
Several thousand texts should be fine, but getting into millions of texts, you may not be able to have all of the results loaded at once. To get around this, you can fully process subsets of your texts, as sketched below.
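A minimal sketch of such chunked processing, assuming a large CSV with a "text" column (the file and column names are hypothetical):
import pandas
import receptiviti

# score a large collection in chunks, saving each chunk's results to disk
reader = pandas.read_csv("big_texts.csv", chunksize=50_000)
for i, chunk in enumerate(reader):
    results = receptiviti.request(chunk["text"])
    results.to_csv(f"results_{i}.csv", index=False)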
A benefit of processing more texts at once is that requests can be parallelized, but this is more RAM intensive, and the primary cache is updated less frequently (as it is updated only at the end of a complete run).
You could also parallelize your own batches, but be sure to set cores to 1 (to disable the function's parallelization), and do not enable the primary cache (to avoid attempting to read from the cache while it is being written to by another instance).
Not using the primary cache is also more efficient, but you may want to ensure you are not sending duplicate texts between calls. The function handles duplicate texts within calls (only ever sending unique texts), but depends on the cache to avoid sending duplicates between calls.
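A minimal sketch of parallelizing your own batches with the standard library's multiprocessing module (batch contents are placeholders, and both caches are left off, as described above):
from multiprocessing import Pool

import receptiviti

def score_batch(texts):
    # cores=1 disables receptiviti.request's own parallelization;
    # request_cache=False skips the temporary request cache
    return receptiviti.request(texts, cores=1, request_cache=False)

if __name__ == "__main__":
    batches = [["text a", "text b"], ["text c", "text d"]]
    with Pool(2) as pool:
        all_results = pool.map(score_batch, batches)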